Objective: This study examines the efficiency of web search engines in retrieving information in the field of information and knowledge.
Methodology: The research method is a descriptive survey. Five keywords were selected from the article "2016 top trends in academic libraries" in the journal College and Research Libraries, based on rankings by 20 subject specialists. The first 50 documents retrieved by each of five search engines (Google, Yahoo, Ask, Bing and AOL) were then placed in a pooled collection, and the efficiency of the search engines was examined using an identical search formula.
Findings: According to the findings, Google performs at a higher level on precision, F-measure and failure ratio, with values of 0.824, 0.360 and 8.8 respectively, and Ask performs at a higher level on freshness ratio and expected search coefficient, with values of 0.378 and 0.60. In addition, Google and Ask have relative recall values of 0.24 and 0.22 respectively, and Yahoo is in a better position on coverage ratio, with 0.714.
Conclusion: Google shows acceptable efficiency for retrieval precision, Ask for new information, and Google together with Yahoo for retrieval recall.
Article Title [English]
Evaluation of the performance of web search engines in retrieving information in the field of information and knowledge based on seven indicators
Background and Objectives: The advent of the World Wide Web (WWW) in the 1990s, followed by the emergence of a very large number of web pages, made automatic information retrieval systems necessary. The first web search engine capable of full-text search, WebCrawler, was developed by Brian Pinkerton at the University of Washington. WebCrawler indexed plain text and allowed users to search for expressions on the internet. Later, Lycos and Infoseek in 1994, Excite and Yahoo in 1995, Inktomi in 1996, Google in September 1997 (Gross, 2015), and MSN and Overture (Sahu, Mahapatra and Balabantaray, 2016) emerged to cope with the complexity resulting from the surge of information on the web. According to Wu and Lee (2004), the services delivered by web search play a significant role for users seeking information sources that were previously unavailable to them. Today, search engines are recognized as a gateway to a huge volume of information on the internet, providing services and tools to meet a variety of users' information needs. For this reason, evaluating the efficiency and performance of search engines is important for both developers and users (Azimzadeh, Badie and Esnaashari, 2016). McCarthy (2006, quoted in Ewans, 2007) claims that the vast majority of people visiting websites reach the pages or content of interest through search engines rather than through direct links from other pages.
A search engine is software with which users search the internet for the information they need and retrieve related results (Mivule, 2017). Croft, Metzler and Strohman (2015) define a search engine as the practical application of information retrieval techniques to large-scale text collections, taking different forms that reflect the capabilities for which each is designed. In other words, search engines are programs used to find documents on the WWW that match specific keywords and to return a list of documents containing those keywords (Khorsheed, Madbouly and Guirguis, 2015). According to Croft et al. (2015), information retrieval involves structured and multimedia documents, meaningful textual content and other media, relevance, evaluation, information needs, effective ranking algorithms and interaction with users, all of which still concern researchers in the field. In the view of Ali, Jhandir, Lee, On and Choi (2017), while data acts as the fuel that keeps the internet running, its sheer extent has caused many problems for users.
While the degree of users' confidence in search engines, and their reliance on them to display authentic results, is open to question, providing suitable, relevant and high-quality information from page contents and the links between pages is a major challenge for service providers (the search engines). Xu, Wang and Goh (1998), meanwhile, note that numerous search engines have been developed to deliver technically better performance. This suggests that the qualitative features users expect have been lacking.
Various studies have considered only a few, mostly specific measures such as precision and recall, on the assumption that the set of relevant documents is always the same regardless of the user. The present study therefore investigates relative precision, relative recall, F-measure, coverage ratio, freshness ratio, expected search ratio and failure ratio as a set of measures for evaluating the retrieval efficiency of search engines in the information and knowledge domain. The primary purpose of the study is to determine the retrieval efficiency of five search engines against these indicators. The secondary purpose is to identify the retrieval efficiency of the search engines on each indicator separately: relative precision, relative recall, F-measure, coverage ratio, freshness ratio, expected search ratio and failure ratio. The main question is how efficiently web search engines retrieve information in this domain.
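As a minimal sketch of three of the measures named above, the Python below assumes the common pooling definitions, which this abstract does not spell out: precision is the share of an engine's retrieved documents judged relevant, relative recall is the engine's relevant hits divided by the relevant documents found by all engines combined, and F-measure is the harmonic mean of the two. The function names, engine labels and document IDs are hypothetical illustrations, not the study's data.

```python
def precision(retrieved, relevant):
    """Share of an engine's distinct retrieved documents that were judged relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def relative_recall(retrieved, pooled_relevant):
    """Relevant documents found by one engine, relative to the relevant pool
    found by all engines together (assumed definition)."""
    if not pooled_relevant:
        return 0.0
    return len(set(retrieved) & pooled_relevant) / len(pooled_relevant)


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


def evaluate(results, relevant):
    """results: dict mapping engine name -> list of retrieved document IDs.
    relevant: set of document IDs judged relevant."""
    # The pool contains only relevant documents that at least one engine retrieved.
    pooled_relevant = set().union(*results.values()) & relevant
    scores = {}
    for engine, retrieved in results.items():
        p = precision(retrieved, relevant)
        r = relative_recall(retrieved, pooled_relevant)
        scores[engine] = {"precision": p, "relative_recall": r, "f_measure": f_measure(p, r)}
    return scores


# Hypothetical toy data: two engines, four retrieved documents each.
results = {
    "engine_a": ["d1", "d2", "d3", "d4"],
    "engine_b": ["d3", "d5", "d6", "d7"],
}
judged_relevant = {"d1", "d3", "d5", "d6", "d9"}  # d9 was never retrieved, so it is outside the pool

scores = evaluate(results, judged_relevant)
for engine, s in scores.items():
    print(engine, s)
```

In a setup like the one described here, `results` would hold the top 50 documents per engine for one keyword, and the scores would then be averaged over the five keywords; that averaging step is an assumption and is not shown.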
Methodology: The present study is applied in purpose and uses a descriptive survey method. Holding the search function and query form identical across engines (the query AND pdf), the authors evaluated the efficiency of the search engines from what was observed in the retrieval results. To measure the variables, formulas for relative precision, relative recall, F-measure, coverage ratio, freshness ratio, expected search ratio and failure ratio were used. Based on Alexa rankings, Google, Bing, Yahoo, Ask and AOL were examined as the most commonly used search engines. Data were collected through library research (for the literature review), checklists, and direct observation of the search results. To this end, the new domains and future subfields of information science were identified as 15 terms/keywords from the article "2016 Top Trends in Academic Libraries", published in the journal College and Research Libraries (SAGE database). These 15 terms/keywords were then given to 20 experts in the field, who assigned each a weighted score, and the 5 keywords with the highest weights were selected for searching. Each of the 5 keywords was submitted to each of the 5 search engines. Next, from the results each engine retrieved for each keyword, the top 50 documents were placed in a researcher-made collection in order to compute the retrieval efficiency measures. To judge the relevance of documents, following the recommendations of Zhang, Xu, Wang and Lee (2006), the recurrence of the keyword in the documents, their abstracts and their keywords was used as the criterion. To assess the reliability of the retrieved results, the test-retest method was used: the search and retrieval were repeated in two phases at a 15-day interval (winter 2017). The correlation between the two runs was tested and confirmed at R = 0.89. Excel 2013 was used to analyze the data.
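The test-retest check described above can be sketched as follows. The abstract reports only R = 0.89 and does not name the coefficient, so Pearson's r is an assumption here, and the two lists of scores are hypothetical values chosen for illustration, not the study's measurements.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts of relevant documents per engine in two search runs
# carried out 15 days apart (illustrative numbers only).
first_run = [10, 20, 30, 40, 50]
second_run = [12, 18, 33, 41, 47]

r = pearson_r(first_run, second_run)
print(f"test-retest correlation: {r:.2f}")
```

A value close to 1 indicates that the engines returned similar results in both runs, which is the sense in which the reported correlation supports reliability.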
Findings: The results showed that Google performed better than the other search engines in terms of relative precision, relative recall, F-measure (harmonic mean) and failure ratio; Ask in terms of expected search coefficient and freshness ratio; and Yahoo in terms of coverage ratio. Nevertheless, whatever their results on the individual indicators, the search engines studied are in general not in an ideal situation, and in most cases they score below average. Because earlier studies examined different scientific domains and efficiency indicators, and focused mostly on relative precision and relative recall, not all of the findings can be compared. However, the findings on relative precision are consistent with those of Shafi and Rader (2005) and Ali and Gole (2016); on recall with those of Janson and Molina (2006), Kumar and Prakash (2009), Wang et al. (2012) and Ali and Gole (2016); and on coverage ratio with those of Mohammad Ismael and Mansoor Kiakie (2011), Esfandyari Moghaddam (2012) and Janson and Molina (2006).
Discussion: Although the search engines were scored and ranked in this study, they are still far from ideal. It follows that the challenges of evaluating information retrieval efficiency have not yet been resolved, despite the use of different search engines, strategies, ranking algorithms and document indexing methods, and that further study is needed. Comparison of these results with prior findings indicates that no single search engine can meet all of the efficiency indicators on its own. Users therefore have to direct their queries to different search engines depending on the indicator that matters to them. On the one hand, designing specialized search engines that account for the diversity, extent and lexical relationships of the different scientific domains is necessary. On the other hand, it is time to adopt approaches such as visual search, multilingual thesauri and retrieval based on weighted indexing in the interfaces of both specialized and general search engines.