Document Type: Original Article

Authors

1 Department of Information Management, Ahvaz Branch, Islamic Azad University, Ahvaz, Iran.

2 Department of Information Management, Ahvaz Branch, Islamic Azad University, Ahvaz, Iran.

Abstract

Background and Objectives: The advent of the World Wide Web (WWW) in the 1990s, followed by the emergence of a large number of web pages, made automatic information retrieval systems necessary. The first web search engine capable of full-text search, WebCrawler, was developed by Brian Pinkerton at the University of Washington. WebCrawler indexed plain text and allowed users to search for expressions on the internet. Later, Lycos and Infoseek in 1994, Excite and Yahoo in 1995, Inktomi in 1996, Google in September 1997 (Gross, 2015), and MSN and Overture (Sahu, Mahapatra and Balabantaray, 2016) emerged to cope with the complexity resulting from the surge of information on the web. According to Wu and Li (2004), web search services play a significant role for users seeking information sources that were previously unavailable to them. Today, search engines are recognized as a gateway to the vast amount of information on the internet, providing services and tools to meet a variety of users' information needs. For this reason, evaluating the efficiency and performance of search engines is important to both developers and users (Azimzadeh, Badie and Esnaashari, 2016). According to McCarthy (2006, quoted in Evans, 2007), the vast majority of people visiting websites reach the pages or content of interest through search engines rather than through direct links from other pages.
A search engine is software with which users search for the information they need on the internet and retrieve relevant results (Mivule, 2017). Croft, Metzler and Strohman (2015) define a search engine as the practical application of information retrieval techniques to large-scale text collections, taking different forms that reflect the applications for which it is designed. In other words, search engines are programs used to find documents matching specific keywords on the WWW and to return a list of documents containing the searched keywords (Khorsheed, Madbouly and Guirguis, 2015). According to Croft et al. (2015), information retrieval involves structured and multimedia documents, meaningful textual content and other media, relevance, evaluation, information needs, effective ranking algorithms, and interaction with users, all of which still concern researchers in the field. In the view of Ali, Jhandir, Lee, On and Choi (2017), while data acts as the fuel that keeps the internet running, its sheer volume has caused many problems for users.
While the degree of users' confidence in search engines, and their reliance on them to display authentic results, is questionable, providing suitable, relevant, and high-quality information to users from webpage content and the links between pages is a major challenge for service providers (the search engines). Xie, Wang and Goh (1998), meanwhile, hold that numerous search engines have been developed to deliver technically better performance. This indicates that the qualitative features expected from the users' viewpoint have been lacking.
Previous studies have considered only a few, mostly specific measures such as precision and recall, on the assumption that the set of relevant documents is always the same regardless of the user. The present study therefore investigates relative precision, relative recall, F-number, coverage ratio, freshness ratio, expected search ratio, and failure as a set of measures for evaluating the retrieval efficiency of search engines in the information and knowledge domain. The primary purpose of the study is to determine the retrieval efficiency of five search engines on the indicators of interest. The secondary purpose is to identify their retrieval efficiency on each individual indicator: relative precision, relative recall, F-number, coverage ratio, freshness ratio, expected search ratio, and failure. The main question is how efficiently web search engines retrieve information in this domain.
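As a rough illustration of how three of these measures might be computed, the Python sketch below assumes definitions commonly used in search engine evaluation: relative precision as the share of evaluated results judged relevant, relative recall as an engine's relevant results divided by the pooled relevant results of all engines, and the F-number as the harmonic mean of the two. The abstract does not give the study's exact formulas, so these definitions, the function names, and the sample judgments are assumptions for illustration only.

```python
from typing import Dict, Set


def relative_precision(relevant_retrieved: int, evaluated: int) -> float:
    """Share of the evaluated results that were judged relevant."""
    return relevant_retrieved / evaluated if evaluated else 0.0


def relative_recall(engine_relevant: Set[str], pooled_relevant: Set[str]) -> float:
    """Engine's relevant documents divided by the pool of relevant
    documents found by all engines together (assumed definition)."""
    if not pooled_relevant:
        return 0.0
    return len(engine_relevant & pooled_relevant) / len(pooled_relevant)


def f_number(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def evaluate(judged: Dict[str, Set[str]], evaluated: int) -> Dict[str, Dict[str, float]]:
    """judged maps each engine name to the IDs of its relevant documents;
    evaluated is the number of top results inspected per engine."""
    pooled = set().union(*judged.values())
    scores = {}
    for engine, relevant in judged.items():
        p = relative_precision(len(relevant), evaluated)
        r = relative_recall(relevant, pooled)
        scores[engine] = {
            "relative_precision": p,
            "relative_recall": r,
            "f_number": f_number(p, r),
        }
    return scores


# Hypothetical relevance judgments for two made-up engines.
sample = {"engine_a": {"d1", "d2", "d3"}, "engine_b": {"d2", "d4"}}
for name, result in evaluate(sample, evaluated=5).items():
    print(name, result)
```

Pooling the relevant documents of all engines is what makes the recall "relative": the true number of relevant documents on the web is unknown, so the pool stands in for it.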
Methodology: The present study is applied in purpose and descriptive in its survey method. Holding the search function and the form of the search term identical across engines (the query AND pdf), the authors evaluated the efficiency of the search engines based on what was observed in the retrieval results. To measure the variables, formulas for relative precision, relative recall, F-number, coverage ratio, freshness ratio, expected search ratio and failure were used. Based on Alexa rankings, the study examines Google, Bing, Yahoo, Ask and AOL as the most commonly used search engines. Data were collected through library research (for the literature review), checklists, and direct observation of the search results. Based on the article Top Trends in Academic Libraries 2016, published in the College & Research Libraries journal in the SAGE database, the new domains and future subfields of information science were identified as 15 terms/keywords. These 15 terms/keywords were then given to 20 experts in the field, who assigned each a weighted score. The 5 keywords with the highest weights were selected for searching, and each of the 5 keywords was searched individually in each of the 5 search engines. In the next stage, from the total results retrieved by each search engine for each keyword, the top 50 documents were placed in the researcher-made collection to evaluate the retrieval efficiency measures. To judge the relevance of documents, following the recommendations of Zhang, Xu, Wang and Lee (2006), the recurrence of the keyword in the documents, their abstracts and their keywords was used as the criterion. To assess the reliability of the retrieved results, the test-retest method was used.
For this purpose, the search and retrieval were repeated in two phases at a 15-day interval (winter 2017). The correlation between the results of the two rounds was tested and confirmed at R = 0.89. Excel 2013 was used to analyze the data.
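The test-retest check described above can be sketched as a correlation between the scores obtained in the two search rounds. The abstract reports R = 0.89 without naming the coefficient; the Python sketch below assumes Pearson's r, and the function name and sample scores are hypothetical, not the study's data.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


# Hypothetical precision scores for five engines in two search rounds.
first_round = [0.62, 0.48, 0.55, 0.40, 0.37]
second_round = [0.60, 0.50, 0.51, 0.43, 0.35]
print(round(pearson_r(first_round, second_round), 2))
```

A value close to 1 would suggest that the engines' results were stable over the interval between the two rounds.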
Findings: The results showed that Google, Ask and Yahoo performed better than the other search engines: Google on relative precision, relative recall, the F-number (harmonic mean) and the failure criterion; Ask on the expected search ratio and freshness ratio; and Yahoo on coverage ratio. However, despite these results on the different indicators, the search engines studied are generally not in an ideal situation, and in most cases their scores are below average. Because earlier studies of search engines covered different scientific domains and efficiency indicators than the present one and focused mostly on relative precision and relative recall, not all of the findings can be compared. Nevertheless, the findings of the present study are consistent on relative precision with Shafi and Rather (2005) and Ali and Gul (2016); on recall with Jansen and Molina (2006), Sampath Kumar and Prakash (2009), Wang et al. (2012) and Ali and Gul (2016); and on coverage ratio with Mohammad Esmaeil and Mansour Kiaei (2011), Esfandyari Moghaddam (2012) and Jansen and Molina (2006).
Discussion: It can be concluded that although the search engines were scored and ranked in this study, they are still far from ideal. The challenges of evaluating information retrieval efficiency therefore remain unresolved, despite the use of different search engines, search strategies, ranking algorithms and document indexing methods, and they require further study. Comparison of the results of this study with some prior findings indicates that no single search engine can meet all the required efficiency indicators. Users therefore have to choose the search engine for their queries according to the indicator that matters to them. On the one hand, designing specialized search engines that account for the diversity, extent and lexical relationships of the different scientific domains is necessary. On the other hand, it is time to use approaches such as visual search, multilingual thesauri, and retrieval based on weighted indexing in the interfaces of both specialized and general search engines.

Keywords

References

Esfandyari Moghaddam, A. (1391/2012). The overlap of retrieved results for specialized medical keywords in general web search engines. Health Information Management, 24, 203-214. [In Persian]
Ashrafi Rizi, H. & Kazempour, Z. (1386/2007). The role and application of critical thinking in evaluating internet resources. Informology, 17-18, 119-132. [In Persian]
Baeza-Yates, R. & Ribeiro-Neto, B. (1385/2006). New Domains in Information Retrieval (A. H. Ghasemi, Trans.). Tehran: Chapar. [In Persian]
Cheshmeh Sohrabi, M. (1378/1999). The effect of using a thesaurus in bibliographic databases on the recall, precision, and search time of retrieved information. Master's thesis, Supervisor: Abbas Horri. Tarbiat Modares University, Faculty of Educational Sciences and Psychology. [In Persian]
Shakeri, S. (1387/2008). Recall and precision of Persian internet search tools in retrieving information in the field of library and information science. Faslnameh-ye Ketab, 73, 177-200. [In Persian]
Serati Shirazi, M. (1388/2009). A comparison of the precision of general and specialized medical search engines in retrieving documents related to children's diseases. Faslnameh-ye Ketab, 77, 77-94. [In Persian]
Kousha, K. (1381/2002). Internet Search Tools: Principles, Skills, and Search Facilities on the Web. Tehran: Nashr-e Ketabdar. [In Persian]
Mohammad Esmaeil, S. & Mansour Kiaei, R. (1390/2011). A comparison of general search engines and meta-search engines in retrieving physics information and the extent of their overlap. National Studies on Librarianship and Information Organization, No. 87, 130-140. [In Persian]
Ali, S. & Gul, S. (2016). Search engine effectiveness using query classification: A study. Online Information Review, 40(4), 515-528. Doi:https://doi.org/10.1108/OIR-07-2015-0243.
Ali, T., Jhandir, Z., Lee, I., On, B-W. & Choi, G. S. (2017). Evaluating retrieval effectiveness by sustainable rank list. Sustainability, 9, 1203, 1-20. Doi:10.3390/su9071203.
Azimzadeh, M., Badie, R. & Esnaashari, M. M. (2016). A review on web search engines' automatic evaluation methods and how to select the evaluation method. 2nd International Conference on Web Research, ICWR, 27-28 April, Tehran, pp. 78-83.
Croft, W. B., Metzler, D. & Strohman, T. (2015). Search Engines: Information Retrieval in Practice. London: Pearson Education, Inc, 518 pages.
Domachowski, A., Griesbaum, J. & Heuwing, B. (2015). Perception and effectiveness of search advertising on smartphones. Proceedings of the Association for Information Science and Technology, 53(1), 1-10. Doi: 10.1002/pra2.2016.14505301074.
Evans, M. P. (2007). Analysing Google rankings through search engine optimization data. Internet Research, 17(1), 21-37. Doi:http://dx.doi.org/10.1108/10662240710730470
Gross, A. M. (2015). Information retrieval in Arabic: An evaluation of three multilingual search engines on their capabilities in dealing with Arabic search queries. International Journal of Information Technology and Business Management, 4(1), 1-21.
Jansen, B. J. & Molina, P. R. (2006). The effectiveness of web search engines for retrieving relevant ecommerce links. Information Processing and Management, 42, 1075-1098. doi: 10.1016/j.ipm.2005.09.003.
Katumba, S. & Coetzee, S. (2017). Employing search engine optimization (SEO) techniques for improving the discovery of geospatial resources on the web. International Journal of Geo-Information, 6, 284, 1-20. doi:10.3390/ijgi6090284
Khorsheed, K. O., Madbouly, M. & Guirguis, S. K. (2015). Search engine optimization using data mining approach. International Journal of Computer Engineering and Applications, 9(6-1), 184-200.
Kulkarni, A. (2013). Efficient and effective large-scale search. PhD thesis, Advisor: Jamie Callan. Carnegie Mellon University, Language Technologies Institute. 167 pages. Retrieved 06/07/2017 from: www.lti.cs.cmu.edu.
Levene, M. (2010). An Introduction to Search Engines and Web Navigation. [2nd Edition]. New Jersey: John Wiley & Sons. 463 pages.
Mivule, K. (2017). Web search query privacy, an end-user perspective. Journal of Information Security, 8, 56-74. Retrieved 06/07/2017 from: http://dx.doi.org/10.4236/jis.2017.81005
Sahu, S. K., Mahapatra, D. P. & Balabantaray, R. C. (2016). Comparative study of search engines in context of features and semantics. Journal of Theoretical and Applied Information Technology, 88(2), 210-218.
Sampath Kumar, B. T. & Prakash, J. N. (2009). Precision and relative recall of search engines: A comparative study of Google and Yahoo. Singapore Journal of Library & Information Management, 38, 124–137.
Shafi, S. M., & Rather, R. A. (2005). Precision and recall of five search engines for retrieval of scholarly information in the field of biotechnology. Webology, 2 (2), Article 12. Available at: http://www.webology.org/2005/v2n2/a12.html.
Sibuyi, T. & Dehinbo, J. O. (2016). Optimization and effectiveness of search engine results. Proceedings of the World Congress on Engineering and Computer Science, WCECS, 19-21 October, San Francisco, USA. Vol. 1.
Wang, L., Wang, J., Wang, M., Li, Y., Liang, Y., & Xu, D. (2012). Using internet search engines to obtain medical information: A comparative study. Journal of Medical Internet Research, 14(3), e74. http://doi.org/10.2196/jmir.1943
Wu, S. & Li, J. (2004). Effectiveness evaluation and comparison of web search engines and meta-search engines. In: Li, Q., Wang, G. & Feng, L. (Eds.), Advances in Web-Age Information Management. WAIM 2004. Lecture Notes in Computer Science, Vol. 3129. Springer, Berlin, Heidelberg. DOI: https://doi.org/10.1007/978-3-540-27772-9_31.
Xie, M., Wang, H. & Goh, T. N. (1998). Quality dimensions of search engines. Journal of Information Science, 24(5), 365-372. https://doi.org/10.1177/016555159802400509.