Article Type: Research Paper

Authors

1 Assistant Professor, Department of Linguistics and Foreign Languages, Payame Noor University, Tehran, Iran

2 Associate Professor, Department of Knowledge and Information Science, Payame Noor University, Tehran, Iran

Abstract

Objective: In this study, the most important historical works in the field of machine translation are identified and analyzed using a scientometric technique called Referenced Publication Years Spectroscopy (RPYS).
Methodology: This is a scientometric study carried out using the RPYS technique. The study population consists of 7899 papers in the field of machine translation that were indexed in Web of Science between 1945 and 2021. In line with the aim of the study, the references cited in all papers published in the field of machine translation were examined (193912 references). After all works listed in the reference lists of these papers had been extracted, the results were analyzed using the CRExplorer software.
Findings: The results showed that Shannon (1948) was the first to lay the groundwork for machine translation by introducing the concept of entropy in information technology. After him, Weaver transformed the field in 1949 by proposing word-for-word translation in statistical machine translation. It was Levenshtein (1966), however, whose revolutionary introduction of "binary codes" enabled machine translation to correct, delete, and revise translations. In 1977, Dempster and colleagues presented a new statistical algorithm for obtaining maximum likelihood from incomplete data. Finally, the works of scholars such as Brown et al. (1993), Hochreiter & Schmidhuber (1997), Papineni et al. (2002), and Koehn et al. (2007) were highly influential in shaping and developing machine translation.
Conclusion: A review of the influential works in the field of machine translation makes the role of mathematics and statistics in the formation of this field evident. Following advances in information technology and corpus linguistics, however, machine translation translates more precisely, more accurately, and in closer conformity with the source language.

Keywords

Topics

Article Title [English]

Tracing the historical origins of machine translation: A bibliometric analysis via RPYS

Authors [English]

  • Ebrahim Ezzati Larsari 1
  • Ali Akbar Khasseh 2

1 Assistant Professor, Department of Linguistics and Foreign Languages, Payame Noor University, Tehran, Iran

2 Associate Professor, Department of Knowledge and Information Science, Payame Noor University, Tehran, Iran

Abstract [English]

Background and Objectives: Machine translation owes much of its importance to its interdisciplinary nature: the field is inevitably influenced by mathematics, statistics, probability, natural language processing, formal syntax, corpus linguistics, and information technology. Each of these disciplines has played a significant role in the emergence, formation, and development of machine translation. As a result, researchers from different disciplines, each with their own expertise and interests, have turned to machine translation research and driven the rapid development of the field. This study uses a scientometric technique called Referenced Publication Years Spectroscopy (RPYS) to analyze the most important historical works published in the area of machine translation.
Methodology: This study takes a scientometric approach. The data were extracted from Web of Science. To this end, the references cited in all machine translation papers published from 1945 to the end of 2021 were examined. The search strategy yielded 7899 records covering 193912 references. The revised data were analyzed with CRExplorer, a software tool for RPYS.
Findings: The results showed that the groundwork for machine translation (MT) was laid by Shannon in 1948, who introduced the notion of entropy in information technology. Weaver then contributed to the emergence of statistical word-by-word MT in 1949. Moreover, it was Levenshtein (1966) who introduced the revolutionary notion of binary codes capable of correcting deletions, insertions, and reversals in MT. In addition, Dempster et al. (1977) presented a novel statistical algorithm, the EM algorithm, for obtaining maximum likelihood from incomplete data. Later, the works of Brown et al. (1993), Hochreiter and Schmidhuber (1997), Papineni et al. (2002), and Koehn et al. (2007) were instrumental in shaping and developing MT.
Discussion: The study of influential works in the field of machine translation makes the role of mathematics and statistics in the formation of this field evident. Following advances in information technology and corpus linguistics, however, machine translation now translates more accurately and in closer conformity with the source language.
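
The RPYS procedure named in the Methodology can be illustrated with a minimal sketch. It assumes the commonly described form of the technique: cited references are counted per referenced publication year, and each count is compared with the median of a five-year window centred on that year, so that years with unusually many cited references stand out as peaks. The function name `rpys_spectrum` and the toy list of years are hypothetical; this is not the CRExplorer implementation and does not reproduce the study's data.

```python
from collections import Counter
from statistics import median


def rpys_spectrum(cited_years, window=5):
    """Return {year: (count, deviation from the window median)}.

    cited_years: one entry per cited reference, giving its publication year.
    window: odd number of years in the median window (5 is assumed here).
    """
    counts = Counter(cited_years)
    half = window // 2
    spectrum = {}
    for year in range(min(counts), max(counts) + 1):
        # Counts for the years around `year`; years with no references count as 0.
        neighbourhood = [counts.get(y, 0) for y in range(year - half, year + half + 1)]
        count = counts.get(year, 0)
        spectrum[year] = (count, count - median(neighbourhood))
    return spectrum


# Hypothetical toy data: referenced publication years pooled from reference lists.
toy_years = [1947] + [1948] * 5 + [1949] * 3 + [1950]
spectrum = rpys_spectrum(toy_years)
for year, (count, deviation) in sorted(spectrum.items()):
    print(year, count, deviation)
```

A large positive deviation marks a candidate peak year; the individual works behind the peak still have to be inspected by hand to decide which of them are historically significant.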

Keywords [English]

  • Citation Analysis
  • Machine Translation
  • Scientometrics
  • Referenced Publication Years Spectroscopy (RPYS)

References

Ballandonne, M. (2019). The historical roots (1880–1950) of recent contributions (2000–2017) to ecological economics: Insights from reference publication year spectroscopy. Journal of Economic Methodology, 26(4), 307-326. DOI: 10.1080/1350178X.2018.1554227
Bar-Hillel, Y. (1960). The present status of automatic translation of languages. Advances in computers, 1, 91-163. https://doi.org/10.1016/S0065-2458(08)60607-5
Bolte, J., & Pauwels, E. (2020). A mathematical model for automatic differentiation in machine learning. arXiv preprint arXiv:2006.02080. https://doi.org/10.48550/arXiv.2006.02080
Bornmann, L., & Marx, W. (2014). The wisdom of citing scientists. Journal of the Association for Information Science and Technology, 65(6), 1288-1292. https://doi.org/10.1002/asi.23100
Brown, P. F., Cocke, J., Della Pietra, S. A., Della Pietra, V. J., Jelinek, F., Lafferty, J., Mercer, R. L., & Roossin, P. S. (1990). A statistical approach to machine translation. Computational Linguistics, 16(2), 79-85. https://aclanthology.org/J90-2002.pdf
Brown, P. F., Della Pietra, S. A., Della Pietra, V. J., & Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2), 263-311. https://aclanthology.org/J93-2003.pdf
Carl, M., & Báez, M. C. T. (2019). Machine translation errors and the translation process: A study across different languages. Journal of Specialised Translation, 31, 107-132. https://www.researchgate.net/publication/335920678_Machine_translation_errors_and_the_translation_process_a_study_across_different_languages
Chakraverty, S., Sahoo, D. M., & Mahato, N. R. (2019). Mcculloch–Pitts neural network model. In Concepts of Soft Computing (167-173). DOI:10.1007/978-981-13-7430-2_11
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and psychological measurement, 20(1), 37-46. https://doi.org/10.1177/001316446002000104
Comins, J. A., & Leydesdorff, L. (2016). Identification of long-term concept-symbols among citations: Can documents be clustered in terms of common intellectual histories? arXiv preprint arXiv:1601.00288. https://doi.org/10.48550/arXiv.1601.00288
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM Algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1-22.  https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
Dew, K. N., Turner, A. M., Choi, Y. K., Bosold, A., & Kirchhoff, K. (2018). Development of machine translation technology for assisting health communication: A systematic review. Journal of biomedical informatics, 85, 56-67. https://doi.org/10.1016/j.jbi.2018.07.018
Fiala, D., & Bornmann, L. (2020). Reference publication year spectroscopy (RPYS) of computer science papers from Eastern Europe. Aslib Journal of Information Management, 72(3), 305-319. DOI:10.1108/AJIM-06-2019-0142
Gile, D. (2015). Analyzing translation studies with scientometric data: From CIRIN to citation analysis. Perspectives, 23(2),1-9. DOI:10.1080/0907676X.2014.972418
Gupta, B. M., & Dhawan, S. M. (2019). Machine Translation Research: A Scientometric Assessment of Global Publications Output during 2007-16. DESIDOC Journal of Library & Information Technology, 39(1), 31-38. DOI:10.14429/djlit.39.1.13558
Habibi, R., Mokhtarpour, R., & Khasseh, A. A. (2018). Analysis of Evolutionary Trends in Global Entrepreneurship Research using Scientometric Techniques. Journal of Entrepreneurship Development, 10(4), 575-594. https://doi.org/10.22059/jed.2018.246176.652
Hardiyanti, M. (2021). Identifying The Common Type of Spelling Error by Leveraging Levenshtein Distance and N-gram. Scientific Journal of Informatics, 8(1), 71-75. DOI:10.15294/sji.v8i1.29273
Heidarimoghadam, R., Khasseh, A. A., Vakilimofrad, H., Fattahi, A., & Amiri, M. R. (2021). Identification and Analysis of the Historical Origins of Ergonomics by Referenced Publication Year's Spectroscopy. Iranian Journal of Ergonomics, 9 (2), 42-57. DOI:10.30699/jergon.9.2.42. (In Persian)
Hilbert, M. (2021). Information Theory for Human and Social Processes. Multidisciplinary Digital Publishing Institute, 23(1), 9. DOI:10.3390/e23010009
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
Hutchins, W. J. (1986). Machine translation: past, present, future. Ellis Horwood Chichester. https://mt-archive.net/70/CL-1988-Kittredge.pdf
Khasseh, A. A., & Mokhtarpour, R. (2016). Tracing the historical origins of knowledge management issues through referenced publication years spectroscopy (RPYS). Journal of Knowledge Management, 20(6), 1393-1404. DOI:10.1108/JKM-01-2016-0019. (In Persian)
Khasseh, A. A., Asghariyan, N., Tajedini, O., Moosavi, A., & Ghazizadeh, H. (2019). Identification and Analysis of the Historical Origins of Occupational Therapy by Referenced Publication Years Spectroscopy. Scientometrics Research Journal, 5(9), 161-184. https://doi.org/10.22070/rsci.2019.3710.1230. (In Persian)
Khullar, P. (2021). Are Ellipses important for Machine Translation? Computational Linguistics, 47(4), 1-10. DOI:10.1162/coli_a_00414
Koehn, P. (2010). Statistical Machine Translation. Netherlands: Cambridge University Press. https://www.google.com/books/edition/Statistical_Machine_Translation/4v_Cx1wIMLkC?hl=en
Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., & Herbst, E. (2007). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, (177-180). Association for Computational Linguistics. http://www.aclweb.org/anthology/P07-2045
Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The annals of mathematical statistics, 22(1), 79-86. DOI:10.1214/AOMS/1177729694
Kwon, J., Ho, N., & Caramanis, C. (2021). On the minimax optimality of the EM algorithm for learning two-component mixed linear regression. International Conference on Artificial Intelligence and Statistics, PMLR, (130), 1405-1413. https://proceedings.mlr.press/v130/kwon21b.html
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174. PMID: 843571. https://pubmed.ncbi.nlm.nih.gov/843571/
Lesk, M. (1986). Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. Proceedings of the 5th annual international conference on Systems documentation, Toronto, Ontario, Canada. https://doi.org/10.1145/318723.318728
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet physics doklady, 10, 707-710. Corpus ID: 60827152
Leydesdorff, L., Bornmann, L., Marx, W., & Milojević, S. (2014). Referenced Publication Years Spectroscopy applied to iMetrics: Scientometrics, Journal of Informetrics, and a relevant subset of JASIST. Journal of Informetrics, 8(1), 162-174. https://doi.org/10.1016/j.joi.2013.11.006
Locke, W. N., & Booth, A. D. (1956). Machine translation of languages. American Documentation, 7(2), 135-136. DOI: 10.1002/asi.5090070209
Manz, O. (2021). Entropy Coding. In Well Packed–Not a Bit Too Much (17-27). DOI:10.1007/978-3-658-34737-6_4
Marx, W., Bornmann, L., Barth, A., & Leydesdorff, L. (2014). Detecting the historical roots of research fields by reference publication year spectroscopy (RPYS). Journal of the Association for Information Science and Technology, 65(4), 751-764. DOI:10.1002/asi.23089
Mokhtarpour, R., Khasseh, A. (2017). Tracing the Historical Origins of Research Methodology Issues through Referenced Publication Years Spectroscopy (RPYS). Journal of Studies in Library and Information Science, 9(20), 43-58.  https://doi.org/10.22055/slis.2017.13186. (In Persian)
Mousavi Chelak A., Khasseh A. A., Soheili F. (2018). Exploring the evolution of reference services literature using Reference Publication Year Spectroscopy (RPYS). Research on Information Science & Public Libraries, 24(1),103-124. http://publij.ir/article-1-1707-en.html. (In Persian)
Och, F. J., & Ney, H. (2000). Improved statistical alignment models. Proceedings of the 38th annual meeting of the association for computational linguistics. https://doi.org/10.3115/1075218.107527
Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002). Bleu: A method for automatic evaluation of machine translation. Proceedings of the 40th annual meeting of the Association for Computational Linguistics. DOI:10.3115/1073083.1073135
Petrilli, S. (2021). Translation translation. BRILL. https://brill.com/display/title/27737?language=en
Reifler, E. (1950). Studies in Mechanical Translation, n 1, MT. Mimeographed.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536. http://dx.doi.org/10.1038/323533a0
Saha, S., & Jose, J. (2020). Shannon entropy as a predictor of avoided crossing in confined atoms. International Journal of Quantum Chemistry, 120(22), 26374. DOI:10.1002/qua.26374
Salton, G., Wong, A., & Yang, C.-S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613-620. https://doi.org/10.1145/361219.361220
Scheidsteger, T., & Haunschild, R. (2020). Telling the story of solar energy meteorology into the satellite era by applying (co-citation) reference publication year spectroscopy. Scientometrics, 125(2), 1159-1177. DOI: 10.1007/s11192-020-03597-0
Shannon, C. E. (1948). A mathematical theory of communication. The Bell system technical journal, 27(3), 379-423. https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf
Shao, C., Feng, Y., Zhang, J., Meng, F., & Zhou, J. (2021). Sequence-Level Training for Non-Autoregressive Neural Machine Translation. arXiv preprint arXiv:2106.08122. https://doi.org/10.48550/arXiv.2106.08122
Soheili F., Khasseh A. A. (2015). Historical Origins of Information Behavior Research by Reference Publication Year Spectroscopy. Journal of Information Processing and Management (JIPM), 31(1), 3-26. DOI:10.35050/JIPM010.2015.001. (In Persian)
Stolcke, A. (2002). SRILM-an extensible language modeling toolkit. Proceedings of International Conference on Spoken Language Processing, Denver, 901-904. https://www.scirp.org/reference/referencespapers?referenceid=2664291
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Proceedings of the 27th International Conference on Neural Information Processing Systems. 2, Montreal, Canada. https://dl.acm.org/doi/10.5555/2969033.2969173
Ulitkin, I., Filippova, I., Ivanova, N., & Poroykov, A. (2021). Automatic evaluation of the quality of machine translation of a scientific text: the results of a five-year-long experiment. E3S Web of Conferences. DOI:10.1051/e3sconf/202128408001
Umeozor, S. N. (2020). Information Retrieval: A Communication Process in the 21st Century Library. International Journal of Knowledge Content Development & Technology, 10(2), 7-18. https://journals.sfu.ca/ijkcdt/index.php/ijkcdt/article/view/339
Vieira, L. N., O’Hagan, M., & O’Sullivan, C. (2021). Understanding the societal impacts of machine translation: A critical review of the literature on medical and legal use cases. Information, Communication & Society, 24(11), 1515-1532. https://doi.org/10.1080/1369118X.2020.1776370
Wray, K. B., & Bornmann, L. (2014). Philosophy of science viewed through the lens of Referenced Publication Years Spectroscopy (RPYS). Scientometrics, 102(3), 1987-1996. DOI:10.1007/s11192-014-1465-6
Wu, Y., & Xu, R. (2011). The Application of Chomsky's Syntactic Theory in Translation Study. Journal of Language Teaching and Research, 2(2), 396. DOI:10.4304/jltr.2.2.396-399
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, L., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., & Dean, J. (2016). Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144. https://doi.org/10.48550/arXiv.1609.08144
Yao, Q., Li, X., Luo, F., Yang, L., Liu, C., & Sun, J. (2019). The historical roots and seminal research on health equity: A referenced publication year spectroscopy (RPYS) analysis. International Journal for Equity in Health, 18(1), 1-15. https://equityhealthj.biomedcentral.com/articles/10.1186/s12939-019-1058-3