Researchers Combine DeepL and GPT-4 to Automate (Research) Questionnaire Translation

2024-08-19 09:30, Slator

In a July 30, 2024, research paper, Otso Haavisto and Robin Welsch from Aalto University presented a web application designed to simplify the process of adapting questionnaires for different languages and cultures. The tool aims to help researchers conducting cross-cultural studies improve the quality and efficiency of questionnaire adaptation while promoting more equitable research practices.

Haavisto and Welsch highlighted that translating questionnaires is often costly and “resource-intensive,” requiring multiple independent translators and extensive validation. According to the authors, this complexity has led to inequalities in research, particularly in non-English-speaking and low-income regions where access to high-quality questionnaires is limited.

In questionnaire translation, maintaining semantic similarity is crucial to ensure that the translated version retains the meaning of the original. As the authors noted, “semantic similarity is more important than word-by-word match.” Cultural nuances and colloquial expressions, they added, further complicate the process and make accurate translation harder to achieve.

To address these challenges, they developed a web application that lets users translate questionnaires, edit the translations, backtranslate them into the source language for comparison against the original, and receive translation-quality evaluations generated by a large language model (LLM).

The tool uses DeepL for the initial translations and GPT-4 to evaluate them and suggest improvements. The authors chose DeepL for its “reliable output and promising results in translating scientific text,” which they said was essential for the accuracy of research questionnaires.

“We set out to develop a prototype of a questionnaire translation tool that would exploit the versatility of LLMs in natural language processing tasks to the benefit of researchers conducting cross-cultural studies,” they said.
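The article describes the workflow only at a high level. The Python sketch below shows one way a translate, backtranslate, and evaluate loop could be wired together; it is not the authors' code. The function names, the integer score scale, the editing threshold, and the toy lookup tables are all hypothetical, and the offline stand-ins merely occupy the roles the article assigns to DeepL (forward and back translation) and GPT-4 (quality evaluation).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces, not the tool's actual code.
# translate(text, source_lang, target_lang) -> translated text (DeepL's role in the article)
Translator = Callable[[str, str, str], str]
# evaluate(source, translation, back_translation) -> (score, suggestion) (GPT-4's role);
# the integer score scale is an assumption.
Evaluator = Callable[[str, str, str], tuple[int, str]]


@dataclass
class ItemReview:
    source: str
    translation: str
    back_translation: str
    score: int
    suggestion: str


def review_questionnaire(items: list[str], source_lang: str, target_lang: str,
                         translate: Translator, evaluate: Evaluator) -> list[ItemReview]:
    """Translate each item, back-translate it, and attach a quality evaluation."""
    reviews = []
    for item in items:
        forward = translate(item, source_lang, target_lang)
        # Back-translation lets a researcher compare against the original wording.
        back = translate(forward, target_lang, source_lang)
        score, suggestion = evaluate(item, forward, back)
        reviews.append(ItemReview(item, forward, back, score, suggestion))
    return reviews


def needs_editing(reviews: list[ItemReview], threshold: int) -> list[ItemReview]:
    """Items scoring below the (hypothetical) threshold go back to a human editor."""
    return [r for r in reviews if r.score < threshold]


# --- Toy stand-ins so the sketch runs offline (made-up lookup tables) ---
TOY_TABLE = {
    ("EN", "DE"): {"I feel calm.": "Ich fühle mich ruhig.", "I feel blue.": "Ich bin blau."},
    ("DE", "EN"): {"Ich fühle mich ruhig.": "I feel calm.", "Ich bin blau.": "I am drunk."},
}


def toy_translate(text: str, source_lang: str, target_lang: str) -> str:
    return TOY_TABLE[(source_lang, target_lang)][text]


def toy_evaluate(source: str, translation: str, back: str) -> tuple[int, str]:
    # Crude exact-match check; in the described tool an LLM judges meaning instead.
    if back.strip().lower() == source.strip().lower():
        return 5, "No change suggested."
    return 2, f"Back-translation '{back}' differs from '{source}'; review wording."


reviews = review_questionnaire(["I feel calm.", "I feel blue."], "EN", "DE",
                               toy_translate, toy_evaluate)
for r in needs_editing(reviews, threshold=4):
    print(r.source, "->", r.translation, "| back:", r.back_translation)
```

Passing the translator and evaluator in as plain functions keeps the sketch runnable offline; a real deployment would substitute API clients for those two roles. The exact-match check in `toy_evaluate` is deliberately crude: as the authors stress, meaning matters more than wording, which is why the described tool delegates this judgment to an LLM.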
Haavisto and Welsch tested the tool’s effectiveness in two online studies: one with 10 participants working on the English-German language pair and another with 20 participants working on the English-Portuguese pair. Both studies showed “promising results regarding LLM adoption in the questionnaire translation process,” according to the authors.

The findings indicated that machine translation, when supplemented by GPT-4-generated quality scores, yields translation quality and semantic similarity comparable to traditional translation. Participants also found the GPT-4-generated suggestions “moderately helpful” and judged that the scores accurately reflected translation quality.

Haavisto and Welsch also noted that LLM-generated quality evaluations can help researchers identify and address context-specific issues in their translations, stressing that “this is the first step towards more equitable questionnaire-based research, powered by AI.”

The tool currently supports English, German, Portuguese, and Finnish, although Finnish has not yet been tested. The prototype’s code is publicly available on GitHub, inviting further exploration and contributions from the community.