A research paper published on May 2, 2019 compared the performance of translators who used machine translation post-editing (often called PEMT) and interactive translation prediction (or ITP). The results suggest that ITP may be the better method of human-machine interaction for translators.
If ITP sounds familiar, it is because the approach has been pioneered by Silicon Valley-based Lilt. The startup launched in 2015 as an ITP-powered translation productivity tool (a.k.a. CAT) aimed at individual linguists, was sued by SDL (then settled), received millions in funding from VC giant Sequoia Capital and, over time, pivoted to a managed services business model.
While translation productivity tools used with PEMT pre-populate target translation segments with raw MT output, which a linguist then reviews and edits, ITP acts more like an auto-complete feature: it suggests target translations below the segment as the linguist works in an empty target segment. Additionally, ITP dynamically takes the linguist’s partial translation into account and suggests better translations for the rest of the sentence.
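To make the contrast concrete, here is a minimal toy sketch in Python. It is not the system used in the study, nor Lilt's or CASMACAT's implementation: a real neural ITP system re-decodes the rest of the sentence conditioned on the translator's prefix, whereas this sketch only picks from a fixed, hypothetical ranked list of candidate translations. The function names and the example sentences are assumptions made up for illustration.

```python
from typing import Optional, Sequence


def post_edit_prefill(candidates: Sequence[str]) -> str:
    """PEMT-style: pre-populate the target segment with the top-ranked raw MT output."""
    return candidates[0]


def suggest_completion(candidates: Sequence[str], typed_prefix: str) -> Optional[str]:
    """ITP-style: suggest the remainder of the best-ranked candidate
    that is consistent with what the translator has typed so far."""
    for candidate in candidates:  # candidates are assumed to be ordered best first
        if candidate.startswith(typed_prefix):
            return candidate[len(typed_prefix):]
    return None  # no candidate matches; a real system would re-decode instead


# Hypothetical ranked MT outputs for "The cat sat on the mat." (illustration only)
CANDIDATES = [
    "El gato se sentó en la alfombra.",
    "El gato se sentó sobre la alfombra.",
    "La gata se sentó en la alfombra.",
]

if __name__ == "__main__":
    # PEMT: the translator starts from a fully pre-filled segment and edits it.
    print(post_edit_prefill(CANDIDATES))
    # ITP: the segment starts empty; the suggestion changes as the translator types.
    print(suggest_completion(CANDIDATES, ""))
    print(suggest_completion(CANDIDATES, "El gato se sentó so"))
    print(suggest_completion(CANDIDATES, "La "))
```

In the post-editing call the translator receives one full sentence to correct. In the ITP calls the suggestion is recomputed from the typed prefix, so typing "La " switches the proposed continuation to the third candidate.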
The outcome of the PEMT vs. ITP face-off could decide how the vast majority of translators interact with content for years to come.
How do the two approaches stack up against each other? Graduate student Rebecca Knowles from Johns Hopkins University, Marina Sanchez Torron, PhD, from the University of Auckland, and Professor Philipp Koehn, also from Johns Hopkins, conducted the study and reported their findings in a paper entitled “A user study of neural interactive translation prediction.”
Whereas previous research used ITP systems based on statistical machine translation (SMT), this time Koehn’s team deployed neural machine translation (NMT) in ITP, which is a first, according to the study’s authors. The neural ITP system they deployed was previously developed by Knowles and Koehn based on the University of Edinburgh’s nematus NMT model. Speaking to Slator via email, Prof. Koehn said that the system they developed “showed significantly better results in simulation studies.”
“So, the obvious question was if this also leads to practical translator productivity increases by professional translators. These kind of studies are always a bit tricky since translators have to get used to a tool and a new way of working, and it is hard to do this at scale,” Koehn said. “Any time we [do] user studies we also have to deal with the very large variance between translators. Still, it is encouraging to see that this may not just be a more enjoyable way to interact with machine translation but also lead to more productive work by at least some of the translators.”
The researchers used a straightforward methodology: build nematus into an ITP environment, train it with millions of sentence pairs, and have participating professional translators use PEMT and the neural ITP system and provide feedback.
The nematus-powered ITP system used in the study was integrated into CASMACAT, a translation productivity tool developed between 2011 and 2014 under a European Union Programme for Research and Technological Development. To train their system, the authors employed the same datasets used in the 2013 Workshop on Statistical Machine Translation (WMT13). The entire training dataset contained nearly four million sentence pairs.
The participants were eight professional English-into-Spanish translators, who worked on eight news texts while following specific guidelines meant to maximize the quantitative data generated for the study. The linguists were also asked to provide feedback on their experience with the neural ITP system.
If the number of participants seems a little underwhelming, the authors note that “cost and convenience motivated our sample size, quality assessment and language pair choices, therefore restricting the application of our findings.”
The sample size was further reduced by 17% due to technical issues and a translator choosing not to adhere to instructions. That same translator made it clear they were not open to working with neural ITP in their very negative feedback on the technology.
The researchers measured translation productivity based on three general categories, further broken down into 11 finer variables. The three categories were (1) temporal effort or processing time; (2) technical effort; (3) final translation quality.
The eight translators were also provided a questionnaire regarding ITP. According to the authors, “sample results for eight out of the 11 variables are favorable to ITP.”
During the study, neural ITP provided more accurate predictions than the researchers expected, but they also discovered that “fluency issues are more than twice as frequent in ITP as in PE[MT].”
They noted, however, that CASMACAT being a non-production environment and lacking such features as grammar auto-correct “very likely contributed” to this — a valid point, given that commercial translation productivity software developers focus much of their time on improving UI and adding extra features to the underlying technology.
They also found that “in terms of improvement over time, none of the models could determine whether productivity indicators improved over time in ITP.”
Feedback from the professional translators was generally very positive, save for that one linguist who provided negative feedback on every question. The study found that translator experience with PEMT may also play a role in the perception of neural ITP.
The researchers said the translators who had used PEMT before did not have any negative views toward neural ITP, regardless of how experienced they were in their profession. At the same time, they noted “some indication that translators who have formal PE[MT] training or provide PE[MT] services frequently benefited the most from ITP.”
According to the researchers, “Regardless of their translation experience, professional translators with little or no PE[MT] experience […] may be more reluctant to engage in ITP.” The two participants who expressed negative views of ITP had little to no PEMT experience.
The difference in the cognitive and translation processes between PEMT and ITP meant that using ITP resulted in “less time researching terminology,” the translators said.
A couple of translators expressed concern about the translator’s role in an ITP-driven environment, including “how in such scenarios, MT priming means ‘the voice of the translator is lost,’ and how the user-friendliness and speed of the ITP system may generate overconfidence on the translator side and lead to mistakes or wrong decisions if the required exigence and rigor levels are not there, on the user’s side.”
Editor’s note: This story has been updated to add input from one of the paper’s authors.