Amazon Releases New Iteration of Neural Machine Translation Toolkit Sockeye

2020-08-13 16:20 slator

On August 11, 2020, researchers at Amazon detailed advances made in Sockeye 2, a new iteration of the e-commerce giant’s open-source, sequence-to-sequence toolkit for neural machine translation (NMT). The toolkit is now available on GitHub, and the accompanying paper describes Sockeye 2 as providing “out-of-the-box support for quickly training strong Transformer models for research or production.”

Amazon introduced the original Sockeye in July 2017, after acquiring Pittsburgh, Pennsylvania-based MT vendor Safaba. Since then, Amazon has forged ahead with localization projects via machine learning offerings that were once the exclusive territory of language service providers (LSPs), including machine dubbing and quality estimation of translated subtitles. Over the past three years, Sockeye, which powers Amazon Translate, has been referenced in at least 25 scientific publications, including winning submissions to the Conference on Machine Translation (WMT) evaluations.

Amazon is not the only player contributing to Sockeye 2’s improvements over its predecessor. The paper specifically credits Intel and NVIDIA for performance improvements to Sockeye’s inference and Transformer implementation, respectively.

The authors, five Amazon research scientists and an external advisor, University of Edinburgh professor Kenneth Heafield, attribute Sockeye 2’s significant gains primarily to a streamlined Gluon implementation; support for state-of-the-art architectures and efficient decoding; and improved model training.

By adopting Gluon, “the latest and preferred API of MXNet,” Sockeye 2 requires about 25% less Python code than the original Sockeye and improves training speed by 14%. The simplified Gluon code base is meant to enable rapid development and experimentation.

Inspired by the success of self-attentional models, the researchers focused on the Transformer architecture and found that “deep encoders with shallow decoders are competitive in BLEU and significantly faster for decoding.”
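
To make the decoding-speed claim concrete, here is a back-of-the-envelope sketch; the `layer_invocations` function and the 6:6 versus 12:1 layer splits are hypothetical illustrations, not figures from the paper. The intuition: with standard autoregressive decoding, each decoder layer runs once per generated token, while each encoder layer runs only once per source sentence.

```python
# Hypothetical illustration: count Transformer layer executions for one
# sentence. The encoder processes the whole source in a single pass, but the
# decoder (under autoregressive decoding) runs once per output token.

def layer_invocations(enc_layers: int, dec_layers: int, tgt_len: int) -> int:
    encoder_cost = enc_layers            # one pass over the source sentence
    decoder_cost = dec_layers * tgt_len  # one pass per generated token
    return encoder_cost + decoder_cost

# Compare a balanced 6:6 model with a deep-encoder/shallow-decoder 12:1 model
# on a 25-token output; both splits are made-up examples, not Sockeye defaults.
for enc, dec in [(6, 6), (12, 1)]:
    print(f"{enc}:{dec} -> {layer_invocations(enc, dec, tgt_len=25)} layer passes")
# 6:6  -> 156 layer passes
# 12:1 -> 37 layer passes
```

Per-layer cost is ignored here; the point is only that decoder depth multiplies with output length, which is why shallow decoders can be “significantly faster for decoding” while a deeper encoder preserves quality.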
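
Returning to the Gluon point above, the pattern the paper credits can be shown with a minimal, self-contained sketch. This is not Sockeye code; `ToyFeedForward` and all sizes are invented for illustration. The idea is that a network is written as ordinary imperative Python, and a single `hybridize()` call lets MXNet cache a symbolic graph for speed without changing the API.

```python
# A toy MXNet Gluon block illustrating the imperative-then-hybridize style.
import mxnet as mx
from mxnet.gluon import nn

class ToyFeedForward(nn.HybridBlock):
    """A residual feed-forward sublayer, loosely Transformer-shaped."""

    def __init__(self, hidden=512, **kwargs):
        super().__init__(**kwargs)
        self.ff1 = nn.Dense(hidden, activation="relu", flatten=False)
        self.ff2 = nn.Dense(hidden, flatten=False)

    def hybrid_forward(self, F, x):
        # Residual connection around the two dense layers, as in a
        # Transformer feed-forward block.
        return x + self.ff2(self.ff1(x))

net = ToyFeedForward()
net.initialize()
net.hybridize()  # compile to a cached static graph, keeping the imperative API
out = net(mx.nd.random.uniform(shape=(2, 10, 512)))
print(out.shape)  # (2, 10, 512)
```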