How ‘Pseudo-Refinement Triplets’ Can Improve Large Language Models in Translation


2024-07-16 10:30 Slator



In a June 22, 2024 paper, Zhaopeng Feng, Ruizhe Chen, Zijie Meng, and Zuozhu Liu from Zhejiang University, along with Yan Zhang from Tencent, presented “Ladder,” a model-agnostic and cost-effective tool designed to boost the performance of large language models (LLMs) in machine translation (MT).

Unlike conventional methods that require extensive computing resources, large amounts of data, and human annotations, Ladder leverages so-called “pseudo-refinement triplets” created from LLMs, reducing the need for additional human effort. A pseudo-refinement triplet consists of a source sentence, an intermediate translation generated by an LLM, and a reference translation (the refined translation). This approach enables the creation of training data for MT refinement in an automated manner and “without extra labor costs.”

The researchers noted that these triplets share a format similar to automatic post-editing (APE) triplets, which typically consist of a source sentence, a translation with errors, and post-edits. However, the APE annotation process requires substantial human resources for tasks such as evaluation, error identification, and post-editing.

The triplets are then categorized into three hierarchies: easy, medium, and hard. Easy translations differ significantly from the reference, leaving more room for improvement. Hard translations, on the other hand, are almost perfect, making them difficult to refine, and medium translations fall in between. This categorization is based on the COMET score assigned to each triplet.

Hierarchical fine-tuning follows, in which the Ladder model is trained progressively, starting with easy examples and gradually moving on to medium and then hard examples. This hierarchical approach allows the model to improve its refining performance incrementally.

“Instead of directly fine-tuning a translation-target LLM, we train an LLM to refine translations using refinement datasets without human evaluation or post-edits, employing an instruction-following refinement task,” said the researchers.

Ladder can be integrated with any general-purpose LLM to improve translation performance without requiring significant changes to the existing model structure. This flexibility makes Ladder a versatile tool for a wide range of LLMs.

The researchers tested Ladder in two ways. First, they checked how well it could improve different types of language models: translation-specific models such as BigTranslate, NLLB, and ALMA, and general-purpose models such as GPT-3.5 and GPT-4. Second, they compared Ladder to the best-known methods for translation refinement and post-editing, including Unbabel’s TowerInstruct.

They found that Ladder can significantly improve overall translation quality across most translation-specific and general-purpose LLMs. Ladder can “elevate raw translations to the level of top-tier open-source models,” they said.

The paper and the code are available on GitHub.
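To make the data-construction step concrete, here is a minimal Python sketch of how pseudo-refinement triplets might be assembled and bucketed by quality score. The `translate` and `score` hooks, the `RefinementTriplet` name, and the threshold values are illustrative assumptions, not the paper's actual implementation or cut-offs; in practice the scorer would be a reference-based COMET model.

```python
from dataclasses import dataclass

@dataclass
class RefinementTriplet:
    source: str        # source sentence
    intermediate: str  # intermediate translation produced by an existing LLM
    reference: str     # reference translation (the "refined" target)
    comet: float       # quality score of the intermediate translation

def build_triplets(pairs, translate, score):
    """Assemble pseudo-refinement triplets automatically: `translate` is any
    LLM translation call, `score` is a reference-based metric such as COMET.
    No extra human evaluation or post-editing is involved."""
    triplets = []
    for src, ref in pairs:
        mt = translate(src)  # intermediate LLM translation
        triplets.append(RefinementTriplet(src, mt, ref, score(src, mt, ref)))
    return triplets

# Illustrative COMET cut-offs only; the paper defines its own hierarchy split.
EASY_MAX, MEDIUM_MAX = 0.70, 0.85

def hierarchy(triplet: RefinementTriplet) -> str:
    """Easy = far from the reference (low score, lots of room to improve);
    hard = already near-perfect and hardest to refine further."""
    if triplet.comet < EASY_MAX:
        return "easy"
    if triplet.comet < MEDIUM_MAX:
        return "medium"
    return "hard"
```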
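And a companion sketch of the hierarchical (easy, then medium, then hard) fine-tuning order and the instruction-following refinement format. The prompt wording, the `buckets` layout, and the `train_one_stage` hook are assumptions for illustration; the authors' exact template and training loop are not reproduced here.

```python
def refinement_prompt(source: str, intermediate: str, reference: str) -> dict:
    """Turn one triplet into an instruction-following example: the model sees
    the source and the intermediate translation, and is trained to produce
    the reference as the refined output."""
    instruction = (
        "Refine the following translation.\n"
        f"Source: {source}\n"
        f"Translation: {intermediate}\n"
        "Refined translation:"
    )
    return {"prompt": instruction, "completion": reference}

def hierarchical_finetune(model, buckets, train_one_stage):
    """Train progressively on easy, then medium, then hard examples.
    `buckets` maps each hierarchy name to a list of
    (source, intermediate, reference) tuples."""
    for stage in ("easy", "medium", "hard"):
        stage_data = [refinement_prompt(*example) for example in buckets[stage]]
        model = train_one_stage(model, stage_data)
    return model
```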

