How Well Does Llama 3.1 Perform for Text and Speech Translation?


2024-08-07 12:00 slator



Meta’s research team introduced Llama 3.1 on July 23, 2024, calling it “the world’s largest and most capable openly available foundation model.” Llama 3.1 is available in three parameter sizes (8B, 70B, and 405B), providing flexibility for deployment based on computational resources and specific application needs. Meta had announced the Llama 3 family of large language models on April 18, 2024, initially with only the 8B and 70B sizes; the latest release introduced the 405B model along with upgraded versions of the 8B and 70B models.

Llama 3.1 models represent a significant advancement over their predecessor, Llama 2. They were pre-trained on an extensive corpus of 15 trillion multilingual tokens, a substantial increase from Llama 2’s 1.8 trillion tokens. With a context window of up to 128k tokens, up from the previous limit of 8k tokens, they offer notable improvements in multilinguality, coding, reasoning, and tool usage. Llama 3.1 maintains an architecture similar to Llama and Llama 2 but achieves performance improvements through enhanced data quality and diversity and increased training scale.

Meta’s research team tested Llama 3.1 on over 150 benchmark datasets covering a wide range of languages. They found that their “flagship model” with 405B parameters is competitive with leading models across various tasks and comes close to matching state-of-the-art performance. The smaller models are also “best-in-class,” outperforming alternative models with comparable numbers of parameters. In multilingual tasks, the small Llama 3.1 8B model surpassed Gemma 2 9B and Mistral 7B, while Llama 3.1 70B outperformed Mixtral 8x22B and GPT-3.5 Turbo. Llama 3.1 405B is on par with Claude 3.5 Sonnet and outperformed GPT-4 and GPT-4o.

Meta’s research team emphasized that Llama 3.1 405B is “the first openly available model that rivals the top AI models when it comes to state-of-the-art capabilities in […] multilingual translation,” among other tasks.
They expressed optimism about the potential for creating innovative applications that leverage the model’s multilingual capabilities and extended context length, stating, “we can’t wait to see what the community does with this work.”

In addition to language processing, the development of Llama 3.1 included multimodal extensions that enable image recognition, video recognition, and speech understanding. Although these multimodal extensions are still under development, initial results indicate competitive performance in image, video, and speech tasks.

Meta’s research team specifically evaluated Llama 3.1 on automatic speech recognition (ASR) and speech translation. In ASR, they compared its performance against Whisper, SeamlessM4T, and Gemini. Llama 3.1 outperformed Whisper and SeamlessM4T across all benchmarks and performed similarly to Gemini, demonstrating “strong performance on speech recognition tasks.” In speech translation tasks, where the model was asked to translate non-English speech into English text, Llama 3.1 again outperformed Whisper and SeamlessM4T. “The performance of our models in speech translation highlights the advantages of multimodal foundation models for tasks such as speech translation,” Meta’s team said.

They also shared details of the development process to help the research community understand the key factors of multimodal foundation model development and to encourage informed discussion about the future of these models. “We hope sharing our results early will accelerate research in this direction,” they said.

Meta’s launch of Llama 3.1 has created a buzz in the AI community.
Since the release, many people have taken to X and LinkedIn to call it a “game-changer” or “GPT-4 killer,” recognizing this moment as “the biggest moment for open-source AI.” They have also talked about a “seismic shift in business transformation,” explaining that this is going to “revolutionize how companies work.” Posts are filled with examples of the many different ways Llama 3.1 can be used, from building phone assistants to document assistants and code assistants.

Meta has released all Llama 3.1 models under an updated community license, promoting further innovation and responsible development toward artificial general intelligence (AGI). “We hope that the open release of a flagship model will spur a wave of innovation in the research community, and accelerate a responsible path towards the development of artificial general intelligence,” they said. They also believe that the release of Llama 3.1 will encourage the industry to adopt open and responsible practices in AGI development. The Meta research team acknowledges that there is still much to explore, including more device-friendly sizes, additional modalities, and further investment in the agent platform layer.

The models are available for download on llama.meta.com and Hugging Face and are ready for immediate development within a broad ecosystem of partner platforms, including AWS, NVIDIA, Databricks, Groq, Dell, Azure, Google Cloud, and Snowflake. Ahmad Al-Dahle, who leads Meta’s generative AI efforts, wrote in a post on X, “With Llama 3.1 in NVIDIA AI Foundry we’ll see enterprises to easily create custom AI services with the world’s best open source AI models.”
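For readers who want to try the model’s translation capabilities locally, Llama 3.1 uses the Llama 3 chat template with special header tokens. The sketch below builds such a translation prompt by hand; the system message, example sentence, and language pair are illustrative, and in practice a tokenizer’s built-in chat template would produce this string for you.

```python
# Sketch: assemble a Llama 3.1 chat-format prompt for a translation request.
# The special tokens (<|begin_of_text|>, <|start_header_id|>, <|eot_id|>)
# follow the published Llama 3 instruction template.

def build_translation_prompt(text: str, source_lang: str, target_lang: str) -> str:
    """Wrap a translation instruction in Llama 3.1's chat special tokens."""
    system = f"You are a translator. Translate {source_lang} into {target_lang}."
    user = f"Translate this {source_lang} sentence into {target_lang}:\n{text}"
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # The model's generation is expected to follow the assistant header.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_translation_prompt("Guten Morgen!", "German", "English")
print(prompt)
```

Feeding a string like this to any of the Llama 3.1 checkpoints hosted on Hugging Face (access to the weights is gated behind the community license) should elicit an English translation after the final assistant header.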

