Terminology can be extracted either manually, by highlighting words in documents and copying them into a program such as Word or Excel, or automatically, by using terminology extraction tools.
In DejaVu, for example, the Lexicon tool (that is what their extraction tool is called) lets you select phrases containing 2, 3, 4, or 5 words, which are the most likely term candidates. Lexicon then lets you export those phrases to Excel, where you can filter them and decide which ones are actual terms. That is basically what an extraction tool does. It is not as hard as you might imagine. The only problem is that long documents produce a lot of “hits”, so you have to be patient and take some time to review the results.
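To make the idea concrete, here is a minimal Python sketch of this kind of phrase extraction. It is only an illustration, not Lexicon's actual algorithm: it counts every 2- to 5-word phrase in a text, drops phrases that start or end with a common function word, keeps the ones that repeat, and writes them to a CSV file that Excel can open. The function names, the tiny stopword list, and the `min_count` threshold are all assumptions made for this example.

```python
import csv
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real project needs a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or",
             "is", "are", "for", "with", "by"}


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


def candidate_phrases(text, min_len=2, max_len=5, min_count=2):
    """Return (phrase, count) pairs for repeated 2- to 5-word phrases."""
    counts = Counter()
    # Work sentence by sentence so a phrase never spans a sentence boundary.
    for sentence in re.split(r"[.!?;:\n]+", text):
        tokens = tokenize(sentence)
        for n in range(min_len, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Phrases that start or end with a stopword are rarely terms.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    # Most frequent candidates first; one-off phrases are dropped.
    return [(p, c) for p, c in counts.most_common() if c >= min_count]


def export_csv(candidates, path):
    """Write the candidates to a CSV file for review in a spreadsheet."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["phrase", "words", "count"])
        for phrase, count in candidates:
            writer.writerow([phrase, len(phrase.split()), count])


if __name__ == "__main__":
    sample = ("The translation memory stores segments. "
              "A translation memory speeds up work. "
              "Terminology extraction tools scan the translation memory.")
    export_csv(candidate_phrases(sample), "candidates.csv")
```

Even this toy version shows why the review step matters: on a long document the list of repeated phrases grows quickly, and only a human can decide which candidates are real terms.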
The Wiki gives a comprehensive list of terminology extraction tools. I have personally used only DejaVu, Trados Extract, and Xbench. Some of the following tools were mentioned in a post by the LinkedIn Terminology Group about what translators and terminologists use as extraction tools. Please note that the descriptions were taken from the vendors’ websites and are not a personal assessment.
Acrolinx – (Commercial) The foundation of Acrolinx’s terminology management system is a central database that stores terminology, keywords, trademarks, brands, and other words and phrases that are specific to your organization. The website provides useful webinars on how to use it, as well as a section with an overview of terminology management.
ApSIC Xbench (Commercial) With Xbench you are just a hotkey away from your terminology. Just load your bilingual references on Xbench and press Ctrl+Alt+Insert from any Windows application when you want to find a term.
CrossTerm by Across (Commercial) is the terminology system of Across. It facilitates the maintenance and use of a consistent corporate terminology and of digital dictionaries. You can store preferred terms along with synonyms, definitions, images, usage and grammar information, etc. The storage of so-called do-not-use words prevents the use of terms to be avoided in company texts for either technical or marketing reasons.
FiveFilters (free). Term Extraction from FiveFilters.org is a free software project to help you extract terms (e.g. for use as tags) through a web service. Given some text it will return a list of terms with (hopefully) the most relevant first. Terms can be returned in a variety of formats. The application is intended to be a simple, free alternative to Yahoo’s Term Extraction service.
Lexicon, DejaVu (Atril) (A review here) (Commercial) Atril provides a 5-minute video on how to use it.
MultiCorpora’s Terminology Management System (Commercial), a tool in their MultiTrans Prism software, is easy to use, fully TBX (industry standard format for terminology data exchange) compliant, and has many automated features that save you time and money. When integrated with the MultiTrans Prism suite of tools, it can ensure terminology consistency at the start of a translation project and check completed translations for fidelity to approved vocabulary. This eliminates the factor of human error in maintaining terminology consistency.
Multifultor, by Rolf Keller, is a timesaving program for looking up terminology in dictionaries and almost any data source which you can use to look up a word or phrase. Multifultor can search websites, dictionaries, files and the Windows Index as data sources.
MultiTerm Extract, Trados (SDL) (Commercial). Scribb provides an easy-to-use guide. Read their tips and tricks here.
qTerm by Kilgray (memoQ) (Commercial) is a full-fledged browser-based terminology management system that connects directly to the memoQ server. Using qTerm, companies and organizations can turn their terminology into a corporate asset that facilitates internal and external communication, increases brand awareness, improves the quality of technical communication and cuts the costs of misunderstanding.
Similis (Download here) – Free. Similis is a Translation Memory (TM) program of French origin, supporting English, German, French, Italian, Spanish, Portuguese and Dutch. It includes a linguistic analysis engine to break down segments into chunks and generate corresponding Term Bases (TB) or glossaries.
SynchroTerm (Terminotix) (Commercial) is a powerful tool for extracting terms and efficiently creating terminology records from source and target document pairs, bitexts and translation memories. Its user-friendly windows allow you to perform sophisticated extraction, search and context checking functions. With SynchroTerm, your translation archives become a gold mine from which you can quickly extract terminology.
Syn-Tactic (Free) Syn-Tactic technology will help you reduce your overall translation costs and cycle time by extracting relevant terminology from your technical documentation and using it to feed an advanced machine translation system, which will pre-translate your documentation faster, more cost-efficiently, and more consistently.
T-Manager by Rafael Guzman. The tool is made of Excel macros that you can run to validate, diagnose, customize, compare glossaries, do aligned comparisons, leverage and extract terminology, and generate reports.
TaaS (Terminology as a Service) – Free; Beta. Search terminology in various sources. Identify term candidates in your documents and extract them automatically. Look up translation candidates in various sources. Refine and approve terms and their translations. Share your terminology with other users. Collaborate with your friends and colleagues. Use your terminology in other working environments. Watch their 3-minute intro on YouTube.
TerMine (Free) is offered by the National Centre for Text Mining (NaCTeM), the first publicly funded text mining centre in the world. NaCTeM provides text mining services in response to the requirements of the UK academic community and is operated by the University of Manchester.
Webterm by Star TS (Free) gives you the power to manage and update your terminology on a global basis. Teams across the globe can communicate their views and implement terminology revisions, making the process faster, cheaper and more efficient. Changes to terminology are available instantly around the world.
TermStar Transit NXT (Commercial) is a comprehensive system for terminology use and administration. Importing and exporting terminology, printing dictionaries, searching for terms (including their declined and conjugated forms) as well as other functions will fully meet the needs of a terminologist. TermStar is available either as an independent application or as an integrated part of Transit.
UniTerm by Acolada has been designed to create, edit and professionally manage corporate terminologies and special-language dictionaries. Developed to meet the requirements of today’s dictionary systems and of translation tools’ terminology management systems, UniTerm offers a unique set of functionalities. Read more on the product sheet.
Read the blog WordLo for more useful tools (free). I also recommend that you join LinkedIn’s Terminology Group and read more about tools.
“10 things you should know about automatic terminology extraction” A guest post by Uwe Muegge at Lingua Greca’s blog.