The Issue of Data Security and Machine Translation


2019-12-13 19:10 SDL blog


As the world becomes more digital and the volume of mission-critical data flows continues to expand, it is becoming increasingly important for global enterprises to adapt to rapid globalization and the increasingly digital-first world we live in. As organizations change the way they operate, generate revenue, and create value for their customers, new compliance risks are emerging. This presents a challenge to compliance functions, which must proactively monitor, identify, assess, and mitigate risks, including those tied to fundamentally new technologies and processes.

Digital transformation is driven and enabled by data, and thus data security and governance also rise in importance and organizational impact. At the World Economic Forum (WEF) in Davos, CEOs have identified cybersecurity and data privacy as two of the most pressing issues of the day, and even regard breakdowns in these areas as a general threat to enterprise, society, and government.

While C-level executives understand the need for cybersecurity as their organizations undergo digital transformation, they aren’t prioritizing it enough, according to a recent Deloitte report based on a survey of 500 executives. The report, “The Future of Cyber Survey 2019,” reveals a disconnect between organizational aspirations for a “digital everywhere” future and their actual cyber posture. Those surveyed view digital transformation as one of the most challenging aspects of cyber risk management, and yet indicated that less than 10% of cyber budgets are allocated to these digital transformation efforts. The report goes on to say that this larger cyber awareness is at the center of digital transformation. Understanding that is as transformative as cyber itself, and to be successful in this new era, organizations should embrace a “cyber everywhere” reality.

Cybersecurity breakdowns and data breach statistics

Are these growing concerns about cybersecurity justified?
It certainly seems so when we consider these facts:

- A global survey in 2018 by CyberEdge across 17 countries and 20 industries found that 78% of respondents had experienced a network breach.
- The ISACA survey of cybersecurity professionals points out that it is increasingly difficult to recruit and retain technically adept cybersecurity professionals. It also found that 50% of cyber pros believe that most organizations underreport cybercrime even when they are required to report it, and 60% said they expected at least one attack within the next year.
- Radware estimates that an average cyber-attack in 2018 cost an enterprise around $1.67 million. The costs can be significantly higher: a breach at Maersk is estimated to have cost around $250 million to $300 million because of brand damage, loss of productivity, loss of profitability, falling stock prices, and other negative business impacts in the wake of the breach.
- Risk Based Security reports that there were over 6,500 data breaches and that more than 5 billion records were exposed in 2018. The situation is no better in 2019: over 4 billion records were exposed in the first six months of the year.
- An IBM Security study revealed the financial impact of data breaches on organizations. According to this study, the cost of a data breach has risen 12% over the past 5 years and now averages $3.92 million. The average cost of a data breach in the U.S. is $8.19 million, more than double the worldwide average.
- As would be expected, with hacking as the top breach type, attacks originating outside of the organization were also the most common threat source. However, misconfigured services, data handling mistakes, and other inadvertent exposure by authorized persons exposed far more records than malicious actors were able to steal.

Data security and cybersecurity in the legal profession

Third-party professional services firms are often a target for malicious attacks because the likelihood of acquiring high-value information is high. Records show that law firms’ relationships with third-party vendors are a frequent point of exposure to cyber breaches and accidental leaks. Law.com obtained a list of more than 100 law firms that had reported data breaches and estimates that even more are falling victim to this problem but simply don’t report it, to avoid scaring clients and to minimize potential reputational damage.

Austin Berglas, former head of the FBI’s cyber branch in New York and now global head of professional services at cybersecurity company BlueVoyant, said law firms are a top target among hackers because of the extensive high-value client information they possess. Hackers understand that law firms are a “one-stop shop” for sensitive and proprietary corporate information, merger and acquisition data, and emerging intellectual property information.

As custodians of highly sensitive information, law firms are inviting targets for hackers. The American Bar Association reported in 2018 that 23% of firms had reported a breach at some point, up from 14% in 2016. Six percent of those breaches resulted in the exposure of sensitive client data. Legal documents pass through many hands as a matter of course: reams of sensitive information pass through the hands of lawyers and paralegals, and then go through review and signature by clients, clerks, opposing counsel, and judges. When they finally reach the location where records are stored, they are often inadvertently exposed to others, even firm outsiders, who shouldn’t have access to them at all.

The Logicforce legal industry score for cybersecurity health among law firms increased from 54% in 2018 to 60% in 2019, but this is still lower than in many other sectors.
Increasingly, clients are also asking for audits to ensure that security practices are current and robust.

A recent ABA Formal Opinion states: “Indeed, the data security threat is so high that law enforcement officials regularly divide business entities into two categories: those that have been hacked and those that will be.”

Lawyers are failing on cybersecurity, according to the American Bar Association Legal Technology Resource Center’s ABA TechReport 2019: “The lack of effort on security has become a major cause for concern in the profession.”

“A lot of firms have been hacked, and like most entities that are hacked, they don’t know that for some period of time. Sometimes, it may not be discovered for a minute or months and even years,” said Vincent I. Polley, a lawyer and co-author of a recent book on cybersecurity for the ABA.

As the volume of multilingual content explodes, a new risk emerges: public, “free” machine translation (MT) provided by large internet services firms that systematically harvest and store the data that passes through these “free” services. With significantly higher volumes of cross-border partnerships, globalization in general, and growth in international business, employee use of public MT has become a new source of confidential data leakage.

Public machine translation use and data security

It is estimated that on any given day, several trillion words are run through the many public machine translation options available across the internet. This huge volume of translation is done largely by the average web consumer, but there is increasing evidence that a growing portion of this usage comes from the enterprise when urgent global customer, collaboration, and communication needs are involved. This happens because publicly available tools are essentially frictionless and require little “buy-in” from users who don’t understand the data leakage implications.

The rapid increase in globalization has resulted in a substantial and ever-growing volume of multilingual information that needs to be translated instantly as a matter of ongoing business practice. Doing this through public MT is a significant risk for the global enterprise or law firm, as this short video points out. Content submitted for translation is clearly subject to terms of use agreements that entitle the MT provider to store, modify, reproduce, distribute, and create derivative works from it. At the very least, this content is fodder for machine learning systems that could themselves be hacked or expose data inadvertently.

Consider the following:

- At the recent SDL Connect 2019 conference, a speaker from a major US semiconductor company described the use of public MT at his company. When IT management carefully monitored this activity, they found that as much as 3 to 5 GB of enterprise content was being cut and pasted into public MT portals for translation on a daily basis. Further analysis revealed that the material submitted for translation included future product plans, customer problem-related communications, sensitive HR issues, and other confidential business process content.
- In September 2017, the Norwegian news agency NRK reported that it had found data that had been translated for free on a site called Translate.com, including “notices of dismissal, plans of workforce reductions and outsourcing, passwords, code information and contracts”. This was yet another site that offered free translation but reserved the right to examine the data submitted “to improve the service.” Subsequent searches by Slator uncovered other highly sensitive personal and corporate data.
- A recent report from the Australian Strategic Policy Institute (ASPI) makes claims about how China uses state-owned companies that provide machine translation services to collect data on users outside China. The author, Samantha Hoffman, argues that the most valuable tools in China’s data-collection campaign are technologies that users engage with for their own benefit, machine translation services being a prime example. This is done through a company called GTCOM, which, Hoffman said, describes itself as a “cross-language big data” business and offers hardware and software translation tools that collect large amounts of data. She estimated that GTCOM, which works with both corporate and government clients, handles the equivalent of up to five trillion words of plain text per day, across 65 languages and in over 200 countries. GTCOM is a subsidiary of a Chinese state-owned enterprise that the Central Propaganda Department directly supervises, and thus data collection is presumed to be an active and ongoing process.

After taking a close look at enterprise market needs and the current realities of machine translation use, we can summarize the situation as follows:

- There is a growing need for always-available, secure enterprise MT solutions to support the digitally driven globalization that we see happening in so many industries today. In the absence of such a secure solution, we can expect substantial “rogue use” of public MT portals, with the resulting risk of confidential data leakage.
- The risks of using public MT portals are now beginning to be understood. The risk is not just inadvertent data leakage; it is also closely tied to the data security and privacy risks of submitting confidential content into the data-grabbing machine learning infrastructure that underlies these “free” MT portals. There is a growing list of US companies already subjected to GDPR-related EU regulatory actions, including Amazon, Apple, Facebook, Google, Netflix, Spotify, and Twitter. Experts have stated that Chinese companies are likely to be the target of the next wave of regulatory enforcement, and the violators list is expected to grow.
- The executive focus on digital transformation is likely to drive more attention to the concurrent cybersecurity implications of hyper-digitalization. Information governance is likely to become much more of a mission-critical function as the digital footprint of the modern enterprise grows and becomes more strategic.

The legal market requirement: an end-to-end solution

Thus, language translation at scale has become an imperative for the modern global enterprise. Translation needs can range from rapid translation of millions of documents in an eDiscovery or compliance scenario, to very careful and specialized translation of critical contracts and court-ready documentation, to an associate collaborating with colleagues in a foreign office. Daily communications in global matters are increasingly multilingual.

Given the volume, variety, and velocity of the information that needs translation, legal professionals must consider translation solutions that involve both technology and human services. The requirements can vary greatly and can call for different combinations of human-machine collaboration, including some or all of these translation production models:

- MT-only, for very high volumes such as eDiscovery and daily communications
- MT + human terminology optimization
- MT + post-editing
- Specialized expert human translation

SDL Machine Translation: designed for the enterprise

SDL is a leader in developing secure, private, scalable, enterprise-ready MT technology that can be deployed on premise or in a private cloud, and also provides related expert services to ensure optimal, tailored deployment. SDL’s NLP technology team bench is deeper than any other in the translation industry, and the company’s MT technology is used by the largest global enterprises in the world, as well as by many governmental agencies focused on national security and intelligence gathering.
From the outset, SDL has focused on developing enterprise-friendly capabilities that include the following:

- Guaranteed data security and privacy
- Flexible deployment options that include on premise, cloud, or a combination of both, as dictated by usage needs
- A broad range of adaptation and customization capabilities so that MT systems can be optimized for each individual client
- Integration with primary enterprise IT infrastructure and software, e.g. Office, translation management systems, Relativity, and other eDiscovery platforms
- A REST API that allows connectivity to any proprietary systems that you may employ
- A broad range of expert consulting services on both the MT technology and the linguistic issues
- Tight integration with professional human translation services to handle end-to-end translation requirements

SDL’s translation capabilities range from handling large eDiscovery litigation-related projects, using MT enhanced with expert-developed, client-specific glossaries and search terms to improve the ability to identify relevant documents, to specialized expert human translation services for critical content. SDL’s secure translation supply chain solution provides an enterprise-class, vendor-agnostic, secure translation platform that allows you to combine regulatory compliance and translation best practice. SDL has the most sophisticated and comprehensive end-to-end translation solution capabilities in the industry today, powered by over 1,400 in-house translators working closely with linguistic AI-enabled tools and technology.

To find out how SDL can support your multilingual eDiscovery-related data processing and translation strategy, please visit our multilingual eDiscovery pages, which provide more insight into what we can do for you. To find out more about the Relativity end-to-end translation solution capabilities, look here, or watch this short video to learn more about how SDL can help.