Global Watchtower
Common Sense Advisory Blogs
European Commission Releases Its Translation Memories -- Millions of Words in 231 Language Pairs
Posted by Renato S. Beninatto on January 21, 2008 in the following blogs: Translation and Localization

Anybody can now download over one million segments of multilingual Translation Memory in TMX format for the Acquis Communautaire -- the entire body of European legislation, including all the treaties, regulations, and directives adopted by the European Union (EU) and the rulings of the European Court of Justice.

The Directorate-General for Translation (DGT) of the European Commission has made publicly available the aligned sentences ("translation units") extracted from one of its large shared translation memories in Euramis (European Advanced Multilingual Information System). This memory contains most, but not all, of the documents of the Acquis Communautaire, plus some other documents that are not part of the Acquis.
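To make "translation unit" concrete, here is a minimal Python sketch of how aligned segments for one language pair might be read out of a TMX file. The sample data, the language codes, and the function name `extract_pairs` are illustrative assumptions and are not taken from the DGT release; the actual files may use different language codes and are far larger.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample in the general shape of a TMX file (assumption: not DGT data).
SAMPLE_TMX = """
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Article 1</seg></tuv>
      <tuv xml:lang="da"><seg>Artikel 1</seg></tuv>
      <tuv xml:lang="pl"><seg>Artykuł 1</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Article 2</seg></tuv>
      <tuv xml:lang="da"><seg>Artikel 2</seg></tuv>
      <tuv xml:lang="pl"><seg>Artykuł 2</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Article 3</seg></tuv>
      <tuv xml:lang="da"><seg>Artikel 3</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def extract_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) segment pairs for one language pair.

    Translation units that lack either language are skipped.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG, "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                segments[lang] = "".join(seg.itertext()).strip()
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


danish_polish = extract_pairs(SAMPLE_TMX, "da", "pl")
for source, target in danish_polish:
    print(source, "|", target)
```

Because each translation unit carries every available language version of the same sentence, a single file can serve any pair of its languages; the third unit in the sample has no Polish version and is simply left out of the Danish-Polish result.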

How does this affect buyers and suppliers of language services and technology?
  • Shared corpus. Every international organization, private or public, generates multilingual content that is suited to the creation of parallel corpora. The most commonly used are the Hansard, drawn from the proceedings of the Canadian Parliament, and the official records of the United Nations. The files made available by the European Commission are the first released in TMX format, which allows LSPs and freelancers to easily leverage the content in day-to-day translation.

  • Language combinations. Aligned corpora tend to cover only two languages. With 22 languages, millions of words, and a variety of subjects, this DGT database becomes useful to a broader audience and allows the development of automated translation tools for non-traditional language pairs like Danish and Polish.

  • Machine Translation. Aligned segments of text are the foundation for statistical machine translation (think Language Weaver and Google Translate).

  • Productivity. Smart LSPs will import these translation memories into their SDL Trados, Star Transit, or Atril DéjàVu databases and pre-process translations before allocating them to translators. The significance of this release is that this leverage can now be achieved in many more language pairs.
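The figure of 231 language pairs in the headline follows from the 22 languages mentioned above: it is the number of ways to choose two languages out of 22 when direction is ignored. A short Python check, using placeholder labels rather than the real language list (which this article does not give):

```python
from itertools import combinations
from math import comb

NUM_LANGUAGES = 22  # figure quoted in the article

# Placeholder labels (assumption): stand-ins for the actual language codes.
labels = [f"L{i:02d}" for i in range(1, NUM_LANGUAGES + 1)]

# Every unordered pair of distinct languages, e.g. ("L01", "L02").
pairs = list(combinations(labels, 2))

print(len(pairs))               # 231 unordered pairs
print(comb(NUM_LANGUAGES, 2))   # same count via the binomial coefficient
print(NUM_LANGUAGES * (NUM_LANGUAGES - 1))  # 462 if each direction counts separately
```

Counting each translation direction separately (Danish into Polish and Polish into Danish) would double the figure to 462.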
We assume that initiatives like the TM Marketplace, Lingotek, and the Linguistic Data Consortium will incorporate these files into their systems. Let's hope that other multilateral and multilingual organizations like NATO, PAHO, UNESCO, and the World Bank take the cue and release their aligned memories. Beyond that, Common Sense Advisory believes that the day will come when commercial enterprises stop looking at translation memories as proprietary data and start sharing them with the language industry, just as some already share glossaries.


Keywords: Translation memory, Translation technologies

Refine Your Search
Skip Navigation Links.
Skip Navigation Links.

Terms of Use | Privacy Statement | Contact Us
Copyright © 2016 Common Sense Advisory, Inc. All Rights Reserved.