On December 16 last year, the General Court of the European Union ruled that the European Commission (E.C.) had infringed Systran’s intellectual property rights and awarded the company €12.001 million in damages. The E.C. promptly shut down its Systran-powered machine translation (MT) portal, posting the following message: “We regret to inform you that, following the ruling of the General Court of 16 December 2010, the service has been temporarily stopped. We are analysing alternatives allowing to restore the service as soon as this is possible.” Commission staff members with urgent translation needs were encouraged to contact the Directorate-General for Translation’s Demand Management Unit for alternatives.
Unfortunately, unless staff members sign up for an account, those E.C. alternatives won’t include the latest technology announced by the Commission’s former supplier: an engine dedicated to translating European Union legal documents in 56 language pairs. This new prototype of Systran’s hybrid MT engine was developed in conjunction with Philip Koehn of the University of Edinburgh, whose expertise went into building both the EuroMatrix project, which covers MT between all European languages, and Asia Online’s statistical MT engine. The project’s goal is “to build the best translation engines for the 462 possible language combinations of the European Union.”
The prototype is built on Systran Enterprise Server 7 and previews some features that will appear in the company’s upcoming software-as-a-service (SaaS) offering. Its user interface lets users create and manage translation projects in a simple online workflow, leverage existing dictionaries and translation memories during translation, and export translation data in TMX format.
The new Systran hybrid engine was trained on the Acquis Communautaire corpus, which comprises the entire body of European Union legislation, including all its treaties, regulations, and directives. The corpus is available online through a project of the European Commission’s Joint Research Centre. Bilingual dictionaries were also created automatically from the Acquis Communautaire, yielding more than 20,000 entries for each language pair. These dictionaries feed into the new engine’s hybrid translation process.
We spoke with Systran CEO Dimitris Sabatakakis, who told us that his team is hard at work on another prototype engine based on the European Parliament corpus. The value he sees for the market is “the ability to develop company-specific and domain-specific translation engines with high levels of domain-tuned quality.” For now, then, anyone signing up to try the new Systran engine should have an interest in E.U. legal documents, but Sabatakakis says that future instances of the engine will offer similar quality for other domains.
What does this prototype mean for the market? As we have noted in earlier posts, vendors of rules-based MT engines have been adding statistical modules to offer hybrid solutions, supporting application programming interfaces (APIs), and improving their user interfaces. With this release, Systran demonstrates both its new engine’s ability to be tuned to specific domains and its SaaS credentials, although this first instance won’t see the Commission usage it might have had before December 16, 2010. Either way, the new Systran prototype shows that there is plenty of room for competition and innovation in the MT sector.