Can We Expect All-Pervasive Machine Translation in 2012?
2011 witnessed much development in the area of Machine Translation, and at Milengo we predict that it will continue in 2012. The three MT models (statistical MT, non-customizable MT à la Google, and rule-based MT) will likely continue to be used. However, MT for business will increasingly come to mean customizable Statistical Machine Translation.
MT for Consumers.
MT in the consumer market has the potential to become a universal technology since more people now have constant internet access due to the prevalence of smart phones. GPS and Google Maps have already established location-based services as a ubiquitous technology, enabling a host of innovative applications and services, from simple apps in gadgets to truly global systems. So will machine translation become a universal service as well?
Unless English really does become THE world language, consumers will need multilingual text/voice interfaces on their gadgets and for their apps. Even on the Web, the dominance of English is diminishing whilst demand for free on-the-fly machine translation is growing. On-the-fly MT can become a great facilitator of truly global Web 2.0 and community services, whilst also offering a cost-efficient solution for multilingual communication within, and between, global businesses (translation of chat logs and emails “at Internet speed”).
Couple this with reliable speech recognition and synthesis (“Siri” on the iPhone and Google Translate’s “listen” button give us some indication that the technology is already there), as well as the deployment of artificial intelligence “in the cloud”, and we will be able to communicate ever more with people and software in many languages.
Is MT-enabled multilingual communication going to become just as prevalent as GPS-enabled services and applications? Well, there are serious privacy concerns when all our communication is processed by an invisible translation engine, but then the same concerns apply to GPS-enabled location-awareness (all our movements can be tracked).
Another factor limiting the widespread use of this kind of technology is the cost. Taking your cell phone abroad and using it to translate in real time using voice input requires a lot of data to be sent back and forth, which invariably incurs costs, especially when roaming. So until roaming costs for smart phones are reduced to an acceptable level, we probably won’t see the widespread adoption of real-time MT as a translation tool for the masses.
MT for Business.
Traditionally, the translation industry has served two communication models: business-to-business (B2B) communications in different languages between (or within) businesses, and business-to-consumer (B2C) communications. Both involve the localization of user documentation, web content, software, marketing collateral, product labeling etc. into many languages.
While there are currently no established business models, MT can also facilitate consumer-to-business (C2B) communications in the customers’ native languages directly to the business, instead of going through local offices, reseller networks, etc. (e.g. user/market feedback, customer/product support requests). Consumer-to-consumer (C2C) is another model, enabling multilingual communications between consumers on user forums and communities, other community initiatives, or multilingual Web 2.0 in general.
The Giants (Google, Microsoft, Apple…) will cover C2C needs with pervasive cloud-based services, with MT (and possibly AI) running in the background to connect Web 2.0 users, gadgets, apps and global services and to reduce language barriers. They will also offer B2C and C2B (market intelligence) solutions for businesses, which will become commonplace within the market.
Community initiatives have been essential to improving open-source MT solutions (e.g. Moses). Some seek to serve C2C on a community basis (e.g. itranslate4.eu). It is unreasonable to expect MT to become a widely adopted standard without such community initiatives: a David with brainpower can be just as important in shaping the future as a Goliath with market power. However, there is little direct funding for this work, meaning that development is likely to be slow.
IT businesses with access to the appropriate technological resources will be able to leverage cloud-based MT like Google Translate API v2, and build translation portals and workflow solutions using open-source or relatively cheap software available on the Internet, integrating MT into their business processes.
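As a rough illustration of what such an integration might involve, the sketch below builds a request for a cloud MT service and reads the translations out of a JSON reply. It is a minimal sketch under stated assumptions: the endpoint URL, the parameter names (key, q, source, target) and the response layout reflect our understanding of the Google Translate API v2 REST interface and should be checked against the current documentation, and the function names are our own invention. No network call is made here.

```python
import json
from urllib.parse import urlencode

# Assumption: REST endpoint of Google Translate API v2; verify before use.
TRANSLATE_V2_ENDPOINT = "https://www.googleapis.com/language/translate/v2"


def build_translate_url(text, source, target, api_key):
    """Return a GET URL asking the service to translate `text`.

    Hypothetical helper: the parameter names are assumed, not guaranteed.
    """
    params = {"key": api_key, "q": text, "source": source, "target": target}
    return TRANSLATE_V2_ENDPOINT + "?" + urlencode(params)


def extract_translations(response_body):
    """Pull translated strings out of a JSON reply.

    Assumes the reply looks like
    {"data": {"translations": [{"translatedText": "..."}]}}.
    """
    payload = json.loads(response_body)
    return [item["translatedText"] for item in payload["data"]["translations"]]


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a working credential.
    print(build_translate_url("Hello world", "en", "de", "YOUR_API_KEY"))
```

In a translation portal or workflow tool, the URL would be sent with any HTTP client and the reply passed to extract_translations before post-editing or publication.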
The Language Industry.
LSPs like Milengo will continue to gradually integrate MT into traditional B2B and B2C processes. For example, software UI localization could be an easy target for MT (there are even solutions to translate software UI on-the-fly into languages not covered by localization). Customized MT engines will be developed for very specific domains (e.g. technical, automotive, medical) and will be combined with extensive, pre-translated product terminology and specific language pairs (possibly merging RBMT with SMT for languages with complex grammar or Asian languages).
Companies that require a secure, controlled MT environment for confidentiality and data protection will be served well by LSPs that offer MT with post-editing as an alternative or complementary approach to “Translation-only” customer needs.
New Business Models.
C2C does not seem to be a viable service for LSPs, since it can easily be covered by free cloud-based services from “giants” and (for specific domains/languages) by community initiatives. C2B (translation of customer-generated multilingual content for businesses) could be interesting as a new revenue stream.
LSP added value:
LSPs will ramp up promotion of MT and the value proposition will be further defined to cover:
- customized MT engines that are operated in a closed environment (confidentiality)
- seamless integration of optional human linguist input
- post-editing and QA
- translation (for critical content)
The biggest obstacle facing LSPs that wish to develop MT solutions is a lack of adequate bilingual corpora to train SMT engines. With high volumes of customer-generated content, it also becomes difficult to identify exactly what needs to be translated, and businesses will most probably need aggregated/filtered results. This filtering could be based on the frequency of keywords (potential AdWords) in customer-generated content. It also lends itself nicely to finding patterns and trends for business intelligence (data mining) in order to identify a hierarchy of needs, rather than translating everything in one go.
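The keyword-frequency idea can be sketched in a few lines. This is only an illustration under simple assumptions: messages are plain strings, tokens are split on word characters, the stopword list is a tiny invented sample, and the function names and the top_n cut-off are our own. A real pipeline would need language-aware tokenization.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a real resource).
STOPWORDS = {"the", "a", "an", "is", "it", "to", "and", "of", "my", "not", "will"}


def tokenize(message):
    """Lowercase a message and split it into word tokens."""
    return re.findall(r"\w+", message.lower())


def keyword_counts(messages, stopwords=STOPWORDS):
    """Count how often each non-stopword token occurs across all messages."""
    counts = Counter()
    for message in messages:
        counts.update(t for t in tokenize(message) if t not in stopwords)
    return counts


def select_for_translation(messages, top_n=3, stopwords=STOPWORDS):
    """Keep only messages that mention one of the top_n most frequent keywords."""
    counts = keyword_counts(messages, stopwords)
    hot = {keyword for keyword, _ in counts.most_common(top_n)}
    return [m for m in messages if hot & set(tokenize(m))]


if __name__ == "__main__":
    feedback = [
        "Battery drains fast",
        "battery will not charge",
        "Love the camera",
        "Screen cracked",
    ]
    print(keyword_counts(feedback).most_common(3))
    print(select_for_translation(feedback, top_n=1))
```

The counts give the aggregated view, while the selection step narrows the volume that is actually sent for translation.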
An MT solution is more viable where product terminology is well-defined and available translation memories from former localization projects can be used to train the MT engine. MT could enable more centralized customer support instead of relying on local offices or reseller networks. It could also be used to analyze and evaluate customer support information globally and help develop solutions to product issues reported by the customers themselves (e.g. on user forums).
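To show how existing translation memories might feed an engine, here is a minimal sketch that extracts aligned sentence pairs from a TM export. It assumes the TM is in TMX format (tu/tuv/seg elements with an xml:lang attribute) and that language codes match exactly, which real exports with codes like "en-US" may not. The function name is hypothetical, and writing the pairs out as line-aligned files for a toolkit such as Moses is left out.

```python
import xml.etree.ElementTree as ET

# Fully qualified name ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_parallel(tmx_text, source_lang, target_lang):
    """Return (source, target) segment pairs from a TMX document string.

    Translation units lacking either language are skipped.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # Older TMX files may use a plain "lang" attribute instead.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Join text across inline markup and normalize whitespace.
                segments[lang] = " ".join("".join(seg.itertext()).split())
        src = segments.get(source_lang.lower())
        tgt = segments.get(target_lang.lower())
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


if __name__ == "__main__":
    sample = (
        '<tmx version="1.4"><body>'
        '<tu><tuv xml:lang="en"><seg>Press the power button.</seg></tuv>'
        '<tuv xml:lang="de"><seg>Den Netzschalter bitte einschalten.</seg></tuv></tu>'
        "</body></tmx>"
    )
    for src, tgt in tmx_to_parallel(sample, "en", "de"):
        print(src, "|||", tgt)
```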
MT is a big topic and set to grow. It will continue to cause controversy amongst translators who are concerned for their livelihoods, whilst it is viewed cautiously by businesses worried about cost and quality. However, we are now entering a period of development where the advantages are beginning to outweigh the disadvantages.
More and more business cases will be identified for the use of MT. On the consumer side, advances in mobile technology and reductions in cost of ownership will put MT in the hands of more people than ever before. Will there be a watershed moment where MT becomes as ubiquitous as Google Maps? Maybe, but one thing is for certain: MT will play an even more important role in 2012 than it ever has, and we’re excited to see what the future brings.