CONTEXT-BASED MACHINE TRANSLATION (English / Spanish)

Despite the money invested in research and development, machine translation is still far from being a real option for companies, even when a human reviewer checks the output (see, for example, King et al., 2003). In recent years, a group of researchers in New York (at the company "Meaningful Machines") has pursued a new line of research, context-based machine translation (CBMT), whose preliminary results are very promising. The objective of this research is to develop that idea into the design of a machine translation system of real value, meaning that its output is usable at the company level (see the ALPAC report, 1966).

The originality of this method lies in its focus on the context of words in both the source and target texts: words are never translated as individual units, but always within a context, co-text, or previously decoded n-gram. Thus, for example, morphosyntactic features such as gender agreement (as in "la casa roja") come out correctly not because the system applies a rule-based approach, but because it relies on prior n-gram training grounded in statistical significance. In this case, the n-gram "the red house" is segmented, its words are rendered through the dictionary as "roja" and "casa", and the result is matched as "la casa roja" in a huge Spanish corpus (statistically, this order is by far the most frequent, at 99.8%, ahead of "la roja casa").
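The idea of choosing a translation by its frequency in a target-language corpus can be sketched in a few lines of Python. This is only a minimal illustration, not the Meaningful Machines implementation: the dictionary, the four-sentence corpus, the Spanish rendering of the red-house example, and the names DICTIONARY, CORPUS, ngram_counts and translate_ngram are all assumptions made for the example. It looks up each source word, builds every combination and ordering of the candidate target words, and keeps the one that occurs most often in the corpus.

```python
from collections import Counter
from itertools import permutations, product

# Hypothetical toy dictionary: each English word maps to candidate Spanish forms.
DICTIONARY = {
    "the": ["el", "la"],
    "red": ["rojo", "roja"],
    "house": ["casa"],
}

# Invented stand-in for a large monolingual Spanish corpus.
CORPUS = [
    "la casa roja está en la colina",
    "compramos la casa roja ayer",
    "la casa roja tiene un jardín",
    "el coche rojo es nuevo",
]


def ngram_counts(sentences, n):
    """Count every n-gram (as a tuple of tokens) in the given sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def translate_ngram(source, dictionary, corpus):
    """Return the candidate target n-gram most frequent in the corpus, or None."""
    words = source.lower().split()
    counts = ngram_counts(corpus, len(words))
    candidates = set()
    # Unknown words are passed through unchanged.
    for choice in product(*(dictionary.get(w, [w]) for w in words)):
        # Try every ordering, since target word order may differ from the source.
        candidates.update(permutations(choice))
    # Highest corpus count first; ties are broken alphabetically.
    best = min(candidates, key=lambda c: (-counts[c], c))
    if counts[best] == 0:
        return None  # no candidate is attested in the corpus
    return " ".join(best)


print(translate_ngram("the red house", DICTIONARY, CORPUS))
```

Gender agreement and word order both fall out of the counts here: no rule states that "casa" is feminine or that the adjective follows the noun. A real system would need a far larger corpus and a smarter candidate search, since enumerating all permutations does not scale beyond short n-grams.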

The method helps overcome obstacles in machine translation that are often associated with syntax and discourse style. Translation is therefore not a mere transfer of words and structures from one language to another, but a transfer of meanings and usages that are consistent and coherent with the target language. Another example is the English passive voice (for example, "the house was built"), which many machine translation systems could translate correctly, if literally, as "la casa fue construida". The CBMT application, however, would draw on corpus statistics to find the most widespread usage of "casa" and "construir", producing the best option, "se construyó la casa", which would follow automatically from effective processing of the massive corpus.

The system would even anticipate that an expression cannot be rendered correctly, by checking whether any statistically sound options are available in the corpus. When the statistical comparison finds none, the system returns to the signatures, or contexts, of the given expression and runs a different search so that synonyms can be assigned. For example, if "put off the meeting" cannot be transferred as "posponer la reunión", either because the dictionary does not list "put off" with this meaning or because the corpus does not show that option, the system will search for other contexts of "the meeting" or for other words that precede the phrasal verb. In this case, the tool would look in the corpus for other options for the expression, such as "postpone" or "lead to".
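The fallback search described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual CBMT algorithm: the English corpus, the phrase dictionary, the Spanish equivalents, and the function names are all hypothetical. A "signature" is taken here to be the pair of word windows immediately to the left and right of an expression, and a substitute is any other expression found between the same windows.

```python
from collections import Counter

# Invented stand-in for a monolingual English corpus.
ENGLISH_CORPUS = [
    "they decided to put off the meeting until friday",
    "they decided to postpone the meeting until friday",
    "we had to postpone the meeting again",
    "we had to put off the meeting again",
    "she will attend the meeting tomorrow",
]

# Hypothetical dictionary that lacks an entry for "put off".
PHRASE_DICTIONARY = {
    "postpone": "posponer",
    "attend": "asistir a",
}


def context_signatures(phrase, corpus, width=2):
    """Collect the (left, right) word windows around each occurrence of phrase."""
    target = phrase.split()
    signatures = set()
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(len(tokens) - len(target) + 1):
            if tokens[i:i + len(target)] == target:
                left = tuple(tokens[max(0, i - width):i])
                end = i + len(target)
                right = tuple(tokens[end:end + width])
                signatures.add((left, right))
    return signatures


def find_substitutes(phrase, corpus, width=2, max_len=2):
    """Count other expressions that occur inside the same contexts as phrase."""
    signatures = context_signatures(phrase, corpus, width)
    hits = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(len(tokens)):
            for n in range(1, max_len + 1):
                if i + n > len(tokens):
                    break
                candidate = " ".join(tokens[i:i + n])
                left = tuple(tokens[max(0, i - width):i])
                right = tuple(tokens[i + n:i + n + width])
                if candidate != phrase and (left, right) in signatures:
                    hits[candidate] += 1
    return hits


def translate_with_fallback(phrase, dictionary, corpus):
    """Use the dictionary directly, else go through a context-sharing substitute."""
    if phrase in dictionary:
        return dictionary[phrase]
    for substitute, _ in find_substitutes(phrase, corpus).most_common():
        if substitute in dictionary:
            return dictionary[substitute]
    return None  # no statistically supported option was found


print(translate_with_fallback("put off", PHRASE_DICTIONARY, ENGLISH_CORPUS))
```

With this toy data, "postpone" shares both contexts of "put off" and is the only substitute found, so its dictionary entry is returned. Exact window matching is a deliberately crude choice; with a real corpus one would expect partial or weighted context overlap to be needed.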