Lemmatization and factors being used to prepare training data for a term-aware MT model.

Term-Aware Machine Translation

Businesses that request translations want consistent tone and terminology. However, when many different translators work on a business's documents, it is difficult to keep translations consistent over time. While neural machine translation (NMT) can improve translation speed, fixing inconsistent terminology still accounts for a significant portion of translators' post-editing effort. A robust NMT system should be able to take term translations from human-curated term banks as guidance, so that translations are more consistent from the start.

In this talk, which I gave for a friend who runs the Belarus NLP meetup at NLProc.by, I covered:

  • How term banks are used in the typical translation industry workflow
  • Challenges and methods for identifying terms from the term bank in the source sentence
  • Methods for incorporating term guidance in the NMT system, agnostic to the specific term bank used at inference time
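To make the last two points a little more concrete, here is a minimal sketch of one possible approach, which is not necessarily the exact pipeline from the talk. It matches term bank entries against the source sentence by lemma, so that inflected forms are still found, and then appends the target term after the matched span, marking each token with a factor. The toy lemma table, the `|` separator, the factor values (0 = ordinary token, 1 = source term, 2 = target term) and the function names are all assumptions made for illustration; a real system would use a proper lemmatizer and whatever factor format its NMT toolkit expects.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lemma table; a real system would use a proper lemmatizer.
TOY_LEMMAS: Dict[str, str] = {
    "drives": "drive",
    "disks": "disk",
    "installed": "install",
}

Match = Tuple[int, int, str]  # (start index, end index, target term)


def lemmatize(token: str) -> str:
    """Lowercase the token and look it up in the toy lemma table."""
    lowered = token.lower()
    return TOY_LEMMAS.get(lowered, lowered)


def find_terms(tokens: List[str], term_bank: Dict[Tuple[str, ...], str]) -> List[Match]:
    """Scan left to right, preferring the longest lemma match at each position.

    Term bank keys are assumed to be tuples of already-lemmatized source words.
    """
    lemmas = [lemmatize(t) for t in tokens]
    max_len = max((len(key) for key in term_bank), default=0)
    matches: List[Match] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lemmas[i:i + n])
            if key in term_bank:
                matches.append((i, i + n, term_bank[key]))
                i += n
                break
        else:
            # No term starts at this position; move on to the next token.
            i += 1
    return matches


def annotate(tokens: List[str], matches: List[Match]) -> str:
    """Append each target term after its source span and tag tokens with factors.

    Assumed factor scheme: 0 = ordinary token, 1 = source term, 2 = target term.
    """
    out: List[str] = []
    pos = 0
    for start, end, target in matches:
        out.extend(f"{t}|0" for t in tokens[pos:start])
        out.extend(f"{t}|1" for t in tokens[start:end])
        out.extend(f"{t}|2" for t in target.split())
        pos = end
    out.extend(f"{t}|0" for t in tokens[pos:])
    return " ".join(out)


term_bank = {("hard", "drive"): "Festplatte"}
tokens = "Two hard drives were installed".split()
matches = find_terms(tokens, term_bank)
print(annotate(tokens, matches))
# Two|0 hard|1 drives|1 Festplatte|2 were|0 installed|0
```

Because the term bank is only consulted when the input is prepared, a model trained on data annotated this way does not depend on any particular term bank, and a different one can be supplied at inference time.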

A recording of the presentation, given at the public meetup, can be viewed on YouTube:

Written on April 25, 2020.