Best practices for translating XML documents?

  • Question

  • What are best practices for translating XML documents with MT Hub?

    And how can I export the translated document to validate the output markup?

    Right now (translating the documents as plain text), I'm getting:

    - spaces inserted into tags (<em> becomes < em >)

    - significant reordering of tags vs text

    - tag names are getting translated

    Examples follow (enu > ptb), each source line followed by its MT output:

    <pair name="position" type="integer">1</pair>

    < nome par = "cargo" type = "inteiro" > 1 < / par >

    <li>Go to the <em>Payment</em> section of your <a href="https://www.linkedin.com/secure/settings" target="_blank">Settings</a> page.</li>     

    < li > vá até a seção de < em > < /em > do pagamento da sua < a href = "https://www.linkedin.com/secure/settings" target = blank"> página de configurações de </a >. </li >

    <li>Click <em>Manage Billing</em> Info in the upper right corner.</li>      

    Clique em < li > < em > Gerenciar cobrança < /em > informações no canto superior direito canto. </li >

    <li>Click the <em>Edit</em> link and make your updates.</li>         

    Clique no link de < em > Editar < /em > de < li > e faça suas atualizações. </li >



    • Edited by MikeD_AMTA Wednesday, February 18, 2015 7:17 PM
    Wednesday, February 18, 2015 7:10 PM

Answers

  • Made an edit to my answer below. Important to consider the sentence-breaking vs. sentence-internal nature of your XML elements.

    Chris Wendt
    Microsoft Translator

    Friday, February 20, 2015 4:42 PM

All replies

  • - Transform your XML document to XHTML, using an XSL transform. Choose appropriate HTML elements for your untranslatable elements, say <code> or <script> for code. Consider which elements are sentence-internal and which are sentence-ending, and use the matching HTML tags. Example: <title> is sentence-ending, <a> is sentence-internal. Look at your HTML document in a browser after the transform: if it doesn't look right, it won't translate right. Avoid at all costs introducing sentence breaks where they don't belong, or gluing words into sentences where they shouldn't be, say in a table.

    - Make sure the resulting document is not longer than 10000 characters. If it is, split it into smaller but complete elements.

    - Translate with content-type="text/html".

    - Transform (XSLT) back to your original XML schema.
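
    The four steps above can be sketched roughly as follows. This is a minimal Python sketch, not the XSLT route recommended above: it renames elements with the standard library's ElementTree instead. The tag mapping (TAG_MAP), the data-tag attribute used to remember original element names, and the translate_html callable are all hypothetical. translate_html stands in for whatever call sends one chunk to the service with content-type text/html, and the sketch assumes the service returns well-formed markup with attributes left intact, which would need to be verified.

```python
import xml.etree.ElementTree as ET

MAX_CHARS = 10000  # document length limit quoted in the steps above

# Hypothetical mapping: block-level HTML for sentence-ending elements,
# inline HTML for sentence-internal ones.
TAG_MAP = {"doc": "div", "title": "h1", "para": "p", "term": "em"}


def to_xhtml(root):
    """Rename mapped elements in place, recording each original tag name."""
    for el in root.iter():
        if el.tag in TAG_MAP:
            el.set("data-tag", el.tag)  # assumption: attribute survives translation
            el.tag = TAG_MAP[el.tag]
    return root


def from_xhtml(root):
    """Restore the original tag names recorded by to_xhtml."""
    for el in root.iter():
        original = el.attrib.pop("data-tag", None)
        if original is not None:
            el.tag = original
    return root


def chunk_children(root, max_chars=MAX_CHARS):
    """Group the root's children into strings of at most max_chars,
    never cutting through an element."""
    chunks, current, size = [], [], 0
    for child in root:
        piece = ET.tostring(child, encoding="unicode")
        if len(piece) > max_chars:
            raise ValueError(f"<{child.tag}> alone exceeds {max_chars} characters")
        if current and size + len(piece) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(piece)
        size += len(piece)
    if current:
        chunks.append("".join(current))
    return chunks


def translate_xml(xml_text, translate_html, max_chars=MAX_CHARS):
    """Transform to XHTML, translate chunk by chunk, transform back."""
    root = to_xhtml(ET.fromstring(xml_text))
    translated = [translate_html(c) for c in chunk_children(root, max_chars)]
    for child in list(root):
        root.remove(child)
    for chunk in translated:
        # Wrap each chunk so that several sibling elements parse as one tree.
        root.extend(list(ET.fromstring(f"<div>{chunk}</div>")))
    return ET.tostring(from_xhtml(root), encoding="unicode")
```

    Passing an identity function as translate_html is a quick way to confirm that the round trip reproduces the input before any real translation is involved. A single child element larger than the limit raises an error here rather than being split, since where to split it depends on the schema.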

    Content-type text/html is designed to preserve HTML formatting and nesting. text/plain will not do that. However, you are pretty much forced into text/plain if you need to translate incomplete XML elements.
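
    To validate the output markup, one option is to parse each translated fragment and compare its element names with the source. A minimal sketch, assuming fragments are small enough to wrap in a dummy root element; the wrapper element and both function names are made up for illustration:

```python
import xml.etree.ElementTree as ET


def tag_sequence(fragment):
    """Return element names in document order, or None if the
    fragment is not well-formed XML."""
    try:
        root = ET.fromstring(f"<wrapper>{fragment}</wrapper>")
    except ET.ParseError:
        return None
    return [el.tag for el in root.iter()][1:]  # drop the dummy wrapper


def check_translation(source, translated):
    """Classify the translated fragment's markup against the source."""
    src = tag_sequence(source)
    if src is None:
        raise ValueError("source fragment is not well-formed")
    out = tag_sequence(translated)
    if out is None:
        return "not well-formed"
    if sorted(src) != sorted(out):
        return "tags changed"      # e.g. a tag name was translated or dropped
    if src != out:
        return "tags reordered"
    return "ok"
```

    Run against the "Click the <em>Edit</em> link" pair from the question, this reports the output as not well-formed, because < em > with a space after the opening angle bracket does not parse as a tag. It only compares element names, so changed attribute values (such as type = "inteiro") would need a separate check.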

    Let us know how it goes,
    Chris Wendt
    Microsoft Translator



    Thursday, February 19, 2015 5:24 PM
  • Thanks once again, Chris!

    : )

    Best,

    Mike

    Thursday, February 19, 2015 5:28 PM
  • Thanks. That said, we should not be messing with the individual tags even in plain text. Let me check....
    Thursday, February 19, 2015 5:31 PM