translating non-well-formed HTML

    Question

  • Hello,

    My problem is that I'd like to translate pages split across multiple requests. This worked fine with the Google Translate API, but your service has some issues that make it impossible:

    If I use the "text/html" content type, the content has to be well-formed HTML, and tags are inserted into the translation when I translate HTML in multiple parts. If I use "text/plain", tags are misplaced in the translation.

    You can find some more details here: http://code.google.com/p/jquery-translate/issues/detail?id=75

    Saturday, December 03, 2011 2:01 PM

All replies

  • Hi ZD,

    You are right: in HTML mode the tags are handled correctly only if the HTML string is well formed. Missing closing tags will be added, and orphaned closing tags will be deleted.
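
    To make the effect on split content concrete, here is a minimal Python sketch of that kind of tag balancing. It is only an illustration of the behavior described above, not the Translator service's actual implementation; the names `Balancer` and `balance` are made up for this example, and comments and doctype declarations are ignored.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not tracked.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class Balancer(HTMLParser):
    """Hypothetical helper: re-emits markup with balanced tags."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []        # pieces of the rebuilt markup
        self.open_tags = []  # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: emit as-is, nothing to track.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # orphaned closing tag: dropped
        # Close everything that was opened after the matching start tag.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def balance(fragment):
    """Return fragment with missing closers added and orphan closers removed."""
    parser = Balancer()
    parser.feed(fragment)
    parser.close()
    while parser.open_tags:  # add the missing closing tags
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


first_half = balance("<p>Hello <b>big")   # closers get added
second_half = balance(" world</b></p>")   # orphaned closers get dropped
```

    Under these assumptions, rejoining the two halves no longer reproduces the original markup, which is the problem the question describes for pages translated in several parts.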

    Chris Wendt
    Microsoft Translator

    Thursday, December 08, 2011 3:46 PM
    Owner
  • What the OP wants is for the service not to close tags and not to delete orphaned tags, because most web pages on the internet are not well formed.
    • Edited by Eric7z Thursday, December 08, 2011 3:58 PM
    Thursday, December 08, 2011 3:58 PM
  • Exactly! Please try to leave the HTML as it is; the current behavior makes your service unusable for quite a lot of people. I guess it's not as easy to implement as it sounds, but the Google API does this correctly.

    Friday, December 09, 2011 1:07 PM
  • Thanks for the feedback: we heard you. Note that a certain degree of well-formedness will always be required for reasonable behavior of marked-up content. Examples:

    - cannot break within a tag
    - translate/notranslate state is lost across a break
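
    The first constraint can be handled on the client side. The sketch below is one possible way to split markup so that no cut lands between a "<" and its ">"; the function name `split_outside_tags` and the `max_len` parameter are hypothetical. It assumes attribute values contain no literal "<" or ">", ignores word boundaries, and does nothing about the translate/notranslate state.

```python
def split_outside_tags(markup, max_len):
    """Split markup into pieces of at most max_len characters,
    never cutting inside a <...> tag."""
    pieces = []
    start = 0
    while len(markup) - start > max_len:
        cut = start + max_len
        # If the last "<" before the cut has no ">" after it,
        # the cut is inside a tag: back up to where that tag starts.
        last_open = markup.rfind("<", start, cut)
        last_close = markup.rfind(">", start, cut)
        if last_open > last_close:
            cut = last_open
        if cut == start:
            raise ValueError("a single tag is longer than max_len")
        pieces.append(markup[start:cut])
        start = cut
    pieces.append(markup[start:])
    return pieces


sample = '<p>Hello <b class="x">world</b></p>'
parts = split_outside_tags(sample, 16)
```

    Every piece keeps its tags intact, but a piece can still contain an opening tag whose closing tag ends up in a later piece.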

    Chris Wendt
    Microsoft Translator

    Monday, December 12, 2011 4:54 PM
    Owner
  • Also, this feature would be essential to support browsers that can't send GET requests longer than about 2000 characters. Making it work this way would certainly introduce some ambiguity, but again, this worked out pretty well with the Google API.
    Monday, December 12, 2011 10:39 PM
  • Are you kidding? My website is 100% compatible with XHTML 1.0 Strict and HTML 5.0 strict without any JavaScript bypasses, and your API still breaks it completely.

    Or maybe Microsoft has some different standards?

    Sunday, December 18, 2011 9:05 PM
  • Are you kidding? My website is 100% compatible with XHTML 1.0 Strict and HTML 5.0 strict without any JavaScript bypasses, and your API still breaks it completely.

    Or maybe Microsoft has some different standards?

    We all know Microsoft has different standards, always.
    Monday, March 26, 2012 9:55 AM
  • Hi Mike and Eric,

    Using XHTML makes it easy: you walk the DOM to an element that is smaller than 10K characters and translate it.
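
    A rough Python sketch of that walk, using the standard library's `xml.etree.ElementTree` on well-formed XHTML. The function name `translatable_chunks` is made up for this example, and measuring the serialized markup of each subtree is an assumption about how the 10K figure is counted.

```python
import xml.etree.ElementTree as ET

MAX_CHARS = 10_000  # the "10K characters" figure from the post


def translatable_chunks(element, limit=MAX_CHARS):
    """Yield the largest subtrees whose serialized markup is shorter than limit.

    Limitation of this sketch: text that sits directly inside an oversized
    element, before its first child, is skipped.
    """
    markup = ET.tostring(element, encoding="unicode")
    if len(markup) < limit:
        yield element
        return
    children = list(element)
    if not children:
        raise ValueError(
            f"<{element.tag}> has no child elements and is not under {limit} characters"
        )
    for child in children:
        yield from translatable_chunks(child, limit)


doc = ET.fromstring("<div><p>" + "a" * 30 + "</p><p>" + "b" * 30 + "</p></div>")
small = [el.tag for el in translatable_chunks(doc, limit=50)]
whole = [el.tag for el in translatable_chunks(doc, limit=1000)]
```

    With the tiny limit the walk descends to the two paragraphs; with the larger one the whole `div` is returned as a single chunk.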

    If you find something breaking, please let us know right here.

    Chris Wendt
    Microsoft Translator

    Monday, March 26, 2012 1:20 PM
    Owner
  • Chris,

    Some browsers (Microsoft Internet Explorer) do not support sending GET requests longer than 2000 characters. This is a much harder limit, especially when you take into account that URL-encoded Chinese characters, for example, take up 9 characters each!
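
    The arithmetic can be checked with a few lines of Python. The helper name `encoded_length` is made up for this example, UTF-8 percent-encoding is assumed, and the 2000 figure is the limit quoted above.

```python
from urllib.parse import quote


def encoded_length(text):
    """Length of text once percent-encoded as UTF-8 for use in a URL."""
    return len(quote(text, safe=""))


GET_LIMIT = 2000  # the limit quoted in this thread

single = encoded_length("中")      # "%E4%B8%AD": 3 bytes, 3 characters each
ascii_word = encoded_length("hello")
max_chinese_chars = GET_LIMIT // single

print(single, ascii_word, max_chinese_chars)  # prints: 9 5 222
```

    So even before the rest of the URL is counted, a single request has room for only a couple of hundred Chinese characters.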

    Monday, March 26, 2012 1:46 PM
  • Hi Balazs,

    I appreciate the input. We'll consider the well-formedness requirement for the next major revision of the API.

    Chris Wendt
    Microsoft Translator

    Sunday, April 01, 2012 10:45 PM
    Owner