Word conversion of tabulator characters in HTML

  • Question

  • I'm looking for an algorithm or a library that converts tabulator characters to spaces, similar to Microsoft Word, which substitutes them with the appropriate number of spaces depending on the font size and the number of characters.

    Is there a way to find out how Word performs this conversion, or is a third-party library available?

    Monday, May 23, 2011 2:55 PM

Answers

  • You are correct that saving as filtered HTML generates spaces in place of tabs but the end result doesn't seem to line up very well. I'm afraid I have no idea what Word does, nor, given how poorly it does it, do I care to find out, and am not aware of any document detailing it.
     
    I think, if I wanted to do this I would consider creating tables, but to do it - or indeed anything - you would need to know where the tab stops were set.
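    A minimal sketch of the table idea, in Python. It assumes every tab moves to the next column, which only holds when no cell text overruns a tab stop; the real tab stop positions are not known here. The function name `tabbed_lines_to_table` is hypothetical.

```python
from html import escape


def tabbed_lines_to_table(lines):
    """Turn tab-separated text lines into a bare HTML table.

    Assumption: one tab == one column boundary. Real tab stops are
    ignored, so text that runs past a stop will land in the wrong cell.
    """
    rows = []
    for line in lines:
        # Escape cell text so characters such as '<' stay literal.
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in line.split("\t"))
        rows.append(f"<tr>{cells}</tr>")
    return "<table>" + "".join(rows) + "</table>"
```

    For example, `tabbed_lines_to_table(["asd\tasd"])` produces one row with two cells.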
     
    I'm afraid I can't help much more - sorry. I'll leave this for anyone else with any bright ideas.
     

    Enjoy,
    Tony
    www.WordArticles.com
    Wednesday, May 25, 2011 4:56 PM

All replies

  • Tab characters are not converted to other characters, and tab positions certainly do not depend on font size: tab stops are set at (user-)defined distances. There are one or two instances where characters are counted if you are using character units, but Word doesn't really do it properly. Can you be a bit more specific about what it is you are trying to do?
     

    Enjoy,
    Tony
    www.WordArticles.com
    Monday, May 23, 2011 3:11 PM
  • Thank you for your quick reply.

    I'm receiving HTML which contains something similar to this:

     

    <p class="cs69E3FE35"><span class="cs146AB7EC">asd	asd</span></p>
    

     

    There is a '\t' character between the two 'asd' words. I know that this HTML is not valid, because the HTML standard doesn't support tabulators, but I have no control over the creation process. Simply substituting four non-breaking spaces for each tab is not a good solution, because it breaks simple tables made with tabulators.
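    To illustrate why a fixed substitution breaks tab-aligned tables, here is a sketch in Python that pads each tab up to the next tab stop instead. It assumes a monospace font and a stop every 8 characters (both assumptions, not Word's behaviour), and it works on the text of a single element, not on markup. The name `expand_tabs_nbsp` is hypothetical.

```python
NBSP = "\u00a0"  # non-breaking space


def expand_tabs_nbsp(text: str, tab_size: int = 8) -> str:
    """Replace each tab with enough NBSPs to reach the next tab stop.

    Assumption: every character is one column wide (monospace font).
    """
    out = []
    column = 0
    for ch in text:
        if ch == "\t":
            # Distance from the current column to the next multiple of tab_size.
            pad = tab_size - (column % tab_size)
            out.append(NBSP * pad)
            column += pad
        elif ch == "\n":
            out.append(ch)
            column = 0  # a new line starts again at column 0
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

    With the default of 8, `expand_tabs_nbsp("asd\tasd")` gives 'asd', five non-breaking spaces, then 'asd', so the second word starts at the same column regardless of the length of the first.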

    I have noticed that Word has an option to save documents as HTML, which imitates tabulators using an appropriate number of non-breaking spaces. The exact number depends on the font size and typeface. For example, Word generates HTML like this:

     

    <p class=MsoNoSpacing>asd         asd</p>
    <p class=MsoNoSpacing><span style='font-size:18.0pt'>asd   asd</span></p>
    

     

    It looks like a real tabulator in the browser.

    Thanks in advance.


    • Edited by Piotr Zemczak Tuesday, May 24, 2011 3:17 PM Minor corrections
    Tuesday, May 24, 2011 3:16 PM
  • The trouble with using spaces is that they only work with monospace fonts. I hadn't considered HTML when I first saw your question but I just did a quick test (in Word 2007) and the generated HTML includes "style='tab-interval:36.0pt'", so HTML does support tabs (and a quick web search suggests it has done for more than ten years).
     
    This doesn't help you if you are receiving documents that use spaces but do you actually need to recreate documents with spaces? The algorithm - and I don't believe anything is published - is complex, and all I have ever seen suggested when people want to calculate the width of some text is to write the text to a document, let Word do the calculation, and then check what it has done.
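    As a rough sketch of what a width-based calculation could look like, in Python: this is not Word's algorithm, which is unpublished. It assumes evenly spaced tab stops (36pt, taken from the tab-interval value above) and a caller-supplied `measure` function that returns text width in points; `nbsp_for_tab` and `fixed_width_measure` are hypothetical names, and the fixed-width measure is only a stand-in for real font metrics.

```python
import math
from typing import Callable

NBSP = "\u00a0"  # non-breaking space


def fixed_width_measure(char_width_pt: float) -> Callable[[str], float]:
    """Stand-in for real font metrics: every character has the same width."""
    return lambda s: len(s) * char_width_pt


def nbsp_for_tab(text_before: str,
                 measure: Callable[[str], float],
                 tab_interval_pt: float = 36.0) -> int:
    """Estimate how many NBSPs fill the gap up to the next tab stop."""
    x = measure(text_before)
    # Next stop strictly to the right of the current position.
    next_stop = (math.floor(x / tab_interval_pt) + 1) * tab_interval_pt
    gap = next_stop - x
    space_width = measure(NBSP)
    # Always emit at least one space so the words stay separated.
    return max(1, round(gap / space_width))
```

    With 3pt-wide characters, `nbsp_for_tab("asd", fixed_width_measure(3.0))` returns 9; with 9pt-wide characters it returns 1. A larger font needs fewer spaces, but because spaces come in whole units the text after the tab rarely lands exactly on the stop.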
     

    Enjoy,
    Tony
    www.WordArticles.com
    Tuesday, May 24, 2011 7:33 PM
  • Unfortunately, the 'tab-interval' and 'tab-stops' styles are only a proposal for the HTML standard:

    http://www.w3.org/People/howcome/t/970224HTMLERB-CSS/WD-tabs-970117.html

    As I said in my previous post, I receive HTML documents with tabulator characters ('\t') which are not properly displayed in most browsers. I simply need to correct the HTML in order to duplicate the functionality of tabulators. Microsoft Word 2007 has an option to save HTML documents in filtered mode, which performs some calculations and inserts spaces regardless of whether the font is monospace.

    I thought that maybe some sort of paper or document was published that describes how the conversion takes place.

    Wednesday, May 25, 2011 8:07 AM