Why the UTF-8 with BOM marker requirement?

    Question

  • First of all, a BOM does not make sense on a UTF-8 file: UTF-8 has no byte order to mark, so it's only useful for UTF-16/32 files.
    Why this requirement for the bytecode generator? Why does it even care which encoding the files are saved in? And why is that a Store requirement?
    Is MSFT suggesting we re-save all the 3rd-party libs we use with a BOM? What happened to "use whatever 3rd-party lib you want" from //build/? Now we have to re-save every file that gets updated by its original author, because no one saves UTF-8 files with a BOM.
    Wednesday, June 13, 2012 1:05 PM


All replies

  • The encoding was found to improve app startup time by 15-20% because of the way the app host loads and parses everything. Since this greatly affects the perception of overall Win8 performance, it was added to the Store certification requirements.

    That's as much as I know about it.

    • Marked as answer by phil_ke Wednesday, June 13, 2012 8:41 PM
    Wednesday, June 13, 2012 4:31 PM
  • Interesting, Kraig. So I assume the parser doesn't have to figure out which encoding the file was saved in? With the UTF-8 BOM it already knows and can trust that, hence the better parsing performance?
    Wednesday, June 13, 2012 8:41 PM
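To illustrate the idea being asked about: when a file starts with a BOM, a loader can pick the encoding from the first few bytes instead of guessing. The following is a minimal Python sketch of BOM detection; `sniff_encoding` and `decode_source` are hypothetical helpers, not how Chakra or the Win8 app host actually load files, and the `fallback` parameter merely stands in for whatever heuristics a real host would apply when no BOM is present.

```python
import codecs

# Known byte order marks, longest first so a UTF-32 LE mark (FF FE 00 00)
# is not mistaken for a UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(data: bytes):
    """Return (encoding, bom_length) if data starts with a BOM, else (None, 0).

    With a BOM the answer is unambiguous; without one a real loader has to
    fall back to heuristics (HTTP headers, <meta charset>, sniffing, ...).
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_source(data: bytes, fallback: str = "utf-8") -> str:
    """Decode using the BOM if present, otherwise the assumed fallback."""
    encoding, skip = sniff_encoding(data)
    return data[skip:].decode(encoding or fallback)


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + "var s = 'héllo';".encode("utf-8")
    print(sniff_encoding(with_bom))       # ('utf-8', 3)
    print(sniff_encoding(b"var s = 1;"))  # (None, 0)
    print(decode_source(with_bom))        # var s = 'héllo';
```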
  • Not entirely sure, to be honest. I think it has something to do with bytecode caching, i.e. the pages are converted to bytecode when the app is installed (or maybe on first run), and that helps load time.

    It does mean, by the way, that minification doesn't impact performance; it might reduce package size for download from the Store, but startup time will be the same.

    Wednesday, June 13, 2012 8:50 PM
  • Kraig, could you kindly ask the team responsible for the packager/JS engine how this all works, and why there is this strict BOM requirement for JS/HTML/CSS?

    For your reference, here is where we are speculating about what the reason for all this might be:

    https://github.com/jashkenas/coffee-script/pull/2389#issuecomment-6429192

    Tuesday, June 19, 2012 4:05 PM
  • Yes, I can do that. No guarantee on the speed of a response :)
    Wednesday, June 20, 2012 3:33 AM
  • Here's what I heard back:

    1. The BOM significantly improves performance, like I said before, primarily for JS files.
    2. UTF-8 is smaller for most source files; it's what the Chakra engine stores internally, so using it eliminates a conversion step.
    3. The BOM for JS also helps correctness; without it, the bytecode generator for apps would have to replicate every rule and heuristic that IE uses to determine the correct encoding, and this gets really messy for obvious reasons. (I don't have much more on this; the note I got back was a bit cryptic.)
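Point 2's size claim is easy to check for typical source code. A small Python sketch (the JS snippet is made up, and this says nothing about how Chakra stores strings internally) comparing UTF-8 and UTF-16 byte counts:

```python
# Illustrative only: byte counts for a small, mostly-ASCII JS snippet.
SOURCE = (
    "function greet(name) {\n"
    "    // Mostly ASCII, like most source files.\n"
    '    return "Hello, " + name + "!";\n'
    "}\n"
)


def encoded_sizes(text: str):
    """Return (UTF-8 bytes, UTF-16 bytes), both without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


utf8_size, utf16_size = encoded_sizes(SOURCE)
print(f"UTF-8: {utf8_size} bytes, UTF-16: {utf16_size} bytes")

# ASCII needs 1 byte per character in UTF-8 but 2 in UTF-16, so UTF-16 is
# exactly twice as large here. CJK text is the reverse case: 3 bytes per
# character in UTF-8 versus 2 in UTF-16.
print(encoded_sizes("日本語"))  # (9, 6)
```

This is why the claim is "most" source files rather than all: files dominated by non-Latin text can be smaller in UTF-16.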

    In the end, the engineering teams determined that the lack of a BOM was too expensive to accommodate, hence the requirement.

    That's as much as I have.

    Thursday, June 21, 2012 6:08 PM
  • Thanks Kraig. Do you mind posting the rather cryptic note? Maybe it contains a bit more technical information?
    Thursday, June 21, 2012 6:43 PM
  • It went something like this; I don't know if it helps:

    For JavaScript, it resolves a correctness issue in bytecode caching. The restriction ensures that the encoding used at bytecode generation time is the same one that would be used without bytecode generation. Without a BOM, the bytecode generator would need to replicate every rule and heuristic IE uses to decide the correct character encoding. Full fidelity would mean running the page/app, as there are cases where the interpretation of the character encoding can change dynamically due to the HTML.

    Thursday, June 21, 2012 9:36 PM
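The correctness issue in that note comes down to the fact that the same bytes decode to different text depending on the encoding the loader assumes. A small Python sketch; cp1252 is only an assumed example of a legacy fallback (IE's real fallback depends on headers, markup, and settings), and the JS line is made up:

```python
import codecs

# Hypothetical JS source with one non-ASCII character, saved as UTF-8
# *without* a BOM.
raw = "var label = 'café';".encode("utf-8")

# A loader that assumes UTF-8 sees the intended text...
as_utf8 = raw.decode("utf-8")
# ...while one that falls back to a legacy code page sees different
# characters from the very same bytes.
as_cp1252 = raw.decode("cp1252")

print(as_utf8)    # var label = 'café';
print(as_cp1252)  # var label = 'cafÃ©';

# With a UTF-8 BOM there is nothing to guess: 'utf-8-sig' strips the mark
# and decodes the rest as UTF-8.
with_bom = codecs.BOM_UTF8 + raw
assert with_bom.decode("utf-8-sig") == as_utf8
```

If a bytecode generator guessed one way at install time and the runtime guessed the other way, the cached bytecode would contain different string literals than the page would see; requiring the BOM removes the guess.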
  • How do I implement bytecode caching?

    I mean, I have static references to my .js files, but how do I encode them as UTF-8?


    Chetan

    Tuesday, August 07, 2012 4:05 AM
  • Hi Chetan,

    You don't implement bytecode caching; the OS does this for you automatically. If you created your JS files with Visual Studio, you are already using bytecode caching (when not running in the debugger). The Windows Application Certification Kit will detect whether any of your JS files are not BOM compliant.

    -Jeff


    Jeff Sanders (MSFT)

    Tuesday, August 07, 2012 1:14 PM
    Moderator
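For catching missing BOMs before running the certification kit, something like the following Python sketch could be run over a project folder. It is a hypothetical approximation, not the kit's actual check: `find_files_without_bom` is an illustrative name, and it only tests whether each file starts with the three-byte UTF-8 BOM (EF BB BF).

```python
import codecs
from pathlib import Path

# Extensions mentioned in this thread (JS/HTML/CSS); adjust as needed.
DEFAULT_EXTENSIONS = (".js", ".html", ".css")


def has_utf8_bom(path: Path) -> bool:
    """True if the file starts with the UTF-8 byte order mark."""
    with path.open("rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def find_files_without_bom(root, extensions=DEFAULT_EXTENSIONS):
    """Yield matching files under root that lack a UTF-8 BOM, in sorted order."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            if not has_utf8_bom(path):
                yield path
```

Usage: `for p in find_files_without_bom("MyApp"): print(f"warning: missing UTF-8 BOM: {p}")`. Note that an empty file is reported as missing a BOM, since it has no bytes at all.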
  • Thanks.

    Chetan

    Thursday, August 23, 2012 12:54 PM
  • Any idea how one can add a BOM to all the existing files of a project, and ensure that a BOM will always be prepended to new files in the future?

    Wednesday, March 06, 2013 9:22 AM
  • We use a pre-build step using Node.js to ensure this. It would be nice if the darn project compiler emitted a warning at compile time, so you catch this early on and not only when running the WACK tool.
    • Edited by phil_ke Wednesday, March 06, 2013 10:12 AM
    Wednesday, March 06, 2013 10:11 AM
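The Node.js pre-build script mentioned above isn't shown, so here is a rough Python equivalent of the same idea, assuming the files are already UTF-8 (or plain ASCII). `ensure_utf8_bom` and `add_bom_to_tree` are hypothetical names. Running it on every build also covers newly added 3rd-party files, which addresses the "in the future" part of the question.

```python
import codecs
from pathlib import Path

EXTENSIONS = (".js", ".html", ".css")


def ensure_utf8_bom(path: Path) -> bool:
    """Prepend a UTF-8 BOM if missing. Returns True if the file was changed.

    Assumes the file is already UTF-8. A file saved in another encoding
    must be transcoded first, or the BOM would mislabel it.
    """
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False
    # Raises UnicodeDecodeError instead of stamping a BOM on non-UTF-8 bytes.
    data.decode("utf-8")
    path.write_bytes(codecs.BOM_UTF8 + data)
    return True


def add_bom_to_tree(root, extensions=EXTENSIONS):
    """Fix every matching file under root; return the paths that changed."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            if ensure_utf8_bom(path):
                changed.append(path)
    return changed
```

The function is idempotent, so a second run over the same tree changes nothing; that makes it safe to wire into every build.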
  • Also note that if you use Visual Studio to create the files, they will be correct.

    -Jeff


    Jeff Sanders (MSFT)

    Wednesday, March 06, 2013 12:40 PM
    Moderator
  • You mean using the "Add New Item" function? Because adding existing items does not warn you if the BOM is missing. That's what I mean: the IDE/compiler is not helping much here. Every time we include a 3rd-party library, we have to fix that stuff ourselves again and again. And all that after MSFT promoted "use whatever 3rd-party library you want".
    Wednesday, March 06, 2013 12:48 PM
  • Correct.

    Adding other libraries is not handled.


    Jeff Sanders (MSFT)

    Wednesday, March 06, 2013 3:15 PM
    Moderator
  • Could you forward that feature request to the product team? Honestly, VS has not come far since 1.0, except for being more unstable (than 6.0) and slower. I really wonder how MSFT develops big applications with it. I mean, a non-resizable properties dialog, srsly?
    Thursday, March 07, 2013 1:27 PM
  • Wow. Honestly? I have used every version of VS in production, and I am amazed at how far it has come since V1.

    Since you seem to be having issues with performance, you can use the feedback tool to help narrow down the issue and do something about it:

    http://blogs.msdn.com/b/visualstudio/archive/2012/06/20/the-visual-studio-2012-feedback-tool-a-better-way-to-submit-bugs.aspx

    Also please add any actionable feedback you may have to the Visual Studio forum.

    -Jeff


    Jeff Sanders (MSFT)

    Thursday, March 07, 2013 2:03 PM
    Moderator
  • Yes Jeff, it's that bad. We are using Sublime Text 2 as our editor, external git (bash) and a bunch of post-build steps for CoffeeScript and the like. We generate our solution and project files using a custom makefile generator and use VS only for deploying and debugging (JS & C++ code).

    The VS editor is sooooo far behind Eclipse and Sublime Text; it's amazing anyone can use it daily without getting frustrated (IntelliSense is disabled here too). IntelliSense suggests a code fragment and you have to press Down Arrow and then Enter instead of just Enter? It gets me every time on a new machine, and then I disable IntelliSense.

    One more thing off the top of my head: you could never edit an RC file in the code view and in the RC editor at the same time, and you still cannot. What's so complicated about implementing that? Eclipse supported editing XML files both as source and in a rich editor from very early on. Why can't VS at least offer to close the code view and open the RC editor instead?

    Same with the new Manifest Editor for Win8 apps. It's a mess. It's illogical, and you still cannot work in the XML source file and the rich editor at the same time. You don't know what a declaration does until you *add* it to the project; only then does it display a description.

    At some point we will send feedback to the VS team, though. But they seem to have had time to put "Achievements" into VS, which makes you wonder what the priorities are.

    What is MS using internally to develop Win8 apps? The code in most of MSFT's apps looks generated.

    Tuesday, March 12, 2013 11:11 AM
  • Thanks for the feedback Phil!

    Definitely provide this to the VS team! RE: the RC file... Strange, I never realized that. The package manifest does just that, so it should be trivial to give every file that has another editor the same behavior. RE: what they use for editing: all the internal teams I have dealt with are smoking what they are selling. There is an automated build and test process (for obvious reasons), but editing is all VS from what I have seen.

    -Jeff


    Jeff Sanders (MSFT)

    Tuesday, March 12, 2013 11:57 AM
    Moderator
  • PS: The observations you have on VS should go here (UserVoice):

    http://visualstudio.uservoice.com/forums/121579-visual-studio

    Bugs are best through the feedback tool.


    Jeff Sanders (MSFT)

    Tuesday, March 12, 2013 11:58 AM
    Moderator