How to stop automatic encoding changes?

  • Question

  • Hi!

    From time to time, the encoding of existing .cs files automatically changes from UTF-8 to CodePage-1252 when they are checked in. I realize that this might happen if a file has no explicitly declared encoding, but it also happens for files that had the UTF-8 Byte Order Mark at the start. When this happens, non-ASCII characters (for example åäöÅÄÖ, used in Swedish) are replaced with a non-readable character, and information is therefore lost.
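
    To illustrate the kind of damage described above, here is a minimal Python sketch (not taken from the affected code base; the helper name misread is hypothetical). It assumes the problem is a plain mismatch between the encoding a file was written in and the encoding it is later read as, and shows both directions of that mismatch.

```python
def misread(text, written_as, read_as):
    """Encode `text` with one encoding and decode the bytes with another.

    Bytes that are invalid in `read_as` become U+FFFD, the Unicode
    replacement character, instead of raising an error.
    """
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    # UTF-8 bytes interpreted as code page 1252: each character
    # turns into two unrelated characters.
    print(misread("åäö", "utf-8", "cp1252"))  # prints Ã¥Ã¤Ã¶

    # Code page 1252 bytes interpreted as UTF-8: the bytes are not
    # valid UTF-8, so replacement characters appear instead.
    print(misread("åäö", "cp1252", "utf-8"))

    # Pure ASCII text is unaffected, which is why the problem only
    # shows up in files that contain characters such as åäö.
    print(misread("abc", "utf-8", "cp1252"))  # prints abc
```

    Once replacement characters have been saved back to disk, the original bytes cannot be recovered from the file itself, which matches the information loss described above.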

    Why is this happening? Is there a way to stop these changes, or at least to get a warning that one is about to happen? Can we force TFS to accept only UTF-8 encoded files? Having to deal with code pages is painful, especially when UTF-8 and UTF-16 exist, which, at least for us, cover all the printable characters we need.
    We are using TFS 2008 and VS 2008.
    Wednesday, October 21, 2009 12:07 PM

Answers

  • Thank you for replying.

    I read Buck Hodges' blog before posting my question, but I didn't think what he wrote applied to my situation. I understand that TFS might get the encoding wrong, from my perspective, when I add files to source control. But in this case, and I apologize for not being clearer about this in my original question, the encoding is changed while checking in an existing file.

    We've managed to reproduce the problem. It was ReSharper that removed the UTF-8 BOM. Below I describe how to reproduce the problem and how to avoid it.

    This is what we did:
    • Open a solution in VS, with no files open in the text editor.
    • Open a file in the text editor.
    • Perform a Rename refactoring (Ctrl-R, Ctrl-R) on a property and make sure "To enable Undo, open all files with changes for editing" isn't checked.

    ReSharper will change all usages and save the updated files. If these files have a UTF-8 BOM, it will be removed.

    To avoid encoding changes, simply check "To enable Undo, open all files with changes for editing" when refactoring. All changed files will then be opened in Visual Studio, and when they are saved later, it is Visual Studio that saves them, not ReSharper.
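
    For files that have already lost their BOM, something like the following Python sketch could put it back before check-in. This is only an illustration: the function names restore_utf8_bom and restore_in_tree are hypothetical, and the sketch assumes the affected files still contain valid UTF-8 bytes and only lack the leading EF BB BF.

```python
import codecs
import os
import tempfile


def restore_utf8_bom(path):
    """Prepend the UTF-8 BOM to the file at `path` if it is missing.

    Returns True if the file was rewritten, False if it already had a BOM.
    Raises UnicodeDecodeError if the content is not valid UTF-8, so a
    file in some other encoding is never silently relabelled.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        return False
    data.decode("utf-8")  # validation only; raises on invalid UTF-8
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + data)
    return True


def restore_in_tree(root, suffix=".cs"):
    """Restore the BOM in every file under `root` whose name ends with `suffix`.

    Returns the list of paths that were rewritten.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                full = os.path.join(dirpath, name)
                if restore_utf8_bom(full):
                    changed.append(full)
    return changed


if __name__ == "__main__":
    # Demonstration on a throwaway directory, not a real workspace.
    demo_dir = tempfile.mkdtemp()
    sample = os.path.join(demo_dir, "Sample.cs")
    with open(sample, "wb") as f:
        f.write("// åäö".encode("utf-8"))  # UTF-8 content without BOM
    print(restore_in_tree(demo_dir))
```

    Files in a TFS workspace are normally read-only until they are checked out, so the write step will probably fail unless the files are checked out for edit first.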

    Thanks for your help. I will talk to JetBrains about getting a fix for ReSharper.

    Best Regards
    Håkan Canberger
    • Edited by Håkan Canberger Thursday, October 22, 2009 1:51 PM We found the error ourselves
    • Marked as answer by Håkan Canberger Thursday, October 22, 2009 1:51 PM
    Thursday, October 22, 2009 7:47 AM

All replies

  • Thanks for your post.

    TFS automatically detects a Unicode byte order mark and adds the file with the corresponding encoding, so it is strange that files with a UTF-8 Byte Order Mark are being added as CodePage-1252.

    Can you perform the following check?
    1. Right-click the .cs file in Solution Explorer
    2. Select "Open with" and then "Binary Editor"
    3. Check that the first 3 bytes are: EF BB BF
    4. Add the file to source control, leaving the add pending
    5. In the Pending Changes window, right-click the file and select Properties
    6. On the General tab, you will see the Encoding field
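
    If you would rather script step 3 than use the Binary Editor, the byte check could be sketched in Python roughly as follows. The function names has_utf8_bom and write_sample are hypothetical, and the sketch only inspects the file on disk; it says nothing about the encoding TFS has recorded for the file.

```python
import codecs
import os
import tempfile


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the bytes EF BB BF."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def write_sample(data):
    """Write `data` (bytes) to a temporary .cs file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".cs")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


if __name__ == "__main__":
    with_bom = write_sample(codecs.BOM_UTF8 + "// åäö".encode("utf-8"))
    without_bom = write_sample("// åäö".encode("utf-8"))
    print(has_utf8_bom(with_bom))     # prints True
    print(has_utf8_bom(without_bom))  # prints False
    os.remove(with_bom)
    os.remove(without_bom)
```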

    If the encoding is still not UTF-8, you have to force the file to be added as UTF-8 from the command line.
    In his blog post "How TFS Version Control determines a file's encoding", Buck mentions that:
    "Unfortunately, TFS does not support changing the encoding of a pending add.  If you need to do that, you will have to undo the pending add, and then re-add the file using the command line and specify the /type option"

    The command line should be: "tf add /type:utf-8 file.cs"

    As for your other question, about how to force TFS to accept only UTF-8 encoded files: this is possible by implementing a check-in policy.
    For detailed information, please refer to MSDN: http://msdn.microsoft.com/en-us/library/bb130351.aspx

    Hope it helps and have a nice day.
     

    Hongye Sun [MSFT]
    MSDN Subscriber Support in Forum
    Thursday, October 22, 2009 2:38 AM
    Moderator