Azure Websites Delivering HTML Content With the Replacement Character (#65533) Randomly Mixed In

  • Question

  • I have an Azure website whose content is predominantly in Russian (i.e. Cyrillic characters).  However, I'm finding that it's randomly replacing some characters with the replacement character � (#65533).  It works fine when served by IIS on my machine, but when uploaded to Azure I see this problem.

    It's nothing to do with the text itself, as when I modify other text, then the problem moves to other parts of the text, or goes away.

    It must be some character encoding issue, but I've tried numerous things from searching online, and nothing works (e.g. setting the content encoding header, setting encoding options in the globalization element in the web.config, etc).

    This problem happens on all browsers and on different types of devices.

    I even tried disabling gzip on the response stream, but the problem still occurs.

    Can anybody help me with what might be going wrong?  You can see an example of the problem in action here: http://englishwithexperts-staging.azurewebsites.net/Test.htm (see the heading of the Academic English section).

    Edit: I just changed the link to a static HTML page demonstrating the problem. The problem has strangely gone away with my latest deployment for the real page (where nothing changed on that page, which has me very confused). But I know it will be back, so help is still sought on the issue.

    Edit 2: As I expected, the problem has cropped up elsewhere on the site.  I just can't work this one out.

    Thanks in advance.

    Chris



    Friday, March 28, 2014 8:30 AM

All replies

  • Hi,

    I checked your test page at this link (http://englishwithexperts-staging.azurewebsites.net/Test.htm). I see that you set the Content-Type to 'utf-8'. First, how did you set the globalization element in the web.config? Did you do it like this?

    <globalization requestEncoding="utf-8" responseEncoding="utf-8" /> 

    Second, where does this text come from: a database or the HTML page? If it comes from a database, I suggest using the following code to replace the non-breaking space:

    // 0xC2 0xA0 is the UTF-8 encoding of U+00A0 (non-breaking space).
    byte[] space = new byte[] { 0xc2, 0xa0 };
    string UTFSpace = Encoding.GetEncoding("UTF-8").GetString(space);
    HtmlStr = HtmlStr.Replace(UTFSpace, "&nbsp;");

    I also recommend using '&nbsp;' in place of the space character. Please try it.

    Regards,

    Will



    Sunday, March 30, 2014 1:27 PM
    Moderator
  • Hi Will

    The text is "hard-coded" in the page.  The test page I've provided is just a plain HTML page.  Note that the weird characters are *not* in the "source" of this page.  They only appear when served by Azure.  When I download the file via FTP and try and run it locally via IIS again to prove this, it works fine.  It only happens when served by Azure.  If you'd like to verify this yourself, I've zipped this file and uploaded it to http://englishwithexperts-staging.azurewebsites.net/Test.zip

    My globalization element was like this:

    <globalization enableClientBasedCulture="true" culture="en-US" uiCulture="auto" />

    I added the request/response encoding attributes like this:

    <globalization enableClientBasedCulture="true" culture="en-US" uiCulture="auto" requestEncoding="utf-8" responseEncoding="utf-8" />

    But it made no difference, as you can see if you go to the test page.

    Any other ideas?  This is a really weird one, and I can't help but think it's a bug on the Azure side of things because it makes no sense.  Thanks for your help.

    Chris

    Monday, March 31, 2014 9:16 AM
  • Hi Chris,

    Your local IIS may have a special language pack installed that is not available on the Web Sites VM.

    I will test your page myself and update you here.

    Regards,

    Wei

    Tuesday, April 1, 2014 9:59 AM
    Moderator
  • Hi Wei

    That'd be awesome - thanks!  I look forward to seeing what you come up with.

    Chris

    Tuesday, April 1, 2014 11:28 AM
  • Hi Chris,

    I was not able to reproduce the issue with the page myself.

    I deployed it to two data centers, West US and Hong Kong:

    http://wzhaowestus.azurewebsites.net/test.htm

    http://hk1test.azurewebsites.net/test.htm

    In your page, the bytes right after the space are EF BF BD 90:

    20 EF BF BD 90 D0 BA D0 B0

    In my page, they are D0 90 D0 BA:

    20 D0 90 D0 BA D0 B0

    Is there a problem with the page itself you used?
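The two byte dumps above can be decoded to see what each one contains. This is a minimal sketch, assuming a runtime that provides `TextDecoder` (modern browsers and recent Node.js); the helper name `decodeHex` is made up for illustration. EF BF BD is the UTF-8 encoding of U+FFFD itself, so the replacement character is already present in the bytes served, not produced by the browser. The D0 lead byte of the Cyrillic letter А (D0 90) has been replaced, while its trailing byte 90 survived.

```javascript
// Decode a space-separated hex dump as UTF-8 and list the resulting code points.
function decodeHex(hex) {
    const bytes = Uint8Array.from(hex.trim().split(/\s+/), (h) => parseInt(h, 16));
    // TextDecoder is non-fatal by default: each invalid byte becomes U+FFFD.
    const text = new TextDecoder('utf-8').decode(bytes);
    return Array.from(text, (ch) =>
        'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// Bytes from the affected page: EF BF BD decodes to U+FFFD, and the
// orphaned continuation byte 90 decodes to a second U+FFFD.
const corrupted = decodeHex('20 EF BF BD 90 D0 BA D0 B0');
// → U+0020, U+FFFD, U+FFFD, U+043A, U+0430

// Bytes from the unaffected deployment: D0 90 decodes to U+0410 (А).
const clean = decodeHex('20 D0 90 D0 BA D0 B0');
// → U+0020, U+0410, U+043A, U+0430

console.log(corrupted.join(' '));
console.log(clean.join(' '));
```

This only interprets the bytes; it does not say which component on the server side substituted them.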

    Regards,

    Wei

    Thursday, April 3, 2014 7:03 AM
    Moderator
  • Hi Wei

    Thanks for taking the time to test this out.  I think the reason you couldn't reproduce the problem is down to the nature of the problem itself - how changes to unrelated parts of the page can make the problem go away (which makes it incredibly frustrating to reproduce).  For example, if I remove the description meta tag from the page's header and re-upload it, the problem vanishes.  Put it back, and re-upload, and the problem is back again.  (I'm just picking this tag as an example, I can remove a CSS file reference, or change some text elsewhere in the page, and the problem goes away.)

    Given your test doesn't have the CSS files or images (obviously, because I hadn't given them to you), this is in effect a change to the page, resulting in a different outcome.  The best thing is probably for you to reproduce exactly what I have (if you're willing).  

    I'm in the process of trying to put together a test case that will enable you to easily reproduce the problem.  I've created a new site, and like you have failed to reproduce the problem in it (using exactly the same files from my main site!).  I'll try and get it to a reproducible state, and let you know how to do so.

    Thanks

    Chris

    Friday, April 4, 2014 2:00 PM
  • I'm actually having the exact same issue. I'm running WordPress on Azure Websites and when I publish a blog post, I end up seeing '�' randomly appearing throughout the text.

    I have no idea what the root cause might be.

    Friday, April 4, 2014 9:57 PM
  • @Sahas: I also created a WordPress site on Azure this weekend and found the same thing, so the issue doesn't seem to be related to my application.  At least I'm not the only one experiencing this problem!

    @Wei: I've been trying to produce a reproducible test case for you, but have been having no luck in doing so, so far.  All the while, every update to the text in my website results in these characters randomly popping up everywhere - it's been causing a great deal of frustration.

    Given I couldn't provide an easily reproducible test case for you, I tried stripping my application back to see if I could find out if there was something in my environment causing the issue.  I stripped all the files out of the root directory and suddenly the problem vanished in that HTML page. Bingo! Or so I thought.  I started putting the files back again, to see which was the culprit, but then all the files were back and the problem was still gone.  I'm left scratching my head, unsure what is now different (nothing from my end as far as I can tell).  I'll run some more experiments when I get the chance, and report back with my findings, as the problem still exists in my production site (I was doing this under staging).

    Chris

    Sunday, April 6, 2014 1:37 PM
  • Thanks, guys. A stable repro would be a great help for us in looking into this issue.

    Regards,

    Wei

    Monday, April 7, 2014 3:56 AM
    Moderator
  • I've had the same issue for a long time (2 years).  On my blog ( http://blog.miniasp.com/ ), the replacement character (#65533) appears randomly.  I have thousands of posts on my blog, so I think you can easily reproduce this problem by clicking through a few different pages.

    It is easier to find the � character by running a jQuery snippet in the F12 Developer Tools.

    Here is the code snippet:

    $('body').html().indexOf('�')

    If the return value is 0 or greater, the replacement character (�) appears somewhere in the page; a value of -1 means it was not found.
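The same check can be written without jQuery so that it reports every occurrence, not only the first. This is a minimal sketch; the function name `findReplacementChars` is made up for illustration, and the commented-out call assumes a browser DOM.

```javascript
// Return the index of every U+FFFD (replacement character) in a string.
function findReplacementChars(text) {
    const positions = [];
    let index = text.indexOf('\uFFFD');
    while (index !== -1) {
        positions.push(index);
        // Continue searching just past the previous match.
        index = text.indexOf('\uFFFD', index + 1);
    }
    return positions;
}

// In the F12 console this could be called as:
// findReplacementChars(document.body.innerHTML)
```

An empty array means the page is clean; otherwise the indices show where in the markup the characters landed, which can help when comparing one page load with the next.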

    Could you give it a try?  Thanks!


    From: Will
    Blog: http://blog.miniasp.com/
    A record of what Will has learned and the technical notes he shares from the online world

    Sunday, March 27, 2016 6:16 PM
  • Hi Sahas Katta,

    The problem still exists on my blog. Are you still suffering from this?


    Monday, February 11, 2019 11:37 AM
  • I realized one thing: only HTML ( text/html ) responses contain the � character.  I tried all my Web APIs, and none of them returned a �.

    I think it is very possibly a problem with ARR (Application Request Routing).

    Here is my workaround in my blog:

    Below is the HTML part:

    <div id="PostBody">
       MY POST CONTENT
    </div>

    Below is the JS workaround:

    (function () {
        var postBody = $('#PostBody');
        if (postBody.length > 0 && postBody.html().indexOf('�') > -1) {
            $.get('/api/posts/1', function (data) {
                if (data.Content) {
                    console.log('� Found');
                    postBody.html(data.Content);
                }
            });
        }
    })();
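To compare responses of different content types, as described above, one option is to count the byte sequence EF BF BD in the raw response body rather than in the decoded DOM. This is a minimal sketch; the function name `countReplacementSequences` is made up for illustration, and how the bytes are obtained is left to the caller.

```javascript
// Count occurrences of the byte sequence EF BF BD (UTF-8 for U+FFFD)
// in a raw response body given as a Uint8Array.
function countReplacementSequences(bytes) {
    let count = 0;
    for (let i = 0; i + 2 < bytes.length; i++) {
        if (bytes[i] === 0xef && bytes[i + 1] === 0xbf && bytes[i + 2] === 0xbd) {
            count++;
            i += 2; // skip the rest of the matched sequence
        }
    }
    return count;
}

// Example with the byte dump quoted earlier in this thread.
const sample = Uint8Array.from([0x20, 0xef, 0xbf, 0xbd, 0x90, 0xd0, 0xba, 0xd0, 0xb0]);
console.log(countReplacementSequences(sample));
```

In a browser or a Node.js version with `fetch`, the bytes could be obtained with `new Uint8Array(await (await fetch(url)).arrayBuffer())`, where `url` is whichever page or API endpoint is being checked.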


    From: Will
    Blog: http://blog.miniasp.com/
    A record of what Will has learned and the technical notes he shares from the online world


    Monday, February 11, 2019 12:54 PM