cs(User-Agent) bugs in both Standard and Advanced Logging?

  • Question

  • User-2019917520 posted

    I've been looking at parsing the logs produced by IIS and have noticed issues when trying to obtain the value of cs(User-Agent) using both the default logging and the Advanced Logging modules.

    1) Issue with default logging - it's not possible to get the original value of the User Agent.

    IIS replaces all spaces with a +, so it's impossible to tell from the log whether a User Agent really had a + or a space at a given position. Some user agents, for instance that of Googlebot, contain a +, e.g.:

    Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)



    So simply replacing all +'s with spaces won't get the original user agent string back. 
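
    A minimal sketch of why the round trip is lossy. It assumes the only transformation is "space becomes +", as described above; iis_encode is a hypothetical stand-in for illustration, not an IIS API.

```python
def iis_encode(value: str) -> str:
    """Mimic the behaviour described above: every space becomes '+'."""
    return value.replace(" ", "+")


original = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# What ends up in the log under the assumption above.
logged = iis_encode(original)

# Naive reversal: turn every '+' back into a space.
naive = logged.replace("+", " ")

print(logged)
print(naive == original)  # the literal '+' before the URL has been lost

# Two different inputs that collapse to the same logged value.
print(iis_encode("a b") == iis_encode("a+b"))
```

    The last line shows the core problem: once two distinct strings map to the same logged value, no decoder can tell them apart without outside knowledge.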

    2) Issue with Advanced Logging - spaces in field values cause incorrect tokenisation.

    It's logged like this:

    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

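    Which, when split on whitespace, is actually 4 tokens, not 1:

    "Mozilla/5.0
    (compatible;
    Googlebot/2.1;
    +http://www.google.com/bot.html)"
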
    So to parse this correctly, we've got to add some extra tokenisation logic, outside of the W3C standard, to work around this.
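
    The extra tokenisation logic could look something like the sketch below. The sample log line and its field order are made up for illustration, and the tokenizer assumes quoted values never contain an embedded double quote.

```python
import re

# Either a double-quoted run (group 1) or a run of non-whitespace (group 2).
TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def tokenize(line: str) -> list[str]:
    """Split a log line on whitespace, keeping quoted values as one token."""
    tokens = []
    for match in TOKEN.finditer(line):
        quoted, bare = match.group(1), match.group(2)
        tokens.append(quoted if quoted is not None else bare)
    return tokens


# Hypothetical line; real Advanced Logging field order depends on configuration.
line = (
    '2016-08-04 14:23:00 GET /index.html '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 200'
)

print(len(line.split()))    # plain whitespace split breaks the user agent apart
print(len(tokenize(line)))  # quote-aware split keeps it as a single field
print(tokenize(line)[4])
```

    This keeps the user agent intact, including its literal +, but it is still logic layered on top of the plain whitespace rule in the spec.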

    It says in the spec:

    Fields are separated by whitespace, the use of tab characters for this purpose is encouraged.

    Surely had this been done, both these issues would not exist? I wanted to file bugs with Microsoft about these, but this was the best place I could find to discuss them.


    Thursday, August 4, 2016 2:23 PM

All replies

  • User690216013 posted

    The user agent logging in the default log files is by design. Even if you reported it as a bug, it would be ignored not only by Microsoft but also by other users. It is possible to reconstruct the original values, as there is only a limited set of patterns, and from them you can easily tell whether a specific position in a user agent string should hold a + sign or a space. When you said "impossible", I think you were probably overthinking it.

    I won't comment much on Advanced Logging, but again, user agent strings, though not strictly defined by IETF/W3C standards, have had their conventions established over many years. Writing a parser and assisting it with known patterns is a feasible approach, not an impossible one.
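
    For the default log format, the known-patterns idea could be sketched roughly as follows. The single rule used here (a + directly before http:// or https:// is literal when it follows another +, an opening parenthesis, or the start of the string) is an assumption chosen to cover the Googlebot-style case; a real parser would need a longer, curated list of patterns and would still be a heuristic.

```python
import re

# Assumed pattern: '+' right before a URL scheme, preceded by '+', '(' or
# nothing at all, is treated as a literal plus rather than an encoded space.
LITERAL_PLUS = re.compile(r"(?<![^+(])\+(?=https?://)")

# Placeholder character assumed never to appear in a logged user agent.
PLACEHOLDER = "\x00"


def restore_user_agent(logged: str) -> str:
    """Best-effort reversal of the space -> '+' substitution."""
    protected = LITERAL_PLUS.sub(PLACEHOLDER, logged)
    return protected.replace("+", " ").replace(PLACEHOLDER, "+")


logged = "Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)"
print(restore_user_agent(logged))

# A '+' before a URL that follows ordinary text is still read as a space.
print(restore_user_agent("foo+http://example.com"))
```

    Any user agent that falls outside the known patterns will still be decoded as if every + were a space, so the result is a best guess rather than a guaranteed reconstruction.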

    Friday, August 5, 2016 12:21 PM