Multi lingual user input

  • Question

  • User770862748 posted


    I need to accept multilingual user input, but I still need to validate it so that it doesn't include any malicious data. How do I go about this? I was about to use a regular expression, but noticed that "\w" doesn't match non-English word characters as I thought it would.

     Any ideas?

    Monday, January 29, 2007 10:53 AM

All replies

  • User770862748 posted

    \w =

    "Matches any word character. Equivalent to the Unicode general categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is equivalent to [a-zA-Z_0-9]. http://msdn.microsoft.com/en-us/library/20bw873z.aspx"

     So how do I turn off the ECMAScript-compliant behavior for the RegularExpressionValidator?
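    For reference, the difference described in the quote can be reproduced with the .NET Regex class directly. This is only a minimal sketch: the class name, method name and sample word are made up for illustration. As far as I know, the RegularExpressionValidator evaluates the pattern with JScript regular expression syntax in the browser and with System.Text.RegularExpressions.Regex on the server, so the ASCII-only "\w" would come from the client-side check. Setting EnableClientScript to false on the validator is one way to leave only the server-side check, though I have not verified that it fits this exact scenario.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordCharDemo
{
    // True when the whole input consists of \w characters under the given options.
    // (Hypothetical helper, not part of any library.)
    public static bool IsAllWordChars(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, @"^\w+$", options);
    }

    static void Main()
    {
        string input = "Grüße"; // sample non-ASCII input, chosen for illustration

        // Default options: \w covers the Unicode categories quoted above.
        Console.WriteLine(IsAllWordChars(input, RegexOptions.None));       // True

        // ECMAScript option: \w shrinks to [a-zA-Z_0-9], so ü and ß fail.
        Console.WriteLine(IsAllWordChars(input, RegexOptions.ECMAScript)); // False
    }
}
```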

    Monday, January 29, 2007 2:33 PM
  • User113421904 posted


    Here is an example that removes HTML tags from a string:

    string text = System.Text.RegularExpressions.Regex.Replace(html, "<[^>]*>", string.Empty);

    Although the content might be in another language, the HTML tags are still in English. You can also remove only specific HTML tags, e.g. <script> and </script>, if you have a specific requirement.
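    A rough sketch of both variants, removing whole script blocks versus removing every tag, could look like the following. The class and method names are hypothetical. Treat it as an illustration only: regex-based stripping can be bypassed by malformed markup, so the output should still be encoded before it is rendered.

```csharp
using System;
using System.Text.RegularExpressions;

static class TagStripper
{
    // Removes <script>...</script> blocks together with their content.
    // IgnoreCase handles <SCRIPT>; Singleline lets ".*?" span line breaks.
    public static string RemoveScriptBlocks(string html)
    {
        return Regex.Replace(
            html,
            @"<script\b[^>]*>.*?</script\s*>",
            string.Empty,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    // Removes every tag but keeps the text between the tags.
    public static string RemoveAllTags(string html)
    {
        return Regex.Replace(html, "<[^>]*>", string.Empty);
    }

    static void Main()
    {
        // The non-English text is untouched; only the markup is removed.
        Console.WriteLine(RemoveScriptBlocks("Привет <SCRIPT>alert(1)</SCRIPT>мир"));
        Console.WriteLine(RemoveAllTags("<b>こんにちは</b>"));
    }
}
```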


    Tuesday, January 30, 2007 3:11 AM
  • User770862748 posted

    OK, so the standard way people accept multilingual user input is to accept all input, remove whatever they think is harmful, and then encode the result.

    I don't like that approach. I would rather use the RegularExpressionValidator control to specify which input is accepted.

    But I guess I have to do this on the server side, then. Maybe the Anti-Cross Site Scripting Library can be of some help?
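    A server-side allow-list along those lines might be sketched as follows. Everything here is an assumption made for illustration: the class name, the permitted character set (Unicode letters, combining marks, decimal digits, space and a few punctuation marks) and the 100-character limit. System.Net.WebUtility.HtmlEncode stands in for whichever encoder is actually used, such as the one in the Anti-Cross Site Scripting Library.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class InputGuard
{
    // Allow-list (assumed for this sketch): letters, combining marks and
    // decimal digits from any script, plus space . , ' and -, 1 to 100 chars.
    // \A and \z anchor the match to the entire string.
    private static readonly Regex Allowed =
        new Regex(@"\A[\p{L}\p{M}\p{Nd} .,'\-]{1,100}\z");

    // Accepts the input only if every character is on the allow-list.
    public static bool IsAcceptable(string input)
    {
        return input != null && Allowed.IsMatch(input);
    }

    // Encodes the value for HTML output, even after it passed validation.
    public static string ForHtml(string input)
    {
        return WebUtility.HtmlEncode(input);
    }

    static void Main()
    {
        Console.WriteLine(IsAcceptable("José Müller"));               // True
        Console.WriteLine(IsAcceptable("<script>alert(1)</script>")); // False
        Console.WriteLine(ForHtml("<b>"));                            // &lt;b&gt;
    }
}
```

    Validating against an allow-list rejects anything unexpected up front, while encoding at output time covers whatever the pattern still lets through.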


    Tuesday, January 30, 2007 5:44 AM