none
DQS not removing special characters

    Question


  • I am using DQS and I have a Domain for which I checked the box for 'Normalize String' and give output in the form of lower case characters.

    But when I run my file against this Domain then the output contains lower case characters but the special characters like comma, forward slash are still there.

    Anything wrong I am doing?

    Thanks in advance.

    Tuesday, August 13, 2013 6:48 PM

All replies

  • Hello,

    When you select the Normalize String check box, DQS just ignores the special characters in your source value. DQS internally replaces the special characters either with a null or space to ensure better string comparison for data quality processing. These internal replacements of special characters do not have any impact on the processed or output data.

    If you want to find out special characters in your source data, you could use the regular expresssions in a domain rule to flag values containing special characters, and then clean them up interactively using DQS. For example, you can use the following domain rule to flag values that contain a special character or extra spaces between two words in a value:

    Value matches regular expression = ^[a-zA-Z0-9]+([\s][a-zA-Z0-9]+)*$

    This will flag all the values with special characters as Invalid, and you can then cleanse these values in the Manage and view results page in a cleansing data quality project in DQS. For more information, see this article: Using DQS: How to Find Special Characters and Extra Spaces between Words in your Data.

    Thanks
    Vivek
    (SQL Server Documentation | Twitter: @vivek_msft)


    NOTE: Please remember to appropriately vote a post as "helpful" or mark as "answer" to help the community.


    • Proposed as answer by Ramyakranthi Tuesday, September 17, 2013 6:33 AM
    Wednesday, August 14, 2013 2:03 PM