locked
Trying to strip out characters from SQL Server using bulk insert RRS feed

  • Question

  • User-718146471 posted

    Hello folks, I'm tossing this one your way to see if someone can figure this out. I have my data being imported using bulk insert from a .csv file. The problem is some of the rows in the file have a line break and carriage return where I do not want them. The actual string of characters I need to strip out is CHAR(33), CHAR(13), CHAR(10), CHAR(32) but no matter how I try to alter this, that stupid string of characters remains. Just for clarity it looks like this:

    Now is the eve o !
    f our discontent.

    when it should read: Now is the eve of our discontent. Here is what I tried using two temp tables:

    	SET NOCOUNT ON;
    	CREATE TABLE #Cleanup
    	(
    		SimpleData nvarchar(MAX)
    	)
    	BULK INSERT #Cleanup from 'C:\********************************\count_9-1.csv' 
    	-- Before
    	SELECT * FROM #Cleanup
    
    	DECLARE @ReplVar nvarchar(MAX)
    	SET @ReplVar = CONCAT(CHAR(33), CHAR(13), CHAR(10), CHAR(32));
    	UPDATE tmp
    	SET    tmp.SimpleData = REPLACE(tmp.[SimpleData], @ReplVar, '')
    	FROM   #Cleanup tmp;
    	-- After
    	SELECT * FROM #Cleanup
    

    I see no difference in the table after applying the replace. Any ideas? For clarification, here is the record in question in varbinary format (truncated to focus on trouble spot):

    003100300020005300500032002C0020005300680061007200650050006F0069006E0074002000530065007200760065007200200032003000310030002
    0005300500032002C0020005300680061007200650050006F0069006E007400200046006F0075006E0064006100740069006F006E002000320030003100
    3300200047006F006C006400200061006E00640020005300500031002C00200061006E00640020005300680061007200650050006F0069006E007400200
    053006500720076006500720020003200300031003300200047006F006C006400200061006E0064002000530050003100200061006C006C006F00770073
    002000720065006D006F002100

    0x2000740065002000610074007400610063006B00650072007300200074006F00200065007800650063007500740065002000610072006200690074007
    200610072007900200063006F00640065002000760069006100200061002000630072006100660074006500640020004F00660066006900630065002000
    64006F00630075006D0065006E0074002C00200061006B0061002000220022004D006900630072006F0073006F006600740020004F00660066006900630
    06500200043006F006D0070006F006E0065006E0074002000550073006500200041006600740065007200200046007200650065002000560075006C006E
    00650072006100620069006C006900740079002E00220022002200

    Thursday, September 29, 2016 1:32 PM

Answers

  • User-718146471 posted

    I ended up writing a small C# command line program that can later be migrated to a Windows Service application. There really is no way to do this as an all-in-wonder; would have been very nice if they expanded the BULK IMPORT to allow us to strip out character combinations since we all have to deal with garbage data coming in from time to time.

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Thursday, September 29, 2016 7:24 PM

All replies

  • User269881539 posted

    Have you not got Exclamation mark, Vertical Tab, Line Feed, Space in your CONCAT.

    Whereas you example has Space, Exclamation Mark, Vertical Tab, Line Feed

    There seems to be no space after the line feed is what I am saying?

    Might also be worth a run through with CHAR(10) and CHAR(13) swapped round?

    Thursday, September 29, 2016 3:10 PM
  • User-718146471 posted

    I ended up writing a small C# command line program that can later be migrated to a Windows Service application. There really is no way to do this as an all-in-wonder; would have been very nice if they expanded the BULK IMPORT to allow us to strip out character combinations since we all have to deal with garbage data coming in from time to time.

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Thursday, September 29, 2016 7:24 PM