Asked by:
problem extracting quoted strings from a tsv format file

Question
-
User-1202101404 posted
Folks, I'm a long-time fan (and indeed evangelist) of Log Parser. I tell people about it all the time, but I've been stumped today and would appreciate any help from another pair of eyes.
I process several different logs from a vendor whose columns are space-separated, so I use the TSV format with -iseperator:space. That works great, except that one of the logs has a field which tracks a string (within one of the columns), so the vendor has chosen to use a double-quote around that string. Unfortunately, that breaks log parser's recognition of the value in that column. Here's an example of a line that might be in the file:
col1vala col2val "col3 valuea"
col1valb col2valb "col3 valueb"And I could use: "select field1, field2" and get things ok. but if I add field3, the values shown for that column for both records is just:
"col3
Because it saw the space within the string as being the space separator.
Of course, I want as the answer instead:
"col3 valuea"
"col3 valueb"or better:
col3 valuea
col3 valuebIs there any way to do this? I saw an earlier note (from 2004) where someone wrote that TSV files are not supposed to have quoted values. OK, but I can't get the vendor to change that, and what they did made at least a little sense. I suppose if they changed to using a tab to separate the columns instead, I'd be ok.
But short of getting them to do that, can anyone think of a solution? Thanks in advance.
Tuesday, October 7, 2008 12:19 PM
All replies
-
User1138193213 posted
Using a tab character as a separator is probably your best option, and it also has the added advantage of allowing you to open the TSV file in Excel if you need to. ;-]
Tuesday, October 7, 2008 3:03 PM -
User299556178 posted
Hi!
I cannot figure out any other way than either ask the vendor to change the format, or change it yourself (in one or another way). Personally, i would just write a piece of code to change the space separator to a comma, and then use the CSV input format with the iDQuotes parameter.
Tuesday, October 7, 2008 3:34 PM -
User-1202101404 posted
Just to be clear, while I recognize that getting the vendor to change the file to use tabs (or commas) would be one solution, as I noted, that won't be trivial.
I just want to keep this open in case anyone else may think of another solution, short of them doing that, and for my current files (and those of others using this software). Thanks.
Tuesday, October 7, 2008 3:47 PM -
User-1608113210 posted
Sorry about all the editing and reposting to you, it seems that in Chrome the formatting gets a bit poked, so went back to FF - not that you IIS people would care ;-)
Hi Care,
We would need to understand how consistent this behaviour is in the log and what you want to do with the results to give you fully workable answers.
The following method will work in the instance that you have described and could work globally provided your log file consistently has 2 words separated by a space inside the quotes or no additional trailing fields;
logparser "select field1,field2,strcat(replace_chr(field3,'\"',''),replace_chr(field4,'\"','')) as newfield from quotes.tsv" -iseparator:space -headerRow:OFF
This gives you the result below;
field1 field2 newfield
-------- -------- ----------
col1vala col2val col3valuea
col1valb col2valb col3valueb
However, although more longwinded, this would be the best way to get it all out consistently - in this instance, I added a fourth field after your quoted field
This statement;
logparser "select extract_token(text,0,' ') as field1,extract_token(text,1,' ') as field2,extract_token(text,1,'\"') as field3,extract_token(text,2,'\"') as field4 from quotes.tsv" -i:textline
Gives these results;
field1 field2 field3 field4
-------- -------- ----------- ---------
col1vala col2val col3 valuea col4val
col1valb col2valb col3 valueb col4vala
I hope that this helps, if you can give a sample of real data I will work it further for you.
Cheers, Dave
Wednesday, October 8, 2008 5:16 AM -
User-1202101404 posted
Wow, Dave, if you ever read this, I owe you an apology. You responded to my query above a couple of years ago but I either wasn't notified or never noticed, and I'm really sorry.
As for your answer, well, it would work for many, I'm sure, so I'm very glad you shared it, but there are two reasons neither work for me. For the first one, sadly, I *do* have fields that follow the one in question. I just didn't think to mention that in my example. For the second one, sadly, I have other fields that precede the one in question which may or may not also have quoted values, so I can't rely on finding the given pair as the "first" or "second", and so on. So I am still stuck.
Still, perhaps others have benefited from your suggestion.
If anyone is interested in resolving what I've riased, here's a better example of what I have and want (the quoted string "I want this"):
col1vala col2val "I want this" "col4val"
col1valb "" "I want this" "col4val"Stumper. :-)
/charlie
Friday, April 2, 2010 7:45 PM -
User-1202101404 posted
I'd like to bump my question from 2008. There was never a solution (though a few kind folks tried back then.) Perhaps someone seeing it anew may have a suggestion. (I simply had to stop using LP against this file, but the need has come up again, and if I could it would be otherwise perfect for the analysis I must do.)
carehart
Folks, I'm a long-time fan (and indeed evangelist) of Log Parser. I tell people about it all the time, but I've been stumped today and would appreciate any help from another pair of eyes.
I process several different logs from a vendor whose columns are space-separated, so I use the TSV format with -iseperator:space. That works great, except that one of the logs has a field which tracks a string (within one of the columns), so the vendor has chosen to use a double-quote around that string. Unfortunately, that breaks log parser's recognition of the value in that column. Here's an example of a line that might be in the file:
col1vala col2val "col3 valuea"
col1valb col2valb "col3 valueb"And I could use: "select field1, field2" and get things ok. but if I add field3, the values shown for that column for both records is just:
"col3
Because it saw the space within the string as being the space separator.
Of course, I want as the answer instead:
"col3 valuea"
"col3 valueb"or better:
col3 valuea
col3 valuebIs there any way to do this? I saw an earlier note (from 2004) where someone wrote that TSV files are not supposed to have quoted values. OK, but I can't get the vendor to change that, and what they did made at least a little sense. I suppose if they changed to using a tab to separate the columns instead, I'd be ok.
But short of getting them to do that, can anyone think of a solution? Thanks in advance.
Thursday, May 24, 2018 7:29 PM