finding a specific part of a string c#

  • Question

  • Hi,

    I am trying to capture various parts of a string. I used string.Split, but I'm not quite getting the results I want. Here is the string:

    -name 'James' -attributes 'SPRINTING: very good [averages 1 -3, positions] ' -address '14 Addison St ' - track 'grass'

    I want to get the values between '', so the result would be:

    James

     SPRINTING: very good [averages 1 -3, positions]

    14 Addison St

    grass 

    I tried to split the string on '\'' and then on '-' and add the pieces to an array, but the second value itself contains a '-', here:

    [averages 1 -3, positions]

    Is there a way to find the values I am looking for using IndexOf, StartsWith and EndsWith, or something similar?


    CuriousCoder

    Wednesday, December 5, 2018 5:09 PM

Answers

  • Comma Separated Values (CSV) are an extremely common flatfile database format.  It doesn't matter if the Comma is a | (pipe) or a - (dash) or a , (real, actual comma character) - it's still a CSV.  Command Line Interpreter (CLI) parameter processors are also CSV processing engines.  In this case you need to think in terms of the delimiter as an array element segregator, at which point you realize that you actually have multiple CSVs here - or a multidimensional array in text format.

    using System;

    namespace ConsoleApplication
    {
        class Program
        {
            static int Main(string[] args)
            {
                int r = 0;
                string input = @"-name 'James' -attributes 'SPRINTING: very good [averages 1 -3, positions] ' -address '14 Addison St ' - track 'grass'";

                // 0x2d is the ASCII code for - and 0x27 is the ASCII code for '
                string[] csv = getCSVValues(input,(char)0x2d,(char)0x27);
                for (int i = 0; i < csv.Length; i++)
                {
                    Console.WriteLine(csv[i]);
                }
                Console.WriteLine();
                Console.WriteLine(@"Press any key to exit.");
                Console.ReadKey();
                return 0;
            }

            static string[] getCSVValues(string input, char delim, char escapeEncapsulator)
            {
                // Initializing the output to 0 allows us to expand from nothing upward as needed.
                //   * This might be easier if the output value was a List<String> rather than an array.
                string[] returnValue = new string[0];

                // Start by getting the individual characters from the input string.
                char[] inChars = input.ToCharArray();

                // If you want to be fancy you might allow multi-character delimiters and escape sequence start/end tokens, but for this
                // particular thing it's just not needed.

                // Recognizing that you have a problem with the toplevel array where its delimiter exists inside a field,
                // you must provide a facility to escape that delimiter so it isn't treated like a delimiter
                // In your stated case, the "string literal" escape sequence begins and ends with a single quote mark
                bool inEscape = false;

                // curString is the current running string, before completion
                string curString = string.Empty;

                // Begin iteration across all chars from input string
                //   * In another thread I recently talked about string literal processing and advised specifically against
                //     iterating per-character with an escape test for each one, in favor of just dumping the whole raw string.
                //     This is a different scenario, where the whole string contains both escaped and unescaped character data, as
                //     opposed to a single string value that contains nothing but escaped data.
                for (int i = 0; i < inChars.Length; i++)
                {
                    // First step is to test if this is an escape character.
                    // If the current char is an escape character, then *toggle* inEscape
                    if (inChars[i].Equals((char)0x27))
                    {
                        inEscape = !inEscape;
                        // If you actually wanted all the possible permutations of the CSV-within-CSV outputs, you would uncomment the next line to pass escapes through
                        //curString += inChars[i];
                    }
                    else
                    {
                        // If not an escape character, process differently
                        if (inEscape)
                        {
                            // When we're currently inside an escape sequence, spit the current character raw into output
                            curString += inChars[i];
                        }
                        else if (inChars[i].Equals(delim))
                        {
                            // When we're currently NOT inside an escape sequence, test if this is a delimiter
                            // If so, finish up with curString
                            //   * If output value were a List<string> rather than an array, this might perform a bit better -
                            //     or at least it might produce cleaner-looking code...
                            // We need to expand the output to contain the newly created string value
                            System.Array.Resize<string>(ref returnValue, returnValue.Length + 1);

                            // Next add the newly finished string to the output
                            //   * And TRIM it so there isn't extraneous whitespace
                            returnValue[returnValue.Length - 1] = curString.Trim();

                            // And blank curString for the next pass
                            curString = string.Empty;
                        }
                        else
                        {
                            // At this stage we are neither INSIDE an escape sequence nor looking directly at a delimiter or escape begin/end token
                            // If you actually wanted all the possible permutations of the CSV-within-CSV outputs, you would uncomment the next line
                            //curString += inChars[i];
                            // Since you're really only interested in the actual value data found inside escape sequences, you may as well leave the
                            // previous line commented
                        }
                    }
                }

                // Finally, it's likely that the very last element in a CSV isn't terminated with an explicit delimiter character.  In this case,
                // simply add the newly constructed string value to the output
                System.Array.Resize<string>(ref returnValue, returnValue.Length + 1);
                returnValue[returnValue.Length - 1] = curString.Trim();

                // Clean up your toys
                curString = null;
                inChars = null;
                GC.Collect();

                return returnValue;
            }
        }
    }


    RegularExpressions are totally useless for XML/HTML, and a whole bunch of other things.  While some convoluted RegEx might actually work for this particular purpose, it would be totally incomprehensible except during the writing/experimenting with it and it would be totally nonportable to any other RegEx engine such as PHP or Python, so I'm calling that a flat-out kludge.  I've never encountered a RegEx issue that couldn't be solved with simple iteration and a tiny bit of logic as demonstrated here.


    It never hurts to try. In the worst-case scenario, you'll learn something.





    Wednesday, December 5, 2018 7:11 PM

All replies

  • Hello, 

    >> I tried to split the string on '\'' and then on '-' and add them to an array but the second string variable has - here:

    1. Split by '\''.

    2. Remove the pieces that start with '-'.

    3. The rest is what you are looking for.

    This assumes there are no apostrophes inside the text you need to get.
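
    The steps above can be sketched roughly as follows. This is only a sketch under stated assumptions, not the poster's code: the names QuoteSplitSketch and ExtractQuoted are made up for illustration, and instead of filtering out the pieces that start with '-', it relies on the quoted values landing at the odd indexes after the split. It assumes the quotes are balanced and that no value contains an apostrophe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, named for illustration only.
public static class QuoteSplitSketch
{
    public static List<string> ExtractQuoted(string input)
    {
        // Splitting on ' gives: outside, inside, outside, inside, ...
        string[] pieces = input.Split('\'');
        var values = new List<string>();

        // Odd indexes hold the text that sat between a pair of quotes.
        for (int i = 1; i < pieces.Length; i += 2)
        {
            values.Add(pieces[i].Trim());
        }
        return values;
    }
}
```

    Taking the odd indexes avoids accidentally dropping a value that itself begins with a dash, which a "starts with '-'" filter would do.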


    Sincerely, Highly skilled coding monkey.

    Wednesday, December 5, 2018 5:26 PM
  • This looks like command line arguments. I would strongly recommend that you forego trying to parse this by hand and simply use one of the many command line parsers that are already available. Split by itself isn't going to work because you have to handle single (or double quoted) values and they are paired. It could be done with Split but I would probably lean toward Regex instead. Even then I think a command line parser would be cleaner. Here's a starter Regex that can parse the above example but hasn't been thoroughly tested for combinations.

    (?<argument>-\w+)\s+(?<argumentvalue>('?[^']*'?)|("?[^"]*"?))
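    The matches would contain 4 sets, provided the last argument is written -track (as posted, "- track" has a space after the dash, which -\w+ does not match). Each match would have an argument group with the argument name and an argumentvalue group with the value (including quotes). This would handle single and double quotes. Regex supports balanced captures but this doesn't use that option.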
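
    As a rough illustration of reading those named groups, here is a minimal sketch. The wrapper names (NamedGroupSketch, Parse) are hypothetical, the pattern is the starter pattern above written as a C# verbatim string, and the input is assumed to use -track without a space after the dash. Like the pattern itself, it has not been tested against other combinations.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper, named for illustration only.
public static class NamedGroupSketch
{
    // The starter pattern from the post, written as a C# verbatim string
    // (a double quote inside @"..." is escaped by doubling it).
    private const string Pattern =
        @"(?<argument>-\w+)\s+(?<argumentvalue>('?[^']*'?)|(""?[^""]*""?))";

    public static List<KeyValuePair<string, string>> Parse(string commandLine)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Regex.Matches(commandLine, Pattern))
        {
            string name = m.Groups["argument"].Value;
            // The captured value still carries its quotes; strip them, then the padding.
            string value = m.Groups["argumentvalue"].Value.Trim('\'', '"').Trim();
            result.Add(new KeyValuePair<string, string>(name, value));
        }
        return result;
    }
}
```

    Calling NamedGroupSketch.Parse on the question's string (with -track) should give four name/value pairs, the first being -name and James.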


    Michael Taylor http://www.michaeltaylorp3.net

    Wednesday, December 5, 2018 5:58 PM
    Moderator
  • I would probably just step through the input and do all the work myself. Getting a Regex right for something like this can end up being more work than writing the loop.

    Ethan 

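       // Requires: using System.Collections.Generic; using System.Text;
       private List<string> GetSingleQuotedStrings(string parsee)
            {
                List<string> results = new List<string>();
                StringBuilder current = new StringBuilder();
                bool inString = false;
                foreach (char character in parsee)
                {
                    if (character == '\'')
                    {
                        if (inString)
                        {
                            // Closing quote: store the finished value and start a new one.
                            results.Add(current.ToString());
                            current = new StringBuilder();
                            inString = false;
                        }
                        else
                        {
                            // Opening quote: start collecting characters.
                            inString = true;
                        }
                    }
                    else if (inString)
                    {
                        // Only keep characters that sit between a pair of quotes.
                        current.Append(character);
                    }
                }
                return results;
            }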


    Ethan Strauss

    Wednesday, December 5, 2018 6:27 PM
  • Clearly this problem(!?) is ridiculously simply solved by a (really) simple regex string, and the Regex method Matches().

    Your post states this input:

    string input=
        "-name 'James' "
        +"-attributes 'SPRINTING: very good [averages 1 -3, positions] ' "
        +"-address '14 Addison St ' "
        +"- track 'grass'"
        ;
    

    Notice: you've included a space before track and after - .

    Wednesday, December 5, 2018 11:06 PM
  • Here's the really simple code, using regex:

    string pat=@"-[^']+'([^']+)'[^-]*";
    Regex
        .Matches(input,pat).OfType<Match>()
        .ToList().ForEach
        (
            m=>Console.WriteLine(m.Groups[1])
        );
    

    P.S.: I've included the extension method OfType<>() just to see if MTaylor learns how to use it, correctly.

    Wednesday, December 5, 2018 11:10 PM
  • So where's the bit that extracts only what's inside the single-quote marks?  And why would you include a "skills test" that has nothing to do with DotNET or C#, unless you mean DotNET 3.5 and up with an explicit reference to Linq?  This "ridiculously simple solution" of yours is the first word for sure, but not the last 2 at all.

    People come to MSDN to have questions answered.  Usually those people are new to programming, and almost categorically they're new to whatever language they're posting questions in.  Half solutions and convoluted BS like this will only drive users away.


    It never hurts to try. In the worst-case scenario, you'll learn something.

    Thursday, December 6, 2018 2:20 AM
  • @Andrew B. Painter

    In this forum, it's usual not to post the using directives, because they are not code, strictly speaking.
    See the thousands of web pages as proof.
    Moreover, it's part of the learning process to identify which directives should apply.
    Nevertheless, Visual Studio does include the directives, if the user asks it to!
       ...although, it will not include the directives for you only, because you are ignorant, and still don't know this feature is available.

    >> So where's the bit that extracts only what's inside the single-quote marks?

    It's in the code, but both, your stupidity and ignorance about programming, prevent you from understanding what's quite clear in the code.
    Run the code, and see the output... ROFL...
    Go study Regex.
    Go study LINQ.

    >> ... unless you mean DotNET 3.5 and up ...

    Ridiculous!
    DotNET does not exist!!!
    The correct name is .NET!
    Your ignorance is astounding!

    >> unless you mean DotNET 3.5 and up

    Ridiculous!
    .NET Framework is, currently, 4.7.2, and C# Language is 7.3 !
    Your ignorance is, really, astounding!
    Go back to school, dude, if you find any that accepts you as a member...

    >> ... and convoluted BS like this ...

    Your inability to understand the simple code I posted, tells the whole world the BS is inside your head...
    Based on your uneducated criticism, it's also in your mouth; take this opportunity to swallow it, and it'll stuff you, integrating quite well with the matter you're 'built' of.

    >> People come to MSDN to have questions answered. 

    Yes, and I answered it like a master!  :)


    • Edited by ritehere44 Thursday, December 6, 2018 3:47 AM @Andrew B. Painter
    Thursday, December 6, 2018 3:45 AM
  • I liked this solution, I feel like I learned from it and liked the use of the ASCII table which I used with .Peek() before and never thought to use in this scenario.

    Thanks again.


    CuriousCoder

    Thursday, December 6, 2018 11:55 AM
  • I feel compelled to tell you that your choice is really very bad. His code is plain rubbish# code.

    Saturday, December 8, 2018 6:04 AM
  • @Andrew B. Painter

    @Andrew B. Painter, aka MR BS,

    Don't hide your ignorance about Regex behind the excuse that there is more than one implementation of it; this is the Visual C# forum, and C# uses the Regex implemented in the .NET Framework (I have to remind you that DotNet does not exist), which is common to all languages that use it.

    Your code above, is plain rubbish# code, which is common to all ssa seloh like you, because of:

    1) int r = 0; -> you never used this variable.

    2) GC.Collect(); -> you should never use it, without a real good reason, dude.

    3) inside getCSVValues(), you mix hardcoded values with softcoded values (0x27); this is STUPID, dude.

    4) string[] returnValue -> this is the most STUPID use of an array that I ever saw!  Get back to school and learn about List<string>, dude.

    5) The array above, when iterated to show the results, also shows how much STUPID you are, dude. See the output:

    0:*
    1:James*
    2:SPRINTING: very good [averages 1 -3, positions]*
    3:14 Addison St*
    4:grass*

    when using this statement:

        Console.WriteLine("{0}:{1}*",i,csv[i]);

    in Main().

    As, you see, MR BS, your rubbish# code includes 1 more item in the output that doesn't exist in the input!

    For me, it's no surprise, because this is what I expect from ssa seloh like you, MR BS.

    P.S.: if you don't like the way I treat you, ask the manager of these mvps to ban my account; if you don't succeed, this means my treatment to you, is correct!


    • Edited by ritehere44 Saturday, December 8, 2018 6:30 AM
    Saturday, December 8, 2018 6:27 AM
  • >> As, you see, MR BS, your rubbish# code includes 1 more item in the output that doesn't exist in the input!

    That's actually a pretty good catch.  A person would be better served testing if the curString actually had any data before just blindly stuffing a new element on the end of the array!

                // Finally, it's likely that the very last element in a CSV isn't terminated with an explicit delimiter character.  In this case,
                // simply add the newly constructed string value to the output, but only if it actually holds data.
                // The same test belongs on the in-loop append, which is where the empty leading element comes from.
                if (curString.Trim().Length > 0)
                {
                    System.Array.Resize<string>(ref returnValue, returnValue.Length + 1);
                    returnValue[returnValue.Length - 1] = curString.Trim();
                }
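
    Putting the points from this exchange together (test for data before appending, use a List<string>, and use the quote parameter instead of a hard-coded 0x27), the loop might look something like the sketch below. It is a hypothetical rewrite for illustration, not the accepted answer's code: the names QuotedFieldSketch, GetValues and Flush are made up, and it assumes balanced quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical rewrite, named for illustration only.
public static class QuotedFieldSketch
{
    public static List<string> GetValues(string input, char delim, char quote)
    {
        var values = new List<string>();
        var current = new StringBuilder();
        bool inQuote = false;

        foreach (char c in input)
        {
            if (c == quote)
            {
                inQuote = !inQuote;        // entering or leaving a quoted value
            }
            else if (inQuote)
            {
                current.Append(c);         // keep everything between the quotes
            }
            else if (c == delim)
            {
                Flush(values, current);    // a delimiter outside quotes ends a field
            }
            // Any other character outside quotes (argument names, spaces) is ignored.
        }

        Flush(values, current);            // the last field has no trailing delimiter
        return values;
    }

    // Adds the pending text only if it actually holds data, then resets the buffer.
    private static void Flush(List<string> values, StringBuilder current)
    {
        string text = current.ToString().Trim();
        if (text.Length > 0)
        {
            values.Add(text);
        }
        current.Clear();
    }
}
```

    With the string from the question, GetValues(input, '-', '\'') should return the four values without the empty leading item.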
    

    You legitimately helped to expand and improve on the accepted answer in the thread!


    It never hurts to try. In the worst-case scenario, you'll learn something.

    Saturday, December 8, 2018 7:07 AM
  • The accepted answer is :

    1) WRONG

    2) CODED IN RUBBISH# STYLE

    3) WITH ALL THOSE STUPID ITEMS I MENTIONED BEFORE

    AND IF YOU CORRECT THE CODE, IT WILL STILL BE A HUGE PILE OF CRAP, (BS?) THAT CAME "RITE" FROM INSIDE YOUR HEAD, SSA ELOH.

    Saturday, December 8, 2018 7:16 AM