A specialized String.Split but on a location instead of a character


  • Tuesday, September 12, 2006 10:02 PM
    Moderator
     
     
    I am looking to split a string at a specific character count and return an array of strings from that, with the smarts not to split in the middle of a word.

    The functionality would somewhat mirror string.Split, which is used to tokenize, but instead of splitting on a character it would split on a hard minimum line size.

    Any existing functionality to tap into or thoughts on doing it?

    advTHANKSance


     Example

    string it = "A post to the forums can be helpful";

    string[] lines = it.splitOnLocation(10);

    Produces an array {"A post to " , "the forums" , " can be " , "helpful" }

All Replies

  • Wednesday, September 13, 2006 2:09 PM
     
     

    Um... A hard minimum line size, or a hard maximum?    (your description & example disagree)

    For a hard max, do a google search for "C# Word Wrap"   (and if I remember, I'll post mine to my blog -- if I forget, I'm sure there are others out there)

     

     

  • Wednesday, September 13, 2006 2:21 PM
    Moderator
     
     
    Thanks James,

    My post could use tweaking to make it a better design document <g>.

    Word wrap is the key, as with laying out text on a page, so to speak: one would not want to chop a word in the middle but move it down to the next line. So I guess the rule would be that no word runs past the line size, with the understanding that each returned line would be at most the specified number of characters long.
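
The rule described above can be sketched without any regex as a greedy word wrap. This is only a minimal sketch under stated assumptions: `WordWrapSketch` and `SplitOnLocation` are hypothetical names (nothing with that name exists in the framework), words are assumed to be separated by plain spaces, and a word longer than `maxWidth` is simply placed on a line of its own rather than being chopped.

```csharp
using System;
using System.Collections.Generic;

public static class WordWrapSketch
{
    // Hypothetical helper, not a framework API: packs words greedily into
    // lines whose length does not exceed maxWidth.
    public static string[] SplitOnLocation(string text, int maxWidth)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (maxWidth < 1) throw new ArgumentOutOfRangeException("maxWidth");

        List<string> lines = new List<string>();
        string current = "";

        // Assumption: words are separated by spaces; runs of spaces collapse.
        string[] words = text.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            if (current.Length == 0)
            {
                // First word on the line; an over-long word stays whole.
                current = word;
            }
            else if (current.Length + 1 + word.Length <= maxWidth)
            {
                // The word still fits after a single separating space.
                current += " " + word;
            }
            else
            {
                // The word would overflow: close this line, start the next.
                lines.Add(current);
                current = word;
            }
        }

        if (current.Length > 0) lines.Add(current);
        return lines.ToArray();
    }
}
```

Calling `WordWrapSketch.SplitOnLocation("A post to the forums can be helpful", 10)` should yield the four lines "A post to", "the forums", "can be" and "helpful", with the separating spaces dropped rather than kept at the line ends.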

    My post here on the MS forum is primarily to find out if I have overlooked any Microsoft C#/.Net functionality within the framework which speaks to this issue, either directly or indirectly.

    Thanks,
  • Wednesday, September 13, 2006 11:19 PM
     
     Answered

    You might consider using a Regular Expression... the following is quite short and should work as you specified:

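    // Requires: using System; using System.Text.RegularExpressions;
    string test = "this is an example using a regular expression to wrap some text in lines of ten chars.";
    // Match up to ten characters followed by whitespace or the end of the string.
    MatchCollection matches = Regex.Matches(test, @".{1,10}(\s|$)");
    foreach (Match m in matches) {
      Console.WriteLine(m.Value);
    }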

    This won't work properly if you expect to have words longer than 10 characters: you will only get the last ten chars. But as you didn't provide a specification for that case I'm allowed to assume it as implementation dependent ;)
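
For the long-word case mentioned above, one possible variation is to add a second alternative that hard-splits any run of non-whitespace that cannot fit on a line. This is a sketch under assumptions, not a specification of what the original poster wanted: `RegexWrapSketch` and `Wrap` are hypothetical names, the choice to chop over-long words into `width`-sized pieces is mine, and trailing whitespace is trimmed from each match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexWrapSketch
{
    // Hypothetical helper: wraps text at 'width' characters using a regex.
    public static List<string> Wrap(string text, int width)
    {
        // First alternative: up to 'width' characters followed by whitespace
        // or the end of the string (the pattern from the post, non-capturing).
        // Second alternative (an assumption of this sketch): when no break
        // point is in reach, take exactly 'width' non-whitespace characters.
        string pattern = @".{1," + width + @"}(?:\s|$)|\S{" + width + "}";

        List<string> lines = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            // The first alternative consumes the trailing whitespace; drop it.
            lines.Add(m.Value.TrimEnd());
        }
        return lines;
    }
}
```

With a width of 10, the input "ab supercalifragilistic" should come back as "ab", "supercalif" and "ragilistic", so no characters of the long word are lost.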

    HTH
    --mc

     

  • Thursday, September 14, 2006 8:08 AM
    Moderator
     
     
    As much as I love regular expressions, having recently answered an old unanswered post concerning regex just for the challenge of it, using regex here totally flew under my radar! Meaning I loved your post!

    I tinkered around with the regex you supplied. I tried to change the match at the end to use a negative lookbehind, but I started getting whitespace at the front of the line... I finally settled on using explicit capture, but more importantly on using the (?: ) construct to tell regex to match the whitespace at the end of the line but not capture it. I placed the line in a named group, just because I like the flexibility of named groups.

    Other than that it is basically your regex. Thanks!

    Also check out the tool Expresso... it's nice to use when one needs to post a solution, for it puts the attributes in a C#/VB/C++ example... see below.
    //  using System.Text.RegularExpressions;

    /// <summary>
    ///  Regular expression built for C# on: Thu, Sep 14, 2006, 02:03:50 AM
    ///  Using Expresso Version: 2.1.2150, http://www.ultrapico.com
    /// 
    ///  A description of the regular expression:
    /// 
    ///  [Line]: A named capture group. [.{1,20}]
    ///      Any character, between 1 and 20 repetitions
    ///  Match expression but don't capture it. [\W]
    ///      Any character that is not alphanumeric
    /// 
    /// 
    /// </summary>
    public static Regex regex = new Regex(
        @"(?<Line>.{1,20})(?:\W)",
        RegexOptions.IgnoreCase
        | RegexOptions.Multiline
        | RegexOptions.ExplicitCapture
        | RegexOptions.CultureInvariant
        | RegexOptions.Compiled
        );
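
A short usage sketch for reading the named group may help here. It is not the code from the post: `NamedGroupWrapSketch` and `Wrap` are hypothetical names, and the pattern adds an `|$` alternative as an assumption of mine, because with `(?:\W)` alone the final word of a text that does not end in a non-word character has nothing to match against and is left out of the results. Note also that `\W` consumes punctuation as well as whitespace, so a period at the end of a line is matched but not captured.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupWrapSketch
{
    // Hypothetical helper: collects the "Line" group from each match.
    public static List<string> Wrap(string text, int width)
    {
        // Same shape as the posted regex, plus "|$" (an assumption added in
        // this sketch) so the last line is matched at the end of the string.
        Regex regex = new Regex(
            @"(?<Line>.{1," + width + @"})(?:\W|$)",
            RegexOptions.ExplicitCapture | RegexOptions.CultureInvariant);

        List<string> lines = new List<string>();
        foreach (Match m in regex.Matches(text))
        {
            // The named group holds the line without the trailing separator.
            lines.Add(m.Groups["Line"].Value);
        }
        return lines;
    }
}
```

For the sentence from the original question and a width of 20, this should return "A post to the forums" and "can be helpful".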