Guidance for handling text

  • Question

  • I'm looking for guidance on how to handle and manipulate lines of text and I thought using VB (via Visual Studio Community 2017) would be a good way to tackle this. I haven't touched VB since the 6.0 days and I've forgotten nearly all of it!

    What I'm trying to do:

    1. Load a text into a text box
    2. Scan each line and delete any line that contains "*"
    3. Sort the lines based on a particular word found in each line, for example "Red", "Blue", "Green", etc.
    4. Save the nicely sorted text to a file

    It sounds relatively easy but I'm not sure how to tackle each step. Are there any instructional sites or pointers to get me started?

    Thank you

    Monday, September 3, 2018 1:49 AM

All replies

  • The following (done in Visual Studio 2017) loads a text file excluding lines containing *.

    Text file

    First line
    Second line
    Third * line
    Fourth line
    * test
    Karen
    Payne
    *Black**

    Code

    Imports System.IO
    Public Class Form1
        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            Dim fileName = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "TextFile1.txt")
            If File.Exists(fileName) Then
                Dim result = File.ReadAllLines(fileName).Where(Function(line) Not line.Contains("*")).ToArray()
                TextBox1.Lines = result
                TextBox1.SelectionStart = 0
            End If
        End Sub
    End Class
    

    For sorting, more details are needed. I have ideas, but there is not enough information yet and I don't care to guess.

    For saving, modify the code above so the file name is a private field of the form, which both the load and save handlers can use.

    Imports System.IO
    Public Class Form1
        Private fileName As String = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "TextFile1.txt")
        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            If File.Exists(fileName) Then
                Dim result = File.ReadAllLines(fileName).Where(Function(line) Not line.Contains("*")).ToArray()
                TextBox1.Lines = result
                TextBox1.SelectionStart = 0
            End If
        End Sub
        Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            File.WriteAllLines(fileName, TextBox1.Lines)
        End Sub
    End Class
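
    Note that Button1_Click above writes back to the same file that was loaded, so the original lines containing * are lost on save. If that matters, one option is to ask for a destination instead. The following is only a sketch: the SaveSketch module, the SaveLinesAs function, and the default name "Sorted.txt" are made up for illustration and are not part of the code above.

    ```vbnet
    Imports System.IO
    Imports System.Windows.Forms

    Module SaveSketch
        ' Hypothetical helper: asks the user for a destination and writes the lines there.
        ' Returns True when a file was written, False when the dialog was cancelled.
        Function SaveLinesAs(lines As String()) As Boolean
            Using dialog As New SaveFileDialog()
                dialog.Filter = "Text files (*.txt)|*.txt|All files (*.*)|*.*"
                dialog.FileName = "Sorted.txt" ' assumed default name
                If dialog.ShowDialog() <> DialogResult.OK Then Return False
                File.WriteAllLines(dialog.FileName, lines)
                Return True
            End Using
        End Function
    End Module
    ```

    It could be called from the button handler as SaveLinesAs(TextBox1.Lines) in place of the File.WriteAllLines call.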
    


    Karen Payne, VB Forums - moderator

    Monday, September 3, 2018 3:10 AM
    Moderator
  • Hi,

    File reading and writing are covered at the following link:

    https://docs.microsoft.com/en-us/dotnet/standard/io/

    Best Regards,

    Alex



    Monday, September 3, 2018 3:14 AM
  • There are lots of ways to go about solving this.  The most appropriate and efficient way will depend entirely on the data you are working with and how you'll use the results.  There are a number of things to determine before deciding how to proceed:

    1. How big are the input text files?  A few lines, lots of lines, hundreds of lines, thousands of lines?
    2. How many input text files are there?  A few, quite a few, hundreds, thousands?
    3. How many sort words are there per file?  Just one word in each line, multiple words in each line?
    4. Is the sort order alphabetical? Is each line ordered alphabetically by the sort word it contains, or does each sort word specify its ordinal in the sort order?
    5. Can a line contain more than one sort word? If so, how do you sort? By the first sort word encountered, based on the order of the sort words within the line, by number of sort words present?
    6. Is there any pattern to the lines in a file?  Will the sort word always be in the same position, or must it be found anywhere in each line?
    7. Can you predefine the list of possible sort words?  Are all the words known ahead of time, or must they be discovered in the input text?
    8. Is it the same set of sort words for every file?  Can you use the same sort words for every file, or do you need to have unique sort words for each file?
    9. Does the user need to see the text in the program at any point?  Do you need to display the input text before removing or sorting, does the text need to be shown after removing and sorting but before saving, or can the entire process occur without the user seeing any text (just read one file and create the other).
    10. What does the user actually do with the final output file? Is another plain text file really the most useful representation of the data?

    These are the kinds of questions you have to think about and answer as best you can before starting a project like this.  As stated, there is a broad spectrum of ways to go about doing this; from simple string manipulation, as Karen has shown, to full parsing solutions capable of understanding the grammar and syntax of the input data, allowing for complex searching and sorting based upon spoken-language grammar rules.  The answers you provide for the above questions dictate which end of that spectrum of possible designs you need to lean toward.

    For example, if the input text files are relatively small (a couple hundred lines of reasonable length at most) and each line has only one sort word, which comes from a predefined list, then simple string manipulation with some LINQ over the loaded lines of text (what Karen shows, with a little more functionality for the sorting) is probably the way to go.
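
    As a rough illustration of that simple end of the spectrum, here is a minimal console sketch. The module and function names, the sort words, and the sample lines are all made up for the example. It also assumes plain substring matching is acceptable, which means "Red" would match inside "scared"; matching on word boundaries would need something stricter, such as a regular expression.

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports System.Linq

    Module SortSketch
        ' Returns the first sort word found in the line (case-insensitive substring
        ' match), or Nothing when the line contains none of them.
        Function FindSortWord(line As String, sortWords As IEnumerable(Of String)) As String
            Return sortWords.FirstOrDefault(
                Function(w) line.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0)
        End Function

        ' Drops lines containing "*" and orders the rest alphabetically by sort word.
        ' Lines without any sort word have a Nothing key and therefore come first.
        Function FilterAndSort(lines As IEnumerable(Of String),
                               sortWords As IEnumerable(Of String)) As String()
            Return lines.
                Where(Function(line) Not line.Contains("*")).
                OrderBy(Function(line) FindSortWord(line, sortWords),
                        StringComparer.OrdinalIgnoreCase).
                ToArray()
        End Function

        Sub Main()
            ' Hypothetical sample data, not taken from a real input file.
            Dim sortWords As String() = {"Red", "Blue", "Green"}
            Dim lines As String() = {"a green apple", "*** separator ***",
                                     "a red kite", "a blue door"}
            For Each line As String In FilterAndSort(lines, sortWords)
                Console.WriteLine(line)
            Next
        End Sub
    End Module
    ```

    Because the ordering key is the matched word itself, the order in which the sort words are listed does not affect the result.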

    However, if the input text files are large (thousands of lines of text and/or very long individual lines) and/or the sort words must be discovered from the content and/or the actual use of the output file could benefit from aggregate data such as the count of individual sort word matches or the noun described by the sort word (e.g.  knowing what "blue thing" was mentioned when sorting by a color word), then a per-character parsing solution (perhaps with full English grammar) would probably be better.

    And perhaps the answers to the questions indicate that some combination of the two, or something entirely different like regular expressions, is what would be most appropriate.

    Once we have a better idea of what you are doing, why you are doing it, and the details of the text files involved we can get you pointed in the right direction.


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Monday, September 3, 2018 2:37 PM
    Moderator
  • - How big are the input text files?  A few lines, lots of lines, hundreds of lines, thousands of lines?

    Very small - less than 20KB
    Probably about 30 lines. Maybe 100 lines at the absolute maximum

    - How many input text files are there?  A few, quite a few, hundreds, thousands?
    Just one

    - How many sort words are there per file?  Just one word in each line, multiple words in each line?
    There will be 11 sort words, only one of which will appear in a line.

    - Is the sort order alphabetical? Is each line ordered alphabetically by the sort word it contains, or does each sort word specify its ordinal in the sort order?
    Yes, alphabetical, A-Z.  The input file would look something like this:
    the big red dog jumped over the stick
    ****************************************
    the small toy car was black
    **********************
    ***
    pens are blue and compact
    *******

    I want to scan each line, pick out its sort word, and order the lines alphabetically by that sort word, ignoring any line that contains an asterisk. So in this example the output file would be:
    the small toy car was black
    pens are blue and compact
    the big red dog jumped over the stick

    - Can a line contain more than one sort word? If so, how do you sort? By the first sort word encountered, based on the order of the sort words within the line, by number of sort words present?
    No, each line will have only one sort word

    -Is there any pattern to the lines in a file?  Will the sort word always be in the same position, or must it be found anywhere in each line?
    The sort word position will be variable

    -Can you predefine the list of possible sort words?  Are all the words known ahead of time, or must they be discovered in the input text?
    Yes I have 11 sort words pre-defined that won't change

    -Is it the same set of sort words for every file?  Can you use the same sort words for every file, or do you need to have unique sort words for each file?
    n/a

    -Does the user need to see the text in the program at any point?  Do you need to display the input text before removing or sorting, does the text need to be shown after removing and sorting but before saving, or can the entire process occur without the user seeing any text (just read one file and create the other).
    I would like to see the input file in one box, and the results in another box prior to hitting a save button

    - What does the user actually do with the final output file? Is another plain text file really the most useful representation of the data?
    It will be used just for ease of reading by me. A plain text file is all I need

    Thank you for the thoughtful questions

    Monday, September 3, 2018 4:19 PM
  • For sorting:

    The input file would look something like this:
    the big red dog jumped over the stick
    ****************************************
    the small toy car was black
    **********************
    ***
    pens are blue and compact
    *******

    I want to scan each line, pick out its sort word, and order the lines alphabetically by that sort word, ignoring any line that contains an asterisk. So in this example the output file would be:

    the small toy car was black
    pens are blue and compact
    the big red dog jumped over the stick

    Monday, September 3, 2018 4:20 PM
  • Try a solution that uses Regular Expressions:

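    ' Requires at the top of the file:
    '   Imports System.IO
    '   Imports System.Text.RegularExpressions

    ' Sort words, listed in the order the lines should appear (alphabetical here).
    Dim words() As String = {"Black", "Blue", "Red"}
    
    Dim lines = File.ReadAllLines("path to file…")
    
    ' Drop lines containing "*", then order by the index of the sort word that
    ' appears as a whole word (case-insensitive). Lines with no sort word sort first.
    Dim result = lines _
                    .Where(Function(s) Not s.Contains("*"c)) _
                    .OrderBy(Function(s) words.Select(Function(w, i) New With {w, i}).FirstOrDefault(Function(p) Regex.IsMatch(s, "\b" & Regex.Escape(p.w) & "\b", RegexOptions.IgnoreCase))?.i) _
                    .ToArray()
    
    TextBox1.Lines = result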

    Then write the text to file.
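
    For the layout described earlier in the thread (the input in one box, the results in another, then a save button), the pieces could be wired together roughly as follows. This is a sketch under assumptions, not a finished program: the class and control names, the output file name "Sorted.txt", and building the controls in code instead of the designer are all choices made for the example. Only three of the eleven sort words are known from the thread, and sending lines with no sort word to the end is also an assumption.

    ```vbnet
    Imports System
    Imports System.IO
    Imports System.Linq
    Imports System.Text.RegularExpressions
    Imports System.Windows.Forms

    Public Class SortForm
        Inherits Form

        ' Hypothetical file names next to the executable; adjust as needed.
        Private ReadOnly inputFile As String =
            Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "TextFile1.txt")
        Private ReadOnly outputFile As String =
            Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Sorted.txt")

        ' Must be listed alphabetically, because lines are ordered by array index.
        Private ReadOnly sortWords As String() = {"Black", "Blue", "Red"}

        Private ReadOnly sourceBox As New TextBox With {
            .Multiline = True, .ReadOnly = True, .Dock = DockStyle.Top, .Height = 150}
        Private ReadOnly resultBox As New TextBox With {
            .Multiline = True, .ReadOnly = True, .Dock = DockStyle.Fill}
        Private ReadOnly saveButton As New Button With {
            .Text = "Save", .Dock = DockStyle.Bottom}

        Public Sub New()
            ' The Fill control is added first so the docked edges are laid out around it.
            Controls.Add(resultBox)
            Controls.Add(sourceBox)
            Controls.Add(saveButton)
            AddHandler Me.Load, AddressOf OnFormLoad
            AddHandler saveButton.Click, AddressOf OnSaveClick
        End Sub

        ' Index of the first sort word found as a whole word in the line.
        ' Lines without a sort word get Integer.MaxValue and so sort last.
        Private Function SortIndex(line As String) As Integer
            For i As Integer = 0 To sortWords.Length - 1
                Dim pattern As String = "\b" & Regex.Escape(sortWords(i)) & "\b"
                If Regex.IsMatch(line, pattern, RegexOptions.IgnoreCase) Then Return i
            Next
            Return Integer.MaxValue
        End Function

        ' Shows the raw input in one box and the filtered, sorted lines in the other.
        Private Sub OnFormLoad(sender As Object, e As EventArgs)
            If Not File.Exists(inputFile) Then Return
            Dim lines As String() = File.ReadAllLines(inputFile)
            sourceBox.Lines = lines
            resultBox.Lines = lines.
                Where(Function(s) Not s.Contains("*")).
                OrderBy(Function(s) SortIndex(s)).
                ToArray()
        End Sub

        ' Writes the sorted lines to a separate file, leaving the input untouched.
        Private Sub OnSaveClick(sender As Object, e As EventArgs)
            File.WriteAllLines(outputFile, resultBox.Lines)
        End Sub
    End Class

    Module Program
        <STAThread>
        Sub Main()
            Application.EnableVisualStyles()
            Application.Run(New SortForm())
        End Sub
    End Module
    ```

    In a designer-based project the same two handlers could instead be attached with Handles clauses to controls dropped on the form, as in the earlier replies.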



    • Edited by Viorel_MVP Monday, September 3, 2018 8:25 PM
    Monday, September 3, 2018 8:23 PM