Design question - File reader class that searches for specific text

  • Question

  • I'm trying to write a class library that accepts a list of text files (selected by the user from the file dialog). I want to search each file in this group for specific strings that are also user-defined. When a string is found, it is passed to a function in a separate library that processes it.

    My question is: how should the string get passed to the other class library? Should I return the string from the file reader class's function and pass it as a parameter when calling the other function?

    One thing I know for sure: I want to process only one string at a time, in sequence, and display the progress each time a string is processed (i.e. how many bytes, out of the total bytes for all files, have been processed).

    EDIT: Another idea I had was to read the file and, when the string is found, record the position of the reader in the file as an out parameter. The function then returns the string and completes. The next time it is called, it is passed the position and continues where it left off.

    The one problem with this idea is: what would initiate another call to that function if the calling library has already received the string?

     
    • Edited by sjs1978 Friday, September 28, 2012 10:22 PM
    Friday, September 28, 2012 9:20 PM
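
The progress display described in the question (bytes processed out of the total bytes for all files) could be tracked with a small helper along these lines. This is a minimal sketch rather than code from the thread: `ByteProgress` and its members are hypothetical names, and it assumes the caller knows each file's size before processing starts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the thread): tracks bytes consumed across several files.
public sealed class ByteProgress
{
    private readonly long totalBytes;
    private long processedBytes;

    // fileSizes: the length in bytes of every file that will be processed.
    public ByteProgress(IEnumerable<long> fileSizes)
    {
        foreach (long size in fileSizes)
        {
            totalBytes += size;
        }
    }

    public long TotalBytes { get { return totalBytes; } }
    public long ProcessedBytes { get { return processedBytes; } }

    // Call after each line or chunk has been handled.
    public void Advance(long byteCount)
    {
        processedBytes += byteCount;
    }

    // Percentage in the range 0..100; an empty set of files counts as complete.
    public double Percent
    {
        get { return totalBytes == 0 ? 100.0 : 100.0 * processedBytes / totalBytes; }
    }
}
```

The sizes can be obtained with `new FileInfo(path).Length` for each selected file, and `Advance` can be fed the change in `FileStream.Position` after each string is processed.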

Answers

  • Are you using C# or VB.NET? The solution would be the same for either, but I want to know in case I supply some code examples. You probably want to create your own class. You don't even need to return anything if you use a List<> of class instances, one for each text file you are processing. You can look at this CodeProject article for some of your code:

    http://www.codeproject.com/Articles/13668/Configuration-File-Reader

    Here is sample C# code that uses a List<> object to hold instances of your class:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.IO;
    namespace ConsoleApplication1
    {
        class Program
        {
            static List<MyfileReader> filereader = new List<MyfileReader>();
            static void Main(string[] args)
            {
                MyfileReader reader;
                string[] filenames = { "file1", "file2", "file3" };
                foreach (string filename in filenames)
                {
                    reader = new MyfileReader(filename);
                    filereader.Add(reader);
                }
            }
        }
        class MyfileReader
        {
            // Instance fields (not static), so each reader keeps its own stream and position.
            public FileStream stream;
            public long position;
            public MyfileReader(string filename)
            {
                stream = new FileStream(filename, FileMode.Open);
            }
        }
    }


    jdweng


    • Edited by Joel Engineer Saturday, September 29, 2012 1:28 AM
    • Proposed as answer by Mike Feng Wednesday, October 3, 2012 10:22 AM
    • Marked as answer by Mike Feng Tuesday, October 9, 2012 4:42 AM
    Saturday, September 29, 2012 1:27 AM

All replies

  • Why do you create a list of file reader objects?  What if there were 100 files to process? Is it ok to open that many streams at once?

    Also I posted my project to SkyDrive

    https://skydrive.live.com/redir?resid=13F0426C29589969!111&authkey=!AMBI74k_v8GzvMQ

    Basically, I need to open a file, scan it for specific strings and, when I find one, turn it into the appropriate object. In the Mode1 class I want to work on that string and then write the result to an output file.

    I guess my first question is this: in my IOHandler project, I can easily use the StreamReader to find the string I'm looking for. How do I convert it to the object type I have defined (StringType1), and how do I then get that over to the Mode1 class? (I'll have a function there that operates on each StringType object.)


    • Edited by sjs1978 Sunday, September 30, 2012 3:22 AM
    Sunday, September 30, 2012 3:20 AM
  • Opening lots of files at one time uses resources (like memory), but if they are not all being read at the same time and you have the resources, it is not a problem. With my code you don't have to have all the files open simultaneously.

    A class instance is an object, and declaring variables in a class as "public" makes them members that are accessible from the main function (declare them "static" only if you want a single value shared by every instance). You want to create an object for each string you find. With my code it is automatically an object without any conversion. Once you perform the search inside the class, you can simply close the file after you find the string; then you won't have all the files open simultaneously. You may just want to close the files where you don't find the string and leave the others open. You can return a status indicating the string wasn't found and then remove those readers from the List<MyfileReader>. I created a List<> so you can easily add or delete items as required by your code. I didn't know how many files you were going to search.


    jdweng

    Sunday, September 30, 2012 5:58 AM
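
The idea in the reply above (close each file once it has been searched, return a found/not-found status, and remove the readers that found nothing) might look roughly like this. It is a sketch under assumptions, not the poster's code: `SearchableFile`, `Search` and `SearchDemo.SearchAll` are hypothetical names, and a plain substring test stands in for whatever matching rule is actually needed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical variant of MyfileReader: the file is open only while Search runs.
public sealed class SearchableFile
{
    public string FileName { get; private set; }
    public List<string> Matches { get; private set; }

    public SearchableFile(string fileName)
    {
        FileName = fileName;
        Matches = new List<string>();
    }

    // Returns true if at least one line contains searchText.
    public bool Search(string searchText)
    {
        Matches.Clear();
        // 'using' closes the file as soon as the search finishes.
        using (StreamReader reader = new StreamReader(FileName))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Contains(searchText))
                {
                    Matches.Add(line);
                }
            }
        }
        return Matches.Count > 0;
    }
}

public static class SearchDemo
{
    // Searches every file in turn and keeps only those in which searchText was found.
    public static List<SearchableFile> SearchAll(IEnumerable<string> fileNames, string searchText)
    {
        List<SearchableFile> files = new List<SearchableFile>();
        foreach (string name in fileNames)
        {
            files.Add(new SearchableFile(name));
        }
        files.RemoveAll(f => !f.Search(searchText));
        return files;
    }
}
```

Because each file is opened and closed inside `Search`, only one file is open at any moment, however many files the user selects.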
  • I modified that a bit. What I have is the Read class, which handles the processing of each file in the function ProcessFile. Since I want to handle multiple types, I need to test each line against each type. So I was thinking of saving the position of the stream reader at the end of each line (in the endOffset variable), so that each time I resume after finding a type, I can test again. I would keep calling ProcessFile until it reaches the end of the stream, then close the file and move on to the next. I'm not sure I am calling each part in the right place. For instance, should I check whether the line matches a type in the Read class? Or should the Read class only be responsible for passing the line to some other class that specifically handles the type checking?

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.IO;
    
    
    namespace IOHandler
    {
        public sealed class Read
        {
            public string[] FileList { get; set; }
    
            private Int64 endOffset = 0;
            private FileStream readStream;
            private StreamReader sr;
    
            private System.Text.RegularExpressions.Regex type1 = new System.Text.RegularExpressions.Regex(@"[0-9]{3}[:][0-9]{4}");
            private System.Text.RegularExpressions.Regex type2 = new System.Text.RegularExpressions.Regex(@"^[@]123");
    
            public Read(string[] fl)
            {
                FileList = fl;
            }
    
            public Int64 EndOffset { get; set; }
    
            public object ProcessFile(string file)
            {
                    readStream = new FileStream(file, FileMode.Open, FileAccess.Read);
                    int x = 0;
                    endOffset = 0;
                    bool found = false;
                    char ch;
                    string line = string.Empty;
    
                object message = null;
    
                    while (!(x < 0)) //do this while not end of file (ReadByte returns -1)
                    {
    
                        readStream.Position = endOffset;
    
                        //line reader
                        while (found == false)  //keep reading characters until end of line found
                        {
                            x = readStream.ReadByte();
                            if (x < 0)
                            {
                                found = true;
                                break;
                            }
                            // else if ((x == 10) || (x == 13))
                            if ((x == 10) || (x == 13))
                            {
                                ch = System.Convert.ToChar(x);
                                line = line + ch;
                                x = readStream.ReadByte();
                                if ((x == 10) || (x == 13))
                                {
                                    ch = System.Convert.ToChar(x);
                                    line = line + ch;
                                    found = true;
                                }
                                else
                                {
                                    if (x != 10 && (x != 13))
                                    {
                                        readStream.Position--;
                                    }
                                    found = true;
                                }
                            }
                            else
                            {
                                ch = System.Convert.ToChar(x);
                                line = line + ch;
                            }
                        }//while - end line reader 
    
                        
    
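                        //examine line (is it one of the supported types?)
                        if (type1.IsMatch(line))
                        {
                            message = GetStringType1(line);
                        }
    
                        //remember where this line ended and reset the state for the next line;
                        //without this the inner loop is skipped and the outer loop never ends
                        endOffset = readStream.Position;
                        found = false;
                        line = string.Empty;
    
                    }//while not end of file
    
                    readStream.Close();
                    return message;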
            }
    
            public MessageTypes.Type1.Type1 GetStringType1(string line)
            {
                MessageTypes.Type1.Type1 T1 = new MessageTypes.Type1.Type1(line);
                
                return T1;
            }
        }
    }
    

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Windows;
    using System.Windows.Controls;
    using System.Windows.Data;
    using System.Windows.Documents;
    using System.Windows.Input;
    using System.Windows.Media;
    using System.Windows.Media.Imaging;
    using System.Windows.Navigation;
    using System.Windows.Shapes;
    using IOHandler;
    
    namespace Example
    {
        /// <summary>
        /// Interaction logic for MainWindow.xaml
        /// </summary>
        public partial class MainWindow : Window
        {
            public MainWindow()
            {
                InitializeComponent();
            }
    
            private void button2_Click(object sender, RoutedEventArgs e)
            {
    
            }
    
            //start button
            private void button1_Click(object sender, RoutedEventArgs e)
            {
                if (tabControl1.SelectedIndex == 0)
                {
                    //load file list from main window - Mode1 tab
                    Read read = new Read(new string[2]{@"C:\file1.txt",@"C:\file2.txt"} );
                    
                    //read files
                    foreach (string file in read.FileList)
                    {
                        
                        //while not end of stream
                        read.ProcessFile(file);
                        
                        //write transformed object
    
                    }
                        
    
                }
            }
        }
    }
    

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    
    namespace Mode1
    {
        public sealed class Mode1
        {
            //input is the string type
            private object message = null;
            //perform appropriate action on the string type
    
            //output is the converted object
    
            public Mode1(object o)
            {
                if (o is MessageTypes.Type1.Type1)
                {
                    message = o;
                    ConvertMessageType1((MessageTypes.Type1.Type1)message);
                }
            }
    
            public MessageTypes.Type1.Type1 ConvertMessageType1(MessageTypes.Type1.Type1 T1)
            {
                T1.S2[0] = new MessageTypes.Type1.Part("test");
    
                return T1;
            }
        }
    }
    

    Tuesday, October 2, 2012 2:38 PM
  • You will need to make the variables static (or keep them as fields of an object that you reuse) if you want them to retain their values when you re-enter the class. I don't think it really matters how you organize the classes. The usual preference is to place all the code that directly interfaces with the stream in one class. Once you get a line, which is a string, you can either process the string in the stream class or have another class that does the line interpretation. I have seen it done both ways. Normally you want to keep a class to a maximum of 200 to 300 lines (3 or 4 pages when printed). Once I start writing the actual code, I often move functions between classes (or create new classes) when one class grows too large.

    jdweng

    • Proposed as answer by Mike Feng Wednesday, October 3, 2012 10:23 AM
    Tuesday, October 2, 2012 2:57 PM
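
The split suggested above (one class that talks to the stream, another that interprets each line) could be sketched as follows. The class names `LineSource` and `LineInterpreter` are hypothetical; the regular expression is the `^[@]123` pattern from the posted Read class, and `TextReader` is used so the same code works for files and in-memory text.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical class: the only place that touches the stream.
public sealed class LineSource
{
    private readonly TextReader reader;

    public LineSource(TextReader reader)
    {
        this.reader = reader;
    }

    // Returns the next line, or null at end of input.
    public string NextLine()
    {
        return reader.ReadLine();
    }
}

// Hypothetical class: decides what a line means and knows nothing about files.
public sealed class LineInterpreter
{
    // Pattern copied from the Read class posted in the thread.
    private readonly Regex marker = new Regex(@"^[@]123");

    public bool IsMessage(string line)
    {
        return marker.IsMatch(line);
    }
}
```

With this arrangement, adding a new string type changes only the interpreter, and the stream-handling class stays small.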
  • Does what I did make sense? It seems to be working. I made the position variable static as you suggested, so each time I return from the function the position is saved, and the next call continues where it left off. But I took the FileStream object out of the Read class and pass it to the function instead. Any suggestions for improving this (in terms of structure and performance) are welcome :)

        public sealed class Read
        {
            public string[] FileList { get; set; }
    
            private static Int64 endOffset = 0;
            private FileStream readStream;
            private StreamReader sr;
    
            private System.Text.RegularExpressions.Regex type1 = new System.Text.RegularExpressions.Regex(@"^[@]123");
    
            public Read(string[] fl)
            {
                FileList = fl;
            }
    
            public object ReturnMessage(FileStream readStream, out int x)
            {
                //readStream = new FileStream(file, FileMode.Open, FileAccess.Read);
                x = 0;
                //endOffset = 0;
                bool found = false;
                char ch;
                string line = string.Empty;
    
                object message = null;
                
                while (!(x < 0)) //do this while not end of file (ReadByte returns -1)
                {
                    readStream.Position = endOffset;
    
                    //line reader
                    while (found == false)  //keep reading characters until end of line found
                    {
                        x = readStream.ReadByte();
                        if (x < 0)
                        {
                            found = true;
                            break;
                        }
                        // else if ((x == 10) || (x == 13))
                        if ((x == 10) || (x == 13))
                        {
                            ch = System.Convert.ToChar(x);
                            line = line + ch;
                            x = readStream.ReadByte();
                            if ((x == 10) || (x == 13))
                            {
                                ch = System.Convert.ToChar(x);
                                line = line + ch;
                                found = true;
                            }
                            else
                            {
                                if (x != 10 && (x != 13))
                                {
                                    readStream.Position--;
                                }
                                found = true;
                            }
                        }
                        else
                        {
                            ch = System.Convert.ToChar(x);
                            line = line + ch;
                        }
                    }//while - end line reader 
    
    
    
                    //examine line (is it one of the supported types?)
                    if (type1.IsMatch(line))
                    {
                        message = line;
                        endOffset = readStream.Position;
                                            
                        break;
                    }
                    else
                    {
                        endOffset = readStream.Position;
                        found = false;
                        line = string.Empty;
                    }
    
                }//while not end of file
    
               
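                //at end of file, reset the saved position so the next file starts from the beginning
                if (x < 0)
                {
                    endOffset = 0;
                }
                return message;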
            }
    
        }
     private void button1_Click(object sender, RoutedEventArgs e)
            {
                //test file
                IOHandler.Read reader = new IOHandler.Read(new string[1] { @"W:\d1.txt" });
    
                foreach(string file in reader.FileList){
    
                    FileStream readStream = new FileStream(file, FileMode.Open, FileAccess.Read);
    
                    int finished = 0;
                    while (finished != -1)
                    {
                        string message = (string)reader.ReturnMessage(readStream, out finished);
                        
                        //do something to message
    
                        //then write out the converted message
                    }
    
                    readStream.Close();
                }
            }



    • Edited by sjs1978 Wednesday, October 3, 2012 5:10 PM
    Wednesday, October 3, 2012 5:08 PM
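
The question of what initiates the next call, and of where to keep the saved position, can also be handled by a C# iterator, which suspends at each match and resumes when the caller asks for the next item. The following is a minimal sketch of that alternative, not the code posted above: `MessageScanner.ReadMessages` is a hypothetical name, and it reads whole lines with `TextReader.ReadLine` instead of the byte-by-byte loop.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class MessageScanner
{
    // Lazily yields each matching line. The iterator keeps its place between
    // items, so no static offset field or out parameter is needed.
    public static IEnumerable<string> ReadMessages(TextReader reader, Regex pattern)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (pattern.IsMatch(line))
            {
                yield return line;
            }
        }
    }
}
```

A caller would open each file with `using (StreamReader reader = new StreamReader(file))` and write `foreach (string message in MessageScanner.ReadMessages(reader, pattern))`, processing one message per loop iteration; the next line is read only when the loop asks for it.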
  • I don't like returning a generic object. You are going to get an exception if you try to use a string that is null. I would return an empty string "" instead of null when nothing is found.

                        


    jdweng

    Wednesday, October 3, 2012 7:39 PM
  • Eventually what I need to do is transform the string into a specific StringType object. I think I meant something like object = reader.ReturnMessage(readStream, out finished); where ReturnMessage would convert the string to its type. Then my Mode class would perform the conversion on the object (detecting which type it was).

    Wednesday, October 3, 2012 10:13 PM
  • I wouldn't use the generic Object Class.  I would develop a common base class for your string objects.  Then create a set of child classes under the base class for each of your different string types.

    jdweng

    Wednesday, October 3, 2012 10:22 PM
  • Would an interface work here? An interface that defines a string and all the functions (Modes) that I can apply to that string. Each string type would have a different structure, so I would need to write the functions differently for each type when I implement the interface.

    Regarding Object, I think I would determine in the Read class, while reading, which type the string belongs to and convert it to the appropriate object I defined. Don't I have to return the 'object' type, since I don't know ahead of time what it is?

    Friday, October 5, 2012 2:56 AM
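
An interface can serve here, and it also removes the need to return `object`: the reader returns the interface type and the caller invokes the operation without knowing the concrete class. The sketch below is illustrative only. `IMessage`, `ApplyMode1` and `MessageFactory` are hypothetical names, the two transformations are placeholders, and the patterns are the ones from the posted Read class.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical interface: the operations every string type supports.
public interface IMessage
{
    string Raw { get; }
    string ApplyMode1();
}

public sealed class StringType1 : IMessage
{
    public StringType1(string raw) { Raw = raw; }
    public string Raw { get; private set; }
    // Placeholder transformation, for illustration only.
    public string ApplyMode1() { return Raw.ToUpperInvariant(); }
}

public sealed class StringType2 : IMessage
{
    public StringType2(string raw) { Raw = raw; }
    public string Raw { get; private set; }
    // Placeholder transformation, for illustration only.
    public string ApplyMode1() { return Raw.Trim(); }
}

public static class MessageFactory
{
    // Patterns copied from the Read class posted earlier in the thread.
    private static readonly Regex type1 = new Regex(@"[0-9]{3}[:][0-9]{4}");
    private static readonly Regex type2 = new Regex(@"^[@]123");

    // The return type is the interface, so callers need neither 'object' nor a cast.
    // Returns null when the line matches no known type.
    public static IMessage Create(string line)
    {
        if (type1.IsMatch(line)) return new StringType1(line);
        if (type2.IsMatch(line)) return new StringType2(line);
        return null;
    }
}
```

A base class, as suggested in the previous reply, works the same way from the caller's side; the interface is the lighter choice when the types share behaviour but no common data.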
  • I hope this doesn't blow your mind

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    namespace ConsoleApplication1
    {
        // Base class shared by every kind of sentence.
        public class Sentence
        {
            public int numberOfwords = 0;
            public char terminator = '\0';
        }
        // Child class that adds data specific to compound sentences.
        public class CompoundSentence : Sentence
        {
            public int fragments;
        }
        class Program
        {
            static void Main(string[] args)
            {
                Sentence newSentence = CreateSentence("Make a compound sentence");
                // The method returns the base type; check the runtime type before casting.
                if (newSentence.GetType() == typeof(CompoundSentence))
                {
                    Console.WriteLine("this is a compound sentence");
                    Console.WriteLine("Number of fragments = " +
                        ((CompoundSentence)newSentence).fragments.ToString());
                }
            }
            
            public static Sentence CreateSentence(string input)
            {
                CompoundSentence newSentence = new CompoundSentence();
                newSentence.fragments = 5;
                return newSentence;
            }
        }
    }


    jdweng

    Friday, October 5, 2012 8:56 AM
  • In my Read class I want to be able to detect the type of string based on some regular expression or character sequence, but currently I define these in the Read class itself (see code above). I don't think this is a good approach, because if I later want to define other string types I will have to change my Read class. If I wanted to remove these definitions from the Read class and place them somewhere else, what would be a good approach?

    Thanks again


    Edit: I was thinking of putting the definition of how to recognize a string type in the class that defines the structure for that type. That would be a better place to keep it, since I need to add a new class for each type I define anyway. Say it is something simple like a regular expression; I'm just not sure how I would get it over to the Read class, which has an if/else-type statement for each type I want to check for.
    • Edited by sjs1978 Thursday, October 11, 2012 2:52 PM
    Thursday, October 11, 2012 2:30 PM
  • The regular expressions are strings, so you can make a new public variable that is a List<> of these strings. You can place it either outside a function or in another class.

    using System.Collections.Generic;

    public sealed class structures
    {
        public List<string> expressions = new List<string>();
    }
    public sealed class Read
    {
    }


    jdweng

    Thursday, October 11, 2012 2:58 PM
  • Is there a way to keep the expression in each class and have something populate the list automatically whenever I add a new type?

    I know this code is wrong, but something like this....

    class SupportedTypes{
    
    StringType1 type1;
    
    StringType2 type2;
    
    StringType3 type3;
    
    }
    
    class theList : SupportedTypes{
    
    List<regex> types;
    
    public compileList(){
    
    foreach(StringType in SupportedTypes)
    
     types.Add(StringType.typeRegex);
    
    }
    
    
    }

    Thursday, October 11, 2012 4:08 PM
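
One way to get the behaviour asked about here (each type keeps its own expression, and the list fills itself when a type is added) is to have every type implement a small interface and to discover the implementations by reflection. This is a hedged sketch, not an answer given in the thread: `IStringType`, `SupportedTypes.Discover` and `Recognise` are hypothetical names, and it assumes every type lives in the same assembly and has a public parameterless constructor.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical contract: each string type describes how to recognise itself.
public interface IStringType
{
    Regex Pattern { get; }
}

public sealed class StringType1 : IStringType
{
    private static readonly Regex pattern = new Regex(@"[0-9]{3}[:][0-9]{4}");
    public Regex Pattern { get { return pattern; } }
}

public sealed class StringType2 : IStringType
{
    private static readonly Regex pattern = new Regex(@"^[@]123");
    public Regex Pattern { get { return pattern; } }
}

public static class SupportedTypes
{
    // Finds every concrete IStringType in this assembly via reflection, so a
    // newly added type is picked up without editing the reader.
    public static List<IStringType> Discover()
    {
        return typeof(IStringType).Assembly.GetTypes()
            .Where(t => typeof(IStringType).IsAssignableFrom(t) && t.IsClass && !t.IsAbstract)
            .Select(t => (IStringType)Activator.CreateInstance(t))
            .ToList();
    }

    // Returns the first type whose pattern matches the line, or null if none does.
    public static IStringType Recognise(string line, IEnumerable<IStringType> types)
    {
        return types.FirstOrDefault(t => t.Pattern.IsMatch(line));
    }
}
```

The Read class would call `Discover` once and then `Recognise` for each line, replacing the if/else chain. The order in which reflection returns the types is not guaranteed, so the patterns should not overlap if the first match is to be unambiguous.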