Search very large files

  • Question

  • I want to search for specific text in a very large file and get the matching line number.

    Is there a solution, algorithm, or tool to achieve this in .NET C#?

    Tuesday, March 5, 2013 12:24 PM

All replies

  • Use a regular expression to search. The trickiest part is building the regular expression from the search text.

    1. Read the full text of the file into a string.

    2. Compile a regular expression based on the search condition.

    You can use a regular expression building tool; try Expresso:

    http://www.ultrapico.com/Expresso.htm
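For illustration, the two steps above might be sketched as follows. This is only a minimal sketch that assumes a literal search string and a file small enough to fit in memory as a single string; the class and method names (`WholeFileRegexSearch`, `FindFirstMatchLine`) are hypothetical and not from the thread.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class WholeFileRegexSearch
{
    // Returns the 1-based line number of the first match, or -1 if there is none.
    public static int FindFirstMatchLine(string text, string searchText)
    {
        // Regex.Escape makes characters such as '.' or '*' match literally.
        var regex = new Regex(Regex.Escape(searchText), RegexOptions.CultureInvariant);
        Match match = regex.Match(text);
        if (!match.Success)
        {
            return -1;
        }

        // The line number is one plus the number of line feeds before the match.
        int line = 1;
        for (int i = 0; i < match.Index; i++)
        {
            if (text[i] == '\n')
            {
                line++;
            }
        }
        return line;
    }
}
```

A call could look like `WholeFileRegexSearch.FindFirstMatchLine(File.ReadAllText(path), "Hello")`, where `path` is whatever file you are searching. Because the whole file is held in one string, this is likely to be a poor fit once the file gets very large.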


    It all Happenz Sendil

    • Proposed as answer by sendilg Thursday, March 21, 2013 6:43 AM
    Tuesday, March 5, 2013 1:12 PM
  • You could use a regex, but if you need the line number then running a regex over the entire file won't work as well. Assuming this is a text file, you'll end up streaming the file with StreamReader: read one line at a time and search each line using Regex, as sendilg mentioned. Of course, if you're just looking for raw text ("Hello"), it might be quicker to use IndexOf on the line. If you need something more complex (begin/end elements), then Regex is the way to go. Streaming is an efficient way to read even very large files, because only a small part of the file is in memory at any time. Note also that you can find multiple matching lines if necessary.
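The line-at-a-time approach described above could look roughly like this. It is a sketch under a few assumptions: a plain-text file, a literal search string, and ordinal (case-sensitive) comparison. The names `LineSearch` and `FindMatchingLines` are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, for illustration only.
public static class LineSearch
{
    // Lazily yields the 1-based number of every line that contains searchText.
    public static IEnumerable<int> FindMatchingLines(TextReader reader, string searchText)
    {
        int lineNumber = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            // IndexOf is enough for raw text; swap in Regex.IsMatch for patterns.
            if (line.IndexOf(searchText, StringComparison.Ordinal) >= 0)
            {
                yield return lineNumber;
            }
        }
    }

    // Convenience overload that streams a file from disk.
    public static IEnumerable<int> FindMatchingLines(string path, string searchText)
    {
        using (var reader = new StreamReader(path))
        {
            foreach (int n in FindMatchingLines(reader, searchText))
            {
                yield return n;
            }
        }
    }
}
```

Only one line is held in memory at a time, so memory use depends on the longest line rather than on the file size. A file with no line breaks at all would still be read as one huge line.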

    If this is an XML file then use XPathDocument instead.

    Michael Taylor - 3/5/2013
    http://msmvps.com/blogs/p3net

    Tuesday, March 5, 2013 4:20 PM
    Moderator
  • If the data in the text file is in columns, you could connect to the text file and use a SQL expression. Actually, you could treat the entire row as a single field and perform the search. Opening a file takes a lot of resources. When you connect to a file, the file isn't opened, which may be more efficient.

    jdweng

    Tuesday, March 5, 2013 4:34 PM
  • I wouldn't recommend using regex, since you stated that you need to deal with a "very large file".

    In C#, you would typically use an I/O stream to perform the check. If you only want to locate a specific line, I guess you can just check for the line-feed/line-break characters as you go through the stream.

    Do something like the SQL fast-forward concept: use the stream and only search forward, never backward.

    Read the file content through the stream chunk by chunk, one small piece at a time (like buffered reading).
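A forward-only, chunk-by-chunk search along these lines might be sketched as below. The names (`ChunkedSearch`, `FindFirstMatchLine`, `chunkSize`) are hypothetical, and the carry-over of the last few characters between chunks is my own assumption about how to handle a match that straddles a chunk boundary.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
public static class ChunkedSearch
{
    // Returns the 1-based line number where the first match starts, or -1.
    public static int FindFirstMatchLine(TextReader reader, string searchText, int chunkSize = 64 * 1024)
    {
        if (string.IsNullOrEmpty(searchText))
        {
            throw new ArgumentException("Search text must not be empty.", nameof(searchText));
        }

        char[] buffer = new char[chunkSize];
        string carry = "";   // tail of the previous window, kept for boundary matches
        int line = 1;        // line number at the first character of the window
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            string window = carry + new string(buffer, 0, read);
            int index = window.IndexOf(searchText, StringComparison.Ordinal);
            if (index >= 0)
            {
                return line + CountLineFeeds(window, index);
            }

            // Keep the last (length - 1) characters so a match split across
            // two chunks is still found on the next iteration.
            int keep = Math.Min(searchText.Length - 1, window.Length);
            int consumed = window.Length - keep;
            line += CountLineFeeds(window, consumed);
            carry = window.Substring(consumed);
        }
        return -1;
    }

    // Counts '\n' characters in the first `count` characters of `text`.
    private static int CountLineFeeds(string text, int count)
    {
        int total = 0;
        for (int i = 0; i < count; i++)
        {
            if (text[i] == '\n')
            {
                total++;
            }
        }
        return total;
    }
}
```

With a file, you would pass something like `new StreamReader(path)` as the reader. Memory use is bounded by the chunk size plus the length of the search text, regardless of how long the lines are.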


    • Edited by Kelmen Wednesday, March 6, 2013 5:36 AM
    Wednesday, March 6, 2013 5:36 AM
  • Thanks all!

    The solutions are focused around finding the text in the file.

    I have a problem reading larger files: after a long wait, reading throws memory-related exceptions. What is the best way to read such a file and then search the text?


    As there are many ways to read a file in .NET, what is the best in this scenario?
    Wednesday, March 6, 2013 6:13 AM
  • Hi Shanthi,

    Welcome to the MSDN Forum.

    Try this similar thread: http://social.msdn.microsoft.com/Forums/en-US/csharplanguage/thread/3a6e57d7-48a7-40da-9d62-56a308243cdb/  

    http://social.msdn.microsoft.com/Forums/en-US/csharplanguage/thread/70784fe4-2b89-4d58-ae05-38bff94c3006/ 

    And this topic covers working with big files: http://msdn.microsoft.com/en-us/library/dd997372.aspx

    A memory-mapped file contains the contents of a file in virtual memory. This mapping between a file and memory space enables an application, including multiple processes, to modify the file by reading and writing directly to the memory. Starting with the .NET Framework 4, you can use managed code to access memory-mapped files in the same way that native Windows functions access memory-mapped files, as described in Managing Memory-Mapped Files in Win32 in the MSDN Library.
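As a rough sketch of how a memory-mapped file could be used for this search: the example below maps the file read-only and reads lines through a view stream. It assumes a UTF-8 text file and a literal search string; `MappedFileSearch` and `FindFirstMatchLine` are hypothetical names, not part of the linked documentation.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

// Hypothetical helper, for illustration only.
public static class MappedFileSearch
{
    // Returns the 1-based line number of the first matching line, or -1.
    public static int FindFirstMatchLine(string path, string searchText)
    {
        long length = new FileInfo(path).Length;
        if (length == 0)
        {
            return -1; // an empty file cannot be mapped
        }

        // Capacity 0 maps the file at its current size; access is read-only.
        using (var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        // Passing the exact length avoids reading page padding past the end of the file.
        using (var view = mmf.CreateViewStream(0, length, MemoryMappedFileAccess.Read))
        using (var reader = new StreamReader(view, Encoding.UTF8))
        {
            int lineNumber = 0;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lineNumber++;
                if (line.IndexOf(searchText, StringComparison.Ordinal) >= 0)
                {
                    return lineNumber;
                }
            }
        }
        return -1;
    }
}
```

For a single sequential pass like this, a plain StreamReader may do just as well; memory mapping tends to pay off more when you need random access or want to share the file between processes.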

    I hope this will be helpful.

    Best regards,


    Mike Feng

    Wednesday, March 6, 2013 9:12 AM
    Moderator