Remove Duplicate Lines from Text File

    Question

  • I have a text file that contains between 4,000 and 5,000 email addresses (some of which are duplicates). What is the quickest way to get rid of those duplicates?

    Thanks!
    Friday, September 19, 2008 7:54 PM

Answers

  • Something like this would work:

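    using System.Collections.Generic;
    using System.IO;

    List<string> emailAddresses = new List<string>();
    // Read the file line by line, keeping only the first occurrence of each address.
    using (StringReader reader = new StringReader(File.ReadAllText(@"C:\myFile.txt")))
    {
        string line = null;
        while ((line = reader.ReadLine()) != null)
            if (!emailAddresses.Contains(line))   // linear search of the addresses kept so far
                emailAddresses.Add(line);
    }
    // FileMode.Create truncates the file, so it is overwritten with only the unique lines.
    using (StreamWriter writer = new StreamWriter(File.Open(@"C:\myFile.txt", FileMode.Create)))
        foreach (string value in emailAddresses)
            writer.WriteLine(value);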

    David Morton - http://blog.davemorton.net/
    • Marked as answer by jack 321 Wednesday, September 24, 2008 3:32 AM
    Friday, September 19, 2008 8:29 PM
    Moderator
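    For 4,000 to 5,000 lines the List.Contains check above is fast enough, but it scans the whole list for every line, so the work grows roughly with the square of the file size. A minimal sketch of the same idea using a HashSet<string> to remember which addresses have already been seen, so each check is a hash lookup instead. The class and method names (EmailDeduper, RemoveDuplicates, RemoveDuplicateLines) are hypothetical, and the comparison is exact, so "Bob@example.com" and "bob@example.com" still count as different lines.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper class: order-preserving de-duplication with a HashSet.
public static class EmailDeduper
{
    // Returns the lines in their original order, keeping only the first
    // occurrence of each one (exact, case-sensitive comparison).
    public static List<string> RemoveDuplicates(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>();
        var unique = new List<string>();
        foreach (string line in lines)
        {
            // HashSet<T>.Add returns false when the value is already present.
            if (seen.Add(line))
                unique.Add(line);
        }
        return unique;
    }

    // Reads the whole file into memory first, then overwrites it with the unique lines.
    public static void RemoveDuplicateLines(string path)
    {
        string[] lines = File.ReadAllLines(path);
        File.WriteAllLines(path, RemoveDuplicates(lines).ToArray());
    }
}
```

    Usage would be something like EmailDeduper.RemoveDuplicateLines(@"C:\myFile.txt"). Because the whole file is read before it is rewritten, the same path can safely be used for input and output.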

All replies

  • You could read them into an array and check each entry before adding it, or just add them all, sort the array, and then remove any entry that is identical to the one right before it (after sorting, duplicates end up back to back). Then write them back to the text file.
    Vince -- http://sportriders.ca
    Friday, September 19, 2008 8:21 PM
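    A minimal sketch of the "sort, then drop back-to-back duplicates" approach from the reply above. One thing to be aware of: the output comes out in sorted order rather than the original file order. The class and method names (SortDeduper, SortAndRemoveDuplicates, SortAndRemoveDuplicateLines) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper class: de-duplication by sorting and comparing neighbours.
public static class SortDeduper
{
    public static List<string> SortAndRemoveDuplicates(IEnumerable<string> lines)
    {
        var sorted = new List<string>(lines);
        // Ordinal comparison treats two strings as equal only if they are identical,
        // so every group of duplicates ends up next to each other.
        sorted.Sort(StringComparer.Ordinal);

        var unique = new List<string>();
        for (int i = 0; i < sorted.Count; i++)
        {
            // After sorting, a duplicate can only match the entry just before it.
            if (i == 0 || sorted[i] != sorted[i - 1])
                unique.Add(sorted[i]);
        }
        return unique;
    }

    // Reads the whole file, then overwrites it with the sorted, unique lines.
    public static void SortAndRemoveDuplicateLines(string path)
    {
        string[] lines = File.ReadAllLines(path);
        File.WriteAllLines(path, SortAndRemoveDuplicates(lines).ToArray());
    }
}
```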
  • Hi David,

    Works great!  Thank you so much!

    Friday, September 19, 2008 8:38 PM
  • Wow, thank you very much, it helped me :) Great effort, keep it up.
    Wednesday, June 16, 2010 1:31 PM
  • Hmm thread necro. :)

    Well anyway, if you liked that solution, you might like this one even better!

    Using LINQ to solve the problem in one line, assuming that the string variable filename contains the path to the file:

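    // Needs using System.IO; and using System.Linq; this File.WriteAllLines overload (IEnumerable<string>) requires .NET 4.0 or later.
    File.WriteAllLines(filename, File.ReadAllLines(filename).Distinct());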

    So for example, if the file contains this:

    One
    Two
    Three
    One
    Four
    Three
    Two
    Five
    Four
    Six
    Three
    Seven
    Two
    Five
    Four
    Eight
    Seven
    Nine
    Three
    Ten
    Two
    Eight
    Four
    Eleven
    Seven
    Twelve
    Nine
    Ten
    Thirteen
    Four
    Fourteen
    Fifteen
    Three
    Twelve
    Fifteen
    Fourteen
    Thirteen
    Five
    Fifteen
    Fourteen
    Eight
    Eleven

    It will be changed to contain this:

    One
    Two
    Three
    Four
    Five
    Six
    Seven
    Eight
    Nine
    Ten
    Eleven
    Twelve
    Thirteen
    Fourteen
    Fifteen
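    Distinct() compares lines exactly, so "Bob@Example.com" and "bob@example.com" would both survive, as would a line with a trailing space. If addresses should count as duplicates regardless of letter case (an assumption about the data; strictly, only the domain part of an email address is case-insensitive), Distinct has an overload that takes an IEqualityComparer<string>. A minimal sketch that also trims whitespace and skips blank lines; the class and method names (LinqDeduper, DistinctIgnoreCase, RemoveDuplicateLines) are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper class: case-insensitive variant of the one-line LINQ solution.
public static class LinqDeduper
{
    // Trims each line, drops blank lines, and removes duplicates ignoring letter case.
    // In practice Distinct keeps the first occurrence, although its documentation
    // does not formally promise any ordering.
    public static string[] DistinctIgnoreCase(string[] lines)
    {
        return lines
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }

    // ReadAllLines loads the whole file before WriteAllLines reopens it for writing.
    public static void RemoveDuplicateLines(string filename)
    {
        File.WriteAllLines(filename, DistinctIgnoreCase(File.ReadAllLines(filename)));
    }
}
```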

    Wednesday, June 16, 2010 3:30 PM