How to Find Duplicate Values in an ArrayList

    Question

  • Hi everybody,

    I have an ArrayList that contains thousands of strings. How can I find the duplicate items in this ArrayList?



    Thanks
    Sunday, November 12, 2006 5:16 PM

All replies

  • One obvious way is to iterate through the items and check each one against the value you are looking for. Record the first match in a local bool; if the loop finds the value again, you know you have a duplicate. Example:

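    bool found = false;

    // "value" is the string you want to check for duplicates
    foreach (string currentItem in theArrayList)
    {
        if (currentItem.Equals("value"))
        {
            if (!found)
            {
                found = true;   // first occurrence
            }
            else
            {
                MessageBox.Show("Duplicate found");   // second or later occurrence
            }
        }
    }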

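    Obviously this isn't a good approach: it checks for only one value per pass, so you would need a pass over the whole list for every value you want to check, which is slow when you have a lot of items.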

    Another way I would suggest: when you add an item, use the ArrayList's .Contains() method to check whether the item already exists, and if it does, don't add it.
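    ArrayList.Contains() does a linear search, so checking every new item this way gets slower as the list grows. Below is a minimal sketch of the same "check before adding" idea, assuming the items are strings and that you can keep a HashSet<string> (available from .NET 3.5) alongside the list. The class UniqueStringList and its AddIfNew method are hypothetical names used only for illustration:

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper: keeps an ArrayList free of duplicate strings by
// tracking the values already added in a HashSet<string>.
public class UniqueStringList
{
    private readonly ArrayList items = new ArrayList();
    private readonly HashSet<string> seen = new HashSet<string>();

    public ArrayList Items { get { return items; } }

    // Adds the value only if it has not been added before.
    // HashSet<string>.Add returns false when the value is already present.
    public bool AddIfNew(string value)
    {
        if (!seen.Add(value))
        {
            return false; // duplicate: leave the ArrayList unchanged
        }
        items.Add(value);
        return true;
    }
}
```

    For example, calling AddIfNew("a"), AddIfNew("b") and AddIfNew("a") returns true, true and false, and leaves two items in Items.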

    Sunday, November 12, 2006 5:47 PM
    Moderator
  • Thanks ahmedilyas

    I found a good example.
    http://www.experts-exchange.com/Programming/Programming_Languages/C_Sharp/Q_21083206.html


    I have to sort the ArrayList first and then compare contacts[i] with contacts[i-1]:

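    contacts.Sort();

    // Sorting groups equal strings together,
    // so each item only needs to be compared with the one before it.
    for (int i = 1; i < contacts.Count; i++)
    {
        // Debug output: the pair being compared
        Console.WriteLine(contacts[i]);
        Console.WriteLine(contacts[i - 1]);
        if (contacts[i].ToString() == contacts[i - 1].ToString())
        {
            Console.WriteLine("Duplicate: " + contacts[i]);
        }
    }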
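    A variation on this sort-then-compare idea can return the duplicate values instead of printing every pair. This is a minimal sketch, assuming the ArrayList holds only non-null strings; SortedDuplicateFinder and FindDuplicatesSorted are hypothetical names. It sorts a copy so the caller's order is kept, uses an ordinal sort so strings that are equal under == always end up next to each other, and reports a value once even if it appears three or more times:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class SortedDuplicateFinder
{
    // Returns each value that occurs more than once, in sorted order.
    public static List<string> FindDuplicatesSorted(ArrayList source)
    {
        // Sort a copy so the caller's ArrayList keeps its original order.
        ArrayList sorted = new ArrayList(source);
        sorted.Sort(StringComparer.Ordinal);

        List<string> duplicates = new List<string>();
        for (int i = 1; i < sorted.Count; i++)
        {
            string current = (string)sorted[i];
            string previous = (string)sorted[i - 1];

            // Equal neighbours mean a duplicate; skip it if this value
            // was already recorded (a third or later copy).
            if (current == previous &&
                (duplicates.Count == 0 || duplicates[duplicates.Count - 1] != current))
            {
                duplicates.Add(current);
            }
        }
        return duplicates;
    }
}
```

    Calling it on an ArrayList containing "b", "a", "c", "a", "b", "a" returns a list containing "a" and "b".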
    Sunday, November 12, 2006 6:39 PM
  • of course! :-)
    Sunday, November 12, 2006 6:43 PM
    Moderator
  • I was using the method above:

    contacts.Sort();

    for (int i = 1; i < contacts.Count; i++)
    {
        Console.WriteLine(contacts[i]);
        Console.WriteLine(contacts[i - 1]);
        if (contacts[i].ToString() == contacts[i - 1].ToString())
        {
            Console.WriteLine("Duplicate: " + contacts[i]);
        }
    }

    It finds duplicates, but it seems to find only the FIRST duplicate. Any ideas?
    Saturday, December 01, 2007 9:05 PM
  • Try using two for loops to find duplicates in the entire ArrayList:

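    // Compare every item with every item after it.
    // A value that appears more than twice is reported once per matching pair.
    for (int i = 0; i < list.Count; i++)
    {
        for (int j = i + 1; j < list.Count; j++)
        {
            // object.Equals compares values and tolerates null entries
            if (object.Equals(list[i], list[j]))
            {
                Console.WriteLine("The duplicate found is " + list[i]);
            }
        }
    }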

    I think this will work, but I'm not sure.

    Saturday, November 03, 2012 5:03 AM
  • You could use LINQ:

    using System;
    using System.Collections;
    using System.Linq;

    internal static class Program
    {
        private static void Main()
        {
            var list = new ArrayList { "AAAAA", "AAAAA", "BBBBB", "CCCCC", "DDDDD", "EEEEE", "FFFFF", "DDDDD" };

            // Group identical strings and keep the groups that occur two or more times.
            var duplicateGroups =
                (from string item in list select item)
                    .GroupBy(s => s)
                    .Select(group => new { Word = group.Key, Count = group.Count() })
                    .Where(x => x.Count >= 2);

            foreach (var duplicate in duplicateGroups)
            {
                Console.WriteLine(duplicate.Word);
            }

            // Wait for user
            Console.ReadKey();
        }
    }

    This finds all strings that are present two or more times.
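    LINQ requires .NET Framework 3.5 or later. Below is a sketch of the same count-and-filter idea without LINQ, using a Dictionary<string, int> (available since .NET 2.0) and one pass over the list. DuplicateCounter and CountDuplicates are hypothetical names, and the list is assumed to contain only non-null strings (a null key would make the Dictionary throw):

```csharp
using System.Collections;
using System.Collections.Generic;

public static class DuplicateCounter
{
    // Returns every string that occurs two or more times, mapped to its count.
    public static Dictionary<string, int> CountDuplicates(ArrayList list)
    {
        // Count how often each string occurs.
        Dictionary<string, int> counts = new Dictionary<string, int>();
        foreach (string item in list)
        {
            int count;
            counts.TryGetValue(item, out count); // count is 0 if item is new
            counts[item] = count + 1;
        }

        // Keep only the values seen at least twice.
        Dictionary<string, int> duplicates = new Dictionary<string, int>();
        foreach (KeyValuePair<string, int> pair in counts)
        {
            if (pair.Value >= 2)
            {
                duplicates.Add(pair.Key, pair.Value);
            }
        }
        return duplicates;
    }
}
```

    With the list from the LINQ example, it returns AAAAA and DDDDD, each with a count of 2.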

    Hope this helps.

    Extra:

    For fun I did some performance testing (on my Intel i7 laptop with 16 GB of RAM). I cancelled a solution using two nested foreach loops after 10 minutes.

    Elapsed time (hh:mm:ss):

    Items        LINQ                       Sort and compare with previous
    1,000,000    00:00:00.0004105           00:00:03.9922215
    5,000,000    00:00:00.0004273           00:00:20.9535277
    10,000,000   out of memory exception    00:00:41.9856014
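    LINQ queries use deferred execution: building the query does no work, and the grouping only runs when the result is enumerated (for example by foreach or ToList()). When timing a LINQ solution, the stopwatch should therefore stop only after the result has been enumerated. Below is a minimal sketch of such a measurement; the class DuplicateTiming, the generated test data and the item count are assumptions for illustration, not the setup used for the figures above:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class DuplicateTiming
{
    // Builds a hypothetical test list in which every value appears twice.
    public static ArrayList BuildList(int distinctValues)
    {
        ArrayList list = new ArrayList();
        for (int i = 0; i < distinctValues; i++)
        {
            list.Add("item" + i);
            list.Add("item" + i);
        }
        return list;
    }

    // Times the LINQ grouping, including enumeration of its result.
    public static List<string> FindWithLinq(ArrayList list, out TimeSpan elapsed)
    {
        Stopwatch watch = Stopwatch.StartNew();

        // ToList() forces the deferred query to run before the watch stops.
        List<string> duplicates = list.Cast<string>()
            .GroupBy(s => s)
            .Where(g => g.Count() >= 2)
            .Select(g => g.Key)
            .ToList();

        watch.Stop();
        elapsed = watch.Elapsed;
        return duplicates;
    }
}
```

    Printing elapsed after a call such as FindWithLinq(BuildList(500000), out elapsed) then reports the time spent on the grouping itself rather than only on building the query.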

    • Edited by KeesDijk Sunday, November 04, 2012 12:27 PM
    Saturday, November 03, 2012 9:18 AM