none
Most effecient algorithm to find duplicate integers in random array( something that returns the value and index of the duplicate )

    Question

  • Hi guys,

    I know its a cliche question, but I really need to find a very optimised method of traversing an array of random int, no assumptions about those integers, except for one: they have duplicates.

    Now I need to go through them and find out the duplicate value and the index of that value. I tried something using a hashtable to keep stuff in but I am hoping someone will come up with something more effecient.

    Friday, October 14, 2011 2:40 PM

Answers

  • You have no choice but to traverse the array because you cannot assume it is sorted, and then keep record of the integers you are finding in a separate collection, like a hash table.

    Alternatively, but pretty much equivalent, you could sort the incoming array and then just spit out duplicates while you traverse it (array[i] == array[i + 1]).  Without some sort of order, you cannot complete this task at all, I would say.


    Jose R. MCP
    Friday, October 14, 2011 2:48 PM
  • Indexes of both values that are the same?

    You can create a Dictionary (TKey woluld be an index and TValue would be an actual value), then do a loop a double loop and add duplicated to the dictionary. And if someone found in loops, check if this one does not yet exist in dictioanary (if not, then add it to dictionary).

     


    Mitja
    Friday, October 14, 2011 3:03 PM
  •  

    Try this:

       Private Shared Sub ListDuplicates(ByVal values As Integer())
    
          Dim q = From value In values.Select(Function(v, index) New With {.value = v, .index = index}) _
                  Group By value.value Into Group _
                  Where Group.Count > 1
    
          For Each item In q
             Debug.Print(item.value.ToString)
             Debug.Indent()
             For Each item2 In item.Group
                Debug.Print(item2.index.ToString)
             Next
    
             Debug.Unindent()
          Next
    
       End Sub
      
    

    Test:

          Dim values As Integer() = {10, 3, 5, 1, 8, 3, 10, 2, 4, 8, 3}
    
          ListDuplicates(values)
    
    

    Don't know if it's "sufficient efficient". :)

    Armin
    Friday, October 14, 2011 3:10 PM
  • Check out this code I did:

     

                int[] array1 = { 5, 7, 8, 10 };
                int[] array2 = { 2, 5, 10, 12 };
                Dictionary<int, int> dic = new Dictionary<int, int>();
    
                for (int i = 0; i < array1.Length; i++)
                {
                    for (int j = 0; j < array2.Length; j++)
                    {
                        if (array1[i].Equals(array2[j]))
                        {
                            if (!dic.ContainsValue(array2[j]))
                            {
                                dic.Add(j, array2[j]);//j is here an index, and 2nd param is value
                                //index is taken from 2nd array!
                            }
                        }
                    }
                }
    



    Mitja
    Friday, October 14, 2011 4:02 PM
  • Try this code:

                int[] array1 = { 5, 7, 8, 10 };
                int[] array2 = { 10, 4, 5, 12 };
                Dictionary<int, int> dic = new Dictionary<int, int>();
    
                for (int i = 0; i < array1.Length; i++)
                {
                    for (int j = 0; j < array2.Length; j++)
                    {
                        if (array1[i].Equals(array2[j]))
                        {
                            if (!dic.ContainsValue(array2[j]))
                            {
                                dic.Add(j, array2[j]);//j is here an index, and 2nd param is value
                                //index is taken from 2nd array!
                            }
                        }
                    }
                }
    



    Result will be 2 items in dictionary (5 and 10).

    5 will be at index 2 (2nd item in 2nd array)

    10 will at index 0 (1st item in 2nd array).


    Mitja
    Friday, October 14, 2011 4:06 PM
  • Hey,

     

    So I would implement it like this:

     

     

    Dictionary<int, int> nums = //initialize
    loop through array:
    {
      if(nums.ContainsKey(value of array)
      {
         nums[value of array]++;
      }
      else
      {
        nums.add(value of array, 1);
      }
    }

     

     

    This is optimal and guessing what you implemented. It runs in O(n) (pretty much optimal afaik).

    Since the add, get, and contains key of the dictionary are O(1).

     

    Unless there is some trick that can be used based on the data or some helpful guessing O(n) is probably the best possbile.

     

     

    Thanks,

     

    Brad

     



    Friday, October 14, 2011 4:25 PM
  • You can use this method to solve your requirement if the integer collection is not very much huge
            static void Main(string[] args)
            {
                int[] source = { 1, 2, 4, 1, 2, 1, 1, 5, 8, 10, 10, 88, 999, 444, 343, 10, 100, 11, 99, 11, 99 };
    
                //<value, duplicated value's index list>
                Dictionary<int, List<int>> dic = new Dictionary<int, List<int>>();
    
                int count = source.Length;
                for (int i = 0; i < count; i++)
                {
                    if (!dic.ContainsKey(source[i]))
                    {
                        dic.Add(source[i], new List<int>());
                    }
                    dic[source[i]].Add(i);
                }
    
                //print
                foreach (int key in dic.Keys)
                {
                    Console.Write(key + "--");
                    dic[key].ForEach(delegate(int index)
                    {
                        Console.Write("\t" + index);
                    });
                    Console.WriteLine();
                }
    
                Console.ReadLine();
            }
    And if there's a very very huge collection, then I think you can use a sort algorithm to order it first, and then split the collection(ensure the same int value in the same sub collection, you can compare each sub colleciton's beginning number and the before collection's ending number). And then define the same number Dictionary to receive each collection. Then you can start multi threads to use my above algorithm find duplicate numbers and store them to each Dictionary. This is the parallel way for very very huge collection. My this way can avoid using locker, so the speed will fast, it will depends on the CPU cores only I think. But then you will need more work to consider the index storing when short them at the beginning.
    Mike [MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.



    Saturday, October 15, 2011 7:45 AM

All replies

  • Hi guys,

    I know its a cliche question, but I really need to find a very optimised method of traversing an array of random int, no assumptions about those integers, except for one: they have duplicates.

    Now I need to go through them and find out the duplicate value and the index of that value. I tried something using a hashtable to keep stuff in but I am hoping someone will come up with something more effecient.

    Friday, October 14, 2011 2:27 PM
  • Hi guys,

    I know its a cliche question, but I really need to find a very optimised method of traversing an array of random int, no assumptions about those integers, except for one: they have duplicates.

    Now I need to go through them and find out the duplicate value and the index of that value. I tried something using a hashtable to keep stuff in but I am hoping someone will come up with something more effecient.

    Friday, October 14, 2011 2:27 PM
  • You have no choice but to traverse the array because you cannot assume it is sorted, and then keep record of the integers you are finding in a separate collection, like a hash table.

    Alternatively, but pretty much equivalent, you could sort the incoming array and then just spit out duplicates while you traverse it (array[i] == array[i + 1]).  Without some sort of order, you cannot complete this task at all, I would say.


    Jose R. MCP
    Friday, October 14, 2011 2:48 PM
  • Indexes of both values that are the same?

    You can create a Dictionary (TKey woluld be an index and TValue would be an actual value), then do a loop a double loop and add duplicated to the dictionary. And if someone found in loops, check if this one does not yet exist in dictioanary (if not, then add it to dictionary).

     


    Mitja
    Friday, October 14, 2011 3:03 PM
  •  

    Try this:

       Private Shared Sub ListDuplicates(ByVal values As Integer())
    
          Dim q = From value In values.Select(Function(v, index) New With {.value = v, .index = index}) _
                  Group By value.value Into Group _
                  Where Group.Count > 1
    
          For Each item In q
             Debug.Print(item.value.ToString)
             Debug.Indent()
             For Each item2 In item.Group
                Debug.Print(item2.index.ToString)
             Next
    
             Debug.Unindent()
          Next
    
       End Sub
      
    

    Test:

          Dim values As Integer() = {10, 3, 5, 1, 8, 3, 10, 2, 4, 8, 3}
    
          ListDuplicates(values)
    
    

    Don't know if it's "sufficient efficient". :)

    Armin
    Friday, October 14, 2011 3:10 PM
  •  

    Try this:

       Private Shared Sub ListDuplicates(ByVal values As Integer())
    
          Dim q = From value In values.Select(Function(v, index) New With {.value = v, .index = index}) _
                  Group By value.value Into Group _
                  Where Group.Count > 1
    
          For Each item In q
             Debug.Print(item.value.ToString)
             Debug.Indent()
             For Each item2 In item.Group
                Debug.Print(item2.index.ToString)
             Next
    
             Debug.Unindent()
          Next
    
       End Sub
      
    

    Test:

          Dim values As Integer() = {10, 3, 5, 1, 8, 3, 10, 2, 4, 8, 3}
    
          ListDuplicates(values)
    
    

    Don't know if it's "sufficient efficient". :)

    Armin

    Armin,

    Interesting code piece, I am a C# programmer and I am looking for a vb.net to C# conversion tool, don't know if you have one already or you can type this in C#.

    Friday, October 14, 2011 3:16 PM
  • You have no choice but to traverse the array because you cannot assume it is sorted, and then keep record of the integers you are finding in a separate collection, like a hash table.

    Alternatively, but pretty much equivalent, you could sort the incoming array and then just spit out duplicates while you traverse it (array[i] == array[i + 1]).  Without some sort of order, you cannot complete this task at all, I would say.


    Jose R. MCP


    my solution was the first suggestion you made, building a dictionary and keeping records in it, and then if I ever get if(int exists in dictionary )

    {

    add int value and index to another list which maintains them.

    }

     

    I was just hoping to find a more effeiencet way.

    Friday, October 14, 2011 3:19 PM
  • Indexes of both values that are the same?

    You can create a Dictionary (TKey woluld be an index and TValue would be an actual value), then do a loop a double loop and add duplicated to the dictionary. And if someone found in loops, check if this one does not yet exist in dictioanary (if not, then add it to dictionary).

     


    Mitja


    hi Mitja, thanks for the reply,

    I kinda have that answer, except that in the first loop I add the array elements to the dictionary, and everytime I am about to add another array element I check to see if it exists in the dictionary, if it does, then I just add it to another data structure ( like a list, called duolicateds list ).

     

    I was hoping to see a better, more optimised way to perform this.

     

    Friday, October 14, 2011 3:55 PM
  • Check out this code I did:

     

                int[] array1 = { 5, 7, 8, 10 };
                int[] array2 = { 2, 5, 10, 12 };
                Dictionary<int, int> dic = new Dictionary<int, int>();
    
                for (int i = 0; i < array1.Length; i++)
                {
                    for (int j = 0; j < array2.Length; j++)
                    {
                        if (array1[i].Equals(array2[j]))
                        {
                            if (!dic.ContainsValue(array2[j]))
                            {
                                dic.Add(j, array2[j]);//j is here an index, and 2nd param is value
                                //index is taken from 2nd array!
                            }
                        }
                    }
                }
    



    Mitja
    Friday, October 14, 2011 4:02 PM
  • Try this code:

                int[] array1 = { 5, 7, 8, 10 };
                int[] array2 = { 10, 4, 5, 12 };
                Dictionary<int, int> dic = new Dictionary<int, int>();
    
                for (int i = 0; i < array1.Length; i++)
                {
                    for (int j = 0; j < array2.Length; j++)
                    {
                        if (array1[i].Equals(array2[j]))
                        {
                            if (!dic.ContainsValue(array2[j]))
                            {
                                dic.Add(j, array2[j]);//j is here an index, and 2nd param is value
                                //index is taken from 2nd array!
                            }
                        }
                    }
                }
    



    Result will be 2 items in dictionary (5 and 10).

    5 will be at index 2 (2nd item in 2nd array)

    10 will at index 0 (1st item in 2nd array).


    Mitja
    Friday, October 14, 2011 4:06 PM
  • Hey,

     

    So I would implement it like this:

     

     

    Dictionary<int, int> nums = //initialize
    loop through array:
    {
      if(nums.ContainsKey(value of array)
      {
         nums[value of array]++;
      }
      else
      {
        nums.add(value of array, 1);
      }
    }

     

     

    This is optimal and guessing what you implemented. It runs in O(n) (pretty much optimal afaik).

    Since the add, get, and contains key of the dictionary are O(1).

     

    Unless there is some trick that can be used based on the data or some helpful guessing O(n) is probably the best possbile.

     

     

    Thanks,

     

    Brad

     



    Friday, October 14, 2011 4:25 PM
  • Come on, this is the 3rd same thread!!!

    Please stop answering on this one!!

    1 is ENOUGH!


    Mitja
    Friday, October 14, 2011 4:31 PM
  • How do you define efficient?  A memory hog will be quicker than the algorithm that minimizes memory use  and vice-versa.
    Friday, October 14, 2011 9:54 PM
  • You can use this method to solve your requirement if the integer collection is not very much huge
            static void Main(string[] args)
            {
                int[] source = { 1, 2, 4, 1, 2, 1, 1, 5, 8, 10, 10, 88, 999, 444, 343, 10, 100, 11, 99, 11, 99 };
    
                //<value, duplicated value's index list>
                Dictionary<int, List<int>> dic = new Dictionary<int, List<int>>();
    
                int count = source.Length;
                for (int i = 0; i < count; i++)
                {
                    if (!dic.ContainsKey(source[i]))
                    {
                        dic.Add(source[i], new List<int>());
                    }
                    dic[source[i]].Add(i);
                }
    
                //print
                foreach (int key in dic.Keys)
                {
                    Console.Write(key + "--");
                    dic[key].ForEach(delegate(int index)
                    {
                        Console.Write("\t" + index);
                    });
                    Console.WriteLine();
                }
    
                Console.ReadLine();
            }
    And if there's a very very huge collection, then I think you can use a sort algorithm to order it first, and then split the collection(ensure the same int value in the same sub collection, you can compare each sub colleciton's beginning number and the before collection's ending number). And then define the same number Dictionary to receive each collection. Then you can start multi threads to use my above algorithm find duplicate numbers and store them to each Dictionary. This is the parallel way for very very huge collection. My this way can avoid using locker, so the speed will fast, it will depends on the CPU cores only I think. But then you will need more work to consider the index storing when short them at the beginning.
    Mike [MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.



    Saturday, October 15, 2011 7:45 AM
  • I am writing to check the status of the issue on your side. 
    What about this problem now? 
    Would you mind letting us know the result of the suggestions?

    Mike [MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Wednesday, October 19, 2011 9:04 AM