Find duplicates in an array

    Question

  • Hi people!

    I have a huge unsorted array of strings like

    vector = {"2421024141", "325216182","2463112099","2416997168","11114721047","4116940195","1191138134","231244164123 ",..........}

    and I want to store, in another String array, each distinct value and the positions where that value is repeated (duplicates).

    Let's say:

    duplicates={{vector_value, position, position,.....},{vector_value, position, position,.....}, and so on......}

    I made a copy of vector and sorted it:

    vector.CopyTo(arrayCopied,0)

    Array.Sort(arrayCopied)

    Then I loop like this:

    For x = 0 To vector.Length - 1
        For y = 0 To vector.Length - 1
            If vector(x) = arrayCopied(x) Then
                'Found duplicate

                'How to save values and positions?????

            End If
        Next
    Next

    My English is not very good, sorry; I can't explain myself any better!

    Any help is really welcome.

    Thanks a lot.


    Friday, March 23, 2012 7:46 PM

Answers

  • To get the duplicates and their indexes in the array:

        Public Sub ListDuplicates(ByVal sender As String())
            Dim q = From value In sender.Select(Function(v, index) New With {.value = v.ToUpper, .index = index}) _
                    Group By value.value Into Group _
                    Where Group.Count > 1
            For Each item In q
                ' item with duplicate
                Console.WriteLine(item.value.ToString)
                ' index in the List
                For Each item2 In item.Group
                    Console.WriteLine("{0,4}", item2.index.ToString)
                Next
            Next
        End Sub

    Usage

    Dim SomeArray As String() = {"11111", "23456", "11111", "23456", "87356", "12345", "12345", "88888"}
    ListDuplicates(SomeArray)

    You could turn the procedure into a function that returns each item together with its indexes, but that becomes more complex than I think is needed. Other options can be thought through via http://msdn.microsoft.com/en-us/vstudio/bb737918
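
    For readers who need to keep the results rather than print them, here is a minimal sketch of a function form that does not use LINQ. The module name DuplicateFinderSketch and the function name FindDuplicatePositions are hypothetical, not from this thread, and StringComparer.OrdinalIgnoreCase is an assumption that only approximates the ToUpper comparison used above.

```vb
Imports System
Imports System.Collections.Generic

Module DuplicateFinderSketch
    ' Hypothetical helper (not from the thread): maps each repeated value
    ' to the list of positions where it occurs in the source array.
    Public Function FindDuplicatePositions(ByVal source As String()) As Dictionary(Of String, List(Of Integer))
        ' Assumption: case-insensitive matching, similar to the ToUpper call in the LINQ version.
        Dim positions As New Dictionary(Of String, List(Of Integer))(StringComparer.OrdinalIgnoreCase)

        ' Single pass over the array: record the index of every value.
        For i As Integer = 0 To source.Length - 1
            Dim indexes As List(Of Integer) = Nothing
            If Not positions.TryGetValue(source(i), indexes) Then
                indexes = New List(Of Integer)
                positions.Add(source(i), indexes)
            End If
            indexes.Add(i)
        Next

        ' Keep only the values that were seen more than once.
        Dim duplicates As New Dictionary(Of String, List(Of Integer))(StringComparer.OrdinalIgnoreCase)
        For Each pair As KeyValuePair(Of String, List(Of Integer)) In positions
            If pair.Value.Count > 1 Then
                duplicates.Add(pair.Key, pair.Value)
            End If
        Next
        Return duplicates
    End Function

    Sub Main()
        Dim SomeArray As String() = {"11111", "23456", "11111", "23456", "87356", "12345", "12345", "88888"}
        For Each pair As KeyValuePair(Of String, List(Of Integer)) In FindDuplicatePositions(SomeArray)
            ' Prints the value followed by its comma-separated positions.
            Console.WriteLine("{0}: {1}", pair.Key, String.Join(",", pair.Value))
        Next
    End Sub
End Module
```

    This walks the array once instead of using two nested loops, and the returned dictionary can be kept in a form-level field and read later.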

    Hope this helps you on your way.


    KSG

    Friday, March 23, 2012 8:05 PM
  • Something like this should do:

        Public Function ListDuplicates(ByVal sender As String()) As IList
            Dim duplicates = From value In sender.Select(Function(v, index) New With {.value = v.ToUpper, .index = index}) _
                             Group By value.value Into Group _
                             Where Group.Count > 1
    
            Dim result = From item In duplicates _
                         Select New With {.Value = item.value, _
                                          .Index = Join((From g In item.Group Select CStr(g.index)).ToArray, ",")}
            Return result.ToList
        End Function
    
        Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
            Dim SomeArray As String() = {"11111", "23456", "11111", "23456", "87356", "12345", "12345", "88888"}
            Dim duplicatesArray = ListDuplicates(SomeArray)
    
            '' test to see what is in duplicatesArray
            For Each item In duplicatesArray
                MessageBox.Show(item.Value & vbTab & item.Index)
            Next
        End Sub


    Pradeep, Microsoft MVP (Visual Basic)
    http://pradeep1210.wordpress.com

    Friday, March 23, 2012 9:46 PM
  • Here is a way to try out both my code and pradeep1210's code; each has its merits. On a Windows form, place two DataGridView controls, each with two columns.

    Form code

    Public Class YourFormName
        Private SomeArray As String() = _
            { _
                "11111", "23456", "11111", "23456", "87356", "12345", "11111", "12345", "88888" _
            }
        Private Sub ExecuteDemo()
            Dim Items1 = SomeArray.ListDuplicates1
            For Each Ele In Items1
                For row As Integer = 0 To Ele.List.Count - 1
                    If row = 0 Then
                        DataGridView1.Rows.Add(New Object() {Ele.Item, Ele.List(row)})
                    Else
                        DataGridView1.Rows.Add(New Object() {Nothing, Ele.List(row)})
                    End If
                Next
            Next
            Dim Items2 = SomeArray.ListDuplicates2
            For Each Ele In Items2
                DataGridView2.Rows.Add(New Object() {Ele.Value, Ele.Index})
            Next
        End Sub
        Private Sub YourFormName_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
            ExecuteDemo()
        End Sub
    End Class

    Place the following code into a code module, not a form.

    Module DuplicateListerCode
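        ''' <summary>
        ''' Lists each value that occurs more than once in the array, with the indexes where it occurs.
        ''' </summary>
        ''' <param name="sender">Array to search; values are compared after ToUpper.</param>
        ''' <returns>One DuplicateItem per duplicated value.</returns>
        ''' <remarks>
        ''' Kevininstructor code
        ''' </remarks>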
        <System.Diagnostics.DebuggerStepThrough()> _
        <System.Runtime.CompilerServices.Extension()> _
        Public Function ListDuplicates1(ByVal sender As String()) As List(Of DuplicateItem)
            Dim Result As New List(Of DuplicateItem)
            Dim q = From value In sender.Select(Function(v, index) New With {.value = v.ToUpper, .index = index}) _
                    Group By value.value Into Group _
                    Where Group.Count > 1
            For Each item In q
                Dim LineList As New List(Of Int32)
                For Each item2 In item.Group
                    LineList.Add(item2.index)
                Next
                Result.Add(New DuplicateItem With {.Item = item.value, .List = LineList})
            Next
            Return Result
        End Function
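        ''' <summary>
        ''' Lists each value that occurs more than once in the array, with its indexes joined into one comma-separated string.
        ''' </summary>
        ''' <param name="sender">Array to search; values are compared after ToUpper.</param>
        ''' <returns>One DuplicateItem1 per duplicated value (query is evaluated when enumerated).</returns>
        ''' <remarks>
        ''' Original author pradeep1210
        ''' http://social.msdn.microsoft.com/Forums/en-US/vblanguage/thread/8044ccfe-f19b-4af7-8a13-5a623fe3953f
        ''' Minor tweak by Kevininstructor
        ''' </remarks>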
        <System.Diagnostics.DebuggerStepThrough()> _
        <System.Runtime.CompilerServices.Extension()> _
        Public Function ListDuplicates2(ByVal sender As String()) As IEnumerable(Of DuplicateItem1)
            Dim duplicates = From value In sender.Select(Function(v, index) New With {.value = v.ToUpper, .index = index}) _
                             Group By value.value Into Group _
                             Where Group.Count > 1
            Dim result = From item In duplicates _
                         Select New DuplicateItem1 With {.Value = item.value, _
                                          .Index = Join((From g In item.Group Select CStr(g.index)).ToArray, ",")}
            Return result
        End Function
        ' Both classes done under VS2010 auto-implement properties
        ' If using a version of VS below VS2010 you would need to write out the
        ' properties i.e. Set and Get for each property
        Public Class DuplicateItem
            Public Property Item As String
            Public Property List As New List(Of Int32)
            Public Sub New()
            End Sub
            Public Overrides Function ToString() As String
                Return String.Join(",", List.ToArray)
            End Function
        End Class
        Public Class DuplicateItem1
            Public Property Value As String
            Public Property Index As String
            Public Sub New()
            End Sub
        End Class
    End Module


    KSG

    Saturday, March 24, 2012 1:35 AM
  • I would have used LINQ as well, great choice :) For removing all duplicates, the Distinct keyword will come in handy.

    Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
    	Dim vector As Integer() = {235236, 236644, 33333, 45745, 33333, 44677, 33333, 44677}
    	RemDups(vector)
    	Console.WriteLine(String.Join(", ", vector))
    End Sub
    
    Private Sub RemDups(ByRef Input_Array As Integer())
    	' Replace the caller's array with one that holds each value only once
    	Input_Array = (From Obj In Input_Array Distinct Select Obj).ToArray
    End Sub
    End Sub
    Here's an easier way to return a list of all the different values only once.
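
    As a variation for the string array in the original question, here is a minimal sketch that returns the distinct values as a new array instead of changing the input through ByRef. The names DistinctSketch and DistinctValues are hypothetical, and the case-insensitive comparer is an assumption chosen to resemble the ToUpper comparison used in the other answers.

```vb
Imports System
Imports System.Linq

Module DistinctSketch
    ' Hypothetical helper (not from the thread): returns a new array holding
    ' each different value once and leaves the input array untouched.
    Public Function DistinctValues(ByVal source As String()) As String()
        ' OrdinalIgnoreCase is an assumption; omit it for case-sensitive matching.
        Return source.Distinct(StringComparer.OrdinalIgnoreCase).ToArray()
    End Function

    Sub Main()
        Dim vector As String() = {"11111", "23456", "11111", "23456", "87356"}
        Console.WriteLine(String.Join(", ", DistinctValues(vector)))
    End Sub
End Module
```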


    If a post helps you in any way or solves your particular issue, please remember to use the Propose As Answer option or Vote As Helpful
    ~ "The universe is an intelligence test." - Timothy Leary ~




    Saturday, March 24, 2012 4:40 AM
  • Hello Ace,

    I agree: if the OP simply wanted to remove duplicates, my suggestion would be overkill, but the OP wanted the indexes of the duplicates, hence more code.

    From their question:

    and I want to store, in another String array, each distinct value and the positions where that value is repeated (duplicates).


    KSG

    Saturday, March 24, 2012 5:00 AM
  • Yeah, I realized that. I was just showing how easy it is to remove duplicates with LINQ; listing the duplicated items and their positions is a bit more advanced. So don't get me wrong, I was only trying to help by adding another option to the discussion and a further demonstration of what LINQ can do. You did good :)

    Cheers



    Saturday, March 24, 2012 5:15 AM

All replies

  • Hi!

    Thank you so much, but I don't know how to put it in my code! It's too smart for me.

    It looks like an in-memory query.

    I work with forms, not the console, and I need to keep the results.

    Anyway, thanks a lot.

    Friday, March 23, 2012 8:34 PM
  • Thanks to all of you.

    Because of your answers, I have finally solved the problem.

    I really appreciate your work.


    Saturday, March 24, 2012 9:44 AM