Remove duplicate lines from a text file

  • Question

  • I use the following code to combine several text files into a single file.
    I want to remove every line whose text after the first comma duplicates an earlier line.
     
    Example

    1,ahmed,alie

    2,ahmed,alie

    1,saied,alie

    1,gaber,ebrahim

    result

    1,ahmed,alie

    1,saied,alie

    1,gaber,ebrahim

    code

    Dim SourcePath As String = "D:\txt\"
    Dim DestPath As String = "D:\txt\combined.csv"
    Private Sub Button4_Click_1(sender As System.Object, e As System.EventArgs) Handles Button4.Click
        Dim lst As New List(Of String)
        For Each f As String In IO.Directory.GetFiles(SourcePath, "*.csv", IO.SearchOption.TopDirectoryOnly)
            Dim lst2 As List(Of String) = IO.File.ReadAllLines(f).ToList
            For Each s As String In lst2
                If Not lst.Contains(s.IndexOf(",")) Then
                    lst.Add(s)
                End If
            Next
        Next
        IO.File.WriteAllLines(DestPath, lst.ToArray)
        MsgBox("Done!")
    End Sub


    • Edited by monemas Wednesday, February 28, 2018 3:42 PM
    Wednesday, February 28, 2018 3:40 PM

Answers

  • Hi

    Here is some code that does what I am thinking you want

    Option Strict On
    Option Explicit On
    Public Class Form1
      ' Change to your own paths
      Dim SourcePath As String = "C:\Users\lesha\Desktop\Plans\New folder\TextFil"
      Dim DestPath As String = "C:\Users\lesha\Desktop\Plans\New folder\TextFil\combined.csv"

      ' Change to the Button Click event you are using
      Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        ' Key: second and third fields; value: first field of the first line seen with that key
        Dim dict As New Dictionary(Of String, String)
        Dim lst As New List(Of String)
        For Each f As String In IO.Directory.GetFiles(SourcePath, "*.txt", IO.SearchOption.TopDirectoryOnly)
          For Each s As String In IO.File.ReadAllLines(f)
            Dim a() As String = s.Split(","c)
            ' Skip blank or short lines, which would otherwise throw on a(1) or a(2)
            If a.Length < 3 Then Continue For
            Dim part As String = a(1) & "," & a(2)
            If Not dict.ContainsKey(part) Then
              dict.Add(part, a(0))
            End If
          Next
        Next
        For Each kvp As KeyValuePair(Of String, String) In dict
          lst.Add(kvp.Value & "," & kvp.Key)
        Next

        IO.File.WriteAllLines(DestPath, lst.ToArray)
        MessageBox.Show("Done!")
      End Sub
    End Class
    EDIT: You will need to check the file extensions too as I used plain text files for this example.
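
    The point about file extensions could be handled along these lines. This is only a sketch: GetSourceFiles is a hypothetical helper, not part of the code above, and it assumes the combined file may live in the same folder as the sources, so it is left out of the input list.

    ```vb
    Imports System
    Imports System.Collections.Generic
    Imports System.IO
    Imports System.Linq

    Module SourceFileSketch
        ' Hypothetical helper: returns the files in sourcePath whose extension is
        ' one of the given extensions (for example ".txt", ".csv"), skipping
        ' destPath so a previously written combined file is not read back in.
        Public Function GetSourceFiles(sourcePath As String,
                                       destPath As String,
                                       ParamArray extensions As String()) As List(Of String)
            Dim destFull As String = Path.GetFullPath(destPath)
            Return Directory.EnumerateFiles(sourcePath, "*", SearchOption.TopDirectoryOnly).
                Where(Function(f) extensions.Contains(Path.GetExtension(f), StringComparer.OrdinalIgnoreCase)).
                Where(Function(f) Not String.Equals(Path.GetFullPath(f), destFull, StringComparison.OrdinalIgnoreCase)).
                OrderBy(Function(f) f, StringComparer.OrdinalIgnoreCase).
                ToList()
        End Function
    End Module
    ```

    For example, GetSourceFiles(SourcePath, DestPath, ".txt", ".csv") could replace the GetFiles call in the loop.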


    Regards Les, Livingston, Scotland


    • Edited by leshay Wednesday, February 28, 2018 5:14 PM
    • Marked as answer by monemas Thursday, March 1, 2018 11:03 PM
    Wednesday, February 28, 2018 4:26 PM
  • Note that this handles one file only. For multiple files you would need to adapt it, e.g. get a list of files, loop through them, append to the string builder and write to disk.

    I created data.txt in the project folder (blank lines are on purpose)

    1,ahmed,alie
    
    2,ahmed,alie
    1,saied,alie
    
    
    1,gaber,ebrahim

    Concrete class for reading data

    Public Class Data
        Public Property Column1 As Integer
        Public Property Column2 As String
        Public Property Column3 As String
        Public Overrides Function ToString() As String
            Return $"{Column1},{Column2},{Column3}"
        End Function
    End Class

    One line to get distinct lines as per your request followed by adding those lines to a string builder which is suitable for writing back to disk.

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click

        Dim dataList = (From line In IO.File.ReadAllLines(IO.Path.Combine(
                            AppDomain.CurrentDomain.BaseDirectory, "Data.txt"))
                        Where Not String.IsNullOrWhiteSpace(line)
                        Let data = line.Split(","c)
                        Select New Data With {
                            .Column1 = CInt(data(0)),
                            .Column2 = data(1),
                            .Column3 = data(2)}).
                        GroupBy(Function(x) x.Column2).
                        Select(Function(x) x.First).
                        ToList

        Dim sb As New System.Text.StringBuilder
        dataList.ForEach(Sub(data) sb.AppendLine(data.ToString))
        '
        ' You can now use sb.ToString to write to the same
        ' or another file.
        '
        Console.WriteLine(sb.ToString)
    End Sub
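    The query reads the lines, excludes the empty ones, splits each line on commas into our Data class, groups on Column2 (the value right after the first comma) and selects the first item of each group, so no duplicates remain. Because the grouping key is Column2 alone, lines that share the second field but differ in the third are also treated as duplicates.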
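
    The multi-file adaptation mentioned at the top of this reply might look roughly like the sketch below. The module and function names are made up for illustration, and it works on plain strings rather than the Data class to stay short. It keeps the same grouping key (the second field) and assumes every non-blank line has at least two fields; shorter lines are skipped.

    ```vb
    Imports System
    Imports System.Collections.Generic
    Imports System.IO
    Imports System.Linq

    Module MultiFileSketch
        ' Keeps the first line seen for each distinct second field.
        ' Blank lines and lines with fewer than two fields are skipped.
        Public Function FirstPerSecondField(lines As IEnumerable(Of String)) As List(Of String)
            Return lines.
                Where(Function(line) Not String.IsNullOrWhiteSpace(line)).
                Select(Function(line) New With {.Line = line, .Fields = line.Split(","c)}).
                Where(Function(x) x.Fields.Length >= 2).
                GroupBy(Function(x) x.Fields(1)).
                Select(Function(g) g.First().Line).
                ToList()
        End Function

        ' Reads every file in turn, removes duplicates across all of them,
        ' then writes the result. ToList above finishes all reads before
        ' the write starts.
        Public Sub CombineFiles(files As IEnumerable(Of String), destPath As String)
            Dim allLines = files.SelectMany(Function(f) File.ReadLines(f))
            File.WriteAllLines(destPath, FirstPerSecondField(allLines))
        End Sub
    End Module
    ```

    GroupBy yields groups in the order their keys first appear, so the surviving lines keep their original relative order.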


    Karen Payne, VB Forums moderator



    Wednesday, February 28, 2018 6:16 PM
    Moderator

All replies

  • Hi monemas,

    You can use the HashSet(Of T) class, which provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order. Note that a set of whole lines removes only lines that are identical in full.

    Dim path As String = "D:\TestField\Test4.txt"
    Dim lines As New HashSet(Of String)()

    ' Read from the file; Add ignores a line the set already holds
    Using sr As New IO.StreamReader(path)
        Do While sr.Peek() >= 0
            lines.Add(sr.ReadLine())
        Loop
    End Using

    ' Write the unique lines back to the same file
    Using sw As New IO.StreamWriter(path)
        For Each line As String In lines
            sw.WriteLine(line)
        Next
    End Using
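
    To match the original request (duplicates judged by the text after the first comma), the same HashSet idea can be applied to a key instead of the whole line. The following is a minimal sketch, not a tested drop-in: the function name is hypothetical, and treating a line without any comma as its own key is an assumption.

    ```vb
    Imports System
    Imports System.Collections.Generic

    Module HashSetKeySketch
        ' Keeps the first line for each distinct "text after the first comma".
        ' A separate list holds the output so the original order is preserved.
        Public Function RemoveDuplicatesAfterFirstComma(lines As IEnumerable(Of String)) As List(Of String)
            Dim seen As New HashSet(Of String)()
            Dim result As New List(Of String)()
            For Each line As String In lines
                If String.IsNullOrWhiteSpace(line) Then Continue For
                Dim commaAt As Integer = line.IndexOf(","c)
                ' Assumption: a line with no comma is compared as a whole.
                Dim key As String = If(commaAt >= 0, line.Substring(commaAt + 1), line)
                ' HashSet.Add returns False when the key is already present.
                If seen.Add(key) Then result.Add(line)
            Next
            Return result
        End Function
    End Module
    ```

    With the four example lines from the question, this keeps 1,ahmed,alie, 1,saied,alie and 1,gaber,ebrahim and drops 2,ahmed,alie.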

    Best Regards,

    Cherry


    MSDN Community Support

    Thursday, March 1, 2018 5:28 AM
    Moderator
  • By this logic, with the given example, you would only want lines that begin with "1".  It appears that any duplicate lines will begin with a number larger than one.  Is that correct?  If so, just analyze the first character and if it is not "1", skip it, otherwise add it to the result list.
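
    If that assumption holds, the filter could be sketched as follows. The function name is hypothetical, and the sketch compares the whole first field with "1" rather than only the first character, so that a line starting with 10 is not kept by accident.

    ```vb
    Imports System
    Imports System.Collections.Generic
    Imports System.Linq

    Module FirstFieldFilterSketch
        ' Keeps only the lines whose first comma-separated field is exactly "1".
        ' This is valid only if every duplicate really carries a larger number.
        Public Function KeepLinesNumberedOne(lines As IEnumerable(Of String)) As List(Of String)
            Return lines.
                Where(Function(line) line IsNot Nothing AndAlso line.Split(","c)(0).Trim() = "1").
                ToList()
        End Function
    End Module
    ```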

    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Thursday, March 1, 2018 2:41 PM
    Moderator