none
How read files (name, size, file type) and path from a huge NAS and store that info on a SQL server RRS feed

  • Question


  • Hello , hope someone can help me.

     

    I've a NAS server in which are stored about 20,000,000 pdf files, in a several folders, so I need to accomplish this:

     

    Read each one of the files and store in a SQL server database the path, file name, size and file type.

    So for each file on the nas server it will be created a new record on a SQL server table, so I’m expecting about a table with 20,000,000 records.

     

    thanks for your help, I'm newbie on VB.net


    • Edited by Luis Vieyra Friday, December 28, 2018 8:45 PM
    Friday, December 28, 2018 8:41 PM

All replies

  • Hi

    Here is some simple code that may assist.

    This example provides (but not implemented) provisions for Progress to be shown to User.

    The Button will interrupt and stop the operation, and if restarted, info gathered will be lost (this could concievably be made to Continue rather than Stop if needed.

    I would imagine that you are already aware that to process that number of files will take a LOOOONG time?

    This example just gathers the info for each file, stores the required info in MyTable and displays it in a DataGridView. The MyTable datatable would be easily integrated in an SQL database (I would imagine)

    There is no exception handling catered for. Just an example that may point the direction for you.

    Image


    Code

    Option Strict On
    Option Explicit On
    Public Class Form1
      ' would likely be user choice
      Dim SourcePath As String = "\\WDMYCLOUD\Les\"
      Dim FI As New List(Of IO.FileInfo)
      Dim MyTable As New DataTable("Freddy")
      Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        With BackgroundWorker1
          .WorkerReportsProgress = True
          .WorkerSupportsCancellation = True
        End With
      End Sub
      Private Sub BackgroundWorker1_DoWork(sender As Object, e As System.ComponentModel.DoWorkEventArgs) Handles BackgroundWorker1.DoWork
        GetInfo()
        If BackgroundWorker1.CancellationPending Then e.Cancel = True
      End Sub
      Sub GetInfo()
        For Each f As String In IO.Directory.GetDirectories(SourcePath)
          Dim sourcefiles As IEnumerable(Of String) = IO.Directory.EnumerateFiles(f, "*.*", IO.SearchOption.AllDirectories)
    
          Display("Processing: " & f)
    
          For Each s As String In sourcefiles
            FI.Add(My.Computer.FileSystem.GetFileInfo(s))
            If BackgroundWorker1.CancellationPending Then Exit For
          Next
          If BackgroundWorker1.CancellationPending Then Exit For
        Next
      End Sub
      Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Select Case Button1.Text
          Case "START", "USER STOPPED"
            Button1.Text = "STOP"
            DataGridView1.DataSource = Nothing
            FI.Clear()
            BackgroundWorker1.RunWorkerAsync()
          Case "STOP"
            BackgroundWorker1.CancelAsync()
            Button1.Text = "START"
            Label2.Text = "FI contains full info for " & FI.Count.ToString & " files."
            DataGridView1.DataSource = FI
            FillMyTable()
        End Select
      End Sub
      Private Sub BackgroundWorker1_RunWorkerCompleted(sender As Object, e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles BackgroundWorker1.RunWorkerCompleted
        Button1.Text = "USER STOPPED"
      End Sub
      Private Sub BackgroundWorker1_ProgressChanged(sender As Object, e As System.ComponentModel.ProgressChangedEventArgs) Handles BackgroundWorker1.ProgressChanged
        ' can setup ProgressBar or something
        ' or show progress if needed
      End Sub
      Sub Display(s As String)
        If InvokeRequired Then
          Invoke(Sub() Label1.Text = s)
        Else
          Label1.Text = s
        End If
      End Sub
      Sub FillMyTable()
        With MyTable
          .Columns.Add("Path")
          .Columns.Add("FileName")
          .Columns.Add("Size(bytes)")
          .Columns.Add("FileType")
        End With
        For Each f As IO.FileInfo In FI
          ' path, file name, size and file type.
          MyTable.Rows.Add(f.FullName, f.Name, f.Length, f.Extension)
        Next
      End Sub
    End Class


    Regards Les, Livingston, Scotland





    Friday, December 28, 2018 10:07 PM
  • Here is a thought on performing the inserts thinking using a DataTable as Les has shown, then use a SQL-Server bulk insert as describe here, with that many files it's not practical to do single inserts but instead perhaps over 5,000 or 10,000 files do the bulk insert.

    Otherwise you would need to have code such as below. Note in the code below I get file type via a C# class which I reference from a C# Class project using VB.NET

    ''' <summary>
    ''' Insert single file record
    ''' </summary>
    ''' <param name="pFileName">Path and file name to insert</param>
    ''' <param name="pNewIdentifier">New primary key for the record</param>
    ''' <returns></returns>
    Public Function AddNewAttachement(pFileName As String, ByRef pNewIdentifier As Integer) As Boolean
        Dim selectedFile As New FileInfo(pFileName)
        Dim sizeInBytes As Long = selectedFile.Length
        Dim filePath As String = Path.GetDirectoryName(pFileName)
        Dim mimeType = MimeTypeMap.GetMimeType(Path.GetExtension(pFileName).Replace(".", ""))
    
        Dim fileByes() As Byte
    
        ' get byte array of file
        Using stream = New FileStream(pFileName, FileMode.Open, FileAccess.Read)
            Using reader = New BinaryReader(stream)
                fileByes = reader.ReadBytes(CInt(stream.Length))
            End Using
        End Using
    
        '
        ' Insert file, get new primary key value which is successful is assigned to NewIdentifier
        '
        Dim statement =
            <SQL>
                INSERT INTO Table1 
                (
                    FileContents,FileName
                )
                VALUES 
                (
                    @FileContents,
                    @FileName
                ); 
                SELECT CAST(scope_identity() AS int);
            </SQL>.Value
    
        Using cn As New SqlConnection() With {.ConnectionString = My.Settings.ConnectionString}
            Using cmd As New SqlCommand() With {.Connection = cn, .CommandText = statement}
    
                ' get file name without path
                Dim fileName As String = Path.GetFileName(pFileName)
    
                cmd.Parameters.Add(New SqlParameter With
                    {
                        .ParameterName = "@FileContents",
                        .SqlDbType = SqlDbType.VarBinary,
                        .Value = fileByes
                    })
    
                cmd.Parameters.AddWithValue("@FileName", fileName)
    
                Try
                    cn.Open()
                    pNewIdentifier = CInt(cmd.ExecuteScalar)
                    Return True
                Catch ex As Exception
                    Return False
                End Try
    
            End Using
        End Using
    End Function
    

    While in debug mode


    Please remember to mark the replies as answers if they help and unmark them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.
    VB Forums - moderator
    profile for Karen Payne on Stack Exchange, a network of free, community-driven Q&A sites

    Saturday, December 29, 2018 1:13 AM
    Moderator