none
How could I check if import this text file into DB or not ? RRS feed

  • Question

  • Hello everyone .
     
    I have wrote a code which is reading Text Files and save them in a list after that import this list into DB so far it's fine the problem is i have until now 400 text fiels each text fiel has around 300000 until 500000 line and each time want to import a new text flel it will reinsert or reimport the text files which are already imported . that's meaing my data will be always duplicated.
    so how could i check with C# code if this text files is already imported or exist in DB  ?!
     

    here is some code to be more clear : 

    private static List<WebShopDataAccess> GetWebShopDataAccesses(string path)  
            {  
                List<WebShopDataAccess> elements = new List<WebShopDataAccess();  
      
                List<string> lines = File.ReadAllLines(path).ToList();  
    return elements;  
      
    // here i am gettng the list back then import this list into DB during an extensions method with sqlbulk  
    }  
      
     private static void ImportToDB()  
            {  
                string CS =""  
                using (SqlConnection connection = new SqlConnection(CS))  
                {  
                    SqlBulkCopy bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock | SqlBulkCopyOptions.FireTriggers | SqlBulkCopyOptions.UseInternalTransaction, null);  
      
                    bulkCopy.DestinationTableName = "TestWebShop";  
      
                    connection.Open();  
      
                    bulkCopy.WriteToServer(GetListOfWebShop.AsDataTable());  
                }  
            }  

    Wednesday, January 8, 2020 10:12 AM

All replies

  • This depends on a bit what data is in those files.

    Is it enough to keep record of what files you have imported? This means that single line can come only in one file and if it comes in two different files, then it is not considered duplicate. Or do you need to keep record per imported line since duplicate might also be result of importing two different files with same line?

    Based on your question the reason is importing save file twice, so just keeping record what files has been imported and if file is imported again, then ignore it might work.

    Wednesday, January 8, 2020 1:39 PM
  • Hello,

    A couple of ideas.

    • Create a unique constraint, insert records without SqlBulkCopy and know it will be slower yet knowing this consider this process a time consuming operation. A try-catch will be needed as duplicates will throw a runtime exception which can be ignored or better yet written to a log file.
    • Use T-SQL Merge 
    • Using LINQ or Lambda or a SQL query to find duplicates after the insert.

    SQL-Wise the query would look like this where the primary key is included which permits in another task to remove duplicates either in code or in SSMS.

    Here the definition of a duplicate is CompanyName, ContactName, ContactTitle, Address, City and PostalCode. In option one these columns if you go this way would be the columns to create a constraint on.

    SELECT A.*
    FROM Customers A
    INNER JOIN
        (
        SELECT
            CompanyName,
            ContactName,
            ContactTitle,
            Address,
            City,
            PostalCode
        FROM
            Customers
        GROUP BY
            CompanyName,
            ContactName,
            ContactTitle,
            Address,
            City,
            PostalCode
        HAVING COUNT(*) > 1
        ) B
    ON
    A.CompanyName = B.CompanyName AND
    A.ContactName = B.ContactName AND
    A.ContactTitle = B.ContactTitle AND
    A.Address = B.Address AND
    A.City = B.City AND
    A.PostalCode = B.PostalCode
    ORDER BY
        A.CompanyName


    Please remember to mark the replies as answers if they help and unmarked them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.

    NuGet BaseConnectionLibrary for database connections.

    StackOverFlow
    profile for Karen Payne on Stack Exchange

    Wednesday, January 8, 2020 2:18 PM
    Moderator
  • Hi,
    Before storing the data, you can separate each line of the data from the text file and put it into a string. Then you can check if the data is already in the database. 
    If not, you can use insert statements.
    Here a code example you can refer to.

    public Form1()
    {
        InitializeComponent();
        //define location of the database
        string connection = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=''";
        //define location of the text file data
        DirectoryInfo di = new DirectoryInfo(@"C:\Users\Data\");
        FileInfo[] fiarr = di.GetFiles("*.txt");
        //define connection to database
        OleDbConnection con = new OleDbConnection(connection);
        String query;
        OleDbDataReader rdr = null;
        con.Open();
        //get all table from database
        OleDbCommand cmd = con.CreateCommand();
        DataTable dt = con.GetSchema("tables");
        DataRow[] dttable = dt.Select();
        con.Close();
        //read each new textfile inside the folder
        foreach (FileInfo fri in fiarr)
        {
            StreamReader sr = new StreamReader(fri.FullName, System.Text.Encoding.Default);
            String line;
            String tabledbs, dbsName;
    
            while ((line = sr.ReadLine()) != null)
            {
                String VRSD, locationID, Name, Age, Time;
                int plantID;
                //process each line of data and put into each variable
                VRSD = line.Substring(0, 4).Trim();
                plantID = Convert.ToInt32(line.Substring(4, 1).Trim());
                locationID = line.Substring(5, 4).Trim();
                Name = line.Substring(9, 5).Trim();
                Age = line.Substring(14, 3).Trim();
                Time = line.Substring(17, 4).Trim();
                //make database name
                dbsName = plantID + locationID;
                con.Open();
                //check if the table exist in database
                for (int i = 0; i < dttable.Length - 9; i++)
                {
                    tabledbs = dttable[i]["TABLE_NAME"].ToString();
                    ArrayList indexlist = new ArrayList();
    
                    if (tabledbs == dbsName)
                    {
                        //if the table exist, status = true
                        status = true;
                        break;
                    }
                }
                con.Close();
                con.Open();
                if (status == true)
                {
                    try
                    {
                        //if the data not in the system, insert statement
                        query = @"insert into " + plantID + locationID + " values('" + VRSD.ToString() + "'," + plantID + ",'" + locationID + "','" + Name + "','" + Age + "')";
                        cmd = new OleDbCommand(query, con);
                        rdr = cmd.ExecuteReader();
                        con.Close();
                    }
                    catch
                    {
                        //if the data in the system, update statement
                        query = @"update " + dbsName + " set Age='" + Age + "', Time='" + Time + "' where LocationID='" + locationID + "' and PlantID=" + plantID + "";
                        cmd = new OleDbCommand(query, con);
                        rdr = cmd.ExecuteReader();
                        con.Close();
                    }
                }
                else
                {
                    //create new table
                    string attribute = "VRSD String,PlantID Integer, LocationID String, Name String, Age String,Time String";
                    query = @"CREATE TABLE " + plantID + locationID + "(" + attribute + ")";
                    cmd = new OleDbCommand(query, con);
                    cmd.ExecuteNonQuery();                  
                    //insert the data
                    query = @"insert into " + plantID + locationID + " values('" + VRSD.ToString() + "'," + plantID + ",'" + locationID + "','" + Name + "','" + Age + "','" + Time + "')";
                    cmd = new OleDbCommand(query, con);
                    rdr = cmd.ExecuteReader();
                    con.Close();
                }
                status = false;
            }
            sr.Close();
            //after the text file load into database, the text file moved to history folder
            MessageBox.Show(fri.FullName.ToString(), "File Manager", MessageBoxButtons.OK);
            fri.MoveTo(@"C:\Users\Data\History\" + fri.Name.ToString() + ".txt");
        }
    }

    More details you can refer to this link.
    Best Regards,
    Daniel Zhang


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.


    Friday, January 10, 2020 7:36 AM