Visual Basic Performance for 3 Million record/rows

  • Question

  • I have a Visual Basic 2005 .NET program that I wrote that reads a SQL Server 2000 and a SQL Server 2005 database.  I was very surprised and shocked when it took three days to process only 600,000 rows of a 3 million row result.  Because of the time it took to process only 600,000 rows, I canceled the batch job and did everything in SQL using Query Analyzer.  I was able to process the same volume in only 45 minutes. 

     

    I was greatly disappointed in the results.  The program reads a 3 million row result set from the database.  It then examines the data content of the rows to determine if the record/row should be inserted into another table or ignored.  The program ran for three days and had only processed 600,000 rows as of this e-mail. 

     

    Shouldn’t I have been able to process the result set of 3 million rows in a reasonable amount of time? 

     

    Did I miss something that would have made the program run faster? 

     

    Does the dot net framework scale up? What should be done to scale up?  

     

    Moving the process to SQL I was able to process 8 million rows in 1.5 hours.

     

    If the dot net framework, regardless of the language, can’t scale to heavy volumes, we’re in big trouble. 

     

    I appreciate comments and input from anyone. 

     

    Monday, May 8, 2006 7:54 PM

Answers

  • Basically, the problem is bound to be in your code.  At the end of the day, if you could do it in Query Analyser, then your task was defined in SQL, not in code, and the problem would be in how your code sent the SQL, and in what form.

     

    Monday, May 8, 2006 8:41 PM

All replies

  • Basically, the problem is bound to be in your code.  At the end of the day, if you could do it in Query Analyser, then your task was defined in SQL, not in code, and the problem would be in how your code sent the SQL, and in what form.

     

    Monday, May 8, 2006 8:41 PM
  •  

     

    My job in life is to follow Cgraus around and say... "I agree"

    Monday, May 8, 2006 9:38 PM
  • And I'll follow both CGraus and ReneeC and say "I agree, I agree"

     

    Monday, May 8, 2006 9:48 PM
  • I thought someone would blame my code.  It's a very simple program that reads the result set of 3 million rows and determines if the record is already in the receiving table.  If the record is not in the receiving table it gets inserted.  I'm also calling stored procedures and passing data to them to create the result set and store the new row. 

    What makes you think it's my code without asking for additional information? 

     

    Thanks

     

    Monday, May 8, 2006 11:44 PM
  • Do you agree that I may have problems in my code or that there are performance issues with VB.net?

     

     

    Thanks

     

    Monday, May 8, 2006 11:45 PM
  • Do you agree that there are problems with my code or that there are performance issues with VB.net?

     

    Thanks

     

    Monday, May 8, 2006 11:46 PM
  • Without actually seeing the code I can't say for certain.  But if you can do it in Query Analyzer, you can do it using SQL, which is set-based processing, and since that takes a much shorter period of time I would hazard the guess that your code was doing something inefficient.  

    However, your code may be doing things inefficiently, such as working on a record-by-record basis and therefore iterating through 3 million rows one row at a time, which is terribly inefficient compared with set-based processing (see the row-by-row sketch after this reply).  

    For what you are doing - why could you not do a simple LEFT JOIN between the tables with a WHERE condition that looks for NULL?   

    Example

    Table1
    Field1 - int 
    Field2 - nchar(10)

    Table2
    Field1 - int 
    Description - nchar(10) 

    With the following records in each
    Table 1
    1 foo      
    2 bar      
    3 xyz      
    4 abc      

    Table 2
    2 bar      
    4 abc      


    So in this case you want to show records 1 and 3 because those Field1 IDs don't exist in Table 2.

      Select table1.*
      FROM Table1
      LEFT JOIN Table2 as T2 ON
      Table1.Field1 = T2.Field1
      WHERE t2.Description is null

    If you want to update Table2 with all the records that don't already exist:

      INSERT INTO Table2
         (Field1, Description) 
      Select table1.*
      FROM Table1
      LEFT JOIN Table2 as T2 ON
      Table1.Field1 = T2.Field1
      WHERE t2.Description is null

    Now look at Table 2 and you will see 4 records in there.


     

    Tuesday, May 9, 2006 12:16 AM
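
    For contrast, here is a minimal sketch of the row-by-row pattern described above, written in VB 2005 with ADO.NET against the Table1/Table2 schema from the example.  This is an illustration of the likely bottleneck, not the original poster's actual code: every source row costs two extra round trips to the server, one existence check and one insert.

      Imports System.Data.SqlClient

      Module RowByRowSketch

          ' The slow pattern: pull every source row to the client,
          ' then check and insert one row at a time.
          Sub CopyMissingRows(ByVal connectionString As String)
              Using readerConn As New SqlConnection(connectionString)
                  Using writerConn As New SqlConnection(connectionString)
                      readerConn.Open()
                      writerConn.Open()

                      Dim selectCmd As New SqlCommand( _
                          "SELECT Field1, Field2 FROM Table1", readerConn)

                      Using reader As SqlDataReader = selectCmd.ExecuteReader()
                          While reader.Read()
                              Dim id As Integer = reader.GetInt32(0)
                              Dim description As String = reader.GetString(1)

                              ' Round trip 1: does the row already exist in the target table?
                              Dim checkCmd As New SqlCommand( _
                                  "SELECT COUNT(*) FROM Table2 WHERE Field1 = @id", writerConn)
                              checkCmd.Parameters.AddWithValue("@id", id)
                              Dim alreadyThere As Boolean = CInt(checkCmd.ExecuteScalar()) > 0

                              ' Round trip 2: insert the row if it is missing.
                              If Not alreadyThere Then
                                  Dim insertCmd As New SqlCommand( _
                                      "INSERT INTO Table2 (Field1, Description) VALUES (@id, @desc)", _
                                      writerConn)
                                  insertCmd.Parameters.AddWithValue("@id", id)
                                  insertCmd.Parameters.AddWithValue("@desc", description)
                                  insertCmd.ExecuteNonQuery()
                              End If
                          End While
                      End Using
                  End Using
              End Using
          End Sub

      End Module

    On 3 million source rows that is roughly 6 million extra round trips, which would account for the multi-day run time; the single INSERT ... SELECT shown above does the same work in one statement on the server.
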
  • In my case the result set is created from a date range request, and therefore the result set is 3 million rows.  I'm reading through the rows and creating a new record to be inserted into another table. 

    Is there a better way to read through large result sets that I'm not aware of?  Doing this in SQL seems to be no problem. 

    I'm very open to ideas.

    Thanks

    Tuesday, May 9, 2006 1:19 PM
  • So your SQL uses a date range whereas the example used an integer field - if there is a relationship between the tables then doing this with something similar is possible.

    I would definitely not be doing this on a row-by-row basis in VB.

    You could even put your SQL code in a stored procedure and simply use VB to call the sproc (see the set-based sketch after this reply).  And even 45 minutes sounds pretty horrendous for performance (unless it's doing something amazingly complex).

     

    Tuesday, May 9, 2006 3:05 PM
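
    To illustrate the suggestion above, here is a minimal sketch of running the whole date-range job as a single set-based statement from VB, so the rows never leave the server.  It reuses the Table1/Table2 example; the CreatedDate column and the @fromDate/@toDate parameters are assumptions, since the real schema was not posted, and the same statement could equally live inside a stored procedure (a sproc-calling sketch follows the last reply below).

      Imports System.Data.SqlClient

      Module SetBasedSketch

          ' One round trip: the server joins the two tables, filters by the date range,
          ' and inserts the missing rows itself.  CreatedDate is a hypothetical date column.
          Sub CopyMissingRowsForDateRange(ByVal connectionString As String, _
                                          ByVal fromDate As Date, _
                                          ByVal toDate As Date)
              Dim sql As String = _
                  "INSERT INTO Table2 (Field1, Description) " & _
                  "SELECT t1.Field1, t1.Field2 " & _
                  "FROM Table1 AS t1 " & _
                  "LEFT JOIN Table2 AS t2 ON t1.Field1 = t2.Field1 " & _
                  "WHERE t2.Field1 IS NULL " & _
                  "AND t1.CreatedDate BETWEEN @fromDate AND @toDate"

              Using cn As New SqlConnection(connectionString)
                  cn.Open()
                  Dim cmd As New SqlCommand(sql, cn)
                  cmd.CommandTimeout = 0            ' allow the long-running batch to finish
                  cmd.Parameters.AddWithValue("@fromDate", fromDate)
                  cmd.Parameters.AddWithValue("@toDate", toDate)
                  Dim rowsInserted As Integer = cmd.ExecuteNonQuery()
                  Console.WriteLine("Inserted {0} rows.", rowsInserted)
              End Using
          End Sub

      End Module
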
  • I'm not doing anything complicated.  Just reading a row, evaluating the columns, determining if the row already exists in an existing table, and if not inserting the new row.  Pretty plain and straightforward.  I am using textbuilder and other .net methods and classes but nothing big or complicated. 

    Do you know of others that are using the dot net framework, regardless of the language, to process large volumes of data in a batch environment?  If so, how are they dealing with this? 

    Maybe the solution is to stick close to the data manager/SQL Server when processing large volumes in a batch environment.  The .net languages are more geared to processing transactions.  In that environment you are only processing one record at a time. 

    Do you agree?

    Thanks

     

    Tuesday, May 9, 2006 7:02 PM
  • You could post some code and we could point out some ways to improve it.
    Tuesday, May 9, 2006 7:13 PM
  • For what you are describing I would definitely be looking at processing it on SQL.

    Your source data is in SQL tables, and you're writing records back to SQL tables.  SQL is optimized for set-based processing, and you are able to achieve this using SQL queries.

    Your VB application may initiate a call to a SQL stored procedure or something like that, but processing on an individual record basis is slow and involves trafficking the data from the server to the client to process it and then sending the records back to the server - so more network traffic as well.

    If it were me, from what you've stated, I'd implement a stored procedure on SQL and simply call it from the VB application (which makes a pretty front end to call the procedure(s)) - see the sketch after this reply.

     

     

    Tuesday, May 9, 2006 7:18 PM
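
    As a concrete sketch of that last suggestion: the set-based INSERT ... SELECT lives in a stored procedure on the server, and the VB application only calls it.  The procedure name (usp_CopyMissingRows) and its parameters are made up for illustration - substitute whatever the real procedure ends up being called.

      Imports System.Data
      Imports System.Data.SqlClient

      Module SprocCallSketch

          ' The VB front end just triggers the job; all of the row processing stays on the server.
          Sub RunCopyJob(ByVal connectionString As String, _
                         ByVal fromDate As Date, ByVal toDate As Date)
              Using cn As New SqlConnection(connectionString)
                  cn.Open()
                  Dim cmd As New SqlCommand("usp_CopyMissingRows", cn)
                  cmd.CommandType = CommandType.StoredProcedure
                  cmd.CommandTimeout = 0            ' long-running batch job
                  cmd.Parameters.AddWithValue("@fromDate", fromDate)
                  cmd.Parameters.AddWithValue("@toDate", toDate)
                  cmd.ExecuteNonQuery()
              End Using
          End Sub

      End Module
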