Loading and Calculation with a lot of data, best solution

  • Question

  • Hello

I have a problem, and I hope a .NET pro could give me some advice; I would be really thankful!

It's basically a question about data types.

I have a huge CSV file with millions of row entries. I want to load this data in a VB.NET application. What data type should I use: a DataTable, an array, or something else? It needs to be as fast as possible. Any suggestions on how I can optimize the loading of the data? I once loaded this data into a DataTable and it took days. Then I exported it as XML data and loaded that XML file instead; this was about a thousand times faster.

     

    Thanks for any advice!

     

    best

     

    simplex1

     

     

    Wednesday, December 22, 2010 3:27 PM

Answers

And still you don't answer the question of how you read the CSV file, only that it is slow and gets slower. What do you expect from us as an answer?

So do you use the OleDB text driver to get that file into a DataTable, or do you use a StreamReader?

Here is a very easy way to get a CSV into a DataTable (wrapped inside a DataSet).

It uses the standard OleDB feature:

    http://www.vb-tips.com/CSVDataSet.aspx

I hope you won't reply that you are already using that, after all my begging for information about what you currently use.
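For reference, the OleDB text-driver route behind that link can be sketched roughly like this; the folder and file name are assumptions for illustration, and on a 64-bit process the ACE provider would be needed instead of Jet:

```vb
' A minimal sketch of reading a CSV into a DataSet via the OleDB text driver.
' Folder and file name are assumptions; adjust to your own data.
Imports System.Data
Imports System.Data.OleDb

Module CsvToDataSet
    Sub Main()
        ' The Data Source points at the FOLDER; the file name goes in the query.
        Dim connStr As String =
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\;" &
            "Extended Properties=""text;HDR=Yes;FMT=Delimited"""
        Dim ds As New DataSet()
        Using conn As New OleDbConnection(connStr)
            Dim da As New OleDbDataAdapter("SELECT * FROM [huge.csv]", conn)
            da.Fill(ds, "csv")   ' fills a DataTable named "csv" inside the DataSet
        End Using
        Console.WriteLine(ds.Tables("csv").Rows.Count & " rows loaded")
    End Sub
End Module
```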

     

     


    Merry Christmas
    Cor
    • Marked as answer by Calvin_Gao Friday, December 31, 2010 2:19 AM
    Wednesday, December 22, 2010 7:25 PM

All replies

You should first ask yourself if an XML file with millions of rows is really a good solution.

Telling us that you have used a DataTable is the same as writing nothing; I've seen hundreds of ways people do that, some quick, some extremely slow.

For the rest, your question only says that it is slow and that it takes a day, but gives no idea of how you are currently doing it.

Fast is relative, you know: to a snail, the way a turtle walks is fast, but to others that turtle is slow.

So show us what was going slow, and show us what is going a thousand times faster.

     

     


    Merry Christmas
    Cor
    Wednesday, December 22, 2010 3:39 PM
Can you bulk load the CSV file into SQL Server? This is one of the fastest ways to load CSV data into a structured data store.

If you can, your application can access the SQL Server table where you loaded the CSV data using ADO.NET SqlClient commands, a DataSet, or LINQ to SQL, and access will be fast.
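The bulk-load route above can be sketched with SqlBulkCopy; the connection string, target table and column layout here are assumptions for illustration:

```vb
' A sketch of bulk-loading CSV data into SQL Server with SqlBulkCopy.
' Connection string, table name and columns are assumptions.
Imports System.Data
Imports System.Data.SqlClient

Module BulkLoadCsv
    Sub Main()
        Dim table As New DataTable()
        table.Columns.Add("Col1", GetType(String))
        table.Columns.Add("Col2", GetType(String))
        ' ... fill the DataTable from the CSV file here ...

        Using bulk As New SqlBulkCopy("Server=.;Database=CsvImport;Integrated Security=True")
            bulk.DestinationTableName = "dbo.CsvRows"  ' assumed target table
            bulk.BatchSize = 10000                     ' send rows to the server in batches
            bulk.WriteToServer(table)
        End Using
    End Sub
End Module
```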


    Mike McIntyre Senior Developer / Partner aZ Software Developers
    Wednesday, December 22, 2010 3:47 PM
Thanks for your answer. I use a class for reading the CSV and writing the entries to a DataTable. This is very slow, and it gets slower and slower the more data the table contains. After a day or so, the table was fully loaded with the data from the CSV file. I exported it using the DataTable.WriteXml method. Then I wrote a function which loaded this XML file with all the data into a DataTable (using DataTable.ReadXml); this was extremely fast. My question is: is it maybe faster to load the data into an array or something like that than into a DataTable?
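One common reason a DataTable gets slower as it fills is per-row change notification and index/constraint maintenance; BeginLoadData/EndLoadData suspend that work during a bulk load. A sketch, where the file name, the assumed header row and the comma split are illustrative assumptions:

```vb
' Sketch: suspend notifications and index maintenance while bulk-adding rows.
' File name, header handling and delimiter are assumptions.
Imports System.Data
Imports System.IO

Module LoadCsvFast
    Sub Main()
        Dim table As New DataTable("csv")
        table.Columns.Add("Col1", GetType(String))
        table.Columns.Add("Col2", GetType(String))

        table.BeginLoadData()   ' turn off notifications, indexes and constraints
        Using reader As New StreamReader("C:\Data\huge.csv")
            reader.ReadLine()   ' skip header row (assumed present)
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                table.Rows.Add(line.Split(","c))
                line = reader.ReadLine()
            End While
        End Using
        table.EndLoadData()     ' rebuild indexes once, at the end
    End Sub
End Module
```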

     

    best

     

    simplex

    Wednesday, December 22, 2010 3:50 PM
OK, ruling out SQL Server bulk load...

Are you using Visual Basic 2010? If so, a new feature called memory-mapped files may help.

If not, you'll need to look for all the bottlenecks and eliminate them one by one.

    What version of Visual Studio are you using?

How many columns are involved?

Are all CSV columns being imported into columns of type String in the DataTable?

Are you importing more columns than you need into the DataTable?

If you submit the code you are using to load the data and parse it into the DataTable, we can start looking for bottlenecks.
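The memory-mapped-file feature mentioned above (new in .NET 4 / Visual Studio 2010) can be sketched like this; the file path is an assumption for illustration:

```vb
' Sketch of reading a large file through a memory-mapped view (.NET 4).
' The path is an assumption; parsing is left as an outline.
Imports System.IO
Imports System.IO.MemoryMappedFiles

Module MmfSketch
    Sub Main()
        Using mmf = MemoryMappedFile.CreateFromFile("C:\Data\huge.csv", FileMode.Open)
            Using stream = mmf.CreateViewStream()
                Using reader As New StreamReader(stream)
                    Dim line As String = reader.ReadLine()
                    While line IsNot Nothing
                        ' ... parse each line from the mapped view ...
                        line = reader.ReadLine()
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module
```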


    Mike McIntyre Senior Developer / Partner aZ Software Developers
    Wednesday, December 22, 2010 5:20 PM