Generating CSV/Fixed Length files from extremely large datasets. RRS feed

  • Question

  • Hi! I have a problem generating files from Oracle datsource where record count goes in range of 1 to 2 million records.
    I have code in place to retrieve the data in chunks of determined size, for example 400000 records at a time.  But for some reason the code fails at different stages of the generation with outofmemory  exceptions on the servers with 8gb of memmory.
    The way i retrieve the data in chunks is as follows:
    1) i get table schema of the table/view that are the source for my data
    2) generate column name list out of the schema
    3) create the following query string:
    "SELECT rownum, " +
                        columns.Trim() +
                        " FROM " + dataSource +
                        dataCondition +
                        " GROUP BY rownum, " + columns.Trim() +
                        " HAVING rownum BETWEEN " + startRange +
                        " AND " + endRange;
    4) then call the GetNextChunk method that each time alters the start/end ranges to get the chunks i need.

    now in the retrieving class I pass the returned datatable to a threadpool where the formatting of the dat occurs and each thread outputs to a file their portion of the data passed to them until they finish the whole datatable.

    The process resumes to get the next chunk of data and does the same thing again.

    I know that it's hard to determine where one might have a memory leak wihtout seeing the code, but I wanted to see if someone has a better solution on how to generate CSV/Fixed Length files give the scenario above from large sources of data in Oracle.

    Thanks in advance!
    Monday, November 12, 2007 6:42 PM