Record too big to load into database

  • Question

  • Hi,

    Please let me know your suggestions here.

    I have a bunch of CSV files that have to be loaded into a SQL Server database.

    The CSV files are of different formats, and the number of columns varies dynamically from file to file: sometimes 100 to 200 columns, sometimes 200 to 500. Since a connection can't be defined on such files, I use a Bulk Insert task to dump each file into a staging table, creating the table on the fly based on the number of columns in the file, and then execute a stored procedure to do the transformation and move the data to the final table. This method works fine for most of the files, but I do have files with more than 530 columns, where the record size exceeds the maximum limit of 8060 bytes. The Bulk Insert task throws an error there. How do I handle this situation?

    I am thinking of splitting the columns into two or more tables, but since each of the source files is of a different format, I am not able to define a connection. I appreciate your help.

    Thanks, Ann

    Monday, April 23, 2012 9:05 PM

Answers

  • ArthurZ,

    I see! :-0 Thanks. In this case, it really sounds like a job for XML. That way the static metadata for SSIS is in place, while accommodating the unknown incoming data elements (possibly recursive?) that stream in from run to run.

    Suni,

    The references I gave you earlier are about files that are too big to process without splitting -- hang on to that info, it might be helpful to you later! ;-) Why not convert the *.csv files to XML via a Script Task first, then process them using the XML Task?

    Here is a very simple console example so you can test this concept outside of SSIS. There is a broader example of this pattern on MSDN.

    filename_x.csv would contain data that looks like: 1000,twin-set sweater black,18.99,6,113.94

    using System;
    using System.Linq;
    using System.Xml.Linq;
    using System.IO;

    namespace ConsoleApplication4
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Read every line of the CSV file into a string array.
                String[] csvLines = File.ReadAllLines(@"\\your_nas\your_dir$\filename_x.csv");

                // Build one <poItem> element per CSV line, splitting each line on commas.
                XElement po = new XElement("PurchaseOrder",
                    from line in csvLines
                    let fields = line.Split(',')
                    select new XElement("poItem",
                        new XElement("poId", fields[0]),
                        new XElement("desc", fields[1]),
                        new XElement("unitPrice", fields[2]),
                        new XElement("howMany", fields[3]),
                        new XElement("priceSum", fields[4])
                    )
                );

                // Write the resulting XML document next to the source file.
                File.WriteAllText(@"\\your_nas\your_dir$\filename.xml", po.ToString());
            }
        }
    }
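
    For the sample row above, the generated filename.xml would look roughly like this (element names come straight from the query):

    <PurchaseOrder>
      <poItem>
        <poId>1000</poId>
        <desc>twin-set sweater black</desc>
        <unitPrice>18.99</unitPrice>
        <howMany>6</howMany>
        <priceSum>113.94</priceSum>
      </poItem>
    </PurchaseOrder>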

    • Edited by plditallo Tuesday, April 24, 2012 7:45 PM
    • Proposed as answer by Eileen Zhao Thursday, April 26, 2012 3:06 PM
    • Marked as answer by Eileen Zhao Wednesday, May 2, 2012 2:56 AM
    Tuesday, April 24, 2012 7:43 PM

All replies

  • If you are consuming and storing data records that are wider than one page (8060 bytes) in the database, you definitely need to revisit the actual requirements, define a feasible solution, and implement it to solve the actual (business) problem rather than looking at it as a data-integration exercise. Most certainly, the data from these files will be used somewhere, in some shape or form, for data analysis and reporting. My suggestion would be to understand those business problems, track them back to the original data source(s), and invest some time in doing some data modeling around that. Once that is defined, discuss with the "owners" of this data and provide them with your data needs, along with the format you'll need these files to be in, so that you can consume them without any such issues.

    This might sound like additional work but will save you lots of hassles in the long run.

    Hope this helps!

    Cheers!!

    Muqadder.

    Monday, April 23, 2012 9:12 PM
  • I suggest you simply generate a new package programmatically each time you get the input files, execute the load and you are done.

    The vertical partitioning of the input file is also possible (that is, out of one file with 500 columns you make two files of 250 each), and then you combine them at the destination.

    I really advocate building your package using a programming language (e.g. EzAPI).


    Arthur My Blog

    Monday, April 23, 2012 9:35 PM
    Moderator
  • Hi Ann,

    If you can use third-party solutions, check the commercial CozyRoc Data Flow Task Plus. It is an extension of the standard Data Flow Task and it supports dynamic data flow columns at runtime. You can load all your input CSV files using only one For Each Loop container and one Data Flow Task Plus.


    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

    Tuesday, April 24, 2012 2:04 AM
  • In case I want to split my destination tables into two rather than one, how do I achieve that in SSIS when I am not using a Data Flow Task?

    Thanks,

    Suni.

     

    Tuesday, April 24, 2012 1:29 PM
  • I totally agree with you, but reaching out to the source of this data and getting the data according to what we model is a total hassle.

    So I prefer a workaround at my end, if possible.

    Tuesday, April 24, 2012 1:39 PM
  • I thought your input file is too wide to be accommodated by whatever consumes it, thus you would rather need to split the source file.

    To split the source file you will need to code, say in VB, a routine that would first check whether the file has so many columns that it makes sense to vertically split it into, say, three files. Example:

    MyInputFile looks like

    Col1, Col2, Col3, etc.... Col456

    gets split into MyInputFile1

    Col1, Col2, Col3, etc.... Col150, and

    MyInputFile2

    Col151, etc.... Col300, and the remainder goes into MyInputFile3.

    Perhaps you will end up processing between one and three files. The missing columns can be set to empty, e.g. Col456 to Col500 can get ,,,,, appended; this way you end up with static metadata suitable for processing by SSIS.

    Now from this point onward, your SSIS package would check whether all or fewer than three files exist, and then a DFT responsible for loading each such split file would do the heavy lifting (I just do not know how you map the input). When I say heavy lifting, I mean you would perhaps load each file into a staging table, then combine the records based on a load id/number, and ultimately push the result into the destination.


    Arthur My Blog

    Tuesday, April 24, 2012 1:43 PM
    Moderator
  • Hi,

    This is exactly what I am looking for. Could you please tell me if there is any sample VB code that I can use for the same?

    I appreciate your help.

    Tuesday, April 24, 2012 2:22 PM
  • I searched around but could not locate a suitable blog post or article. I think writing such code is not a big deal; perhaps it will be a one-liner built around VB's Split (on the comma) method, like MyArray = LineFile.Split(","). Checking the upper bound of the array for just the first line will give you the number of columns the file has. Then you can simply navigate to, say, the 150th comma and cut the line off there, so that part resides in file #1, and so forth.
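
    As a minimal sketch of that splitting idea (shown in C# to match the console example elsewhere in this thread; the file paths and the 150-column width are placeholders, and it assumes a plain comma-separated file with no quoted commas):

    using System;
    using System.IO;
    using System.Linq;

    class SplitWideCsv
    {
        static void Main()
        {
            const int columnsPerFile = 150;  // width of each vertical slice
            String[] lines = File.ReadAllLines(@"C:\temp\MyInputFile.csv");  // placeholder path

            // The column count is the field count of the first line.
            int totalColumns = lines[0].Split(',').Length;
            int fileCount = (totalColumns + columnsPerFile - 1) / columnsPerFile;

            for (int f = 0; f < fileCount; f++)
            {
                // Take the f-th slice of columns from every line, padding with empty
                // fields so each output file keeps a static column count for SSIS.
                var slice = lines.Select(line =>
                {
                    var fields = line.Split(',');
                    var chunk = fields.Skip(f * columnsPerFile)
                                      .Concat(Enumerable.Repeat(string.Empty, columnsPerFile))
                                      .Take(columnsPerFile);
                    return string.Join(",", chunk);
                });

                File.WriteAllLines(@"C:\temp\MyInputFile" + (f + 1) + ".csv", slice);
            }
        }
    }

    Each resulting file then has a fixed number of columns, so the SSIS metadata stays static from run to run.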

    Arthur My Blog

    Tuesday, April 24, 2012 2:43 PM
    Moderator
  • OliveBeauty,

    To generate multiple flat files from a single data source, you might want to check out Konesans' Data Generator Source Adapter, which would help you split the data into the number of files needed; then check out the Trash Destination Adapter. These two components used in combination will help you follow the pattern in http://www.sqlis.com/sqlis/post/Generating-Surrogate-Keys.aspx

    As far as I know, these modules are free to download if a cash outlay is an issue; however, it is still a set of third-party components.

    If you need more visual examples, spend some time on the www.sqlis.com website -- there may be an example or two of a complex data flow there. 


    • Edited by plditallo Tuesday, April 24, 2012 4:00 PM
    Tuesday, April 24, 2012 3:59 PM
  • plditallo,

    Suni's issue is that the number of columns in the input varies from run to run; however, SSIS works with static metadata only, thus I do not see how generating input files can help here.


    Arthur My Blog

    Tuesday, April 24, 2012 4:11 PM
    Moderator