Bulk Insert with non-standard end-of-file

  • Question

  • My Bulk Insert gobbles up 19,000 lines of text with awesome efficiency, only to crash and burn on the last line in the file. The problem is that the last line consists of [$] followed by CrLf. If I manually delete this line, all is well.

    Try as I might, I can't find a way to deal with this (the general shape of the load is sketched below this post). Any ideas would be most welcome.

    Richard Campbell


    R Campbell

    Friday, October 2, 2015 9:44 AM
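
    For illustration only, a load of this general shape might look roughly like the statement below; the table name, path and terminators are invented, and the point is that the file's final "[$]" line gets parsed as one more, malformed, record:

    -- Hypothetical repro of the failing scenario (all names and paths are made up).
    -- The data file ends with a line containing only "[$]" followed by CR/LF,
    -- which BULK INSERT tries to read as a final, incomplete record and then fails.
    BULK INSERT dbo.TargetTable
    FROM 'C:\temp\datafile.txt'
    WITH (FIELDTERMINATOR = ';',
          ROWTERMINATOR   = '\r\n');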


All replies

  • Those bulk-load guys are incredibly sensitive. Or inflexible, if you prefer.

    Your file may work with BCP, which is a little more forgiving in this case. But BULK INSERT thinks there should be a full suite of records.

    You can do a SELECT COUNT(*) FROM OPENROWSET(BULK) on the file, with a format file that specifies a one-field record with \r\n as terminator. Once you know how many lines there are in the file, you can specify LASTROW to BULK INSERT. (A sketch of what this could look like follows below this reply.)


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by Dick Campbell Friday, October 2, 2015 9:08 PM
    • Unmarked as answer by Dick Campbell Saturday, October 3, 2015 1:05 AM
    Friday, October 2, 2015 1:29 PM
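
    A minimal sketch of that approach, with invented paths, table and column names; the one-field format file treats each \r\n-terminated line as a single record, so COUNT(*) returns the number of records in the file:

    -- onefield.fmt (hypothetical non-XML format file): one field per record,
    -- terminated by \r\n, so every line counts as one record.
    --
    --   12.0
    --   1
    --   1  SQLCHAR  0  8000  "\r\n"  1  line  ""

    -- Count the records in the file.
    SELECT COUNT(*)
    FROM   OPENROWSET(BULK 'C:\temp\datafile.txt',
                      FORMATFILE = 'C:\temp\onefield.fmt') AS f;

    -- Then stop one record short of the trailing [$] line.
    BULK INSERT dbo.TargetTable
    FROM 'C:\temp\datafile.txt'
    WITH (FIELDTERMINATOR = ';',
          ROWTERMINATOR   = '\r\n',
          FIRSTROW = 2,
          LASTROW  = 18999);   -- illustrative: the COUNT(*) result minus one
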
  • Thanks Erland. At the moment I have LASTROW set to a fixed number (19,000) and I am 99.99999% confident that the 48 rows that I actually need are within that range, but that solution just doesn't feel right. I had thought that if I could count the number of rows easily, I could do just what you suggest, so I think that will be the answer.

    Another strange thing, by the way, is that there are 4 non-standard rows at the beginning of the file and I thought that I should set FIRSTROW = 5, but no, I have to set it to 2 (???). Whatever works, I guess.

    Why am I bulk loading 20,000 rows just to get 48 rows of data, you may ask? It seems to be the fastest and easiest option (much to my surprise). This method takes less than a second, whereas PowerShell, for example, takes 5 seconds and I still have to load the data from a text file at the end of that.

    R Campbell



    Friday, October 2, 2015 8:17 PM
  • Upon reflection, it would be useful if there were an option such as SKIPLASTROWS in addition to LASTROW, which in my case would be set to 1.

    R Campbell

    Friday, October 2, 2015 9:11 PM
  • Another strange thing, by the way, is that there are 4 non-standard rows at the beginning of the file and I thought that I should set FIRSTROW = 5, but no, I have to set it to 2 (???).

    The options FIRSTROW and LASTROW are misnamed; they should be FIRSTRECORD and LASTRECORD. Generally, we talk about rows in tables, but records in files. More specifically, ROW may lure you into thinking in lines, but BULK INSERT does not think in lines; it has a binary mindset. It looks for the first field terminator (or the fixed number of bytes), then it looks for the second field terminator, and so on. The record terminator is just the terminator for the last field. If the record terminator appears elsewhere, it is interpreted as part of the data.

    Apparently you were lucky that these non-standard rows boiled down to exactly two records. Often non-standard header rows merge with the first real record, so that you lose data.

    If you only need 48 of the 19,000 records in the file, reading the file with OPENROWSET(BULK) may be better, as you could then filter out the uninteresting rows with a WHERE clause. (A sketch of that follows below this reply.)


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by Dick Campbell Saturday, October 3, 2015 1:04 AM
    Friday, October 2, 2015 9:54 PM
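
    A minimal sketch of the OPENROWSET(BULK) route, with invented table, column and file names; realcolumns.xml stands in for a format file describing the file's real fields (a sketch of the XML flavour appears further down the thread):

    -- Read the file as a rowset and keep only the rows of interest.
    -- All names, paths and the WHERE predicate here are hypothetical.
    INSERT INTO dbo.TargetTable (AccountNo, Amount)
    SELECT src.AccountNo, src.Amount
    FROM   OPENROWSET(BULK 'C:\temp\datafile.txt',
                      FORMATFILE = 'C:\temp\realcolumns.xml',
                      FIRSTROW   = 2) AS src
    WHERE  src.AccountNo = 'A12345';   -- whatever identifies the 48 wanted rows
    -- If the odd trailing [$] record still trips up the parse, OPENROWSET(BULK)
    -- also accepts a LASTROW option, just like BULK INSERT.
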
  • Thanks Erland, OPENROWSET with a WHERE clause (predicate) is precisely what I needed. I am very happy with that as a solution. It is even faster, not that speed was an issue. It also forced me to finally come to terms with format files, which are not so complicated when you make the effort. I am not a huge fan of XML, but I found the XML format file a little easier to deal with, surprisingly, to me at least; a sketch of that kind of file is below this post. Best regards, Dick Campbell

    R Campbell


    Saturday, October 3, 2015 12:10 AM
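
    For reference, a generic sketch of what an XML format file can look like; the terminators, lengths and column names are invented to match the hypothetical two-column example above:

    <?xml version="1.0"?>
    <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <RECORD>
        <!-- How the fields appear in the data file. -->
        <FIELD ID="1" xsi:type="CharTerminated" TERMINATOR=";"    MAX_LENGTH="50"/>
        <FIELD ID="2" xsi:type="CharTerminated" TERMINATOR="\r\n" MAX_LENGTH="50"/>
      </RECORD>
      <ROW>
        <!-- How the fields map to columns in the resulting rowset. -->
        <COLUMN SOURCE="1" NAME="AccountNo" xsi:type="SQLVARYCHAR"/>
        <COLUMN SOURCE="2" NAME="Amount"    xsi:type="SQLVARYCHAR"/>
      </ROW>
    </BCPFORMAT>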
  • As Erland stated, you can use FIRSTROW & LASTROW.  You should know the FIRSTROW and I doubt this changes.  If you don't know the LASTROW, or if it changes, I don't think you can count the number of rows.  I had to deal with this before, and I tried all kinds of things, and never got it working quite right.  I ended up using C# to strip off the last row, and ran the C# script, as an executable, every night, using the Task Scheduler.  The script below will strip off the first and last row.

    using System;
    using System.IO;
    using System.Linq;

    namespace ConsoleApplication3
    {
        class Program
        {
            static void Main(string[] args)
            {
                string sourceDirectory = @"C:\ManagedCode\Downloads\";

                try
                {
                    var txtFiles = Directory.EnumerateFiles(sourceDirectory);

                    foreach (string currentFile in txtFiles)
                    {
                        var lines = File.ReadAllLines(currentFile);

                        // Just an example of changing the file name based upon the current name.
                        var targetFile = Path.ChangeExtension(currentFile, "prod.txt");

                        // Write the file back out without the first and last lines.
                        File.WriteAllLines(targetFile, lines.Skip(1).Take(lines.Length - 2));
                    }
                }
                catch (Exception ex)
                {
                    // Report the failure instead of silently swallowing it.
                    Console.WriteLine(ex.Message);
                }
            }
        }
    }

    As you probably know, Bulk Insert is VERY FAST, but it is NOT very flexible.

    Knowledge is the only thing that I can give you, and still retain, and we are both better off for it.

    Thursday, October 15, 2015 9:37 PM