U-SQL EXTRACT reads a larger amount of data than the file size

    Question

  • Hi All,

    I am trying to understand what I have missed here. I have a file that is 3.16 GB in ADLS (and I believe it is stored as 2 extents). When the file is read using a U-SQL EXTRACT, the Extract box in the job graph shows 3 vertices and 5.86 GB of read data. Why does it show 3 vertices and 5.86 GB for a file that is only 3.16 GB?

    Thanks

    Monday, January 1, 2018 2:13 PM

All replies

  • Hi Dinesh,

    I can look into this for you - first I'll need more information though. Could you email me at mabasile<at>Microsoft.com with the following information if possible:

    • A screenshot of your job graph. 
    • Are you using a custom or built-in extractor?
    • If you're using a custom extractor, please send the code for it.

    We can continue this conversation offline in greater detail. Until then, you can drill down into each vertex to see what data is being read. Depending on the type of extractor used, it's also possible that some data is read more than once, which could contribute to the higher amount of read data. The multiple vertices are due to the large size of the file - we're just spreading the extraction across vertices. I hope this helps!
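
    In the meantime, here's a minimal sketch you can run for comparison (the path, delimiter, and column names below are placeholders for your file). Reading the same file with the built-in text extractor and comparing its per-vertex read sizes against your current job can show whether the extractor itself is re-reading data:

    // Minimal comparison sketch - path, delimiter, and columns are placeholders.
    DECLARE @in string = "/input/yourfile.txt";

    @rows =
        EXTRACT [Col1] string,
                [Col2] string,
                [Col3] string
        FROM @in
        USING Extractors.Text(delimiter: '|', encoding: Encoding.UTF8);

    OUTPUT @rows
    TO "/output/readcheck.csv"
    USING Outputters.Csv();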

    Regards,

    Matt Basile

    Azure Data Lake PM

    Saturday, January 6, 2018 12:21 AM
  • Hi Matt,

    Thanks for the reply. I will send all the details with screenshots to the given email address.

    Regards

    Dinesh

    Saturday, January 6, 2018 2:50 AM
  • Hi Matt,

    Hope you have received the files I sent. Please have a look at them.

    Regards

    Dinesh

    Tuesday, January 9, 2018 12:48 AM
  • Hi Matt,

    I have a similar scenario regarding extracting large files and inserting them into an ADLA table.

    I am using a U-SQL copy and a U-SQL activity in my pipeline. My file size is more than 6 GB. All is well if I load a 1 GB file into an ADLA table, but if the file size exceeds 1 GB, my insert fails because the column data gets misaligned.

    DECLARE @in string = "Filepath/Extract.txt";

    @ExtractedData =
        EXTRACT [Col1] string,
                [Col2] string,
                [Col3] string
        FROM @in
        USING new CommmonExtractors.CustomExtractor(Encoding.UTF8);

    @Rowset =
        SELECT Int64.Parse([Col1]) AS [Col1],
               [Col2],
               Int32.Parse([Col3]) AS [Col3]
        FROM @ExtractedData;

    INSERT INTO dbo.table1
    SELECT *
    FROM @Rowset;

    My U-SQL job fails with an incorrect format error while extracting the data.

    Please help

    Wednesday, April 18, 2018 4:56 AM
  • Hi Dinesh,

    By any chance, did you find a solution and manage to parse the large file?

    Thanks,


    Wednesday, April 18, 2018 4:59 AM
  • Hi Indu,

    I have not tried a file larger than 4 GB; however, I had no issues with the files I processed.

    Did you check whether the data in the file is consistent? That could be one reason.
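
    One way to check, assuming the file is delimited text (the delimiter and column names below are placeholders), is a quick sketch that extracts every column as string using the built-in extractor's silent option, which skips rows that don't match the expected column count, and then counts the rows that survive:

    // Consistency-check sketch - delimiter and column names are placeholders.
    DECLARE @in string = "Filepath/Extract.txt";

    @clean =
        EXTRACT [Col1] string,
                [Col2] string,
                [Col3] string
        FROM @in
        USING Extractors.Text(delimiter: '|', silent: true, encoding: Encoding.UTF8);

    @counts =
        SELECT COUNT(*) AS CleanRows
        FROM @clean;

    OUTPUT @counts
    TO "/output/rowcheck.csv"
    USING Outputters.Csv();

    If CleanRows is lower than the row count you expect, some rows are malformed, which would explain the misaligned columns on the larger file.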

    Regards

    Dinesh

    Monday, April 23, 2018 3:33 AM