CSV files with time stamps (like CDC)

    Question

  • Hi Guys,

    I have 10 CSV files of 2 GB each, named as follows:

    Inventory_2017_01_01_1200.csv

    Inventory_2017_01_01_1201.csv

    Inventory_2017_01_01_1202.csv

    Inventory_2017_01_01_1203.csv, etc.

    Each new file may add rows, remove existing rows, or update rows. I want to upload all the files to Azure Data Lake incrementally and be able to query them as one single source. I also want to schedule this as a job, since 10 files arrive daily containing the latest data to date.

    Regards,

    Navin


    Navin.D http://dnavin.wordpress.com

    Friday, March 3, 2017 8:45 PM

Answers

  • Hi,

    You can do something like this:

    1) Load all files into Data Lake Store and keep them as raw source data. You can delete them after processing or keep them as history.

    2) Store the clean, consistent data in a Data Lake Analytics table, in files exported to Data Lake Store, or in a relational database (which one depends on how you will ultimately use the data).

    3) Use U-SQL for cleansing, merging and preparing data.

    4) Automate this process using Azure Data Factory.
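
    To make the merging in step 3 more concrete, here is a rough local sketch in plain Python (not U-SQL, and not an Azure API). It orders the files by the numbers in their names and compares two consecutive files to find inserted, updated, and deleted rows. It assumes each file is a full snapshot, that the last four digits of the name are an ordering number, and that rows have a key column; the column names `item_id` and `qty`, the sample data, and all function names are hypothetical.

```python
import csv
import io
import re

# Assumed naming scheme, e.g. Inventory_2017_01_01_1200.csv
NAME_RE = re.compile(r"^Inventory_(\d{4})_(\d{2})_(\d{2})_(\d{4})\.csv$")


def snapshot_order(filename):
    """Return (year, month, day, suffix) as ints so files sort oldest first."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return tuple(int(part) for part in match.groups())


def load_rows(csv_text, key):
    """Parse CSV text into a dict mapping the key column to the full row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def diff_snapshots(old, new):
    """Classify keys as inserted, updated, or deleted between two snapshots."""
    return {
        "inserted": sorted(k for k in new if k not in old),
        "updated": sorted(k for k in new if k in old and new[k] != old[k]),
        "deleted": sorted(k for k in old if k not in new),
    }


# Tiny in-memory stand-ins for two of the real files (hypothetical data).
files = {
    "Inventory_2017_01_01_1201.csv": "item_id,qty\nA,6\nC,7\nD,1\n",
    "Inventory_2017_01_01_1200.csv": "item_id,qty\nA,5\nB,2\nD,1\n",
}

ordered = sorted(files, key=snapshot_order)
old = load_rows(files[ordered[0]], key="item_id")
new = load_rows(files[ordered[1]], key="item_id")
changes = diff_snapshots(old, new)
print(changes)
```

    If the files turn out to be deltas rather than full snapshots, the comparison step would need to change, since a delta file has to mark deleted rows explicitly.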

    Hope this helps.


    Sergiy Lunyakin (Data Platform MVP)

    • Marked as answer by Navind Wednesday, March 8, 2017 8:28 PM
    Saturday, March 4, 2017 6:49 PM

All replies