CSV files with a timestamp, like CDC


  • Hi Guys,

    I have 10 CSV files of 2 GB each. The files are named as follows:

    Inventory_2017_01_01_1203.csv, etc.

    Each file may add new rows, remove existing rows, or update rows. I want to upload all the files to Azure Data Lake incrementally and be able to query them as a single source. I also want to schedule this as a job, since 10 files containing the latest data to date arrive daily.




    Friday, March 3, 2017 8:45 PM


  • Hi,

    You can do something like this:

    1) Load all the files into Data Lake Store and keep them as source data. You can delete them after processing or keep them as history.

    2) Store the clean, consistent data in a Data Lake Analytics table, in files exported to Data Lake Store, or in a relational database (which one depends on how you will ultimately use the data).

    3) Use U-SQL to cleanse, merge, and prepare the data.

    4) Automate the process with Azure Data Factory.
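
    The merge logic in step 3 can be sketched outside of U-SQL to show the idea. The Python below is only an illustration under several assumptions: each file is a full snapshot, the last four digits of the file name are HHMM, and every row has a unique key column (the name `ItemId` and the `Qty` column are hypothetical, not taken from the question). In Data Lake Analytics the same comparison would be written in U-SQL, and 2 GB files would not be loaded into memory like this.

```python
import csv
import io
import re
from datetime import datetime

# Assumed naming pattern, modeled on Inventory_2017_01_01_1203.csv.
# Reading the trailing "1203" as HHMM is an assumption.
NAME_RE = re.compile(r"^(?P<name>.+)_(?P<ts>\d{4}_\d{2}_\d{2}_\d{4})\.csv$")


def snapshot_time(filename):
    """Parse the timestamp embedded in the file name (assumed YYYY_MM_DD_HHMM)."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return datetime.strptime(match.group("ts"), "%Y_%m_%d_%H%M")


def load_snapshot(lines, key="ItemId"):
    """Read CSV rows into a dict keyed by a (hypothetical) unique key column."""
    return {row[key]: row for row in csv.DictReader(lines)}


def diff_snapshots(old, new):
    """Classify the changes between two snapshots that share the same key."""
    inserted = {k: new[k] for k in new.keys() - old.keys()}
    deleted = {k: old[k] for k in old.keys() - new.keys()}
    updated = {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]}
    return inserted, updated, deleted


if __name__ == "__main__":
    # Tiny in-memory stand-ins for two consecutive snapshot files.
    previous = load_snapshot(io.StringIO("ItemId,Qty\n1,10\n2,5\n"))
    current = load_snapshot(io.StringIO("ItemId,Qty\n1,12\n3,7\n"))
    inserted, updated, deleted = diff_snapshots(previous, current)
    print("inserted:", sorted(inserted))
    print("updated:", sorted(updated))
    print("deleted:", sorted(deleted))
    print(snapshot_time("Inventory_2017_01_01_1203.csv"))
```

    Sorting the incoming file names by `snapshot_time` gives the order in which to apply them, so the consolidated table always reflects the most recent snapshot.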

    Hope it helps you. 

    Sergiy Lunyakin (Data Platform MVP)

    • Marked as answer by Navind Wednesday, March 8, 2017 8:28 PM
    Saturday, March 4, 2017 6:49 PM
