none
How to fill blank fields with the data above

    Question

  • Hello,

    I have a csv file which I am trying to process using Azure Data Lake Analytics U-SQL. The original file is semi-structured which I managed to fix using the silent:true flag. Now that it is more structured, I would like to fill the empty cells with the data above.

    My data looks like this:

    1 Patient Information Treatment Date Process Date Claims Ref Amount claimed
    2 Patient A - CLICKS   PHARMACY  2017/01/16 2017/02/01 5NB06f 34.28
    3         84.42
    4         80.13
    5         51.82
    6 Patient A - DR T HILL 2017/02/11 2017/02/14 5PjhEE 400
    7 Patient B - DR VAN   RENSBURG 2017/02/18 2017/02/17 5Q6jwu 585.1
    8         894
    9 Patient C - MEDIRITE   PHARMACY  2017/03/09 2017/03/04 5RrLcz 35.8
    10         84.42
    11         80.13
    12         51.82
    13         11.85
    14 Patient C - DISCHEM   PHARMACY  2017/03/31 2017/03/31 5Utkp7 35.8
    15         90.45
    16         80.13
    17         51.82
    18         12.54

    My problem lies with the empty cells in the first four columns.

    The second row has data which I would like to copy down into the empty cells below it (rows 3-5). The data from row 7 needs to be copied down to row 8, the data from row 9 to be copied down to rows 10-13 and the data from row 14 to be copied to rows 15-18.

    This has to be done without changing the values in the 'Amount claimed' column.

    Does anyone have any ideas on how to achieve this in U-SQL?

    Thank you,

    Alicia

    Tuesday, June 13, 2017 10:49 AM

All replies

  • See discussion at:

    https://stackoverflow.com/questions/44563714/how-to-fill-a-blank-cell-with-the-data-above-using-u-sql 

    Tuesday, June 20, 2017 2:59 PM
  • Hi Alicia,

    I think there are two approaches you could take here.

    1. You could try PIVOT to flatten the records and then UNPIVOT of Amount Claimed and this effectively fill the whitespace since the Patient Information, Treatment Date etc. will be populated for each Amount Claimed value

    2. An alternate approach (and assuming the 1,2,3 are unique row id within your column) is to use windowing functions to try and determine the upper and lower bounds of rows that you need to populate, and then use a CROSS JOIN with a filter to populate these records. This is probably a little less efficient.

    Hope this helps

    Charles

    Tuesday, June 20, 2017 3:14 PM
  • Also - documentation ion PIVOT / UNPIVOT is here:

    https://blogs.msdn.microsoft.com/azuredatalake/2017/03/16/azure-data-lake-u-sql-march-9-2017-updates-deprecations-turn-into-errors-pivotunpivot-cross-adls-account-u-sql-catalog-sharing-nuget-packages-and-more/

    Charles

    Tuesday, June 20, 2017 3:15 PM