none
U-SQL - how to output multiple (N) sets of repeating columns RRS feed

  • Question

  • I have a scenario where I need to "denormalize" an OUTPUT recordset to a csv file - the twist is I that it seems I cannot SELECT two (or more) columns with the same column name.  Say you have a Customer file, then a PhoneNumbers file and you want the resulting csv file to be shaped like this:

    CustomerName,Age,Phonetype,PhoneNumber,PhoneExtension,Phonetype,PhoneNumber,PhoneExtension

    Fred,32,Home,(009)234-0987,,Cell,(555)323-3456,x23

    Sally,29,Work,(555)923-0909,x344,Cell,(444)111-2345,

    I can easily get the data shaped this way, the issue is the duplicate column names.

    Can a MAP_AGG work, or may a custom extractor, combiner, etc.?  Looking for ideas - thanks in advance!


    Bill Blakey

    Thursday, September 6, 2018 9:04 PM

Answers

  • Vairanvan, thank you for the response.  Yes, I agree that with perspective of database design, column/field names must be deterministic and therefore unique.  However, I'm trying to write out to a file to send to another system and need the data in a certain form/shape.

    I was actually ABLE to accomplish what I needed to do today by implementing a Custom Outputter.  I referenced Michael Rys' code here:  https://github.com/Azure/usql/tree/master/Examples/HeaderOutputter

    So when I name the columns in the SELECT clause, I tag each "set" of repeating columns with a #N# then remove that within the Output method of the Custom Outputter.  So for example this input:

    @simple_rs =
    SELECT  *
    FROM (
        VALUES   
             ("ACME","John Smith","10660 N Gainey Center Way","Irvine","AZ","85255","(555)123-4567","cell","(555)333-2222","pager")
            ,("ACME","Mary Green","22 Twain St.","Mesa","CO","87988","(555)123-1522","home","","")
            ,("ACME","Pat Gore","6388 S. South Way #455","Boulder","CO","87999","(555)234-0987","home","(555)333-3333","work")
         ) AS T ([Name#0#],[Name#1#],[Street#1#],[City#1#],[State#1#],[Zip#1#],[Phone#1#],[PhoneType#1#],[Phone#2#],[PhoneType#2#]);
    
    OUTPUT @simple_rs
    TO "/Output/HeaderOutputter/Address/replacement.txt"
    USING new FAFileOutputter.HeaderOutputter(quoting:false, withTypes:false);
    
    
    


    Gets written out as this header:

    Name,Name,Street,City,State,Zip,Phone,PhoneType,Phone,PhoneType


    Bill Blakey

    Friday, September 7, 2018 8:54 PM

All replies

  • It is by database design that column names should be unique. I would advise you to design such that column names are unique. If you have the phoneType value structured, you can make that as a column type and make it nullable if the value is not present. Let me know if that works for you. 

    For eg:

    CustomerName,Age,Home,Work, Cell

    Fred,32,(009)234-0987,,(555)323-3456,x23

    Sally,29,,(555)923-0909,x344,(444)111-2345

    Thursday, September 6, 2018 10:40 PM
    Moderator
  • Vairanvan, thank you for the response.  Yes, I agree that with perspective of database design, column/field names must be deterministic and therefore unique.  However, I'm trying to write out to a file to send to another system and need the data in a certain form/shape.

    I was actually ABLE to accomplish what I needed to do today by implementing a Custom Outputter.  I referenced Michael Rys' code here:  https://github.com/Azure/usql/tree/master/Examples/HeaderOutputter

    So when I name the columns in the SELECT clause, I tag each "set" of repeating columns with a #N# then remove that within the Output method of the Custom Outputter.  So for example this input:

    @simple_rs =
    SELECT  *
    FROM (
        VALUES   
             ("ACME","John Smith","10660 N Gainey Center Way","Irvine","AZ","85255","(555)123-4567","cell","(555)333-2222","pager")
            ,("ACME","Mary Green","22 Twain St.","Mesa","CO","87988","(555)123-1522","home","","")
            ,("ACME","Pat Gore","6388 S. South Way #455","Boulder","CO","87999","(555)234-0987","home","(555)333-3333","work")
         ) AS T ([Name#0#],[Name#1#],[Street#1#],[City#1#],[State#1#],[Zip#1#],[Phone#1#],[PhoneType#1#],[Phone#2#],[PhoneType#2#]);
    
    OUTPUT @simple_rs
    TO "/Output/HeaderOutputter/Address/replacement.txt"
    USING new FAFileOutputter.HeaderOutputter(quoting:false, withTypes:false);
    
    
    


    Gets written out as this header:

    Name,Name,Street,City,State,Zip,Phone,PhoneType,Phone,PhoneType


    Bill Blakey

    Friday, September 7, 2018 8:54 PM
  • Thank you for clarifying and it is great that you found the solution. Thanks for pasting the solution. Appreciate it
    Tuesday, September 11, 2018 8:26 PM
    Moderator
  • Vairavan, I am curious when you stated in your response "If you have the phoneType value structured, you can make that as a column type and make it nullable if the value is not present" - I tried searching for "u-sql column type" but I am unsure what you are referring to. I did read about SQL.MAP and the related operators, but those seem to address simple name/value pairs.

    For my future knowledge, what is a column type in U-SQL, and would you have a simple example to post?  I'm always open to considering other approaches to solve a problem :)

    Thanks!


    Bill Blakey

    Wednesday, September 12, 2018 2:43 PM