Prevent writing zero-row (header-only) files in ADLA


  • Is there a way to prevent an empty file from being written in U-SQL?

    USING Outputters.Tsv(outputHeader: true) writes a file containing only the header row when the result set has no rows.

    Because an Azure Data Factory (ADF) pipeline depends on this output, it would be preferable not to write anything at all (not even the header) and instead throw an exception in the U-SQL job.



    Andrew Sears

    Wednesday, December 7, 2016 9:03 PM
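For reference, a minimal script that reproduces the behavior described above might look like the following. The input path, column names, and filter are hypothetical placeholders chosen only so that the result set can be empty.

```usql
// Hypothetical input path and schema, for illustration only.
@rows =
    EXTRACT Id int,
            Name string
    FROM "/input/sample.tsv"
    USING Extractors.Tsv();

// A filter that may legitimately match no rows.
@filtered =
    SELECT Id, Name
    FROM @rows
    WHERE Name == "no-such-name";

// When @filtered is empty, this still writes a file that contains only the header row.
OUTPUT @filtered
TO "/output/result.tsv"
USING Outputters.Tsv(outputHeader: true);
```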


All replies

  • How about using inline C# to throw an exception if COUNT() == 0?
    Wednesday, December 7, 2016 10:04 PM
  • Michael Rys already answered you in this thread. You're not going to get a better answer than his (he owns Data Lake).

    Thursday, December 8, 2016 7:51 PM
  • Hi there,

    Can you elaborate on the solution? If I have a detail query without a GROUP BY, how would I get an aggregate count of the rows in the dataset?

    I was going to use a cross join for this, but I was unsure of the performance impact.



    Andrew Sears

    Thursday, December 8, 2016 8:06 PM
  • Andrew, I don't understand the meaning of your new question.

    Also, regarding Misinformed DNA's comment: the original question answered by Mike Rys was about not writing empty files, but the new question also mentions that you want to throw an exception. Andrew, can you be more precise about the behavior you expect? Is your intention that if a job produces an empty file, it should actually be marked as failed because that's an unexpected outcome, and the job should be retried?

    Friday, December 9, 2016 3:50 PM
  • Yes. I have found that you can prevent a file from being written by throwing an exception for the output in question somewhere between the extractor and the outputter. Since U-SQL does not let you assign a query result to a scalar variable, you have to cross join the result of a separate count query to the detail query and use an inline function to throw an exception where count == 0.

    If you have multiple outputs, only the one whose query throws the exception is not written. I would have preferred all outputters to fail on the exception, but I suppose that is the nature of parallel code.

    It's a bit convoluted, and I have not tested the performance implications, but it should get the job done.



    Andrew Sears

    Sunday, December 11, 2016 10:12 PM
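The approach discussed in this thread (a count query, a CROSS JOIN, and an inline C# check) could be sketched as follows. This is untested and rests on assumptions: that, as in standard SQL, COUNT(*) without GROUP BY returns exactly one row (with 0 for empty input), and that U-SQL accepts a statement-bodied C# lambda invoked inline (if it does not, the same check can be moved into a code-behind static method). The guard is applied to the single count row rather than to the joined rows, because when the detail rowset is empty the CROSS JOIN produces no rows, so a per-row check placed after the join would never run. Paths, column names, rowset names, and the exception message are hypothetical.

```usql
// Hypothetical input path and schema, for illustration only.
@rows =
    EXTRACT Id int,
            Name string
    FROM "/input/sample.tsv"
    USING Extractors.Tsv();

// Scalar aggregate without GROUP BY: assumed to yield one row, cnt = 0 for empty input.
@rowCount =
    SELECT COUNT(*) AS cnt
    FROM @rows;

// Evaluate the check on the single count row so it runs even when @rows is empty.
// Assumes U-SQL accepts an inline, statement-bodied C# lambda here.
@checkedCount =
    SELECT (new System.Func<long, long>(n =>
               {
                   if (n == 0)
                   {
                       throw new System.Exception("Result set is empty; refusing to write output.");
                   }
                   return n;
               }))(cnt) AS cnt
    FROM @rowCount;

// The CROSS JOIN makes this output depend on the checked count row.
@guarded =
    SELECT r.Id,
           r.Name
    FROM @rows AS r
         CROSS JOIN @checkedCount AS c;

OUTPUT @guarded
TO "/output/result.tsv"
USING Outputters.Tsv(outputHeader: true);
```

If the assumptions hold, an empty @rows should make the job fail instead of producing a header-only file. Whether the optimizer always evaluates @checkedCount when none of its columns reach the output has not been verified, so it is worth confirming on a small empty input before relying on it in an ADF pipeline.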