Generating a rowset with a given number of rows in U-SQL

    Question

  • In Spark, using the SqlContext, one can easily generate a DataFrame with a given number of rows as follows:

    sqlContext.range(0,10000000)

    The DataFrame will have the given number of rows and a single column (id) holding the integer value. This is invaluable for generating an arbitrary number of random rows: for example, generating millions of fake names and phone numbers from a set of common first and last names and a random phone number generator.

    I have found no way to do the same in U-SQL. There are some hacks (https://blogs.msdn.microsoft.com/azuredatalake/2017/08/18/u-sql-tip-generating-ranges-of-numbers-and-dates/), but I am wondering if there is a better way (a custom processor, perhaps)?

    Nathan Dykman

    Thursday, April 12, 2018 11:44 PM

Answers

  • Hi Nathan

    A custom processor is close, but you can lift it into a CROSS APPLY with a C# expression:

    @data = 
      SELECT val 
      FROM (VALUES(1)) AS v(x) 
        CROSS APPLY EXPLODE(Enumerable.Range(0,10000000)) AS T(val);

    The benefit is that you can refer to values in the left-hand side rowset if you have existing data.
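
    As a minimal sketch of that point, the expression inside EXPLODE can read a column from the left-hand rowset, so each input row can produce its own number of generated rows. The rowset contents, the column names (name, n, i), and the output path below are made up for illustration:

    ```usql
    // Hypothetical input: each row says how many rows to generate for it.
    @input =
        SELECT name, n
        FROM (VALUES ("a", 3), ("b", 2)) AS v(name, n);

    // The range length comes from the left-hand column n,
    // so each input row is repeated n times with i = 0 .. n-1.
    @expanded =
        SELECT name, i
        FROM @input
             CROSS APPLY EXPLODE(Enumerable.Range(0, n)) AS T(i);

    // Assumed output location; adjust to your own store.
    OUTPUT @expanded
    TO "/output/expanded.csv"
    USING Outputters.Csv();
    ```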

    If you just want a rowset generator as in the case above, I suggest wrapping it in a table-valued function so you can simplify the syntax when you call it.
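
    A table-valued function wrapper might look roughly like the following. The function name dbo.GenerateRows, the parameter name @count, and the output path are assumptions chosen for this sketch, not names from any library:

    ```usql
    // Hypothetical helper: returns @count rows with a single column id = 0 .. @count-1.
    CREATE FUNCTION IF NOT EXISTS dbo.GenerateRows(@count int)
    RETURNS @result TABLE(id int)
    AS
    BEGIN
        @result =
            SELECT id
            FROM (VALUES (1)) AS v(x)
                 CROSS APPLY EXPLODE(Enumerable.Range(0, @count)) AS T(id);
    END;

    // Calling it reads much like sqlContext.range(0, 10000000) in Spark.
    @rows =
        SELECT id
        FROM dbo.GenerateRows(10000000) AS g;

    // Assumed output location; adjust to your own store.
    OUTPUT @rows
    TO "/output/rows.csv"
    USING Outputters.Csv();
    ```

    Keeping the single-row VALUES rowset inside the function hides the boilerplate, so callers only see the row count.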

    Note that the expression in the EXPLODE can be any C# expression resulting in an IEnumerable<T>.
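
    For instance, a LINQ projection over a range should work as the EXPLODE argument, which gives a range of dates instead of integers. This is a sketch only: the start date, the number of days, the column name d, and the output path are arbitrary choices for illustration:

    ```usql
    // Seven consecutive dates starting at an arbitrary example date.
    // The C# expression evaluates to an IEnumerable<DateTime>.
    @dates =
        SELECT d
        FROM (VALUES (1)) AS v(x)
             CROSS APPLY EXPLODE(
                 Enumerable.Range(0, 7)
                           .Select(i => new DateTime(2018, 4, 1).AddDays(i))
             ) AS T(d);

    // Assumed output location; adjust to your own store.
    OUTPUT @dates
    TO "/output/dates.csv"
    USING Outputters.Csv();
    ```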

    If you want to create several columns with some random data, you can write an Applier UDO.

    More information is available in the documentation and the release notes that announced the extended CROSS APPLY support.


    Michael Rys

    Friday, April 13, 2018 1:05 AM
    Moderator