Any tips on properly using SqlUserDefinedCombiner?

    Question

  • Hi,

    I am trying to map and filter entries in one rather big ADLA table using a lookup map (another, moderately sized ADLA table). I tried a simple INNER JOIN, but could not avoid a Cartesian product.

    Both sets partition well on the same key: each partition of the first set is of moderate size, and each partition of the second is small enough to cache in memory. That sounded like a good case to try COMBINE T1 WITH T2 ON T1.Key == T2.Key.

    Issue #1: I get managed exceptions from SqlIpProcessor.ResetOutput() during job execution whenever there is no matching T1.Key for a T2.Key and T1.Key is specified as a READONLY column. I assume that is because the join is outer (which I don't really want).

    Issue #2: I get job compile errors when I specify [SqlUserDefinedCombiner(Mode = CombinerMode.Inner)], something like "The type or namespace name 'IRowComparer' could not be found".

    I can drop READONLY and use a plain [SqlUserDefinedCombiner], but I think it is worth hinting to ADLA that the combined results stay in the same partition. I thought READONLY does that, so I would rather not lose it.

    Any advice on the above? Maybe I am doing something completely wrong?

    thanks

    Audrius

    Wednesday, October 11, 2017 8:34 AM

All replies

  • Hello,

    Could you please share your table definitions and query for which you are getting the error?

    Thanks

    Amina


    Amina Saify

    Wednesday, November 1, 2017 11:44 PM
  • Hi Amina,

    Thanks for your interest. With time, issue #2 has gone away; I cannot reproduce it anymore. However, issue #1 is easily reproduced both locally and in Azure. Please see the repro script and code-behind below.

    thanks,

    Audrius

    ------

    @T1 = SELECT * FROM( VALUES ( "One", 11 ), ( "Two", 12 )) AS t(Key, Val);
    
    @T2 = SELECT * FROM( VALUES ( "Two", 22 ), ( "Three", 23 )) AS t(Key, Val);
    
    @res =
        COMBINE @T1 AS l WITH @T2 AS r  ON l.Key == r.Key
        PRODUCE Key, Product int
        READONLY l.Key
        REQUIRED l.Val, r.Val
        USING new CombinerCases.TestCombiner();
    
    DECLARE @outPath string = "/test-output.csv";
    OUTPUT @res TO @outPath
    USING Outputters.Csv(quoting: false, outputHeader:true, escapeCharacter: '\\');

    ------

    using Microsoft.Analytics.Interfaces;
    using System.Collections.Generic;
    
    namespace CombinerCases
        {
        [SqlUserDefinedCombiner(Mode = CombinerMode.Inner)]
        public class TestCombiner : ICombiner
            {
            public override IEnumerable<IRow> Combine (IRowset left, IRowset right, IUpdatableRow output)
                {
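                // Emit one output row per (right, left) row pair.
                foreach (var r in right.Rows)
                foreach (var l in left.Rows)
                    {
                    var rv = r.Get<int> ("Val");
                    var lv = l.Get<int> ("Val");
                    // Only Product is set here; Key is declared READONLY in the script.
                    output.Set ("Product", rv * lv);
                    yield return output.AsReadOnly();
                    }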
                }
            }
        }
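
For comparison, the result one would expect from this repro if the combine behaved as an inner join on Key can be modeled outside ADLA with plain LINQ. This is only a sketch: `CombineModel` and `InnerCombine` are hypothetical names, nothing here uses the U-SQL runtime or Microsoft.Analytics.Interfaces, and it assumes C# 7 tuple support. It says nothing about how ADLA actually schedules or invokes the combiner.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CombineModel
{
    // Hypothetical helper, not part of the U-SQL API: models the rows
    // expected from an inner combine on Key.
    public static List<(string Key, int Product)> InnerCombine(
        IEnumerable<(string Key, int Val)> left,
        IEnumerable<(string Key, int Val)> right)
    {
        // Index the smaller (right) side by key so each left row needs one lookup.
        var rightByKey = right.ToLookup(r => r.Key, r => r.Val);

        var result = new List<(string Key, int Product)>();
        foreach (var l in left)
        {
            // A lookup yields an empty sequence for a missing key,
            // so left rows without a match produce no output.
            foreach (var rv in rightByKey[l.Key])
            {
                result.Add((l.Key, l.Val * rv));
            }
        }
        return result;
    }

    public static void Main()
    {
        // Same sample rows as @T1 and @T2 in the repro script.
        var t1 = new[] { ("One", 11), ("Two", 12) };
        var t2 = new[] { ("Two", 22), ("Three", 23) };

        foreach (var row in InnerCombine(t1, t2))
        {
            Console.WriteLine($"{row.Key},{row.Product}");
        }
    }
}
```

With the sample rows, only the key "Two" appears on both sides, so the model yields a single row with Product 12 * 22 = 264. "One" and "Three" have no partner and contribute nothing.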


    Thursday, November 2, 2017 7:10 AM