locked
ADF: How to Union 4 inputs into one output using HIVE? RRS feed

  • Question

  • Hello,

    I am trying to merge 4 inputs into one output using HIVE. ADF is working and completes the time-slices but I am getting 4 blobs as the output instead of a single one.

    Here is my script:

    SET hive.execution.engine=tez;
    SET hive.mapred.supports.subdirectories=TRUE;

    SET mapred.input.dir.recursive=TRUE;

    DROP TABLE IF EXISTS RedeemPart0;
    CREATE EXTERNAL TABLE RedeemPart0
    (
    [my columns]
    )
    STORED AS AVRO
    LOCATION '${hiveconf:Redeem0Loc}';
    DROP TABLE IF EXISTS RedeemPart1;
    CREATE EXTERNAL TABLE RedeemPart1
    (
    [my columns]
    )
    STORED AS AVRO
    LOCATION '${hiveconf:Redeem1Loc}';
    DROP TABLE IF EXISTS RedeemPart2;
    CREATE EXTERNAL TABLE RedeemPart2
    (
    [my columns]
    )
    STORED AS AVRO
    LOCATION '${hiveconf:Redeem2Loc}';
    DROP TABLE IF EXISTS RedeemPart3;
    CREATE EXTERNAL TABLE RedeemPart3
    (
    [my columns]
    )
    STORED AS AVRO
    LOCATION '${hiveconf:Redeem3Loc}';
    DROP TABLE IF EXISTS redeem_output;
    CREATE EXTERNAL TABLE redeem_output
    (
    [my columns]
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE

    LOCATION '${hiveconf:OutputLoc}';

    INSERT OVERWRITE TABLE redeem_output
    SELECT *
    FROM RedeemPart0
    UNION ALL
    SELECT *
    FROM RedeemPart1
    UNION ALL
    SELECT *
    FROM RedeemPart2
    UNION ALL
    SELECT *
    FROM RedeemPart3 

    Friday, November 25, 2016 1:50 PM

All replies

  • so the script generates 4 files, is that correct?

    thats probably due to the fact that HIVE tries to parallelize as much as possible, so 4 tables --> 4 files

    in general, you can point ADF to your '${hiveconf:OutputLoc}' and it will process all files in and below that directory so I dont think this should be a problem here

    if it is an issue, you might try to force HIVE to a single-threaded INSERT by using LIMIT or ORDER BY clauses (i have not done this myself yet but it might work)

    hth,
    -gerhard


    Gerhard Brueckl
    blogging @ http://blog.gbrueckl.at
    working @ http://www.pmOne.com

    Monday, November 28, 2016 9:44 AM