Exporting data from Hive table to CSV file

  • Question

  • Hi,

    I am running a Hive activity following the same steps as in the link below:

    https://docs.microsoft.com/en-us/azure/data-factory/data-factory-hive-activity

    In my case I am taking a CSV file as input from Blob storage and creating a table. The table is created successfully; I then load data into it, perform some aggregate functions, and store the result in an output table. Since I have not defined any storage location for the Hive table, I don't know where the table is being stored.

    After loading the data into the output table, I want to export the result to a CSV file at a blob location. To do this, I used the following syntax:

    "INSERT OVERWRITE DIRECTORY 'wasb://<container name>@<storage account>.blob.core.windows.net/' SELECT * FROM database.table_name;"

    but I get the following error:

    "Failed with exception Wrong FS: wasb://cmrmsftblob@cmrmsft.blob.core.windows.net/, expected: file:///"

    I also tried adding the "LOCAL" keyword after the "INSERT" keyword, but then I get the following error:

    "FAILED: ParseException line 3:0 cannot recognize input near 'INSERT' 'LOCAL' 'DIRECTORY' in insert clause"

    I am stuck here; it would be great if anyone could help me with this.

    Thank you so much.

    Tuesday, May 2, 2017 8:17 AM

Answers

  • There are a few things you need to know about the Hive internal table and external table:

    • The CREATE TABLE command creates an internal table. The data file must be located in the default container.
    • The CREATE TABLE command moves the data file to the /hive/warehouse/<TableName> directory on default storage for the cluster.
    • The CREATE EXTERNAL TABLE command creates an external table. The data file can be located outside the default container.
    • The CREATE EXTERNAL TABLE command does not move the data file.

    For more information, see “HDInsight: Hive Internal and External Tables Intro”.
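    For illustration, here is a hedged sketch of an external table whose data stays in a Blob storage folder. The container, account, folder, table, and column names are placeholders, not from the original thread:

    ```sql
    -- Hypothetical external table; dropping it removes only the metadata
    -- and leaves the files under the LOCATION folder in place.
    CREATE EXTERNAL TABLE sales_csv (
        id INT,
        amount DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION 'wasb://<container name>@<storage account>.blob.core.windows.net/sales/';
    ```

    Declaring the table this way also answers the "where is my table stored" question: the data lives exactly where LOCATION points.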

    There are several options to export data from a Hive table to a CSV file:

    Option 1: Hive does not provide a direct way in the query language to dump query results to a file as CSV. The INSERT OVERWRITE command writes the output with Hive's default Ctrl-A (\001) field delimiter rather than commas, so the result has to be converted to CSV manually.

    For more details, see “Output file as a CSV using Hive in Azure HDInsight”.
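    If the cluster runs Hive 0.11 or later, INSERT OVERWRITE DIRECTORY also accepts a ROW FORMAT clause, which avoids the manual conversion. A hedged sketch, writing to a subfolder of the container rather than its root (which may also avoid the "Wrong FS" error from the question):

    ```sql
    INSERT OVERWRITE DIRECTORY
      'wasb://<container name>@<storage account>.blob.core.windows.net/output/'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    SELECT * FROM database.table_name;
    ```

    If a local file is wanted instead, note that the keyword order is INSERT OVERWRITE LOCAL DIRECTORY, with LOCAL between OVERWRITE and DIRECTORY; placing LOCAL right after INSERT produces the ParseException shown in the question.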

    Option 2: You can connect Excel to Hadoop with the Microsoft Hive ODBC driver and pull the data in Excel and save it as CSV.

    For more details, see “Connect Excel to Hadoop with the Microsoft Hive ODBC driver”.

    Option 3: You can export the output as CSV directly from the /hive/warehouse/<tableName> directory with the hadoop fs -cat command, translating the default Ctrl-A delimiter to commas:

    Example:  hadoop fs -cat /hive/warehouse/<tableName>/* | tr "\001" "," >> filename.csv

    Option 4: See the Stack Overflow thread “How to export a Hive table into a CSV file?”.

    Option 5: If you want the output as CSV, you can hard-code the comma delimiter in the Hive script itself.

    For more details, see “Hive – Support CSV text file format”.
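    As one hedged sketch of that hard-coding approach (table and column names are placeholders), CONCAT_WS joins the columns with a comma so each output row is already a single comma-delimited string:

    ```sql
    -- CONCAT_WS only accepts string arguments, hence the CASTs.
    SELECT CONCAT_WS(',', CAST(id AS STRING), CAST(amount AS STRING))
    FROM database.table_name;
    ```

    The result can then be written out with INSERT OVERWRITE DIRECTORY or captured from the Hive CLI.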

    Wednesday, May 3, 2017 6:16 PM

All replies

  • Thank you so much,

    it helped me a lot. :-)

    Tuesday, May 16, 2017 9:26 AM