Errors while bulk loading data into HBase via Phoenix

    Question

  • I'm creating a table via Phoenix and trying to bulk-load data into it with the CsvBulkLoadTool described in the article below.

    https://blogs.msdn.microsoft.com/azuredatalake/2017/02/14/hdinsight-how-to-perform-bulk-load-with-phoenix/

    CREATE TABLE STAGINGORDERS (cid varchar NOT NULL PRIMARY KEY,CreatedTimestamp date,UpdatedTimestamp date,OrderId varchar,CreatedDateTime date,LastModifiedDateTime date);

    $ HADOOP_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/etc/hbase/conf hadoop jar /usr/hdp/2.6.2.3-1/phoenix/phoenix-4.7.0.2.6.2.3-1-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -Dfs.permissions.umask-mode=000 --table STAGINGORDERS --input /datafiles/orderfile.csv --MyZookeeperQuorumString:2181:/hbase-unsecure -e ':' -g
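
    For reference, CsvBulkLoadTool normally takes the ZooKeeper quorum through -z/--zookeeper and the field separator through -d/--delimiter (-e sets the escape character rather than the delimiter). A cleaned-up sketch of the same invocation under those assumptions, with the quorum string and the colon delimiter as placeholders to be matched to the actual cluster and file format:

    $ HADOOP_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/etc/hbase/conf \
      hadoop jar /usr/hdp/2.6.2.3-1/phoenix/phoenix-4.7.0.2.6.2.3-1-client.jar \
        org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        -Dfs.permissions.umask-mode=000 \
        --table STAGINGORDERS \
        --input /datafiles/orderfile.csv \
        --zookeeper MyZookeeperQuorumString:2181:/hbase-unsecure \
        --delimiter ':' \
        --ignore-errors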

    Here's the exception I get:

    Exception in thread "main" java.io.FileNotFoundException: Bulkload dir /tmp/72dfe10a-4429-409e-a05e-ecf2088ffba1/STAGINGORDERS not found
            at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.visitBulkHFiles(LoadIncrementalHFiles.java:196)
            at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.discoverLoadQueue(LoadIncrementalHFiles.java:309)
            at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.prepareHFileQueue(LoadIncrementalHFiles.java:585)
            at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:486)
            at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:403)
            at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:359)
            at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.completebulkload(AbstractBulkLoadTool.java:384)
            at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.submitJob(AbstractBulkLoadTool.java:361)
            at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.loadData(AbstractBulkLoadTool.java:299)
            at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.run(AbstractBulkLoadTool.java:182)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
            at org.apache.phoenix.mapreduce.CsvBulkLoadTool.main(CsvBulkLoadTool.java:117)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
    18/01/02 22:56:08 WARN conf.Configuration: hbase-site.xml:an attempt to override final parameter: dfs.support.append;  Ignoring.

    Please let me know if I'm missing something here.
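
    One thing worth checking with this first failure: -g/--ignore-errors makes the mappers skip records that fail to parse instead of failing the job, so a run in which every record is rejected can finish the MapReduce phase without producing any HFiles for the table, and the follow-on LoadIncrementalHFiles step then fails with this "Bulkload dir ... not found" message. The YARN application logs for the job would show whether the input records were being rejected; a sketch, where the application id is a placeholder for whatever id the CsvBulkLoadTool run printed:

    $ yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX | grep -iE 'error|exception|record'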

    Tuesday, January 02, 2018 11:09 PM

All replies

  • @Jignesh.R, there's a post that explains the likely cause of this error message; it seems to be more of an HBase issue: https://mapr.com/support/s/article/HBase-regions-not-getting-into-recovered-from-transitioned-state?language=en_US

    Tuesday, January 02, 2018 11:32 PM
    Moderator
  • Thanks, Adam, for your reply. I've tried that solution; unfortunately, there are no temporary files left behind by the bulk load operation. However, when I remove the -g parameter from my bulk load command, I get the following error:

    Error: java.lang.RuntimeException: java.lang.IllegalArgumentException: CSV record does not have enough values (has 1, but needs 6)
            at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:201)
            at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:73)
            at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
            at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
            at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
            at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
            at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
    Caused by: java.lang.IllegalArgumentException: CSV record does not have enough values (has 1, but needs 6)
            at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:81)
            at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:51)
            at org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133)
            at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:170)
            ... 9 more

    Please advise.
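
    This "has 1, but needs 6" error usually means the line is not being split on the expected delimiter: CsvBulkLoadTool defaults to a comma, and -e only sets the escape character, so a file that is actually colon-separated parses as one value per record. A quick sanity check on a local copy of the input (a sketch using standard awk; swap the comma for whatever separator the file really uses):

    $ awk -F',' '{ print NF }' orderfile.csv | sort | uniq -c

    Every line should report 6 fields to match the STAGINGORDERS columns; if the file uses a different separator, passing it to the tool with -d/--delimiter would be the option to try.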

    Wednesday, January 03, 2018 8:10 PM
  • Can you quickly check whether the path exists under /tmp? It could be a permissions issue.
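
    For example (a sketch using the staging path from the exception above):

    $ hdfs dfs -ls /tmp/72dfe10a-4429-409e-a05e-ecf2088ffba1
    $ hdfs dfs -ls /tmp/72dfe10a-4429-409e-a05e-ecf2088ffba1/STAGINGORDERS
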
    Thursday, January 04, 2018 12:33 AM
  • @Gaurav Kanade

    The path does exist under /tmp, but only up to /tmp/72dfe10a-4429-409e-a05e-ecf2088ffba1.

    There is no STAGINGORDERS directory created.

    I've also enabled public access to the container, but the issue still persists.

    Thursday, January 04, 2018 7:57 PM