[HDInsight] How to set HDFS as default file system

  • Question

  • Hi Team,

    I have a task to investigate the performance of Hive queries when a blob file is stored in Azure Storage versus in HDFS. I have two questions:

    1. Is it possible to use HDFS as the default file system in HDInsight? My HDInsight cluster is created on Windows Azure.

    2. If yes, what should I modify in the configuration file(s)?

    I've tried the steps below, but the Hive query fails in the end. I hope you can help me.

    1. Use a Hadoop shell command to copy a blob file from ASV to HDFS.

    2. Create a table with HiveQL. The LOCATION parameter of the CREATE TABLE statement is set to an HDFS path, 'hdfs://RD**********CE:9000/hive/warehouse/...', where RD**********CE is the name node's computer name.

    3. Load the blob file into the test table with the Hive statement 'load data inpath...'.

    4. Modify core-site.xml on the name node and on each data node. The property '' is set to 'hdfs://RD**********CE:9000'.
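
    For reference, steps 1 to 3 above could look roughly like the dry-run sketch below. It only prints the commands instead of running them. The name node host, the ASV source path, the staging directory, and the STRING column type are hypothetical placeholders and are not taken from my cluster; only the table name, the sessionid column, and port 9000 come from this post.

```shell
#!/bin/sh
# Dry-run sketch of steps 1-3: prints the commands rather than executing them.
# All values below are hypothetical placeholders unless noted otherwise.
NAMENODE="namenode-host"   # stands in for the masked name node computer name
SRC="asv://container@account.blob.core.windows.net/data/sample.log"  # hypothetical blob path
STAGING="hdfs://${NAMENODE}:9000/staging"                            # hypothetical HDFS staging directory
TABLE="testtable_hdsf"     # table name used in this post

print_steps() {
  # Step 1: copy the blob file from ASV to HDFS with the Hadoop shell.
  echo "hadoop fs -cp ${SRC} ${STAGING}/"
  # Step 2: create the table with an HDFS LOCATION (column type is an assumption).
  echo "hive -e \"CREATE TABLE ${TABLE} (sessionid STRING) LOCATION 'hdfs://${NAMENODE}:9000/hive/warehouse/${TABLE}'\""
  # Step 3: load the copied file into the test table.
  echo "hive -e \"LOAD DATA INPATH '${STAGING}/sample.log' INTO TABLE ${TABLE}\""
}

print_steps
```

    Removing the echo wrappers would run the commands for real, so the placeholders need to be replaced with actual values first.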

    After step #4, I checked the result with #ls. The relative path displayed in the console is as expected and matches the test table's location. I then submitted the query 'select * from testtable_hdsf limit 1', and the expected result was returned.

    However, when I try to select a single column with the query 'select sessionid from testtable_hdsf limit 1', it fails. The same query works fine when the data is stored in Azure Storage.

    Hive history file=c:\apps\dist\hive-0.9.0\logs/hive_job_log_RD00155D5907CE$_201309061526_533446434.txt
    Logging initialized using configuration in file:/C:/apps/dist/hive-0.9.0/conf/
    Time taken: 0.48 seconds
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_201309061221_0002, Tracking URL = http://jobtrackerhost:50030/jobdetails.jsp?jobid=job_201309061221_0002
    Kill Command = c:\apps\dist\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd job -Dmapred.job.tracker=jobtrackerhost:9010 -kill job_201309061221_0002
    Hadoop job information for Stage-1: number of mappers: 17; number of reducers: 0
    2013-09-06 15:29:57,373 Stage-1 map = 100%, reduce = 100%
    Ended Job = job_201309061221_0002 with errors
    Error during job, obtaining debugging information...
    Examining task ID: task_201309061221_0002_m_000018 (and more) from job job_201309061221_0002
    Examining task ID: task_201309061221_0002_r_000000 (and more) from job job_201309061221_0002
    Exception in thread "Thread-21" java.lang.RuntimeException: Error while reading from task log url
    at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getStackTraces(
    at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(
    Caused by: Server returned HTTP response code: 400 for URL: http://workernode0:50060/tasklog?taskid=attempt_201309061221_0002_m_000018_2&start=-8193
    at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getStackTraces(
    ... 3 more
    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
    MapReduce Jobs Launched:
    Job 0: Map: 17 HDFS Read: 0 HDFS Write: 0 FAIL
    Total MapReduce CPU Time Spent: 0 msec

    Thanks a lot!

    A beginner in HDInsight

    • Edited by it民工1985, September 6, 2013 18:34 (improved the question)
    September 6, 2013 18:09