Hadoop -D command line parameters

    Question

  • When running the Hadoop Command Line from the desktop of an Azure Hadoop Cluster, you're actually launching c:\apps\dist\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd

    This displays a command prompt where a script parses command-lines that are entered by the user. For each command-line entered, this script then executes Hadoop, utilizing the arguments it has parsed from what the user entered.

    My experience has been that providing command-line parameters via "-D" does not work. (Please enlighten me if you've managed to get it to work.)

    The problem is that -D options are either being ignored, or result in an exception being thrown.

    What I have managed to get working however, is to place a mapred-default.xml file in the c:\apps\dist\hadoop-1.1.0-SNAPSHOT\conf directory.

    But this feels like a bit of a kludge. It'd be nicer to put this file somewhere on the file system and then refer to it when executing Hadoop. I get the feeling that this is what the Hadoop --config parameter is for, but it does not seem to work...

    hadoop --config "c:\" ... 

    ...results in the usage being printed to the console, while...

    hadoop --config c:\ ... 

    ...results in...

    12/11/16 02:34:36 ERROR streaming.StreamJob: Error Launching job : java.lang.IllegalArgumentException: Invalid Connection String

    etc.

    So I guess my question is: could someone please provide some guidance on how to specify -D arguments?

    If you'd like a concrete example to solve, then I'd like to get this command-line to work...

    hadoop jar lib\hadoop-streaming.jar -input "asv://data-repository/oracle-fact-system-event" -mapper "..\..\jars\OracleGVMapper.exe" -reducer "..\..\jars\OracleGVReducer.exe" -file "c:\OracleGVMapper.exe" -file "c:\OracleGVReducer.exe" -file "c:\LucyStatisticsSimple.dll" -output "/user/lucy/results-oracle-01" –D map.output.key.field.separator=: mapred.text.key.partitioner.options=k2,4n

    ...with these -D options...

    mapred.skip.mode.enabled=true

    mapred.skip.map.max.skip.records=1

    mapred.skip.attempts.to.start.skipping=1

    io.sort.mb=150

    mapred.job.reuse.jvm.num.tasks=100


    November 16, 2012 2:42

All replies

  • -D options are generic options and need to precede the streaming-specific options (-input, -mapper and so on); see: http://hadoop.apache.org/docs/r1.1.0/streaming.html#Specifying+Configuration+Variables+with+the+-D+Option

    So your command needs to look like:

    hadoop jar lib\hadoop-streaming.jar -D "mapred.skip.mode.enabled=true" -input "asv://data-repository/oracle-fact-system-event" -mapper ".."  etc....
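    As a rough illustration of the ordering rule above, here is a minimal Python sketch that assembles a streaming command with every -D option placed ahead of the streaming options. The helper name build_streaming_command is hypothetical, and only two of the properties from the question are used; whether they take effect on a given cluster is not verified here.

```python
# Hypothetical helper: puts generic options (-D name=value) directly after
# the jar, ahead of streaming options such as -input and -output.
def build_streaming_command(jar, properties, streaming_args):
    command = ["hadoop", "jar", jar]
    for name, value in properties.items():
        command += ["-D", f"{name}={value}"]
    command += list(streaming_args)
    return command


command = build_streaming_command(
    r"lib\hadoop-streaming.jar",
    {"mapred.skip.mode.enabled": "true", "io.sort.mb": "150"},
    [
        "-input", "asv://data-repository/oracle-fact-system-event",
        "-output", "/user/lucy/results-oracle-01",
    ],
)
print(" ".join(command))
```

    Building the argument list in one place makes it harder to append a -D option after -input or -output by accident.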

    Cheers, James


    James Beresford @ www.bimonkey.com & @BI_Monkey
    SSIS / MSBI Consultant in Sydney, Australia
    SSIS ETL Execution Control and Management Framework @ SSIS ETL Framework on Codeplex

    November 16, 2012 4:38
  • The short answer:

    For now, you need to surround the whole config option with quotation marks and leave no space between the -D and the parameter.

    eg. "-Ddfs.block.size=268435456"    (will set the DFS block size to 256 MB).
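    To make the quoting rule concrete, here is a small Python sketch that formats properties in that quoted, no-space form. The function name quoted_d_option is made up for illustration, and the property values are the ones listed in the question; this only shows the text to type, not whether the cluster honors each property.

```python
def quoted_d_option(name, value):
    """Format one property as "-Dname=value": quoted, no space after -D."""
    return f'"-D{name}={value}"'


# The -D options requested in the question.
options = {
    "mapred.skip.mode.enabled": "true",
    "mapred.skip.map.max.skip.records": 1,
    "mapred.skip.attempts.to.start.skipping": 1,
    "io.sort.mb": 150,
    "mapred.job.reuse.jvm.num.tasks": 100,
}
generic_args = " ".join(quoted_d_option(n, v) for n, v in options.items())
print(generic_args)

# 256 MB expressed in bytes, matching the block-size example above.
block_size = 256 * 1024 * 1024
print(quoted_d_option("dfs.block.size", block_size))
```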

    The long answer:

    The problem here is the way Windows parses and processes command-line parameters as they pass between the different scripts; unfortunately, this makes the syntax slightly different from what you would expect.

    You can also specify parameters in an XML file and pass it with the -conf option.

    E.g., the following two commands should result in identical configurations:

    hadoop jar ..\..\dist\lib\hadoop-streaming.jar "-Dmapreduce.output.fileoutputformat.compress=true" "-Dmapreduce.output.fileoutputformat.compression.codec=org.apache.hadoop.io.compress.GzipCodec" -numReduceTasks=0 -input /user/Isotope/1000lines.txt -output /user/Isotope/ads_logistic -mapper "cmd /c ..\..\jars\sedit.bat" -file sedit.bat

    or
    hadoop jar ..\..\dist\lib\hadoop-streaming.jar -conf myconf.xml -numReduceTasks=0 -input /user/Isotope/1000lines.txt -output /user/Isotope/ads_logistic -mapper "cmd /c ..\..\jars\sedit.bat" -file sedit.bat

    where myconf.xml contains:

    <configuration>
      <property>
        <name>mapreduce.output.fileoutputformat.compress</name>
        <value>true</value>
      </property>
      <property>
        <name>mapreduce.output.fileoutputformat.compression.codec</name>
        <value>org.apache.hadoop.io.compress.GzipCodec</value>
      </property>
    </configuration>
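
    If the list of properties grows, a file like this can be generated rather than typed by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function name write_conf is hypothetical, and the properties are the ones from the question, not settings confirmed to work on the cluster.

```python
import xml.etree.ElementTree as ET


def write_conf(path, properties):
    """Write a Hadoop-style <configuration> file with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


# Properties taken from the question; the file name mirrors the example above.
write_conf("myconf.xml", {
    "mapred.skip.mode.enabled": "true",
    "mapred.skip.map.max.skip.records": 1,
    "mapred.skip.attempts.to.start.skipping": 1,
    "io.sort.mb": 150,
    "mapred.job.reuse.jvm.num.tasks": 100,
})
```

    The resulting file would then be passed with -conf myconf.xml, as in the second command above.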

    Thanks,
    Brad Sarsfield
    HDInsight

    November 17, 2012 0:53