How to Create and Run Apache Pig Jobs in Hadoop cluster in HDInsight

  • Question

  • Hi,

    I signed up for an Azure free account and created an Apache Hadoop cluster in HDInsight. I am trying to create an Apache Pig job and run it on the cluster, but I cannot find any tutorial on how to do this.

    Can you help please?

    Thanks.


    PiggyZhou

    Friday, May 29, 2020 9:48 AM

All replies

  • Hello,

    Unfortunately, you cannot use the Ambari UI to run Pig Latin jobs.

    Note: To process data using Pig, you will need to open an SSH console connected to your cluster and then run the Pig Latin in local mode or mapreduce mode.

    If you are using a Windows client computer:

    1. In the Microsoft Azure portal, on the HDInsight Cluster blade for your HDInsight cluster, click Secure Shell. Then, in the Secure Shell blade, in the Hostname list, note the host name for your cluster (which should be your_cluster_name-ssh.azurehdinsight.net).

    2. Open PuTTY, and on the Session page, enter the host name into the Host Name box. Then, under Connection type, select SSH and click Open. If a security warning says that the host certificate cannot be verified, click Yes to continue.

    3. When prompted, enter the SSH username and password you specified when provisioning the cluster (not the cluster login username).

    If you are using a Mac OS X or Linux client computer:

    1. In the Microsoft Azure portal, on the HDInsight Cluster blade for your HDInsight cluster, click Secure Shell. Then, in the Secure Shell blade, in the Hostname list, select the hostname for your cluster. Copy the ssh command that is displayed; you will use it to connect to the head node. It should resemble the following command:

    ssh sshuser@your_cluster_name-ssh.azurehdinsight.net

    2. Open a new terminal session, and paste the ssh command, specifying your SSH user name (not the cluster login username).

    3. If you are prompted to connect even though the certificate can’t be verified, enter yes.

    4. When prompted, enter the password for the SSH username.
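The connection steps above can be sketched as a small shell snippet. This is only an illustration: `CLUSTER_NAME` and `SSH_USER` are placeholder values assumed here, and the snippet only prints the command rather than opening the session.

```shell
# Placeholder values (assumptions): replace with your own cluster name and SSH user.
CLUSTER_NAME="your_cluster_name"
SSH_USER="sshuser"

# The SSH host name follows the pattern shown above: <cluster>-ssh.azurehdinsight.net
SSH_HOST="${CLUSTER_NAME}-ssh.azurehdinsight.net"
SSH_CMD="ssh ${SSH_USER}@${SSH_HOST}"

# Print the command; run it yourself and enter the SSH password when prompted.
echo "$SSH_CMD"
```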

    Once you are connected to your cluster, you can execute Pig Latin statements:

    • Using the Grunt shell or the command line
    • In mapreduce mode or local mode
    • Either interactively or in batch
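As a rough illustration of these options, the sketch below writes a small word-count script and runs it in batch, in local mode. The file names (`input.txt`, `wordcount.pig`) and the sample data are made up for the example, and it assumes the `pig` command is on the PATH, as it should be on the cluster head node.

```shell
# Work in a scratch directory so the example does not touch existing files.
WORKDIR="$(mktemp -d)"

# Illustrative sample input (two lines of text).
printf 'hello pig\nhello hadoop\n' > "$WORKDIR/input.txt"

# A minimal Pig Latin word-count script.
cat > "$WORKDIR/wordcount.pig" <<'EOF'
lines   = LOAD 'input.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
DUMP counts;
EOF

# Batch run in local mode: reads input.txt from the local file system.
if command -v pig >/dev/null 2>&1; then
  (cd "$WORKDIR" && pig -x local wordcount.pig)
else
  echo "pig not found on PATH; run this on the cluster head node"
fi

# Batch run in mapreduce mode: the input must first be in cluster storage.
#   hdfs dfs -put "$WORKDIR/input.txt" input.txt
#   pig -x mapreduce "$WORKDIR/wordcount.pig"
#
# Interactive use: running `pig` with no script opens the Grunt shell,
# where the same statements can be typed one at a time.
```

Local mode is convenient for trying a script on a small file before running it against data in cluster storage.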

    Reference: Pig Manual

    Hope this helps. Do let us know if you have any further queries.

    Friday, May 29, 2020 11:22 AM
  • Hello,

    Just checking in to see if the above answer helped. If it answers your query, please click “Mark as Answer” and upvote it. If you have any further queries, do let us know.

    Thursday, June 4, 2020 4:37 PM
  • Hello,

    Following up to see if the above suggestion was helpful. If you have any further queries, do let us know.

    Friday, June 5, 2020 12:29 PM