Spark application built on a local server: how can we access the Hive warehouse located in HDInsight?

  • Question

  • Hello Techies,

    This is my first project on HDInsight.

    I'm trying to connect to the Hive warehouse directory located in HDInsight from Spark running in IntelliJ.

    I am using Spark 1.6 with Scala in a Maven project.

    Thrift server details:

    `System.setProperty("hive.metastore.uris", "thrift://hnaz.xyz123.internal.cloudapp.net:1403")`

    I am trying to access the tables in the Hive warehouse.

    package Test

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object TestHive {
      def main(args: Array[String]): Unit = {

        // Build the Spark configuration and point it at the remote metastore
        val conf = new SparkConf()
          .setAppName("SparkHiveTest")
          .setMaster("local[*]")
        System.setProperty("hive.metastore.uris", "thrift://hnaz.xyz123.internal.cloudapp.net:1403")

        val sc = new SparkContext(conf)
        val hiveContext = new HiveContext(sc)

        // Switch to the target database, then list its tables
        hiveContext.sql("USE data_profiling")
        val df1 = hiveContext.sql("SHOW TABLES")
        df1.show()
      }
    }

    The POM dependencies look like this:

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>compile</scope>
            <!-- provided -->
        </dependency>

    But unfortunately I am not able to connect to the node. Am I missing something? Please help me.

    Friday, October 25, 2019 7:33 AM

Answers

  • Hello,

    It is not possible to access the Hive warehouse inside Azure HDInsight from a local Spark instance.

    You need to perform the operation on the Azure HDInsight cluster itself.

    The tutorial mentioned in the earlier reply shows how to create Apache Spark apps for an HDInsight cluster.

    OR

    With Azure HDInsight 4.0 you can integrate Apache Spark and Apache Hive with the Hive Warehouse Connector.

    The Apache Hive Warehouse Connector (HWC) is a library that allows you to work more easily with Apache Spark and Apache Hive by supporting tasks such as moving data between Spark DataFrames and Hive tables, and also directing Spark streaming data into Hive tables. Hive Warehouse Connector works like a bridge between Spark and Hive. It supports Scala, Java, and Python for development.
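
    The HWC flow described above can be sketched roughly as follows. This is a hypothetical illustration, not code from this thread: the database name, the table name `demo_table`, and the assumption that the connector's settings (such as the HiveServer2 Interactive JDBC URL) are already configured on an HDInsight 4.0 cluster are all placeholders.

    ```scala
    // Sketch only: requires an HDInsight 4.0 cluster with the HWC library and
    // its spark-defaults settings (HiveServer2 Interactive JDBC URL, etc.) in place.
    import com.hortonworks.hwc.HiveWarehouseSession
    import org.apache.spark.sql.SparkSession

    object HwcSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HwcSketch")
          .getOrCreate()

        // Open a Hive Warehouse session on top of the SparkSession
        val hive = HiveWarehouseSession.session(spark).build()

        // Read: run a query through HiveServer2 Interactive and get a DataFrame back
        hive.setDatabase("default")
        hive.executeQuery("SHOW TABLES").show()

        // Write: append a Spark DataFrame to a Hive managed table
        // ("demo_table" is a placeholder name)
        val df = spark.range(10).toDF("id")
        df.write
          .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
          .option("table", "demo_table")
          .mode("append")
          .save()

        spark.stop()
      }
    }
    ```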

    Hope this helps.

    • Marked as answer by Jamiechales Thursday, October 31, 2019 11:03 AM
    Tuesday, October 29, 2019 5:30 AM
    Moderator
  • Hello,

    Just checking in to see if the above answer helped. If this answers your query, do click “Mark as Answer” and Up-Vote for the same. And, if you have any further query do let us know.

    • Marked as answer by Jamiechales Thursday, October 31, 2019 11:02 AM
    Wednesday, October 30, 2019 8:52 AM
    Moderator

All replies

  • Hello,

    Welcome to Microsoft Azure!

    Note: The setMaster method specifies where the application runs; `local[*]` runs it in the local JVM. If you want to run the application in a cluster on HDInsight, replace the argument `local[*]` with the URL `spark://<host>:<port>`, where `<host>` and `<port>` are the IP address and port number of the edge node in the cluster.
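
    The two master settings can be sketched as follows; the host name and port below are placeholders, not values from an actual cluster:

    ```scala
    import org.apache.spark.SparkConf

    // Run the application inside the local JVM, using all available cores
    val localConf = new SparkConf()
      .setAppName("SparkHiveTest")
      .setMaster("local[*]")

    // Run against a Spark master on the cluster; <edge-node-host> and <port>
    // are placeholders for the edge node's address and port
    val clusterConf = new SparkConf()
      .setAppName("SparkHiveTest")
      .setMaster("spark://<edge-node-host>:<port>")
    ```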

    This tutorial demonstrates how to use the Azure Toolkit for IntelliJ plug-in to develop Apache Spark applications written in Scala, and then submit them to an HDInsight Spark cluster directly from the IntelliJ integrated development environment (IDE). You can use the plug-in in a few ways:

    • Develop and submit a Scala Spark application on an HDInsight Spark cluster.
    • Access your Azure HDInsight Spark cluster resources.
    • Develop and run a Scala Spark application locally.

    Hope this helps.      

    ----------------------------------------------------------------------------------------

    Do click on "Mark as Answer" and Upvote on the post that helps you, this can be beneficial to other community members.

    Friday, October 25, 2019 11:17 AM
    Moderator
  • I don't want to run the application on the cluster. I am running in client mode, on a local server.

    Will it be possible to access the Hive warehouse located in HDInsight from a local instance?

    How can I read/write the Hive warehouse from a local Spark instance using IntelliJ?

    Please help me.

    Monday, October 28, 2019 7:14 AM