HDInsight Spark Network

  • Question

  • Hi all,

    I have some questions about networking for HDInsight Spark. I have a 10.72.24.0/24 VNet subnetted as follows:

    prod 10.72.24.0/26
    preprod 10.72.24.64/26
    test 10.72.24.128/26
    dev 10.72.24.192/27
    gatewaysubnet 10.72.24.224/28
    internetgateway 10.72.24.240/28

    On every subnet I will deploy 1 SQL Server IaaS VM, 1 web server, and a Spark HDInsight cluster with 2 head nodes and 4 worker nodes (this can change). Is my current VNet layout enough to run this infrastructure?
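    One way to sanity-check this plan is to compute how many addresses each subnet actually offers. The sketch below, a minimal example using Python's standard ipaddress module, verifies that the six subnets fit inside the /24 without overlapping and subtracts the 5 addresses Azure reserves in every subnet. The node count comes from the question; HEADROOM is a hypothetical allowance for additional cluster nodes (for example ZooKeeper nodes) and future scaling, not an official HDInsight figure.

```python
import ipaddress

# Address plan copied from the question: one /24 VNet split into six subnets.
VNET = ipaddress.ip_network("10.72.24.0/24")
SUBNETS = {
    "prod": "10.72.24.0/26",
    "preprod": "10.72.24.64/26",
    "test": "10.72.24.128/26",
    "dev": "10.72.24.192/27",
    "gatewaysubnet": "10.72.24.224/28",
    "internetgateway": "10.72.24.240/28",
}

# Azure reserves 5 addresses in every subnet: the network address, the next
# three addresses (default gateway and Azure DNS), and the broadcast address.
AZURE_RESERVED = 5

# Nodes listed in the question for each environment subnet:
# 1 SQL IaaS VM + 1 web server + 2 head nodes + 4 worker nodes.
NODES_PER_ENV = 1 + 1 + 2 + 4
# HYPOTHETICAL allowance for extra cluster nodes and for scaling workers later.
HEADROOM = 10


def check_plan(vnet, subnets):
    """Validate containment and overlap, then return usable addresses per subnet."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    for name, net in nets.items():
        if not net.subnet_of(vnet):
            raise ValueError(f"{name} ({net}) is outside {vnet}")
    names = list(nets)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if nets[a].overlaps(nets[b]):
                raise ValueError(f"{a} overlaps {b}")
    return {name: net.num_addresses - AZURE_RESERVED for name, net in nets.items()}


if __name__ == "__main__":
    usable = check_plan(VNET, SUBNETS)
    needed = NODES_PER_ENV + HEADROOM
    for env in ("prod", "preprod", "test", "dev"):
        verdict = "ok" if usable[env] >= needed else "too small"
        print(f"{env:8} usable={usable[env]:3} needed={needed} -> {verdict}")
```

    Under these assumptions each /26 offers 59 usable addresses and the /27 offers 27, so 18 addresses per environment fit in all four; the margin mainly depends on how far you plan to scale the worker nodes.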

    My second question is about the virtual network option in HDInsight: what is the difference between leaving it as "System managed" and selecting my own virtual network? With "System managed", can I not reach the on-premises environment through VPN or ExpressRoute?

    [Screenshots: the HDInsight virtual network option set to "System managed" and set to "Virtual Network"]

    Regards


    Monday, November 4, 2019 8:38 PM

Answers

  • Hi Emanuele86,

    Apache Kafka APIs are not available publicly over the internet. If you deploy HDInsight into an Azure virtual network, you can access the Apache Kafka APIs from resources inside that virtual network (or from networks connected to it through VPN or ExpressRoute).

    Yes, you can use alternative tools such as Azure Data Factory (ADF) to import and export data to and from HDInsight storage, as long as ADF supports the data source. That does not hold for data sources that ADF does not support.

    If you have two clusters named “Spark” and “Kafka” deployed without a VNet, they cannot communicate with each other. If you deploy Spark and Kafka in the same VNet, they can communicate with each other seamlessly.
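    As a minimal sketch of what this looks like in practice: once both clusters share a VNet, a Spark job can point Spark's Structured Streaming Kafka source at the brokers' private addresses. The helper below only builds the source options (kafka.bootstrap.servers, subscribe and startingOffsets are standard options of Spark's Kafka integration); the broker host names, port 9092 and the topic name are hypothetical placeholders.

```python
# Options for Spark Structured Streaming's Kafka source. The option names are
# standard; the broker hosts, port and topic are HYPOTHETICAL placeholders.
def kafka_source_options(brokers, topic, starting_offsets="earliest"):
    """Build reader options for brokers reachable on private VNet addresses."""
    if not brokers:
        raise ValueError("at least one broker is required")
    return {
        "kafka.bootstrap.servers": ",".join(f"{host}:{port}" for host, port in brokers),
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


# Hypothetical Kafka broker host names inside the shared VNet.
BROKERS = [("wn0-kafka.example.internal", 9092), ("wn1-kafka.example.internal", 9092)]

if __name__ == "__main__":
    opts = kafka_source_options(BROKERS, "events")
    print(opts)
    # On the Spark cluster (PySpark, with the Kafka connector available):
    #   df = spark.readStream.format("kafka").options(**opts).load()
```

    Because the broker addresses are private to the VNet, the same options would not connect from a Spark cluster deployed outside it.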

    • Marked as answer by Emanuele86 Wednesday, November 20, 2019 9:22 AM
    Thursday, November 7, 2019 5:13 AM

All replies

  • Hi Emanuele86,

    To answer both questions, you may go through the network architecture:

    Azure HDInsight cluster without virtual network configuration:

    By default when you create an HDInsight cluster, the VM nodes within that cluster are configured to communicate with each other, but Internet access to any of the cluster nodes is restricted to just the Head or Edge nodes (and limited to SSH or HTTPS). There are scenarios that need a greater degree of access into the networking environment of the cluster nodes, for example:

    • You need to directly access services on HDInsight that aren't exposed over the Internet. For example, you have consumers or producers that need to directly work with Kafka brokers or clients that need to use the HBase Java API.
    • You need to connect on-premises services to HDInsight. For example, use Oozie to import or export data to or from an on-premises SQL Server.
    • You need to create solutions that involve multiple HDInsight clusters of different types. For example, you might want to use Spark or Storm clusters to analyze data stored in a Kafka cluster.
    • You want to restrict access to HDInsight. For example, to prevent inbound traffic from the internet.
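    To make the default exposure concrete, here is a small sketch of the model described above: without a VNet, the Internet reaches the cluster only over HTTPS (443) and SSH (22), so any other service port, such as a Kafka broker, points toward deploying into a VNet. The host-name patterns follow HDInsight's CLUSTERNAME.azurehdinsight.net and CLUSTERNAME-ssh.azurehdinsight.net naming; the cluster name, SSH user and Kafka port are hypothetical.

```python
# Simplified model of the default (no VNet) exposure: only these ports are
# reachable from the Internet, and only on the head/edge nodes.
PUBLIC_PORTS = {
    443: "HTTPS through the cluster gateway (e.g. Ambari, REST APIs)",
    22: "SSH to the head nodes",
}


def public_endpoints(cluster: str, ssh_user: str = "sshuser") -> dict:
    """Return the public HTTPS and SSH endpoints for a cluster name."""
    return {
        "https": f"https://{cluster}.azurehdinsight.net",
        # "sshuser" is a hypothetical default; use the SSH user you created.
        "ssh": f"{ssh_user}@{cluster}-ssh.azurehdinsight.net",
    }


def needs_vnet(port: int) -> bool:
    """True if a client needs VNet access to reach a service on this port."""
    return port not in PUBLIC_PORTS


if __name__ == "__main__":
    print(public_endpoints("mycluster"))  # "mycluster" is hypothetical
    for port in (443, 22, 9092):  # 9092: assumed Kafka broker port
        print(port, "needs VNet" if needs_vnet(port) else "public")
```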

    Azure HDInsight cluster with virtual network configuration:

    Greater control over the HDInsight networking environment is achieved by deploying your cluster into an Azure Virtual Network. An Azure Virtual Network allows you to create a secure, persistent network containing the resources you need for your solution. Cloud resources that you want to connect with your HDInsight cluster, such as Virtual Machines and other instances of HDInsight, can then be provisioned into the same Virtual Network.

    You can create a site-to-site or point-to-site VPN connection to enable connectivity between resources in an on-premises network and your HDInsight cluster.

    You can also connect two different Virtual Network instances by configuring a VNET-to-VNET connection.
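    Both site-to-site VPN and VNet-to-VNet connections require that the address spaces on the two sides do not overlap. A minimal sketch of that check with Python's ipaddress module, using the VNet range from the question and hypothetical on-premises ranges:

```python
import ipaddress


def conflicts(local_ranges, remote_ranges):
    """Return every (local, remote) pair of prefixes that overlap."""
    local = [ipaddress.ip_network(r) for r in local_ranges]
    remote = [ipaddress.ip_network(r) for r in remote_ranges]
    return [(str(a), str(b)) for a in local for b in remote if a.overlaps(b)]


VNET_SPACE = ["10.72.24.0/24"]  # VNet address space from the question
# HYPOTHETICAL on-premises ranges advertised over the VPN connection.
ONPREM_SPACE = ["10.72.0.0/16", "192.168.10.0/24"]

if __name__ == "__main__":
    for a, b in conflicts(VNET_SPACE, ONPREM_SPACE):
        print(f"overlap: {a} (Azure) vs {b} (on-premises)")
```

    In this made-up example the on-premises 10.72.0.0/16 range contains the whole VNet, so one side would have to be re-addressed before the connection could work.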

    You can also secure the network perimeter by using Network Security Groups to restrict traffic based on protocol, source and destination.
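    NSG rules are evaluated in priority order, lowest number first, and processing stops at the first rule that matches. The sketch below is a deliberately simplified model of that evaluation (protocol, source prefix and destination port only); it is not the Azure SDK, the rule values are hypothetical, and the final default stands in for Azure's built-in DenyAllInBound rule while ignoring the other default rules.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    priority: int             # lower number is evaluated first
    action: str               # "Allow" or "Deny"
    protocol: str             # "Tcp", "Udp" or "*"
    source: str               # CIDR prefix, or "*" for any source
    dest_port: Optional[int]  # None matches any destination port


def evaluate(rules, protocol, source_ip, dest_port, default="Deny"):
    """Return the action of the first matching rule in priority order."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.protocol not in ("*", protocol):
            continue
        if rule.source != "*" and ip not in ipaddress.ip_network(rule.source):
            continue
        if rule.dest_port is not None and rule.dest_port != dest_port:
            continue
        return rule.action  # first match wins; later rules are not consulted
    return default  # stand-in for the built-in deny-all inbound rule


# HYPOTHETICAL inbound rules: HTTPS from one trusted range, everything else denied.
EXAMPLE_RULES = [
    Rule(4000, "Deny", "*", "*", None),
    Rule(100, "Allow", "Tcp", "203.0.113.0/24", 443),
]
```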

    In addition to securing inbound traffic by applying NSGs to subnets of the Virtual Network, you can also configure user-defined routes and control the flow of network traffic through a virtual firewall appliance by deploying your HDInsight cluster into a Virtual Network.
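    When you add user-defined routes, Azure picks the route whose prefix most specifically matches the destination (longest prefix match). A minimal sketch of that selection, with a hypothetical route table that keeps VNet traffic local and sends everything else to a firewall appliance at a made-up address in the question's internetgateway subnet; it ignores Azure's system routes and the tie-breaking between route sources:

```python
import ipaddress

# HYPOTHETICAL route table: (prefix, next hop type, next hop IP).
ROUTES = [
    ("10.72.24.0/24", "VnetLocal", None),
    ("0.0.0.0/0", "VirtualAppliance", "10.72.24.244"),  # assumed firewall IP
]


def next_hop(routes, destination):
    """Pick the matching route with the longest prefix for a destination IP."""
    ip = ipaddress.ip_address(destination)
    matches = []
    for prefix, hop_type, hop_ip in routes:
        net = ipaddress.ip_network(prefix)
        if ip in net:
            matches.append((net.prefixlen, hop_type, hop_ip))
    if not matches:
        return ("None", None)  # no matching route: traffic is dropped
    _, hop_type, hop_ip = max(matches, key=lambda m: m[0])
    return (hop_type, hop_ip)


if __name__ == "__main__":
    for dest in ("10.72.24.10", "8.8.8.8"):
        print(dest, "->", next_hop(ROUTES, dest))
```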

    For more details on using HDInsight within a Virtual Network, see "Use Virtual Network".

    Hope this helps.      


    Tuesday, November 5, 2019 5:20 AM
    • You need to directly access services on HDInsight that aren't exposed over the Internet. For example, you have consumers or producers that need to directly work with Kafka brokers or clients that need to use the HBase Java API.

    Do you mean a case where there is an on-premises Kafka?

    • You need to connect on-premises services to HDInsight. For example, use Oozie to import or export data to or from an on-premises SQL Server.

    And without putting HDInsight in a VNet, can I use Data Factory to import and export data between HDInsight and SQL Server?

    • You need to create solutions that involve multiple HDInsight clusters of different types. For example, you might want to use Spark or Storm clusters to analyze data stored in a Kafka cluster.

    Without a VNet, is it not possible to connect Kafka with Spark?

    Thanks

    Wednesday, November 6, 2019 5:18 PM
  • Hi Emanuele86,

    Just checking in to see whether the above answer helped. If it answers your query, please click “Mark as Answer” and upvote it. If you have any further questions, let us know.

    Tuesday, November 12, 2019 6:28 AM
  • Hi Emanuele86,

    Following up to see whether the above suggestion was helpful. If you have any further questions, let us know.

    Wednesday, November 20, 2019 4:45 AM
  • Hi all,

    Yes, thanks for the help so far :)

    Regards

    Wednesday, November 20, 2019 9:23 AM