Outputter writing to SQL DB

    Question

  • Hi

    For a number of reasons we need to be able to write to an Azure SQL DB as part of a custom outputter. It works fine locally and does exactly as expected (and required). However, when run in Azure, the outputter is unable to connect to the Azure SQL DB.

    After a bit of prodding and poking we got an error message: "The server was not found or was not accessible" and "(provider: TCP Provider, error: 0 - No such host is known.)"

    The database connection is failing because it can't reach the server. I think I have tried the firewall route, but I am not certain I have the right IP address range for Data Lake Analytics, and I am also wondering if this is a DNS problem. I am in the process of trying the external IP address of the server, but this is not ideal for obvious reasons...

    Any thoughts?

    Thanks

    Friday, December 23, 2016 11:33 AM

Answers

  • Custom processors (reducers, outputters, processors) cannot make network-based calls during execution in the cloud. ADLA's ability to scale out execution across large numbers of distributed processing nodes is a key feature for big data processing. To allow this level of scaling, and to support security requirements, user code running on these nodes is not allowed to go off-node during execution. So calls to network APIs, connections to remote services, etc. are blocked.

    Additionally, there are a number of issues around getting writes from a large-scale, parallel query environment into a transactional database to work correctly. Therefore the model for writing results into a SQL database is to write output to ADLS and then use ADF or other copy mechanism to load the data into SQL.
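
    As a concrete illustration of the recommended model, a U-SQL script would stage its results to a file in ADLS with an `OUTPUT` statement, and a separate copy step (ADF or similar) would then load that file into SQL. The rowset name and output path below are hypothetical:

    ```sql
    // Stage query results to ADLS as CSV; a downstream ADF copy
    // activity (or bcp/BULK INSERT) loads the file into Azure SQL DB.
    OUTPUT @results
    TO "/output/results.csv"
    USING Outputters.Csv();
    ```

    This keeps the massively parallel query stage and the transactional load stage decoupled, so a job retry or a partially failed vertex never leaves half-written rows in the database.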

    The fact that the local run environment does not block what would be network writes in the cluster is something that we should address in our tooling. We should either block these calls in the local run environment as well, or at least have documentation/warnings to inform users about it. I will open an issue for follow-up with our dev team for that.

    Omid

    Saturday, December 24, 2016 2:40 AM

All replies
  • Hi Omid,

    Thanks for the clarification and, yes, this all makes sense. In the past we have used PolyBase to get data into SQL Data Warehouse, but the external source support in Azure SQL DB is slightly different. The plan was to use the bulk copy API from within the outputter's Close() virtual method.
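
    For reference, the approach described above would look roughly like the sketch below. The connection string, table name, and `bufferedRows` (a `DataTable` accumulated in `Output()`) are placeholders; as explained in the answer, this call path works in local runs but is blocked on ADLA cluster nodes:

    ```csharp
    // Hypothetical sketch: flushing buffered rows to Azure SQL DB
    // via SqlBulkCopy from a custom outputter's Close() override.
    public override void Close()
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                bulkCopy.DestinationTableName = "dbo.Results";
                // bufferedRows: DataTable populated row-by-row in Output()
                bulkCopy.WriteToServer(bufferedRows);
            }
        }
    }
    ```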

    I completely understand the reasoning behind not allowing network based calls from the processing nodes and your clarification will stop me wasting any more time on it.

    It would be useful if the local environment also blocked these calls.

    Appreciate the help, thanks.

    Sunday, January 8, 2017 10:31 PM