Why is there no difference in load performance or query performance

  • Question

  • Hi,

    I have been testing the load phase of Azure DWH using Azure Data Factory, and my results are not making much sense. It feels as though the documentation is for a totally different product.

    Case 1. DWU 200. No other users. The fastest load time I get for several hundred tables ranging from a few rows to millions, from on-prem SQL Server to Azure DWH, is a straight bulk insert via ADF using the default or smallrc, no staging, into a round-robin heap. This is unexpected, as all the documentation states PolyBase is the fastest load method.

    Case 2. I have tried PolyBase, with stage to blob, with smallrc, largerc, and xlargerc, into a round-robin heap. This is generally slower than Case 1, though largerc with PolyBase is almost the same load time as Case 1 - a few minutes slower, actually.

    Why is bulk insert faster? It is extremely frustrating when one tries to implement best practice but nothing improves load times. Could it be that PolyBase is only faster when the intermediate step of staging to blob is not needed?

    When it comes to queries, changing resource class does not make any difference; there is no improvement in query performance. Has anyone else spent several weeks just testing these different settings and found absolutely no improvement or change? I have noted this on both the on-prem APS (full rack) and Azure DWH DW100 to DW300.

    Thanks,

    Jinx.

    Tuesday, January 14, 2020 4:25 PM

Answers

  • Hi Jinx,

    Without information on the size of the dataset and the amount of data you are loading, the choice of PolyBase is most likely what is slowing you down: when you load from on-premises, we write to blob storage first and then ingest the data. At DW200 with smallrc, we are loading into a single compute node with minimal resources, and we won't spawn many reader and writer threads at that SLO. Start by using the largest resource class available in your DW; this will aid load performance with PolyBase.
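    As an illustration, static resource classes are assigned by adding the loading user to the corresponding database role. In this sketch, LoadUser is a placeholder for whatever login your ADF pipeline connects with:

    ```sql
    -- Assign the (hypothetical) load login 'LoadUser' to the xlargerc static resource class.
    -- Loads run under LoadUser then receive a larger memory grant and more DMS threads.
    EXEC sp_addrolemember 'xlargerc', 'LoadUser';

    -- To revert to the default resource class later:
    -- EXEC sp_droprolemember 'xlargerc', 'LoadUser';
    ```

    Note the trade-off: a larger resource class consumes more concurrency slots, so fewer loads can run at the same time.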

    You can also monitor the number of DMS workers being created for your statement using the following query:

    SELECT type, COUNT(*)
    FROM sys.dm_pdw_dms_workers
    WHERE request_id = 'QIDXXXX' AND step_index = XX
    GROUP BY type;

    Another important factor is the table mapping when loading data with PolyBase: if the mapping is all varchar without taking the schema into consideration, we can over-allocate memory per record, which degrades PolyBase performance.
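    To make that concrete, here is a rough sketch of the two mappings. The table, data source, and file format names are illustrative, not from your environment:

    ```sql
    -- Over-allocates memory per row: every column mapped as a wide varchar.
    -- CREATE EXTERNAL TABLE ext.Orders_Wide (
    --     OrderId    varchar(8000),
    --     OrderDate  varchar(8000),
    --     Amount     varchar(8000)
    -- ) WITH (LOCATION = '/orders/', DATA_SOURCE = MyBlobStore, FILE_FORMAT = MyCsvFormat);

    -- Better: mirror the source schema so the per-record memory estimate stays small.
    CREATE EXTERNAL TABLE ext.Orders (
        OrderId    int,
        OrderDate  date,
        Amount     decimal(18, 2)
    ) WITH (LOCATION = '/orders/', DATA_SOURCE = MyBlobStore, FILE_FORMAT = MyCsvFormat);
    ```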

    In most real-world scenarios, customers export to storage in bulk and then schedule ingestion via PolyBase with the highest resource class and SLO they can afford or that fits the business need. If you are still not entirely clear or satisfied, feel free to log a case with us so that we can look at your dataset, its size, and where the performance is degrading.

    • Marked as answer by Jinxsee Wednesday, January 15, 2020 4:07 PM
    Wednesday, January 15, 2020 3:40 PM

All replies

  • Hi Jinxsee,

    Were all scenarios run against DW200c? Would it be possible to test one scenario against DW200c and the same against DW500c, where the dynamic resource classes have a more distinct allocation across each class? It is possible that DW100c - DW300c are still memory bound, and that a larger jump in deployed resources (DW1000c, for example) will show more distinct results.

    Regards,

    Mike

    Tuesday, January 14, 2020 11:10 PM
    Moderator
  • Hi Jinxsee 

    PolyBase performance is dependent on file type, table definition, and the resources available to the statement's concurrency slot. The larger the SLO and the larger the resource class, the more threads will be created and the more parallel the load will be.

    The fastest object to load into is a round-robin heap. If you load the object via PolyBase in ADF and the source object is not deemed "compatible", we stage the object first and then load into the target object via PolyBase.

    ADF creates the objects required for PolyBase on your behalf. A more accurate test of load performance would be to create the external table over the source file manually and then perform a CTAS into the new target object, which is pure PolyBase.
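    As a rough sketch of that manual test (the external table, data source, file format, and column names below are placeholders, not your actual schema):

    ```sql
    -- External table over the file already staged in blob storage.
    CREATE EXTERNAL TABLE ext.SalesStage (
        SaleId   int,
        SaleDate date,
        Amount   decimal(18, 2)
    ) WITH (LOCATION = '/sales/', DATA_SOURCE = MyBlobStore, FILE_FORMAT = MyCsvFormat);

    -- Pure-PolyBase load: CTAS into a round-robin heap, bypassing ADF's staging logic.
    CREATE TABLE dbo.Sales
    WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
    AS SELECT SaleId, SaleDate, Amount FROM ext.SalesStage;
    ```

    Timing the CTAS alone isolates the PolyBase ingestion from the on-prem-to-blob copy, which is the comparison the documentation's numbers are based on.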

    I would highly recommend the following article

    https://techcommunity.microsoft.com/t5/DataCAT/Azure-SQL-Data-Warehouse-loading-patterns-and-strategies/ba-p/305456

    Tuesday, January 14, 2020 11:19 PM
  • Hi Mike,

    All tests were done on DW200; the problem, of course, is cost. Hours of testing are going to be expensive on higher tiers, and I can't see my employer being on board with that. I am coming to the conclusion that using DWH at lower tiers is essentially pointless: if your budget is constrained to the point that higher tiers are not an option, and data volumes are somewhat low, then DWH is not the answer.

    Thanks,

    Jinx.

    Wednesday, January 15, 2020 8:51 AM
  • Hi Charl,

    Thanks for the article link.

    Perhaps my case was not clear. The point of the test was not to prove that loading into DWH via PolyBase is fast - I have no doubt that it is, as there are plenty of perfect test cases online for that proof - but rather to speed up a real-world case where conditions are not perfect.

    We are loading from on-prem application SQL Server (2014 - 2016) to Azure DWH DW200 into round-robin heaps; the staging step is required due to the source. I take it from what you are saying that staging to blob within ADF, which is my only option to get PolyBase working, is actually what is slowing the load down? I cannot find a definitive answer anywhere on whether this is the case - i.e. pure PolyBase is fast, but the staging step (which one often cannot get away from due to compatibility) is slow, and therefore PolyBase is not the fastest loading option and bulk insert is the best you can do.

    Getting an answer to this would be very helpful, as we could then stop testing and accept that our load is already at its fastest given our data source.


    Thanks,

    Jinx.

    Wednesday, January 15, 2020 9:25 AM
  • Hi Jinx,

    (See the marked answer above.)

    • Marked as answer by Jinxsee Wednesday, January 15, 2020 4:07 PM
    Wednesday, January 15, 2020 3:40 PM
  • Thanks Charl, that does make sense.
    Wednesday, January 15, 2020 4:07 PM
  • Hi Charl,

    I have been doing some additional testing on DW200, with your comment about dataset size in mind, on single tables rather than a full load, and I can confirm that on large table loads, PolyBase with staging is much faster than bulk insert. My results were being confounded by a high number of very small table loads, whose times do not improve at all: looking at the overall ADF runtime there is no improvement, but looking at individual tables I can see the difference.

    The case for PolyBase on big tables is clear, so we will probably split loading into two stages: one that handles very large tables via PolyBase, and another for smaller tables loaded with bulk insert.

    Jinx.

    23 hours 19 minutes ago