Very slow performance copying from table storage

  • Question

  • I have set up a copy activity in an Azure Data Factory pipeline to copy data from a Table storage table to Data Lake.

    When I run this activity to copy the latest data to Data Lake, it is extremely slow. The activity read 346 MB of data and wrote 264 MB. It read and wrote 1,260,825 rows. The throughput was 12.99 KB/s, and the activity took 7 1/2 hours to run.

    The Table storage table holds more than 800,000,000 rows. The copy activity filters on the Table storage Timestamp property to fetch only the latest data.

    I have tried changing the number of DIUs (Data Integration Units) and other settings, and it has made no difference.

    Any ideas on how I can speed up this activity?

    Monday, March 2, 2020 10:50 AM

All replies

  • Hi,

    Thanks for reaching out. Because this issue is performance-related, it needs deeper analysis. If you have a support plan, you can file a support ticket for investigation and immediate assistance. Otherwise, please send an email to AzCommunity@Microsoft.com with the details below so that we can create a one-time free support ticket and work closely with you on this.


    Subject of the email: <ATTN-Kranthi: MSDN Thread title>
    Thread URL: <MSDN Thread>
    Subscription ID:  <your subscription id>
    Pipeline Run ID: <>
    Activity Run ID: <>

    Let us know here once the email is sent.

    Thank you

    If a post helps to resolve your issue, please click "Mark as Answer" and/or "Vote as helpful" on that post. Marking a post as an answer or as helpful helps others find the answer faster.

    Tuesday, March 3, 2020 6:23 PM
  • Hi Andrew,

    Before opening a support request, could you please confirm which integration runtime you are using on the source and sink sides? If you are using a self-hosted integration runtime (SHIR), which version is it? If it is not the latest version, please try upgrading your SHIR to the latest and check whether the performance issue persists. (You can get the latest IR from here)

    Thank you


    Wednesday, March 4, 2020 1:34 AM
  • Hi,

    The pipeline uses the standard Azure Data Factory, so I assume it's not using SHIR.

    Regards,

    Andrew

    Wednesday, March 4, 2020 11:12 AM
  • Hi Andrew,

    Thanks for your response. After further investigation and reading, I suspect the slow performance is not on the ADF side but on the Azure Table storage side. To confirm, could you please check whether your query filters on 'PartitionKey' and/or 'RowKey'? A query that does not filter on PartitionKey and/or RowKey will be slow because the Table service has to do a full table scan.
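To illustrate the point about key-based filters, here is a minimal sketch of how a filter that combines PartitionKey with the existing Timestamp condition might be built. It rests on assumptions not stated in this thread: that the latest rows live in a known partition, and that the partition key value (`'2020-03'` here) is purely hypothetical. The helper names `odata_string` and `build_filter` are also made up for the example. The resulting string uses the OData filter syntax that Table storage accepts, so it is the kind of text that could go into the copy activity's Table source query.

```python
from datetime import datetime, timezone


def odata_string(value: str) -> str:
    """Quote a string for an OData filter; single quotes are escaped by doubling."""
    return "'" + value.replace("'", "''") + "'"


def build_filter(partition_key: str, since: datetime) -> str:
    """Build a filter that narrows the scan to one partition before
    applying the Timestamp condition.

    partition_key is a hypothetical value; the real one depends on how
    the table was partitioned.
    """
    if since.tzinfo is None:
        raise ValueError("since must be a timezone-aware datetime")
    # Table storage datetime literals are written as datetime'<UTC ISO 8601>'.
    ts = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"PartitionKey eq {odata_string(partition_key)} "
        f"and Timestamp ge datetime'{ts}'"
    )


if __name__ == "__main__":
    print(build_filter("2020-03", datetime(2020, 3, 1, tzinfo=timezone.utc)))
```

A filter on Timestamp alone gives the service no key to narrow the search, which is why it ends up scanning the whole table. Adding an equality or range condition on PartitionKey (and, where possible, RowKey) limits the scan to the matching partitions.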


    Hope the above information helps and let us know if you still need assistance.


    Thank you


    Wednesday, March 4, 2020 10:17 PM
  • Thanks. The query currently uses Timestamp; I will look at using RowKey.
    Thursday, March 5, 2020 9:59 AM
  • Thanks for your response, Andrew. If you still need assistance, please let us know. We will be glad to dive deeper into this.

    Thank you


    Saturday, March 7, 2020 1:47 AM