(ADF v2) Oracle DB Source/Azure Blob Sink Throughput very slow

    Question

  • I'm working on reading from an Oracle DB view and writing to an Azure Blob sink.

    I have tried different scenarios for the sink options and checked network connectivity, but the throughput is still horrible.

    The IR client (an Azure VM) is in the same region and VNet as the source (also an Azure VM) and as the Azure Storage account being written to. The IR client is barely utilized: the IR processes rarely go above 10% CPU, and total memory usage is around 25%.

    Any thoughts on how to troubleshoot this? Does it matter significantly whether the blob format is JSON vs. text? We aren't seeing issues interacting with the DB in other scenarios.

    Event Log Entry:

    Report job progress: [MaxMemoryLimit, 1073741824];[Source.WorkingDuration, 118.8008801];[Source.WorkerCount, 1];[Source.PeakWorkerCount, 1];[Source.StartTime, 04/21/2018 14:10:41.939];[Serialization.WorkingDuration, 0.0428152];[Serialization.WorkerCount, 0];[Serialization.PeakWorkerCount, 1];[Compression.WorkingDuration, 0];[Compression.WorkerCount, 0];[Compression.PeakWorkerCount, 0];[Sink.WorkingDuration, 0];[Sink.WorkerCount, 0];[Sink.PeakWorkerCount, 0];[AvailableMemoryLimit, 1065353215];[DataSizeOutbound, 0];[SinkFileCount, 0];[Source.ReadFirstByteTime, 04/21/2018 14:11:08.278];[DataSizeInbound, 31392];[SourceInboundRowCount, 40];[RowCount, 39];[RuntimeStageNames, ["Source","Serialization","Compression","Sink"]];[Source.Type, Oracle];[Serialization.Type, JsonFormat];[Compression.Type, GZip];[Sink.Type, AzureBlob];[TaskPickupTime, 04/21/2018 14:10:38.714];[PreProcess.Duration, 0.07];[Memory.TotalMemoryInMB, 8191.00];[GC.IsServerGC, False];[Memory.ProcessPrivateWorkingSetInMB, 62.66];[Memory.ProcessWorkingSetInMB, 129.99];[Memory.ProcessWorkingSetPeakInMB, 134.45];[Memory.ProcessVirtualBytesInMB, 1019.20];[Memory.ProcessVirtualBytesPeakInMB, 1028.07];[Memory.ProcessPageFileBytesInMB, 233.05];[Memory.ProcessPageFileBytesPeakInMB, 235.67];[Memory.ProcessPrivateBytesInMB, 233.30];[Processor.ProcessCpuUtilizationRatio, 0.10];[Memory.AvailableMemoryInMB, 6321.00];[Memory.CommitLimitInMB, 9471.55];[Memory.CommittedBytesInMB, 2165.11];[Memory.CommittedBytesInUseRatio, 22.86];[Network.NetworkRecieved, 1.91];[Network.NetworkSent, 1.81];[Processor.CpuUtilizationRatio, 0.14];[GC.GCRatio, 4.06];[GC.ProcessGCRatio, 0.00];[LogicalDisk.CFreeSpaceInMB, 101333.00];[LogicalDisk.TotalFreeSpaceInMB, 151150.00];[GatewayNodeName, udfg]
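
    Reading that entry, nearly all of the elapsed time looks to be in the Source stage (Source.WorkingDuration is ~119, presumably seconds, for only 40 inbound rows), while Serialization, Compression, and Sink report essentially zero, so the time seems to be going into the Oracle read rather than the blob write or the JSON-vs-text choice. If it helps anyone eyeball these counters, here is a rough Python sketch that splits the [key, value] pairs out of a progress line like the one above; the parse_progress helper is just something I put together for illustration, not part of any ADF tooling.

    import re

    # Hypothetical helper: pull the per-stage counters out of an IR
    # "Report job progress" line so the time split is easier to see.
    def parse_progress(line: str) -> dict:
        """Return the [key, value] pairs from a progress log line as a dict."""
        return {name: value for name, value in re.findall(r"\[([A-Za-z.]+), ([^\]]*)\]", line)}

    log_line = "Report job progress: ..."  # paste the full entry here
    counters = parse_progress(log_line)
    for stage in ("Source", "Serialization", "Compression", "Sink"):
        print(stage, counters.get(f"{stage}.WorkingDuration", "n/a"))
    print("Rows read:", counters.get("SourceInboundRowCount"))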

    The IR client version is 3.5.6639.1, and it reports that it is current.

    Saturday, April 21, 2018 2:30 PM

All replies

  • Hi big_z,

    Have you checked the workload of your Oracle DB with the DBA?

    Thanks,

    David

    Thursday, April 26, 2018 7:22 AM
  • Yes, the DB server load has been reviewed.

    From the IR host we can run the same query in SQL Developer and get a very good response rate.
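
    To make that comparison repeatable outside SQL Developer, something along these lines can time the same query directly from the IR host. It's only a sketch: the connection details and view name are placeholders, and it assumes the cx_Oracle driver (and an Oracle client) is installed on that machine.

    import time
    import cx_Oracle  # assumes cx_Oracle and an Oracle client are installed on the IR host

    # Placeholder connection details and query -- substitute your own.
    dsn = cx_Oracle.makedsn("oracle-host", 1521, service_name="ORCLPDB1")
    conn = cx_Oracle.connect(user="app_user", password="app_password", dsn=dsn)

    query = "SELECT * FROM my_view"  # the same view the copy activity reads

    start = time.perf_counter()
    cur = conn.cursor()
    cur.arraysize = 1000  # fetch in batches rather than row by row
    cur.execute(query)
    rows = cur.fetchall()
    elapsed = time.perf_counter() - start

    print(f"Fetched {len(rows)} rows in {elapsed:.2f} s")
    cur.close()
    conn.close()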

    (There is an open case now with MS and the product team is involved - I will update this post based on what I hear back.)

    Thursday, April 26, 2018 3:13 PM
  • Did you ever hear back from MS? We are having the same issue.

    Steve Coleman

    Wednesday, August 22, 2018 9:15 PM
  • Sorry, I've let this go dormant; hopefully this will help someone in the future.

    Our final finding was that at some point, while troubleshooting another part of the query process, we had turned on system-wide ODBC tracing. We found this with Sysinternals Process Explorer: watching the process during one of the data loads, we noticed a file (ultimately the trace file) being written to excessively, under a user that wasn't the one originally used to set up the trace, and in a location other than the one we'd specified.

    After turning off ODBC tracing we haven't had an issue since.
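
    If anyone else hits this, a quick way to see whether tracing got switched on is to look at the Trace and TraceFile values the ODBC Data Source Administrator keeps in the registry. The Python sketch below (standard winreg module, run on the IR host) checks the usual per-user and machine-wide locations; treat the key paths as an assumption and confirm against the Tracing tab on your own system.

    import winreg

    # Rough sketch: report the ODBC tracing settings stored in the registry.
    # The key paths below are the usual locations for the ODBC Administrator's
    # Tracing settings -- verify them on your own system.
    LOCATIONS = [
        ("HKCU (per-user)", winreg.HKEY_CURRENT_USER, r"SOFTWARE\ODBC\ODBC.INI\ODBC"),
        ("HKLM (machine)", winreg.HKEY_LOCAL_MACHINE, r"SOFTWARE\ODBC\ODBC.INI\ODBC"),
    ]

    for label, hive, path in LOCATIONS:
        try:
            with winreg.OpenKey(hive, path) as key:
                for value_name in ("Trace", "TraceFile"):
                    try:
                        value, _ = winreg.QueryValueEx(key, value_name)
                        print(f"{label}: {value_name} = {value}")
                    except FileNotFoundError:
                        print(f"{label}: {value_name} not set")
        except FileNotFoundError:
            print(f"{label}: key not found")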

    HTH.

    Saturday, November 17, 2018 9:40 PM