locked
Which one to use in multiple SQL query, Parallel.For or Tasks/async? RRS feed

  • Question

  • Hi,

    i've created a data governance tool, in one module of which i schedule a job that runs a few sql queries. It is now running synchronously but i want to convert it to a way that it runs faster.

    the question is, should i use Parallel or Task or Async/Await method in order to achieve what i want?

    EDIT:These queries are completely different from each other. They each run a different data quality checks on different columns or tables. These are all select queries, and i use the ExecuteScalar method returning only the count of the rows that breaks my quality rules, such as "how many null values in ABC column" I scheduled them to run at night as soon as DWH is populated.

    EDIT2: i query Oracle and i write the number of rule violations into Ms Access. And i dont have the access for neither DB schedulers nor Informatica ETL tool. i need to do the scheduling programattically in my own tool. all i want is, all the querying takes 2 hours to run and i want to reduce the duration 

    scheduling isn't the key part here. let me change my question this way, how would achieve this by running these querries now, not by scheduling. i just want to know which one the proper way is, Parallel.For or Task/Async? i am asking this , coz i tried to implement the Task/async way but couldn't manage it. and i am wondering if the Task/async method is not suitable for this goal and should i use the Parallel.For instead, or am i just doing something wrong with Task/async


    Dig your well before you are thirsty


    Monday, October 15, 2018 8:29 AM

Answers

  • Hi ExcelinEfendisi,

    If the data does not conflict, you could try to use Parallel.For. It would save your time.

    Best Regards,

    Wendy


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Thursday, October 18, 2018 8:27 AM

All replies

  • 1) Optimize the queries on the database side.

    2) When the queries cannot produce locks, then it further depends on the structure of your application:

    Parallel.For makes only sense, if your queries are a mutable list. If they are fixed number of queries, then Task.WaitAll() is imho the first choice. Async/await should only be used, if you already working async in you application.

    Monday, October 15, 2018 9:02 AM
  • thanks Stefan,

    i jsut edited my post giving more details. could you reevealuate your answer please.

    thanks


    Dig your well before you are thirsty

    Monday, October 15, 2018 10:09 AM
  • Hi ExcelinEfendisi,

    If the data does not conflict, you could try to use Parallel.For. It would save your time.

    Best Regards,

    Wendy


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Thursday, October 18, 2018 8:27 AM
  • Thanks Wendy

    so if i have 50 sql queries, there is no point creating 50 threads or tasks in order to speed up the whole job. it will finish in the same duration. correct?


    Dig your well before you are thirsty


    Friday, October 19, 2018 10:02 AM
  • Well, as these are data quality checks, this means, that you need to query a hugh amount of data. Thus this is per se a slow overall operation.

    In this case you simply need to test how long each approach takes. But I don't expect a huge improvement from simply parellaizing these queries.

    Normally the biggest gain could be achieved by rearranging/optimizing those queries. E.g. instead of three queries like  "how many null values in ABC column" run a single one:

    SELECT SUM(CASE WHEN A IS NULL THEN 1 ELSE 0 END),
    SUM(CASE WHEN B IS NULL THEN 1 ELSE 0 END),
    SUM(CASE WHEN C IS NULL THEN 1 ELSE 0 END)
    FROM yourTable;


    Friday, October 19, 2018 3:42 PM