Integrating multi-threading into this application. Are there any issues you can see arising from this?

  • Question

  • Hello.

    Sorry in advance for the long post.

    We currently have a C# console application that pulls medical claims from our client via a web service. The application is single-threaded at the moment. Unfortunately, the client recently told us that we're going to start receiving millions of claims all at once (as opposed to the typical few thousand per day), and after doing the math we've determined it would take weeks for the application to process that many claims. We thought introducing multi-threading would help. However, this raises a potential issue: if it's multi-threaded, how do we ensure no claims get lost in the mix and never pulled?

    Our current process is as follows:

    1. We pull an initial list of all claim ids from the client.  This is a one-time pull and we store the list.
    2. We then read the next claim id from that list and call their web service for the details of that claim.  We do this one claim at a time, until we've finished pulling the details of every claim on the list.
    3. Before pulling each claim, we store the claim id we're about to process in a "Last Claim ID Processed" field in our db table, in case there's ever a system crash and we need to pick up later where we left off.  This way, our application basically says "ok, I see there was a failure last time we ran.  Let's start with claim id #12345 rather than from the very beginning of the list."
    4. Finally, we store that claim info in our database and move on to the next claim.

    The bulk of our time is spent at step #2, where we pull the details of each claim, so this is the piece we want to make multi-threaded.  We could split this work across about 10 threads, each of which reads the next claim id from the claim list that we pulled in step #1.  Theoretically, that should make our application process claims about 10 times faster than it currently does.  But my problem is: how would each thread know which claims are already being processed by the other threads?
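
    One common way to avoid needing that knowledge at all is to have every thread take ids from a single thread-safe queue, so each id is handed out exactly once. The following is only a minimal sketch of that idea, written in Python for brevity rather than in the application's C#. `fetch_claim_details` and `run_workers` are hypothetical names, and the web service call is replaced by a stand-in.

```python
import queue
import threading

def fetch_claim_details(claim_id):
    # Hypothetical stand-in for the call to the client's web service.
    return {"claim_id": claim_id, "details": f"details for claim {claim_id}"}

def run_workers(claim_ids, worker_count=10):
    # Load every id into one thread-safe queue. Each get_nowait() removes
    # an id, so no two workers can ever receive the same claim.
    pending = queue.Queue()
    for claim_id in claim_ids:
        pending.put(claim_id)

    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                claim_id = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to hand out, so this worker stops
            details = fetch_claim_details(claim_id)
            with results_lock:
                results[claim_id] = details

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

results = run_workers(range(10001, 10051), worker_count=10)
print(len(results), "claims fetched")
```

    The queue only guarantees that each id is handed out once. It does not record whether the fetch succeeded, so crash recovery still needs per-claim bookkeeping.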

    So again, my concern is, how do we ensure no claims get lost in the mix?  For example, I'm concerned about the following scenario:

    1. Thread #1 reads from the claim list and pulls the next claim, claim id 10001. Before processing, it stores this claim id in the "Last Claim ID Processed" field in our import summary table, so we know it's the last claim processed in case of a crash (step #3 above)
    2. Thread #2 reads from the list & pulls claim id 10002 & repeats the same process
    3. Thread #3 reads from the list & pulls claim id 10003 & repeats the same process
    4. Thread #1 finishes processing claim id 10001 & starts pulling claim id 10004. And again, stores that claim id value into the import summary table.
    5. Thread #3 crashes and never finishes pulling claim id 10003...

    In the above scenario, we have a big problem!  Thread #3 crashed while pulling claim 10003, but we'll never know, because thread #1 has already overwritten the "Last Claim ID Processed" field with 10004.
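
    The scenario breaks down because a single "last id" value cannot describe several claims in flight at once. One possible alternative, sketched here under assumptions, is to record a status per claim instead. `ClaimTracker`, `process` and `flaky_fetch` are hypothetical names, the statuses live in memory purely for illustration (a real run would persist them in the database), and the failure of claim 10003 is simulated.

```python
import threading

PENDING, IN_PROGRESS, DONE, FAILED = "pending", "in_progress", "done", "failed"

class ClaimTracker:
    """Keeps one status per claim id instead of a single 'last id' value."""

    def __init__(self, claim_ids):
        self._status = {claim_id: PENDING for claim_id in claim_ids}
        self._lock = threading.Lock()

    def mark(self, claim_id, status):
        with self._lock:
            self._status[claim_id] = status

    def unfinished(self):
        # Anything not marked done still needs to be pulled (or pulled again).
        with self._lock:
            return sorted(c for c, s in self._status.items() if s != DONE)

def process(tracker, claim_id, fetch):
    tracker.mark(claim_id, IN_PROGRESS)
    try:
        fetch(claim_id)
    except Exception:
        tracker.mark(claim_id, FAILED)
    else:
        tracker.mark(claim_id, DONE)

def flaky_fetch(claim_id):
    # Simulated web service call: claim 10003 fails, as in the scenario above.
    if claim_id == 10003:
        raise RuntimeError("simulated failure")

claim_ids = [10001, 10002, 10003, 10004]
tracker = ClaimTracker(claim_ids)
threads = [
    threading.Thread(target=process, args=(tracker, claim_id, flaky_fetch))
    for claim_id in claim_ids
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(tracker.unfinished())  # [10003]
```

    Because each claim carries its own status, thread #1 moving on to 10004 no longer hides what happened to 10003. If the statuses were stored in the database, a claim left at "in_progress" after a hard crash would also show up as unfinished on the next run.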

    If you've stuck with me this long, I thank you (again, sorry for the long post).  But I guess I'm just looking for a little advice on the best way to handle this new multi-threading process we want to integrate.  I need to be VERY careful that no claims end up lost in the mix.

    Thanks in advance for any advice you can provide!

    Thursday, November 15, 2018 9:17 PM

All replies

  • You can have a single-threaded application read the claims and store them in the database, emitting error reports for missing fields and the like along the way. Then add a control table, keyed on "ClaimID" so it can be joined to the claims, that holds control information: the processing datetime and user for each step (an automated process may just use the step name as the username), the current step name, and the start time of the current step.

    Now you can have multiple threads work through the claim records according to the business logic. Say there is a step called "validate maximum claim according to insurance contract" (v_maxclaim): your thread fetches a record where the v_maxclaim_user field is null, or where current_step = "v_maxclaim" and more than, say, 1 hour has passed since the step started. You can also add a retry count field that is reset when the step advances, and stop retrying and alert administrators if the retry count exceeds a certain value.
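
    As a rough illustration of the control-table idea, here is a minimal sketch in Python with an in-memory SQLite database standing in for the real one. The table and column names (`claim_control`, `step_user`, `step_started_at`, `done`), the retry limit of 3, and the helper functions are all assumptions made for the example. It models a single step, so the "reset on advance" part of the retry count is left out.

```python
import sqlite3

STALE_SECONDS = 3600  # lease timeout: the "1 hour" mentioned above
MAX_RETRIES = 3       # assumed limit before administrators are alerted

def create_control_table(conn):
    conn.execute(
        """CREATE TABLE claim_control (
               claim_id        INTEGER PRIMARY KEY,
               step_user       TEXT,  -- NULL until a worker takes the claim
               step_started_at REAL,  -- seconds; start of the current attempt
               retry_count     INTEGER NOT NULL DEFAULT 0,
               done            INTEGER NOT NULL DEFAULT 0
           )"""
    )

def claim_next(conn, worker, now):
    """Reserve one claim that is untaken or whose previous attempt went stale."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            """SELECT claim_id, step_user FROM claim_control
               WHERE done = 0 AND retry_count < ?
                 AND (step_user IS NULL OR step_started_at < ?)
               ORDER BY claim_id LIMIT 1""",
            (MAX_RETRIES, now - STALE_SECONDS),
        ).fetchone()
        if row is not None:
            claim_id, previous_user = row
            # Taking over a stale claim counts as a retry.
            retry_increment = 0 if previous_user is None else 1
            conn.execute(
                """UPDATE claim_control
                   SET step_user = ?, step_started_at = ?,
                       retry_count = retry_count + ?
                   WHERE claim_id = ?""",
                (worker, now, retry_increment, claim_id),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[0]

def mark_done(conn, claim_id):
    conn.execute("UPDATE claim_control SET done = 1 WHERE claim_id = ?", (claim_id,))

def needs_attention(conn):
    # Unfinished claims that used up their retries: report these to an admin.
    rows = conn.execute(
        """SELECT claim_id FROM claim_control
           WHERE done = 0 AND retry_count >= ? ORDER BY claim_id""",
        (MAX_RETRIES,),
    ).fetchall()
    return [claim_id for (claim_id,) in rows]

# isolation_level=None lets the code manage transactions explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
create_control_table(conn)
conn.executemany(
    "INSERT INTO claim_control (claim_id) VALUES (?)",
    [(10001,), (10002,), (10003,)],
)
first = claim_next(conn, "worker-1", now=0)
second = claim_next(conn, "worker-2", now=10)
mark_done(conn, first)
print(first, second)
```

    The select and the update run inside one transaction, so two workers cannot reserve the same row. A worker that dies simply leaves its row to go stale, and another worker picks it up after the timeout.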

    Friday, November 16, 2018 1:50 AM