locked
Frontend, Backend Architecture for Batch Processing

  • Question

  • User1529847886 posted

    Hi!

    I would like to ask for some recommendations on designing my application. Any comments and links to references are greatly appreciated.

    I am developing an app to process CSV files and generate Word/PDF files. I'm planning to have a frontend (React) and either a Web API or just a library to do the processing.

    Normally, this app would process around 200-300 files per batch (CSV).

    I want to have the following features:

    1. Create a Portal to upload the file and display them on a table
    2. The CSV file would be uploaded and when the user submits, the file will be processed by the Web API or just a library (whichever is best)
    3. I want to update the table (frontend) to display that the record was processed.
    4. After processing, the user can download the files in a zip file.

    My concerns are:

    1. Where would the batch processing happen? Do I pass the CSV file to the Web API and process it all there (which I think would conflict with updating the table per row)?
    2. Do I just create a Web API to process one file at a time, and make it async? (A future plan is to have the Web API send the output file as an email.)
    3. Right now I use Spire.Doc to generate the file and save it as Word, then use MS Interop to open the file and save it as PDF. I'm wondering if I can avoid actually saving the file on the server and instead keep it in memory (which, if I do a batch process, could cause a memory leak).
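
On the third concern, a minimal sketch of keeping generated files in memory and packing them into a zip for download. It is written in Python only to illustrate the idea; the file names and contents are hypothetical stand-ins for real generated documents, and the same pattern exists in .NET with in-memory streams. Holding a whole batch in memory raises peak memory use rather than causing a leak, provided the buffers are released once the response has been sent.

```python
import io
import zipfile


def build_zip_in_memory(documents):
    """Pack {file name: bytes} into a zip archive held entirely in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in documents.items():
            archive.writestr(name, content)
    return buffer.getvalue()


# Hypothetical generated documents; real content would come from the document generator.
generated = {
    "row-001.txt": b"Letter for row 1",
    "row-002.txt": b"Letter for row 2",
}
zip_bytes = build_zip_in_memory(generated)
```

The returned bytes can be written straight to the HTTP response, so nothing has to be saved on the server's disk.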

    Those are my concerns. I would really appreciate any suggestions and references that I can read to accomplish this, as I want to improve my development skills when it comes to architecting my applications.

    I am also looking for people to connect with and places where I can learn more.

    Sunday, March 24, 2019 3:27 PM

Answers

All replies

  • User-943250815 posted

    What you are looking for is a background task to process the files.
    You can write your own task manager or use one of the many available. Two suggestions:
    Quartz.NET https://www.quartz-scheduler.net/
    Hangfire https://www.hangfire.io/
    Both are available on NuGet.
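
To illustrate the background-task idea in general terms, here is a minimal sketch, assuming a single in-process worker thread and an in-memory status dictionary. It is written in Python and does not use Quartz.NET or Hangfire; the names `submit`, `worker`, and `process` are hypothetical. Libraries such as the two above add persistence, retries, and scheduling on top of this basic pattern.

```python
import queue
import threading
import uuid

jobs = {}                      # job id -> "queued" | "running" | "done" | "failed"
jobs_lock = threading.Lock()
pending = queue.Queue()


def process(payload):
    # Placeholder for the real work (for example, document generation).
    return payload.upper()


def submit(payload):
    """Register a job and return its id immediately, without waiting for processing."""
    job_id = str(uuid.uuid4())
    with jobs_lock:
        jobs[job_id] = "queued"
    pending.put((job_id, payload))
    return job_id


def worker():
    """Pull jobs off the queue one at a time and record their status."""
    while True:
        job_id, payload = pending.get()
        with jobs_lock:
            jobs[job_id] = "running"
        try:
            process(payload)
            status = "done"
        except Exception:
            status = "failed"
        with jobs_lock:
            jobs[job_id] = status
        pending.task_done()


threading.Thread(target=worker, daemon=True).start()

job_id = submit("name,city\nAda,London\n")
pending.join()  # demo only: block until the worker has finished the job
print(jobs[job_id])
```

The web request only calls `submit` and returns the job id; the client can then ask for the job's status later instead of waiting for the processing to finish.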

    Sunday, March 24, 2019 3:42 PM
  • User1529847886 posted

    Hi Jzero,

    Thank you for the links. I will read through them.

    Sunday, March 24, 2019 3:49 PM
  • User475983607 posted

    200-300 files is a lot of files for a user to input in a browser. With that being said, uploading multiple files over HTTP is not a new idea and there are many file upload tutorials. Querying a directory of file names is also very straightforward using the System.IO namespace.

    Where would the batch processing happen? Do I pass the CSV file to the Web API and process it all there (which I think would conflict with updating the table per row)?

    Processing happens on the server that has the files. The processing can be kicked off by an HTTP request (Web API), a system file watcher, or a scheduled task. It depends on the application requirements. The "conflict on updating the table per row" concern is not clear.

    Do I just create a Web API to process one file at a time, and make it async? (A future plan is to have the Web API send the output file as an email.)

    Your requirements are not very clear so it is hard to understand what this application does.   If the user is waiting for a response so they can add edits or fix business errors related to the 300 files being processed then process the files in a background task.  Alert the user when the process is complete.  If the user is not waiting for the response then offload the processing to a scheduled task or code triggered by a file/folder listener.

    Right now I use Spire.Doc to generate the file and save it as Word, then use MS Interop to open the file and save it as PDF. I'm wondering if I can avoid actually saving the file on the server and instead keep it in memory (which, if I do a batch process, could cause a memory leak).

    MS Interop is not intended or recommended for use in web applications.

    Sunday, March 24, 2019 3:57 PM
  • User1529847886 posted

    Hi mgebhard,

    I really appreciate your feedback on my concerns. Please see my responses below.

    200-300 files is a lot of files for a user to input in a browser. With that being said, uploading multiple files over HTTP is not a new idea and there are many file upload tutorials. Querying a directory of file names is also very straightforward using the System.IO namespace.

    The user would upload only 2 files: a CSV, whose number of rows determines the number of output documents (Word/PDF), and a template, which is populated from the CSV contents.

    Processing happens on the server that has the files. The processing can be kicked off by an HTTP request (Web API), a system file watcher, or a scheduled task. It depends on the application requirements. The "conflict on updating the table per row" concern is not clear.

    As I mentioned, there are only 2 files: one is the CSV and the other is the Word template. What I have in mind is that once the user uploads the CSV file, it is displayed in a table in the portal, and while the CSV is being processed row by row, the table shows each row's status (that's what I meant by updating the table per row). I'm not sure whether I need to create a Web API with a batch process or have the portal loop over the CSV rows and call the Web API over and over.

    Your requirements are not very clear so it is hard to understand what this application does.   If the user is waiting for a response so they can add edits or fix business errors related to the 300 files being processed then process the files in a background task.  Alert the user when the process is complete.  If the user is not waiting for the response then offload the processing to a scheduled task or code triggered by a file/folder listener.

    Sorry for that. The application requirement is to have a portal where the user can upload a CSV and a Word template, and then populate that template with the CSV contents. These are the initial functional requirements that we identified; from there we are planning to extend the portal's functionality. That is why I'm trying to make my application flexible and design it correctly.

    My idea right now is Frontend (React) -> Web API.

    The question is whether I make the Web API accept the CSV as a batch, or just create one service and have the frontend call it once per row in the CSV.
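
One way to picture the batch option, as a minimal sketch under stated assumptions: the server receives the whole CSV once, tracks a status per row, and exposes a snapshot that the frontend can poll to refresh its table. The sketch is in Python for illustration only; `BatchJob`, `generate_document`, and the status values are hypothetical names, not part of any framework.

```python
import csv
import io


class BatchJob:
    """Holds the rows of one uploaded CSV and a status for each row."""

    def __init__(self, csv_text):
        self.rows = list(csv.DictReader(io.StringIO(csv_text)))
        self.status = ["pending"] * len(self.rows)

    def process_all(self, generate):
        # Process row by row so one bad row does not stop the batch.
        for index, row in enumerate(self.rows):
            try:
                generate(row)
                self.status[index] = "processed"
            except Exception:
                self.status[index] = "failed"

    def snapshot(self):
        """What a status endpoint could return to the frontend on each poll."""
        return [{"row": i + 1, "status": s} for i, s in enumerate(self.status)]


def generate_document(row):
    # Stand-in for real document generation; rejects rows without a name.
    if not row["name"]:
        raise ValueError("name is required")
    return f"Dear {row['name']}"


job = BatchJob("name,city\nAda,London\n,Paris\n")
job.process_all(generate_document)
print(job.snapshot())
```

With this shape the frontend makes one upload call plus periodic status calls, rather than one call per CSV row, and the per-row table update comes from the snapshot.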

    MS Interop is not intended or recommended for use in web applications.

    Would you have any recommended libraries that can be used on the server side in a web application?

    Sunday, March 24, 2019 4:13 PM
  • User475983607 posted

    This sounds more like mail merge in Word.   Are you sure this needs to be a web application?  

    Sunday, March 24, 2019 4:56 PM
  • User1529847886 posted

    Hi!

    This sounds more like mail merge in Word.   Are you sure this needs to be a web application?  

    Yes, this would be distributed to all our site offices. Mail merge works similarly, but they want this to be part of their internal website.

    Sunday, March 24, 2019 6:50 PM
  • User475983607 posted

    Yes, this would be distributed to all our site offices. Mail merge works similarly, but they want this to be part of their internal website.

    I'd rather not reinvent the wheel.  Certainly sounds like an Office automation plugin installed through policy.  

    But if you want to build a web app then go ahead. There's not much to it at a high level: upload, process, response. Use Open XML for processing the Word documents.

    https://docs.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.mailmerge?view=openxml-2.8.1
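
The "process" step amounts to a mail merge: one output document per CSV row, produced by filling placeholders in a template. Below is a minimal sketch of that idea in Python, assuming a plain-text template with `$name`-style placeholders as a stand-in for the Word template. It does not use Open XML; `mail_merge` and the column names are hypothetical.

```python
import csv
import io
from string import Template


def mail_merge(template_text, csv_text):
    """Return one filled-in document per CSV row."""
    template = Template(template_text)
    reader = csv.DictReader(io.StringIO(csv_text))
    # substitute() raises KeyError if the template names a column the CSV lacks.
    return [template.substitute(row) for row in reader]


letters = mail_merge(
    "Dear $name, your office is in $city.",
    "name,city\nAda,London\nLin,Oslo\n",
)
for letter in letters:
    print(letter)
```

A real implementation would replace the text substitution with edits to the .docx content, but the per-row loop stays the same.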

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Sunday, March 24, 2019 7:12 PM