Bulk user account creation from CSV data import/ingestion RRS feed

  • General discussion

  • Hi all brilliant minds,

    I am currently working on a fairly complex problem and I would love to get some idea brainstorming going on. I have a C# .NET web application running in Windows Azure, using SQL Azure as the primary datastore.

    Everytime a new user creates an account, all they need to provide is the name, email and password. Upon account creation, we store the core membership data to the SQL database, and all the secondary operations (e.g. sending emails, establishing social relationships, creating profile assets, etc) get pushed onto an Azure Queue and gets picked-up/processed later.

    Now I have a couple of CSV files that contain hundreds of new users (names & emails) that need to be created on the system. I am thinking of automating this by breaking into two parts:

    Part 1: Write a service that ingests the CSV files, parses out the names & emails, and saves this data in storage A

    • This service should be flexible enough to take files with different formats
    • This service does not actually create the user accounts, so this is decoupled from the business logic layer of our application
    • The choice of storage does not have to be SQL, it could also be non-relational datastore (e.g. Azure Tables)
    • This service could be a third-party solution outside of our application platform - so it is open to all suggestions

    Part 2: Write a process that periodically goes through storage A and creates the user accounts from there

    • This is in the "business logic layer" of our application
    • Whenever an account is successfully created, mark that specific record in storage A as processed
    • This needs to be retry-able in case of failures in user account creations

    I'm wondering if anyone has experience with importing bulk "users" from files, and if what I am suggesting sounds like a decent solution.

    Note that Part 1 could be a third-party solution outside of our application platform, so there's no restriction in what language/platform it has to be running in. We are thinking about either using BULK INSERT, or Microsoft SQL Server Integration Services 2008 (SSIS) that ingests and loads data from CSV into SQL datastore. If anyone has worked with these and can provide some pointers that would be greatly appreciated too.. Thanks so much in advance!

    Thursday, November 29, 2012 12:14 AM

All replies

  • I'm not keen on bulk insert, I prefer to have a little more control. I would consider using a queue as storage A. This way you will have more choice about how much to spend on creating workers to feed the queue and process the queue. You'll get a retry mechanism. You'll get a default throttling mechanism, should you get a surge. You'll get a standard way of adding users, rather than have a bulk and non-bulk mechanisms. I don't know you exact business requirements so obviously it's just my feeling at the moment.

    Thursday, November 29, 2012 8:44 AM
  • Thank you pkr2000, that's very interesting and definitely sounds more elegant than a bulk insert into SQL... and it's also nice that we have retrying & throttling mechanisms from queues. I guess now I just have to decide if I want to manage this with the same worker role (as the one that processes new users) depending on the load. I'll give it more thought, thanks!
    Thursday, November 29, 2012 9:46 PM