none
Updating database with file feeds from Internet and then parsing the XML, CSV or Excel data for insert/update/delete in Yukon

    Question

  • Hi,

    I'm developing an ASP.NET web application with SQL Server 2005 standard edition on the backend. I am looking for a solution to the following problem:

    Need to have SQL Server 2005 do the following at 3am each morning:

    1) Download updated pricelists from URLs stored in the database
    2) The pricelists vary in format - some in XML which is ideal, others in CSV which is not bad, a few are in simple excel .xls format which is not so good and some others are formatted with eye-candy for human reading which is terrible. For the moment I am only looking to read in the first three as XML and enter the data into my database. As you can imagine, the formats of each file may differ, so I may have to apply an XSLTransform to get them in a format which can be easily stored into my database although ideally I would like to stream the data in as chunks over the Internet and update the database as I go so that I don't need to store the data or convert it.

    I have a solution to this problem which lies in the application tier (ASP.NET worker process) that works as follows:

    1) Reads in the pricelist asynchronously over the Internet.
    2) Writes the pricelist to disk in its native form
    3) If the file is an XML file applies an xslt transform which converts the file into an appropriate xml format and validates against an xml schema (.xsd) file
    4) If the file is a CSV file, I use XMLCSVReader to read the CSV file as if it were an XML file and once again I transform against an XSLT transform file to get the xml in an appropriate format and then I validate against an xml schema (.xsd) file
    5) If the file is an Excel file, I fill a DataTable with the data read from the file and write a new XML file from the dataset using the DataSet.WriteXML function from the .NET framework. I then run a XSLT transform against this file to get the file in an appropriate format and then validate against an xml schema (.xsd) file
    6) For each file I use ADO.NET to pass the valid xml to a stored procedure which prepares the xml document using sp_xml_preparedocument, creates a temporary table and inserts this data into it. I then compare this data to the data already in the database and insert, update or delete as necessary. If the data is updated I set a flag on the row for reasons that are outside the scope of this question.

    As you can see this is a very convoluted and lengthy solution to this problem. I worked out this solution using ASP.NET 1.1 and SQL2K. It has the following limitations:

    1) Unnecessary usage of resources - I read that SQL Server 2005 has the .NET CLR integrated allowing me to deploy C# code within in the SQL Server process. This means that I should be able to accomplish all of the above exclusively on the database tier. This also means that I do not have to transfer huge amounts of data to sql server via ado.net
    2) ASP.NET did not have the facility for a script to run at specific time. This is not a huge problem - a variety of solutions can be worked without too much effort.

    I'm sure you can think of more reasons why this solution is less than ideal.

    I'd like to find out the following:

    1) what should I read up on to find out if this is a workable solution.
    2) some high level design tips on how you would go about solving this problem

    Any other advice and tips will be greatly appreciated!

    Kind Regards,

    Imran
    Friday, June 09, 2006 9:21 AM