locked
Compressing and storing data RRS feed

  • Question

  • User-1458727574 posted

    Hi all

    This isn't specifically web API related, more a general question looking for ideas.

    I have a web API which gets sent XML data in the HTTP request. Before it processes that XML it copies it to a logging database via Serilog. That's all working nicely, but sometimes the XML documents are huge. For example, one of the controllers in the API takes customers from the XML and creates them in the target system behind the web API.

    When you're sending 15,000 customers, the XML is pretty large. If I were storing the XML in the file system, I could easily zip the XML into a zip file rather than store the raw XML. Any recommendations on compressing the data and storing it in the database? I need to be able to extract the data sometimes as well, and I have no tool to do that so I am looking for ideas.

    Thanks

    Tuesday, January 22, 2019 4:40 PM

All replies

  • User36583972 posted


    Hi friend,

    Hi all

    This isn't specifically web API related, more a general question looking for ideas.

    I have a web API which gets sent XML data in the HTTP request. Before it processes that XML it copies it to a logging database via Serilog. That's all working nicely, but sometimes the XML documents are huge. For example, one of the controllers in the API takes customers from the XML and creates them in the target system behind the web API.

    When you're sending 15,000 customers, the XML is pretty large. If I were storing the XML in the file system, I could easily zip the XML into a zip file rather than store the raw XML. Any recommendations on compressing the data and storing it in the database? I need to be able to extract the data sometimes as well, and I have no tool to do that so I am looking for ideas.

    Thanks

    You can refer the following articles.

    1: ASP.NET Web API GZip compression ActionFilter with 8 lines of code
    https://www.radenkozec.com/asp-net-web-api-gzip-compression-actionfilter/

    2: ASP.NET Web API Compression : the latest code for this article can be found on Github.

    3: 8 ways to improve ASP.NET Web API performance
    https://www.radenkozec.com/8-ways-improve-asp-net-web-api-performance/

    Note: This response contains a reference to a third party World Wide Web site. Microsoft is providing this information as a convenience to you. Microsoft does not control these sites and has not tested any software or information found on these sites; therefore, Microsoft cannot make any representations regarding the quality, safety, or suitability of any software or information found there. There are inherent dangers in the use of any software found on the Internet, and Microsoft cautions you to make sure that you completely understand the risk before retrieving any software from the Internet.


    Best Regards,

    Yong Lu

    Wednesday, January 23, 2019 2:43 AM
  • User-1458727574 posted

    This seems to be referring to compressing the requests and responses. I am not bothered about those. If the user sends me an XML doc that is 5000 lines long in the request header, that I can live with. I need to compress it to store it in a varchar(max) field in the logging database that Serilog creates. This is all about compressing the XML for storage. The request/response, I don't care about at the moment.

    edit 

    The reason I need to compress the XML for storage is that when I am checking the log files, selecting data from the Serilog table takes about 6 minutes to pull back about 200 lines. It's insane. I know there is an indexing issue here but coding SQL is something I can do. Indexing and working out what indexes I need to apply are a bit of a dark art. Using the tuning advisor is all well and good which is what people recommend but knowing what it tells you or how to effectively use it are entirely another.

    edit 2 

    This is the sort of thing I'm looking to do:

    https://stackoverflow.com/questions/1089150/net-compression-of-xml-to-store-in-sql-server-database

    Wednesday, January 23, 2019 9:18 AM
  • User1120430333 posted

    https://www.mssqltips.com/sqlservertip/5709/using-compress-and-decompress-in-sql-server-to-save-disk-space/

    Other than that,  use GZip or some other compression utility and compress the data and save to a MS SQL server table using a Max Binary as a table column definition I suspect, compress in  memory and save. 

    Wednesday, January 23, 2019 12:12 PM
  • User753101303 posted

    Hi,

    Seems bit weird to log the web service payload especially when huge. Not sure if Serilog could handle compression behind the scene. Else I would consider:
    - to log a guid and save the payload somewhere else
    - to see if it couldn't be better logged where the data is used. Not sure but I assume it is later used to change customer accounts. I would perhaps see if I couldn't log there which actual changes are done

    Wednesday, January 23, 2019 12:40 PM
  • User-1458727574 posted

    When the client sends the payload and the web API processes the payload and responds, I have no record of the payload so in the event of a problem, I cannot recreate the problem on my dev system because I won't have the original payload. That's why it needs logging somewhere. I suppose I could dump it to another table and have a guid as the primary key and when it writes, return the guid and store the guid as you say. Might be worth a shot.

    Wednesday, January 23, 2019 12:53 PM
  • User36583972 posted


    Hi friend,

    You may can try to use the SQL CLR integration.

    Compressing LOB (XML) Data in the Database
    https://aboutsqlserver.com/2015/04/07/compressing-lob-xml-data-in-the-database/

    How to decompress the compressed XML at SQL Stored Procedure side
    https://social.msdn.microsoft.com/Forums/sqlserver/en-US/70596676-530b-4b35-b39c-fe533376e8e8/how-to-decompress-the-compressed-xml-at-sql-stored-procedure-side?forum=sqlnetfx

    If you have any question about SQL Server, you can go to the MSDN SQL Server forums for getting help.

    Note: This response contains a reference to a third party World Wide Web site. Microsoft is providing this information as a convenience to you. Microsoft does not control these sites and has not tested any software or information found on these sites; therefore, Microsoft cannot make any representations regarding the quality, safety, or suitability of any software or information found there. There are inherent dangers in the use of any software found on the Internet, and Microsoft cautions you to make sure that you completely understand the risk before retrieving any software from the Internet.

    Best Regards,

    Yong Lu

    Thursday, January 24, 2019 1:36 AM
  • User-474980206 posted

    if your database is sqlserver, be sure to use the xml datatype. in this case, the xml is parsed, and stored as a tree which is much smaller than the original xml. you can even set the table to stored compressed for even more savings.

    Friday, January 25, 2019 8:41 PM
  • User-1458727574 posted

    How do I tell Serilog to create the table and stipulate the data type when it creates the standard table? 

    Monday, January 28, 2019 11:17 AM
  • User36583972 posted


    Hi AnyUser,

    How do I tell Serilog to create the table and stipulate the data type when it creates the standard table? 

    Please refer the Serilog.Sinks.MSSqlServer or other Serilog sink.

    Serilog.Sinks.MSSqlServer : A Serilog sink that writes events to Microsoft SQL Server. This sink will write the log event data to a table and can optionally also store the properties inside an XML or JSON column so they can be queried. Important properties can also be written to their own separate columns.


    Best Regards,

    Yong Lu

    Tuesday, January 29, 2019 3:00 AM
  • User-1458727574 posted

    I have read that page, plenty of times. I configure Serilog via the appsettings.json and it shows how to create the standard table, and it also shows how to remove standard columns, and add custom ones, but nothing shows you how, when configuring Serilog this way, how to populate the new custom columns, and nowhere does it show how to change the data type of the standard columns. If I remove some of the standard columns and then re-add them with new data types, how does Serilog react when the datatype is different from what it is expecting?

    Wednesday, January 30, 2019 12:14 PM
  • User36583972 posted

    Hi friend,

    AnyUserNameThatLetsMeIn

    I have read that page, plenty of times. I configure Serilog via the appsettings.json and it shows how to create the standard table, and it also shows how to remove standard columns, and add custom ones, but nothing shows you how, when configuring Serilog this way, how to populate the new custom columns, and nowhere does it show how to change the data type of the standard columns. If I remove some of the standard columns and then re-add them with new data types, how does Serilog react when the datatype is different from what it is expecting?

    I suggest you can go to the Serilog Support Center(Resource- Discussion forum) for getting suitable help.

    If you have a new question you can start a new thread. Please don't ask several questions in the same thread.


    Best Regards,

    Yong Lu

    Thursday, January 31, 2019 3:02 AM
  • User-1458727574 posted

    Been there. They direct you to the gitter chat. I've been there as well and didn't get anything more than being pointed back the same web pages that don't help me, hence I am here asking a wider audience for help.

    Thursday, January 31, 2019 9:32 AM