How to manage a huge number of class instances in memory?

  • Question

  • Hi there,

    In an application, I load an XML file into a list of class instances so that I can use them in my application.

    But some of the XML files used in this application have become quite large. When I load them into classes in memory, they use several GB of RAM and the application becomes too heavy to use (it starts to lag, freeze, and so on, due to swapping between RAM and disk, I guess).

    I'm looking for good practices to solve this kind of memory issue. I need to be able to go through all the class instances and do some work on them, but without having them all in memory at the same time.

    Any ideas will be more than welcome! :)

    Thanks!

    Fabrice

    Thursday, April 5, 2018 12:58 PM

Answers

  • How are you reading in the XML and how are you deserializing the XML into classes?

    Ideally you want to read the XML in a forward-only, read-only manner, which can be done using XmlTextReader. See here for discussion on efficiently reading large XML.

    Then create class instances only as you need them, processing each part of the XML in turn.

    If this doesn't help, perhaps you can supply some more information on the code you are currently using to do this.
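
    As a rough illustration of forward-only reading, here is a minimal C# sketch. It assumes XmlReader.Create is used in place of XmlTextReader, and the class and method names (ForwardOnlyDemo, CountElements) are invented for the example. It only counts elements by name, to show that the loop never holds more than the current node.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ForwardOnlyDemo
{
    // Counts elements by name while reading forward-only. Nothing is cached,
    // so only the current node and the small dictionary are held in memory.
    public static Dictionary<string, int> CountElements(TextReader input)
    {
        var counts = new Dictionary<string, int>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read()) // advance to the next node in document order
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    counts.TryGetValue(reader.Name, out int seen);
                    counts[reader.Name] = seen + 1;
                }
            }
        }
        return counts;
    }
}
```

    For a file on disk, something like `using (var file = File.OpenText("survey.xml")) { var counts = ForwardOnlyDemo.CountElements(file); }` should work, where the file name is only a placeholder.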

    Thursday, April 5, 2018 2:07 PM
  • The XmlDocument.Load method, as I understand it, will load and parse the entire document into memory.

    The XmlTextReader class (I've just realised that MSDN recommends the newer XmlReader class) opens the document in a forward-only, non-cached way and provides methods to step through it node by node. So it only reads the document in as you go along.

    Whether this is better for you really depends on your process. If you need completely random access to the XML, i.e. you need to jump about and read information from potentially anywhere in the XML in any order, then this may not help you.

    But as long as your process is such that you can read a node, create your class, do whatever processing you need and then move to the next node, the XmlReader class should be much more efficient.
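
    The "read a node, create your class, process it, move on" loop could look roughly like the sketch below. The Item class, the `item` element name and the `id` attribute are all hypothetical, since the real classes and XML layout are not shown in this thread. It assumes each record is small enough to materialise on its own with XNode.ReadFrom.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical record type standing in for the poster's classes.
public sealed class Item
{
    public string Id { get; set; }
    public string Value { get; set; }
}

public static class ItemStream
{
    // Lazily yields one Item per <item> element. Only the current element is
    // materialised, so items already processed can be garbage collected.
    public static IEnumerable<Item> ReadItems(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node after it, so no extra Read() is needed here.
                    var element = (XElement)XNode.ReadFrom(reader);
                    yield return new Item
                    {
                        Id = (string)element.Attribute("id"),
                        Value = element.Value
                    };
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

    A caller would then write `foreach (var item in ItemStream.ReadItems(file)) { ... }` and do its work inside the loop. Collecting the results into a list would bring back the original memory problem.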

    Thursday, April 5, 2018 3:02 PM

All replies

  • Hi RJP1973,

    Great answer!

    So with an XmlTextReader, I can read an XML file without having to load it completely? And create a few class instances at a time (then do the work and dispose of them)?

    But I have to open the XML file each time I want to work with the content. It takes a long time to open the XML file with my current code, which uses XmlDocument.Load. Would it be quicker using XmlTextReader?

    Thanks for this lead!

    Thursday, April 5, 2018 2:27 PM
  • Will try! It seems suitable for my process.

    Thanks again!

    Thursday, April 5, 2018 3:41 PM
  • My understanding is that the problem occurs after the data has been loaded. Is that right? If so, is there any data that is not needed by the processing? If some of the data is needed after the processing but not during it, then it is a matter of how to get that data afterwards. That could be done by saving the data during the load in a format that makes it easy to retrieve later, or by reading the data a second time.


    Sam Hobbs
    SimpleSamples.Info

    Thursday, April 5, 2018 3:43 PM
  • Hi Sam,

    Thanks for your answer!

    To add some detail: the XML being loaded is a survey (the questionnaire plus all answers from all respondents). So I can load the questionnaire into RAM, but not the respondents (of which there could be a lot), using the XmlReader.

    Then, when I need to run calculations on the respondents, I load them one at a time with the XmlReader. I just hope this will not be too slow.
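
    A sketch of that plan in C#, under several assumptions: the element names `questionnaire`, `respondent` and `answer`, the `question` attribute, and numeric answer values are all made up, because the actual survey format is not shown here. The questionnaire is loaded once as a small XElement, while the respondents are streamed and only a running sum and count are kept.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class SurveyStats
{
    // Loads only the (assumed small) questionnaire element into memory.
    public static XElement LoadQuestionnaire(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            return reader.ReadToFollowing("questionnaire")
                ? (XElement)XNode.ReadFrom(reader)
                : null;
        }
    }

    // Streams respondents one at a time, keeping only a running sum and count.
    public static double AverageAnswer(TextReader input, string questionId)
    {
        double sum = 0;
        long count = 0;
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "respondent")
                {
                    // One respondent in memory at a time; ReadFrom advances
                    // the reader past the element it consumed.
                    var respondent = (XElement)XNode.ReadFrom(reader);
                    foreach (XElement answer in respondent.Elements("answer"))
                    {
                        if ((string)answer.Attribute("question") == questionId)
                        {
                            sum += double.Parse(answer.Value, CultureInfo.InvariantCulture);
                            count++;
                        }
                    }
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return count == 0 ? double.NaN : sum / count;
    }
}
```

    Each calculation written this way is one sequential pass over the file, so if several statistics are needed it is probably worth computing them together in a single pass rather than re-reading the file for each one.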

    Friday, April 6, 2018 8:30 AM