Memory leaks with XSLT Transform?

Question
Hello,
I am encountering memory issues with the XslTransform.Transform method. Whenever my program executes that method, I get a huge memory spike of over 700MB, and that line alone takes over 2 minutes to process:
XslTransform objXSLT = new XslTransform();
objXSLT.Load(XSLTPathName);
StreamWriter sw = new StreamWriter(outputFileName);
objXSLT.Transform(new XPathDocument(XMLFileName), null, sw, null); // Memory spike here
sw.Close();
Another memory spike occurs when I try to load the XmlDocument:
XmlTextReader reader = new XmlTextReader(pathname);
XmlDocument doc = new XmlDocument();
doc.Load(reader); // Another memory spike here
reader.Close();
I'm using .NET 1.1. My XML file is about 80MB, and the file size is not within my control (i.e., I cannot reduce it).
Are these known issues for XslTransform in .NET 1.1 (or any newer version)? Are there any workarounds, besides reducing the input XML file size?
Thanks in advance.
Wednesday, April 14, 2010 7:10 AM
All replies
Whether it is XmlDocument or XPathDocument, both build an in-memory tree model of the complete document, so the tree alone can consume memory several times the size of the file, although 700MB for an 80MB file does seem rather excessive. On the other hand, you say you get the spike on the Transform call, which involves loading the input, transforming it into the output tree, and serializing that. The output tree also consumes memory.
I don't think there is much you can do in your C# code to reduce memory consumption, unless of course you change the whole approach and avoid feeding the complete XML document to XSLT. Since .NET 2.0 there are methods like ReadSubtree which allow you to use an XmlReader to read through a huge XML document in a forward-only way with a low memory footprint, and then pass only sections on to other APIs like XPathDocument, XmlDocument, or XSLT. As you seem to be using .NET 1.1, ReadSubtree is not even an option; maybe ReadNode, documented as existing since 1.0, could help. In any case, such approaches only help if your transformation mainly works on independent subtrees of the complete document, and even then you need to adapt your stylesheet and your result generation.
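[Editor's note] As a concrete illustration of the .NET 2.0 approach described above, here is a minimal sketch (not the poster's actual code): it streams the file with XmlReader, and each time it reaches a repeating element it wraps just that subtree in an XPathDocument for the transform. The element name "record", the file paths, and the helper method are assumptions about the data, and it only works if the stylesheet can process one such subtree at a time.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

static class ChunkedTransform
{
    // Applies the stylesheet to each <recordName> subtree individually,
    // so only one subtree is ever cached in memory at a time.
    public static void TransformRecords(string xmlPath, string xsltPath,
                                        TextWriter output, string recordName)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load(xsltPath);

        using (XmlReader reader = XmlReader.Create(xmlPath))
        {
            // Forward-only scan; the large document is never fully loaded.
            while (reader.ReadToFollowing(recordName))
            {
                using (XmlReader sub = reader.ReadSubtree())
                {
                    // XPathDocument caches just this subtree.
                    xslt.Transform(new XPathDocument(sub), null, output);
                }
            }
        }
    }
}
```

Closing the subtree reader leaves the outer reader positioned at the end of that element, so ReadToFollowing can move on to the next one.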
MVP Data Platform Development
- Proposed as answer by Pawel Kadluczka (Moderator), Thursday, April 15, 2010 1:05 AM
Wednesday, April 14, 2010 10:57 AM
Thanks Martin.
Just to clarify: Are you suggesting using ReadNode to replace the loading of the XmlDocument using XmlTextReader?
Thursday, April 15, 2010 2:47 AM
Hi,
The suggestion is to not process the entire input in one go (XSLT needs to cache its entire input in order to work); instead, split the input into several chunks and run your XSLT on each of them (obviously this requires the stylesheet to be able to handle that). The splitting can be done by using XmlTextReader (which does not cache anything) to find the node you're interested in, and then using ReadNode to load just that node (and its subtree) into memory so you can run XSLT on it. Once you're done with that node, you continue with the XmlTextReader to the next node of interest and repeat.
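[Editor's note] A rough sketch of that loop using only .NET 1.1-era APIs (XmlTextReader, XmlDocument.ReadNode, and the old XslTransform class, which 2.0 later superseded with XslCompiledTransform). The element name "record" and the wrapper method are hypothetical:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

static class ChunkedTransform11
{
    // Streams the file with XmlTextReader and pulls one matching
    // element subtree at a time into memory via XmlDocument.ReadNode.
    public static void TransformRecords(string xmlPath, string xsltPath,
                                        TextWriter output, string recordName)
    {
        XslTransform xslt = new XslTransform();
        xslt.Load(xsltPath);

        XmlTextReader reader = new XmlTextReader(xmlPath);
        XmlDocument holder = new XmlDocument();
        try
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    reader.Name == recordName)
                {
                    // ReadNode consumes the element and its whole subtree
                    // from the reader, so do NOT call Read() afterwards.
                    XmlNode node = holder.ReadNode(reader);
                    xslt.Transform(new XPathDocument(new XmlNodeReader(node)),
                                   null, output, null);
                }
                else
                {
                    reader.Read();
                }
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

The else branch matters: after ReadNode the reader already stands on the node following the subtree, and an unconditional Read() would skip adjacent sibling elements.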
Thanks,
Vitek Karas [MSFT] (Moderator)
Thursday, April 15, 2010 9:15 AM
Hi Vitek,
Thank you for your suggestion. Splitting the input into chunks is not really feasible for us because the input XML comes from a 3rd party, and they may change the structure/sequence of the fields any time.
What I'd like to find out from you is: Is this a .NET 1.1 limitation, and is it resolved in .NET 2.0/3.5? If we run these methods on .NET 3.5, are we likely to encounter the same memory leak?
Or do we still need to completely change our coding approach and use ReadSubtree (as suggested above by Martin) or split the input into chunks (as you suggested)?
Thanks very much.
Friday, April 23, 2010 3:54 AM
Hi,
Note that I didn't suggest that you split the input XML file! I suggested you split the XML which is the input to the XSLT. You can do this programmatically using, for example, ReadSubtree.
The behavior should not change dramatically between .NET 1.1 and 2.0/3.5. I would expect 2.0/3.5 to be faster (as it provides better APIs for these things), but overall this particular problem comes from caching too much data in memory, and that depends on your usage pattern, not on the particular API.
Thanks,
Vitek Karas [MSFT] (Moderator)
Friday, April 23, 2010 11:09 PM
Hi Vitek,
Yes I noted you weren't suggesting I split the input XML file. Thanks.
You said you expect 2.0/3.5 to be faster as it provides better APIs to do this. Does that mean that without any re-coding (of this function), you expect it to run faster in .NET 3.5? Or do I have to re-code with different methods (e.g., splitting the XSLT input using XmlTextReader) in order to see any performance improvement?
Thanks.
Monday, May 3, 2010 10:12 AM
Hi,
I would expect some small speedup between 1.1 and 2.0/3.5 without the need to touch your code. But the real speedup will only happen if you re-code your usage of the APIs to use the approach suggested above.
Simply put, the way you use the APIs means we have to do a lot of work (namely, cache the entire XML in memory). We did improve some perf characteristics of these APIs in 2.0/3.5, but those are just small improvements. To get radically different results, you need to use different APIs (in a different way) so as not to ask us to do the expensive work. The most beneficial change would be to read the input XML with XmlReader (use XmlReader.Create to create one in 2.0), parse the top-level parts manually (the root element and so on), and when you find a part you need to process with XSLT, use the ReadSubtree method to create a reader over just that portion and pass that as the input to the XSLT (which still has to load that portion in memory, which is the expensive part).
Thanks,
Vitek Karas [MSFT]
- Marked as answer by Windhoek, Wednesday, May 5, 2010 3:34 AM
Monday, May 3, 2010 6:32 PM (Moderator)
Vitek, thanks very much! It's really helpful.
Wednesday, May 5, 2010 3:35 AM