Could Syntax.ParseCompilationUnit (and family) be changed to accept a TextReader instead of a string?
-
16 ноября 2011 г. 2:17
Seems a shame to have to read an entire file into memory as a string instead of pulling it in lazily using a TextReader. Ran a little experiment and it seems like the string is not referenced after the parse:
WeakReference wr = new WeakReference(text); var parse = Syntax.ParseCompilationUnit(text); text = null; GC.Collect(); // at this point the wr is null
When parsing an entire source tree of C# files might all that loading be an inefficient use of memory? Is that not a concern? Or is there some API that does accept a TextReader that I'm missing? Maybe some way with SyntaxTree.ParseCompilationUnit?
Thanks,
Chris- Изменено kingces95 16 ноября 2011 г. 2:20
Все ответы
-
16 ноября 2011 г. 4:25Владелец
SyntaxTree.ParseCompilationUnit() has an overload that accepts an "IText". You could supply your own implementation of IText that operates over a TextReader / a stream to get lazy text processing. I believe that internally SyntaxTree.ParseCompilationUnit()
and the C# and VB parsers are efficient about not reading the entire IText at once etc.When the Roslyn services call SyntaxTree.ParseCompilationUnit() inside Visual Studio, I believe they pass it an IText implementation that operates on top of Visual Studio ITextSnapShot that already process text lazily.
Shyam Namboodiripad | Software Development Engineer in Test | Roslyn Compilers Team -
16 ноября 2011 г. 19:15
Cool. Well, it's a bit of a hassle to implement an IText don't you think? Please consider adding an overload that accepts a TextStream.
Thanks,
Chris -
16 ноября 2011 г. 19:24Владелец
Yes adding an overload that directly accepts a stream would certainly make this easier. I'll log this as a suggestion internally. Thanks for the feedback!
Shyam Namboodiripad | Software Development Engineer in Test | Roslyn Compilers Team -
26 января 2012 г. 4:24I'd go so far as to say that it would be nice to have an overload of ParseCompilationUnit that only took a path and would open the file directly.
-
26 января 2012 г. 5:46ВладелецThanks for the feedback epotter - I have updated the bug that I had previously logged for this issue with your suggestion.
Shyam Namboodiripad | Software Development Engineer in Test | Roslyn Compilers Team -
7 марта 2012 г. 6:30Владелец
Chris,
When we initially designed these APIs our profiling revealed no benefit to parsing text from Streams instead of reading the entirety of the file into memory for source files from disk. We do have an internal Stream based implementation of IText which processes streams but in order to correctly detect UTF encodings must read the entire stream anyway. Based on your feedback we've decide to publish this implementation in the next CTP but to not add overloads which take Stream directly since as I said there aren't compelling perf wins to going that route. If you are experiencing performance bottlenecks in your own applications that can be traced to this please let us know.
Regards,
-ADG
Anthony D. Green | Program Manager | Visual Basic & C# Languages Team
-
20 марта 2012 г. 3:32
Anthony,
The problem I'm concerned with is working set not speed. If I take a dependency on an ill performing library than I can cache to fix speed but there is little I can do to fix working set. This is why working set is a more serious concern.
Roslyn loads and keeps too many strings in memory. Plain and simple. Why keep strings in memory that the user never requests? When Roslyn is integrated into VS (beyond a demo) string working set will be the performance issue.
The working set for a Roslyn AST is majority strings. As such the solution is to have each node of the AST point to the byte in the file that Unicode decoding can begin. This offset is discovered and cached when a file is first loaded (streamed) and the bytes are decoded into characters for tokenization. The result is that after a file has been tokenized and converted into an AST there are no strings loaded in memory. If the user wants a string (e.g. FullSpan) the AST node knows the offset in the file to begin decoding.
Given that streaming solves the working set problem it is natural to expose streams on the public surface area so I'm glad to know that will be in the next version. However BCL convention is that System.IO.Stream (or maybe TextStream) is the correct abstraction -- not IText. TextStream can already operate over an arbitrary underlying data source so introducing a new interface is redundant (not to mention that interfaces do not version).
That's my 2c. :)
Thanks,
Chris- Изменено kingces95 20 марта 2012 г. 3:40
-
22 мая 2012 г. 4:46
If I may throw in my two cents on this...
I realize this may be obvious to some, but given these shortcomings I have been loading files this way:
var code = new StreamReader(pathToTextfile).ReadToEnd(); SyntaxTree tree = SyntaxTree.ParseCompilationUnit(code);
I agree with the need to add overloads as people have mentioned here. This is the easiest way around the issue I have found.
Thanks,
Bob

