String/StringBuilder - creating a substring without creating a temp string

  • Question

  • Hi all;

    First off, this is for a .NET 2.0 DLL that is usually called by a .NET 4.0+ app. (It is 2.0 because this is a commercial library and many customers are still on 2.0 - 3.5.)

    We have cases where we have a string or StringBuilder and need a substring from it. In various places it is any of the 4 combinations (source = string or StringBuilder, new object = string or StringBuilder). The problem is that our program uses a lot of memory, and all the temporary strings that exist for about 4 instructions are making the garbage collector work a lot.

    The best I have come up with is the following (our code isn't written this way, I'm just doing this to illustrate concisely):

    string srcString = "thanks for the help";

    StringBuilder srcStrBldr = new StringBuilder("thanks for the help");

    1. StringBuilder middle = new StringBuilder(srcStrBldr.ToString(), 7, 7, 7);
    2. StringBuilder middle = new StringBuilder(srcString, 7, 7, 7);
    3. string middle = srcStrBldr.ToString().Substring(7, 7);

    Questions:

    1. When running a .NET 4.0 app, is it still using the .NET 2.0 string class for the internals?
    2. Looking at the StringBuilder code via Reflector shows that #1 will work great if the string was created in the same thread and the ArrayLength is short enough. However, running under Visual Studio, where it is a single-threaded app, StringBuilder.m_currentThread == 0, which does not match the current thread, and so a temp string is created. Any way around this? (I tried to get Visual Studio to step into this code but couldn't make it happen - argh!)
    3. #2 above I think is fine.
    4. #3 above clearly creates a temp string.

    Is there a better way to do this? If it is using the .NET 2.0 class, then being able to access StringBuilder.m_StringValue would let me write code for all of the above that does not create a temp string object. I could do that with reflection but that takes time which then puts me in the speed vs memory usage trade-off. And, what if the internal code changes (unlikely but possible).
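
    For comparison, here is a minimal sketch of the four combinations that stays on public members only, so it does not depend on m_StringValue or reflection. It assumes StringBuilder.ToString(int, int), Append(string, int, int), Append(char[], int, int) and CopyTo are available on the target framework (as far as I know they all exist in 2.0). The class name, the method names and the scratch buffer are made up for illustration, and whether each call is truly allocation-free internally depends on the runtime that ends up loading the DLL, so it is worth confirming with a profiler.

    ```csharp
    using System;
    using System.Text;

    public static class SubstringSketch
    {
        // string -> string: only the result string is allocated.
        public static string FromString(string src, int start, int length)
        {
            return src.Substring(start, length);
        }

        // StringBuilder -> string: asks the builder for the range directly
        // instead of calling ToString() on the whole builder first.
        public static string FromBuilder(StringBuilder src, int start, int length)
        {
            return src.ToString(start, length);
        }

        // string -> StringBuilder: appends a range of the source string
        // to an existing (reusable) builder.
        public static void AppendFromString(StringBuilder dest, string src, int start, int length)
        {
            dest.Append(src, start, length);
        }

        // StringBuilder -> StringBuilder: copies the range through a
        // caller-owned char[] that can be reused across calls.
        public static void AppendFromBuilder(StringBuilder dest, StringBuilder src,
                                             int start, int length, char[] scratch)
        {
            src.CopyTo(start, scratch, 0, length);
            dest.Append(scratch, 0, length);
        }

        public static void Main()
        {
            string srcString = "thanks for the help";
            StringBuilder srcStrBldr = new StringBuilder(srcString);
            char[] scratch = new char[32];          // hypothetical reusable buffer
            StringBuilder dest = new StringBuilder(32);

            Console.WriteLine(FromBuilder(srcStrBldr, 7, 7));   // prints "for the"

            AppendFromBuilder(dest, srcStrBldr, 7, 7, scratch);
            Console.WriteLine(dest.ToString());                 // prints "for the"
        }
    }
    ```

    The scratch array has to be at least as long as the longest substring; with 5 - 20 character strings, one small buffer per worker thread would be enough.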

    So... any suggestions?

    thanks - dave


    What we did for the last 6 months - Made the world's coolest reporting & docgen system even more amazing

    • Changed type Fred BaoModerator Wednesday, September 24, 2014 5:20 AM The discussion type is more suitable for this case
    • Changed type DavidThi808 Wednesday, September 24, 2014 1:03 PM it's a question
    Sunday, September 14, 2014 6:16 PM

All replies

  • If the DLL is reading an entire large text file into memory, there isn't much you can do.  The best way of reading text is to use the ReadLine() method.  RegEx is also problematic with large text files because it also uses a lot of temporary memory.

    To really troubleshoot problems like this, you can open Task Manager and then step through the code to find out what is actually using all the memory instead of guessing.  Sometimes problems like this can be solved by disposing of objects when they are no longer being used.


    jdweng

    Sunday, September 14, 2014 6:36 PM
  • Hi;

    These are all very small strings (say 5 - 20 characters). And I used dotMemory to measure this - yes those specific cases are creating 13,000,000 temporary strings when my app runs (we are creating million row spreadsheets).

    thanks - dave



    Sunday, September 14, 2014 7:36 PM
  • Then why do you need a dll?  I would open a text file and simply write data one row at a time.  Use JOIN to combine data using a comma as the separator so you have CSV data.  Uses very little memory.

    jdweng

    Sunday, September 14, 2014 10:13 PM
  • Our customers want an XLSX file, with formatting, formulas, etc. You can't do that with a CSV.


    Sunday, September 14, 2014 10:47 PM
  • You can always read the CSV with Excel and then save it as XLSX after you finish.


    jdweng

    Monday, September 15, 2014 7:22 AM
  • Joel - if we did that we would not carry through the formatting, formulas, etc because CSV does not have those concepts.


    Monday, September 15, 2014 10:50 AM
  • Hi;

    Does anyone have a suggestion on how to reduce the number of objects created for this question?

    thanks - dave



    Wednesday, September 17, 2014 11:04 PM
  • I'm not a big fan of the Interop library, which is the only method that can be used from VS to modify an Excel file and create Excel formatting and formulas.  It is very inefficient because it uses a scripting language as an interface.  I suspect the DLL is using this library.

    What may be a better solution is to use CSV like I suggested, but write a VS add-in to import the CSV into Excel and create the formatting and formulas in the add-in.  It would avoid using the 3rd party DLL, which is using a lot of memory.


    jdweng

    Thursday, September 18, 2014 12:55 AM
  • Just a suggestion about question number 2. I think that unless you call a thread-creation or parallelization library inside your function, .NET will try to run the whole function call on the same thread to avoid the cost of a context switch. (C# is not F#; there is no attribute yet to declare that your function produces no side effects. Spreading a single function across different threads would produce nondeterministic bugs.)

    If you think threads are being switched while your StringBuilder-related function runs, try running it as a whole instead of breaking it down into separate sub-functions.

    By the way, I find it surprising that you say a temp string is created... I thought the whole StringBuilder object would be passed to the new process. Remember, in .NET objects are always passed by reference. This means that if it is passed inside the same process, only the reference should be passed to the method, and if it is passed outside the process, it would have been deep copied and marshaled out. I also thought StringBuilder keeps the "string" as a char linked-list array (remember, the whole point of having the StringBuilder class in the first place is that strings in .NET are immutable, so there is no point in storing an immutable string internally in StringBuilder) and only builds a new string when you request its content in string form (such as by calling .ToString()). So I have no idea where this "new temp string" comes from.



    Thursday, September 18, 2014 1:24 AM
    Answerer
  • Joel - I'm asking about StringBuilder, not creating an Excel file. As to are there other ways to do it - yes lots. Every reporting system out there can create XLSX files. Our system reads an XLSX file, merges in data, and generates an XLSX report, carrying formulas, formatting, etc. through from the source template to the generated report.

    And it does this without Excel so no Interop, etc.

    But all of this is irrelevant to the question I asked here. Please stop changing the topic in this question.



    Thursday, September 18, 2014 1:29 PM
  • Hi cheong00;

    I thought the same thing. But when I disassembled the code, it is creating the intermediate objects. There is no thread switch, but the string thread id is 0 and so it acts as though there was a thread switch.

    thanks - dave



    Thursday, September 18, 2014 1:31 PM
  • I think we should clarify something first. Judging by the fact that you can see the string thread ID, you are running the observation under a debugger, right? And as I told you, it should produce a string only when one is requested. Have you clicked the refresh icon that the debugger shows for values whose evaluation is known to cause side effects, so that it happily generated the temp string for you?
    Friday, September 19, 2014 1:57 AM
    Answerer
  • Hi all;

    Asking again, does anyone know how to resolve my question? Lots of comments below but none are about a resolution to the question I asked.

    thanks - dave



    Wednesday, September 24, 2014 1:04 PM
  • If you are locked into using the 2.0 assembly that seems to be causing the problems, I don't see how you can make the problems go away without first making the 2.0 assembly go away.  Just an observation.

    Rudy   =8^D


    Mark the best replies as answers. "Fooling computers since 1971."

    http://thesharpercoder.com/

    Wednesday, September 24, 2014 8:30 PM