Optimizing string.Length operations in C# and system source code

  • Question

  • I write general-purpose functions that check the length of their string variables. These functions call other functions that check the length of the same variables again.

    In the end, the same checks are repeated over and over to obtain a single result. I don't want to save the values in global variables: that is bad practice, and I would need one for everything.

    What I mean is that when functions call one another, string.Length may, for example, be evaluated many times on the same variable. We have not worked out how to control these situations, which delay the final result.

    // SAMPLE 1.
    try
    {
      string Text = "123";
      for (int index = 0; index < 999; index++)
      {
        // Text[3] throws IndexOutOfRangeException. HResult = -2146233080, Message = "Index was outside the bounds of the array."
        char Mychar = Text[index];
      }
    }
    catch (System.Exception ErrorExcp) { }
    
    // SAMPLE 2.
    try
    {
      System.Text.StringBuilder Text_Builder = new System.Text.StringBuilder("123");
      Text_Builder[0] = 'A';
      // Text_Builder[3] throws ArgumentOutOfRangeException. HResult = -2146233086, Message = "Index was out of range. Must be non-negative and less than the size of the collection.\r\nParameter name: index"
      Text_Builder[3] = 'A';
    }
    catch (System.Exception ErrorExcp) { }

    Questions:
    1.- How does the system know the size of the string?

    2.- Where can we find the source code?

    3.- I would like to see the source code of IndexOf. I have looked at "https://referencesource.microsoft.com/", but most of the code seems to be missing because it calls into a DLL.
    Does anyone know where to find it?

    Question update

    1.- The source code of IndexOf is difficult to follow, and in the end you reach a point where you have to dig even further. So I am back where I started.

    2.- Strings have a field that stores their length, so the length does not have to be computed by walking the string each time. Suppose that my standard string comparison functions call string.ToLower() every time, or do some other repetitive work. If variables had fields like Length that the user could also extend, then the first time ToLower() is done we could mark the string so that the following functions need not repeat the process.

    It seems unimportant, but think about how many times the same tasks are repeated on the same variables.

    ----

    I think the string type should carry properties that are maintained automatically. They would be small fields stored with the string, similar to the one used for Length.

    Some of the properties could be, for example:
    1.- Length.
    2.- Conversion: None, Uppercase, Lowercase
    3.- Content (Flags): Not indicated, Accented, Not Accented, Control Codes, Extended Codes, Letters, Numbers ...



    • Edited by zequion1 Wednesday, November 13, 2019 11:25 AM
    Wednesday, November 13, 2019 4:39 AM

All replies

  • Hi zequion1, 

    Thank you for posting here.

    Regarding your question: you get errors when you use ‘A[3] = "4"’, so I ran a test on my side.

    To use ‘A[3] = "4"’ without any errors or exceptions, you need to use ‘string[]’ and enlarge the array.

    Here’s the code.

    string A = "123";
    //A[3] = "4";  // Error: Property or indexer 'string.this[int]' cannot be assigned to -- it is read only
    string[] a = new string[] { "1", "2", "3" };
    //a[3] = "4";  // Exception: Index was outside the bounds of the array.
    Array.Resize(ref a, a.Length + 1);
    a[3] = "4";
    

    Besides, the .NET GC knows the size of an object when it is allocated. For most objects the size follows from the fields the type declares; for a string it also depends on the number of characters, which is fixed at allocation and never changes. In memory, a string consists of the object header, the method table pointer, a length field, and then the characters themselves.

    Large parts of the String class are implemented in unmanaged code, that is, in C++ or even assembly.

    For more details, you can refer to the ‘Optimised String Length’ part of the following article.

    Strings and the CLR - a Special Relationship

    Best Regards,

    Xingyu Zhao



    Wednesday, November 13, 2019 6:14 AM
    Moderator
  • I think that many of the sources that are missing from Reference Source were written in C++ or assembly for performance reasons. Therefore, you can focus on other aspects of your program. Length does not seem to be an expensive operation.

    Note that you cannot change a string. You can create another one, maybe using StringBuilder if there are many changes.

    • Edited by Viorel_MVP Wednesday, November 13, 2019 6:26 AM
    Wednesday, November 13, 2019 6:22 AM
  • I have updated the question.
    Wednesday, November 13, 2019 7:03 AM
  • As for the ToLower method: it creates a new string, and the result also depends on which culture is used. If you want to avoid calling ToLower multiple times on the same variable, you need to implement your own solution so that the repeated calls are not needed.
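
    One possible shape for such a solution is a small wrapper that computes the lowercase form on first use and reuses it afterwards. This is only a sketch: `CachedText` and its members are hypothetical names, not part of the .NET class library, and it assumes the culture-invariant `ToLowerInvariant` is acceptable for the comparison.

    ```csharp
    using System;

    // Hypothetical wrapper (not a framework type): remembers the lowercase
    // form of an immutable string so it is computed at most once.
    public sealed class CachedText
    {
        private readonly Lazy<string> _lower;

        public CachedText(string value)
        {
            Value = value ?? throw new ArgumentNullException(nameof(value));
            // Lazy<T> runs the factory on first access only and stores the result.
            _lower = new Lazy<string>(() => Value.ToLowerInvariant());
        }

        public string Value { get; }

        // Length is already stored inside the string, so this is a plain read.
        public int Length => Value.Length;

        // Culture-invariant lowercase form, created on first access.
        public string Lower => _lower.Value;
    }
    ```

    Functions that receive a `CachedText` can read `Lower` as often as they like; every read after the first returns the same stored string instance.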
    Wednesday, November 13, 2019 10:06 AM
  • 80/20 rule. Optimize the 20% of your code that runs 80% of the time. I highly, highly doubt you are having any sort of issue with getting the length of a string. It is a simple property lookup, like fetching the value of some integral variable. Use a profiler and identify where your code is slow. Then fix that area. Attempting to optimize all your code will have a detrimental impact: it takes longer to write, it has to be redone whenever you adjust the code later, and the result is ultimately hard to understand.

    Given your sample 1, ignoring the fact that you'll crash when you get to the end of the string, this code will run faster than you can think it. Given a string with thousands of characters it would probably take milliseconds to run. There are no optimizations to do here. Just write the code correctly.

    string Text = "123";
    // Bound the loop by Text.Length (or use a foreach instead)
    for (int index = 0; index < Text.Length; index++)
    {
      char Mychar = Text[index];
    }

    An optimizing compiler may eliminate this code altogether, since it isn't doing anything. Failing that, the next step down would be for the compiler to do common subexpression elimination: hoist the `Text.Length` read out of the for loop and produce the equivalent of your first code, but without the errors. Again, let the compiler do its job and stop trying to optimize for it. You'll just make it worse.
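
    For illustration, here are the two loop shapes mentioned above, wrapped in a hypothetical `LoopDemo` class so that they do some observable work. Both stay inside the string's bounds. Whether the JIT actually hoists the `Length` read or drops the bounds check depends on the runtime version, so treat that as something to measure rather than a guarantee.

    ```csharp
    using System;

    public static class LoopDemo
    {
        // Index-based loop bounded by text.Length, so text[index] never throws.
        public static int CountDigitsFor(string text)
        {
            int count = 0;
            for (int index = 0; index < text.Length; index++)
            {
                if (char.IsDigit(text[index])) count++;
            }
            return count;
        }

        // foreach form: no index variable and no way to run past the end.
        public static int CountDigitsForEach(string text)
        {
            int count = 0;
            foreach (char c in text)
            {
                if (char.IsDigit(c)) count++;
            }
            return count;
        }
    }
    ```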

    Sample 2 is slightly more interesting but for the wrong reasons. Strings are immutable in .NET, so any time you are manipulating a string you need to decide the best way to do that. This has absolutely nothing to do with C# here. This is how the CLR works. If you are going to be building up a string then StringBuilder is the preferred approach, as it optimizes how the string is stored during the build process. This is all about managing memory. Once you are done building, you convert it to a normal string and move on. The general rule of thumb is approximately 6 string concatenations. If you do more than that, switch to StringBuilder. With fewer than that you can probably get away with simple concatenation, provided you do it all at once. Under the hood the C# compiler optimizes string concatenation expressions into a call to String.Concat. Once again, stop doing the job of the compiler.
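
    A minimal sketch of the two approaches, using a hypothetical `JoinDemo` class. Both methods return the same text; the difference is only in how many intermediate strings are allocated along the way.

    ```csharp
    using System;
    using System.Text;

    public static class JoinDemo
    {
        // Repeated += allocates a new string on every iteration,
        // because the existing string can never be modified in place.
        public static string JoinWithConcat(string[] parts)
        {
            string result = "";
            foreach (string part in parts)
            {
                result += part;
            }
            return result;
        }

        // StringBuilder appends into an internal buffer and creates
        // the final string once, in ToString().
        public static string JoinWithBuilder(string[] parts)
        {
            var builder = new StringBuilder();
            foreach (string part in parts)
            {
                builder.Append(part);
            }
            return builder.ToString();
        }
    }
    ```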

    Unless the 2 blocks of code you posted are the only code in your program, neither of them is going to have any sort of impact on app performance. Use a profiler to identify where your actual perf issues are and optimize that instead.
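
    A profiler is the right tool, but a rough check with `System.Diagnostics.Stopwatch` can already show whether a piece of code is worth worrying about. The sketch below uses a hypothetical `Timing.Measure` helper. A single measurement like this is noisy, so take it as an indication only.

    ```csharp
    using System;
    using System.Diagnostics;

    public static class Timing
    {
        // Runs the action 'iterations' times and returns the total elapsed time.
        public static TimeSpan Measure(Action action, int iterations)
        {
            action(); // warm-up call so JIT compilation is not part of the timing
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                action();
            }
            watch.Stop();
            return watch.Elapsed;
        }
    }
    ```

    For example, `Timing.Measure(() => { int n = text.Length; }, 1000000)` returns the total time for a million reads of `Length` on some string `text`.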

    Answers:

    1) Length is stored as part of the string data (think VB), so it is a simple field lookup.

    2) The reference source is at the URL you listed, but with newer versions of Visual Studio (2019+) you can use Go To Definition and it will decompile the source code for you. Alternatively, use a decompiler like JustDecompile to see the source code for the version of the framework you are targeting.

    3) The CLR is a mix of C# and C++ code. Most of the performance-sensitive code is written in C++. Looking at the reference source will show you the export to the corresponding C++ function defined in mscorlib. You cannot decompile this code. MS provided the C++ code as reference up through 2.0. After that I think it stopped, so you cannot really see the underlying code for .NET Framework (NF). .NET Core is open source on GitHub, so you can see its implementation there. Mono is another open source implementation of NF, so you can see its implementation as well.

    Not sure why any of this matters though, except maybe for curiosity. You should never optimize code based upon its implementation details. Details can change. Optimize code based upon actual perf numbers and expected behavior. Methods are assumed to be expensive, so if your code is perf sensitive and you have a critical loop, avoid making method calls in it. If you are calling the same method over and over with the same values, just cache the result. This is programming 101 stuff and has absolutely nothing to do with the method implementation.
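
    Caching the result can be as simple as a local variable. In this sketch (the `SearchDemo` class and its method are hypothetical names), the lowercase form of the search term is the same on every iteration, so it is computed once before the loop.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class SearchDemo
    {
        // Counts the entries equal to 'wanted', ignoring case.
        public static int CountMatches(IEnumerable<string> entries, string wanted)
        {
            // Same input on every iteration, so compute it once and keep it.
            string wantedLower = wanted.ToLowerInvariant();
            int count = 0;
            foreach (string entry in entries)
            {
                if (entry.ToLowerInvariant() == wantedLower) count++;
            }
            return count;
        }
    }
    ```

    No global variable is involved: the cached value lives in a local and disappears when the method returns.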

    Update 1) Yes, but it doesn't matter. You don't need to know how IndexOf works. Are you having a perf issue with calling it? If so, optimize the call. If not, don't worry about it. The implementation can change at any time, so don't optimize based upon its current behavior. IndexOf isn't going to be expensive, as it is a single run through the string and likely O(N). You cannot do better than that anyway.
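
    To make the O(N) point concrete, here is a plain linear scan for a single character. It is an illustrative sketch with a hypothetical name (`ScanDemo.LinearIndexOf`), not the framework's actual IndexOf implementation, which may be written quite differently.

    ```csharp
    using System;

    public static class ScanDemo
    {
        // Returns the index of the first occurrence of 'target' in 'text', or -1.
        // Each character is examined at most once, hence O(N) in the string length.
        public static int LinearIndexOf(string text, char target)
        {
            for (int i = 0; i < text.Length; i++)
            {
                if (text[i] == target) return i;
            }
            return -1;
        }
    }
    ```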


    Michael Taylor http://www.michaeltaylorp3.net

    Wednesday, November 13, 2019 3:36 PM
    Moderator