Need to initialize a Dictionary of 900 KB; it throws "Unhandled Exception: OutOfMemoryException"

  • Question

  • User-1319896600 posted

    Hi,

    I need to initialize a Dictionary<string,string> of size 600KB. I have 12 GB of RAM on my local machine, but it throws an "Unhandled Exception: OutOfMemoryException" error. How do I make it use the available memory?

    Thanks

    Wednesday, February 11, 2015 4:32 PM

Answers

  • User-434868552 posted

    @ihaveaquesti...

    ihaveaquestion wrote: "The file size is approximately 1GB"

    First, one gigabyte is not 900KB as your title states; it's 1,048,576KB.
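    To make the units concrete, here is a small sketch in C#: it converts a byte count to KB and gives a rough idea of how much memory the text needs once it is loaded as .NET strings. The doubling assumes a mostly ASCII file, because .NET strings hold each character as a 2-byte UTF-16 code unit; per-string object overhead and the Dictionary's own arrays come on top of that. The FileSizeCheck name is made up for illustration, and the path is the one from the original poster's code.

```csharp
using System;
using System.IO;

static class FileSizeCheck
{
    // 1 KB = 1024 bytes, so 1 GB = 1024 * 1024 KB = 1,048,576 KB
    public static long ToKilobytes(long bytes)
    {
        return bytes / 1024;
    }

    // Rough estimate of the UTF-16 character data for a mostly ASCII file:
    // each 1-byte ASCII character becomes a 2-byte char in a .NET string.
    // Tabs and line breaks that Split/ReadLine discard are not subtracted, and
    // string object overhead and Dictionary arrays are NOT included.
    public static long ApproxUtf16Bytes(long asciiFileBytes)
    {
        return asciiFileBytes * 2;
    }

    static void Main()
    {
        Console.WriteLine(ToKilobytes(1024L * 1024 * 1024));   // 1048576

        string path = @"C:\fileBIG.txt";                       // path from the original post
        if (File.Exists(path))
        {
            long length = new FileInfo(path).Length;
            Console.WriteLine("File: {0} KB, roughly {1} MB as UTF-16 strings",
                ToKilobytes(length), ApproxUtf16Bytes(length) / (1024 * 1024));
        }
    }
}
```

    For a 1GB file this suggests about 2GB of character data alone, which is already at the limit of a default 32-bit process.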

    Although you have 12GB of RAM, are you compiling as a 32-bit application?
    http://stackoverflow.com/questions/11891593/the-maximum-amount-of-memory-any-single-process-on-windows-can-address 

    (excerpt from the above article):  " ... 32 bit on 32 bit OS: 2 GB, unless set to large address space aware, in which case 3 GB. 32 bit on 64 bit OS: 2 GB, unless set to large address space aware, in which case 4 GB" 
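    Before digging further, it is worth confirming which kind of process is actually running. A minimal sketch using Environment.Is64BitProcess and Environment.Is64BitOperatingSystem (both available since .NET Framework 4.0). Note that in Visual Studio 2012 and later, new AnyCPU executable projects targeting .NET 4.5 have "Prefer 32-bit" enabled by default, so a program can run as 32-bit even on a 64-bit OS. The BitnessCheck name is hypothetical.

```csharp
using System;

static class BitnessCheck
{
    // Describes the current process. A 32-bit process is limited to 2-4 GB of
    // address space no matter how much RAM the machine has.
    public static string Describe()
    {
        return String.Format("64-bit OS: {0}, 64-bit process: {1}, pointer size: {2} bytes",
            Environment.Is64BitOperatingSystem,
            Environment.Is64BitProcess,
            IntPtr.Size);
    }

    static void Main()
    {
        Console.WriteLine(Describe());
        if (Environment.Is64BitOperatingSystem && !Environment.Is64BitProcess)
        {
            // Typical cause: Platform target = x86, or AnyCPU with "Prefer 32-bit" checked.
            Console.WriteLine("Running as 32-bit on a 64-bit OS: check the project's Build settings.");
        }
    }
}
```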

    Looking at your code, and because I do not have access to your file, I wonder about a few things:

    (a) How many tabs are there on each line of your fileBIG.txt?

    (b) How clean is your data?

    (c) Have you tried adding some debug code?

    Again, since I do not know your data, I'd certainly add some debug code, for example:

    // requires: using System; using System.Collections.Generic; using System.IO;
    Dictionary<string, string> dictionaryBIG = new Dictionary<string, string>();
    char[] tabDelimiter = new char[] { '\t' };    // create once, before the while loop
    // watch these debug variables:
    Int64 bytesRead      = 0;    // counts characters (not bytes), excluding line breaks
    Int32 maxKeyLength   = 0;
    Int32 minKeyLength   = Int32.MaxValue;
    Int32 maxValueLength = 0;
    Int32 minValueLength = Int32.MaxValue;
    Int32 maxLineLength  = 0;
    Int32 minLineLength  = Int32.MaxValue;
    Int32 linesRead      = 0;
    Int32 skippedlines   = 0;
    Int32 maxLineParts   = 0;
    Int32 minLineParts   = Int32.MaxValue;
    // ....................................
    using (StreamReader streamReaderBIG = new StreamReader(@"C:\fileBIG.txt"))
    {
        String line = String.Empty;
        while ((line = streamReaderBIG.ReadLine()) != null)
        {
            bytesRead += line.Length;
            linesRead++;
            if (!String.IsNullOrWhiteSpace(line))
            {
                if (line.Length > maxLineLength) maxLineLength = line.Length;
                if (line.Length < minLineLength) minLineLength = line.Length;
                string[] lineParts = line.Split(tabDelimiter, StringSplitOptions.RemoveEmptyEntries);
                if (lineParts.Length == 2)
                {
                    string key = lineParts[0];
                    if (key.Length > maxKeyLength) maxKeyLength = key.Length;
                    if (key.Length < minKeyLength) minKeyLength = key.Length;
                    string value = lineParts[1];
                    if (value.Length > maxValueLength) maxValueLength = value.Length;
                    if (value.Length < minValueLength) minValueLength = value.Length;
                    if (!dictionaryBIG.ContainsKey(key: key))
                    {
                        try
                        {
                            dictionaryBIG.Add(key: key, value: value);
                            // you could display the count of keys here
                        }
                        catch (Exception ex)
                        {
                            // report how far into the file we got, then rethrow
                            // (an empty catch would silently swallow an OutOfMemoryException)
                            Console.WriteLine("Failed at line {0} ({1} keys): {2}",
                                linesRead, dictionaryBIG.Count, ex.Message);
                            throw;
                        }
                    }
                    else // we have a DUPLICATE KEY ... HOWEVER, do the values match?
                    {
                        // if the value does NOT match, then you lose this value.
                    }
                }
                else // too few or too many parts
                {
                    Console.WriteLine("Line {0}: parts {1}", linesRead, lineParts.Length);
                    if (lineParts.Length > maxLineParts) maxLineParts = lineParts.Length;
                    if (lineParts.Length < minLineParts) minLineParts = lineParts.Length;
                    skippedlines++;
                }
            }
            else skippedlines++;
        }
    }
    Console.WriteLine("Lines read: {0}, skipped: {1}, keys: {2}, characters: {3}",
        linesRead, skippedlines, dictionaryBIG.Count, bytesRead);

    Ideally, you want to identify what bad data, if any, is in your file, plus how far into your 1GB file you got.
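    For question (a) in particular, a streaming pass over the file can show how tab counts are distributed before any Dictionary is built. A minimal sketch, assuming tab-separated "key<TAB>value" lines as in the original code; File.ReadLines reads one line at a time, so this pass should not need memory proportional to the file size. The TabHistogram name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class TabHistogram
{
    // Counts how many lines contain 0, 1, 2, ... tab characters.
    // Well-formed "key<TAB>value" lines should all land in the bucket for 1 tab.
    public static SortedDictionary<int, long> Count(IEnumerable<string> lines)
    {
        var histogram = new SortedDictionary<int, long>();
        foreach (string line in lines)
        {
            int tabs = line.Count(c => c == '\t');
            long seen;
            histogram.TryGetValue(tabs, out seen);   // seen stays 0 for a new bucket
            histogram[tabs] = seen + 1;
        }
        return histogram;
    }

    static void Main()
    {
        string path = @"C:\fileBIG.txt";   // path from the original post
        if (!File.Exists(path)) return;

        foreach (var pair in Count(File.ReadLines(path)))
        {
            Console.WriteLine("{0} tab(s): {1} line(s)", pair.Key, pair.Value);
        }
    }
}
```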

    FWIW, I do see the possibility of blowing yourself out of the water. Remember that as your Dictionary grows, it periodically allocates larger internal arrays and copies the old ones into them (so both exist briefly), and your program also needs working memory of its own.
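    One way to trim that growth overhead is to size the Dictionary up front: the Dictionary<TKey,TValue>(int capacity) constructor allocates the internal arrays once, so no resize (and no old-plus-new copy) happens while loading. This is a sketch, not the thread's method: it assumes the file can be read twice (once to count lines, once to load), and the LoadPresized helper is hypothetical. It does not make the strings themselves any smaller, so it will not rescue a 32-bit process from a 1GB file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class PresizedLoader
{
    // Hypothetical helper: loads "key<TAB>value" lines into a Dictionary that is
    // allocated at its final capacity before any entry is added.
    public static Dictionary<string, string> LoadPresized(string path)
    {
        // First pass: count lines (streams the file, keeps nothing in memory).
        int lineCount = File.ReadLines(path).Count();

        var dictionary = new Dictionary<string, string>(lineCount);
        char[] tabDelimiter = { '\t' };

        // Second pass: fill the dictionary; it never exceeds its capacity, so it never resizes.
        foreach (string line in File.ReadLines(path))
        {
            string[] parts = line.Split(tabDelimiter, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length == 2 && !dictionary.ContainsKey(parts[0]))
            {
                dictionary.Add(parts[0], parts[1]);
            }
        }
        return dictionary;
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "a\tone", "b\ttwo", "a\tduplicate", "bad line" });
        var d = LoadPresized(path);
        Console.WriteLine(d.Count);   // 2
        Console.WriteLine(d["a"]);    // one
        File.Delete(path);
    }
}
```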

    WORKAROUND

    If memory turns out to be your issue, as it appears to be, then instead of building a Dictionary in memory you could use a SQL table, et cetera.
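    A sketch of one "et cetera" option that needs no database: keep only the keys in memory, each mapped to the byte offset of its line, and read the long value back from disk when it is needed. This is a different technique from the SQL table suggested above, and its details are assumptions: the OffsetIndex class is hypothetical, the file is assumed to be UTF-8 or ASCII without a byte order mark, and the value is taken to be everything after the first tab (unlike Split, which would reject lines with extra tabs). Each Lookup reopens the file, which keeps the code short but is slow for many lookups.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical sketch: only keys and line offsets live in memory; values stay on disk.
// Assumes a UTF-8/ASCII file without a byte order mark, one "key<TAB>value" entry per line.
class OffsetIndex
{
    private readonly string path;
    private readonly Dictionary<string, long> offsets = new Dictionary<string, long>();

    public OffsetIndex(string path)
    {
        this.path = path;
        using (var stream = new BufferedStream(File.OpenRead(path), 1 << 16))
        {
            var keyBytes = new List<byte>();
            long position = 0, lineStart = 0;
            bool inKey = true;
            int b;
            do
            {
                b = stream.ReadByte();
                if (inKey && b == '\t')
                {
                    // end of the key: remember where its line starts (first occurrence wins)
                    string key = Encoding.UTF8.GetString(keyBytes.ToArray());
                    if (key.Length > 0 && !offsets.ContainsKey(key)) offsets.Add(key, lineStart);
                    inKey = false;
                }
                else if (b == '\n' || b == -1)
                {
                    // end of the line (or file): reset for the next line
                    keyBytes.Clear();
                    inKey = true;
                    lineStart = position + 1;
                }
                else if (inKey)
                {
                    keyBytes.Add((byte)b);
                }
                position++;
            } while (b != -1);
        }
    }

    public int Count { get { return offsets.Count; } }

    // Returns the value for key, or null if the key is unknown.
    public string Lookup(string key)
    {
        long offset;
        if (!offsets.TryGetValue(key, out offset)) return null;
        using (var reader = new StreamReader(File.OpenRead(path), Encoding.UTF8))
        {
            reader.BaseStream.Seek(offset, SeekOrigin.Begin);   // before any read, so no stale buffer
            string line = reader.ReadLine();                     // strips "\r\n" or "\n"
            return line.Substring(line.IndexOf('\t') + 1);
        }
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "alpha\tfirst sentence\r\nbeta\tsecond sentence\r\n");
        var index = new OffsetIndex(path);
        Console.WriteLine(index.Count);                    // 2
        Console.WriteLine(index.Lookup("beta"));           // second sentence
        Console.WriteLine(index.Lookup("gamma") == null);  // True
        File.Delete(path);
    }
}
```

    With values described as very long sentences, most of the file stays on disk and memory grows with the number and length of the keys only.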

    TIMTOWTDI

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Thursday, February 12, 2015 4:57 AM

All replies

  • User-434868552 posted

    @haveaquesti..., you need to show us your offending code.

    http://weblogs.asp.net/gerrylowry/clarity-is-important-both-in-question-and-in-answer 

    edit:

    This works:

    Dictionary<String, String> kb600 = new Dictionary<String, String>();
    Console.WriteLine((600 * 1024) * 14);  // approximate size of the dictionary
    for (int i = 1000000; i < 2000000; i++)
    {
        kb600.Add(i.ToString(), (1 + i).ToString());
    }
    Console.WriteLine("done");
    Console.WriteLine(kb600["1999999"]);

    output:

    8601600
    done
    2000000

    haveaquesti..., unless your dictionary entries are very large, correct code should work. In the above example, I have created one million entries where both the key and the value are seven characters each.

    Again, it's important to share your code so that your peers at forums.asp.net can help you.

    end edit.

    edit #2:

    haveaquesti..., please clarify: by 600KB (900KB?), you mean the total length of all of your entries, right?

    (my Dictionary will consume at least roughly 14MB)

    end edit #2.
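    To see what the one-million-entry example above really costs, rather than estimating, you can measure the managed heap before and after building it. A minimal sketch, assuming the same loop as the example; GC.GetTotalMemory(true) forces a collection and returns an approximate figure, so treat the result as a ballpark. Expect it to be well above 14MB, since every string carries object overhead in addition to its 2-byte characters, and the Dictionary keeps its own bucket and entry arrays. The DictionaryMemoryProbe name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class DictionaryMemoryProbe
{
    // Builds the same one-million-entry dictionary as the example above and
    // returns the approximate growth of the managed heap in bytes.
    public static long MeasureBytes(out Dictionary<string, string> dictionary)
    {
        long before = GC.GetTotalMemory(true);   // full collection before measuring
        dictionary = new Dictionary<string, string>();
        for (int i = 1000000; i < 2000000; i++)
        {
            dictionary.Add(i.ToString(), (1 + i).ToString());
        }
        long after = GC.GetTotalMemory(true);    // discarded resize arrays are collected here
        return after - before;
    }

    static void Main()
    {
        Dictionary<string, string> kb600;
        long bytes = MeasureBytes(out kb600);
        Console.WriteLine("{0} entries, about {1} MB", kb600.Count, bytes / (1024 * 1024));
        GC.KeepAlive(kb600);
    }
}
```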

    Wednesday, February 11, 2015 4:41 PM
  • User-1319896600 posted
    Below is the code to initialize the dictionary; it is pretty basic. It reads from the file and loads the dictionary.
    The file size is approximately 1GB. The value for each key is a very long sentence.
    The machine has 12 GB of RAM installed (based on the PC info).
    It runs Windows 8.1, 64-bit. I am on .NET Framework 4.5.

    #region Initialize dictionary
    Console.WriteLine("BEGIN initializing dictionary");
    Dictionary<string, string> dictionaryBIG = new Dictionary<string, string>();
    StreamReader streamReaderBIG = new StreamReader(@"C:\fileBIG.txt");
    string line = "";
    while ((line = streamReaderBIG.ReadLine()) != null)
    {
        char[] tabDelimiter = new char[] { '\t' };
        string[] lineParts = line.Split(tabDelimiter, StringSplitOptions.RemoveEmptyEntries);
        string key = lineParts[0];
        string value = lineParts[1];
        if (!dictionaryBIG.ContainsKey(key: key))
        {
            try
            {
                dictionaryBIG.Add(key: key, value: value);
            }
            catch (Exception)
            {
                Console.WriteLine("Exception " + key);
            }
        }
    }
    streamReaderBIG.Close();
    Console.WriteLine("DONE initializing dictionary");
    #endregion Initialize dictionary

    Thursday, February 12, 2015 2:27 AM