std::ifstream::getline empty, in rather "big" project

  • Question

  • Hi,

    I am doing some matrix operations and, in the process, need to read matrices, or rather matrix lines (vectors, if you will), into C++.

    I have a system of equations with 2079 lines and 12 variables.

    It is stored in a comma-separated text file E:\foot\mat.txt

    A sample line looks like this:

    3.727272749,0.071428575,0.071428575,0.214285716,0,0.071428575,3.636363745,0.071428575,0.071428575,0.142857149,0,0,-1000



    From these 2079 lines, I want to choose 12, so that I can solve this "resulting SubMatrix" for the 12 variables.

    I want to do this a fair amount, choosing different 12 lines each time. I have chosen, for now, to do 20 Choose 12 (20C12) combinations of lines, resulting in 20C12, or, 125970 sub-matrices.

    I have code for solving them and all, and that is not the problem (although It'll take a while ;)). BUT, when reading in the lines, there is a problem at some point.

    Let me try and break this down even further, into parts:

    Part (a): Generate all possible 20C12 combinations, and store them in a manner that they can be used for any purpose (in this case, for "indexing" the lines I want to read from the "mat.txt" file).

    I have an algorithm for this, which looks like:

    void do_next(std::vector<set> *newSets, int t[], int i, int n, int N)
    {
        for (int j = t[i-1] + 1; j <= N - n + i; ++j)
        {
            t[i] = j;
            if (i == n)
            {
                set newSet;
                for (int k = 1; k <= n; ++k)
                {
                    newSet.intSet.push_back(t[k]);
                }
                newSets->push_back(newSet);
            }
            else
            {
                do_next(newSets, t, i + 1, n, N);
            }
        }
    }


    and works fine. It uses a struct set to hold an individual combination from the 20C12 set of combinations, i.e., just a set of integers:

    struct set
    {
        std::vector<int> intSet;
    };

    It pushes all these individual sets into a std::vector<set> (passed in through the newSets pointer), which then holds the result.


    So, Part (a), generating the combinations of possible lines, is done.
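
    For readers who want to try part (a) in isolation, the following is a minimal, self-contained sketch of the same idea, not the poster's exact code. It assumes C++11, uses a plain std::vector<int> in place of the set struct, and the names combine and combinations are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Appends every ascending n-element combination of {1, ..., N} to `out`.
// `current` holds the partial combination built so far.
void combine(int n, int N, std::vector<int>& current,
             std::vector<std::vector<int>>& out)
{
    if (static_cast<int>(current.size()) == n) {
        out.push_back(current);
        return;
    }
    const int start = current.empty() ? 1 : current.back() + 1;
    const int remaining = n - static_cast<int>(current.size());
    // Stop early enough that the remaining positions can still be filled.
    for (int j = start; j <= N - remaining + 1; ++j) {
        current.push_back(j);
        combine(n, N, current, out);
        current.pop_back();
    }
}

// Returns all combinations of n values chosen from {1, ..., N}.
std::vector<std::vector<int>> combinations(int n, int N)
{
    assert(n >= 0 && n <= N);
    std::vector<std::vector<int>> out;
    std::vector<int> current;
    combine(n, N, current, out);
    return out;
}
```

    Calling combinations(12, 20) yields the 125970 index sets mentioned above, each usable as a list of row numbers.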

    Then part (b):

    Reading the lines into the code and then storing them in my custom avector struct:

    struct avector{
     std::vector<double> doubles;
    };

    This struct stores the individual entries (in this case of type double) of one vector.

    The resulting avector instances are then stored in a matrix struct:

    struct matrix{
      std::vector<avector> vectors;
    };


    So far, so good.

    My code for reading the vectors from the text file into an individual avector and, subsequently, a matrix instance, is:

    matrix getMatrixFromSubFile(set rows, std::string fn, int factor)
    {
        matrix ma;
        std::ifstream *f = new std::ifstream(fn);
        char a[10000];

        for (int i = 0; i < rows.intSet.size(); i++)
        {
            for (int j = 0; j < rows.intSet.at(i)*factor; j++)
            {
                if (j == rows.intSet.at(i)*factor - 1)
                {
                    f->getline(a, 10000);
                    std::string szVec = std::string(a);
                    avector curvec = szVectorToRealVec(szVec);
                    ma.vectors.push_back(curvec);
                    f = new std::ifstream(fn);
                }
                else
                {
                    f->getline(a, 10000);
                }
            }
        }
        f->close();
        delete f;
        return ma;
    }


    which should be pretty straightforward. It loops through the file using std::ifstream::getline and, once it reaches a certain line, converts the char buffer a (declared as a[10000]) to a std::string.

    It then uses szVectorToRealVec(), defined here:


    avector szVectorToRealVec(std::string szVec)
    {
        avector vec;
        bool bEOL = false;

        int posS = 0;
        int posE = 100000000;
        int l = 0;

        while (bEOL != true)
        {
            std::string af;
            if (szVec.find(",") != std::string::npos)
            {
                // Cut off the text before the next comma.
                posE = szVec.find(",");
                l = posE - posS;
                af = szVec.substr(posS, l);
                szVec = szVec.substr(af.length()+1);
            }
            else
            {
                // No comma left: the rest of the line is the last entry.
                bEOL = true;
                af = szVec.substr(posS);
            }
            _CRT_DOUBLE *curDStr = new _CRT_DOUBLE;
            _atodbl(curDStr, const_cast<char*>(af.c_str()));
            double d = double(curDStr->x);
            vec.doubles.push_back(d);
            delete curDStr;
        }

        return vec;
    }



    to create an avector with the current vector/line stored in the text file.

    This avector is then stored in the matrix.
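
    _atodbl and _CRT_DOUBLE are specific to the Microsoft C runtime. As a portable alternative, the sketch below splits a line with std::string::find and converts each field with std::strtod. The function name parseCsvDoubles is hypothetical, and it returns a plain std::vector<double> rather than an avector.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Splits one comma-separated line into doubles.
// A field that does not start with a number (including an empty field)
// is stored as 0.0, a simplification chosen here for brevity.
std::vector<double> parseCsvDoubles(const std::string& line)
{
    std::vector<double> values;
    std::size_t start = 0;
    while (start <= line.size()) {
        std::size_t comma = line.find(',', start);
        if (comma == std::string::npos)
            comma = line.size();          // last field runs to the end
        const std::string field = line.substr(start, comma - start);
        values.push_back(std::strtod(field.c_str(), nullptr));
        start = comma + 1;                // continue after the comma
    }
    return values;
}
```

    No memory is allocated with new here, so there is nothing to delete per field.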

    The condition

    if (j == rows.intSet.at(i)*factor -1)

    i.e. the *factor -1 part, in getMatrixFromSubFile is used so that I can read lines across the entire system of equations, i.e. lines in the range [1,2079]. I chose 77 for the value of factor because I will probably later use 27C12 combinations rather than 20C12, and 27*77 is 2079, which is the last line I have in the file.

    So, again, so far, so good. This works: I can generate the 20C12 combinations and then read in the matrices. However, my part (c), the subsequent solving of the matrices, later crashes, and it ALWAYS (at least so far) crashes at matrix number 43. This happens because, for some reason, when READING IN matrix number 43 from the "mat.txt" file using the code posted above, the line

    f->getline(a, 10000);

    which should place the line into the char buffer a, "fails", leaving the char buffer empty.

    Tracking this down further, I could see that for this "matrix number 43" it was still able to read in the first FIVE lines of the matrix from the text file. Using combination number 43, generated by the previous algorithm, and the factor of 77, I can see that it successfully reads lines 77*1 (so 77), 77*2 (so 154), 77*3 (so 231), 77*4 (so 308), and 77*5 (so 385), but then, on line number six, 77*6, which is 462, the line

    f->getline(a, 10000);

    in getMatrixFromSubFile fails to place anything in the char buffer, leaving it absolutely empty (or rather, I THINK, as an empty string "").

    Verifying line 462 in my "mat.txt" file, I can see that it is:

    3.363207579,0.091954023,0.068965517,0.130268201,0,0.003831418,3.6340909,0.122137405,0.110687025,0.106870227,0,0,1000


    and that, at least from my perspective, nothing seems "weird" or "funny" about either the previous or the next line:

    461:

    3.795348883,0.114503816,0.087786257,0.156488553,0.003816794,0.003816794,3.799086809,0.07662835,0.07662835,0.107279696,0,0.007662835,-1000


    or 463:

    3.751162767,0.08429119,0.057471264,0.183908045,0.003831418,0,3.775701046,0.080769233,0.065384619,0.115384616,0.007692308,0.003846154,0



    So, I really have no idea why it is failing, and always on matrix number 43, line 6.

    I am only guessing, that it might have something to do with memory allocation, somehow?

    The task is at about 420,000 KB of private memory allocated, reported by task mon, at the time it crashes.


    Finally, I will post my calling code, so you can see how I call the stuff defined above:

    int _tmain(int argc, _TCHAR* argv[])
     {
      std::vector<matrix> matrices;
      int toI;
      int combosize;
      while (true)
     {
       std::vector<set> *combos = new std::vector<set>; 
       std::cout << "from 1 to ";
       std::cin >> toI;
       std::cout << std::endl << "pick ";
       std::cin >> combosize;
       int * t = new int[combosize + 1];
       t[0] = 0;
       
       time_t s = time(NULL);
       do_next(combos, t, 1, combosize, toI);
       delete t;
       time_t a = time(NULL);
       time_t diff = difftime(a,s);
       std::cout << std::endl;
        
       std::cout << combos->size() << " combinations" << std::endl;
       std::cout << "t: " << diff << std::endl;
    
      std::string fn = std::string("E:\\foot\\combos\\combos.txt");
       std::cout << combos->size() << " printing to " << fn.c_str();
       std::cout << std::endl;
    
      std::cout << "size of combos is " << sizeof(combos) << std::endl;
    
      std::ofstream *f = new std::ofstream(fn);
       setsToFile(*combos, f); 
       f->close();
       delete f;
    
      std::cout << "printed" << std::endl;
    
    
       for (int i = 0; i < combos->size(); i++)
       {
        std::cout << "getting matr " << i+1 << std::endl;
        matrices.push_back(getMatrixFromSubFile(combos->at(i), "E:\\foot\\mat.txt", 77));
       }
    
      std::cout << "size of matrices is " << sizeof(matrices) << std::endl;
       
       for (int i = 0; i < matrices.size(); i++)
       {
        std::cout << "solving mat " << i+1 << std::endl;
        matrix ma = solveSystem(matrices.at(i));
       }
    
    
       std::cout << std::endl;
    
     
       delete combos;
       matrices.erase(matrices.begin(), matrices.end());
      } 
      return 0;
    }

    P.S.: I was originally planning on running this with 2079C12 combinations, to choose all possible combinations of all lines and create ALL possible sub-matrices, but at some point realized that this would take A LOT of time. Just for the heck of it, I calculated how long it would take, using ONLY FLOPS as a measure, on a medium-sized distributed computing project such as ABC@HOME (12.333 TERAFLOPS, according to Wikipedia), on a pretty large distributed project (FOLDING@HOME, 8,588 TERAFLOPS, or 8.588 PETAFLOPS, also according to Wikipedia), and on all PCs worldwide, using figures from other sources of 1.1 billion computers on earth and an assumed average of 10 GIGAFLOPS per PC.

    for example:

    Calculating 2079C12:

    ABC@HOME (12.233 TFLOPS)         FOLDING@HOME (8.588 PETAFLOPS)    PCs worldwide
    2.25E+15 (2.25*10^15) years      3.14E+12 years                    2.5 billion years

    What would be feasible within a year is:

    ABC@HOME (12.233 TFLOPS)         FOLDING@HOME (8.588 PETAFLOPS)    PCs worldwide
    114C12 (5.52E+15 combos)         193C12 (3.93E+18 combos)          346C12 (5.07E+21 combos)

    Well anyway, my pc can do about 71C12, 1.28E+13 combos in a year. hehe ^^

    And I need 0.27 seconds to generate 20C12, or 39 sec to generate 27C12, so, I guess, something like that will have to do for now hehe^^
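
    The combination counts quoted here can be reproduced with a short routine. This is a minimal sketch with a hypothetical function name, binomial; it works in double precision, so very large results such as 2079C12 are approximations.

```cpp
#include <cassert>

// Number of ways to choose k items from n, computed multiplicatively.
// Every intermediate value is itself a whole number, so the result is exact
// while the intermediate products stay below 2^53 and approximate beyond that.
double binomial(int n, int k)
{
    assert(n >= 0 && k >= 0 && k <= n);
    if (k > n - k)
        k = n - k;                          // C(n, k) == C(n, n - k)
    double result = 1.0;
    for (int i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;  // now equals C(n - k + i, i)
    return result;
}
```

    binomial(20, 12) gives 125970 and binomial(27, 12) gives 17383860, which is why 27C12 takes so much longer to generate than 20C12.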

    By the way, I don't know if any of this can be improved through some kind of parallelism, at least on distr. comp. projects, or, MAYBE, some other algorithm or so.

    Anyway, happy joy generating all those numbers! hehe ;) and thanks for ANY, and ALL input to the actual issue I am facing here!

    Cheers!

    hansaa

     



    • Edited by hansaaa Tuesday, May 7, 2013 9:34 AM
    Tuesday, May 7, 2013 9:07 AM

Answers

  • >by the memory leak, you meant the leak caused by the "ifstream *f",
    >not something caused by the array?

    Yes. You keep allocating new ifstream objects without ever
    deleteing any of them.

    >But how do I, set the "cursor" back to the start of the file,
    >without the need for creating a new instance. Can I do this by
    >resetting the status bits? SO that the NEXT "getline" call will
    >do the VERY first line again?

    >Also, seekg(), I am unclear.

    Study this example:

    #include <iostream>
    #include <fstream>
    #include <string>
    #include <vector>
    #include <cstdlib>
    
    using namespace std;
    
    void MyExit() {system("pause");}
    
    int main()
    {
        atexit(MyExit);
    
        vector<string> vstr;
        vstr.reserve(100000);
    
        string text;
        text.reserve(1000);
        ifstream file("abc.txt");
        if(!file)
            {
            cout << "File open failed!\n";
            return -1;
        }
        while(getline(file, text))
            {
            vstr.push_back(text);
            }
    
        if(!file.eof())
            {
            cout << "Error reading file!\n";
            return -2;
            }
    
        cout << "vector holds " << vstr.size() << " strings:\n";
        for(size_t n=0; n<vstr.size(); ++n)
            {
            cout << vstr[n] << endl;        
            }
    
        // Do it again using the same objects:
    
        cout << "\nClearing objects for second pass...\n";
        vstr.clear();
        cout << "vector size now " << vstr.size() << endl;
        text.clear();
    
        cout << "\nSecond pass:\n";
        file.clear();
        file.seekg(0, ios::beg);
    
        while(getline(file, text))
            {
            vstr.push_back(text);
            }
    
        if(!file.eof())
            {
            cout << "Error reading file!\n";
            return -2;
            }
    
        cout << "vector holds " << vstr.size() << " strings:\n";
        for(size_t n=0; n<vstr.size(); ++n)
            {
            cout << vstr[n] << endl;        
            }
    
        return 0;
    }
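
    With all lines held in a vector as in the example above, the rows for each combination can be picked by index, so the file only has to be read once for all 125970 sub-matrices. A minimal sketch, assuming C++11; the function name selectRows is hypothetical, and factor plays the same role as in the question (row r maps to line r * factor).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the lines whose 1-based line numbers are rows[i] * factor.
// std::vector::at throws std::out_of_range if a line number is past the end.
std::vector<std::string> selectRows(const std::vector<std::string>& lines,
                                    const std::vector<int>& rows, int factor)
{
    assert(factor > 0);
    std::vector<std::string> picked;
    picked.reserve(rows.size());
    for (int r : rows) {
        assert(r > 0);
        // 1-based line number r * factor is 0-based index r * factor - 1.
        const std::size_t index =
            static_cast<std::size_t>(r) * static_cast<std::size_t>(factor) - 1;
        picked.push_back(lines.at(index));
    }
    return picked;
}
```

    For the file in the question, selectRows(vstr, combo, 77) would return the twelve text lines of one sub-matrix, ready to be parsed.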
    

    E&OE

    - Wayne

    • Marked as answer by hansaaa Tuesday, May 7, 2013 4:44 PM
    Tuesday, May 7, 2013 4:23 PM

All replies

  • My code, for reading in the vectors from the text file into an individual avector, and, subsequently, matrix instance, is:
    matrix getMatrixFromSubFile(set rows, std::string fn, int factor)
     {
      matrix ma;
      std::ifstream *f = new std::ifstream(fn); // *****
      char a[10000];
    
     for (int i = 0; i < rows.intSet.size(); i++)
      {
       for (int j = 0; j <rows.intSet.at(i)*factor; j++)
       {
        if (j == rows.intSet.at(i)*factor -1)
        {
         f->getline(a, 10000); // *****
         std::string szVec = std::string(a);
         avector curvec = szVectorToRealVec(szVec);
         ma.vectors.push_back(curvec);
         f = new std::ifstream(fn); // *****
        }
    
    ...

    This doesn't look good. Why are you doing repetitive new
    operations? Every time you do that and store it in the same
    pointer you leak the one that was allocated previously.

    Note also that you should always check the file status after
    *every* attempted getline, etc. to see if it failed before
    trying to use any possible returned data.

    As well, there is a getline which reads into a std::string
    directly. There's no need to read into a "raw" character array
    and then convert that to a std::string.
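
    Checking the stream state would have exposed the real failure in the question's code. A plausible explanation for "matrix 43, line 6", not confirmed anywhere in this thread: each call leaks 12 open streams, so 42 matrices leave 504 files open. Together with the three standard streams, the stream needed for the sixth row of matrix 43 would be the 513th, one more than the Microsoft C runtime's default limit of 512 simultaneously open stream-level files. The open then fails, and every getline on that stream fails with it. The sketch below, with a hypothetical helper name readFirstLine, tests both the open and the read before using any data, and keeps the ifstream as a local object so nothing needs to be deleted.

```cpp
#include <fstream>
#include <string>

// Opens `path` and reads its first line into `line`.
// Returns false, leaving `line` untouched, if the open or the read fails.
bool readFirstLine(const std::string& path, std::string& line)
{
    std::ifstream file(path.c_str());   // closed automatically at scope exit
    if (!file)                          // open failed: missing file, no free handle, ...
        return false;
    std::string text;
    if (!std::getline(file, text))      // read failed, or the file is empty
        return false;
    line = text;
    return true;
}
```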

    - Wayne

    Tuesday, May 7, 2013 10:02 AM
  • Ok, so, I tried the std::getline (I assume you meant that?) to read the line straight into a std::string. Unfortunately, this is VERY slow, making the whole application more than ten times as slow. So, unfortunately, this cannot serve the purpose here.

    This therefore means I need to use ifstream::getline to put the line in a char buffer. So, you suggested there is a memory leak, which does fit the suspicion I had at the beginning. However, I am not completely clear on how to move on with this. What I have tried is:

    if (j == rows.intSet.at(i)*factor -1)
    {
    	char a[10000];
    	f->getline(a, 10000);
    	std::string szVec = std::string(a);
    	//std::getline(*f, szVec);
    	avector curvec = szVectorToRealVec(szVec);
    	ma.vectors.push_back(curvec);
    	delete a;
    	f = new std::ifstream(fn);
    }
    else
    {
    	char a[10000];
    	f->getline(a,10000);
    	delete a;
    }

    I.e., I tried to move the allocation and deallocation of a into the respective branches of the if-statement. Usually, using new, delete, and so on (i.e. pointers) works for me, although I might generate some memory leaks sometimes, as I am not completely clear YET (;)) on what can cause memory leaks and what cannot. I thought that when you use new (i.e., a pointer), you just use delete at the end of its use, and all should be ok. However, this does not seem to be the case here. I am not sure whether I don't completely understand something, or what is going on, but THIS code (just posted above) generates an exception right on first execution. This is the assertion it trips, in dbgdel.cpp:

    _ASSERTE(_BLOCK_TYPE_IS_VALID(pHead->nBlockUse));


    I have also tried using

    delete[] a;

    , but, it generates the same problem.

    Again, I am not sure if I understand absolutely everything correctly here, so, if someone can help me in understanding what is going on in this, that would be great!

    Thanks!

    hansaa

    P.S: Apart from that, you suggested checking the status of the std::ifstream. How, exactly, do you suggest doing that? using something like f->eofbit(), or so? or what do you mean, exactly?

    Thanks! :)

    hansaa :)


    • Edited by hansaaa Tuesday, May 7, 2013 11:57 AM
    Tuesday, May 7, 2013 11:51 AM
  • Ok, so, I have tried the same, using just a char* pointer, no array. I thought, it might only put in one character from the getline function, but it actually put in the entire line, anyway. BUT, it still throws the debug assertion when deleting.

    _ASSERTE(_BLOCK_TYPE_IS_VALID(pHead->nBlockUse));

    so, I am not sure why this is happening.

    Why does it work if I have

    char *a = new char;
    //f->getline(a,10000);
    delete a;

    but not if I have

    char *a = new char;
    f->getline(a,10000);
    delete a;


    Tuesday, May 7, 2013 3:43 PM
  • Ok, so, I tried the std::getline (I assume you meant that?) to read the line straight into a std::string. Unfortunately, this is VERY slow, making the whole application more than ten times as slow.

    Probably due to the need to repeatedly expand the string.
    Try it by reserving a large amount of memory for the string
    when it's created: e.g. -

    string str;
    str.reserve(10000); // ****
    getline(*f, str);



    >    char a[10000];

        ...
        
    >    delete a;

    This is completely wrong. You can never delete an array that was not
    allocated with new, such as this local array. You only use delete with
    an object created via new, and delete[] on an object created with new[].
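
    The pairing rule can be summarised in a few lines, and for a line buffer the whole question can be sidestepped by letting std::vector<char> own the memory. A minimal sketch, assuming C++11 (for vector::data()) and a hypothetical helper name readLineIntoBuffer:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>
#include <vector>

// Pairing rules for reference:
//   char a[10000];             local array: released automatically, never delete it
//   char* p = new char;        single object: release with  delete p;
//   char* q = new char[100];   array:         release with  delete[] q;

// Reads one line through a buffer of `capacity` chars owned by a vector.
// Returns false (leaving `out` untouched) if the read fails or the line
// does not fit into the buffer.
bool readLineIntoBuffer(std::istream& in, std::size_t capacity, std::string& out)
{
    assert(capacity > 0);
    std::vector<char> buffer(capacity);   // freed automatically at scope exit
    if (!in.getline(buffer.data(), static_cast<std::streamsize>(capacity)))
        return false;
    out.assign(buffer.data());            // getline null-terminated the buffer
    return true;
}
```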

    If you're going to do this in the loop:

    f = new std::ifstream(fn);

    then you need to delete the previous ifstream instance created
    via new and pointed to by "f" before doing that:


    f->getline(a, 10000);

    ...

    delete f; // f NOT a
    f = new std::ifstream(fn);


    Again I ask: why are you creating a new ifstream instance
    over and over? You can read the same file repeatedly using
    a single ifstream object. You just need to be clear on:

    (1) How and when to check for read errors and EOF.

    (2) How and when to reset (clear) the status bits when needed.

    (3) How and when to use seekg().
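
    The three points above can be combined in one small helper. This is a minimal sketch with a hypothetical function name, readLineAt; it takes any std::istream, so it can be tried on a std::istringstream as well as on an ifstream that is opened once and reused.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>

// Rewinds `in` and reads the 1-based line `lineNo` into `out`.
// Returns false, leaving `out` untouched, if the stream cannot be rewound
// or has fewer than `lineNo` lines.
bool readLineAt(std::istream& in, std::size_t lineNo, std::string& out)
{
    in.clear();                      // (2) drop eofbit/failbit left by earlier reads
    in.seekg(0, std::ios::beg);      // (3) move the read position back to the start
    if (!in || lineNo == 0)
        return false;
    std::string text;
    for (std::size_t i = 0; i < lineNo; ++i) {
        if (!std::getline(in, text)) // (1) check every single read
            return false;
    }
    out = text;
    return true;
}
```

    Re-reading from the start for every row is still slow for a 2079-line file, so this mainly illustrates the mechanics of clear() and seekg().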

    - Wayne

    Tuesday, May 7, 2013 3:46 PM
  • Why does it work if I have

    char *a = new char;
    //f->getline(a,10000);
    delete a;

    but not if I have

    char *a = new char;
    f->getline(a,10000);
    delete a;

    In the first example you allocate enough space to hold one
    char. You do nothing with it, then you delete the allocation.

    In the second example you allocate enough space to hold one char.
    You then try to read more than one char into that allocation,
    causing a buffer overrun and memory overwrite.

    You don't seem to have mastered the fundamentals.

    - Wayne

    Tuesday, May 7, 2013 3:55 PM
  • basically, I was doing this, because I had to read the same file over and over, and I was under the (maybe, "stupid" ;)) assumption, that "getline" moves the "cursor", you could call it, to the next line, irreversibly. Therefore, I was using a new instance, to "reset" the "cursor". Might have been a bit hard-boiled^^.

    EOF, I think, I should be clear on.

    But how do I, set the "cursor" back to the start of the file, without the need for creating a new instance. Can I do this by resetting the status bits? SO that the NEXT "getline" call will do the VERY first line again?

    Also, seekg(), I am unclear. I will see if I can brush up on it over at "cplusplus.com". Unless you wish to provide a tiny "tutorial" ;) hehe ;)

    About the delete a;

    I was under the, I guess incorrect, assumption that an array, any array (not a std::vector, but an array declared as some_type a[n], where n is the size of the array), was basically something like a pointer to the first element, so I could delete it, and that would then delete the entire array. So, I take it, that is not actually the case. Thanks for clearing that up for me, that is actually really helpful! :)

    Taking this useful item of new information, is it then correct that a char array, statically allocated, can not leak memory (unless, you want to define that "leaking memory" could also be, if you just use too much memory, such, as declaring an array that would be way bigger than the data you would be getting)?

    And then, by the memory leak, you meant the leak caused by the "ifstream *f", not something caused by the array?

    Oh, and finally, thanks for the tip of reserving memory for the string. it does make it quicker again, but, still, not as quick as using the char[].

    In general, I thank thee for thy help so far! hehe :) it has been really great, and I am glad that SOMEONE is helping me :) hehe :) :)

    Tuesday, May 7, 2013 4:11 PM
  • great! thanks! it seems to work as expected now! I thank thee a lot! hehe ! You have been a GREAT help! :)
    Tuesday, May 7, 2013 4:44 PM
  • Why does it work if I have

    char *a = new char;
    //f->getline(a,10000);
    delete a;

    but not if I have

    char *a = new char;
    f->getline(a,10000);
    delete a;

    In the first example you allocate enough space to hold one
    char. You do nothing with it, then you delete the allocation.

    In the second example you allocate enough space to hold one char.
    You then try to read more than one char into that allocation,
    causing a buffer overrun and memory overwrite.

    You don't seem to have mastered the fundamentals.

    - Wayne

    Well, ONE last question then.

    You are correct, and that IS exactly what I thought (that I had only allocated enough memory for one char). But then, reading in AND printing the ENTIRE line worked with just this one memory allocation, so I thought to myself, "well, maybe, when reading the line into the char* pointer using getline, it dynamically allocates more memory, and the char* pointer "knows" where it ends by a null-termination character". Again, this is because I was surprised that I could get and display the entire line, even though I had only allocated one char*. So, is it then correct that even if it can get and display the characters correctly, that does NOT mean that a buffer overrun has NOT occurred?

    You are probably right, I haven't mastered all the fundamentals yet, specifically because I am not always sure how C++ allocates dynamic memory. However, I would say I do know a fair bit ;)

    Anyway, thank you again for your help, it was really great!

    Tuesday, May 7, 2013 4:54 PM
  • >But then, reading in AND printing the ENTIRE line, worked,
    >with just this one memory allocation,

    Read your own posts again - you said it worked as long as you
    *weren't* trying to read into that variable. But when you added
    the getline() it didn't "work".

    >this is because I was surprised that I could get and display
    >the entire line, event though I had only allocated one char*.

    You allocated space for one char, not one char pointer.

    Dynamic allocation for multiple chars will only return
    one *pointer* to the allocation as well.

    char *p1 = new char; // allocate space for one char

    char *p2 = new char[10]; // allocate space for ten chars

    In both cases only a single pointer is used.

    Compare to:

    char **pp = new char*[5]; // allocate space for 5 char pointers

    >so, is it, then, correct, that even if it can get,
    >and display the characters correctly, that does NOT
    >mean that a buffer overrun has NOT occurred?

    That's right - and that's one of the most frequent causes
    of obscure bugs and strange program behaviour. When you
    overrun a buffer and overwrite memory nothing obviously
    bad *may* happen *at that instant*. Depending on what gets
    overwritten that shouldn't, the damage may not become
    manifest until much later in the program's execution.
    In some cases it may not become apparent at all - until
    you make later changes to the program which alter the
    layout of the code and data in memory. Then the program
    breaks even though the change looks valid - because there
    was a bug from the start.

    It all depends on where those extra chars are being written.
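
    To keep those extra chars from being written in the first place, the count passed to istream::getline has to match the real size of the buffer. A minimal sketch, assuming a hypothetical helper readLineFixed that deduces the size from the array itself. When the line is longer than the buffer, getline stores what fits, sets failbit, and the function reports failure instead of overrunning.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>

// Reads one line into a fixed-size array.
// N is deduced from the array type, so the count given to getline can never
// be larger than the buffer. Returns false if the line did not fit or
// nothing could be read.
template <std::size_t N>
bool readLineFixed(std::istream& in, char (&buffer)[N])
{
    in.getline(buffer, static_cast<std::streamsize>(N));
    return !in.fail();
}
```

    With a call such as f->getline(a, 10000) on a buffer that holds a single char, nothing stops the library from writing up to 10000 bytes past the allocation, which is the overrun described above.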

    - Wayne
    Tuesday, May 7, 2013 5:23 PM