Efficient way to write huge boost dynamic_bitset vector to a file and read it back in C++

  • Question

  • I have a huge vector of boost dynamic_bitset. I want to write the dynamic_bitset vector to a file and later read the file back into a dynamic_bitset vector. Is the memory for a dynamic_bitset allocated as one contiguous block (so that I can write the entire vector at once without traversing it)?

    The size of the bitset vector is on the order of millions of bits, so I am looking for an efficient way to write it to a file without iterating through the individual elements.

    I converted the dynamic_bitset to a string and then wrote the string to a file. Later read the string from the file and converted it back to dynamic_bitset.

    Below is the code I wrote in C++ using Visual Studio:

    #include "stdafx.h"
    #include <cstdlib>   // system
    #include <cstring>   // strlen
    #include <iostream>
    #include <fstream>
    #include <string>
    #include <boost/dynamic_bitset.hpp>
    using namespace std;
    
    int main(int argc, char* argv[])
    {
      // Initializing a bitset vector to 0s
      boost::dynamic_bitset<> bit_vector(10000000);
      bit_vector[0] = 1;   bit_vector[1] = 1;  bit_vector[4] = 1;  bit_vector[7] = 1;  bit_vector[9] = 1;
      cout<<"Input :"<<bit_vector<<endl; //Prints in reverse order
    
      //Converting dynamic_bitset to a string
      string buffer;
      to_string(bit_vector, buffer);
    
      //Writing the string to a file
      ofstream out("file", ios::out | ios::binary);  
      char *c_buffer = (char*)buffer.c_str();   
      out.write(c_buffer, strlen(c_buffer));
      out.close();
    
      //Find length of the string and reading from the file
      int len = strlen(c_buffer);
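      // Note: new char(len+1) allocates ONE char initialized to len+1, not an
      // array (that would be new char[len+1]), so the read below overruns it.
      char* c_bit_vector = new char(len+1);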
      ifstream in;
      in.open("file", ios::binary);
      in.read(c_bit_vector, len);
      c_bit_vector[len] = 0;
      in.close();
    
      //Converting string back to dynamic_bitset
      string str2 = c_bit_vector;
      boost::dynamic_bitset<> output_bit_vector( str2 );
      cout<<"Output:"<<output_bit_vector<<endl;
    
      system("PAUSE");
      return 0;
    }

    But even this method, storing it as a string, takes a long time to write to the file. And when I try to read back from the file into the string, I get an "unhandled access violation exception". Is there a more efficient way to implement the same?
    Friday, July 6, 2012 3:13 PM

Answers

  • If you need to perform linear algebra operations, consider using Boost::uBLAS

    http://www.boost.org/doc/libs/1_50_0/libs/numeric/ublas/doc/index.htm

    Otherwise, stick to boost::dynamic_bitset and increase the block size by changing the block type from 'unsigned long' (the default, 4 bytes in Visual C++) to 'unsigned long long' (8 bytes). That halves the number of blocks, which may speed up block-wise operations and I/O. For example: boost::dynamic_bitset<unsigned long long> bv;

    OR

    Create a new custom vector type for your requirements.
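
    As a rough illustration of the "custom vector type" option, here is a minimal sketch using only the standard library. The class name BitVector, its member names, and the file layout (a 64-bit bit count followed by raw 64-bit words) are all assumptions made for this example, not part of Boost. The file is written in the host's byte order, so it is only meant to be read back on the same kind of machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical minimal bit vector: bits are packed into 64-bit words that
// live in one contiguous std::vector, so the whole payload can be written
// or read with a single call.
class BitVector {
public:
    explicit BitVector(std::size_t num_bits = 0)
        : num_bits_(num_bits), words_((num_bits + 63) / 64, 0) {}

    std::size_t size() const { return num_bits_; }

    void set(std::size_t i) { words_[i / 64] |= std::uint64_t(1) << (i % 64); }
    bool test(std::size_t i) const {
        return ((words_[i / 64] >> (i % 64)) & 1u) != 0;
    }

    // Word-wise AND / OR. Both operands are assumed to have the same size.
    BitVector& operator&=(const BitVector& rhs) {
        for (std::size_t k = 0; k < words_.size(); ++k) words_[k] &= rhs.words_[k];
        return *this;
    }
    BitVector& operator|=(const BitVector& rhs) {
        for (std::size_t k = 0; k < words_.size(); ++k) words_[k] |= rhs.words_[k];
        return *this;
    }

    // Assumed file layout: bit count (uint64), then the raw words.
    bool save(const std::string& path) const {
        std::ofstream out(path.c_str(), std::ios::out | std::ios::binary);
        if (!out) return false;
        const std::uint64_t n = num_bits_;
        out.write(reinterpret_cast<const char*>(&n), sizeof(n));
        out.write(reinterpret_cast<const char*>(words_.data()),
                  static_cast<std::streamsize>(words_.size() * sizeof(std::uint64_t)));
        return out.good();
    }

    bool load(const std::string& path) {
        std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
        if (!in) return false;
        std::uint64_t n = 0;
        in.read(reinterpret_cast<char*>(&n), sizeof(n));
        if (!in) return false;
        num_bits_ = static_cast<std::size_t>(n);
        words_.assign((num_bits_ + 63) / 64, 0);
        in.read(reinterpret_cast<char*>(words_.data()),
                static_cast<std::streamsize>(words_.size() * sizeof(std::uint64_t)));
        return in.good() || (words_.empty() && !in.bad());
    }

private:
    std::size_t num_bits_;
    std::vector<std::uint64_t> words_;
};
```

    This gives up everything else dynamic_bitset offers (shifts, count, find_first and so on), so it is probably only worth it if AND/OR and bulk file I/O are really all that is needed.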

    • Proposed as answer by Helen Zhao Monday, July 9, 2012 6:01 AM
    • Marked as answer by Helen Zhao Friday, July 13, 2012 4:52 AM
    Friday, July 6, 2012 5:51 PM

All replies

  • Hi

    dynamic_bitset is not a container and doesn't provide iterators, and you cannot access its bit data directly.

    You can use Boost's to_block_range and from_block_range functions instead. That should be fast enough :) Something like:

        dynamic_bitset<> bv(10000000);
        bv[0] = 1;   bv[1] = 1;  bv[4] = 1;  bv[7] = 1;  bv[9] = 1;
    
        // SAVE
        {
            std::vector<dynamic_bitset<>::block_type> v(bv.num_blocks());
            to_block_range(bv, v.begin());
    
            ofstream ofs("my_bitset.dat", ios::out | ios::binary);
            ofs.write(reinterpret_cast<char*>(&v[0]), v.size() * sizeof(dynamic_bitset<>::block_type));
            ofs.close();
        }
    

    And consider saving the number of blocks (or bits) before saving the data. That will help you to construct the dynamic_bitset of the required size before reading data.

    Friday, July 6, 2012 4:46 PM
  • Thanks for your suggestion. I initially started with vector<bool>. But I would be performing bit operations like AND, OR between two or more vectors. It was easier to perform those operations in boost's dynamic_bitset.

    My two main concerns are:

    The vectors are huge and their values have to be stored in a file, and

    I need to perform bit operations on these vectors. Please let me know if there is a better approach.

    Friday, July 6, 2012 5:29 PM