Handling csv data as 2D vector - C++: from vector of string vector to vector of struct

Question
-
Hi
Based on a sample study from a previous thread (https://social.msdn.microsoft.com/Forums/en-US/49313508-ed2b-449e-bff6-08c64a48a213/c-error-out-of-range-vector-subscript-with-struct-type?forum=vcgeneral),
I am wondering if there is a way to handle a data.csv as a 2D vector using a struct type. In the code below I found it easier to use a vector of string vectors, but to perform calculations on the elements (see the std::cout statement) one has to convert from string to double every time, which is tedious.
By contrast, with a vector of structs the conversion would be unnecessary, since each field's type (e.g. double) is part of the struct.
My problem with a vector of structs is indexing: it does not seem possible to call, for instance, test.push_back(struct) and then retrieve values as test[0][0]. Following attempts, I am eventually realizing that struct can only be indexed at a single-vector dimension such as iden[j].grade, etc.
I look forward to hearing your ideas on this.
Best,
#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

struct Identity
{
    int ID;
    std::string name;
    std::string surname;
    double grade;
};

Identity id;
std::vector<Identity> iden; // vector of struct type
std::vector<std::string> col;
std::vector<std::vector<std::string>> table; // substitute to vector of struct type
std::string filedir = "C:\\local\\";
std::string extension = "samplet.csv";
std::string samplepath = filedir + extension;
std::string line;
std::string field;

int main()
{
    std::ifstream test;
    test.open(samplepath);
    if (!test.good())
    {
        exit(1); // terminate
    }
    // split every line on ',' and store the fields as one row of strings
    while (std::getline(test, line))
    {
        col.clear();
        std::stringstream ss(line);
        while (std::getline(ss, field, ','))
        {
            col.push_back(field);
        }
        table.push_back(col);
    }
    test.close();
    // every calculation needs an explicit string-to-double conversion
    std::cout << std::stod(table[0][3]) * std::stod(table[1][3]) << std::endl;
    system("pause");
    return 0;
}
- Edited by itneophyte85 Monday, November 9, 2015 3:57 PM
Monday, November 9, 2015 3:55 PM
Answers
-
I am eventually realizing that struct can only be indexed at a single-vector dimension such as iden[j].grade, etc.
No, that's not true.
#include <iostream>
#include <string>
#include <vector>

struct Identity
{
    int ID;
    std::string name;
    std::string surname;
    double grade;
};

Identity id;

// vector of struct type
std::vector<Identity> iden;

// vector of vectors of struct type
std::vector<std::vector<Identity>> vviden;

using namespace std;

int main()
{
    id.ID = 1234;
    id.name = "Given";
    id.surname = "Family";
    id.grade = 65.5;
    iden.push_back(id);
    // add more structs to the vector ...
    vviden.push_back(iden); // push this vector of structs
    // repeat as needed ...

    // show first struct in first vector of structs in vector of vectors
    cout << vviden[0][0].grade << endl;
    return 0;
}
But why do you feel you need a 2-dim vector? If you have a vector of vectors of struct,
we know that a vector of structs holds the data for all persons (lines) from the file.
What does the extra dimension represent? Are you planning on storing the data from
multiple files?
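If the outer dimension does stand for one CSV file each, the idea could be sketched roughly as below. This is only an illustration under assumptions: the helper names readTable and readTables are hypothetical, every line is assumed to have the form ID,name,surname,grade, and a file that cannot be opened is represented by an empty inner vector.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Identity {
    int ID;
    std::string name;
    std::string surname;
    double grade;
};

// Hypothetical helper: read "ID,name,surname,grade" lines from one stream.
// Assumes well-formed input; std::stoi/std::stod throw on bad fields.
std::vector<Identity> readTable(std::istream& in) {
    std::vector<Identity> rows;
    std::string line, field;
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // skip blank lines
        std::istringstream ss(line);
        Identity id;
        std::getline(ss, field, ',');
        id.ID = std::stoi(field);
        std::getline(ss, id.name, ',');
        std::getline(ss, id.surname, ',');
        std::getline(ss, field, ',');
        id.grade = std::stod(field);
        rows.push_back(id);
    }
    return rows;
}

// Hypothetical helper: one inner vector per file, in the order of `paths`.
// A file that cannot be opened contributes an empty inner vector, so that
// tables[f] always corresponds to paths[f] (an assumption of this sketch).
std::vector<std::vector<Identity>> readTables(const std::vector<std::string>& paths) {
    std::vector<std::vector<Identity>> tables;
    for (const std::string& path : paths) {
        std::ifstream in(path);
        tables.push_back(in ? readTable(in) : std::vector<Identity>());
    }
    return tables;
}
```

With this layout, tables[f][r].grade would be the grade on row r of file f, so the first index selects the file and the second the person.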
- Wayne
Monday, November 9, 2015 7:04 PM
All replies
-
It would help if you told us what you are trying to do. Are you trying to work with a varying number of grades per student (as implied by your vector code) or a single grade per student (as implied by your structure definition)? After the name and surname, are all the fields in a csv line numeric? Is there a maximum number of grades for a student? Do you need to access any (all) grades for a student or just the first N? Are there multiple records for a student?
You can use a vector of struct. Define table as vector<Identity> and process ss something like
getline(ss, field, ',');
id.ID = stoi(field);
getline(ss, id.name, ',');
getline(ss, id.surname, ',');
getline(ss, field, ',');
id.grade = stod(field);
table.push_back(id);
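Put together as a compilable unit, the snippet above might look like the following sketch. The function name parseIdentity is hypothetical, and the code assumes each line holds exactly the four fields ID,name,surname,grade with numeric text in the first and last position.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct Identity {
    int ID;
    std::string name;
    std::string surname;
    double grade;
};

// Hypothetical helper: turn one CSV line into an Identity.
// The numeric fields are converted exactly once, here; std::stoi and
// std::stod throw if a field is not numeric (no recovery in this sketch).
Identity parseIdentity(const std::string& line) {
    std::istringstream ss(line);
    std::string field;
    Identity id;
    std::getline(ss, field, ',');
    id.ID = std::stoi(field);
    std::getline(ss, id.name, ',');     // strings are read straight into the member
    std::getline(ss, id.surname, ',');
    std::getline(ss, field, ',');
    id.grade = std::stod(field);
    return id;
}
```

After table.push_back(parseIdentity(line)) for each line, the product from the question would be written as table[0].grade * table[1].grade, with no conversion at the point of use.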
Since it makes no sense to compare a name and a surname or grade, the loss of the second index is a non-issue. If you need to have multiple grades for a student, change member grade to a vector<double> and use a loop similar to
id.grade.clear();
while (getline(ss, field, ','))
id.grade.push_back(stod(field));
just before the table.push_back. This will allow you to process table[i].grade[j].
Alternately, consider parallel vectors: vector<int> ID, vector<string> name, vector<string> surname, and vector<vector<double>> table to hold your database and vector<double> grades to collect the input. Your ss processing code would look something like
getline(ss, field, ',');
ID.push_back(stoi(field));
getline(ss, field, ',');
name.push_back(field);
getline(ss, field, ',');
surname.push_back(field);
grades.clear();
while (getline(ss, field, ','))
grades.push_back(stod(field));
table.push_back(grades);
and you would never have to call stod again.
Monday, November 9, 2015 6:59 PM -
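The variant with several grades per student, described in the reply above, could be sketched as follows. It assumes the member grade has been changed to a vector<double> as suggested and that every field after the surname is numeric; the function name parseIdentity is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct Identity {
    int ID;
    std::string name;
    std::string surname;
    std::vector<double> grade;  // one entry per remaining field on the line
};

// Hypothetical helper: parse "ID,name,surname,g1,g2,..." into an Identity.
// Assumes every field after the surname is numeric; std::stod throws otherwise.
Identity parseIdentity(const std::string& line) {
    std::istringstream ss(line);
    std::string field;
    Identity id;
    std::getline(ss, field, ',');
    id.ID = std::stoi(field);
    std::getline(ss, id.name, ',');
    std::getline(ss, id.surname, ',');
    // consume whatever is left on the line as grades
    while (std::getline(ss, field, ',')) {
        id.grade.push_back(std::stod(field));
    }
    return id;
}
```

With a std::vector<Identity> table filled this way, table[i].grade[j] is already a double, and table[i].grade.size() gives the number of grades on that line, which may differ from line to line.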
"It would help if you told us what you are trying to do. Are you trying to work with a varying number of grades per student (as implied by your vector code) or a single grade per student (as implied by your structure definition)? After the name and surname, are all the fields in a csv line numeric. Is there a maximum number of grades for a student? Do you need to access any (all) grades for a student or just the first N? Are there multiple records for a student?" { Barry-Schwarz}
OP: Thanks for the input. As explained to Wayne before, I am gradually moving my MATLAB programs (once backtested) to C++, which I find challenging. The sample C++ code shown here is very far from the automated code I developed on real data (usually 100 x 50), but it has the virtue of attracting well-explained feedback from the C++ experts on this forum on topics unknown to me. Once I have grasped the logic, I can complete it and deploy it at a larger scale and for any purpose. To summarize, my data might cover 100 identifiers (ids) with 50 intrinsic parameters for each id. I then perform calculations based on multiple data.csv files. Each id also has historical data/parameters stored in my SQL database. So my calculations involve querying data, performing matrix calculations, etc.: a BIG and ENJOYABLE MESS, where deploying a single program in MATLAB took me 4-8 weeks (non-stop). Coming back to your questions, I have 50 fields, 100 ids, and a maximum of 1000 grades per id. In my original program, depending on the calculation chosen by the user (via a user prompt), either the first 500 grades or all 1000 are queried.
I hope this makes it clearer why I had to simplify so drastically with the student case; otherwise it could cause serious confusion. Best,
- Edited by itneophyte85 Monday, November 9, 2015 9:17 PM
Monday, November 9, 2015 8:55 PM -
"No, that's not true." {Wayne}
OP: Fair enough.
"But why do you feel you need a 2-dim vector? If you have a vector of vectors of struct,
we know that a vector of structs holds the data for all persons (lines) from the file.
What does the extra dimension represent? Are you planning on storing the data from
multiple files?"
OP: A 2-dim vector because I am a MATLAB thinker, so I see everything in terms of matrices :). Seriously, in MATLAB I usually import my data as a cell array, dataset array or matrix (depending on the data types), so handling a 2-dim vector in C++ makes my life, a priori, easier when porting my MATLAB code, since I am carrying over a logic I am familiar with.
Coming back to your questions: in my 'real' MATLAB program, I use 4-5 data.csv files + SQL.
A vector of vectors of structs seems to be the solution; thanks for the idea. I will investigate.
Cheers Wayne,
Monday, November 9, 2015 9:10 PM