SQL advice needed.

  • Question

  • I need a tool that looks up every line of file1 in file2 (something like grep -f file1 file2), but it has to be really fast for big files. I expect file1 to change for every lookup and to have up to a few million lines. file2 will stay the same apart from occasional updates and may contain over a few billion lines (a file size of around 150GB).


    file1 format: item
    file2 format: item:value


    I heard that a program using a hash map would be the fastest option, but apparently I'd need a lot of RAM for it. So I'd like to do this with a NoSQL database such as Apache Cassandra. I'd need a table with 2 columns, item and value, and a program using the API that would take all lines from file1, look each one up in the table, and return item:value for every item that matches.
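
To make the lookup I have in mind concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a finished tool: the function name `lookup` is hypothetical, items are assumed to contain no ":" themselves, and matching is exact on the item (unlike grep -f, which matches substrings). It keeps file1 in a set, since a few million items should fit in RAM, and streams file2 once, so each lookup would still cost a full read of file2.

```python
import io
from typing import Iterable, Iterator, TextIO


def lookup(items: Iterable[str], table: TextIO) -> Iterator[str]:
    """Yield every 'item:value' line of `table` whose item appears in `items`."""
    # Hash the small side (file1) so only it has to live in memory.
    wanted = {line.strip() for line in items if line.strip()}
    for line in table:
        # Split on the first ':' only, so values may themselves contain ':'.
        key, sep, _value = line.partition(":")
        if sep and key in wanted:
            yield line.rstrip("\n")


# Tiny in-memory stand-ins for file1 and file2 (made-up sample data).
file1 = io.StringIO("apple\ncherry\n")
file2 = io.StringIO("apple:1\nbanana:2\ncherry:3\n")
result = list(lookup(file1, file2))
print(result)  # ['apple:1', 'cherry:3']
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` handles. A database with an index on item would avoid the full scan of file2, which is why I am asking about one.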


    I need it to be really fast, so I'd like to know what speeds I can expect. Is disk speed the only important factor?
    Wednesday, July 3, 2019 7:32 PM

All replies