Database structure problem, too big

  • I don't know the best way to structure my database.

    I have a data feed that comes in CSV format; it's financial data about 65,000 different mutual funds with daily prices. In each CSV I get, each mutual fund is listed once with its price for the current day.

    What I do first is import the CSV. What I want then is a table for each mutual fund, with that fund's prices and dates inside it. This means I have to create 65,000 tables in a separate database, then every day read from the imported CSV file and write to those tables. The problem is that this AJAX query takes almost an hour each time (for each imported CSV, in other words each day).

    The logic behind this is that when I want to forecast data for an individual fund later, I can just go to that fund's table, query the data out, and do what I want with the prices for that fund over a range of time, which is the whole point.

    The alternative approach, I think, is to skip creating the 65,000 tables, just keep the imported CSV tables, and draw data from them on demand. To do that, though, I would have to join all my imported CSV tables together for each query before grabbing fund data, because I'd want data over the course of many months, even years, for a particular fund.

    Which approach is the most efficient at runtime? The way I have it now, with the 65,000 tables, I have an index on the date column in each one. I'm thinking that if I went with the other option and just queried the imports, I'd need some dynamic way to index them once they're joined so I can use the date column effectively. I'd need them all indexed by fund name and then by date in order to efficiently pull data from a query that would span billions or maybe trillions of rows. Is there a dynamic way to create indexes when you join tables?

    Your thoughts are much appreciated. Thanks.
    Friday, September 6, 2013 4:42 AM

All replies

  • You definitely do not need to create 65K tables... I do not know your business requirements, but perhaps you can import the file into SQL Server and either join that table with another or split it into two or three tables based on the business logic...
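
    For example, a minimal sketch of that kind of import (the staging table, column layout, and file path below are only placeholder assumptions, not a finished design):

        -- One-time setup: a staging table that mirrors the daily CSV layout.
        CREATE TABLE dbo.DailyPriceStaging
        (
            FundName  varchar(100)  NOT NULL,
            PriceDate date          NOT NULL,
            Price     decimal(18,4) NOT NULL
        );

        -- Daily load: pull the whole CSV into the staging table in one pass.
        BULK INSERT dbo.DailyPriceStaging
        FROM 'C:\feeds\prices_2013-09-06.csv'   -- placeholder path
        WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

    From the staging table, the rows can then be joined with, or split into, whatever permanent tables fit the business logic, and the staging table truncated before the next day's file.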

    Best Regards, Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Blog: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance

    Friday, September 6, 2013 4:58 AM
    Moderator
  • I don't know exactly what the business logic behind the 65K funds is, but creating 65K tables does not look like a good idea. What I would try to do is create a single table for all of these 65K funds and then distinguish each fund through the primary key (e.g. an extra column where each mutual fund has its own ID). Does this suggestion make sense?
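
    For instance, a rough sketch of that single-table idea (the table name, column names, and ID values below are just placeholders):

        -- One table holds every fund; the composite primary key
        -- (FundID, PriceDate) allows one row per fund per day and
        -- gives a clustered index suited to per-fund, date-range queries.
        CREATE TABLE dbo.FundPrice
        (
            FundID    int           NOT NULL,
            PriceDate date          NOT NULL,
            Price     decimal(18,4) NOT NULL,
            CONSTRAINT PK_FundPrice PRIMARY KEY (FundID, PriceDate)
        );

        -- Pulling one fund's history over a date range then looks like:
        SELECT PriceDate, Price
        FROM dbo.FundPrice
        WHERE FundID = 12345                       -- hypothetical fund ID
          AND PriceDate BETWEEN '20120101' AND '20130906'
        ORDER BY PriceDate;

    A small lookup table that maps each fund name from the CSV to its FundID could sit alongside this.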


    JODASK

    Friday, September 6, 2013 8:01 AM
  • I suggest you hire some expertise. Dealing with large amounts of data (65K funds times roughly 250 business days per year is about 16 million rows per year) is a task that requires an advanced level of SQL Server knowledge, as well as a good understanding of both the source data and how it will be used (what we generally call data analysis).

    And since funds come into and go out of existence all the time, your idea of using one table per fund will be difficult to implement and difficult to use - EXTREMELY difficult (but not impossible).

    Friday, September 6, 2013 1:56 PM
    Answerer