Relatively big data challenge

  • Question

  • Hi,
    I'm tasked with developing a gaming web application system with a relatively large database. It should hold data about 300 million players, each player's orders/payments (~3B records), and other info about the players. Note that the Players entity is subject to frequent reads and writes: every operation a user performs (login, purchase, starting to play) requires loading his info and validating that he's authorized.
    I have a few questions in this regard:
    1. Is there a DB technology that can simply store 300M records and return a single indexed record in a very short time (e.g. 0.3 seconds)? By "simply" I mean without special techniques like sharding.
    2. Are NoSQL DBs REALLY ready for the mission in terms of reliability, supporting tools, simplicity of maintenance, etc.?
    3. I thought of splitting the entire dataset into multiple smaller databases (~1 million records each), where each DB represents a subset of the players (e.g. players with IDs 1-100000). Some component would then hold the mapping of player ID to DB (the user with ID 33443 belongs to DB xxx); this way I keep each DB lean. Whenever cross-DB data is needed (e.g. the top-ranked player among all players in the system), I will keep those queries ready in a separate DB. I would love to hear your feedback; a rough sketch of this mapping appears after the questions.

    4. Kindly recommend any reading, technologies, best practices... :)
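
    To illustrate question 3, here is a rough sketch of the mapping component I have in mind (all table/column/database names are placeholders, not a final design):

        -- Hypothetical routing table: maps a range of player IDs to the
        -- database that stores those players.
        CREATE TABLE PlayerShardMap (
            RangeStartId BIGINT  NOT NULL,
            RangeEndId   BIGINT  NOT NULL,
            DatabaseName SYSNAME NOT NULL,
            CONSTRAINT PK_PlayerShardMap PRIMARY KEY (RangeStartId)
        );

        -- e.g. players with IDs 1-100000 live in PlayersDB_001
        INSERT INTO PlayerShardMap VALUES (1, 100000, N'PlayersDB_001');

        -- Which database holds player 33443?
        SELECT DatabaseName
        FROM PlayerShardMap
        WHERE 33443 BETWEEN RangeStartId AND RangeEndId;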

    cheers,
    Yonatan


    Thursday, May 7, 2015 9:45 AM

Answers

  • What you describe is not actually a very big database physically. Although it will have many rows, the columns should be small.

    SQL Server and Oracle have many methods of handling a database this size. You can use multiple files, table partitioning, columnstore indexes, etc. to resolve many performance issues.
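
    For instance, the ~3B-row orders table could be partitioned by date so that maintenance and scans touch only recent data (a hypothetical sketch; boundaries and names are illustrative):

        -- Hypothetical monthly partitioning for the ~3B-row orders table.
        CREATE PARTITION FUNCTION pfOrdersByMonth (DATE)
            AS RANGE RIGHT FOR VALUES ('2015-01-01', '2015-02-01', '2015-03-01');

        CREATE PARTITION SCHEME psOrdersByMonth
            AS PARTITION pfOrdersByMonth ALL TO ([PRIMARY]);

        -- The partitioning column must be part of the clustered key.
        CREATE TABLE dbo.Orders (
            OrderId   BIGINT        NOT NULL,
            PlayerId  BIGINT        NOT NULL,
            OrderDate DATE          NOT NULL,
            Amount    DECIMAL(12,2) NOT NULL,
            CONSTRAINT PK_Orders PRIMARY KEY (OrderId, OrderDate)
        ) ON psOrdersByMonth (OrderDate);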

    Your database is going to be disk bound. You need to spend money on fast drives and I/O. RAM will offset some of that through caching, but individual users will rarely be cached in the model you describe and will need to be retrieved from disk.

    Thursday, May 7, 2015 2:24 PM
    Answerer
  • I agree that a scale-up SQL Server can handle this workload, if carefully built. You can also add a readable secondary with AlwaysOn AGs to help scale reads.

    You can also scale out, with some help from the application.

    Microsoft Azure SQL Database has a growing set of tools for building and managing applications that use an "elastic pool" of databases instead of a single large database.

    Get started with elastic database tools

    David http://blogs.msdn.com/b/dbrowne/


    Thursday, May 7, 2015 2:33 PM

All replies

  • Hi Yoni (shalom)

    >>1. Is there a DB technology that can simply store 300M records and return a single indexed record in a very short time (e.g. 0.3 seconds)? By "simply" I mean without special techniques like sharding.

    Yes, sure; you just need properly defined indexes.
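
    For example, a minimal sketch (assuming a Players table keyed on PlayerId; names are illustrative):

        -- A clustered primary key on the player ID makes the per-request
        -- lookup a single index seek, even with 300M rows.
        CREATE TABLE dbo.Players (
            PlayerId BIGINT        NOT NULL,
            UserName NVARCHAR(100) NOT NULL,
            IsActive BIT           NOT NULL,
            CONSTRAINT PK_Players PRIMARY KEY CLUSTERED (PlayerId)
        );

        -- Point lookup: an index seek, typically a handful of page reads,
        -- comfortably under the 0.3-second target.
        SELECT PlayerId, UserName, IsActive
        FROM dbo.Players
        WHERE PlayerId = 33443;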

    >>2. Are NoSQL DBs REALLY ready for the mission in terms of reliability, supporting tools, simplicity of maintenance, etc.?

    What do you mean by NoSQL DB? This is a SQL Server forum...

    >>3. I thought of splitting the entire dataset into multiple smaller databases (~1 million records each), where each DB represents a subset of the players (e.g. players with IDs 1-100000). Some component would then hold the mapping of player ID to DB (the user with ID 33443 belongs to DB xxx); this way I keep each DB lean. Whenever cross-DB data is needed (e.g. the top-ranked player among all players in the system), I will keep those queries ready in a separate DB. I would love to hear your feedback.

    No; that means more maintenance (optimization, backups, etc.). I would go with a single database. It is possible to create secondary filegroups to keep heavily used tables separate in order to increase performance, but at this stage I think you need a properly designed/normalized database.
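
    As a rough illustration of the filegroup idea (database name, file path, and size are placeholders):

        -- Add a secondary filegroup on separate, faster disks; heavily
        -- used tables can then be created ON FG_Hot.
        ALTER DATABASE GamingDB ADD FILEGROUP FG_Hot;

        ALTER DATABASE GamingDB ADD FILE (
            NAME = N'GamingDB_Hot1',
            FILENAME = N'E:\Data\GamingDB_Hot1.ndf',
            SIZE = 10GB
        ) TO FILEGROUP FG_Hot;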


    Best Regards, Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, May 7, 2015 10:45 AM
  • >>1. Is there a DB technology that can simply store 300M records and return a single indexed record in a very short time (e.g. 0.3 seconds)? By "simply" I mean without special techniques like sharding.

    Enterprise-class database systems like SQL Server are certainly capable of supporting a high-volume OLTP workload without resorting to techniques like "sharding". However, the onus is on you to provide a good database design, efficient queries, and adequate hardware resources. With relatively large tables and a highly transactional system, it is very important to have either a lot of memory (ideally enough for the frequently used data to stay memory resident) or a fast I/O subsystem (e.g. local SSD) to avoid I/O waits and reduce latency. You will also need many CPUs (cores) to handle a lot of concurrent users.
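
    For example, the per-request authorization check described in the question can be served by a covering index, so each login is a single seek answered entirely from the index (a sketch; table and column names are assumptions):

        -- Covering index for the login/authorization lookup: the INCLUDE
        -- columns let the query be answered from the index alone.
        CREATE NONCLUSTERED INDEX IX_Players_Login
            ON dbo.Players (UserName)
            INCLUDE (PlayerId, PasswordHash, IsActive);

        -- @UserName comes from the application.
        SELECT PlayerId, PasswordHash, IsActive
        FROM dbo.Players
        WHERE UserName = @UserName;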


    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    Thursday, May 7, 2015 12:05 PM
    Answerer
  • Thanks Uri and Dan.

    Frankly, a few DB experts I spoke with were a bit skeptical about the feasibility of a traditional data model coping with these numbers. They suggested some logical separation of the DB. It might be that this advice was inspired by the trend of moving to new database models and technologies.

    Do I understand correctly that the single-DB approach (your recommendation) is limited to vertical scaling (e.g. more server resources) only? Should I opt for SQL Server, it's important for me to back up my decision with use cases. Are there publicly available use cases/benchmarks that deal with this question?

    Thanks again.



    Thursday, May 7, 2015 12:17 PM
  • I have no experience with MySQL, but Oracle can also handle such amounts of data and frequently running transactions. Do you plan to install the SQL Server instance on the client's machine? I have a client who had a 1TB database and performed searches on some fields with a requirement to return one or two records in less than 1 second... It worked very well.

    >>Are there publicly available use cases/benchmarks that deal with this question?

    Try searching on Google... I think that should be combined with a strong server (fast disks, CPU, memory), a properly designed database, and very well written queries (no UDFs, no cursors).
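
    To illustrate the "no UDFs" point, a small sketch (fn_IsVip and TotalSpend are hypothetical, used only to show the contrast):

        -- Avoid: a scalar UDF in the WHERE clause runs row by row and
        -- hides the predicate from the optimizer.
        SELECT PlayerId
        FROM dbo.Players
        WHERE dbo.fn_IsVip(PlayerId) = 1;

        -- Prefer: express the same rule inline so the query stays
        -- set-based and can use an index.
        SELECT PlayerId
        FROM dbo.Players
        WHERE TotalSpend > 10000;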


    Best Regards, Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, May 7, 2015 12:39 PM
  • You've been tasked with a challenging job. As Guzman said, you need a lot, really a lot, of memory and a lot of CPU. The server side is fundamental for your task, but at the same time you must apply the best schema design, bearing in mind the importance of writes and, even more, of read performance (gaming needs to serve information live without any latency).
    Thursday, May 7, 2015 12:42 PM
  • You could also think of a polyglot approach, storing some of the data in SQL Server and some in a NoSQL DB.

    Mainly, usage data of a WORM (write once, read many) nature, like order details, scores, and history of actions, can be stored and retrieved easily from a NoSQL DB.

    Player balances, details, and the like might need transactional guarantees; store those in SQL Server. I think if you do a breakdown you will find that only a small part of your schema needs these capabilities.
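
    For example, the balance update that does need ACID guarantees stays in SQL Server (a sketch; table and column names are assumptions, and @PlayerId/@Price come from the application):

        -- Atomic purchase: debit the balance and record the order in one
        -- transaction; the WORM order history could be mirrored to the
        -- NoSQL store afterwards.
        BEGIN TRANSACTION;

        UPDATE dbo.PlayerBalances
        SET Balance = Balance - @Price
        WHERE PlayerId = @PlayerId
          AND Balance >= @Price;          -- guard against overdraft

        IF @@ROWCOUNT = 0
        BEGIN
            ROLLBACK TRANSACTION;         -- insufficient funds
        END
        ELSE
        BEGIN
            INSERT INTO dbo.Orders (PlayerId, Amount, OrderDate)
            VALUES (@PlayerId, @Price, SYSUTCDATETIME());
            COMMIT TRANSACTION;
        END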

    This answers the requirements described in 1. Regarding 2: Twitter, LinkedIn, Facebook, and others have all proved that NoSQL technologies can be used effectively to store usage data.

    Regarding 3: there is definitely a concern around maintenance, as well as the question of how you shard an existing database that grows too large over time.

    A polyglot approach will introduce challenges of its own, like syncing and handling failures of one system or the other. But these are known problems; take a look at this book, "Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence".
    https://msdn.microsoft.com/en-us/library/dn271399.aspx

    Good Luck

    Thursday, May 7, 2015 12:49 PM
  • What you describe is not actually a very big database physically. Although it will have many rows, the columns should be small.

    SQL Server and Oracle have many methods of handling a database this size. You can use multiple files, table partitioning, columnstore indexes, etc. to resolve many performance issues.

    Your database is going to be disk bound. You need to spend money on fast drives and I/O. RAM will offset some of that through caching, but individual users will rarely be cached in the model you describe and will need to be retrieved from disk.

    Thursday, May 7, 2015 2:24 PM
    Answerer
  • I agree that a scale-up SQL Server can handle this workload, if carefully built. You can also add a readable secondary with AlwaysOn AGs to help scale reads.

    You can also scale out, with some help from the application.

    Microsoft Azure SQL Database has a growing set of tools for building and managing applications that use an "elastic pool" of databases instead of a single large database.

    Get started with elastic database tools

    David http://blogs.msdn.com/b/dbrowne/


    Thursday, May 7, 2015 2:33 PM