None Relational Storage Design Questions

Answered None Relational Storage Design Questions

  • Wednesday, August 29, 2012 2:05 PM
     
     

    Hello,

    I have a website and it is designed to use relational database and now i want to give up that and use the Table storage of windows azure. lets say it is social website where you post on your friends walls (i will call the post "Status") and comment on each Status. so the database was like this

    User(ID, Name ... ), Status (FromID, ToID, Subject, ... ) , Comment (ID, StatusID, Subject ... ).

    so the status belongs to two users, the one who submits it, and the one who has it on his wall.

    now that i want the table storage i thought to put everything in one table for Users partitioned by the ID of the user:

    public class User

    {

         public Guid ID { get; set; }

         public List<Status> Statuses { get; set; } // the statuses the user submits to others

         ....

    }

    while the Status class has the List<Comments> property to expose related comments.

    but i thought that design is not really what i want, because i want when i open a friends page i get the statuses posted on his wall not the statuses he posted on other's walls. so i changed the design to be

    public class User

    {

         ....

         public List<Status> Status { get; set; } // the statuses on this user's wall

         ....

    }

    then i thought what if in the future i want to have two tabs, one shows me what i have posted on others walls, and the other tab shows me what others have posted on my wall. so now how the partitioning like that will slow the website when i want to get the statuses i have posted. another scenario is, say we have advertisign website where users can brows "Companies" and their "Products" and we have one table for Companies (class Company has List<Product>) partitioned by the Company ID. that will be fine while the user is browsing. but when the user goes to search for a "Product" by the property "Color" this partition will slow down his search.

    so what is the best partitioning design for these cases? if i am totally wrong in the design please guid me since it is my early days with azure.

    another question is regarding to identifying rows. any other "simpler" solution than Guid to identify my rows? you know Guid look ugly and make the query string hard to read thus the URI hard to share if you want to have it printed on paper or so.

    Thank You.


    Alan-SY

All Replies

  • Wednesday, August 29, 2012 3:28 PM
     
     

    Hi,

    If you want more complex queries, you may want to look at Azure SQL Databases (formally SQL Azure). If your application already uses relational storage it will be a lot easyer to migrate.

    Is there a compelling reason for using table storage?

    >>another question is regarding to identifying rows. any other "simpler" solution than Guid to identify my rows?

    You can use whatever you like, as long as its unique. YouTube uses a shorter identifyer for it's videos, maybe something similar could work.

    Regards,

    Alan


    http://www.CloudCasts.net - Community Webcasts Powered by Azure

  • Wednesday, August 29, 2012 3:35 PM
     
     

    Hello Alan, i do really want the table service. for the scaling and for the cheaper price. thanks :).


    Alan-SY

  • Wednesday, August 29, 2012 9:56 PM
     
     Proposed

    Hello Alan, i do really want the table service. for the scaling and for the cheaper price. thanks :).


    Alan-SY

    Hi Alan

    I can understand that going for the NoSQL approach (Windows Azure Table Storage) for the arguments you pointed, seems to be enough reasons to not even consider SQL Azure or even SQL Server. Although, from what you described, it appears that eventually you might end up needing to perform complex queries (or not so complex - like: Count, OrderBy, Distinct, etc), on Table Storage... And you´ll find out that you just can´t, because the service doesn's support it!

    Before making these types of decisions (SQL vs NoSQL), I encourage you reading the Query Operators (Table Service Support). And check the list of all those unsupported query operators... Is it something that you can afford giving away for scalling and costs?

    Keep in mind that SQL Azure is also quite scallable, using Federations. You could even consider going for a (more complex) hybrid solution: Table Storage + SQL Azure.

    I also encourage you reading this MSDN article: Windows Azure Table Storage and Windows Azure SQL Database - Compared and Contrasted

    Technology Selection Considerations

    When determining which data storage technology fits the purpose for a given solution, solution architects and developers should consider the following recommendations.

    As a solution architect/developer, consider using Windows Azure Table Storage when:

    • Your application must store significantly large data volumes (expressed in multiple terabytes) while keeping costs down.
    • Your application stores and retrieves large data sets and does not have complex relationships that require server-side joins, secondary indexes, or complex server-side logic.
    • Your application requires flexible data schema to store non-uniform objects, the structure of which may not be known at design time.
    • Your business requires disaster recovery capabilities across geographical locations in order to meet certain compliance needs. Windows Azure tables are geo-replicated between two data centers hundreds of miles apart on the same continent. This replication provides additional data durability in the case of a major disaster.
    • You need to store more than 150 GB of data without the need for implementing sharding or partioning logic. 
    • You need to achieve a high level of scaling without having to manually shard your dataset.

    As a solution architect/developer, consider using Windows Azure SQL Database when:

    • Your application requires data processing over schematic, highly structured data sets with relationships.
    • Your data is relational in nature and requires the key principles of the relational data programming model to enforce integrity using data uniqueness rules, referential constraints, and primary or foreign keys.
    • Your data volumes might not exceed 150 GB per a single unit of colocated data sets, which often translates into a single database. However, you can partition your data across multiple sets to go beyond the stated limit. Note that this limit is subject to change in the future.
    • Your existing data-centric application already uses SQL Server and you require cloud-based access to structured data by using existing data access frameworks. At the same time, your application requires seamless portability between on-premises and Windows Azure.
    • Your application plans to leverage T-SQL stored procedures to perform computations within the data tier, thus minimizing round trips between the application and data storage.
    • Your application requires support for spatial data, rich data types, and sophisticated data access patterns through consistent query semantics that include joins, aggregation, and complex predicates.
    • Your application must provide visualization and business intelligence (BI) reporting over data models using out-of-the-box reporting tools.

    Hope this helps!


    Best Regards,
    Carlos Sardo





    • Proposed As Answer by Carlos Sardo Wednesday, August 29, 2012 9:58 PM
    • Edited by Carlos Sardo Wednesday, August 29, 2012 10:30 PM
    •  
  • Thursday, August 30, 2012 6:07 AM
     
     
    Hello Carlos, thank you so much for the reply it is helpful as well. but here i am curious about how to design for table storage. even if you and Alan think relational database fit better in my scenario. i am still curious about how to solve the problem with the table storage. just now i found that i can't save statuses in a list in the user class. because data types for table properties are very limited. and the row size is really very small. so now i have to separate the entities (a user entity and status entity) but here i don't know how to partition the status table? and i am confused if the table is used for large amount of data why the row size is very small like that? that can't hold an image! and it is not relational so it is not designed to get data from multiple tables. so i really wonder how would this Table be used for (Your application stores and retrieves large data sets)??? maybe i lack to a practical example in table design. resources are really not practical at all and just talking about adding and deleting entities.

    Alan-SY


    • Edited by Alan-Sy Thursday, August 30, 2012 10:35 AM
    •  
  • Thursday, August 30, 2012 1:03 PM
     
     Answered

    ok i found this page could help. http://msdn.microsoft.com/en-us/library/windowsazure/hh508997.aspx

    in the (Considering Query) section. but that page still doesn't explain the terms of (big partition) and (many partitions) by numbers.


    Alan-SY