Is Peer-to-Peer replication designed only for scaling out reads? Is it bad to use P2P as an update-anywhere topology?

    Question

  • Statement: P2P is designed to scale out reads, although many people wrongly use it as an update-anywhere topology.

    Is this statement saying that if my Subscribers will be doing a lot of writes, I should use Merge replication instead of P2P replication? Under this statement, why is it bad to use P2P replication? Please elaborate: what will happen? Will servers lock up on me every week, or something like that?

    Thanks,

    Tuesday, November 20, 2012 8:15 PM


All replies

  • If you do decide to do update anywhere, then you will need to partition the writes so that only rows belonging to a particular node will be written to from that node, to avoid conflicts. We discussed this technique in this thread: http://social.msdn.microsoft.com/Forums/en-US/sqlreplication/thread/3df3dc6e-bcc1-48ad-a3e6-346421d80394/#b0491e2c-751d-4e05-9d4d-461867ece534
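
    For illustration, here is a rough sketch of one way to do that partitioning. The table, column, and trigger names are made up for this example, not taken from the thread; the idea is simply that each row is "owned" by the node that created it, so no row is ever modified on two peers.

        -- Hypothetical schema: every row carries the name of the peer that owns it,
        -- and the application on each peer only writes rows whose OwnerNode matches
        -- its own server name, which avoids update-anywhere conflicts.
        CREATE TABLE dbo.Orders
        (
            OwnerNode  sysname  NOT NULL CONSTRAINT DF_Orders_OwnerNode DEFAULT (@@SERVERNAME),
            OrderID    int      NOT NULL,
            OrderDate  datetime NOT NULL,
            Amount     money    NOT NULL,
            CONSTRAINT PK_Orders PRIMARY KEY (OwnerNode, OrderID)
        );
        GO

        -- One way to enforce the rule locally: a trigger marked NOT FOR REPLICATION,
        -- so it blocks local writes against another node's rows but still lets the
        -- Distribution Agent apply changes that arrive from the other peers.
        CREATE TRIGGER dbo.trg_Orders_OwnerOnly
        ON dbo.Orders
        AFTER INSERT, UPDATE
        NOT FOR REPLICATION
        AS
        BEGIN
            IF EXISTS (SELECT 1 FROM inserted WHERE OwnerNode <> @@SERVERNAME)
            BEGIN
                RAISERROR('Rows owned by another node must be modified on that node.', 16, 1);
                ROLLBACK TRANSACTION;
            END;
        END;
        GO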


    Brandon Williams (blog | linkedin)

    Tuesday, November 20, 2012 10:58 PM
  • P2p was designed to scale out reads. All changes should occur on one node and then flow to all other nodes in a p2p topology where they will be read.

    However, as soon as it was out the door, customers started to use it for load balancing, and it is not good for this due to the latency involved. You may insert data on one node and, seconds later, query for it on another node and find that it is not there yet.

    You might also insert a row on one node and then try to add child rows to it on another node before the parent has arrived. Or you may update a row on one node while it is deleted on another.
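
    As a concrete sketch of that orphaned-child case (the table names here are made up for illustration):

        -- On NODE-A: the parent row commits locally, but has not yet been delivered to NODE-B.
        INSERT INTO dbo.Invoices (InvoiceID, CustomerID) VALUES (1001, 42);

        -- Moments later on NODE-B: this child insert fails with a foreign key violation
        -- (error 547), because Invoices row 1001 has not arrived on this peer yet.
        INSERT INTO dbo.InvoiceLines (InvoiceID, LineNo, ProductID, Qty)
        VALUES (1001, 1, 7, 3);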

    This is the problem with using p2p for load balancing. You are, IIRC, trying to use it for HA. Due to the latency, this is a bad choice, as your exposure to data loss is quite high with replication.

    With merge replication you have the same issues, but because of the tracking mechanism the latency is much higher, and scalability is limited when you make changes, as triggers must fire to track them.

    With merge you will experience deadlocking under load; with p2p there will be more IO contention, as the writes must be applied on each node and there is logic to check whether the data has already been applied on each node.

    So with merge your scalability is limited; with p2p, unless it is architected carefully, you will notice more IO and you may notice more conflicts.
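
    If you want to see where the merge tracking overhead comes from, the tracking objects live in the publication database. The trigger name patterns and metadata tables below are the typical ones, but verify them on your own server:

        -- The insert/update/delete tracking triggers that merge replication adds
        -- to each published table.
        SELECT t.name AS trigger_name, OBJECT_NAME(t.parent_id) AS table_name
        FROM sys.triggers AS t
        WHERE t.name LIKE 'MSmerge[_]ins[_]%'
           OR t.name LIKE 'MSmerge[_]upd[_]%'
           OR t.name LIKE 'MSmerge[_]del[_]%';

        -- Rows written by those triggers accumulate in the merge metadata tables,
        -- e.g. MSmerge_contents and MSmerge_genhistory; that extra write activity
        -- is where the deadlocking under load tends to show up.
        SELECT COUNT(*) AS tracked_changes FROM dbo.MSmerge_contents;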


    looking for a book on SQL Server 2008 Administration? http://www.amazon.com/Microsoft-Server-2008-Management-Administration/dp/067233044X looking for a book on SQL Server 2008 Full-Text Search? http://www.amazon.com/Pro-Full-Text-Search-Server-2008/dp/1430215941

    Tuesday, November 20, 2012 11:36 PM
  • If you do decide to do update anywhere, then you will need to partition the writes so that only rows belonging to a particular node will be written to from that node, to avoid conflicts. We discussed this technique in this thread: http://social.msdn.microsoft.com/Forums/en-US/sqlreplication/thread/3df3dc6e-bcc1-48ad-a3e6-346421d80394/#b0491e2c-751d-4e05-9d4d-461867ece534



    If we do updates at all (with fewer than 10 Subscribers), which is better: Merge replication or Peer-to-Peer replication?
    Wednesday, November 21, 2012 7:33 AM
  • P2p was designed to scale out reads. All changes should occur on one node and then flow to all other nodes in a p2p topology where they will be read.

    However, as soon as it was out the door, customers started to use it for load balancing, and it is not good for this due to the latency involved. You may insert data on one node and, seconds later, query for it on another node and find that it is not there yet.

    You might also insert a row on one node and then try to add child rows to it on another node before the parent has arrived. Or you may update a row on one node while it is deleted on another.

    This is the problem with using p2p for load balancing. You are, IIRC, trying to use it for HA. Due to the latency, this is a bad choice, as your exposure to data loss is quite high with replication.

    With merge replication you have the same issues, but because of the tracking mechanism the latency is much higher, and scalability is limited when you make changes, as triggers must fire to track them.

    With merge you will experience deadlocking under load; with p2p there will be more IO contention, as the writes must be applied on each node and there is logic to check whether the data has already been applied on each node.

    So with merge your scalability is limited; with p2p, unless it is architected carefully, you will notice more IO and you may notice more conflicts.


    So, if I expect to do a lot of writes with fewer than 10 Subscribers, which is better: Merge replication or Peer-to-Peer replication? Why is your recommendation better?

    Thanks, 

    Wednesday, November 21, 2012 7:36 AM
  • It really depends on many factors. You may find that p2p will saturate your network; the upper limit was 10 nodes in a p2p topology in a test environment. You may find that you saturate your network at a smaller number of nodes, or you may be able to scale beyond that, depending on your workload.

    If you expect a lot of writes you will find that merge replication will not scale - but you have to define what you mean by "a lot of writes".

    With p2p, the change is replicated to each node in the topology, and each participating table has a column on it with an identifier. This identifier is used to see whether the change coming in from each node has already been applied to the row.
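
    For what it's worth, here is a hedged sketch of how that looks on SQL Server 2008 and later. The publication name and originator ID below are made up, and a real sp_addpublication call for p2p needs more options than shown; the point is just that conflict detection is tied to a per-node originator ID and a hidden column on each published table.

        -- Hypothetical publication with p2p conflict detection enabled.
        EXEC sp_addpublication
            @publication           = N'P2P_Orders',   -- made-up name
            @status                = N'active',
            @enabled_for_p2p       = N'true',
            @p2p_conflictdetection = N'true',
            @p2p_originator_id     = 1;                -- must be unique per node

        -- Review the conflict detection settings and originator IDs for the peers.
        EXEC sp_help_peerconflictdetection @publication = N'P2P_Orders';

        -- The hidden detection column is not returned by SELECT *, but it can be
        -- referenced by name.
        SELECT OrderID, [$sys_p2p_cd_id] FROM dbo.Orders;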

    But the change tracking with p2p is asynchronous, whereas with merge replication it is synchronous.

    You have to test on a representative topology on your network to see what works best for you.

    I would be tempted to go with merge replication for 10 subscribers, BUT if your workload is in the 1k tps range I don't think that will work for you.


    looking for a book on SQL Server 2008 Administration? http://www.amazon.com/Microsoft-Server-2008-Management-Administration/dp/067233044X looking for a book on SQL Server 2008 Full-Text Search? http://www.amazon.com/Pro-Full-Text-Search-Server-2008/dp/1430215941

    Wednesday, November 21, 2012 2:17 PM
  • If you expect a lot of writes you will find that merge replication will not scale - but you have to define what you mean by "a lot of writes".

    We will be doing inserts and updates without deletes.

    Thanks,

    Wednesday, November 21, 2012 3:06 PM
  • Are you going over a WAN? How many inserts per second?

    The answers you are looking for can really only be found by you running a representative test. You have to see what works for you. Otherwise it is just guessing.
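
    Even something very simple will tell you a lot about latency over your network. A rough sketch (the probe table is made up; note that clock skew between the servers will distort the numbers):

        -- One-off setup on the publication database; the table is then published like any other.
        CREATE TABLE dbo.ReplProbe
        (
            ProbeID   uniqueidentifier NOT NULL PRIMARY KEY,
            SentAtUtc datetime2        NOT NULL
        );
        GO

        -- On the node where the write happens:
        INSERT INTO dbo.ReplProbe (ProbeID, SentAtUtc)
        VALUES (NEWID(), SYSUTCDATETIME());

        -- On another peer, poll until the row shows up and look at the rough delay:
        SELECT TOP (1) ProbeID,
               SentAtUtc,
               DATEDIFF(MILLISECOND, SentAtUtc, SYSUTCDATETIME()) AS approx_latency_ms
        FROM dbo.ReplProbe
        ORDER BY SentAtUtc DESC;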

    I set up such a topology for a telco which was doing 2000 tps and merge did not work for them between their NYC and Charlotte offices. It may work well for you.


    looking for a book on SQL Server 2008 Administration? http://www.amazon.com/Microsoft-Server-2008-Management-Administration/dp/067233044X looking for a book on SQL Server 2008 Full-Text Search? http://www.amazon.com/Pro-Full-Text-Search-Server-2008/dp/1430215941

    Wednesday, November 21, 2012 3:18 PM
  • Are you going over a WAN? How many inserts per second?

    It will be over a Wide Area Network with many inserts per second. I don't know exactly how many inserts per second; I'd be guessing if I said anything.

    Thanks,

    Wednesday, November 21, 2012 3:28 PM