A P2P system (maybe) and data streaming

  • Question

  • Let me lay out the situation - I'll translate the specifics into something that everyone can relate to: weather.

     

    Picture those little weather stations that you can buy for your house for either $29 at SprawlMart or $399 from the SkyMall catalog on the airplane.  These things can fairly easily be turned into a small service that clients call to retrieve a set of data (temperature, wind speed, humidity, etc.).

     

    Now, picture that there are actually tens of thousands of these things all over the world.  In theory, one could set up a P2P network into which the stations would "log in".  Anyone on the network could then get weather data from any station.

     

    So far so good.  P2P technologies so that everyone can find each other and web services at each station to expose the available data, right?

     

    Probably not.  It turns out that the weather stations in this scenario can produce up to 100 variables at up to 100 times per second, and some of the clients need some of the data in real time.  So much for polling web services.  Not only do I need a stateful connection over which I can stream data, but it probably needs to be IP Multicast over UDP; otherwise the bandwidth requirements are going to go crazy trying to get data to thousands of clients simultaneously.  Worse, I can't guarantee that each client will want the same data from the same station.  I could end up with 1000 connections to a particular station, each subscribed to a different combination of measurements and update frequency.  That's not likely, but if it happened, IP Multicast would be out the window as well.
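
    To put rough numbers on the bandwidth concern, here is a back-of-envelope sketch in Python. It assumes (this is not stated in the post) that every measurement is an 8-byte float and ignores UDP/IP header overhead, so real figures would be somewhat higher.

```python
# Back-of-envelope payload bandwidth for one weather station.
# ASSUMPTIONS (not from the post): 8 bytes per measurement, no packet overhead.

BYTES_PER_VALUE = 8


def stream_rate_bps(variables: int, hz: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Payload bit rate of one full-resolution stream, in bits per second."""
    return variables * hz * bytes_per_value * 8


def station_uplink_bps(clients: int, per_stream_bps: int, multicast: bool) -> int:
    """Bits per second the station itself must transmit.

    Unicast: one copy per client.  Multicast: one copy total, and the
    network replicates it toward subscribers.
    """
    return per_stream_bps if multicast else clients * per_stream_bps


if __name__ == "__main__":
    per_stream = stream_rate_bps(variables=100, hz=100)
    print(f"one stream:          {per_stream / 1e6:.2f} Mbit/s")
    print(f"1000 unicast:        {station_uplink_bps(1000, per_stream, False) / 1e6:.0f} Mbit/s")
    print(f"1000 via multicast:  {station_uplink_bps(1000, per_stream, True) / 1e6:.2f} Mbit/s")
```

    Under these assumptions one full stream is about 0.64 Mbit/s of payload, so 1000 unicast subscribers would need roughly 640 Mbit/s of uplink from a single station, while one multicast group needs only the single stream. If each client wants a different subset and rate, the subscriptions can no longer share a group and the station drifts back toward the unicast figure.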

     

    Also, P2P is attractive, but I have the option of maintaining a semi-decentralized directory (à la Active Directory, which is server-based but multi-master and fairly easy to distribute geographically) for folks to locate the stations.  The pros and cons of each are pretty self-evident, but neither strategy seems to be a clear winner at this point.
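
    For the directory option, here is a minimal sketch of the server side, assuming stations "log in" by registering with a lease that they must renew (a heartbeat); entries that are not renewed drop out of lookups. The class name, lease length, and endpoint strings are hypothetical, and this single in-memory instance does not attempt the multi-master replication across sites that makes the AD-style approach attractive.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Registration:
    endpoint: str
    expires_at: float  # clock time after which the entry is stale


class StationDirectory:
    """In-memory, lease-based station directory (hypothetical sketch).

    Stations "log in" by calling register() and must call it again
    before their lease runs out; expired entries are ignored by lookups.
    """

    def __init__(self, lease_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lease = lease_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[str, Registration] = {}

    def register(self, station_id: str, endpoint: str) -> None:
        # Re-registering renews the lease, so it doubles as a heartbeat.
        self._entries[station_id] = Registration(endpoint, self._clock() + self._lease)

    def lookup(self, station_id: str) -> Optional[str]:
        reg = self._entries.get(station_id)
        if reg is None or reg.expires_at <= self._clock():
            return None
        return reg.endpoint

    def live_stations(self) -> List[str]:
        now = self._clock()
        return sorted(s for s, r in self._entries.items() if r.expires_at > now)
```

    Injecting the clock makes expiry easy to test; a deployed version would also persist entries and replicate them between directory servers.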

     

    Finally, most of this data is useful if delivered with minutes or even hours of latency.  However, some of it is extremely time-sensitive, so I need the option of implementing a QoS-type system on top of whatever the communications protocols look like.
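
    One simple way to layer QoS on top of the transport is a per-link send queue ordered by latency class. The sketch below assumes three hypothetical classes and a fixed per-interval send budget; it uses Python's heapq with a sequence counter so that messages within one class keep FIFO order.

```python
import heapq
import itertools
from typing import List, Tuple

# Hypothetical latency classes; a lower number is more time-sensitive.
REALTIME, NEAR_REALTIME, BULK = 0, 1, 2


class PriorityDispatcher:
    """Send queue that always drains the most time-sensitive messages first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, bytes]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO within a class

    def enqueue(self, priority: int, payload: bytes) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), payload))

    def drain(self, budget: int) -> List[bytes]:
        """Pop up to `budget` messages, e.g. what fits in one send interval."""
        out: List[bytes] = []
        while self._heap and len(out) < budget:
            out.append(heapq.heappop(self._heap)[2])
        return out
```

    Strict priority like this can starve the bulk class when a link is saturated; a weighted scheme such as weighted fair queuing is the usual refinement.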

     

    In my head, this is what the P2P and WCF features in .NET Framework 3.5 are for.  I'm just curious: does anyone else think this is practical at all?  The alternative, rolling all the data up to a couple of large servers somewhere, doesn't look any simpler.

     

    Thoughts?
    Wednesday, August 15, 2007 3:40 AM

Answers

  • I hadn't really anticipated that nodes in the mesh would be able to serve up data that they weren't authoritative for.  By that I mean, if you want the data from node "A", you need to connect to node "A".  Even if "B" is already connected to "A" and receiving data, you can't get "A"'s data from "B".  Hence the IP Multicast idea.  This was how I was going to handle many, many connections to an individual node.

     

    However, that's an interesting twist on the idea.  Many of the clients are using the data for visualization purposes and can easily tolerate much more latency than the clients performing real-time analytics.  Getting the data from an intermediate node may work just fine in these situations.

     

    I suppose you could implement something similar to the TTL (time-to-live) field in the IP header.  Each node could track how many "hops" it is away from the source.  As a client, you could make intelligent decisions about the tradeoff between getting the data from the original source and getting it from a nearby neighbor.
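
    The hop-count idea could look roughly like this: each relay advertises one more hop than the node it receives from (much as each router decrements the IP TTL), and a client filters candidates by how many hops of staleness it tolerates, then picks the nearest remaining node. The round-trip-time field and the function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Source:
    """A node that can serve one station's data (hypothetical model)."""
    node_id: str
    hops: int       # 0 = the station itself; each relay adds 1
    rtt_ms: float   # assumed: round-trip time measured by the client


def relay_hops(upstream_hops: int) -> int:
    """A node re-serving data advertises one hop more than its upstream."""
    return upstream_hops + 1


def choose_source(candidates: Iterable[Source], max_hops: int) -> Optional[Source]:
    """Among nodes within `max_hops` of the station, pick the lowest-RTT one.

    A real-time analytics client would pass max_hops=0 (only the station
    itself qualifies); a visualization client can allow more hops and so
    may end up on a nearby relay instead.
    """
    eligible = [c for c in candidates if c.hops <= max_hops]
    return min(eligible, key=lambda c: (c.rtt_ms, c.hops), default=None)
```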

     

    Speaking of multicast: what do you think about maintaining the directory of available nodes?  Would you multicast each directory update to the entire network, or, if you had the option of a server-based system, would you use it?
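
    A rough way to frame the directory question is to count receptions. The sketch below assumes every station re-announces itself on a fixed interval; with network-wide multicast every node receives every announcement, while with a server-based directory each announcement goes only to the directory replicas (client lookups are not counted). The node count, interval, and replica count are illustrative assumptions, not figures from the thread.

```python
SECONDS_PER_HOUR = 3600


def announcements_per_hour(nodes: int, interval_s: int) -> int:
    """Directory updates per hour if every node re-announces every interval_s seconds."""
    return nodes * (SECONDS_PER_HOUR // interval_s)


def flood_receptions_per_hour(nodes: int, interval_s: int) -> int:
    """Network-wide multicast: every node receives every announcement."""
    return announcements_per_hour(nodes, interval_s) * nodes


def directory_receptions_per_hour(nodes: int, interval_s: int, replicas: int) -> int:
    """Server-based directory: each announcement goes to each replica only.

    Client lookups are not counted here.
    """
    return announcements_per_hour(nodes, interval_s) * replicas


if __name__ == "__main__":
    n, interval, replicas = 10_000, 300, 3  # illustrative assumptions
    print("announcements/hour:", announcements_per_hour(n, interval))
    print("flooded receptions/hour:", flood_receptions_per_hour(n, interval))
    print("directory receptions/hour:", directory_receptions_per_hour(n, interval, replicas))
```

    With 10,000 stations announcing every 5 minutes, that is 120,000 announcements per hour: about 1.2 billion receptions per hour if each is multicast to every node, versus 360,000 if each goes to three directory replicas. Flooding grows with the square of the node count while the directory grows linearly, which favors the server-based option at this scale.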

     

    Wednesday, August 15, 2007 4:13 PM

All replies

  • I think you can build a more sophisticated P2P solution.
    For example, a node will try to find the lowest-latency route to the data.  Each node accepts requests based on the priority of the requestor (so that sites that need more real-time data get it before sites that don't) and, of course, on its capacity.
    A node will choose which node to connect to based on the latency it offers (and on whether that node still accepts new connections).
    Your network will need to propagate the route tree as well (though you may be able to piggyback it on the actual data packets/messages).

    HTH

    Arnon
    Wednesday, August 15, 2007 11:30 AM
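
    Arnon's scheme, in which a node connects to the lowest-latency upstream that still has room and upstreams admit requesters by priority, might be sketched as follows. The preemption rule (a higher-priority requester displaces the lowest-priority subscriber when a node is full) is one possible policy assumed here, not something specified in the reply; lower numbers mean higher priority, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """An upstream node offering a stream (hypothetical model)."""
    node_id: str
    latency_ms: float   # assumed: advertised or measured latency to this node
    capacity: int       # max downstream subscribers this node will serve
    subscribers: List[int] = field(default_factory=list)  # priorities admitted

    def admit(self, priority: int) -> bool:
        """Admit a requester (lower number = higher priority).

        Accept if there is spare capacity; otherwise preempt the
        lowest-priority subscriber only if the requester outranks it.
        """
        if len(self.subscribers) < self.capacity:
            self.subscribers.append(priority)
            return True
        if self.subscribers and priority < max(self.subscribers):
            self.subscribers.remove(max(self.subscribers))
            self.subscribers.append(priority)
            return True
        return False


def connect(nodes: List[Node], priority: int) -> Optional[Node]:
    """Try upstream nodes from lowest to highest latency; return the first that admits."""
    for node in sorted(nodes, key=lambda n: n.latency_ms):
        if node.admit(priority):
            return node
    return None
```

    Route-tree propagation is not modeled here; the latency values would come from whatever the network advertises or measures.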
  • Have you considered a third-party CDN (content delivery network) that uses a P2P protocol?  It might save you a lot of hassle.

     

    HTH

     

    Ollie Riches

    Wednesday, August 15, 2007 1:52 PM
  • Apropos multicast, you can use messaging middleware that supports topics and multicast (Tibco Rendezvous is one example that comes to mind).
    What you get is a bus architecture where multiple clients can listen in, as well as multicast distribution.

    Arnon
    Thursday, August 16, 2007 10:17 AM
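
    To illustrate what a topic-based bus buys you, here is an in-process stand-in. It is not the Tibco Rendezvous API; it only mimics the general idea of dot-separated subject names with wildcards ('*' for one element, '>' for everything after), and the weather.<station>.<measurement> naming scheme is a hypothetical example. In middleware like the one Arnon mentions, the bus can deliver over multicast, so one published message can reach every interested listener.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str, object], None]


def matches(pattern: str, subject: str) -> bool:
    """Dot-separated subject matching: '*' matches exactly one element,
    '>' (allowed only as the last element) matches one or more remaining elements."""
    p, s = pattern.split("."), subject.split(".")
    for i, token in enumerate(p):
        if token == ">":
            return i == len(p) - 1 and len(s) > i
        if i >= len(s) or (token != "*" and token != s[i]):
            return False
    return len(p) == len(s)


class Bus:
    """In-process stand-in for a topic-based message bus (hypothetical)."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, pattern: str, handler: Handler) -> None:
        self._subs[pattern].append(handler)

    def publish(self, subject: str, message: object) -> int:
        """Deliver to every matching handler; return how many received it."""
        delivered = 0
        for pattern, handlers in self._subs.items():
            if matches(pattern, subject):
                for handler in handlers:
                    handler(subject, message)
                    delivered += 1
        return delivered
```

    Because each client subscribes only to the subjects it needs, this also speaks to the concern in the original question about clients wanting different combinations of measurements from the same station.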