Architecture for algorithmic strategy host application

  • Question

  • Hi Everybody,

    I'm looking for some input on the architectural design of a financial trading application that will act as a host for multiple strategies. Think of it as something like the IIS model: the strategies (websites) have access to IIS core functionality, yet are totally separated, so if one site crashes none of the others are affected.
    The problem I'm facing is that I don't think the AppDomain model really works here, because of the latency/performance cost of passing data from the core to the individual strategies and vice versa.
    For example, the individual strategies need to consume market data. The core is connected to a data vendor and needs to pass the data on to the individual strategies, at a rate that easily exceeds 10k events/sec. So I think if I were to marshal those events across AppDomains, it would introduce too much latency; a sketch of the kind of cross-AppDomain hop I mean is below.
    Thus, I'm wondering how IIS achieves this: basically, it receives all HTTP requests at one entry point and finally passes them on to the actual website. Are those requests really marshalled across AppDomains, or how does it work?
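    For concreteness, here is a minimal sketch of that hop (type and member names are illustrative, not from our actual code); every call on the proxy below is a cross-AppDomain remoting call:

    using System;

    // Illustrative only: a tick sink living in the strategy's AppDomain.
    // Because it derives from MarshalByRefObject, calls made from the
    // core's AppDomain go through a remoting proxy -- one cross-domain
    // call per event.
    public class TickSink : MarshalByRefObject
    {
        public void OnTick(string symbol, double price, long timestamp)
        {
            // strategy-specific handling here
        }
    }

    public static class Host
    {
        public static void Main()
        {
            AppDomain strategyDomain = AppDomain.CreateDomain("Strategy1");

            // CreateInstanceAndUnwrap returns a transparent proxy;
            // the real TickSink stays in strategyDomain.
            TickSink sink = (TickSink)strategyDomain.CreateInstanceAndUnwrap(
                typeof(TickSink).Assembly.FullName,
                typeof(TickSink).FullName);

            // At 10k events/sec, each of these calls pays the full
            // cross-AppDomain marshalling cost.
            sink.OnTick("MSFT", 29.41, DateTime.UtcNow.Ticks);
        }
    }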
    Any insight and guidance is greatly appreciated!

    Thanks,

    Tom
    Wednesday, August 8, 2007 3:01 AM

All replies

  • Anyone who can help out here? :)
    Friday, August 10, 2007 2:03 AM
  • Are events routed to a specific strategy per event type?

    Do you really need to process this in real time?

    IIS 6/ASP.NET loads the managed worker request data into a managed ISAPIRuntime that runs in the same AppDomain; that runtime gets its data from unmanaged ISAPI interfaces.

    I think it will go well if you group the event data, making a small number of cross-domain method calls per period of time (perhaps every couple of seconds); see the sketch below. I suggest you plan some time to do a simple proof of concept.
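    A rough sketch of that batching idea (names and the flush interval are illustrative): the core buffers events and pushes one batch across the domain boundary per interval, amortizing the per-call overhead.

    using System;
    using System.Collections.Generic;

    // Tick must be [Serializable] so a whole array of them can be
    // marshalled by value in a single cross-AppDomain call.
    [Serializable]
    public struct Tick
    {
        public string Symbol;
        public double Price;
        public long Timestamp;
    }

    // Lives in the strategy's AppDomain; receives one call per batch
    // instead of one call per event.
    public class BatchingSink : MarshalByRefObject
    {
        public void OnBatch(Tick[] ticks)
        {
            // process the whole batch locally
        }
    }

    public class Core
    {
        private readonly List<Tick> buffer = new List<Tick>();
        private readonly BatchingSink sink; // proxy into the strategy domain

        public Core(BatchingSink sink) { this.sink = sink; }

        public void OnTick(Tick t)
        {
            buffer.Add(t);
        }

        // Called from a timer, e.g. every couple of seconds.
        public void Flush()
        {
            if (buffer.Count == 0) return;
            Tick[] batch = buffer.ToArray();
            buffer.Clear();
            sink.OnBatch(batch); // a single cross-domain call
        }
    }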


    Regards,

    Freddy Rios

    Friday, August 10, 2007 9:44 PM
  • Freddy, thanks for your reply.
    Some events are routed to a strategy on a pub/sub basis and some are broadcast to all running strategies. It's necessary to process this in real time, as latency is very critical here, so your suggested approach unfortunately doesn't work for this. Events have to be handed off and processed within microseconds.
    Having said that, and after doing some more research, I'm really having a hard time finding a feasible approach that achieves both a high level of isolation and really good performance. I've designed several systems like this before, but none of them ever achieved AppDomain isolation between the individual strategies, so one strategy could potentially bring down the entire system.
    One idea I have is to strip out the market data and execution backends and maintain an individual connection between them and the several strategy instances via named pipes (a rough sketch is below). However, I'm not sure it's going to be fast enough, as I fear the serialization overhead might be too much of a performance killer.
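    Roughly what I have in mind (the pipe name and message layout are made up for illustration, using System.IO.Pipes from .NET 3.5): the gateway writes a hand-rolled binary layout down a named pipe, and the strategy reads the same layout back, so serialization stays cheap.

    using System;
    using System.IO;
    using System.IO.Pipes;

    // Gateway side: one pipe server per strategy connection.
    public static class GatewayPipe
    {
        public static void Serve()
        {
            using (var server = new NamedPipeServerStream(
                "mdgw-strategy1", PipeDirection.Out))
            {
                server.WaitForConnection();
                var writer = new BinaryWriter(server);

                // Hand-rolled layout: symbol, price, timestamp --
                // no general-purpose serializer involved.
                writer.Write("MSFT");
                writer.Write(29.41);
                writer.Write(DateTime.UtcNow.Ticks);
                writer.Flush();
            }
        }
    }

    // Strategy side: connect and read the same layout back.
    public static class StrategyPipe
    {
        public static void Consume()
        {
            using (var client = new NamedPipeClientStream(
                ".", "mdgw-strategy1", PipeDirection.In))
            {
                client.Connect();
                var reader = new BinaryReader(client);
                string symbol = reader.ReadString();
                double price = reader.ReadDouble();
                long timestamp = reader.ReadInt64();
            }
        }
    }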

    I'm open to any other suggestions. :)

    Thanks,

    Tom
    Saturday, August 11, 2007 5:20 AM
  • To be completely honest, Tom, I'm only getting a keyhole view of the complexity of your application. I read your original post and the reply a couple of times, but I don't have enough context to offer insights that might challenge basic assumptions. As an architect, I find that most dead ends are created when a wrong turn was made well up the road; without seeing the big picture, it is hard to offer the most useful advice.

    So, I'll revert to principles (occupational hazard... I'm an EA).

    Principle 1) Find the highest (performance) thing that matches your (and your company's) ability to leverage it, and stick to it, without changing it. (I put 'performance' in parens because you could substitute any of the quality attributes you care passionately about; if you were a durability person, I'd use the word 'durability' in that sentence.) The key here is 'without changing it': someone has solved your problem; attach yourself to their solution.

    I'm going to guess at a solution. Like I said, I don't know enough to know whether it will work. But I would guess that you can configure your strategies as HTTP modules (web.config allows you to map different URIs to different modules). Then simply pump your messages to IIS itself, and have the modules act independently to interpret the messages.
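    A minimal sketch of what that module wiring might look like (the module name and URI convention are illustrative; the module is registered under <httpModules> in web.config):

    using System;
    using System.Web;

    // Illustrative strategy-as-HTTP-module. Registered in web.config:
    //   <httpModules>
    //     <add name="StrategyA" type="StrategyAModule" />
    //   </httpModules>
    public class StrategyAModule : IHttpModule
    {
        public void Init(HttpApplication context)
        {
            context.BeginRequest += OnBeginRequest;
        }

        private void OnBeginRequest(object sender, EventArgs e)
        {
            var app = (HttpApplication)sender;

            // Only react to messages addressed to this strategy
            // (the URI convention here is made up for illustration).
            if (app.Request.Path.StartsWith("/strategies/a"))
            {
                // interpret the pumped message here
            }
        }

        public void Dispose() { }
    }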

    To speed up routing, consider IP multicast. The source of the data can run through configured hardware that routes the messages to multiple destinations according to simple routing rules. Since this happens at wire speed, you get the benefit of optimal throughput.
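    On the receiving side, joining a multicast feed from .NET looks roughly like this (the group address and port are placeholders):

    using System.Net;
    using System.Net.Sockets;

    public static class MulticastFeed
    {
        public static void Listen()
        {
            // Placeholder group/port; use whatever the provider publishes.
            IPAddress group = IPAddress.Parse("239.1.1.1");
            using (UdpClient udp = new UdpClient(5000))
            {
                udp.JoinMulticastGroup(group);
                IPEndPoint remote = new IPEndPoint(IPAddress.Any, 0);
                while (true)
                {
                    byte[] datagram = udp.Receive(ref remote);
                    // hand the raw datagram to the subscriber(s)
                }
            }
        }
    }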


    I really don't know enough about how these strategy modules will work to guess at how many of them you need, so I cannot discuss scalability or distribution.


    I hope this response was helpful.  If not... well, I tried.  Either way, I wish you the best of luck.

    --- Nick
    Saturday, August 11, 2007 4:34 PM
  • Hello Tom,
    You should take this with a grain of salt since, as Nick said, there's not a whole lot of context in your description. However, from what I understand, it does sound to me like your application is suitable for a space-based architecture. See, for example, GigaSpaces, which, by the way, is used in several financial applications.

    HTH,
    Arnon
    Sunday, August 12, 2007 9:18 AM
  • Hi Nick & Arnon,

    thanks a lot for your replies.
    Sorry that I didn't provide more context, but if I did I could write an entire book on it, so I tried to reduce it to the core problem. :)
    Nick, your suggestion is interesting but not appropriate for my specific use case. I'm not worried about inter-computer communication here, only about intra-computer communication.
    If we notice that one of the servers is getting stressed too much by the running strategies, we simply bring up another node on a separate machine and hand off the external market data we receive to that node as well (it is actually IP-multicast from the external provider). There needs to be no communication between the nodes (or even between strategies, for that matter) except some minor status information, which I would handle by having the "sub" nodes report to a master node on a periodic basis.
    State is persisted on our high-performance SAN with gigabit backplanes, so we can easily fail over between machines and restore state, and we don't need continuous state synchronization (distributed caching) as in the spaces architecture.

    The only thing I'm worried about here is that if I take the market data and execution gateways out of the actual strategy host AppDomain/process, I need to serialize data to communicate between them (intra-machine only). Basically, after receiving the external market data via IP multicast, the market data gateway has to deserialize that data, process it, and serialize it again in order to send it to the strategy host's individual strategy that has subscribed to that specific content.
    Basically, I had in mind one strategy host process that runs several strategies in their own AppDomains, with each strategy maintaining its own connection to the market data/execution gateway (via TCP/IP). A sketch of what I mean is below.
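    Something along these lines (class names and the endpoint are illustrative, not our real code); each strategy owns its socket, so nothing crosses an AppDomain boundary on the hot path:

    using System;
    using System.Net.Sockets;
    using System.Text;

    // Illustrative: each strategy runs in its own AppDomain and owns
    // its own TCP connection to the market data gateway.
    public class Strategy : MarshalByRefObject
    {
        private TcpClient connection;

        public void Start()
        {
            // Placeholder endpoint for the gateway's pub/sub port.
            connection = new TcpClient("localhost", 9100);
            NetworkStream stream = connection.GetStream();

            // Subscribe to the content this strategy cares about
            // (the subscription wire format is made up for illustration).
            byte[] subscribe = Encoding.ASCII.GetBytes("SUB MSFT\n");
            stream.Write(subscribe, 0, subscribe.Length);

            // From here on, ticks are read directly off the socket
            // inside this AppDomain -- no cross-domain marshalling.
        }
    }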

    For the execution side I'm not really worried, actually, because I receive the transactions in wire format (FIX, an ASCII-encoded protocol) and can simply pass them on in that format, so the execution gateway simply becomes another routing hub.

    I'm not sure if this makes sense to everyone here, but at least typing this out helps me think about the process flow a bit more and consider alternate solutions. :D

    Thanks everyone,

    Tom
    Sunday, August 12, 2007 2:56 PM
  • Tom,

    I have a question about the following statement:

    'Basically, after receiving the external market data via IP multicast, the market data gateway has to deserialize that data, process it, and serialize it again in order to send it to the strategy host's individual strategy that has subscribed to that specific content.'

    I wanted to ask why the market data gateway has to deserialize, process, and serialize the data before forwarding it on to the strategies; it sounds as though the market data gateway has knowledge of the strategies and of the individual data they require. Now, I could be wrong about this, but it seems to me that your market data gateway would be better suited to being purely a re-broadcast process rather than a transform, mapping, and routing process: essentially a broadcast mechanism that takes an external third-party feed and re-broadcasts the data on an internal network, where each node (strategy) listens to the network and extracts the data it requires, as required, hence decoupling the strategies (nodes) from the market data gateway. A rough sketch of what I mean follows.
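    Purely as an illustration (group addresses and ports are placeholders), the gateway would forward the provider's bytes untouched and leave all interpretation to the strategies:

    using System.Net;
    using System.Net.Sockets;

    // Illustrative re-broadcast gateway: raw datagrams in, raw
    // datagrams out, with no deserialization or routing in between.
    public static class RebroadcastGateway
    {
        public static void Run()
        {
            IPAddress externalGroup = IPAddress.Parse("239.1.1.1");
            IPEndPoint internalGroup =
                new IPEndPoint(IPAddress.Parse("239.2.2.2"), 6000);

            using (UdpClient inbound = new UdpClient(5000))
            using (UdpClient outbound = new UdpClient())
            {
                inbound.JoinMulticastGroup(externalGroup);
                IPEndPoint remote = new IPEndPoint(IPAddress.Any, 0);

                while (true)
                {
                    // Forward the provider's bytes untouched; each
                    // strategy extracts what it needs for itself.
                    byte[] datagram = inbound.Receive(ref remote);
                    outbound.Send(datagram, datagram.Length, internalGroup);
                }
            }
        }
    }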

    Now, 'what a node is' I guess is what you were originally asking, along with how to get the best performance, separation, and resilience across the nodes. If I am right about the above scenario, nodes could be one per process, multiple per process, one per machine, etc. It all depends on the cost of running a node (strategy).

    Just my two pennies' worth...


    Ollie Riches


    Monday, August 13, 2007 9:41 AM
  • Hi Ollie,

    Your thinking is right, but it causes a problem for the approach I'm currently favoring.
    I think I need to clarify the terms I've been using here:
    By "node" I meant one strategy host per machine. However, I want to run multiple strategies (in separate AppDomains) on one strategy host, and as I've laid out previously, I'd like each individual strategy to establish a TCP pub/sub connection to the content gateway (e.g. for market data) and subscribe to specific content. I'd like the strategies, rather than the strategy host, to maintain those connections, so I don't need to marshal the data from the host across AppDomains to the strategies.
    So, if I were to just pass the raw data on to all the strategies without prefiltering, it would really hurt performance, as CPU usage for market data consumption would basically double with every added strategy; a sketch of the gateway-side prefiltering I have in mind is below.
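    A minimal sketch, with made-up names, of the kind of content-based prefiltering I mean at the gateway; only subscribers that asked for a symbol receive its events:

    using System;
    using System.Collections.Generic;

    // Illustrative gateway-side prefilter: adding a strategy does not
    // make every strategy pay for every event.
    public class ContentGateway
    {
        // symbol -> subscriber callbacks interested in that symbol
        private readonly Dictionary<string, List<Action<byte[]>>> subscriptions
            = new Dictionary<string, List<Action<byte[]>>>();

        public void Subscribe(string symbol, Action<byte[]> deliver)
        {
            List<Action<byte[]>> subs;
            if (!subscriptions.TryGetValue(symbol, out subs))
            {
                subs = new List<Action<byte[]>>();
                subscriptions[symbol] = subs;
            }
            subs.Add(deliver);
        }

        public void OnTick(string symbol, byte[] rawTick)
        {
            List<Action<byte[]>> subs;
            if (subscriptions.TryGetValue(symbol, out subs))
            {
                // e.g. each callback writes to that strategy's socket
                foreach (Action<byte[]> deliver in subs)
                    deliver(rawTick);
            }
        }
    }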
    Does my approach make sense that way or do you see any potential issues?

    Thanks,

    Tom

    Tuesday, August 14, 2007 2:56 AM
  • Hi Tom,

    As the others have said, it's impossible to tell from this discussion whether you are going to run into any issues with what you have described, but it sounds like an acceptable approach.

    What are the characteristics of the data arriving at the content gateway?


    Ollie Riches

    Tuesday, August 14, 2007 8:56 AM