joshua schachter's blog


beyond rest

Rabble and Kellan's presentation, "Beyond REST? Building data services with XMPP", is both a great idea and a good introduction to coping with the massive amounts of traffic that large systems have to serve.

A publish/subscribe architecture is a natural fit for other problem domains, such as instant messaging and financial data systems (Tibco, Reuters, and so on).

Brad Fitzpatrick implemented something similar for LiveJournal a few years ago as a never-ending Atom feed (sans XMPP, which wasn't as prevalent then).

One important point in the presentation is that a single application would poll Flickr approximately three million times a day to fetch only several thousand updates. At Delicious we saw a similar level of polling activity, made somewhat worse by speculative querying: hitting the URL information pages to see whether there was any data for arbitrary URLs, which there generally wasn't.
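To put those numbers in perspective, here is the back-of-the-envelope arithmetic in Python. "Several thousand" isn't precise, so the 5,000 below is an assumed stand-in; the order of magnitude doesn't change for any value in that range.

```python
SECONDS_PER_DAY = 24 * 60 * 60

polls_per_day = 3_000_000  # figure quoted in the presentation
updates_per_day = 5_000    # assumption: stand-in for "several thousand"

polls_per_second = polls_per_day / SECONDS_PER_DAY
polls_per_update = polls_per_day / updates_per_day
# Upper bound on useful polls: fewer still if several updates land in one poll.
max_useful_fraction = updates_per_day / polls_per_day

print(f"{polls_per_second:.1f} requests/second, around the clock")
print(f"{polls_per_update:.0f} polls per actual update")
print(f"at most {max_useful_fraction:.2%} of polls return anything new")
```

With these inputs it prints 34.7 requests/second, 600 polls per update, and at most 0.17% of polls returning anything new.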

One solution that occurred to me at the time was to build a simple callback system over HTTP. This would fall comfortably between full polling and full persistent publish/subscribe. The clever acronym even writes itself: PIMP Is Mostly Push, although maybe PRSS (Push RSS) would be slightly more polite.

Simply described, instead of polling frequently, a client would send a normal HTTP request naming the resource to subscribe to and an endpoint to which updates should be delivered: http://your.app/subscribe?resource=/some/user&callback=http://my.app/endpoint
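One practical wrinkle: the callback URL should be percent-encoded when it's embedded in the subscription request, because if it carries its own query string, an unescaped `&` would be read as a separator in the outer request. A minimal Python sketch of building the request follows; the `resource` and `callback` parameter names come from the example URL, while `build_subscribe_url` and the `lease` parameter are hypothetical.

```python
from urllib.parse import urlencode


def build_subscribe_url(base, resource, callback, lease_seconds=None):
    """Build a subscription request URL in the style of the example above.

    `lease` is a hypothetical parameter for requesting a non-default
    subscription length; it is only included when given.
    """
    params = {"resource": resource, "callback": callback}
    if lease_seconds is not None:
        params["lease"] = str(lease_seconds)
    # urlencode percent-escapes every value, so a callback containing its
    # own '?' or '&' survives intact inside the outer query string.
    return f"{base}?{urlencode(params)}"


url = build_subscribe_url("http://your.app/subscribe", "/some/user",
                          "http://my.app/endpoint")
print(url)
```

This prints `http://your.app/subscribe?resource=%2Fsome%2Fuser&callback=http%3A%2F%2Fmy.app%2Fendpoint`.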

Presumably the endpoint would then receive RSS item fragments when, and only when, that resource was updated. For security, the exchange should include some kind of token, borrowing from the appropriate protocols. The subscription would lapse after, say, 24 hours, or the duration could be passed in as a parameter.
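The receiving endpoint can be very small. Here is a minimal sketch as a Python WSGI app, assuming the "token" takes the form of an HMAC-SHA256 signature of the POST body under a secret shared at subscription time, sent in a hypothetical `X-Signature` header. Neither the header name nor the signing scheme comes from any spec; they're just one plausible choice, and `SECRET`, `endpoint`, and `post` are illustrative names.

```python
import hashlib
import hmac
import io

# Hypothetical secret agreed on when the subscription was made.
SECRET = b"shared-secret-from-subscription"

received = []  # stand-in for whatever the blog does with new items


def sign(body: bytes, secret: bytes = SECRET) -> str:
    """Hex-encoded HMAC-SHA256 of the POST body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def endpoint(environ, start_response):
    """WSGI callback endpoint: accept a signed, POSTed RSS item fragment."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", [("Allow", "POST")])
        return [b""]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    # WSGI exposes the assumed X-Signature header as HTTP_X_SIGNATURE.
    given = environ.get("HTTP_X_SIGNATURE", "")
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(given.encode("utf-8"), sign(body).encode("ascii")):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"bad signature"]
    received.append(body.decode("utf-8"))
    start_response("204 No Content", [])
    return [b""]


def post(body: bytes, signature: str) -> str:
    """Simulate one delivery locally and return the response status line."""
    status = []
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
        "HTTP_X_SIGNATURE": signature,
    }
    endpoint(environ, lambda s, headers: status.append(s))
    return status[0]
```

On a host that offers only CGI, `wsgiref.handlers.CGIHandler().run(endpoint)` from the standard library should be enough to serve it as a plain script.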

In some ways this is slightly more elegant than the XMPP solution, as neither side has to maintain a dedicated long-running process. A simple server-side implementation would just fetch items from a work queue and send out HTTP messages. A simple client-side implementation would be a plain old web page that can accept and process a POST request. Many people on inexpensive hosting providers have, at best, web scripting and not much else, and the case where Delicious/Twitter/Flickr pushes my own items (and not much else) up to my blog is an important one. Additionally, there would be no need for persistent TCP connections, which is probably more efficient in server resources (but less efficient in network resources: for billions of messages, the TCP connection overhead becomes significant).
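The server side can likewise be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not a reference design: subscriptions live in memory, a `queue.Queue` stands in for the work queue, leases default to 24 hours, each delivery is signed with HMAC-SHA256 and sent in a hypothetical `X-Signature` header, and the `Hub` class and its method names are invented for the example.

```python
import hashlib
import hmac
import queue
import time
import urllib.request
from dataclasses import dataclass, field

DEFAULT_LEASE = 24 * 60 * 60  # assumption: subscriptions lapse after 24 hours


@dataclass
class Subscription:
    callback: str      # subscriber's endpoint URL
    secret: bytes      # token shared at subscription time, used for signing
    expires_at: float  # Unix time after which the subscription has lapsed


@dataclass
class Hub:
    """Hypothetical in-memory push hub: subscriptions plus a work queue."""
    subs: dict = field(default_factory=dict)  # resource -> [Subscription]
    work: queue.Queue = field(default_factory=queue.Queue)  # (resource, item)

    def subscribe(self, resource, callback, secret, lease=DEFAULT_LEASE, now=None):
        now = time.time() if now is None else now
        self.subs.setdefault(resource, []).append(
            Subscription(callback, secret, now + lease))

    def publish(self, resource, item: bytes):
        """Called when a resource changes; only enqueues, never does HTTP."""
        self.work.put((resource, item))

    def drain(self, send, now=None):
        """Deliver all queued items to live subscribers; return the count sent."""
        now = time.time() if now is None else now
        delivered = 0
        while True:
            try:
                resource, item = self.work.get_nowait()
            except queue.Empty:
                return delivered
            # Drop lapsed subscriptions as we go.
            live = [s for s in self.subs.get(resource, []) if s.expires_at > now]
            if resource in self.subs:
                self.subs[resource] = live
            for sub in live:
                sig = hmac.new(sub.secret, item, hashlib.sha256).hexdigest()
                try:
                    send(sub.callback, item, sig)
                    delivered += 1
                except OSError:
                    # urllib's URLError/HTTPError subclass OSError; a real
                    # system would retry with backoff instead of skipping.
                    pass


def http_send(callback, item, sig):
    """POST one RSS item fragment to a subscriber's callback URL."""
    req = urllib.request.Request(
        callback, data=item, method="POST",
        headers={"Content-Type": "application/xml", "X-Signature": sig})
    with urllib.request.urlopen(req, timeout=10):
        pass
```

Keeping `publish` to a bare enqueue means the code path that records a new bookmark or photo never waits on someone else's slow web server; `hub.drain(http_send)` can run from cron or a separate worker. The `send` argument is injectable so the delivery logic can be exercised without a network.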

Of course, callbacks are totally infeasible for a variety of other uses, especially for mobile or desktop applications (which are likely to be firewalled).