<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/rss2full.xsl" type="text/xsl" media="screen"?><?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/itemcontent.css" type="text/css" media="screen"?><rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
   <channel>
      <title>joshua's blog</title>
      <link>http://joshua.schachter.org/</link>
      <description />
      <language>en</language>
      <copyright>Copyright 2008</copyright>
      <lastBuildDate>Sun, 02 Nov 2008 00:14:21 -0800</lastBuildDate>
      <generator>http://www.sixapart.com/movabletype/?v=3.2</generator>
      <docs>http://blogs.law.harvard.edu/tech/rss</docs> 

            <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/JoshuaSchachter" type="application/rss+xml" /><item>
         <title>overclocking the lecture</title>
         <description><![CDATA[The growth of both bandwidth and storage means that in the last few years practically everyone, from individuals to large universities, has begun putting lectures and talks online. While I can easily pick out a dozen or a hundred videos that would be fascinating and educational, I am hamstrung by my short attention span, and I drift off almost immediately. Not to mention the fact that one browser crash or accidental tab closure loses my place and probably the video itself as well.<p>
After tinkering a while, I've managed to figure out a way to cut down the time it takes to watch a video. This works for me, on my Mac; your mileage may vary:
<ol>
<li> Make sure you have the appropriate codecs installed. I generally use the <a href="http://perian.org/">Perian</a> codec package. I additionally find that some FLVs require QTPro to be installed; it's not very expensive.
<li> Download the video somehow. Some sites, like Google Video, let you download a copy. Others, like YouTube, do not allow this. However, most embedded flash video can be grabbed via the technique in the bottom video in <a href="http://perian.org/#watch">the demo videos at Perian</a>.
<li> Open the video in QuickTime. The video is now happily outside the browser.
<li> Go to Window &rarr; Show A/V Controls; change the playback speed in the relevant window. I find that 2.0x generally works pretty well; the video will be faster and the audio is a little clipped but nicely de-chipmunked.
<li> Enjoy your new lecture! The glacial discussion now arrives at a rapid-fire pace. You'll be too busy trying to keep up to play Desktop Tower Defense, and you'll be done in a half hour.
</ol>
<p>
<a href="http://www.flickr.com/photos/joshu/2994666728/"><img src="http://farm4.static.flickr.com/3071/2994666728_6a1b86249d_o.png" width="395" height="842" alt="how to watch lectures faster" /></a>
]]></description>
         <link>http://feeds.feedburner.com/~r/JoshuaSchachter/~3/439798812/overclocking-lecture.html</link>
         <guid isPermaLink="false">http://joshua.schachter.org/2008/11/overclocking-lecture.html</guid>
         <category>complaining</category>
         <pubDate>Sun, 02 Nov 2008 00:14:21 -0800</pubDate>
      <feedburner:origLink>http://joshua.schachter.org/2008/11/overclocking-lecture.html</feedburner:origLink></item>
            <item>
         <title>amateur economist</title>
         <description><![CDATA[<p>Ever since seeing a presentation by <a href="http://doloreslabs.com/">Dolores Labs</a> about Amazon's <a href="http://mturk.com/">Mechanical
Turk</a>, I've been itching for an excuse to play with the system.</p>

<p>I recently saw a <a href="http://news.ycombinator.com/item?id=295822">thread</a> that highlights the distinction between 
expected value and utility. Would you take a more likely but lower payoff 
instead of a less likely but higher payoff?  Similarly, the <a href="http://en.wikipedia.org/wiki/St._Petersburg_paradox">St. Petersburg 
Paradox</a> takes the problem to its logical extreme. By constructing a game 
that has a series of increasingly rare payoffs of increasingly larger size, 
a game with infinite expected value is created.</p>
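
<p>As a rough illustration of that last point (a sketch, not part of the experiment described here), assume the usual textbook form of the game: a fair coin is flipped until it lands heads, and the payoff is 2^k dollars if the first head appears on flip k. Each possible round then contributes exactly one dollar of expected value, so a game capped at n rounds is worth n dollars and the uncapped game has no finite value. The function name and the cap parameter are invented for this example.</p>

```python
from fractions import Fraction


def truncated_expected_value(max_rounds):
    """Expected payoff of a St. Petersburg game capped at max_rounds flips.

    Assumed rules: the first head on flip k (probability 1/2**k) pays 2**k dollars.
    If no head appears within max_rounds flips, the game pays nothing.
    """
    total = Fraction(0)
    for k in range(1, max_rounds + 1):
        probability = Fraction(1, 2 ** k)
        payoff = 2 ** k
        total += probability * payoff  # each round adds exactly 1 dollar
    return total


for cap in (1, 10, 100):
    print(cap, truncated_expected_value(cap))
```

<p>Raising the cap raises the value by the same amount, which is why no finite price is "fair" by expected value alone, even though few people would pay much to play.</p>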

<p>So I constructed 21 versions of the question, varying the dollar amounts
as well as the probability of payoff for the second outcome. </p>

<p><a href="http://www.flickr.com/photos/joshu/2851824479/" title="Example Question"><img src="http://farm4.static.flickr.com/3097/2851824479_884f680e33.jpg" width="500" height="231" alt="Example Question" /></a></p>

<p>For one cent apiece, I sent the questions to be answered by one hundred
people each, and collated the results. 2100 questions, three hours,
and thirty dollars later, I have my results.</p>

<p><a href="http://www.flickr.com/photos/joshu/2851822883/" title="Batch_3890_result.csv"><img src="http://farm4.static.flickr.com/3292/2851822883_9a926c6030_o.png" width="318" height="335" alt="Batch_3890_result.csv" /></a></p>

<p>Clearly, people (or at least these Turks) begin to cross over
at larger values, reaching equilibrium at around $1,000.</p>
<p><a href="http://www.flickr.com/photos/joshu/2851817247/"><img src="http://farm4.static.flickr.com/3046/2851817247_c3cb2335f7.jpg" width="500" height="346" /></a></p>

<p>While this isn't the most groundbreaking work, it is nice to be able to
generate an experiment and gather the results in the course of an evening
and then have the results be so pleasing.</p>

<p>The Mechanical Turk is presented as a way to solve problems that
are easily explained to people but difficult to implement for computers,
frequently described as "artificial artificial intelligence."
However, I think some of the most intriguing uses yet will be to explore
the edges of our own uniquely human behavior and self-understanding.</p>]]></description>
         <link>http://feeds.feedburner.com/~r/JoshuaSchachter/~3/391288370/amateur-economist.html</link>
         <guid isPermaLink="false">http://joshua.schachter.org/2008/09/amateur-economist.html</guid>
         <category>data</category>
         <pubDate>Fri, 12 Sep 2008 20:46:39 -0800</pubDate>
      <feedburner:origLink>http://joshua.schachter.org/2008/09/amateur-economist.html</feedburner:origLink></item>
            <item>
         <title>beyond rest</title>
         <description><![CDATA[<a href="http://anarchogeek.com/2008/7/23/beyond-rest-building-data-services-with-xmpp-pubsub">Rabble</a> and <a href="http://laughingmeme.org/2008/07/23/beyond-rest-building-data-services-with-xmpp/">Kellan's</a> presentation, "<a href="http://www.slideshare.net/kellan/beyond-rest">Beyond REST? Building data services with XMPP</a>" is
both a great idea and a good introduction to coping with the massive amount of traffic that
large systems have to service.
<p>
A publish/subscribe architecture is natural to other problem domains
such as instant messaging and financial data systems (Tibco, Reuters, and so on).
<p>
<a href="http://brad.livejournal.com/2143713.html">Brad Fitzpatrick implemented</a> something similar as a never-ending Atom feed 
a few years ago for LiveJournal (sans XMPP, which wasn't as conceptually prevalent then).
<p>
One important point in the presentation is that, for example, a single application would poll
Flickr approximately three million times in a day to fetch only several thousand updates.
At Delicious we saw a similar level of polling activity,
made somewhat worse by speculative querying (hitting the <a href="http://del.icio.us/url/a2269ff72a91edf6178c7ea060c43564">URL information pages</a> to see if
there was any data for arbitrary URLs, which was generally unlikely.)
<p>
One solution that occurred to me at the time was to build a simple callback system over HTTP.
This would fall comfortably between full polling and full persistent publish/subscribe. 
The clever acronym even writes itself:
PIMP Is Mostly Push, although maybe PRSS (Push RSS) would be slightly
more polite.
<p>
Simply described, instead of polling frequently, a client would send a normal HTTP 
request with the resource to be subscribed to and an endpoint to deliver updates to: 

<code>http://your.app/subscribe?resource=/some/user&callback=http://my.app/endpoint</code>
<p>
Presumably the endpoint would then receive RSS item fragments when and only when that 
resource updated. For security, the exchange should include some kind of token, borrowing from the appropriate protocols. The subscription would lapse after, say, 24 hours, or that interval could
be passed in as a parameter.
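<p>
A minimal sketch of how the two halves might fit together, assuming Python on both ends. Everything beyond the subscribe URL shown above is an assumption made for illustration: the token and ttl parameter names, the SubscriptionRegistry class, and the 24-hour default are all hypothetical, and the actual HTTP delivery to each callback is left out.

```python
import time
from urllib.parse import urlencode


def build_subscribe_url(base, resource, callback, token, ttl_seconds=24 * 60 * 60):
    """Client side: build the one-off subscription request URL.

    The token and ttl parameter names are hypothetical.
    """
    query = urlencode({
        "resource": resource,
        "callback": callback,
        "token": token,
        "ttl": ttl_seconds,
    })
    return base + "?" + query


class SubscriptionRegistry:
    """Server side: remember who wants updates for a resource until the lease lapses."""

    def __init__(self):
        # Maps resource to a dict of callback: (token, expiry timestamp).
        self._subs = {}

    def subscribe(self, resource, callback, token, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self._subs.setdefault(resource, {})[callback] = (token, now + ttl_seconds)

    def targets(self, resource, now=None):
        """Return (callback, token) pairs whose subscription has not yet expired."""
        now = time.time() if now is None else now
        live = []
        for callback, (token, expiry) in self._subs.get(resource, {}).items():
            if expiry > now:
                live.append((callback, token))
        return live
```

When a resource changes, the server would look up targets() for it and send the new item fragment, along with the stored token, to each callback returned; expired subscriptions simply stop receiving anything until the client subscribes again.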
<p>
In some ways this is slightly more elegant than the XMPP solution, as neither side has to maintain a dedicated long-running process. A simple server-side implementation would just fetch items from a <a href="http://decafbad.com/blog/2008/07/04/queue-everything-and-delight-everyone">work queue</a> and send out HTTP messages. A simple implementation on the client side would be a plain old web page that could accept and process a POST request. There are a number of people on inexpensive service providers who have at best web scripting hosting and not much else. The case where Delicious/Twitter/Flickr pushes my own items (and not much else) up to my blog is an important one. Additionally, there would not need to be any persistent TCP connections, which is probably more efficient in server 
resources (but less efficient in network resources; for billions of messages the TCP overhead 
becomes significant).
<p>
Of course, callbacks are totally infeasible for a variety of other uses, especially for 
mobile or desktop applications (which are likely to be firewalled).
]]></description>
         <link>http://feeds.feedburner.com/~r/JoshuaSchachter/~3/344892536/beyond-rest.html</link>
         <guid isPermaLink="false">http://joshua.schachter.org/2008/07/beyond-rest.html</guid>
         <category>ideas</category>
         <pubDate>Thu, 24 Jul 2008 10:41:59 -0800</pubDate>
      <feedburner:origLink>http://joshua.schachter.org/2008/07/beyond-rest.html</feedburner:origLink></item>
            <item>
         <title>rhizome</title>
         <description><![CDATA[I'm speaking briefly at the <a href="https://rhizome.org/benefit/2008/tickets.php">Rhizome 2008 Benefit</a> in New York City later this week. There are still tickets available, so you have no excuses for not attending.<p>
<a href="http://www.flickr.com/photos/joshu/2486840917/" title="rhizome by joshua, on Flickr"><img src="http://farm4.static.flickr.com/3219/2486840917_45c229139a.jpg" border="0" width="500" height="496" alt="rhizome" /></a>

]]></description>
         <link>http://feeds.feedburner.com/~r/JoshuaSchachter/~3/288942280/rhizome.html</link>
         <guid isPermaLink="false">http://joshua.schachter.org/2008/05/rhizome.html</guid>
         <category />
         <pubDate>Mon, 12 May 2008 12:14:15 -0800</pubDate>
      <feedburner:origLink>http://joshua.schachter.org/2008/05/rhizome.html</feedburner:origLink></item>
            <item>
         <title>tag mockery</title>
         <description><![CDATA[A sure sign that I've hit the big time:<p>
<a href="http://www.flickr.com/photos/joshu/2458146876/" title="dilbert on folksonomy by joshua, on Flickr"><img src="http://farm3.static.flickr.com/2093/2458146876_4bde8f41b3_o.jpg" width="209" height="189" alt="dilbert on folksonomy" /></a> 
<a href="http://www.flickr.com/photos/joshu/2455328085/" title="GTA IV by joshua, on Flickr"><img src="http://farm3.static.flickr.com/2411/2455328085_4d28825e25_m.jpg" width="240" height="160" alt="GTA IV" /></a>]]></description>
         <link>http://feeds.feedburner.com/~r/JoshuaSchachter/~3/281727625/tag-mockery.html</link>
         <guid isPermaLink="false">http://joshua.schachter.org/2008/05/tag-mockery.html</guid>
         <category>complaining</category>
         <pubDate>Thu, 01 May 2008 14:49:22 -0800</pubDate>
      <feedburner:origLink>http://joshua.schachter.org/2008/05/tag-mockery.html</feedburner:origLink></item>
            <item>
         <title>stupid internet joke day</title>
         <description><![CDATA[While I might occasionally notice interesting inconsistencies
in the structure of the world and phrase them into semi-witty banter, I know in my
bones that I am not a funny person.
<p />
So when I realized that tomorrow's imminent arrival of
<a href="http://en.wikipedia.org/wiki/April_Fools%27_Day#By_websites">Stupid Internet Joke Day</a>
(was: April Fools' Day) would require the avoidance of all unnecessary internet
contact, it also occurred to me that I may as well point out some common
Funny Anti-Patterns. But that's been done to death - thank you,
<a href="http://www.dashes.com/anil/2006/03/your-april-fool.html">internet hipsters</a>.
<p />
I merely wish to remind you all that elaborate hoaxes (press releases involving
a small company acquiring a large one, switching stylesheets with someone, etc.) are
immediately and transparently stupid. Instead, try to actually do something
surprising. The ha you save might just be your own.
]]></description>
         <link>http://feeds.feedburner.com/~r/JoshuaSchachter/~3/261519870/stupid-internet-joke-day.html</link>
         <guid isPermaLink="false">http://joshua.schachter.org/2008/03/stupid-internet-joke-day.html</guid>
         <category>complaining</category>
         <pubDate>Mon, 31 Mar 2008 12:57:34 -0800</pubDate>
      <feedburner:origLink>http://joshua.schachter.org/2008/03/stupid-internet-joke-day.html</feedburner:origLink></item>
            <item>
         <title>put a proxy in front</title>
         <description><![CDATA[When scaling from a single web server to multiple web servers, the typical
practice is to put a load-balancing reverse HTTP proxy in front. This is a
web server that forwards incoming HTTP requests to other internal web
servers and thus distributes the load across all the different HTTP servers, allows
for failover, and all sorts of good things. <p />

However, a simple trick I learned early on is that even if you have
only a single web server, a proxy in front can help performance
significantly. By buffering the communication
with slow web clients, the proxy frees your potentially heavyweight and/or
expensive Apache processes (with mod_perl, each process could be dozens
or even a hundred megabytes) from having to stay tied up
for the entire length of time each client is
connected. This allows you to run vastly fewer Apache processes. <p />

In the past, I've used <a
href="http://www.apsis.ch/pound/index_html">pound</a> and <a
href="http://www.danga.com/perlbal/">perlbal</a>. Pound is fast
and lightweight, and allows routing based on the request URL;
for example, everything under /img/ got routed to a high-speed
<a href="http://www.acme.com/software/thttpd/">thttpd</a>
instead of Apache itself. Perlbal is much more configurable
but slightly harder to get running, and the documentation was sparse.<p />

These days, I'd also
investigate <a href="http://nginx.net/">nginx</a> and <a
href="http://varnish.projects.linpro.no/">varnish</a>. <a href="http://siag.nu/pen/">Pen</a>, a generalized TCP load-balancer with server affinity (clients are sent back to the server they most recently used), is also quite interesting but will not help with the slow client problem. Finally, a second
set of Apache processes, configured to reverse-proxy via <a href="http://httpd.apache.org/docs/2.0/mod/mod_proxy.html">mod_proxy</a>,
will also do the trick.
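<p />
A back-of-the-envelope way to see why buffering helps (a sketch with made-up numbers, not measurements from any real site): the number of Apache processes tied up at any moment is roughly the request rate multiplied by how long each process stays occupied per request. Without a proxy that time includes the slow trickle of bytes out to the client; with one, the process is released as soon as the response has been handed to the proxy.

```python
import math


def busy_processes(requests_per_second, milliseconds_held_per_request):
    """Little's law estimate: average number of backend processes occupied at once."""
    return math.ceil(requests_per_second * milliseconds_held_per_request / 1000)


# All figures below are hypothetical.
REQUESTS_PER_SECOND = 50
GENERATE_MS = 100        # time to build the response
SLOW_CLIENT_MS = 2000    # time to trickle it out to a slow client

without_proxy = busy_processes(REQUESTS_PER_SECOND, GENERATE_MS + SLOW_CLIENT_MS)
with_proxy = busy_processes(REQUESTS_PER_SECOND, GENERATE_MS)

print(without_proxy, with_proxy)  # 105 5
```

With processes that weigh tens of megabytes each, the difference between those two counts is the difference between fitting in memory and not.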
]]></description>
         <link>http://feeds.feedburner.com/~r/JoshuaSchachter/~3/221165432/proxy.html</link>
         <guid isPermaLink="false">http://joshua.schachter.org/2008/01/proxy.html</guid>
         <category>lessons learned</category>
         <pubDate>Tue, 22 Jan 2008 00:16:32 -0800</pubDate>
      <feedburner:origLink>http://joshua.schachter.org/2008/01/proxy.html</feedburner:origLink></item>
            <item>
         <title>unsubscribe</title>
         <description><![CDATA[Chris Anderson is <a href="http://www.longtail.com/the_long_tail/2007/10/sorry-pr-people.html">fed up</a> with PR folks spamming him.<p>

Me too.<p>
<a href="http://www.flickr.com/photos/joshu/1807520386/" title="Photo Sharing"><img src="http://farm3.static.flickr.com/2309/1807520386_833131789e.jpg" width="500" height="59" alt="unsubscribe" /></a><p>
These were sent to the email address listed on my blog; I use tear-off addresses for subscribing myself to things.]]></description>
         <link>http://feeds.feedburner.com/~r/JoshuaSchachter/~3/177543339/unsubscribe.html</link>
         <guid isPermaLink="false">http://joshua.schachter.org/2007/10/unsubscribe.html</guid>
         <category>complaining</category>
         <pubDate>Tue, 30 Oct 2007 20:54:13 -0800</pubDate>
      <feedburner:origLink>http://joshua.schachter.org/2007/10/unsubscribe.html</feedburner:origLink></item>
            <item>
         <title>social spring cleaning</title>
         <description><![CDATA[<p>Lately I find myself re-entering everyone I know into some new system
every year or two; I remember Six Degrees, Friendster, LinkedIn (I
skipped MySpace -- I'm too old), Facebook, Dopplr, Flickr, and so on.
Brad Fitzpatrick seems to agree that this is an annoying waste of time,
and says so <a href="http://bradfitz.com/social-graph-problem/">
in his thoughts on the social graph</a>.</p>

<p>
Most social systems never forget anyone. Given that the recent norm seems to be sending friend
requests to anyone you've ever met, even briefly, I find my contacts list ends up
filled with people I don't really know. In many systems, removing someone
from your list is either buried or simply impossible. Further, since these systems make implicit relationship information
explicit, deleting someone becomes a loud signal. In real life you would
merely back off a bit, but the systems only allow you to express a binary
sort of relationship.</p>

<p>Therefore, switching networks becomes a way to regularly cleanse your contact
list. There is evidence that younger internet users regularly start new
instant messaging IDs; this likely serves a similar purpose.</p>

<p>So perhaps frequent switching is less a function of fashion than a coping mechanism for dealing with the mismatch between reality and software.</p>
]]></description>
         <link>http://feeds.feedburner.com/~r/JoshuaSchachter/~3/149490358/social-spring-cleaning.html</link>
         <guid isPermaLink="false">http://joshua.schachter.org/2007/08/social-spring-cleaning.html</guid>
         <category>lessons learned</category>
         <pubDate>Tue, 28 Aug 2007 01:16:04 -0800</pubDate>
      <feedburner:origLink>http://joshua.schachter.org/2007/08/social-spring-cleaning.html</feedburner:origLink></item>
            <item>
         <title>ouch</title>
         <description><![CDATA[<div  style="text-align: center;" ><a href="http://www.flickr.com/photos/joshu/991045918/" title="Photo Sharing"><img src="http://farm2.static.flickr.com/1034/991045918_94f135c395.jpg" width="445" height="500" border=0 alt="ouch" /></a></div>]]></description>
         <link>http://feeds.feedburner.com/~r/JoshuaSchachter/~3/140121062/ouch.html</link>
         <guid isPermaLink="false">http://joshua.schachter.org/2007/08/ouch.html</guid>
         <category>complaining</category>
         <pubDate>Thu, 02 Aug 2007 15:33:05 -0800</pubDate>
      <feedburner:origLink>http://joshua.schachter.org/2007/08/ouch.html</feedburner:origLink></item>
            <item>
         <title>elevator camera obscura</title>
         <description><![CDATA[The elevator in my apartment building opens to the outside, and on clear, sunny days, a brief picture of the world snaps into focus on its brushed metal interior. The narrowing elevator doors focus the image, counteracting the blurring of the vertically brushed metal; in one dimension, a mirror, and in the other, a camera.
<p>
I'd like to thank the tender ministrations of the Northern California Kidney Stone Center and the resulting painkillers for allowing me to spend a happy afternoon attempting to capture the effect. I tried using a fancy DSLR, but none of the photos <a href="http://flickr.com/photos/joshu/971426362/">really came out</a>; the short video embedded below works quite well, though.
<p>
<EMBED SRC="http://s3.amazonaws.com/jdata01/elevator1.mov" WIDTH=640 HEIGHT=496 autoplay="false" />
]]></description>
         <link>http://feeds.feedburner.com/~r/JoshuaSchachter/~3/139472801/elevator.html</link>
         <guid isPermaLink="false">http://joshua.schachter.org/2007/07/elevator.html</guid>
         <category>obsess</category>
         <pubDate>Tue, 31 Jul 2007 22:38:33 -0800</pubDate>
      <feedburner:origLink>http://joshua.schachter.org/2007/07/elevator.html</feedburner:origLink></item>
            <item>
         <title>new york essentials</title>
         <description><![CDATA[<p>When folks ask me where they need to go in NYC, I have a short list of places I send them to. I'm not saying these are the best or anything; I just miss them.</p>

<p>At <a href="http://en.wikipedia.org/wiki/Gray's_Papaya">Gray's Papaya</a> on W 72nd and Broadway, the Recession Special, consisting of two hot dogs (just mustard) and a papaya drink, shared with my wife, is a reliable late-afternoon snack.</p>

<p><a href="http://www.pamrealthai.com/">Pam's Real Thai</a> is a tiny hole in the wall on West 49th Street and 9th Avenue; I always get the red curry with chicken. After eating the chicken and associated vegetables, I feel so bad about abandoning the remaining sauce that I douse the remaining rice with it or just get a spoon and eat it like a sweet and spicy soup.</p>

<p>The omakase (chef's choice) at <a href="http://gothamgal.blogs.com/gotham_gal/2006/10/sushi_gari_open.html">Sushi of Gari</a> on Columbus at W 78th isn't deeply concerned with being authentic and I don't much mind; the sushi is amazing fish with all sorts of interesting garnishes. My favorites are the salmon sushi with roasted tomato and the marinated tuna with pine nuts on a tiny crisp flake of fried nori. The Upper East Side location is supposedly better (Gari himself presides) but I've found the Upper West Side spot easier to get into on short notice.</p>

<p>I am told that the shakes at <a href="http://www.shakeshacknyc.com/">Shake Shack</a> in Madison Square Park are amazing, but honestly, I always get the Shack Burgers, which are transcendently good. The only hard decision is whether to get the Single Shack, which has an excellent sauce-to-meat ratio, or the Double Shack, which has more of the tasty, tasty meat. The lunchtime lines are too long to deal with; go early (11:30 am or so) or late (2:30 pm) and the wait will be manageable.</p>

<p><a href="http://www.joeshanghairestaurants.com/">Joe's Shanghai</a>, on Pell Street, has amazing xiao long bao - tiny dumplings filled with a bit of meat and soup. They're a bit challenging to eat and there are a variety of strategies; I prefer to poke a hole in one and let it drain into a spoon, then add a bit of gingered vinegar, and then drink the soup and finally eat the remaining dumpling. There's also a location on West 56th Street, which is much more expensive but equally good. Twice a year, a famous Taiwanese chef, whose xiao long bao are even better, makes an appearance at the Sheraton in Flushing, but the scheduling is just too difficult to work out.</p>
]]></description>
         <link>http://feeds.feedburner.com/~r/JoshuaSchachter/~3/135198852/new-york-essentials.html</link>
         <guid isPermaLink="false">http://joshua.schachter.org/2007/07/new-york-essentials.html</guid>
         <category>food</category>
         <pubDate>Wed, 18 Jul 2007 22:08:31 -0800</pubDate>
      <feedburner:origLink>http://joshua.schachter.org/2007/07/new-york-essentials.html</feedburner:origLink></item>
            <item>
         <title>finally, a mission i am qualified for</title>
         <description><![CDATA[<div style="text-align: center;"><a href="http://www.flickr.com/photos/joshu/455531313/" title="Photo Sharing"><img border="0" src="http://farm1.static.flickr.com/175/455531313_338599aae5.jpg" width="500" height="375" alt="finally a mission i am qualified for" /></a></div>]]></description>
         <link>http://feeds.feedburner.com/~r/JoshuaSchachter/~3/108336104/finally_a_mission_i_am_qualifi.html</link>
         <guid isPermaLink="false">http://joshua.schachter.org/2007/04/finally_a_mission_i_am_qualifi.html</guid>
         <category>games</category>
         <pubDate>Wed, 11 Apr 2007 11:12:33 -0800</pubDate>
      <feedburner:origLink>http://joshua.schachter.org/2007/04/finally_a_mission_i_am_qualifi.html</feedburner:origLink></item>
            <item>
         <title>fidelity</title>
         <description><![CDATA[<p>While software systems tend to strive towards accuracy and fidelity, I have frequently observed that these exact qualities may hurt social software.</p>

<p>When you walk down the hall and see someone you know, you raise your eyebrows to acknowledge their existence, and expect the same from them. If they don't reciprocate, you can plausibly tell yourself that perhaps they didn't see you, or were otherwise distracted. However, when you send someone an instant message, and they never reply, you can be reasonably sure they got it and are ignoring you. Thankfully, in the email world, we can at least blame the spam filter for why they never replied. </p>

<p>It occurs to me that not every factoid gleaned from the constellation of behavioral data should be presented. </p>

<p>For example, the eminently social <a href="http://twitter.com/">Twitter</a> happily informs me that while 34 people count themselves amongst my friends, only 31 of them care to be informed about what I'm up to every day -- and then shows me who those folks are. While these lists are on different actual web pages, it's not a herculean task to figure out the actual people involved. Even though it's possible to show all the information, from a social perspective a degraded view would be better.</p>]]></description>
         <link>http://feeds.feedburner.com/~r/JoshuaSchachter/~3/84002067/fidelity.html</link>
         <guid isPermaLink="false">http://joshua.schachter.org/2007/01/fidelity.html</guid>
         <category>lessons learned</category>
         <pubDate>Tue, 30 Jan 2007 09:20:46 -0800</pubDate>
      <feedburner:origLink>http://joshua.schachter.org/2007/01/fidelity.html</feedburner:origLink></item>
            <item>
         <title>autoincrement considered harmful</title>
         <description><![CDATA[<p>MySQL's auto_increment, and similar features in other databases, are powerfully useful but ultimately lead to problems.</p>

<p>The first problem is that you will be tempted to use the internal identifiers in external URLs. I realize that RESTian canon holds that every single object should have its own identifier, and many <a href="http://media.rubyonrails.org/video/rails_take2_with_sound.mov">new and whizzy frameworks</a> generate simple create/lookup/update/delete user interfaces automatically. </p>

<p>URLs that include an identifier will let you down for three reasons. </p>

<p>The first is that given the URL for some object, you can figure out the URLs for objects that were created around it. This exposes the number of objects in your database to possible competitors or other people you might not want to have this information (as <a href="http://www.guardian.co.uk/g2/story/0,,1824525,00.html">famously demonstrated</a> by the Allies guessing German tank production levels by looking at the serial numbers). </p>

<p>Secondly, at some point some jerk will get the idea to write a shell script with a for-loop and try to fetch every single object from your system; this is definitely no fun.</p>

<p>Finally, in the case of users, it allows people to derive some sort of social hierarchy. Witness the frequent hijacking and/or hacking of high-prestige low-digit ICQ ids.</p>
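
<p>To make the first of these reasons concrete, here is a small sketch of the standard frequentist estimator for the "German tank problem": if k identifiers are observed and the largest is m, the total is estimated as m + m/k - 1. The observed ids below are invented for illustration, and the estimate assumes that ids start at 1 and that the sample is roughly uniform.</p>

```python
from fractions import Fraction


def estimate_total(observed_ids):
    """Estimate how many sequential ids exist from a sample: m + m/k - 1."""
    k = len(observed_ids)
    m = max(observed_ids)
    return m + Fraction(m, k) - 1


# Hypothetical ids collected from public URLs such as /item/19, /item/40, ...
sample = [19, 40, 42, 60]
print(float(estimate_total(sample)))  # 74.0
```

<p>A handful of URLs is enough for an outsider to get a usable estimate of how many objects exist, and repeating the exercise a month later gives the growth rate as well.</p>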

<p>The second problem is that, in the case of MySQL, setting a column as <a href="http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html">auto_increment</a> requires that there be a key on this column, and in practice that is usually the primary key. It's not well explained in the documentation, but under InnoDB, the primary key is similar to a unique key, except that the rows in the database are stored in the sort order of the primary key -- this is why there may only be one such key. (Other database systems refer to this as a "clustered index".) This means that if you are using it merely as a join identifier, but frequently do large queries based on some other column, the rows have to be fetched from all across the disk since they are not stored together. As an example, in early implementations of <a href="http://del.icio.us">del.icio.us</a>, fetching all of the bookmarks for a given URL could cause tens of thousands of disk seeks even if there was an index on that column. As a datastore grows, the location of things on disk in relation to each other becomes an important consideration for scaling.</p>]]></description>
         <link>http://feeds.feedburner.com/~r/JoshuaSchachter/~3/75789075/autoincrement.html</link>
         <guid isPermaLink="false">http://joshua.schachter.org/2007/01/autoincrement.html</guid>
         <category>lessons learned</category>
         <pubDate>Mon, 15 Jan 2007 18:19:21 -0800</pubDate>
      <feedburner:origLink>http://joshua.schachter.org/2007/01/autoincrement.html</feedburner:origLink></item>
      
   </channel>
</rss>
