April 03, 2009»

URL shortening services have been around for a number of years. Their original purpose was to prevent cumbersome URLs from getting fragmented by broken email clients that felt the need to wrap everything to an 80 column screen. But it's 2009 now, and this problem no longer exists. Instead it's been replaced by the SMS-oriented 140 character constraints of sites like Twitter. (Let's leave aside the fact that any phone that can run a web browser and thus follow links can also run a proper client, and doesn't have to hew to the SMS character limit.) Since TinyURL, there has been a rapid proliferation of shortening services.

Aside from the raw utility of allowing URLs to fit within a Twitter message, newer services add several interesting bits of functionality. The most important of these is that they let the linker turn any link into THEIR link, and view metrics on how far it's spread and how many clicks it's gotten. Showing a user how popular his actions are is inevitably addictive. Shorteners are relatively easy and lightweight to set up. Adding a simple interstitial before the redirect provides an obvious way to monetize. And maybe someday all the link data will be worth something.

So there are clear benefits for both the service (low cost of entry, potentially easy profit) and the linker (the quick rush of popularity). But URL shorteners are bad for the rest of us.

The worst problem is that shortening services add another layer of indirection to an already creaky system. A regular hyperlink implicates a browser, its DNS resolver, the publisher's DNS server, and the publisher's website. With a shortening service, you're adding something that acts like a third DNS resolver, except one that is assembled out of unvetted PHP and MySQL, without the benevolent oversight of luminaries like Dan Kaminsky and St. Postel.

There are three other parties in the ecosystem of a link: the publisher (the site the link points to), the transit (places where that shortened link is used, such as Twitter or Typepad), and the clicker (the person who ultimately follows the shortened links). Each is harmed to some extent by URL shortening.

The transit's main problem with these systems is that a link that used to be transparent is now opaque and requires a lookup operation. From my past experience with Delicious, I know that a huge proportion of shortened links are just a disguise for spam, so examining the expanded URL is a necessary step. The transit has to hit every shortened link to get at the underlying link and hope that it doesn't get throttled. It also has to log and store every redirect it ever sees.
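A rough sketch of the lookup-and-log chore described above, in Python. Everything here is an assumption for illustration: the names `head_location`, `expand`, and `redirect_log` are made up, the log is a plain dictionary rather than real storage, and a production version would need throttling and retry handling.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_location(url):
    """Issue one HEAD request; return the Location header if it redirects, else None."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        if response.status in REDIRECT_CODES:
            return response.getheader("Location")
        return None
    finally:
        conn.close()


def expand(url, redirect_log, fetch_location=head_location, max_hops=5):
    """Follow redirects one hop at a time, recording each hop in redirect_log.

    Hops already in the log are reused, so the shortener is not hit again
    and the link can still be expanded if the shortener later disappears.
    Gives up after max_hops and returns whatever URL it has reached.
    """
    seen = set()
    current = url
    for _ in range(max_hops):
        if current in seen:
            break  # redirect loop; stop where we are
        seen.add(current)
        if current in redirect_log:
            target = redirect_log[current]
        else:
            location = fetch_location(current)
            if location is None:
                break  # not a redirect: this is the underlying link
            target = urljoin(current, location)  # Location may be relative
            redirect_log[current] = target
        current = target
    return current
```

The fetcher is passed in as a parameter so the logic can be exercised without touching the network; the expanded URL is what would then be handed to the spam check.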

The publisher's problems are milder. It's possible that the redirection step steals search juice; I don't know how search engines handle these kinds of redirects. It certainly makes it harder to track down links to the published site if the publisher ever needs to reach their authors. And the publisher may lose information about the source of its traffic.

But the biggest burden falls on the clicker, the person who follows the links. The extra layer of indirection slows down browsing with additional DNS lookups and server hits. A new and potentially unreliable middleman now sits between the link and its destination. And the long-term archivability of the hyperlink now depends on the health of a third party. The shortener may decide a link is a Terms Of Service violation and delete it. If the shortener accidentally erases a database, forgets to renew its domain, or just disappears, the link will break. If a top-level domain changes its policy on commercial use, the link will break. If the shortener gets hacked, every link becomes a potential phishing attack.

There are usability issues as well. The clicker can't even tell by hovering where a link will take them, which is bad form. Some sites offer link previews, but there's no way to make a preview preference stick globally across the many shortening services. And just like ad networks, link shortening services could track a user's behavior across many domains. That makes the paranoid among us uncomfortable. We hope the shortener never decides to add interstitials or otherwise "monetize" the link with ads, but we have no guarantee.

For these reasons, I feel that shorteners are bad for the ecosystem as a whole. But what can be done to improve the situation?

One important conclusion is that services that provide transit (or at least those that require a shortening service) should log all redirects, in case the shortening services disappear. If the data is as important as everyone seems to think, they should own it. And websites that generate very long URLs, such as map sites, could provide their own shortening services. Or, better yet, take steps to keep the URLs from growing monstrous in the first place.

You could guarantee that a shortened link still points to the URL that was originally shortened by building a cryptographic hash of that URL into the short link. But this produces URLs that aren't as short as possible.
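One way the hash idea might look, sketched in Python. This is an assumption-laden illustration, not how any existing shortener works: the `slug.digest` layout, the function names, and the 8-character truncation are all made up here. Truncating the SHA-256 digest keeps the link shorter at the cost of a weaker guarantee, which is exactly the trade-off mentioned above.

```python
import base64
import hashlib
import hmac


def url_digest(url, length=8):
    """Return the first `length` base64url characters of the URL's SHA-256 digest."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


def make_short_path(slug, url, length=8):
    """Build a short path of the form '<slug>.<digest>'.

    '.' is used as the separator because it never appears in the
    base64url alphabet, so the digest can be split off unambiguously.
    """
    return f"{slug}.{url_digest(url, length)}"


def verify(short_path, resolved_url):
    """Check that the URL the shortener returned matches the digest in the link."""
    _slug, sep, claimed = short_path.rpartition(".")
    if not sep or not claimed:
        return False  # no digest embedded in this link
    expected = url_digest(resolved_url, len(claimed))
    return hmac.compare_digest(claimed.encode("utf-8"), expected.encode("utf-8"))
```

A clicker (or a browser extension) would call `verify` on the destination the shortener hands back; if the shortener has been hacked or the mapping altered, the check fails.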

A variety of Greasemonkey scripts resolve shortened URLs and replace them inline.

Finally, shortening services could provide archives of their entire database - but this raises all sorts of privacy concerns that I hesitate to even dig into.

The most likely outcome, of course, is that we don't do anything and that the great linkrot apocalypse causes all of modern culture to disappear in a puff of smoke. Hopefully.

With thanks to Maciej Ceglowski

November 02, 2008»

The growth of both bandwidth and storage means that in the last few years practically everyone, from individuals to large universities, has begun putting lectures and talks online. While I can easily pick out a dozen or a hundred videos that would be fascinating and educational, I am hamstrung by my short attention span, and I drift off almost immediately. Not to mention the fact that one browser crash or accidental tab closure loses my place and probably the video itself as well.

After tinkering a while, I've managed to figure out a way to cut down the time it takes to watch a video. This works for me, on my Mac; your mileage may vary:

  1. Make sure you have the appropriate codecs installed. I generally use the Perian codec package. I additionally find that some FLVs require QTPro to be installed; it's not very expensive.
  2. Download the video somehow. Some sites, like Google Video, let you download a copy. Others, like YouTube, do not allow this. However, most embedded Flash video can be grabbed via the technique shown in the bottom video of the demo videos at Perian.
  3. Open the video in QuickTime. The video is now happily outside the browser.
  4. Go to Window → Show A/V Controls; change the playback speed in the relevant window. I find that 2.0x generally works pretty well; the video will be faster and the audio is a little clipped but nicely de-chipmunked.
  5. Enjoy your new lecture! The glacial discussion now arrives at a rapid-fire pace. You'll be too busy trying to keep up to play Desktop Tower Defense, and you'll be done in a half hour.

how to watch lectures faster


September 12, 2008»

Ever since seeing a presentation by Dolores Labs about Amazon's Mechanical Turk, I've been itching for an excuse to play with the system.

I recently saw a thread that highlights the distinction between expected value and utility. Would you take a more likely but lower payoff instead of a less likely but higher payoff? The St. Petersburg Paradox takes the problem to its logical extreme: by offering a series of increasingly rare payoffs of increasingly larger size, it constructs a game with infinite expected value.
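To make the paradox concrete, here is a small Python sketch. It assumes the usual textbook version of the game, which is not spelled out above: flip a fair coin until it lands heads, and if the first heads comes on flip k, the payoff is 2^k dollars. The function names are mine, and the logarithmic utility is Bernoulli's classic choice rather than anything measured here.

```python
import math
from fractions import Fraction


def truncated_expected_value(max_flips):
    """Expected payoff if the game is cut off after max_flips flips.

    Flip k pays 2**k with probability 1/2**k, so every term adds exactly
    one dollar and the total grows without bound as max_flips grows.
    """
    return sum(Fraction(1, 2**k) * 2**k for k in range(1, max_flips + 1))


def truncated_expected_log_utility(max_flips):
    """Expected utility of the same game when utility is log(payoff).

    Unlike the dollar value, this sum converges (to 2 * ln 2).
    """
    return sum(math.log(2**k) / 2**k for k in range(1, max_flips + 1))
```

Capping the game at ten flips gives an expected value of ten dollars, at forty flips forty dollars, and so on forever, while the expected log utility settles at roughly 1.39. That gap between value and utility is what the questions are probing.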

So I constructed 21 versions of the question, varying the dollar amounts as well as the rate of payoff for the second outcome.

Example Question

For one cent apiece, I sent the questions to be answered by one hundred people each, and collated the results. 2100 questions, three hours, and thirty dollars later, I have my results.

Batch_3890_result.csv

Clearly, people (or at least these Turks) begin to cross over at larger values, reaching equilibrium at around $1,000.

While this isn't the most groundbreaking work, it is nice to be able to generate an experiment and gather the results in the course of an evening and then have the results be so pleasing.

The Mechanical Turk is presented as a way to solve problems that are easily explained to people but difficult to implement for computers, frequently described as "artificial artificial intelligence." However, I think some of the most intriguing uses yet will be to explore the edges of our own uniquely human behavior and self-understanding.

July 24, 2008»

Rabble and Kellan's presentation, "Beyond REST? Building data services with XMPP," is both a great idea and a good introduction to coping with the massive amounts of traffic that large systems have to service.

A publish/subscribe architecture is natural to other problem domains such as instant messaging and financial data systems (Tibco, Reuters, and so on).

Brad Fitzpatrick implemented something similar as a never-ending Atom feed a few years ago for Livejournal (sans XMPP, which wasn't as conceptually prevalent then).

One important point in the presentation is that, for example, a single application would poll Flickr approximately three million times in a day to fetch only several thousand updates. At Delicious we saw a similar level of polling activity, made somewhat worse by speculative querying (hitting the URL information pages to see if there was any data for arbitrary URLs, which was generally unlikely.)

One solution that occurred to me at the time was to build a simple callback system over HTTP. This would fall comfortably between full polling and full persistent publish/subscribe. The clever acronym even writes itself: PIMP Is Mostly Push, although maybe PRSS (Push RSS) would be slightly more polite.

Simply described, instead of polling frequently, a client would send a normal HTTP request with the resource to be subscribed to and an endpoint to deliver updates to: http://your.app/subscribe?resource=/some/user&callback=http://my.app/endpoint

Presumably the endpoint would then receive RSS item fragments when and only when that resource updated. For security, the exchange should include some kind of token, borrowing from the appropriate protocols. The subscription would lapse after, say, 24 hours, or that could be passed in as a parameter.

In some ways this is slightly more elegant than the XMPP solution as neither side has to maintain a dedicated long-running process. A simple server-side implementation would just fetch items from a work queue and send out HTTP messages. A simple implementation on the client side would be a plain old web page that could accept and process a POST request. There are a number of people on inexpensive service providers who have at best web scripting hosting and not much else. The case where Delicious/Twitter/Flickr pushes my own items (and not much else) up to my blog is an important one. Additionally, there would not need to be any persistent TCP connections, which is probably more efficient in server resources (but less efficient in network resources; for billions of messages the TCP overhead becomes significant).
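A minimal Python sketch of the server side, to show how little machinery the idea needs. Treat all of it as hypothetical: the `token` and `ttl` query parameters, the class and function names, and the payload shape are assumptions made up for illustration, and the actual HTTP POST is left to whatever `send` function is passed in.

```python
import time
from collections import deque
from urllib.parse import parse_qs, urlsplit

DEFAULT_TTL = 24 * 60 * 60  # subscriptions lapse after 24 hours unless told otherwise


class SubscriptionRegistry:
    def __init__(self):
        # resource -> {callback_url: (token, expires_at)}
        self._subs = {}

    def subscribe_from_url(self, request_url, now=None):
        """Register a subscription from a request such as
        /subscribe?resource=/some/user&callback=http://my.app/endpoint
        ('token' and 'ttl' are optional, hypothetical extra parameters)."""
        now = time.time() if now is None else now
        query = parse_qs(urlsplit(request_url).query)
        resource = query["resource"][0]
        callback = query["callback"][0]
        token = query.get("token", [""])[0]
        ttl = int(query.get("ttl", [DEFAULT_TTL])[0])
        self._subs.setdefault(resource, {})[callback] = (token, now + ttl)
        return resource, callback

    def deliveries(self, resource, item, now=None):
        """Return (callback_url, payload) pairs for unexpired subscribers,
        dropping lapsed subscriptions along the way."""
        now = time.time() if now is None else now
        subs = self._subs.get(resource)
        if not subs:
            return []
        live = {cb: sub for cb, sub in subs.items() if sub[1] > now}
        self._subs[resource] = live
        return [(cb, {"token": token, "resource": resource, "item": item})
                for cb, (token, _expires) in live.items()]


def drain(queue, registry, send, now=None):
    """Pop (resource, item) updates off the work queue and push each one to
    its subscribers via send(callback_url, payload). Returns the send count."""
    sent = 0
    while queue:
        resource, item = queue.popleft()
        for callback, payload in registry.deliveries(resource, item, now):
            send(callback, payload)
            sent += 1
    return sent
```

In use, the work queue would be a `deque` of `(resource, item)` pairs fed by whatever notices updates, and `send` would POST the payload to the callback. The client side is just a page that accepts that POST and checks the token. Note that a callback URL with its own query string would need to be percent-encoded in the subscribe request.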

Of course, callbacks are totally infeasible for a variety of other uses, especially for mobile or desktop applications (which are likely to be firewalled).

May 12, 2008»

I'm speaking briefly at the Rhizome 2008 Benefit in New York City later this week. There are still tickets available, so you have no excuses for not attending.

rhizome

May 01, 2008»

A sure sign that I've hit the big time:

dilbert on folksonomy GTA IV

March 31, 2008»

While I might occasionally notice interesting inconsistencies in the structure of the world and phrase them into semi-witty banter, I know in my bones that I am not a funny person.

So when I realized that tomorrow's imminent arrival of Stupid Internet Joke Day (was: April Fool's Day) would require the avoidance of all unnecessary internet contact, it also occurred to me that I may as well point out some common Funny Anti-Patterns. But that's been done to death - thank you, internet hipsters.

I merely wish to remind you all that elaborate hoaxes (press releases involving a small company acquiring a large one, switching stylesheets with someone, etc.) are immediately and transparently stupid. Instead, try to actually do something surprising. The ha you save might just be your own.


by joshua schachter | projects |