joshua schachter's blog

definitely in silicon valleyfinally, a mission i am qualified for

autoincrement considered harmful

MySQL's auto_increment, and similar features in other databases, are a powerfully useful function but ultimately lead to problems.

The first problem is that you will be tempted to use the internal identifiers in external URLs. I realize that RESTian canon indicates that every single object have its own identifier, and many new and whizzy frameworks generate simple create/lookup/update/delete user interfaces automatically.

URLs that include an identifier will let you down for three reasons.

The first is that given the URL for some object, you can figure out the URLs for objects that were created around it. This exposes the number of objects in your database to possible competitors or other people you might not want having this information (as famously demonstrated by the Allies guessing German tank production levels by looking at the serial numbers.)

Secondly, at some point some jerk will get the idea to write a shell script with a for-loop and try to fetch every single object from your system; this is definitely no fun.

Finally, in the case of users, it allows people to derive some sort of social hierarchy. Witness the frequent hijacking and/or hacking of high-prestige low-digit ICQ ids.

The second problem, in the case of MySQL, setting a column as auto_increment requires that there be a primary key placed on this column. It's not well explained in the documentation, but under InnoDB, the primary key is similar to a unique key, except that the rows in the database are stored in the sort order of the primary key -- this is why there may only be one such key. (Other database systems refer to this as a "clustered index"). This means if you are using it merely as a join identifier, but freqently do large queries based on some other column, the rows have to be fetched from all across the disk since they are not all together. As an example, in early implementations of del.icio.us, fetching all of the bookmarks for a given URL could cause tens of thousands of disk seeks even if there was an index on that column. As a datastore grows, the location of things on disk in relation to each other becomes an important consideration for scaling.