It’s not a new “feature,” but speed and reliability are key parts of any site’s appeal. A few weeks ago we reported on a new server configuration that cut page-generation times in half (see LibraryThing is Faster). Now we’re reporting on some database tweaks that have made finding, adding and editing data faster.
Like all large database-driven sites, LibraryThing can’t rely on a single database. Instead, we have a single “master” database that replicates its changes to a number of “slave” databases. (See Wikipedia: Database Replication.) Because sites “read” far more than they “write,” we achieve scalability by doing most reads from the slave machines, which can be multiplied almost indefinitely to handle increased traffic. Unfortunately, writes still need to travel from the master to the slaves, which necessarily involves a slight lag. If the lag becomes too great, you get stale data, or processes that pause (and pile up!) waiting for fresh information to pass from the master to the slaves. You also get bugs, and annoyances like Talk posts not appearing right away. Replication lag also slows down queries, and therefore the site generally.
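The read/write split described above can be sketched as a small routing function. This is a minimal illustration under stated assumptions, not LibraryThing’s actual code: the `Replica` class, the `route_query` function and the 5-second `MAX_LAG_SECONDS` threshold are all hypothetical, and it assumes each slave’s current lag is already known from some periodic check (MySQL, for example, reports it as `Seconds_Behind_Master` in `SHOW SLAVE STATUS`). Writes always go to the master, reads are spread over slaves that are fresh enough, and a user who has just written something reads from the master so their own change (say, a Talk post) shows up right away.

```python
import random
from dataclasses import dataclass
from typing import Sequence

# Hypothetical threshold: slaves lagging more than this are skipped for reads.
MAX_LAG_SECONDS = 5.0
MASTER = "master"


@dataclass
class Replica:
    name: str
    lag_seconds: float  # assumed to come from a periodic replication-lag check


def route_query(is_write: bool,
                replicas: Sequence[Replica],
                user_wrote_recently: bool = False,
                max_lag: float = MAX_LAG_SECONDS) -> str:
    """Return the name of the server that should handle a query (sketch)."""
    # All writes must go to the single master.
    if is_write:
        return MASTER
    # "Read your own writes": a user who just wrote reads from the master,
    # so their change is visible even if the slaves have not caught up yet.
    if user_wrote_recently:
        return MASTER
    # Spread ordinary reads across slaves whose lag is acceptable.
    fresh = [r for r in replicas if r.lag_seconds <= max_lag]
    if not fresh:
        # Every slave is too far behind; fall back to the master.
        return MASTER
    return random.choice(fresh).name


if __name__ == "__main__":
    replicas = [Replica("slave1", 0.4), Replica("slave2", 42.0)]
    print(route_query(False, replicas))  # slave1 (slave2 is too far behind)
    print(route_query(True, replicas))   # master
```

One trade-off in this sketch: falling back to the master when every slave is stale keeps data fresh, but it also shifts read load onto the one machine that cannot be multiplied, which is why keeping lag low in the first place matters so much.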
As a heavily database-driven site running on relatively cheap hardware, we’ve sometimes struggled to keep replication delay down. The problem is particularly acute on our weaker slaves. Fortunately, our ongoing review of performance issues has turned up a series of code and “schema” changes that have substantially improved replication speed. Here’s a chart of the average replication delay on one of our database servers over the last ten days. As you can see, two changes have made a big difference!
We’re excited about the progress we’ve made so far. It exceeded our expectations. Our performance review is continuing. We won’t stop until LibraryThing is as fast and reliable as it is powerful, rich in data and fun to use!
Come talk about this.