We’re seeing some issues around our load balancer and a few hung machines. Service will be slow until these come back up to full speed.
Update (11:35a): We’re still working through this problem.
The site has been impacted by a database problem (unrelated to the @ replies work we performed today). We’re working right now on recovery.
Update (10:14p): And we’re back. Total downtime was less than 15 minutes.
The team worked today to bring the @ replies tab back to the web site. We’re watching the impact very closely and plan to turn the API methods back on Sunday.
We’re still working on bringing the Replies tab and feed back. Unfortunately it will remain off for tonight as we continue to work on the stability of this feature.
We’re currently experiencing massive slowness and downtime across the web site. Our data center has been notified and is looking into it.
Update (9:53a): Our load balancer is overloaded. We’re working to bring web machines back up to service requests now. You’ll be seeing whales for a bit.
Update (10:20a): Web services are coming back online and we’re starting to serve timelines again.
We’re still working on the restoration of viewing replies (both via the API and on the web). This functionality will remain off tonight as we continue to work through the underlying problem.
In the meantime, we recommend checking out Summize. You can run queries like “to:username” or “@username” to see replies directed at you.
This morning we’re seeing slow page loads and many a whale. We’re deploying several fixes soon which we believe will alleviate the more onerous queries and will allow us to bring back the replies tab.
Update: While we work on this problem, replies will be disabled via the API as well as on the web.
The replies tab remains disabled today as we rework some of the queries that were causing problems yesterday. This has also been reflected in the sidebar of the status blog for the web features.
One way you can see replies directed to you is to search on Summize. You can search for “to:username” to see all updates directed at you.
The site has been slower this morning than in the past few days. Some users are also seeing error pages.
Update (6:50p): We’re still working through this issue.
Update (7:30p): We’ve identified some bad queries relating to favorites and replies. We’re disabling those pages until we put a fix in place.
As you can see from our public Pingdom report, we had 3 minutes of downtime about an hour ago. Despite this, our situation is largely improved today. We’ve had our lowest page load times since last Sunday and our uptime for the week has been 99.9%.
Our latency is still not as low as it should be (> 1sec is not good) and folks were running into error pages a lot yesterday. There’s still more to do, including the restoration of the services noted in red and orange in the sidebar.
As reflected in the sidebar of this blog, we’re seeing a lot of over capacity errors and long load times. We’re working on this problem.
Update: We’re still investigating this issue. We are temporarily reducing the API rate limit to 20 requests per hour in order to help address the latency issues we’re seeing.
We moved status information about key systems to the sidebar of the blog. You can hover over the individual components to get the latest information from each item’s alt text. (We’re going to be making additional refinements to the display of this information.)
Yesterday we deployed a fix for the Facebook application problem. Should be working better now.
We’re seeing some increased site slowness this evening. This can also manifest itself as error pages. We’re working to resolve this.
Status of systems is listed below. While we still have a number of systems that need attention, our public uptime report shows that we’ve had 99.6% uptime since last Saturday (including the Steve Jobs keynote on Monday). We’ve got a lot more work to do but we feel we’re making progress.
We’re seeing a number of whales pop up around the site, especially on profile pages. We’re aware of the issue and working on it now.
Update: site back up and mostly whale free.
Still working on some improvements to deal with spiky load in the morning hours. Here’s a summary of our systems:
We’ve seen increased load the past couple mornings, but the site has been largely stable. We do have several services that we are still working to restore, so I wanted to update you on the status of each:
There is an issue at our data-center that is affecting the service. We’re working to resolve it as quickly as possible.
Update: this has been resolved.
We’re still capping the API at 10 requests per hour during the Stevenote, but we’ve just brought the replies tab back. Other updates here as we restore functionality. We expect the site to run a bit slow for a while, but we think we can bring back some of these features without significantly impacting performance.
Update (10:22a): sidebar pictures have been restored
10:28a: public timeline has been restored
10:32a: archive tab has been restored
11:10a: temporarily disabling track via SMS
12:14p: follower, following, /home stats have been restored
12:20p: API request limit has been restored to 20 per hour
12:26p: user deletion and restoration has been restored
We’ll be turning these back on later this afternoon.
Update: The replies, everyone and archive tabs have also been temporarily disabled and will be restored later this afternoon.
The API request limit has been temporarily dropped to 10 per hour. Please configure your clients not to pull more frequently than once every 6 minutes.
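For client authors, the 6-minute figure follows directly from the hourly limit: 60 minutes divided by 10 requests. Here is a minimal sketch of that arithmetic in Python. The function name `min_poll_interval_seconds` is our own illustration, not part of the API, and it assumes requests are spread evenly across the hour.

```python
def min_poll_interval_seconds(requests_per_hour: int) -> float:
    """Smallest gap between requests that stays within an hourly limit.

    Hypothetical helper for illustration; assumes evenly spaced polling.
    """
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    # One hour is 3600 seconds, shared evenly among the allowed requests.
    return 3600 / requests_per_hour


if __name__ == "__main__":
    # 10 per hour is the temporary limit described above.
    interval = min_poll_interval_seconds(10)
    print(f"Wait at least {interval / 60:g} minutes between requests")
```

A client that sleeps for at least this interval between pulls should stay under the limit; when the limit changes, only the argument needs to change.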
User account restoration and deletion is currently disabled. We expect these services to be brought back tomorrow afternoon.
In addition, IM is still offline as we work to bring it back in a more stable form. The API rate limit remains at 30 requests per hour.
Some users are still reporting some timeline inconsistency issues and we’re working to track those down.
We tested an evolution of our process for recovering from a crashed database today.
There was no visible impact to users or via any of our monitoring mechanisms, and we were very happy to beat our expected times for this operation.
For the next hour or so we will be testing our DB failover practice to ensure it works as planned.
The site may be unreachable for a few minutes during this testing although we don’t expect any appreciable effect to end users.
Track via SMS has been restored. As a reminder, you can check to see what terms you are tracking by texting track to Twitter. To track a new term, send, for example, track iphone. You can read more about track in Help.
Some users are seeing inconsistencies in either their timelines or the timelines of their friends. From time to time, some older updates may appear to be missing. These updates have not been lost and we are working on the underlying problem.
Since a deploy this afternoon, site latency has been looking a lot better. We’re still vulnerable to particular database problems and there are additional enhancements we want to make. We’ll be working on these issues tonight and tomorrow.
IM continues to be down but we’ve also made progress on that front; we’ve tested a lot more of the technology we need in order to restore this service.
We’re experiencing a database problem similar to the one that affected the site last night. Working on the recovery now.
We just lost a database about 5 minutes ago and this has severely impacted the site.
We’re working on recovering from this now.
Recently we made a change to remove the With Friends tab from user profiles. We did this after finding that this tab was both relatively rarely accessed and computationally expensive for us to serve.
At the same time we removed access to the feeds for this page. It’s still possible for users to receive their own With Friends timelines, but authentication is required. What we did not anticipate was that some users were subscribing to these feeds in readers. Most readers do not support authentication, so the feeds that these folks were subscribing to have since broken.
It’s our hope to bring back the access to these feeds at some point. But for stability reasons, we’re unable to restore them at this time. We should have done a better job explaining this up front and anticipating this problem. Apologies for this; it’s our highest priority to provide a reliable, stable service for everyone.
In the last 36 hours, we’ve had 99.4% uptime (according to Pingdom), which is not where we want to be, but is a heck of a lot better than the couple of days before. At the same time, average page response time has been substantially reduced. And the number of updates and other key metrics are significantly higher. (We had more visits to the site than ever in our history yesterday.)
So some of the short-term measures we’ve put in place seem to be working.
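To put the uptime percentage in more familiar terms, here is a rough sketch that converts an uptime figure over a time window into minutes of downtime. The function name `downtime_minutes` is hypothetical, and the calculation assumes the percentage applies to the whole window; it is not how Pingdom itself reports downtime.

```python
def downtime_minutes(window_hours: float, uptime_percent: float) -> float:
    """Minutes of downtime implied by an uptime percentage over a window.

    Hypothetical helper for illustration only.
    """
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    # The share of the window that was not up, expressed in minutes.
    return window_hours * 60 * (100 - uptime_percent) / 100


if __name__ == "__main__":
    # 99.4% over 36 hours, the figures quoted above.
    print(f"{downtime_minutes(36, 99.4):.1f} minutes of downtime")
```

With the figures above, this works out to roughly 13 minutes of downtime over the 36 hours.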
However, we still have the same self-imposed limits on API requests, and IM is still off. This is still our top priority to restore.
Also, Track is not working currently via SMS. This is a bug, not a limit. We’re looking into it.
With the exception of some spiky load this morning, today has been largely stable. We believe the changes we made over the weekend have had a positive effect on the overall stability of the service.
Next up: the restoration of IM - this is still our top priority to fix.
Services may be slow or inconsistent for the next short while as things come back to life. You may intermittently see “Something is technically wrong” pages.
We’re going to make a change to the databases which will take the service down for a bit. We’ll keep you updated here. Thanks for your patience!
We’re gathering today to make some minor changes on our slave databases (shortening a few tables) to help speed queries. You may notice some extra slowness while we bring the modifications into production. We’ll get speedy again after the databases warm to the new queries.
This is an effort to reduce overall web and API load time and clear the area so we can bring IM services online (we’re shooting for Tuesday).