• Hack Week

    Friday, October 22, 2010

    Here at Twitter, we make things. Over the last five weeks, we’ve launched the new Twitter and made significant changes to the technology behind Twitter.com, deployed a new backend for search, and refined the algorithm for trending topics to make them more real-time.

    In keeping with the spirit of driving innovation in engineering, we’ll be holding our first Hack Week starting today (Oct 22) and running through next Friday (Oct 29). During the week, we’ll all be building things that are separate from our normal work and not part of our day-to-day jobs. Of course, we’ll keep an eye out for whales.

    There aren’t many rules – basically we’ll work in small teams and share our projects with the company at the end of the week. What will happen with each project will be determined once it’s complete. Some may ship immediately, others may be added to the roadmap and built out in the future, and the remainder may serve as creative inspiration.

    If you have an idea for one of our teams, send a tweet to @hackweek. We’re always looking for feedback.
  • Twitter's New Search Architecture

    Wednesday, October 6, 2010

    If we’ve done our job well, most of you won’t have noticed that we launched a new backend for search on twitter.com during the last few weeks! One of our main goals, but also one of our biggest challenges, was a smooth switch from the old architecture to the new one, without any downtime or inconsistencies in search results. Read on to find out what we changed and why.

    Twitter’s real-time search engine was, until very recently, based on the technology that Summize originally developed. This is quite amazing, considering the explosive growth that Twitter has experienced since the Summize acquisition. However, scaling the old MySQL-based system had become increasingly challenging.

    The new technology

    About 6 months ago, we decided to develop a new, modern search architecture that is based on a highly efficient inverted index instead of a relational database. Since we love Open Source here at Twitter, we chose Lucene, a search engine library written in Java, as a starting point.

    Our demands on the new system are immense: with over 1,000 TPS (Tweets/sec) and 12,000 QPS (queries/sec), which works out to over 1 billion queries per day (!), we already put a very high load on our machines. As we want the new system to last for several years, the goal was to support at least an order of magnitude more load.

    Twitter is real-time, so our search engine must be too. In addition to these scalability requirements, we also need to support extremely low indexing latencies (the time it takes between when a Tweet is tweeted and when it becomes searchable) of less than 10 seconds. Since the indexer is only one part of the pipeline a Tweet has to make it through, we needed the indexer itself to have a sub-second latency. Yes, we do like challenges here at Twitter! (btw, if you do too: @JoinTheFlock!)

    Modified Lucene

    Lucene is great, but in its current form it has several shortcomings for real-time search. That’s why we rewrote big parts of the core in-memory data structures, especially the posting lists, while still supporting Lucene’s standard APIs. This allows us to use Lucene’s search layer almost unmodified. Some of the highlights of our changes include:

    • significantly improved garbage collection performance
    • lock-free data structures and algorithms
    • posting lists that are traversable in reverse order (see the sketch below)
    • efficient early query termination
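
    The actual changes live in Lucene’s Java core, but the idea behind the last two items can be shown in a few lines. What follows is a simplified, hypothetical sketch (written in JavaScript purely for illustration, with invented names, not our or Lucene’s real code): postings are appended as Tweets arrive, each one remembering the posting added before it, so a query can walk the list newest-to-oldest and stop as soon as it has enough results.

        // Hypothetical illustration only -- not Twitter's or Lucene's actual code.
        function PostingList() {
          this.lastIndex = -1;   // index of the most recently added posting
          this.docIds = [];      // one entry per posting
          this.previous = [];    // index of the posting added just before, or -1
        }

        // New Tweets are appended, so the newest posting is always at the tail.
        PostingList.prototype.add = function (docId) {
          this.docIds.push(docId);
          this.previous.push(this.lastIndex);
          this.lastIndex = this.docIds.length - 1;
        };

        // Traverse in reverse (newest first) and terminate early once we have
        // enough hits -- useful for real-time search, where recent Tweets matter most.
        PostingList.prototype.newestFirst = function (limit) {
          var results = [];
          for (var i = this.lastIndex; i !== -1 && results.length < limit; i = this.previous[i]) {
            results.push(this.docIds[i]);
          }
          return results;
        };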

    We believe that the architecture behind these changes involves several interesting topics that pertain to software engineering in general (not only search). We hope to continue to share more on these improvements.

    And, before you ask, we’re planning on contributing all these changes back to Lucene; some have already made it into Lucene’s trunk and its new realtime branch.

    Benefits

    Now that the system is up and running, we are very excited about the results. We estimate that we’re only using about 5% of the available backend resources, which means we have a lot of headroom. Our new indexer could also index roughly 50 times more Tweets per second than we currently get! And the new system runs extremely smoothly, without any major problems or instabilities (knock on wood).

    But you might wonder: Fine, it’s faster, and you guys can scale it further, but will there be any benefits for the users? The answer is definitely yes! The first difference you might notice is the bigger index, which now reaches back roughly twice as far in time -- without making searches any slower. And, maybe most importantly, the new system is extremely versatile and extensible, which will allow us to build cool new features faster and better. Stay tuned!

    The engineers who implemented the search engine are: Michael Busch, Krishna Gade, Mike Hayes, Abhi Khune, Brian Larson, Patrick Lok, Samuel Luckenbill, Jake Mannix, Jonathan Reichhold.

  • Tool Legit

    Thursday, September 30, 2010

    Hi, I'm @stirman, and I'm a tool.

    Well, I build tools, along with @jacobthornton, @gbuyitjames and @sm, the Internal Tools team here at Twitter.

    Whether or not to build internal tools is a much-debated topic, especially amongst startups. Investing in internal projects has to be weighed against investing in external-facing features for your product, although at some point the former investment shows greater external returns than the latter. Twitter has made it a priority to invest in internal tools since the early days, and with the growth of the product and the company, our tools have become a necessity.

    I often hear from friends in the industry that internal tools are a nights-and-weekends side project for engineers who are already backlogged with "real" work. We have decided to make building tools our "real" work. This decision means we have time to build solid applications, spend the necessary time to make them look great, and ensure that they work well.

    Our team's mission is to increase productivity and transparency throughout the company. We increase productivity by streamlining processes and automating tasks. We increase transparency by building tools and frameworks that allow employees to discover, and be notified of, relevant information in real time. Many companies use the term "transparency" when discussing their company culture, but very few put the right pieces in place to ensure that a transparent environment can be established without exposing too much information. Twitter invests heavily in my team so that we can build the infrastructure to ensure a healthy balance.

    We have built tools that track and manage milestones for individual teams, manage code change requests, provide an easy A/B testing framework for twitter.com, create internal short links, get approval for offer letters for new candidates, automate git repository creation, help conduct fun performance reviews and many more. We release a new tool about once every other week. We release a first version as early as possible, and then iterate quickly after observing usage and gathering feedback.

    Also, with the help of @mdo, we have put together an internal blueprint site that not only contains a style guide for new apps, but also hosts shared stylesheets, JavaScript libraries and code samples, like our internal user authentication system, to make spinning up a new tool as simple as possible.

    We put a lot of effort into making our tools easy to use and ensuring they look great. We have fun with it. Here's a screenshot of a recent app that tracks who's on call for various response roles at any given time.

    We also have fun learning new technologies. Here's a screenshot of a real-time Space Invaders Twitter sentiment analysis visualization that is part of a status board displayed on flat screens around the office. @jacobthornton wanted to learn more about node.js for some upcoming projects and he built "Space Tweets" to do just that! If you're interested in the code, get it on GitHub.

    While we're talking about open source, we would like to mention how much our team values frameworks like Ruby on Rails, MooTools and their respective communities, all of which are very important to our internal development efforts and in which you'll find us actively participating by submitting patches, debating issues, etc. We are proactively working towards open sourcing some of our own tools in the near future, so keep an eye on this blog.

    Does this stuff interest you? Are you a tool? Hello? Is this thing on? Is anyone listening? (If you are still here, you passed the test! Apply here to join our team or hit me up at @stirman!)

  • The Tech Behind the New Twitter.com

    Monday, September 20, 2010

    The Twitter.com redesign presented an opportunity to make bold changes to the underlying technology of the website. With this in mind, we began implementing a new architecture almost entirely in JavaScript. We put special emphasis on ease of development, extensibility, and performance. Building the application on the client forced us to come up with unique solutions to bring our product to life, a few of which we’d like to highlight in this overview.

    API Client

    One of the most important architectural changes is that Twitter.com is now a client of our own API. It fetches data from the same endpoints that the mobile site, our apps for iPhone, iPad, and Android, and every third-party application use. This shift allowed us to allocate more resources to the API team, generating over 40 patches. For the initial page load and every subsequent call from the client, all data is now fetched from a highly optimized JSON fragment cache.

    The JavaScript API

    We built a JavaScript library to access Twitter's REST API for @anywhere, which provided a good starting point for development on this project. The JavaScript API provides API fetching and smart client-side caching, both in-memory and using localStorage, allowing us to minimize the number of network requests made while using Twitter.com. For instance, timeline fetches include associated user data for each Tweet. The resulting user objects are proactively cached, so viewing a profile does not require unnecessary fetches of user data.
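
    As a rough sketch of how such layered caching might look (the object names and storage keys below are invented for illustration, not the library's real interface):

        // Hypothetical sketch of two-level client-side caching -- not the real API.
        var userCache = {
          memory: {},
          get: function (userId) {
            if (this.memory[userId]) return this.memory[userId];          // in-memory first
            var stored = window.localStorage.getItem('user:' + userId);   // then localStorage
            return stored ? (this.memory[userId] = JSON.parse(stored)) : null;
          },
          put: function (user) {
            this.memory[user.id] = user;
            window.localStorage.setItem('user:' + user.id, JSON.stringify(user));
          }
        };

        // Timeline responses embed a user object with each Tweet, so we can warm
        // the cache and skip a second request when that user's profile is viewed.
        function onTimelineLoaded(tweets) {
          tweets.forEach(function (tweet) { userCache.put(tweet.user); });
        }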

    Another feature of the JavaScript API is that it provides event notifications before and after each API call. This allows components to register interest and respond immediately with appropriate changes to the UI, while letting independent components remain decoupled, even when relying on access to the same data.
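
    A minimal sketch of that pattern might look like the following (the event names and helpers are invented for illustration; the real library's interface may differ):

        // Tiny pub/sub hub -- purely illustrative.
        var apiEvents = {
          handlers: {},
          on: function (name, fn) {
            (this.handlers[name] = this.handlers[name] || []).push(fn);
          },
          emit: function (name, data) {
            (this.handlers[name] || []).forEach(function (fn) { fn(data); });
          }
        };

        // Minimal XHR helper so the sketch is self-contained.
        function getJson(url, callback) {
          var xhr = new XMLHttpRequest();
          xhr.open('GET', url, true);
          xhr.onload = function () { callback(JSON.parse(xhr.responseText)); };
          xhr.send();
        }

        // Components subscribe to before/after events without knowing about each other.
        apiEvents.on('timeline:fetch:start', function () { /* e.g. show a spinner */ });
        apiEvents.on('timeline:fetch:done', function (tweets) { /* e.g. render new Tweets */ });

        function fetchTimeline() {
          apiEvents.emit('timeline:fetch:start');
          getJson('/1/statuses/home_timeline.json', function (tweets) {
            apiEvents.emit('timeline:fetch:done', tweets);
          });
        }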

    Page Management

    One of the goals of this project was to make page navigation easier and faster. Building on the web’s traditional analogy of interlinked documents, our application uses a page routing system that maintains a strong relationship between a URL and its content, allowing us to provide a rich web application that behaves like a traditional web site. To do this, we developed a client-side routing system that switches between stateful pages, driven by the URL hash. As the user navigates, the application caches the visited pages in memory. Although the information on those pages can quickly become stale, we’ve alleviated much of this complexity by having pages subscribe to events from the JavaScript API and keep themselves in sync with the overall application state.
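
    A stripped-down sketch of hash-driven routing with an in-memory page cache might look like this (the Page type, container id, and route format are invented for the example, not our actual router):

        // Hypothetical sketch -- not the actual #NewTwitter router.
        var pageCache = {};   // URL hash -> previously constructed page

        function Page(hash) {
          this.hash = hash;
          this.el = document.createElement('div');
          this.el.textContent = 'content for ' + hash;   // a real page renders via the API
        }

        function navigate() {
          var hash = window.location.hash || '#!/home';
          var page = pageCache[hash] || (pageCache[hash] = new Page(hash));
          var container = document.getElementById('page-container');   // assumed root element
          container.innerHTML = '';
          container.appendChild(page.el);   // cached pages re-attach instantly
        }

        window.addEventListener('hashchange', navigate, false);
        navigate();   // render the initial page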

    The Rendering Stack

    In order to support crawlers and users without JavaScript, we needed a rendering system that runs on both server and client. To meet this need, we've built our rendering stack around Mustache, and developed a view object system that generates HTML fragments from API objects. We’ve also extended Mustache to support internationalized string substitution.
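
    For example, a view object might hand an API object to a Mustache template to produce an HTML fragment. The template and field values below are illustrative only (assuming mustache.js is loaded), not our actual markup:

        // Illustrative template and data -- not Twitter's actual markup.
        var tweetTemplate =
          '<div class="tweet" data-tweet-id="{{id}}">' +
          '  <strong>@{{user.screen_name}}</strong>' +
          '  <p>{{text}}</p>' +
          '</div>';

        var apiObject = {
          id: 123,                                  // made-up id
          text: 'just setting up my twttr',
          user: { screen_name: 'jack' }
        };

        // The same template can be rendered on the server for crawlers and
        // no-JavaScript users, and on the client for everyone else.
        var html = Mustache.render(tweetTemplate, apiObject);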

    Much attention was given to optimizing performance in the DOM. For example, we’ve implemented event delegation across the board, which has enabled a low memory profile without worrying about event attachment. Most of our UI is made out of reusable components, so we've centralized event handling to a few key root nodes. We also minimize repaints by building full HTML structures before they are inserted into the document, and we attach relevant data during the HTML rendering step rather than through DOM manipulation.
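
    As a sketch, delegation means a single listener on a root node rather than one handler per Tweet. The class names, element id, and data attribute here are invented, and Element.closest is used purely for brevity:

        // One click handler on the timeline root handles every Tweet action.
        document.getElementById('timeline').addEventListener('click', function (event) {
          var button = event.target.closest('.retweet-button');
          if (!button) return;                                    // not a Tweet action
          var tweetEl = button.closest('.tweet');
          var tweetId = tweetEl.getAttribute('data-tweet-id');    // data attached at render time
          console.log('retweet requested for Tweet', tweetId);
        }, false);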

    Inline Media

    One important product feature was embedding third-party content directly on the website whenever a Tweet links to one of our content partners. For many of these partners, such as Kiva and Vimeo, we rely on the oEmbed standard, making a simple JSON-P request to the content provider's domain and embedding the content found in the response. For other media partners, like TwitPic and YouTube, we rely on known embed resources that can be predicted from the URL, which reduces network requests and results in a speedier experience.
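
    A JSON-P oEmbed request is just a dynamically inserted script tag whose response calls back into our code. The endpoint, resource URL, and callback name below are purely illustrative:

        // Illustrative JSON-P request to an oEmbed provider -- not production code.
        function fetchOEmbed(endpoint, resourceUrl, callbackName) {
          var script = document.createElement('script');
          script.src = endpoint +
            '?url=' + encodeURIComponent(resourceUrl) +
            '&callback=' + callbackName;              // the provider wraps its JSON in this call
          document.head.appendChild(script);          // <script> sidesteps the same-origin policy
        }

        // The provider's response invokes our callback with the embed markup.
        window.onMediaEmbed = function (response) {
          document.getElementById('tweet-media').innerHTML = response.html;
        };

        fetchOEmbed('https://example.com/oembed.json', 'https://example.com/videos/123', 'onMediaEmbed');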

    Open Source

    Twitter has always embraced open-source technology, and the new web client continues in this tradition. We used jQuery, Mustache, LABjs, Modernizr, and numerous other open-source scripts and jQuery plugins. We owe a debt of gratitude to the authors of these libraries and many others in the JavaScript community for their awesome efforts in writing open-source JavaScript. We hope that, through continuing innovations in front-end development here at Twitter, we'll be able to give back to the open-source community with some of our own technology.

    Conclusions

    With #NewTwitter, we’ve officially adopted JavaScript as a core technology in our organization. This project prompted our first internal JavaScript summit, which represents an ongoing effort to exchange knowledge, refine our craft and discover new ways of developing for the web. We're very excited about the doors this architectural shift will open for us as we continue to invest more deeply in rich client experiences. If you're passionate about JavaScript, application architecture, and Twitter, now is a very exciting time to @JoinTheFlock!

    This application was engineered in four months by seven core engineers: Marcus Phillips, Britt Selvitelle, Patrick Ewing, Ben Cherry, Dustin Diaz, Russ d’Sa, and Sarah Brown, with numerous contributions from around the company.

  • My Awesome Summer Internship at Twitter

    Friday, August 27, 2010

    On my second day at Twitter, I was writing documentation for the systems I was going to work on (to understand them better), and I realized that there was a method in the service’s API that should be exposed but wasn’t. I pointed this out to my engineering mentor, Steve Jenson (@stevej). I expected him to ignore me, or promise to fix it later. Instead, he said, “Oh, you’re right. What are you waiting for? Go ahead and fix it.” After 4 hours, about 8 lines of code, and a code review with Steve, I committed my first code at Twitter.

    My name is Siddarth Chandrasekaran (@sidd). I’m a rising junior at Harvard studying Computer Science and Philosophy, and I just spent the last 10 weeks at Twitter as an intern with the Infrastructure team.

    When I started, I had very little real-world experience -- I’d never coded professionally before -- so I was really excited and really nervous. I spent the first couple of weeks understanding the existing code base (and being very excited that I sat literally three cubicles away from Jason Goldman! (@goldman)). I remember my first “teatime” (Twitter’s weekly Friday afternoon company all-hands), when Evan Williams (@ev) broke into song in the middle of his presentation, dramatically launching the karaoke session that followed teatime.

    Over the next few weeks, I worked on a threshold monitoring system: a Scala service that lets engineers define basic “rules” (thresholds for various metrics) and monitors those values using a timeseries database. The goal was to allow engineers to easily define and monitor their own thresholds. I was extremely lucky to have the opportunity to build such a critical piece of infrastructure, with abundant guidance from Ian Ownbey (@iano). Writing an entire service from scratch was scary, but as a result, I also learned a lot more than I expected. It was perfect: I was working independently, but could turn to my co-workers for help anytime.
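
    The service itself was written in Scala, but the core idea of a “rule” is simple enough to sketch in a few lines of JavaScript. All names and numbers here are invented for illustration, not the actual service:

        // Rough sketch of a threshold rule -- not the actual Scala service.
        function makeRule(metricName, threshold, direction) {
          return function check(latestValue) {
            var breached = direction === 'above'
              ? latestValue > threshold
              : latestValue < threshold;
            if (breached) {
              console.log('ALERT: ' + metricName + ' = ' + latestValue +
                          ' (threshold ' + threshold + ')');
            }
            return breached;
          };
        }

        // An engineer registers a rule; the monitor evaluates it periodically
        // against the latest point pulled from the timeseries database.
        var apiLatencyRule = makeRule('api.request.latency_ms', 500, 'above');
        apiLatencyRule(742);   // => logs an alert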

    There are several things that I’ve loved during my time at Twitter:

    On my third day at work, I got to see the President of Russia.

    A few weeks later, Kanye West (@kanyewest) “dropped by” for lunch.

    I was in the amazing “Class of Twitter HQ” recruiting video.

    Every day at Twitter has given me something to be very excited about: the snack bars, the delicious lunches, teatime, random rockband sessions, the opportunity to work on some really cool stuff with very smart people, and most importantly, being part of a company that is caring and honest. My co-workers have artful, creative, daring, and ingenious approaches to the hard engineering problems that Twitter faces, and the company supports them by providing a culture of trust and openness. As an intern, it has been an overwhelmingly positive experience to be part of such a culture. Needless to say, I will miss Twitter very dearly, and I’m very thankful for this opportunity.

    What are you waiting for? Join The Flock! (@jointheflock)

    --Siddarth (@sidd)
  • Twitter & Performance: An update

    Wednesday, July 21, 2010

    On Monday, a fault in the database that stores Twitter user records caused problems on both Twitter.com and our API. The short, non-technical explanation is that a mistake led to some problems that we were able to fix without losing any data.

    While we were able to resolve these issues by Tuesday morning, we want to talk about what happened and use this as an opportunity to discuss the recent progress we’ve made in improving Twitter’s performance and availability. We recently covered these topics in a pair of June posts here and on our company blog.

    Riding a rocket
    Making sure Twitter is a stable platform and a reliable service is our number one priority. The bulk of our engineering effort is currently focused here, and we have moved resources from other important projects to focus on the issue.

    As we said last month, keeping pace with record growth in Twitter’s user base and activity presents some unique and complex engineering challenges. We frequently compare the tasks of scaling, maintaining, and tweaking Twitter to building a rocket in mid-flight.

    During the World Cup, Twitter set records for usage. While the event was happening, our operations and infrastructure engineers worked to improve the performance and stability of the service. We have made more than 50 optimizations and improvements to the platform, including:
    • Doubling the capacity of our internal network;
    • Improving the monitoring of our internal network;
    • Rebalancing the traffic on our internal network to redistribute the load;
    • Doubling the throughput to the database that stores tweets;
    • Making a number of improvements to the way we use memcache, improving the speed of Twitter while reducing internal network traffic; and,
    • Improving page caching of the front and profile pages, reducing page load time by 80 percent for some of our most popular pages.
    So what happened Monday?
    While we’re continuously improving the performance, stability and scalability of our infrastructure and core services, there are still times when we run into problems unrelated to Twitter’s capacity. That’s what happened this week.

    On Monday, our users database, where we store millions of user records, got hung up running a long-running query; as a result, most of the table became locked. The locked users table manifested itself in many ways: users were unable to sign up, sign in, or update their profile or background images, and responses from the API were malformed, rendering them unusable to many API clients. In the end, this affected most of the Twitter ecosystem: our mobile, desktop, and web-based clients, the Twitter support and help system, and Twitter.com.

    To remedy the locked table, we force-restarted the database server in recovery mode, a process that took more than 12 hours (the database covers records for more than 125 million users -- that’s a lot of records). During the recovery, the users table and related tables remained unavailable. Unfortunately, even after the recovery process completed, the table remained in an unusable state. Finally, yesterday morning we replaced the partially-locked user db with a copy that was fully available (in the parlance of database admins everywhere, we promoted a slave to master), fixing the database and all of the related issues.

    We have taken steps to ensure we can more quickly detect and respond to similar issues in the future. For example, we are now prepared to promote a slave db to master more quickly, and we have put additional monitoring in place to catch errant queries like the one that caused Monday’s incident.
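
    As an illustration of the kind of check such monitoring can perform (this is a sketch, not our actual tooling, and the host and credentials are made up), a small script can poll MySQL’s process list and flag anything that has been running too long:

        // Illustrative sketch only -- not Twitter's actual monitoring. A small
        // Node.js script using the `mysql` client to flag long-running queries.
        var mysql = require('mysql');
        var connection = mysql.createConnection({
          host: 'users-db.example.internal',   // hypothetical host
          user: 'monitor',
          password: 'secret'
        });
        var MAX_SECONDS = 300;   // anything running longer than 5 minutes is suspicious

        function checkForErrantQueries() {
          connection.query('SHOW FULL PROCESSLIST', function (err, rows) {
            if (err) return console.error(err);
            rows.forEach(function (row) {
              if (row.Command === 'Query' && row.Time > MAX_SECONDS) {
                console.log('Long-running query (' + row.Time + 's), thread ' +
                            row.Id + ': ' + row.Info);
              }
            });
          });
        }

        setInterval(checkForErrantQueries, 60 * 1000);   // poll once a minute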

    Long-term solutions
    As we said last month, we are working on long-term solutions to make Twitter more reliable (news that we are moving into our own data center this fall, which we announced this afternoon, is just one example). This will take time, and while there has been short-term pain, our capacity has improved over the past month.

    Finally, despite the rapid growth of our company, we’re still a relatively small crew maintaining a comparatively large (rocket) ship. We’re actively looking for engineering talent, with more than 20 openings currently. If you’re interested in learning more about the problems we’re solving or “joining the flock,” check out our jobs page.
  • Room to grow: a Twitter data center

    Later this year, Twitter is moving our technical operations infrastructure into a new, custom-built data center in the Salt Lake City area. We're excited about the move for several reasons.

    First, Twitter's user base has continued to grow steadily in 2010, with over 300,000 people signing up for new accounts on an average day. Keeping pace with these users and their Twitter activity presents some unique and complex engineering challenges (as John Adams, our lead engineer for application services, noted in a speech last month at the O'Reilly Velocity conference). Having dedicated data centers will give us more capacity to accommodate this growth in users and activity on Twitter.

    Second, Twitter will have full control over network and systems configuration, with a much larger footprint in a building designed specifically around our unique power and cooling needs. Twitter will be able to define and manage a finer-grained SLA on the service, since we will be managing and monitoring at all layers. The data center will house a mixed-vendor environment for servers running open-source operating systems and applications.

    Importantly, having our own data center will give us the flexibility to more quickly make adjustments as our infrastructure needs change.

    Finally, Twitter's custom data center is built for high availability and redundancy in our network and systems infrastructure. This first Twitter-managed data center is being designed with a multi-homed network solution for greater reliability and capacity. We will continue to work with NTT America to operate our current footprint, and plan to bring additional Twitter-managed data centers online over the next 24 months.