Straight Talk on Event Loops

Two days ago, I pointed out how Node.js, an event-driven web framework, will eat it hard if it's given any nontrivial amount of CPU work to do in its request handler. After I published that, it seemed that the point of the article went sailing right past the Node.js camp, who proceeded to see how fast they could make a Fibonacci number generator.

The Fibonacci function was arbitrary. It was inefficient on purpose. I needed a function that would use CPU time, and chose that because it's familiar and easy to implement. So, now I offer a more formal analysis of what CPU usage does to the throughput of an event-driven system as compared to a threaded system.

Since it's now clear that reading comprehension and critical thinking are not strong suits of the Node.js programmer, I would suggest that all Noders reading this article read it aloud, slowly and loudly, like an American tourist trying to find a train station in Tokyo. Furthermore, to assist the Node camp, I will highlight the important parts in large lettering, like this:

When the weather is threatening rain, bring an umbrella with you.

A Math Model of Throughput

Assume we've got a request handler that processes an HTTP request and sends back the result. Let's see how threads and event loops differ in processing these requests. Note that we're measuring throughput here, not latency. That's an article for another day.

This is an analysis of Queries Per Second (QPS) only.

Let's start with some definitions:

  • Let C be the amount of CPU time used by the handler, in milliseconds.
  • Let I be the amount of I/O time used by the handler, in milliseconds.
  • Let W be the wall clock time it takes for a handler to execute. By definition, W = I + C
  • Let N be the number of threads running in the threaded system.
  • Let E be the throughput of the event driven system.
  • Let T be the throughput of the threaded system.

Given that the times are measured in milliseconds, we can define E = 1000/C and T = 1000N/W (the event loop never waits on I/O, so it is limited only by the CPU time per request; each of the N threads completes one request every W milliseconds).

Since the wall time W is expressible in terms of the CPU time C and the I/O time I, and considering that both C and I are positive and nonzero, it is helpful to define I = kC, with the factor k > 0 expressing the relationship between C and I.

It follows then that W = C(1 + k) and T = 1000N/(C(1 + k)).

THEOREM 1. When a handler takes more CPU time than I/O time, an event-driven system has greater throughput than a threaded system if and only if the threaded system has exactly one thread.

PROOF (partial). (note: for brevity, I will only prove one direction. The other direction is an exercise left for the reader.)

Suppose E > T, that is,

1000/C > 1000N/(C(1 + k))

Simplifying the inequality,

N < 1 + k

Given that 0 < k < 1 (more CPU time than I/O time means I < C), we can bound the inner term

1 < 1 + k < 2

Further simplifying,

N < 2

Since N is integral and nonzero, it follows that N = 1.

If you do more CPU than I/O, use threads.

THEOREM 2. When the handler takes more I/O time than CPU time, an event-driven system has greater throughput than a threaded system if and only if N < 1 + k (equivalently, N < W/C).

PROOF (partial). (Note: again for brevity, I will prove only one direction.) Suppose E > T.

Given our previous construction,

1000/C > 1000N/(C(1 + k)), which simplifies as before to N < 1 + k,

and the alternate expression for the wall time,

W = C(1 + k),

it follows that N < 1 + k = W/C.

If you do more I/O than CPU, use more threads.

A Practical Example

Let's suppose you have a request handler that does 10 milliseconds of CPU work and 50 milliseconds of database I/O. Would you choose threads or events?

In this case, the theoretical maximum throughput of the event-driven system is 1000/10 = 100 QPS, whereas a threaded system with 50 threads has a theoretical maximum throughput of 50,000/60 = 833.33 QPS. Of course, in the threaded case, you need to worry about being bound by the CPU, but given the number of cores on modern hardware, threads seem like the winner here.
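
If you'd rather let a computer do the arithmetic, here's a throwaway sketch of the two formulas from the model above. The function names are mine and nothing here comes from any library; it's just the algebra typed out.

// Theoretical throughput of the event-driven system: bound by CPU time only.
function eventThroughput(cpuMs) {
  return 1000 / cpuMs;
}

// Theoretical throughput of the threaded system: N threads, each finishing
// one request every W = C + I milliseconds.
function threadedThroughput(cpuMs, ioMs, threads) {
  return (1000 * threads) / (cpuMs + ioMs);
}

console.log(eventThroughput(10));            // 100 QPS
console.log(threadedThroughput(10, 50, 50)); // 833.33... QPS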

Multiple Event Workers

The Noders got really into this one: forking "workers" from your event loop to do the heavy CPU work, and having them call back to the event loop when they're done. One parent process coordinates work among many children? Where have I heard that before?
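
For the uninitiated, the pattern looks roughly like this. Treat it as a sketch under my own assumptions, not gospel: it supposes a separate worker script, which I'm calling fib_worker.js, that does the CPU-heavy computation and reports back over the child process message channel.

var http = require('http');
var child_process = require('child_process');

http.createServer(function (req, res) {
  // Ship the CPU-bound work off to a separate process so this event loop
  // stays free to accept other connections in the meantime.
  var worker = child_process.fork('./fib_worker.js');
  worker.on('message', function (result) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end(String(result));
    worker.kill();
  });
  worker.send(40);
}).listen(1337, "127.0.0.1");

And the hypothetical fib_worker.js:

// fib_worker.js: does the blocking work and reports back to the parent.
process.on('message', function (n) {
  function fibonacci(n) {
    if (n < 2) return 1;
    return fibonacci(n - 2) + fibonacci(n - 1);
  }
  process.send(fibonacci(n));
});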

Anyhow, let's extend the model to that case. Just for funsies.

Since your asynchronous processes do not block on I/O, at full utilization they will theoretically take 100% of the CPU. Therefore, the number of worker processes to spawn must be equal to the number of CPUs in the system to avoid oversubscribing the machine. Let's introduce a new variable, M, to represent the number of CPUs in the computer.

The throughput formula for the event-driven system therefore becomes

E = 1000M/C

Now, with threads, we also need to avoid oversubscribing the CPU. Considering that during a single handler execution, only C milliseconds of CPU are used, it follows that the number of threads that will achieve theoretical maximum utilization is N = MW/C = M(1 + k).

Our formula for the threaded system's throughput is therefore

T = 1000N/W = 1000(MW/C)/W

...but look at this: the W's cancel, leaving

T = 1000M/C = E

At full utilization, threads and events have the same theoretical throughput.

This makes intuitive sense: if the CPUs are working as hard as they can, then, all else being equal, they should yield the same performance regardless of the framework used.

Hold up, this does not prove that Node is good.

Of course, in a practical setting, threads have a greater memory overhead, and programming an event loop with multiple workers just seems silly: if you're doing that much CPU work in an event-looped system, you've already fucked up somewhere, so why add to it?

Node.js Is Still Cancer

So, let's review.

Suppose you're a less-than-expert programmer, which Node seems to attract in droves for some reason. You are using Node for the supposed "scalability" of it, but as we have just seen, threaded programming, which is easier to understand than callback driven programming, meets or exceeds the asynchronous model in the vast majority of cases. Chances are, you're not going to be forking worker processes to do CPU jobs, what with the less-than-expert and all.

Therefore, the reason you're using Node is not a lack of technical ability; it's that all the cool kids are doing it.

Node.js is a danger to novice programmers.

Next, suppose you're an expert programmer, and you've got some CPU-bound work that you fork off to child processes to keep your event loop trucking. OK man, how complicated do you want to make this thing? At full capacity, you're on par with threads, provided it's not memory bound. At this point, you are less focused on solving the problem at hand than you are on coming up with something you can blog about and get on programming Reddit.

If you're forking workers in Node, you've got bigger problems.

Plus, it's fucking JavaScript ... on the server.

Node.js is Cancer

If there's one thing web developers love, it's knowing better than conventional wisdom, but conventional wisdom is conventional for a reason: that shit works. Something's been bothering me for a while about this node.js nonsense, but I never took the time to figure it out until I read this butthurt post from Ryan Dahl, Node's creator. I was going to shrug it off as just another jackass who whines because Unix is hard. But, like a police officer who senses that something isn't quite right about the family in a minivan he just pulled over and discovers fifty kilos of black horse heroin in the back, I thought that something wasn't quite right about this guy's aw-shucks sob story, and that maybe, just maybe, he has no idea what he is doing, and has been writing code unchecked for years.

Since you're reading about it here, you probably know how my hunch turned out.

Node.js is a tumor on the programming community, in that not only is it completely braindead, but the people who use it go on to infect other people who can't think for themselves, until eventually, every asshole I run into wants to tell me the gospel of event loops. Have you accepted epoll into your heart?

A Scalability Disaster Waiting to Happen

Let's start with the most horrifying lie: that node.js is scalable because it "never blocks" (Radiation is good for you! We'll put it in your toothpaste!). On the Node home page, they say this:

Almost no function in Node directly performs I/O, so the process never blocks. Because nothing blocks, less-than-expert programmers are able to develop fast systems.

This statement is enticing, encouraging, and completely fucking wrong.

Let's start with a definition, because you Reddit know-it-alls live for pedantry about specifics. A function call is said to block when the current thread of execution waits until that function finishes before continuing. Typically, we think of I/O as "blocking": for example, if you call socket.read(), the program will wait for that call to finish before continuing, since you need to do something with the return value.

Here's a fun fact: every function call that does CPU work also blocks. This function, which calculates the nth Fibonacci number, will block the current thread of execution because it's using the CPU.

function fibonacci(n) {
  if (n < 2)
    return 1;
  else
    return fibonacci(n-2) + fibonacci(n-1);
}
(Yes, I know there's a closed form solution. Shouldn't you be in front of a mirror somewhere, figuring out how to introduce yourself to her?)

Let's see what happens to a node.js program that has this little gem as its request handler:

var http = require('http');

http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end(String(fibonacci(40)));
}).listen(1337, "127.0.0.1");

On my older laptop, this is the result:

ted@lorenz:~$ time curl http://localhost:1337/
165580141
real	0m5.676s
user	0m0.010s
sys	0m0.000s

5 second response time. Cool. So we all know JavaScript isn't a terribly fast language, but why is this such an indictment? It's because Node's evented model and brain damaged fanboys make you think everything is OK. In really abusive pseudocode, this is how an event loop works:

while(1) {
  ready_file_descriptor = event_library->poll();
  handle_request(ready_file_descriptor);
}

That's all well and good if you know what you're doing, but when you apply this to a server problem, you've pluralized that shit. If this loop is running in the same thread that handle_request is in, any programmer with a pulse will notice that the request handler can hold up the event loop, no matter how asynchronous your library is.
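
If you want to watch that happen without an HTTP server in the way, here's a little experiment of my own; it's not from the Node docs or anyone's benchmark suite. A timer that should fire every 100 milliseconds goes silent the moment a CPU-bound loop takes over the only thread.

// A heartbeat that should tick every 100 ms.
setInterval(function () {
  console.log('tick ' + new Date().toISOString());
}, 100);

// Two seconds in, hog the only thread for five seconds straight.
setTimeout(function () {
  var start = Date.now();
  while (Date.now() - start < 5000) {
    // burning CPU; nothing here yields back to the event loop
  }
  console.log('done spinning');
}, 2000);

Run it and the ticks stop dead while the loop spins, because the "asynchronous" timer callbacks can only fire when the one and only thread gets back to the event loop.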

So, given that, let's see how my little node server behaves under the most modest load, 10 requests, 5 concurrent:

ted@lorenz:~$ ab -n 10 -c 5 http://localhost:1337/
...
Requests per second:    0.17 [#/sec] (mean)
...

0.17 queries per second. Diesel. Sure, Node allows you to fork child processes, but at that point your threading/event model is so tightly coupled that you've got bigger problems than scalability.

Considering Node's original selling point, I'm God Damned terrified of any "fast systems" that "less-than-expert programmers" bring into this world.

Node Punishes Developers Because it Disobeys the Unix Way

A long time ago, the original neckbeards decided that it was a good idea to chain together small programs that each performed a specific task, and that the universal interface between them should be text.

If you develop on a Unix platform and you abide by this principle, the operating system will reward you with simplicity and prosperity. As an example, when web applications first began, the web application was just a program that printed text to standard output. The web server was responsible for taking incoming requests, executing this program, and returning the result to the requester. We called this CGI, and it was a good way to do business until the micro-optimizers sank their grubby meathooks into it.
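
To make that concrete, here's roughly what a classic CGI program boils down to, sketched in JavaScript to match the rest of this post (in reality it would have been Perl or C, and this assumes the web server is configured to exec the script once per request):

#!/usr/bin/env node
// A minimal CGI-style program: the web server execs it, the program writes
// a header block and a body to standard output, and the server relays that
// output to the client. No sockets, no framework, just text.
process.stdout.write("Content-Type: text/plain\r\n");
process.stdout.write("\r\n");
process.stdout.write("Hello from a boring old CGI script.\n");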

Conceptually, this is how any web application architecture that's not cancer still works today: you have a web server program whose job is to accept incoming requests, parse them, and figure out the appropriate action to take. That can be either serving a static file, running a CGI script, proxying the connection somewhere else, whatever. The point is that the HTTP server isn't the same entity doing the application work. Developers who have been around the block call this separation of responsibility, and it exists for a reason: loosely coupled architectures are very easy to maintain.

And yet, Node seems oblivious to this. Node has (and don't laugh, I am not making this shit up) its own HTTP server, and that's what you're supposed to use to serve production traffic. Yeah, that example above where I called http.createServer()? That's the preferred setup.

If you search around for "node.js deployment", you find a bunch of people putting Nginx in front of Node, and some people use a thing called Fugue, which is another JavaScript HTTP server that forks a bunch of processes to handle incoming requests, as if somebody maybe thought that this "nonblocking" snake oil might have an issue with CPU-bound performance.

If you're using Node, there's a 99% probability that you are both the developer and the system administrator, because any system administrator would have talked you out of using Node in the first place. So you, the developer, must face the punishment of setting up this HTTP proxying orgy if you want to put a real web server in front of Node for things like serving statics, query rewriting, rate limiting, load balancing, SSL, or any of the other futuristic things that modern HTTP servers can do. That, and it's another layer of health checks that your system will need.
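
If you've never had to build that proxying layer, here's the gist of what it does, sketched with nothing but Node's core http module; this is my illustration of the mechanism, not a production setup (that's what Nginx is for), and it assumes the app from earlier is still listening on 127.0.0.1:1337.

var http = require('http');

// Front-end listener: forwards every request to the Node app behind it and
// streams the response back. Real proxies also handle static files, SSL,
// rate limiting, load balancing, and health checks; this does none of that.
http.createServer(function (clientReq, clientRes) {
  var upstream = http.request({
    host: '127.0.0.1',
    port: 1337,
    path: clientReq.url,
    method: clientReq.method,
    headers: clientReq.headers
  }, function (upstreamRes) {
    clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
    upstreamRes.pipe(clientRes);
  });
  clientReq.pipe(upstream);
}).listen(8080, "0.0.0.0");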

Although, let's be honest with ourselves here, if you're a Node developer, you are probably serving the application directly from Node, running in a screen session under your account.

It's Fucking JavaScript

This is probably the worst thing any server-side framework can do: be written in JavaScript.

if (typeof my_var !== "undefined" && my_var !== null) {
  // you idiots put Rasmus Lerdorf to shame
}

What is this I don't even...

tl;dr

Node.js is an unpleasant software library and I will not use it.

The Craigslist Reverse Programmer Troll

Stop me if you have heard this before. I'm a business guy, not so good on the technical side, and I've got a great idea that I need a programmer to develop for me. I don't have any funding yet, but I've got a really nebulous connection to the venture capital world. That being said, I'll start paying you once we get funding or we start making a lot of money from the project! All you need to do is write a Facebook clone in 2 weeks. For a smart programmer like you, that should be easy, right? I'll also cut you in on a little bit of equity. Let's get started!

This kind of shit lands on Craigslist so often that it makes you wonder what they actually teach at business schools. (Side note: I recently learned that if you earn $1, you get to multiply that by about 20, the price/earnings ratio, meaning that the $1 you've earned is actually $20 in value. Makes me wonder how MBAs do differential equations.) It's time for us programmers to take revenge. So, a couple of months ago, I did a reverse-programmer troll on Craigslist. It went something like this:

Title: (computer gigs) Looking for Tech Idea Person

I am a computer programmer looking for a top-notch idea person to help build the next great internet company. Being a good programmer, I don't have many business ideas of my own. That's where you come in.

The perfect idea person to work with me will have:

  • A great business idea!
  • At least a passing knowledge of computers and the internet.
  • A vague reference to knowing somebody in the venture capital industry.

I have whipped up people's ideas very quickly in the past. Here is a list of some of the things I've built, and how long it's taken:

  • A facebook clone (4 days)
  • A flickr-like photo sharing web site (3 days)
  • A Google-like search engine (2 weeks - longer because you have to stop spam!)

And hopefully, I can add YOUR idea to the list!

Note that I have a structured settlement from a lawsuit that covers my basic expenses, so I don't need to be paid, but of course once we start making money, I'd like to be paid. Since I put the code together for these web sites so quickly, it's not fair to do a 50/50 split of ownership, but I would like to have at least a couple of percent.

Too obvious, right? This can't possibly generate any responses, I thought. Nope. 31 replies in about 2 hours, before Craigslist pulled the post. Here are some of the highlights:

Hi there,
I have been looking for a Man like you for years now.
i have a few superb ideas which have big potential.
please share you phone number.
we will discuss more about it.
thanks,
SP.

Swingers have a term, 'unicorn'. Look it up. It's called a unicorn for a reason.

We are introducing a similar site to Groupon.com. We are currently speaking with [VC I have never heard of] of [fund I have never heard of], who is interested in funding our project after beta is up. We businesses people who have been extremely successful in our past positions, looking for someone like you. We to get our site up and running, I already have one programmer. I would love to speak with you more, can we arrange a meeting at my Fremont office?

[Business Dude]

How refreshingly original. Do go on.

Hi,
I'm in the middle of a business plan and looking for co-founders to develop a web appliance for the medical industry. Do you know how to develop on a LAMP platform? Also, please let me know where you are located so we can arrange a meeting.
Cheers
--j

LAMP? Web appliance? Medical industry? Excuse me, I'm getting flustered by how awesome this idea probably is. Either that or it's the taste of bile. Can't tell.

People are afraid of lonely but dont't want to go out to street. People want to share and want communicate. That why social network is sucess such as facebook, youtube ...

Has anyone really been far even as decided to use even go want to do look more like?

Hello ,
I am happy to contact you with a set of fresh offerings.
- Saas
- Web 2.0
- Enterprise 2.0
- Open source consulting and Implementation

We are sure you will find it compelling. We have full time web developers and designers working with us in various technology stacks & moreover you can also hire them according to your need and get your stuff completed from your virtual team working dedicatedly for you fulltime (We can arrange for a telephonic Interview also with these designers & developers). We do have a global presence catering clients in US, UK, Australia and Canada. We have tons of expertise in developing Web 2.0, Action Script, AJAX, CSS, C, C++, JavaScript, XML, PHP, Joomla,Drupal,Wordpress, E commerce, SEO and .NET Framework, Mobile application based projects.

We can help with designing team who as expertise on designing tools such as ADOBE Suites,CS3,CS4,Flash,HTML etc

It will be great if you can have a short call/chat for a better understanding. As we work round the clock, time zone will not be a problem. Please let me know your time of convenience and a number / Skype id through which I can reach you..

For Web Application development we work for $12/hr and for Mobile application development we work for $15/hr.

Price will never be a deal breaker its always negotiable depending upon the project requirement.

Looking forward to hear from you for a win win business relationship.

Dammit, I knew I should have listed reading comprehension as a requirement.

You Can't Be Serious.
You sound like what every CL poster's dream, a poster dream
You should have added links.

This was from a fellow programmer who apparently got the joke. Jim F, keep on keepin' on, my brother.

Not long after that, Craigslist pulled the post, or rather "flagged and removed" it, which is the jargon for "troll detected." Oh well, it was fun for a while.