Hacker News
Threaded vs Evented Servers (mmcgrana.github.com)
75 points by khingebjerg on July 25, 2010 | hide | past | favorite | 30 comments


I would find it much easier to follow the article if the author used traditional mathematical notation instead of Clojure. Which do you prefer?

    (* t (/ 1000 w))
or (perhaps typeset properly)

    t * 1000 / w


Clojure notation for me. In my view, it's far better suited to today's programming world than traditional math notation, because machine parseability matters. So I prefer the moldable Lisp version.

Many programmers dislike math notation; they have bad memories of school and gloss over anything that reminds them of dull homework and tests. Programming notation, on the other hand, carries a feeling of freedom.

It helps that I'm more used to Clojure's notation than to traditional math notation. It's also easily executable for me: no need to do precise arithmetic mentally when the computer can do it.


Exactly. I got through the first few equations and then stopped reading.


I much prefer the Clojure to the mathematical notation, and I secretly wish academics would use code instead of traditional notation (when possible).


As someone who typically has a Clojure repl handy, I found it convenient to be able to copy-and-paste and play around with the model.


"Finally, we’ll assume single-core servers"... isn't that rather like working out mathematically that the best gait for horses is hopping, provided you assume a one legged horse?


I think I agree. Perhaps if it were titled "Threaded versus Evented Single-Core Servers" it would make much more sense. Having said that, there are ways to scale event-based servers out across the cores, and in some cases (i.e. when nothing needs to be shared and the workload is I/O-bound, which suits the event-based approach) that works just fine. There was an interesting article which, to me, looked like it was making that point: http://dosync.posterous.com/clojure-nodejs-and-why-messaging...


No, while this is a wonderful analogy, I don't think this is actually a problem. It's more like assuming horses weigh 1000 kg --- wrong for all but the biggest draft horses, but a nice round number to work with.

For the level of analysis in the article, there's really no difference between a 4-core machine and a processor with 4x the clock speed. His point is that the threaded model makes the most sense when each request is CPU-intensive, and the evented model works best when the work is light but the delays are long. All that would change for a multi-core processor is the definition of when the work starts to be CPU-'intensive'.


Moore's law is no longer in effect; it is cheaper to add more cores these days than to add more speed.

Edit: My point is that, since we live in a real, tangible world, where CPU power is being expanded through parallelism rather than by increasing single-pipeline throughput, we have to deal with that reality. To dismiss it to prove your point seems a tad naive.


Moore's law is about transistor count, not about speed. The increasing number of cores on a single die is predicted by Moore's law.


False.

An evented server (at least, until they get a lot fancier and make you program with locks and such) only gets to use one of the cores, while a threaded server gets all four.

Using a single core just means that the threaded server doesn't get any advantage from being threaded, while the evented server gets everything it can use.

This "model" is worthless for comparison.


It is pretty easy to build a round-robin load scheduler that queries the system for its core count, spawns a thread per core, and then schedules requests to the least-burdened core. This reduces the number of threads the server uses while still taking advantage of evented, coroutine-based processing. (This stuff is not new; Node just kicked off interest again due to its growing popularity.) If evented servers catch on, I think you will eventually see this as a hybrid solution. But yes, you will certainly need threads or separate processes to take advantage of multi-core systems; it's just a question of whether or not you need a thread per request. There are also a lot of security implications that are getting glossed over in the talks about evented servers.
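The hybrid described above is roughly the classic pre-fork pattern: open the listening socket once, fork one worker per core, and run an event loop in each worker on the shared fd. A sketch under those assumptions (the event loop is stubbed out, error handling elided, and `spawn_workers` is an invented name):

```c
#include <sys/wait.h>
#include <unistd.h>

/* One evented worker per core, all sharing a listening socket.
 * The loop body is a stub standing in for a real event loop. */
void run_event_loop(int listen_fd) {
    /* A real worker would run an epoll/kqueue loop here, calling
     * accept() on the shared fd and handling connections without
     * blocking; the kernel spreads incoming accepts across workers. */
    (void)listen_fd;
}

/* Fork n workers, each running the event loop, then reap them.
 * Returns the number of workers that exited. */
int spawn_workers(int n, int listen_fd) {
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0) {              /* child: one event loop per core */
            run_event_loop(listen_fd);
            _exit(0);
        }
    }
    int reaped = 0;
    while (wait(NULL) > 0)           /* parent just reaps workers */
        reaped++;
    return reaped;
}
```

In practice n would come from something like `sysconf(_SC_NPROCESSORS_ONLN)`, and the socket would be created, bound, and listening before the forks so every worker inherits it.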


Event-driven servers should not block on disk I/O, DNS queries, or DB calls (by using or writing async libs); at least, the well-written ones don't, or else the server is likely to be DoS-ed. The author is wrong to use a DB call as an example. Secondly, the author totally ignored the time it takes to launch a thread versus not launching one at all in an event-driven model, and that affects the number of new connections/sec. Finally, switching between threads is not a very cheap operation, since it causes a context switch, which directly affects the response time of the server.
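For reference, the basic mechanics behind "don't block the loop" (a sketch, assuming POSIX fds): mark descriptors non-blocking so reads and writes return immediately with EAGAIN instead of stalling, and let poll() report readiness. The helper names here are invented.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Mark an fd non-blocking so read()/write() return EAGAIN instead
 * of blocking the whole event loop. Returns -1 on error. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Wait until the fd is readable, with a timeout. An event loop does
 * this for many fds at once and dispatches callbacks for the ready
 * ones. Returns 1 if ready, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms) {
    struct pollfd p = { .fd = fd, .events = POLLIN };
    return poll(&p, 1, timeout_ms);
}
```

Disk files are the awkward case: on most Unixes O_NONBLOCK doesn't help for regular-file reads, which is exactly why evented servers end up needing async I/O libraries or a helper thread pool for disk work.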


Paul Tyma discusses some other myths regarding evented vs threaded servers: http://www.mailinator.com/tymaPaulMultithreaded.pdf


That looks interesting enough to post separately too: http://news.ycombinator.com/item?id=1546711


This article seems to focus on fairly heavyweight threads, but what about green threads? It seems like green threads with proper non-blocking libraries could perform just like evented servers.


Check out GHC's new IO manager. GHC, by default, uses non-blocking IO exclusively and calls "poll" or some equivalent syscall to figure out which green thread to wake up.


The examples given seem to indicate that evented servers are as fast or faster than threaded servers. If that's the case, why doesn't everyone just use evented servers?


First, the analysis relies on every step from request to response being evented, which in turn relies on the existence of event-based libraries for everything you want to do.

Also, most languages don't have a good way of dealing with callbacks. They tend to make the code verbose and difficult to reason about. This is especially true for non-trivial applications, where there are more than two or three callbacks chained together.

However, you can counter-argue that the additional complexity isn't inherent to the programming model, only to certain languages. Also, in languages which support multiple threads (read: not JavaScript), a hybrid approach that uses events where possible and threads where necessary may retain many of the benefits of the event-based approach.


First, the analysis relies on every step from request to response being evented, which in turn relies on the existence of event-based libraries for everything you want to do.

This is one of the main reasons I like node.js: EVERYTHING is evented, and if it isn't, it's probably a bug, or at least it's carefully explained why it can't be (and usually there's an async alternative).


IIRC it does not support things like the MySQL lib, because that is normally not event-driven. So yes, everything in node.js is event-driven, but not everything you may need is included. Last I checked, work was being done on it; the solution was to create a thread pool and treat communicating with it as an event-driven protocol.


I think the answer is actually pretty straightforward. It's trivial to go from a serial to a threaded server. It's way harder to write an event-based server using poll or select.

Serially:

  int handle_connection(int fd) {
    ...
  }

  int loop() {
    ...
    while(1) {
      fd = accept(listener);
      handle_connection(fd);
    }
  }

Threaded:

  int handle_connection(int fd) {
    ...
  }

  int loop() {
    ...
    while(1) {
      fd = accept(listener);
      thread_start(handle_connection, fd);
    }
  }


No it's not. Using your contrived, limited example, it's even less code.

Evented:

  int handle_connection(int fd) {
    ...
  }

  int doaccept() {
      put_on_event_loop(accept(listener));
  }
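To be fair to the grandparent, that example hides where the evented complexity lives: someone still has to write `put_on_event_loop`, and `handle_connection` has to be split into non-blocking steps with per-connection state saved between calls. A sketch of one round of a select()-based loop (the names `event_loop_once` and `handle_connection_step` are invented; a real server would run this forever with no timeout):

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* One resumable step of a connection. Returns 1 when the connection
 * is finished and should be dropped; a real server would parse the
 * request and write a reply here, saving state between steps. */
static int handle_connection_step(int fd) {
    char buf[512];
    ssize_t n = read(fd, buf, sizeof buf);
    return n <= 0;
}

/* One round of the event loop: wait for readiness, accept new
 * connections, and step the ready ones. Returns select()'s count. */
int event_loop_once(int listener, fd_set *watched, int *maxfd, int timeout_ms) {
    fd_set readable = *watched;
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
    int ready = select(*maxfd + 1, &readable, NULL, NULL, &tv);

    for (int fd = 0; ready > 0 && fd <= *maxfd; fd++) {
        if (!FD_ISSET(fd, &readable))
            continue;
        if (fd == listener) {                      /* new connection */
            int conn = accept(listener, NULL, NULL);
            if (conn >= 0) {
                FD_SET(conn, watched);
                if (conn > *maxfd)
                    *maxfd = conn;
            }
        } else if (handle_connection_step(fd)) {   /* resume; maybe drop */
            close(fd);
            FD_CLR(fd, watched);
        }
    }
    return ready;
}
```

The extra code compared to the threaded version is exactly the bookkeeping the thread scheduler would otherwise do for you: the watched set, the max-fd tracking, and the explicit save/resume of each connection's progress.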


IMNSHO (dealing with both kinds in some cases):

Probably because they weren't popular, and they weren't popular because they were hard to write in "classic" languages. I'd never try to write a complex asynchronous server in a language like C or Java. But then, I'd never try to write a thread-per-connection server in Python. Limiting ourselves to popular languages only: "old" ones didn't provide good support for this kind of thing, while in "new" ones evented servers are trivial with closures, simulated continuations, etc. This might seem silly, but look at slide 36 from the Paul Tyma presentation mentioned in another comment: "Found that when switching between clients, the code for saving and restoring values/state was difficult". I'm not even sure what he means by that... why save/restore? That seems like a seriously difficult way to write the code.

Then again there are places where you really shouldn't choose one over another, unless you've got a really good reason. For example in telecommunication when you're dealing with signalling, doing thread per connection is close to insane. That's one of the reasons there's only a handful of people who understand chan_sip in the Asterisk project and why it's full of DEADLOCK_AVOIDANCE macros.


This Usenix article argues that threaded servers are a better programming model, though it admits that evented servers with current setups may be more efficient (e.g. if there's no compiler support for minimizing per-thread stack overhead): http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.5.58...


Reply to myself since it's been too long to edit: Decided this was maybe worth its own submission -> http://news.ycombinator.com/item?id=1547353


A lot of existing libraries, web services and other external resources make eventing more difficult since they don't provide an asynchronous way of using them.

In the case of node.js, for example, a lot of libraries (mysql, for example) have to be wrapped or rewritten, since they would otherwise block the whole event loop.


Not specific to this article:

I find it unrealistic that threaded vs evented (or blocking vs non-blocking I/O) comparisons always use a slow database server as the prototypical thing to wait for.

I get that this is not the point but to a newcomer it must seem like "oh.. database servers are super slow, I must first and foremost worry about optimizing access to them".

If your queries regularly take more than 10ms to complete, something is wrong with the database (add caching, put the database closer to the querying server, maybe even on the same machine).


Whatever. >10ms is perfectly reasonable DB access. Of course, if your data is small enough to fit in memory you will have much faster access times, but if it isn't and you don't have an SSD, every disk seek will take at least 10ms, and you may need several to do all the queries needed to render a page.




