From: torvalds@transmeta.com (Linus Torvalds) Subject: Re: Thread creation/switch times on Linux and NT (was Re: Linux users working at Microsoft!) Date: 8 Mar 1998 01:31:03 GMT In article <slrn6g3jpd.85.galexand@sietch.sietch.bloomington.in.us>, Greg Alexander <galexand@sietch.sietch.bloomington.in.us> wrote: >My biggest suggestion is to try kernel profiling. Check if any notable >amount of time is actually spent in goodness before worrying about changing >it. Also, check out 2.1.x - there are some changes to various details of the scheduler that were brought on by the finer locking granularity, but that were sometimes also related to performance. I do obviously agree with the basic points above - I wrote most of the scheduler. Usually there aren't all that many runnable processes even under heavy load, and having a very simple linear queue is a win for almost all situations in my opinion. For example, if there are lots of processes doing IO, the process list tends to be fairly short and you really want a very simple scheduler for latency reasons. In contrast, if there are lots of CPU-bound processes, there may be lots of runnable processes, but it very seldom results in a re-schedule (because they keep running until the timeslot ends), so again there is no real reason to try to be complex. So yes, under certain circumstances the current scheduler uses more CPU than strictly necessary - and the "40 processes doing a sched_yield() all the time" example is one of the worst (because it implies a lot of runnable processes but still implies continuous thread switching). Personally I don't think it's a very realistic benchmark (it tells you _something_, but I don't think it tells you anything you need to know), which is one reason why Linux isn't maybe the best system out there for that particular benchmark. But it would be easy enough to make Linux perform better on it, so I'll think about it. [ Even when I don't find benchmarks very realistic I really hate arguing against hard numbers: hard numbers are still usually better than just plain "intuition". And I may well be wrong, and maybe there _are_ circumstances where the benchmark has some real-world implications, which is why I wouldn't just dismiss the thing out-of-hand. It's just too easy to ignore numbers you don't like by saying that they aren't relevant, and I really try to avoid falling into that trap. ] The particular problem with "sched_yield()" is that the Linux scheduler _really_ isn't able to handle it at all, which is why the Linux sched_yield() implementation sets the counter to zero - I well know that it's not the best thing to do for performance reasons, and I think it unduly penalizes people who want to yield some CPU time, but as it stands the scheduler can't handle it any other way (the "decrement counter by one" approach that Ingo suggested is similarly broken - it just happens to not show it quite as easily as the more drastic "zero the counter", and it has some other problems - mainly that it doesn't guarantee that we select another process even if another one were to be runnable). I should probably add a "yield-queue" to the thing - it should be rather easy to do, and it would get rid of the current scheduler wart with regard to sched_yield(). My reluctance is purely due to the fact that I haven't heard of any real applications that it would matter for, but I suspect we need it for stuff like "wine" etc that need to get reasonable scheduling in threaded environments that look different from pthreads(). Linus