Kernel Planet

January 07, 2009

Rusty Russell: Fun with cpumasks

I've been meaning for a while to write up what's happening with cpumasks in the kernel. Several people have asked, and it's not obvious so it's worth explaining in detail. Thanks to Oleg Nesterov for the latest reminder.

The Problems

The two obvious problems are

  1. Putting cpumasks on the stack limits us to NR_CPUS around 128, and
  2. Whack-a-mole attempts to fix the worst offenders are a losing game.

For better or worse, people want NR_CPUS 4096 in stock kernels today, and that number is only going to go up.

Unfortunately, our merge-surge development model makes whack-a-mole the obvious thing to do, but the results (creeping in largely unnoticed) have been between awkward and horrible. Here's some samples across that spectrum:

  1. cpumask_t instead of struct cpumask. I gather that this is a relic from when cpus were represented by an unsigned long, even though now it's always a struct.
  2. cpu_set/cpu_clear etc. are magic non-C creatures which modify their arguments through macro magic:
    #define cpu_set(cpu, dst) __cpu_set((cpu), &(dst))
    
  3. cpumask_of_cpu(cpu) looked like this:
    #define cpumask_of_cpu(cpu)                                     \
    (*({                                                            \
            typeof(_unused_cpumask_arg_) m;                         \
            if (sizeof(m) == sizeof(unsigned long)) {               \
                    m.bits[0] = 1UL<<(cpu);                         \
            } else {                                                \
                    cpus_clear(m);                                  \
                    cpu_set((cpu), m);                              \
            }                                                       \
            &m;                                                     \
    }))
    
    Ignoring that this code has a silly optimization and could be better written, it's illegal since we hand the address of a going-out-of-scope local var. This is the code which got me looking at this mess to start with.
  4. New "_nr" iterators and operators have been introduced to only go up to nr_cpu_ids bits instead of all the way to NR_CPUS, and are used where it seems necessary. (nr_cpu_ids is the actual cap of possible cpu numbers, calculated at boot).
  5. Several macros contain implicit declarations in them, eg:
    #define CPUMASK_ALLOC(m)        struct m _m, *m = &_m
    ...
    #define node_to_cpumask_ptr(v, node)                                    \
                    cpumask_t _##v = node_to_cpumask(node);                 \
                    const cpumask_t *v = &_##v
    
    #define node_to_cpumask_ptr_next(v, node)                               \
                              _##v = node_to_cpumask(node)
    

But eternal vigilance is required to ensure that someone doesn't add another cpumask to the stack, somewhere. This isn't going to happen.
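To make the complaints above concrete, here is a minimal userspace sketch of what a pointer-based cpumask interface could look like. It is not the kernel's API: every name prefixed with toy_ or TOY_ is hypothetical, and the 4096 cap is only an illustrative stand-in for NR_CPUS. The point is that the mask is always a struct, the setters are real functions taking a pointer (so the modification is visible at the call site), and the "mask of one cpu" helper writes into caller-supplied storage instead of handing back the address of a local.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical compile-time cap, standing in for NR_CPUS. */
#define TOY_NR_CPUS 4096
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TOY_MASK_LONGS ((TOY_NR_CPUS + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* Always a struct, never a bare unsigned long. */
struct toy_cpumask {
	unsigned long bits[TOY_MASK_LONGS];
};

/* A real function taking a pointer: no macro magic on the argument. */
static inline void toy_cpumask_set_cpu(unsigned int cpu, struct toy_cpumask *dst)
{
	assert(cpu < TOY_NR_CPUS);
	dst->bits[cpu / TOY_BITS_PER_LONG] |= 1UL << (cpu % TOY_BITS_PER_LONG);
}

/* Returns 1 if the bit for `cpu` is set in `src`, 0 otherwise. */
static inline int toy_cpumask_test_cpu(unsigned int cpu, const struct toy_cpumask *src)
{
	assert(cpu < TOY_NR_CPUS);
	return (src->bits[cpu / TOY_BITS_PER_LONG] >> (cpu % TOY_BITS_PER_LONG)) & 1UL;
}

/* Clear every bit in the mask. */
static inline void toy_cpumask_clear(struct toy_cpumask *dst)
{
	memset(dst, 0, sizeof(*dst));
}

/* The caller supplies the storage, so no pointer to a dead local escapes. */
static inline void toy_cpumask_of_cpu(unsigned int cpu, struct toy_cpumask *dst)
{
	toy_cpumask_clear(dst);
	toy_cpumask_set_cpu(cpu, dst);
}
```

With the illustrative cap of 4096, sizeof(struct toy_cpumask) comes to 512 bytes, which gives a feel for why a handful of these as local variables is a problem on a small kernel stack.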

The Goals

The Solution

These days we avoid Big Bang changes where possible. So we need to introduce a parallel cpumask API and convert everything across, then get rid of the old one.

Conclusion

At this point, we will have code that doesn't suck, rules which can be enforced by the compiler, and the possibility of setting CONFIG_NR_CPUS to 16384 as the SGI guys really want.

Importantly, we are forced to audit all kinds of code. As always, some few were buggy, but more were unnecessarily ugly. With less review happening these days before code goes in, it's important that we strive to leave code we touch neater than when we found it.

January 07, 2009 12:01 PM

James Morris: Kernel Security Wiki

This is to announce a kernel security subsystem wiki, supported by the kind folk at kernel.org. It's intended for use by community developers and users of kernel security projects. So far, there are sections on working with the security-testing git repo, a listing of various kernel security projects, and an events page. If there's something you'd like to see or change on the wiki (particularly if it relates to your own project), create an account and make it so.

Note that this wiki is not related to security response: the security incident contact for the kernel per the MAINTAINERS file is security @ kernel.org.

January 07, 2009 09:37 AM

January 06, 2009

Evgeniy Polyakov: LISP error handling gotcha.

[1]> (defun resolve-host-name (addr)
  (handler-case (hostent-name (resolve-host-ipaddr addr))
    (t ()
       addr)))
RESOLVE-HOST-NAME
[2]> (resolve-host-name "1.2.3.4")
"1.2.3.4"
[3]> (resolve-host-name "195.178.208.66")
"tservice.net.ru"

An exception mechanism is a great extension to whatever language, and I think LISP has one of the best implementations (and the first one, actually). I'm not very familiar with exceptions in C++, or with the language itself, but IIRC it is not (easily) possible to return to the calling point with a value determined by the exception handler. Even in Java, with its finally section, it is still less convenient. But I may be wrong, of course :)

The chunk of code above catches the error (all exceptions) and returns the requested address itself; when no error happens, it returns the resolved name.
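For comparison, here is a rough C analogue of the same "return the input on any failure" behaviour. It is only a sketch: C has no condition system, so the "handler" is simply the fall-through path, and the function name resolve_host_name_or_addr is made up for this example. It assumes a POSIX system and handles IPv4 dotted-quad strings only.

```c
#define _POSIX_C_SOURCE 200112L

#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Write the reverse-resolved name of the IPv4 string `addr` into `out`.
 * On any failure (bad address, no PTR record, resolver error) fall back
 * to copying `addr` itself, like the catch-all handler-case clause.
 * `addr` and `out` must not overlap.
 */
static void resolve_host_name_or_addr(const char *addr, char *out, size_t outlen)
{
	struct sockaddr_in sa;

	assert(outlen > 0);
	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;

	if (inet_pton(AF_INET, addr, &sa.sin_addr) == 1 &&
	    getnameinfo((struct sockaddr *)&sa, sizeof(sa),
			out, outlen, NULL, 0, NI_NAMEREQD) == 0)
		return;		/* resolved: `out` already holds the name */

	snprintf(out, outlen, "%s", addr);	/* the "handler" path */
}
```

The caller never sees an error code here, which mirrors the LISP version: every failure is folded into the fallback value.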

January 06, 2009 08:46 PM

Pete Zaitcev: Beyond Liferea

Liferea won every time when I compared it to other newsreaders thus far. The RSSOwl came closest, and only floundered because it takes twice or three times as many clicks to accomplish every task over Liferea. When you have more than 200 feeds like I do, the smallest inefficiency accumulates fast. The Google Reader is not even in the running on this score. Its only good point is the accessibility from any browser.

However, recently Liferea began to show age by accumulating bugs which Lars won't fix. He might be giving up on the project and moving on with life, I think. Here's my list in the decreasing order of annoyance:

Ideally I would like something like Liferea with the bugs fixed. But if someone took RSSOwl and changed its GUI to match, that would be interesting too. I'm thinking about writing a Liferea clone actually.

January 06, 2009 08:02 PM

Evgeniy Polyakov: Apartment development: the brick corner completed.

Just bloody good IMO.


The brick corner

To finish this I bought a 25 kg bag of glue, and while hauling that sack and the wooden board for the shelves on my back from the development market I decided to introduce a new physical quantity to measure load and work: the ML. One man-load is equal to the amount of work needed to carry 25 kg over a distance of 1 km at a speed of 10 km/h. Thus I spent one man-load. IIRC this equals 400 W, or roughly one half of a (not real) horsepower.

Since I now have a big bag of glue, I decided to glue all the tiles I have, so I started on part of the kitchen floor, some parts of the hall and a wall there... Well, I need to put the glue and the tiles somewhere (lots of tiles: overall I have about 5 boxes of 3 different types in 2 sizes, 33x33 tiles and 30x30 ceramic granite), so I am improving the look and feel of the apartment. This will not take long, and likely everything will be finished tomorrow, but it again requires sawing the tiles, which I already hate because of the amount of dust. It is just hell everywhere, but tomorrow this will be finished and I will finally clean the whole apartment.

Own apartment development - infinite amount of the sex with the ugly stuff creativity on the very limited area. Perversely love it.

January 06, 2009 07:53 PM

Michael Kerrisk (manpages): See you at LCA 2009

It's too much fun to miss, so I finally made the booking... I'm going to LCA 2009 (19-24 Jan), in Hobart, Australia!

January 06, 2009 07:00 PM

James Morris: Security changes in the 2.6.28 kernel

Version 2.6.28 of the Linux kernel was released during Christmas, so I thought it'd be worthwhile waiting until after typical vacation days to post a summary of changes to the security subsystem. As always, thanks to the Kernel Newbies folk who track major kernel changes.


This was not a terribly exciting release for the security subsystem.

Thus far for the 2.6.29 kernel, the main change is the massive credentials API change from David Howells. This has caused a couple of regressions, which were picked up by subsystem testing of Linus' tree. Fixes have been developed and are currently partially merged upstream. It seems we need to get more testing done in linux-next to avoid such breakage during future merge windows.

Also noteworthy is the merge of the pathname security hooks for LSM, which should pave the way for TOMOYO and AppArmor in 2.6.30, subject to the general patch submission review process. TOMOYO is only a couple of acks from approval, has been baking in -mm, and is pretty much self-contained. It may even appear in 2.6.29 if the merge window is open for features long enough.

January 06, 2009 11:11 AM

Pavel Machek: edimax keyboard/VGA/mouse switch: only usable if you are fast typist

...and if you are not using emacs.

So what happens? This broken thing automatically sends "release all keys" event one second after you release _any_ key. So if you hold down shift and start typing, LETTERS COME OUT ALL CAPS, unless you pause for a second in typing, when it sends you bogus shift release event and you continue with small letters.

The same thing happens with control, and makes emacs unusable :-(.

January 06, 2009 08:18 AM

Pete Zaitcev: Oh Raster and Sean you cards

I'm not making any value judgements and any story has two sides, but when read cold this is pure software engineering LOL:

We asked Raster to integrate this [Qtopia predictive keypad] into Om 2008 and extend it to make it more hacker friendly (i.e., usable from places like the terminal). After two months of more or less silence he showed us his own version, written from scratch. The design was a work in progress. And the dictionary was far inferior to what Qtopia had already. An internal battle started that lasted until one month before Om 2008 was set to be released when our product manager, Will Lai, couldn't take it anymore. He asked another engineer to just get the Qtopia keypad working.

Oh god, the nostalgia. Saw things exactly like it in Metabyte. Funny thing though, nothing like this happened to me since I joined Red Hat.

January 06, 2009 01:18 AM

Evgeniy Polyakov: Http log parser in LISP.

Updated my old LISP parser of the http logs to support date and time parameters, cleaned up code a bit and improved performance a little.

What I really like about LISP is its nature. It is just perfect for some obscure project when you slack off after some other (potentially hard) task. And while the result may be far from perfect code, it still does its job of relaxing the brain.

Here are parts of my (rather bad, I do not argue) code:

(defun filter-urls (check-list rules)
  (let ((l check-list)
	(res '()))
    (setf l (remove-if #'(lambda (check)
			   (let ((str-to-push check))
			     (dolist (r rules)
			       (let* ((match (first r))
			  	      (trans (second r)))
				 (when (search match check)
			           (when trans
				     (when (eql 0 (count trans res :test #'string=))
			               (push trans res)))
				   (setf str-to-push nil)
			           (return t))))
			     (when (not (null str-to-push))
			       (when (eql 0 (count check res :test #'string=))
			         (push check res)))
			       t))
		       l))
    res))
 
(defmethod dump-short-block ((inst address-instance))
  (with-slots (atime get_urls post_urls empty addr refs) inst
      (setf get_urls (filter-urls get_urls *filter-rules*))
      (setf post_urls (filter-urls post_urls *filter-rules*))
      (setf refs (filter-urls (filter-urls refs *filter-refs*) *filter-rules*))
      (when (or (not (null get_urls)) (not (null post_urls)))
	(dump-block inst))))

Next task is to find out a way to get data from the DNS server, likely via:

> (dolist (a (hostent-addr-list (resolve-host-ipaddr "www.kernel.org")))
    (format t "addr: ~A, name: ~A~%" a (hostent-name (resolve-host-ipaddr a))))
addr: 204.152.191.37, name: www.kernel.org

Its main advantage is that it is very different from the-best-ever high-level-assembler language C.

Edited to fix the title

January 06, 2009 12:30 AM

Jesse Barnes: I hate spam

I’ve been really bad at dealing with spam on this blog, but I just went through the posts from the last few months and cleaned things up. So if you’ve made comments, you might actually see them now and maybe even see replies…

January 06, 2009 12:03 AM

January 05, 2009

Jesse Barnes: back from drinking island

It’s been an exciting few weeks since my last post (well ok 6 weeks):

The downside of the KMS merge is a boatload of new kernel bugs :p but that’s to be expected. We still have quite a few 2D bugs open and those will start to migrate over to the kernel side as more and more people use the code. I’ve got some cleanups to the libdrm doxygen markup as well, but nothing I’m ready to push out (it seems there’s always a more important bug or feature to work on so docs keep getting delayed, so dear lazyweb please write it for me etc etc).

On the home front things have been fun too; my new office (a shed in the back yard I’ve been remodeling) is nearly ready to move in to, just a few more weekends worth of work to go. It’s been a fun project, but I’m really ready for it to be done now, since I need the space and freedom from distraction it should bring. The holidays were great as well. Our little family visited my parents for the week of Christmas and had a lot of fun with the clan (grandparents, great grandparents, aunts, uncles, cousins from all over came to visit). Then for new year’s my sister finally got to visit us up here in Arcata.

Well anyway, back to herding PCI patches and fixing up the KMS bits.

January 05, 2009 10:45 PM

Val Henson: Adventures in red hats

Things I learned about red hats this week:

I would like to learn that someone in the UK is reading this and has a Red Hat fedora that needs a home. Preferably a size medium.

January 05, 2009 10:27 PM

Evgeniy Polyakov: Apartment development: the brick wall.

That's how I spent productive part of this day.


The brick corner wall

The more I look at it the more I like this corner.
Actually I only covered one wall today (and the corner with the hatch in the bathroom yesterday). There is enough time to finish another wall (this one took about 3-4 hours), but because I'm an idiot who cannot calculate the needed materials in advance, I have to wait until tomorrow, when I will buy another bag of glue (this wall took 10kg), then fill the gaps between the bricks with the graphite plaster and finally finish the kitchen. If the stars stay aligned this may even happen tomorrow evening.

But I do not regret doing this. Not only because I got lots of experience with apartment development in general and tile gluing (and related tasks) in particular, but basically because the end result is exactly what I want. Even if it was achieved after several iterations :)

And now some slacking reading and slacking beering...

January 05, 2009 06:20 PM

January 04, 2009

Pete Zaitcev: COPYING.SWsoft must go

I pulled the git for 2.6.27 OpenVZ branch and I'm idly wondering why Parallels have not officially removed this:

Nothing in this license should be construed as a grant by SWsoft of any rights beyond the rights specified in the GNU General Public License, and nothing in this license should be construed as a waiver by SWsoft of its patent, copyright and/or trademark rights, beyond the waiver required by the GNU General Public License. This license is expressly inapplicable to any product that is not within the scope of the GNU General Public License.

Nothing says "we're not serious about going upstream" better than threats of patent litigation.

Kinda makes LXC more of a fait accompli than it already is or needs to be.

January 04, 2009 09:53 PM

Val Henson: One factor in San Francisco's pedestrian deaths

Walking home from the gym, I see a man on the corner so enraged that his face is the color of Pepto Bismol. He is shouting at someone I can't see, something like, "This isn't Disneyland, you know!" As I come closer, I see that he is yelling at the empty space between him and the free newspaper box. I relax; no one is going to get beaten up, it's just another crazy guy on Valencia street.

Then things take an unexpected turn. Judging by his growing frustration, his invisible friend is not listening. He shouts, "Watch this!" and poises himself in the crosswalk, head turned to watch oncoming traffic. Then, with great agility and an air of long practice, he sprints in front of an oncoming car. Fortunately, whoever is driving is not text-messaging or updating Twittr and brakes just in time. "See," he shouts to his invisible friend, "San Francisco is dangerous! You can die out here!"

January 04, 2009 09:09 PM

Matthew Garrett: Analysis

While it's obvious that being a kernel developer is absolutely the best thing anyone in life could ever aspire to, I'm finding it increasingly difficult to justify not having just given up and gone off to be an analyst instead. Two stories stuck with me this week. The first claims that we can expect Android netbooks to be on sale within a year or so. The reasoning behind this? Android runs on x86 and has a MID profile. Oh, yeah, and it sounds like a cool idea. Sure, we need to ignore straightforward facts like, oh, I don't know, it being UNSPEAKABLY PAINFUL TO DO OS DEVELOPMENT ON ARM, and hence it being NO SURPRISE WHATSOEVER TO FIND THAT IT BUILDS AND RUNS ON COMPUTERS THAT ACTUALLY GO FAST AND SIT ON PEOPLE'S DESKS. Or that MID refers to the class of devices like Nokia's internet tablets or, well, any of these and not a netbook. And, leaving those, the evidence we have for this is that "It sounds like a cool idea".

It's easy to write an article on why something sounds like a good idea, and even easier to make it sound like a good business plan if you're not the one who's going to lose money on it. But yeah, maybe Google's going to push for Android on netbooks. It's going to be a kind of crowded place to be, what with Microsoft and Intel and Canonical and Xandros and half a dozen Chinese Linux distributions that you've never heard of and really hope never to again once you've seen their distributions, but it's possible. And so these guys might even be right. But the sum total of their insight here is that (a) Google have a product, and (b) Google might want to enlarge the market for that product. The rest is a collection of extraneous facts designed to make you think that they've actually done something more impressive than typing make, and frankly if that's enough to get you noticed then gentoo users the world over are missing out badly.

It's ok, though. There's worse. Google are making a secret OS, which by my count means that there's been eleventy billion people inside Google working on a fucking OS for the past TWENTY YEARS and this time we can tell not because people have, y'know, gone to Google and seen it, but because they're busy leaking information about their secret OS by removing the user agent string from a bunch of their web traffic.

Of course, as Clint points out in his article, Google's a web company and so would want to produce a web OS. The obvious way to develop this web OS would be to, uh, run it on top of Android, an OS currently entirely unsuited to desktop use due to little things like the lack of decently accelerated framebuffer drivers on any hardware you can currently obtain (rather than, say, any of the operating systems already deployed in Google, all of which share one striking feature - the ability to RUN A WEB BROWSER AT A DECENT SPEED). And even though basically everyone in Google seems to have an Android device, it'd be vital to prevent anyone from knowing that they used Android to browse the web, so scratch the user agent.

But that makes no sense. So what Clint's clearly getting at is that people using the Google web OS are browsing the web using the Google web OS, and so remove the user agent string to avoid leaking that information. That makes sense, up until you actually think about it. A web OS would run in a web browser. Why would you run a web browser in a web browser?



Google's not an OS company. They don't employ enough people to develop a full OS and releasing a Linux derivative themselves would just be a way of tarnishing their brand in a hideous manner. Any move into the home would have to be in the form of appliances with known hardware configurations, and seriously, what's the point? It's not a big enough market. If Google's going to push applications, it's going to do it through the browser. And if it's doing it through the browser, then right now it doesn't matter what the underlying OS is. There could be any number of reasons for the mystery of the missing user agent strings, but if it's because Google are trying to prevent people from working out that they're developing an entire software stack that they're going to push out onto arbitrary hardware then I have an Android netbook to sell you.

That'll be $50,000, please.

January 04, 2009 02:23 AM

Harald Welte: First Impressions of South Korea

So today I arrived in South Korea, after a one day stop-over in Taipei (following a flight connecting in Abu Dhabi). I arrived at about 6pm local time and had a 90-minute bus ride to the hotel in Yongtong-gong. So besides check-in and a quick stop at the convenience store, I haven't done much yet.

Some first impressions, in no particular order:

So I remain excited to see what happens next. Not sure how much time I will have during the week; it depends on how busy it is at work.

January 04, 2009 01:00 AM

January 03, 2009

Evgeniy Polyakov: Apartment development: installing the water access hatch.

I have not written about my apartment development for a while. It has taken way too long already and I believe it's time to finish things.

The main items left are finishing the tiles in the bathroom and the brick tiles in the kitchen. Both are rather simple tasks, but the bathroom tile gluing got stuck because of the water system hatch.


Installing the water system access hatch

Long ago I bought an interesting system which allows the hatch to be hidden behind the tiles, so only the sawed edges will be visible. The most interesting piece is the connection part, which allows moving the hatch door's rotation point. Unfortunately the hatch hole is large enough that even connecting the frame is not trivial. But after some sawing, screwing and foam sealing I finished this task.


Installed water system access hatch

The hatch is not covered by tiles yet, and because of its dimensions (and the bathroom walls themselves) some tiles have to be sawed. I love my Bosch corner grinding machine more than any other tool, maybe because it can saw and break anything, and now is the time for it. I found that the same clamps can be used to saw a straight line even across the extremely hard tiles.


Preparing the tile

I have to saw the tiles, while normal people just cut them with a special hard-metal rotating disk. This does not work with my tiles; I can almost imagine the molecules laughing at the craftsman who tries to scratch that surface. Sometimes I succeeded in cutting a tile by producing deep enough scratches, but I wasted way too many tiles trying to cut a single piece, so I prefer just to saw them.


Sawing the tile using corner grinding machine

I will spend tomorrow gluing the tiles in the bathroom and expect it to be fully completed. If time (and the amount of glue) permits, I will move to the kitchen, where I will finally glue my ubercool brick tiles.


Brick tiles

I can not even imagine how I will feel when this is finished. I will be sooo close to the finish...
The probability that I will do the whole apartment development (even in my small loft) myself again is somewhere between zero and void. But it is very interesting no matter what :)

January 03, 2009 07:07 PM

Ulrich Drepper: Fedora 10 a little bit more secure

Fedora 10 comes with filesystem capability support. Unfortunately it is not used by default in the packages which can take advantage of it. I think the excuse is that there are people who build their own kernels and disable it. That's nonsense since there are many other options we rely on and which can be compiled out.

Anyway, you can do the following by hand. Unfortunately you have to do it every time the program is updated again.

sudo chmod u-s /bin/ping
sudo /usr/sbin/setcap cap_net_raw=ep /bin/ping
sudo chmod u-s /bin/ping6
sudo /usr/sbin/setcap cap_net_raw=ep /bin/ping6


Voilà, ping and ping6 are no longer SUID binaries. Note that ls still signals (at least when you're using --color) that there is something special about the file, namely, that there are filesystem attributes.

These are two easy cases. Other SUID programs need some research to see whether they can use filesystem capabilities as well and which capabilities they need.

January 03, 2009 12:21 AM

January 02, 2009

Dave Jones: Alpha400 netbook.

I think that this is the cheapest Linux netbook around. The kicker being, it uses a MIPS CPU, so you’re basically stuck with the OS that it comes with, (or possibly debian). The factory OS is also 2.4 kernel based. I wonder how much of it works in 2.6, and also I wonder how much of their 2.4 varies from mainline 2.4.

CPU aside, the rest of the hardware isn’t spectacular either. 128MB of RAM is pitiful these days. Just 1GB of flash too.

“Supports File Sizes up to 8 MB” is kind of.. bizarre.

Despite the miserable hardware, a year ago, this would have piqued my curiosity, and at $150 I probably would have picked one up. Now though, for just a little more, you can get an eeepc, or other netbook which is far more useful.

Nice to see the pricepoint getting lower all the time though. Getting closer to the mythical $100 laptop, but the hardware still sucks.

January 02, 2009 08:42 PM

Evgeniy Polyakov: Single supply LM3886 amplifier.

The LM3886 datasheet provides a simple circuit for a single-supply amplifier, which basically creates a differential signal where the 'ground' point is chosen to be roughly in the middle of the supply voltage. This is achieved with an n-p-n transistor whose base is connected to a resistor voltage divider, so it can handle a potentially large current while providing the needed voltage after the base-emitter drop.

I already implemented this design, but it did not work well, apparently because of the transistor (it was either broken or I connected it wrongly, since it is quite hard to find datasheets for Russian transistors and other active components). So I replaced it with Fairchild's 2N3904.

And things exploded.


Exploded capacitor

History cannot tell us what voltage the exploded bypass capacitor was rated for, but apparently it was not enough. I remember it was something like 25V, while the input voltage was 30V. The explosion was quite loud and a bit 'dirty'. Here are the only parts of the capacitor that are left.


Exploded capacitor details

So I replaced it with the 63V capacitors and reran the test. Somewhat surprisingly, there were no new explosions.


LM3886 single-supply amplifier output

As you can see, it works quite well, without evident distortion. I even connected a single channel to the loudspeakers and lowered the headphone output signal in software, since the level I used without the amplifier would be way too loud. Someday I will test it; the neighbours will not sleep safely anymore :)

With a 25V input voltage the LM3886 does not get very hot (I have not connected a heatsink yet), apparently because the amplifier draws only 20-30 mA (as the PSU tells me). This will grow when the input (and thus output) signal is increased. The LM3886 shuts off when the single-supply input voltage drops below 20V; the ground point is then at about +9-10V, and this is the limit for the differential input voltage. Without an input signal there is still a fairly quiet signal; it looks like a sine (at least it is something periodic), and I suspect the mute function, since I use a simple resistor as a current source, which may not be enough, and a polar transistor should be used instead. This is not related to the input signal or the input voltage (since I use the lab PSU), but I may be wrong that the lab PSU is great at it.
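The divider-plus-transistor arrangement described at the start of this post can be estimated with one line of arithmetic: the divider sets the base voltage, and the emitter (the virtual ground) sits one base-emitter drop below it. The sketch below makes that explicit. The equal resistor values and the 0.7 V drop are assumptions for illustration, not values from the datasheet, the base current loading the divider is ignored, and the function name virtual_ground_volts is made up.

```c
#include <assert.h>

/*
 * Estimate the "virtual ground" of an emitter follower whose base sits on
 * a resistor divider: divider output minus one base-emitter drop.
 * Base current loading of the divider is ignored (an assumption).
 */
static double virtual_ground_volts(double v_supply, double r_top,
				   double r_bottom, double v_be)
{
	assert(r_top > 0.0 && r_bottom > 0.0);
	double v_base = v_supply * r_bottom / (r_top + r_bottom);
	return v_base - v_be;
}
```

With a 20 V supply, two equal resistors and an assumed 0.7 V drop, this gives 20 / 2 - 0.7 = 9.3 V, which sits inside the +9-10V range mentioned above.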

I will experiment with an operational amplifier preamp next. I want to combine the two-channel preamps and the LM3886 on some usable board (maybe even on a PCB) and hide it in a cabinet with a simple PSU, for which I already have a transformer, Schottky diodes (or just a rectifying bridge, I need to test both) and huge capacitors.
And I want to think about the next circuit, but that's already a different story.

Now I'm drinking beer, listening to my single-channel audio system connected to the laptop and a huge PSU. And you know, I like it.

January 02, 2009 06:26 PM

January 01, 2009

Dave Jones: Last post on leap seconds.

I thought I was done with this. Then, today I saw this. To the best of my knowledge, Fedora 8 didn’t suffer from the bug I originally described several posts ago. I think this one happening at nearly midnight UTC is coincidence.

There’s a “me too” in the comments, but it seems odd that two people on slashdot saw it, but we never heard a peep on the Fedora mailing lists, or in bugzilla. Or even in upstream kernel.org. It could just be coincidence, the story is unsurprisingly short on details. I guess slashdot stories are easier to write than bug reports. But without additional debugging info we won’t ever know. Bear in mind that last time we saw a crash of this nature it didn’t affect everyone then either.

It was only by chance I managed to catch the backtrace in the `06 crash. I actually had two locked up machines, but one had its screen blanked, and wouldn’t unblank. The other machine had blanking disabled (setterm -blank 0) and thankfully, had also been set up to use a VGA screen resolution so had plenty of lines to display the whole backtrace.

Update: a problem has been found, and fixed.

January 01, 2009 11:03 PM

Dave Jones: Resolutions.

I’ve never been big on New Year’s resolutions. Though the beginning of a year is a good point to stop some bad habits, and start some good ones. I decided I want to change a few things this year.

  • Give up soda / drink more water.
    This ties into the above. I drink far too much sugary crap. Even the ‘flavoured water’ stuff is out. No more corn syrup & crystalline fructose for me. The money I save on buying these will instead be spent on water filters.
  • Exercise more often.
    Continuing again with the ‘be more healthy’ kick. I had something of a wake-up call last year when I had a medical exam, which kickstarted me into actually doing _some_ exercise. Prior to that, I was a complete couch potato. I need to do more, and more regularly.

The changes I want to make aren’t just ‘be more healthy’. Here’s some other things I hope to achieve this year.

  • Do new and different things.
    I feel I’ve been stagnating somewhat the last year or two. I’ve learned some new things, but it’s mostly through necessity. I want to do some things outside of my comfort zone. For example, it’s been a long time since I’ve done any UI programming. It’d be nice to do some non-plumbing programming once in a while. Perhaps even in a new language. I had fun recently playing with processing to do some fun visualisation things. Perhaps more of that.
  • Read more books.
    I really suck at reading books. In the last year, I lost count of how many I started. But I know how many I finished. Zero. My English teacher would be very disappointed. This sounds like an easily achievable goal, but I get distracted far too easily. I need to carve out a chunk of time to do nothing but read. At least once a week. Actually, I did finish some books, but they were of the instructional variety, and I kind of skimmed past some of the dull chapters, so I don’t think they really count.

So that’s my hand-wavy goals for 2009. This time next year, I’ll make another post to see how many of these I failed at.

January 01, 2009 08:55 PM

Dave Airlie: Isabel Airlie.

Isabel was delivered this morning, at 11:54am, at 8lb13oz (4.07kg).

I'll be slow on anything non-baby related for a while :)

January 01, 2009 12:13 PM

Evgeniy Polyakov: Happy New Year!

Maybe a little bit late, but it is still better than never :)

    2008 was somewhat a new year for me, really lots of new things appeared, many projects were completed and started. I began to work with the new people, and it went really well as long as expected to be no less than excellent.

    I got to play with the completely new things: playing trumpet, while far from being perfect, but still I love it. Sometimes I even believe that there is a progress in the playing technique and produced sounds are not that bad.
    I started climbing trainings again (although missed couple of them at the end of the year because of the almost everyday drinks sending off the ending year), which brought me back my physical exercises and excellent refreshing after the brain's fucking work.
    Started to implement another dream of mine: I work with electronics, and while I do not have much experience there, I very much like how things go and how my practical and theoretical skills improve.

    In the programming area there was serious progress with POHMELFS and DST. Essentially, this year I managed to write a fully coherent parallel network filesystem with a list of great supported features, and a very powerful network block device. There was a set of smaller projects I enjoyed working on, and I expect things to get even more interesting with time. Unfortunately I was not able to complete the initial elliptics network implementation (I started it over recently because of new problems I caught and wanted to work out without crutches and, of course, drinks :), but I decided to combine its early implementation and a usable library, so expect some news here soon too. I do not worry too much about its future: it will be an excellent project.
    Started to learn and program in LISP. It is the greatest language I have come in touch with. And while I frequently end up programming some obscure stuff with it (like an html log parser), I like how it goes. I have several really interesting projects in mind for the next year to be implemented (first) in LISP, like AI (I got Russell and Norvig for a reason :), data mining, language analysis and compilers.

    I started to read books. Well, I read all the time, but usually it was printed computer science articles. Now I own a great load of computer science books (like the abovementioned Russell and Norvig, although "The Dragon Book" I still read printed on A4) and some musical literature (I even got a piano study book, since there are no trumpet ones; maybe I will get myself a piano though :). I want to be able to draw people. I do not like coloured pictures (like watercolour or oil paintings), but enjoy graphics (like black pencil crayon). I envy to death (in a friendly way of course) those who can paint, and want to try this more seriously...
    To extend my horizons I now own many books completely unrelated to what I did all the time, and reading them brings me real enjoyment: there is a huge world outside IT. Reading is a great way of learning something new; I even specially set aside time early in the day and in the evening for it.

    Private life has this name for reason, but still I want to mention, that I found new friends at new places and managed not to lose the old ones. Although we are getting older and everyone has own problems, we still meet and enjoy the process.
    Want to send the greatest thanks to you and wish to enjoy the time, actually you are the only what matters :)

    That's it: Happy New Year and be cool. There will be lots of interesting stuff around!

    January 01, 2009 11:59 AM

    Dave Jones: More on leap seconds.

    Jesse Keating made a comment in my previous post on leap seconds, which I thought was worth highlighting in another post, for the benefit of those who don’t read the comments.

    This is why rarely executed codepaths suck. Whilst it is tempting to gloat over another Microsoft failure, this could easily have been any other OS. I already mentioned that Linux had suffered something similar once. A bug like this in consumer devices is nightmarish, but imagine if such a bug ended up in something more critical? “Sorry, your life support system went offline because there was a leap second”. In safety critical systems, rare codepaths are kind of terrifying.

    Writing test cases for bugs like this is also not particularly fun. You’d have to have a fake ntp server for testing the rare case.
    Now think about all the other potential ‘only runs once every blue moon’ codepaths in your apps, and imagine the effort required to write test plans for all of them. Not impossible, but certainly a lot of potential job security there for QA folks. Just like fuzz-testing, traditional coverage-testing by just running common workloads isn’t the panacea of testing when there are variables outside your control.
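
    One way to exercise such a path without a fake ntp server is to inject the rare condition directly into the code under test. The sketch below is only an illustration of that idea: FakeTimeSource and last_minute_labels are hypothetical names invented here, not part of ntpd or any kernel interface.

```python
from dataclasses import dataclass


@dataclass
class FakeTimeSource:
    """Hypothetical stand-in for a time source; not a real ntpd API."""
    leap_pending: bool


def last_minute_labels(source):
    """Return the second labels for the final UTC minute of a day."""
    seconds = list(range(60))
    if source.leap_pending:
        # Rare path: only taken on days that end with a leap second.
        seconds.append(60)
    return [f"23:59:{s:02d}" for s in seconds]


# A test can force the rare branch instead of waiting for it to happen.
normal_day = last_minute_labels(FakeTimeSource(leap_pending=False))
leap_day = last_minute_labels(FakeTimeSource(leap_pending=True))
```

    The point of the design is that the rare condition becomes an ordinary parameter, so the once-in-a-blue-moon branch can run on every test pass.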

    What’s still puzzling to me though.. The Zunes died several hours before 00:00:00 UTC.
    Quirk of MSFT’s ntp implementation I guess. *shrug*

    January 01, 2009 03:09 AM

    December 31, 2008

    Dave Jones: Leap seconds.

    Tonight, a leap second will occur. After 23:59:59, we have 23:59:60 before rolling over to 00:00:00. Most people won’t even notice. Most electronic devices won’t notice. Those unaware of the event (like the clock on my microwave oven) will end up a second slower. (Not that it really matters; it doesn’t display seconds in its clock, and I surely wasn’t second-accurate when I set it.)
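
    To make the 23:59:60 label concrete, here is a small sketch using only the Python standard library. It shows two behaviours that are easy to overlook: time.strptime accepts a seconds field of 60, while datetime refuses to represent it. The helper name datetime_accepts_leap_second is made up for this illustration.

```python
import time
from datetime import datetime

# The leap second at the end of 2008, written as a UTC wall-clock label.
LEAP_LABEL = "2008-12-31 23:59:60"

# time.strptime allows a seconds field of 60, so the label parses.
parsed = time.strptime(LEAP_LABEL, "%Y-%m-%d %H:%M:%S")


def datetime_accepts_leap_second():
    """Return True if datetime can hold second 60, False if it raises."""
    try:
        datetime(2008, 12, 31, 23, 59, 60)
    except ValueError:
        return False
    return True
```

    Code that converts between such representations has to decide what to do with the extra second, which is exactly the kind of branch that almost never runs.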

    Of slightly more concern, are the more clever devices. The devices that are aware of leap seconds know when to insert one. On these internet connected devices, ntpd tells the kernel “insert or deduct a second” as necessary.
    This all sounds fairly benign, but it has been known to be problematic. For reasons I’m not entirely sure of, ntp still calls into the kernel twice a year, regardless of whether a leap second is inserted or not. So, twice a year, we end up in different code paths that we don’t execute the rest of the year.

    Whilst I was travelling in June 2006, I noticed I couldn’t get at my email. A week passed before I found out on returning home that the kernel had oopsed in that code path. There was no leap second in June that year. Nor has there been in any year this decade. Thankfully, that particular oops was only fatal if you were running a build with certain debugging CONFIG options turned on (I was), so the vast majority of users never saw a problem. Here’s the fix that went into 2.6.22 for this bug.

    The very few that did see the problem (I don’t recall anyone else mentioning it when I posted to lkml) likely just rebooted, with the “if it happens again, I’ll report it” mindset, which of course, it didn’t..

    Hopefully at midnight, all will be well and that code will just do its thing with no dire consequences :-)

    update: anti-climax, just as we like it :-)

    Dec 31 18:59:59 localhost kernel: Clock: inserting leap second 23:59:60 UTC

    December 31, 2008 07:27 PM

    Harald Welte: 25C3: Total Overload

    In my 10+ years of attending the CCC congress, I had never tried to run any significant project at the hackcenter. In the first couple of years I was just hanging out there, chatting with people and working on stuff here and there, operating FTP sites (like the trial we once had with then-experimental ext3 vs. Reiserfs on machines with Gigabit Ethernet interfaces [I was operating the ext3 one]). The years following that I was trying my best with the audio+video recording and streaming - with mixed results, as all people from that time remember. I was just trying to help, digital A/V not being my particular area of expertise.

    So this year I decided it would be a good idea to do some serious GSM protocol side development at the hack center, which would complement the talk I was giving on running your own GSM network.

    So far so good. The only day where I really could hack the way I wanted was on Day 0 (the day before the event officially started) and Day 1. Friends with various backgrounds started to join and help with issues here and there. Everyone was excited by the numerous new possibilities a project like this provided.

    However, starting with Day 2, and particularly Day 3 and Day 4, the amount of constant interruptions by various people was simply unbearable and brought the development close to a complete halt. Not only that, it caused severe lack of sleep, stress levels even beyond what I had ever experienced before, and I developed a cold and even some fever.

    In general, I am completely disappointed by many of the crowds. I would have assumed that most people _know_ that frequent interruptions lead to lost productivity, and that they would also know and understand that a project that literally hundreds (if not thousands) of people are excited about cannot answer RTFM style questions that everyone would have been able to read up on by themselves on wikipedia or similar sites. Sure, there were some exceptions to that rule. But overall, it was a very unpleasant experience.

    So from next year on, I will certainly refrain from running any kind of project in the hackcenter. I will be a regular attendee, possibly speaking on some kind of subject or the other, preferably on the last day so people won't drive me nuts with their never-ending questions.

    The DDoS attack on the GSM/BS11/OpenBSC hackers, combined with the overcrowded 25C3 has in the end led to a point where the only two talks that I've been able to attend were the ones in which I was speaking.

    "Thank you" :(

    December 31, 2008 01:00 AM

    December 30, 2008

    Linus Torvalds: Spastically flailing around..

    It's that magical time of year when I actually play video games. I have one rule for christmas (and bday, for that matter) gifts for Tove: she should buy me toys. No practical gifts, no soft packages with sweaters or socks. I didn't enjoy them when I was little, and I don't enjoy them now. I refuse to grow up.

    And none of that "toys for men" crap either. I couldn't care less about a new miter saw or something like that. I'll buy manly toys myself if I have some project in the yard that needs them, I don't want them as a present. The week after christmas is when I regress to my teenage years, and play games, build models, or play with RC cars.

    In other words, I want presents that I wouldn't really ever buy myself.

    But since I usually do it for just about a week each year, I really suck at it. So this year the suckage involved me playing the new Prince of Persia (christmas), and flying around a small electric indoor RC micro-helicopter (birthday).

    Talk about spastic.

    In PoP, I'm actually pretty good at the acrobatics (I like the platforming part, and I've played all the versions of PoP over the years), but the fights are really frustrating. You're supposed to be able to create those wondrous fight sequences with the right button combinations. I can't do it, so I just flail wildly around, mashing the buttons as best I can, and eventually I wear the opponent down. I'm pretty certain some of the bosses just decided that suicide was better than watching me jump around and hit things at random. Or maybe I just embarrassed them to death. But as long as they die, I don't care.

    And Ubisoft must have known that no normal person actually ever gets any of those magic 14-hit combos, and I could finish the game in just over two days. Some people may complain that it's too short a game, but for me, that was just perfect. Last year's game was Assassin's Creed, and the thing was just too long - I got pretty good at killing guards, and it was all beautiful, but it just took me too long. So I eventually just left it with the last long assassination sequence not even started.

    As to my mad skillz at flying RC helicopters, I have yet to ever succeed in actually landing the damn thing in a designated spot, unless "on the floor, possibly with a crunch that sounds like the helicopter barely avoided becoming scrap" counts as such. And I probably never will.

    But hey, it's all good. It's what christmas is all about. Killing people and crashing helicopters? Isn't it? Even if you're not very good at it.

    December 30, 2008 03:43 PM

    Harald Welte: Announcing project OpenBSC

    Yesterday I was co-presenting with Dieter Spaar on Running your own GSM network at the 25C3. The talk went quite well, and received an overwhelming response.

    Together with the talk, we also announced, described and released the current development version of OpenBSC, a software implementation of the minimal subset of the GSM BSC/MSC/HLR in order to get a GSM BTS up and working.

    The code is available in svn, and there's a wiki describing its current status and features (or lack thereof).

    December 30, 2008 01:00 AM

    Evgeniy Polyakov: The electronics night.

    Ded Moroz (aka Santa Claus) brought me the new equipment, so my lab contains following items now:


    Lab setup

    The electronic parts will likely not match my next scheme (frequent readers will notice this is essentially a rule), even though I now have lots of passive and (a noticeably smaller amount of) active components. I got several rectifying bridges at least, so I will not use part of a transistor as a diode, and a couple of stabilizers.

    But we will see, and now let's return to my first LM3886 amplifier with differential PSU, which did not really work as expected. I've attached a differential signal from the lab PSU and voilà, a 20+ amplified output signal. That's how it looked.


    Medium frequency sine signal looks good. Well, it almost always looks good.


    Low frequency square signal

    It shows some problems, but apparently it is the generator, which is not able to produce a correct square signal at low frequencies. You should see how it tries to do it at 1 Hz and below. It looks like it does not use a comparator in the output chain for the square signal and just relies on the differentiating and integrating circuits of the operational amplifier, but I may be very wrong of course.


    High frequency square signal

    High frequency (about 15 kHz, not really high, but high enough for a sound amplifier) amplification produces some delay and a stabilizing spike, but that is expected when working with interpolation active filters, which are likely used in the LM3886.

    Found interesting problems with my LM3886 amplifier and chip behaviour in general during my tests:

    I think input voltage flows likely because of some problems with bypass capacitors connection or wrong ground potential. That analysis I will keep for the next experiments.

    Some new scheme will be implemented soon and will likely be tested with the loudspeakers during the New Year vacations and celebrations. Stay tuned for the news (operational amplifiers, comparators, timers, stabilizers, bipolar (including Darlington pairs) and mosfet transistors, they all will be used for something I do not yet know exactly what :)!

    December 30, 2008 12:03 AM

    December 29, 2008

    Dave Airlie: AMD release r600/r700 code

    AMD today pushed the initial code to support acceleration on the r600/r700 range of GPUs.

    This consists of r6xx-r7xx-support branches in the drm, radeonhd and a new r600_demo repo.

    This code is really only for developers at this point but it's great to see AMD finally get things lined up to allow this code to be released.

    I've only been barely involved in the r600 code so far; I wrote the original drm over a few days and handed it over to AMD to continue on with, as I wanted to concentrate on the kms work. Hopefully I can get some time to look at it over the next while (yeah right, new baby still not here).

    December 29, 2008 10:13 PM

    Evgeniy Polyakov: Loudspeakers assembled.


    Visaton loudspeakers

    They sound just freaking awesome: very deep and powerful bass, extremely precise high tones and a clear middle.

    I attached them just to the headphones output of the notebook, so they are obviously not very loud, but they have enough reserve for the amplifiers I build (I will show something new soon with lots of new equipment).

    The speaker's crossover consists of three parts: a two-level low-frequency filter for the woofers, a high-frequency filter and a Zobel circuit for the tweeter. Physically the woofers are separated by a hidden wall, so effectively the lower speaker has more than 30 liters of closed space and acts like a subwoofer with a kind of phase inverter. Its output hole is located at the back of the loudspeaker and produces very low tones.

    I do not regret any second spent building them, excellent sound justifies the efforts.

    December 29, 2008 12:54 PM

    Dave Airlie: Kernel modesetting pull request sent

    So I sent a drm pull request that includes the kernel modesetting core + intel i915 driver supporting it.

    This is a major milestone for a project I started working on in a previous job, and I barely remember burning through the initial code for the initial prototype in a week of little sleep.

    To enable the code you need to set CONFIG_DRM_I915_KMS; this isn't enabled by default, as we don't have a userspace that supports it available yet for general consumption. If you enable kms now, you will more than likely get a broken X for your trouble, as the kernel drivers aren't compatible with having userspace drivers trample the hardware.

    So where is ATI at?

    Although we are shipping radeon code in F10, the code is based on the TTM memory manager, which isn't really in an upstreamable state in its current form. Hopefully a newer TTM codebase might become available that can be used upstream. If that doesn't happen, we might rearchitect the core memory manager code of the radeon system now that we have the API mostly proven. So I'm not sure when we will upstream it; it all depends on how much time I can work on it.

    Baby status: due today, no sign yet, will severely impact amount of time I spend on this stuff :)

    December 29, 2008 09:17 AM

    Ted Tso: Debian, Philosophy, and People

    Given the recent brouhaha in Debian, and the General Resolution regarding Lenny’s Release policy as it relates to Firmware and Debian’s Social Contract, which has led to the resignation of Manoj Srivastava from the position of Secretary for the Debian Project, I’m reminded of the following passage from Gordon Dickson’s Tactics of Mistake (part of Dickson’s Childe Cycle, in which he tells the story of the rise of the Dorsai):

    “No,” said Cletus. “I’m trying to explain to you why I’d never make an Exotic. In your calmness in the face of possible torture and the need to kill yourself, you were showing a particular form of ruthlessness. It was ruthlessness toward yourself—but that’s only the back side of the coin. You Exotics are essentially ruthless toward all men, because you’re philosophers, and by and large, philosophers are ruthless people.”

    “Cletus!” Mondar shook his head. “Do you realize what you’re saying?”

    “Of course,” said Cletus, quietly. “And you realize it as well as I do. The immediate teaching of philosophers may be gentle, but the theory behind their teaching is without compunction—and that’s why so much bloodshed and misery has always attended the paths of their followers, who claim to live by those teachings. More blood’s been spilled by the militant adherents of prophets of change than by any other group of people down through the history of man.”

    The conflict between idealism and pragmatism is a very old one in the Free and Open Source Software Movement. At one end of the spectrum stands Richard Stallman, who has never compromised on issues regarding his vision of Software Freedom. Standing at various distances from this idealistic pole are various members of the Open Source Community. For example, in the mid-1990’s, I used to give presentations about Linux using Microsoft Powerpoint. There were those in the audience that would give me grief about using a non-free program such as MS Powerpoint, but my response was that I saw no difference between driving a car which had non-free firmware and using a non-free slide presentation program. I would prefer to use a free office suite, but at the time, nothing approached the usability of Powerpoint, and while dual-booting into Windows was a pain, I could do a better job using Powerpoint than other tools, and I refused to handicap myself just to salve the sensibilities of those who felt very strongly about Free Software and who viewed the use of all non-Free Software as an ultimate evil that must be stamped out at all costs.

    It is the notion of Free Software as a philosophy, with no compromises, which has been the source of many of the disputes inside Debian. Consider, if you will, the first clause of the Debian Social Contract:

    Debian will remain 100% free

    We provide the guidelines that we use to determine if a work is free in the document entitled The Debian Free Software Guidelines. We promise that the Debian system and all its components will be free according to these guidelines. We will support people who create or use both free and non-free works on Debian. We will never make the system require the use of a non-free component.

    This clause has in it no room for compromise. Note the use of words such as “100% free” and “never make the system require the use of a non-free component” (emphasis mine). In addition, the Debian Social Contract tends to be interpreted by Computer Programmers, who view such imperatives as constraints that must never be violated, under any circumstances.

    Unfortunately, the real world is rarely so cut-and-dried. Even the most basic injunctions, such as “Thou shalt not kill” have exceptions. Few people might agree with claims made by the U.S. Republican Party that the war in Iraq qualified as a Just War as defined by Thomas Aquinas, but rather more people might agree that the July 20, 1944 plot to assassinate Hitler would be considered justifiable. And most people would probably agree most of the actions undertaken by the Allied Soldiers on World War II battlefields that involved killing other soldiers would be considered a valid exception to the moral (and for those in the Judeo-Christian tradition, biblical) injunction, “Thou shalt not kill“.

    As another example, consider the novel and musical Les Misérables, by Victor Hugo. One of the key themes of this story is whether or not “Thou shalt not steal” is an absolute or not. Ultimately, the police inspector Javert, who lived his whole life asserting that law (untempered by mercy, or any other human considerations) was more important than all else, drowns himself in the Seine when he realizes that his life’s fundamental organizing principle was at odds with what was ultimately the Right Thing To Do.

    So if even the sixth and eighth commandments admit to exceptions, why is it that some Debian developers approach the first clause of the Debian Social Contract with a take-no-prisoners, no-exceptions policy? Especially given the fourth clause of the Debian Social contract:

    Our priorities are our users and free software

    We will be guided by the needs of our users and the free software community. We will place their interests first in our priorities. We will support the needs of our users for operation in many different kinds of computing environments. We will not object to non-free works that are intended to be used on Debian systems, or attempt to charge a fee to people who create or use such works. We will allow others to create distributions containing both the Debian system and other works, without any fee from us. In furtherance of these goals, we will provide an integrated system of high-quality materials with no legal restrictions that would prevent such uses of the system.

    This clause does not have the same sort of absolutist words as the first clause, so many Debian Developers have held that the “needs of the users” is defined by “100% free software”. Others have not agreed with this interpretation, but regardless of how “needs of the users” should be interpreted, the fact of the matter is, injunctions such as “Thou shalt not kill” are just as absolute, and yet in the real world, we recognize that there are exceptions to such absolutes, apparently unyielding claims on our behavior.

    I personally believe that “100% free software” is a wonderful aspirational goal, but in particular with regards to standards documents and firmware, there are other considerations that should be taken into account.   People of good will may disagree about what those exceptions should be, but I think one thing that we should consider as even higher priority and with a greater claim on how we behave is the needs of our users and fellow developers as people.   For those who claim Christianity as their religious tradition, Jesus once stated,

    Thou shalt love the Lord thy God with all thy heart, and with all thy soul, and with all thy mind.  This is the first and great commandment.  And the second is like unto it: Thou shalt love thy neighbour as thyself.   On these two commandments hang all the law and the prophets.

    Even for those who do not claim Christianity as their religious tradition, most moral and ethical frameworks have some variant on the Golden Rule: “Do unto others as you would have them do unto you”. I would consider, for example, that the Golden Rule is at least as high a priority claim on my behavior as the notion of free speech, and in many cases, it would be a higher priority claim. The recent controversy surrounding Josselin Mouette was started precisely because Joss has taken something which is a good thing, namely Free Speech, and elevated it to a principle more important than all else, claiming that any restraint on such a notion was equivalent to censorship.

    I think the same thing is true for free software, although it is a subtler trap. Philosophical claims that “100% free software” is the most important consideration are dangerously close to treating Free Software as the Object of Ultimate Concern, or in religious terms, idolatry. For those who are religious, it’s clear why this is a bad thing; for those who aren’t: if you are unwilling to worship a supernatural being, you may want to very carefully consider whether you are willing to take a philosophical construct and raise it to a position of commanding your highest allegiance above all else, including how you treat other people.

    Ultimately, I consider people to be more important than computers, hardware or software. So over time, while I may have had some disagreements with how Mark Shuttleworth has run Canonical Software and Ubuntu (but hey, he’s the multimillionaire, and I’m not), I have to give him props for Ubuntu’s Code of Conduct. If Debian Developers took some kind of Code of Conduct at least as seriously as the Social Contract, I think interactions between Debian Developers would be far more efficient, and in the end the project would be far more successful. This may, however, require lessening the importance of philosophical constructs such as Free Speech and Free Software, and perhaps becoming more pragmatic and more considerate towards one another.

    Originally published at Thoughts by Ted. Please leave any comments there.

    December 29, 2008 01:14 AM

    December 28, 2008

    Harald Welte: If you're at the 25C3: Don't miss the DECT talk

    If you're at the 25C3, I strongly recommend visiting the DECT security talk. Trust me, you won't be disappointed.

    It's one of the most exciting things that I've seen happening recently. Finally, some more people are moving beyond boring Internet security and into other areas of communications security that desperately need more research.

    December 28, 2008 01:00 AM

    December 27, 2008

    Matthew Garrett:

    So. Burning down the house. Uniquely Welsh concept[1] or near-inevitable consequence of design failure? I present the following:


    Innocent Christmas ornament or HARBINGER OF DOOM?

    Note how the candle descends into the foliage. There's a good inch and a half of candle left there, embedded into a block. The foliage is, in fact, plastic. The block it's embedded into is some sort of hydrocarbon-based foam. It turns out that lighting one of these and leaving it for a while is a good way to trigger all kinds of excitement. Of the "Goodness me, there appears to be a large lump of petroleum byproduct burning quite vigorously in the hall" variety.

    The software design moral: Everything is shit and will attempt to kill you when you're not looking

    Now. Back to eating turkish delight and wondering what to do for New Year.

    [1] Yes, yes, it's a cover. THAT'S THE JOKE.

    December 27, 2008 10:25 PM

    Dave Jones: Hidden initcalls.

    The boot tracing post I wrote up led me to scrutinise the dmesg a little further. There’s a ton of data in there, and not all of it makes sense.
    To call out one example..

    [ 7.209578] calling snap_init+0x0/0x2a @ 1
    [ 7.215488] initcall snap_init+0x0/0x2a returned 0 after 72 usecs

    What is this stuff doing built into the vmlinuz? This stuff is a prime candidate for being modular, given that not everyone needs it. (I’ll bet a majority of users don’t even know what it is, let alone ‘need’ it).

    This is defined in net/802/psnap.c. Looking at net/802/Makefile, we see this gets built if one or more of the following CONFIG options are set..

    obj-$(CONFIG_LLC) += p8022.o psnap.o
    obj-$(CONFIG_TR) += p8022.o psnap.o tr.o
    obj-$(CONFIG_IPX) += p8022.o psnap.o p8023.o
    obj-$(CONFIG_ATALK) += p8022.o psnap.o

    Let's take these one by one.
    Here’s where the fail begins. In the Fedora kernel, we had CONFIG_LLC set to =m. But something ends up overriding that decision, and making it a built-in. [note to self: make oldconfig shout when this happens]. Something must be ’select’ing it somewhere. There’s actually quite a few things that do. But it turns out that the culprit in this case is CONFIG_TR. Wait, tokenring support is being built-in for every user ?
    Afraid so. And why this happens is a bit tragic.

    Looking at the definition of CONFIG_TR in drivers/net/tokenring/Kconfig is enlightening.

    menuconfig TR
            bool "Token Ring driver support"
            depends on NETDEVICES && !UML
            depends on (PCI || ISA || MCA || CCW)
            select LLC

    The ‘bool’ being the key problem here. Because TR ends up being built-in, all its dependencies and everything it selects also become built-ins. Changing this to a tristate solves this, and LLC remains modular.
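
    A rough way to picture this: 'select' sets a lower bound on the selected symbol, so the selected symbol can never be "less built-in" than whatever selects it. The Python sketch below models only that one rule under the ordering n < m < y; apply_select is a name invented for the illustration and is not part of the kconfig tools.

```python
# Ordering of tristate values: n < m < y.
RANK = {"n": 0, "m": 1, "y": 2}


def apply_select(user_value, selector_values):
    """Effective value of a symbol once every selector has raised its floor."""
    return max([user_value, *selector_values], key=RANK.__getitem__)


# TR as a bool can only be y or n, so an enabled TR drags LLC up to y.
llc_with_bool_tr = apply_select("m", ["y"])

# TR as a tristate can be m, which leaves a modular LLC alone.
llc_with_tristate_tr = apply_select("m", ["m"])

print(llc_with_bool_tr, llc_with_tristate_tr)
```

    The user's choice of =m for LLC only survives when nothing selecting it is set to y, which is why one bool symbol is enough to override it.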

    Problems like this are why I really loathe the ’select’ statement in kconfig.

    The initcalls mentioned above were _tiny_ in comparison to some of the more obvious bloat, but there’s a bunch of low-hanging fruit in there like this which on first sight just leaves you wondering ‘wtf?’.

    December 27, 2008 04:47 AM

    December 26, 2008

    Pete Zaitcev: Wikipedia retards

    Wikipedia is a great resource, but what the provocative post title meant to say is that they tend to take things too far along the route of bureaucratic idiocy:

    In practice, therefore, when a USB host computer has mounted an MSC partition, it assumes absolute control of the storage, which then may not be safely modified without risk of data corruption until the host computer has severed the connection[citation needed].

    The statement above is obvious to anyone who is not a retard. Why is a citation needed? Do they want a nod to shared ownership cluster filesystems here?

    December 26, 2008 10:33 PM

    Pete Zaitcev: Oh Polyakov you card

    Now he's done it:

    POHMELFS stands for Parallel Optimized Host Message Exchange Layered File System.

    It's obvious to any Russian that "pohmel fs" means "a filesystem created thanks to a heavy hangover", as in the joke about Ilya Muromets. So, why bother with the fake de-abbreviation?

    December 26, 2008 05:19 PM

    Pete Zaitcev: The Comodo CA thing

    The problem is, I don't understand what the breach of Comodo's CAs means. How bad is it? Are there sites attacked and passwords collected? What sites? Is my bank's password intercepted yet? The thread opens with Mozilla and Comodo people stonewalling and downplaying. But then they would, wouldn't they? I need a trusted security expert to tell me if this is a problem. I'm only a kernel hacker, I know zilch about security.

    FRIDGE UPDATE: For this to succeed, it needs either a phish or a DNS attack to work first, right? So it cannot be a big deal.

    December 26, 2008 05:06 PM

    Evgeniy Polyakov: POHMELFS: The Great Southern Trendkill release.

    POHMELFS stands for Parallel Optimized Host Message Exchange Layered File System.

    POHMELFS is a kernel client for a distributed parallel internet filesystem that is still under development. As it exists today, it is a high-performance parallel network filesystem with the ability to balance reads across multiple hosts and simultaneously write data to multiple hosts.

    The main design goal of this filesystem is to implement a very fast and scalable network filesystem with a local writeback cache of data and metadata, which greatly speeds up every IO operation compared to traditional writethrough-based network filesystems.

    Read balancing and writing to multiple hosts can be used to improve parallel multithreaded read-mostly data processing workloads and to organize fault-tolerant systems. POHMELFS as a network client does not support data synchronization between the nodes, so this task should be implemented in the servers. POHMELFS with multiple-server writes can be used as a backup solution for physically distributed network servers.

    Currently development is concentrated on the distributed object-based server, implemented with a distributed hash table design approach in mind. Its main goals are node management that is completely transparent from the client's point of view, a full absence of any controlling central servers (points of failure), and transaction/history-based object storage.

    POHMELFS utilizes a writeback cache, which is built on top of a MO(E)SI-like coherency protocol. It uses scalable cached read/write locking. No additional requests are performed if the lock is granted to the filesystem. The same protocol is used by the server for on-demand flushing of the client's cache (for example when the server wants to update local data or send some new content into the clients' caches).

    POHMELFS is able to encrypt the data channel or perform strong data checksumming. Algorithms used by the filesystem are autoconfigured during startup, and the mount may fail (depending on options) if the server does not support the requested algorithms.

    Autoconfiguration also involves sending information about the size of the exported directory specified by the server, permissions, statistics about the number of inodes, used space and so on.

    POHMELFS utilizes a transaction model for all its operations. Each transaction is an object, which may embed multiple commands completed atomically. When a server fails, the whole transaction will be replayed against it (or a different server) later. This approach maintains high data integrity and keeps the filesystem state from desynchronizing in case of network or server failures.

    More details can be found at the homepage.

    December 26, 2008 02:20 PM

    Evgeniy Polyakov: New distributed storage release.

    DST is a network block device storage, which can be used to organize exported storages on the remote nodes into the local block device.

    The main goal of the project is to allow creation of the block devices on top of different network media and connect physically distributed devices into single storage using existing network infrastructure and not introducing new limitations into the protocol and network usage model.

    Tree was rebased against 2.6.28 kernel release.

    December 26, 2008 12:23 PM

    Evgeniy Polyakov: For those who naively believe

    in all that sweet talk about 'tell a story' and extended descriptions of patches...

    DST was released more than a week ago with a two-page extended description of the ideas, implementation, features and use cases for the distributed storage. Each file was introduced separately with a description of its content and rough usage cases in the project.

    Guess the result? We talked a little with Arnd Bergmann and Benjamin Herrenschmidt about thread pools, mainly that it could be a good idea to push them separately, and that David Howells' slow_work patches will likely be pushed into the kernel as a thread pool implementation.

    I will rebase against 2.6.28 and resend DST and POHMELFS today. Interested people are invited to the appropriate mailing lists to ask questions and discuss the needed features.

    December 26, 2008 08:46 AM

    Pete Zaitcev: yum and redundant things

    I'm looking at the "yum update" printing an endless stream of "/sbin/ldconfig: /usr/lib64/libxcb-xlib.so.0 is not a symbolic link" and think that it might save some run time if yum did not launch ldconfig from every goddamn package it installs. Unfortunately, it's probably impossible, since ldconfig is launched from rpm's postinstall scriptlet, which does not know if it's invoked from yum.

    December 26, 2008 04:10 AM

    Dave Jones: Boot tracing.

    I’m on vacation, but I can’t resist playing with new toys, seeing as Santa didn’t bring me anything fun this year. In my previous post, I mentioned that 2.6.28 was for the most part, dull. Reading the excellent changelog summary at kernelnewbies, I noticed a new feature I had until now overlooked.

    1.6. Boot tracer

    The purpose of this tracer is to help developers optimize boot times: it records the timings of the initcalls. Its aim is to be parsed by the scripts/bootgraph.pl tool to produce graphics about boot inefficiencies, giving a visual representation of the delays during initcalls. Users need to enable CONFIG_BOOT_TRACER, boot with the “initcall_debug” and “printk.time=1” parameters, and run “dmesg | perl scripts/bootgraph.pl > output.svg” to generate the final data.

    Very interesting.

    Here’s what it looks like when I ran it on my eeepc ..


    [Figure: boot tracing output.]

    Looks pretty. Though something isn’t quite right.
    If you look at the dmesg output, there are over 400 initcalls. Even if we ignore all the uninteresting ones that return in 0 usecs, there’s still over 300 in the log. What gives?
    The script stops parsing once the kernel hands off to the early userspace scripts in initramfs. So everything from the ‘Write protecting the kernel’ message at 8 seconds into the bootup is ignored. (Sidenote: The fact that we’re taking 8 seconds just to get to this stage _sucks_, more on that another time). So all the later modules that get loaded aren’t part of this picture.

    My perl is a little rusty so I didn’t spot how it does it, but it seems there’s a threshold at which it ignores the initcalls that return quickly. Of those reported in the graph, the ‘fastest’ was ehci_hcd_init at 126837. acpi_init was almost in the same ballpark at 106445, but didn’t get picked up.

    Whilst these big hitters are no doubt damaging to the boot time, it’s important to note the cumulative effect of all those five-figure initcalls.
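
    To get a feel for that cumulative effect, the initcall lines can be summed up on either side of a cutoff. What follows is a rough sketch in C, not what bootgraph.pl does: it assumes initcall_debug lines of the form "initcall <name> returned <ret> after <N> usecs" (possibly preceded by a printk timestamp), and parse_initcall, account_initcall, struct initcall_totals and the threshold_usecs parameter are all made-up names. The script's real cutoff isn't known here, so the threshold is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Running totals over the initcall lines seen so far (hypothetical type). */
struct initcall_totals {
    unsigned long count;       /* initcall lines parsed */
    unsigned long below_usecs; /* time spent in calls under the threshold */
    unsigned long above_usecs; /* time spent in calls at or over it */
};

/*
 * Parse one line of the assumed form
 *   "initcall <name> returned <ret> after <usecs> usecs"
 * ignoring anything (such as a timestamp) in front of "initcall".
 * Returns 1 and fills name/usecs on success, 0 otherwise.
 */
int parse_initcall(const char *line, char name[64], unsigned long *usecs)
{
    const char *p = strstr(line, "initcall ");
    int ret;

    if (!p)
        return 0;
    return sscanf(p, "initcall %63s returned %d after %lu usecs",
                  name, &ret, usecs) == 3;
}

/* Add one dmesg line to the totals; lines that do not parse are skipped. */
void account_initcall(struct initcall_totals *t, const char *line,
                      unsigned long threshold_usecs)
{
    char name[64];
    unsigned long usecs;

    if (!parse_initcall(line, name, &usecs))
        return;
    t->count++;
    if (usecs < threshold_usecs)
        t->below_usecs += usecs;
    else
        t->above_usecs += usecs;
}
```

    Feeding every line of dmesg through account_initcall() and comparing below_usecs with above_usecs shows how much of the boot is spent in calls that never make it onto the graph.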

    December 26, 2008 03:28 AM

    Matthew Garrett:

    Christmas is a time to ignore such trifling details as nearly accidentally burning the house down, and instead to focus on what's important in life - working out which hoops to jump through to make new hardware useful without installing less convenient operating systems. This year was more straightforward than some, and merely involved attempting to work out how to give people money in return for books that could then be read on my Sony reader thingy. Shockingly enough, the Sony Ebook Store requires a Windows app, so not a good start. There's no shortage of sites that sell ebooks in a variety of formats without any platform dependent awkwardness, though - of course, most of them are inconveniently DRMed. And the only DRMed content the Sonys will read is either Sony's own or some encrypted PDFs[1].

    Not to worry - the DRMed version of the Mobipocket format is based on a symmetric encryption algorithm from 1991, so finding a way around that isn't much of a problem[2]. The only real issue I had was generating the PID needed to buy one of the damn things in the first place. This 10 character string is generated when you install the Mobipocket reader software, but doing so would again involve Windows.

    So here's a trivial C program that generates valid PIDs with working checksums. The PID on its own isn't a secret (given that you can get as many as you want by reinstalling the Windows software), so I can't see any awkward copyright law things going on. And, really, it's just a random string plus a checksum.

    CRC code is from SSH, the checksum validation is a reimplementation of code from EBook::Tools::Mobipocket by Zed Pobre. My total creative input was generating a 7 character random string and sticking a dollar sign at the end. Go me.

    [1] According to the blurb about the firmware update, though mine still seems to claim that it's not authorised to read DRMed content in the about screen. Possibly it needs keys generating, or something

    [2] No, I'm not going to tell you how. The internet exists so you can work these things out without asking me

    December 26, 2008 02:37 AM

    December 25, 2008

    Val Henson: My Christmas presents to myself

    1. Season 2 of House M.D. on DVD.
    2. Fixing the uninitialized block group checksum bug in my 64-bit e2fsprogs:
      [val@fsbox e2fsprogs]$ ~/src/build/misc/mke2fs -t ext4 -O 64bit /terabyte/33bit
      [val@fsbox e2fsprogs]$ sudo mount -o loop /terabyte/33bit /mnt
      [val@fsbox e2fsprogs]$ dmesg | tail
      tun0: Disabled Privacy Extensions
      tun0: Disabled Privacy Extensions
      EXT4-fs: barriers enabled
      kjournald2 starting.  Commit interval 5 seconds
      EXT4 FS on loop0, internal journal on loop0:8
      EXT4-fs: delayed allocation enabled
      EXT4-fs: file extents enabled
      EXT4-fs: mballoc enabled
      EXT4-fs: mounted filesystem with ordered data mode.
      SELinux: initialized (dev loop0, type ext4), uses xattr
      [val@fsbox e2fsprogs]$ df -h
      [val@fsbox e2fsprogs]$ df -h /mnt
      Filesystem            Size  Used Avail Use% Mounted on
      /terabyte/33bit        16T  229M   15T   1% /mnt
      

    December 25, 2008 11:41 PM

    Evgeniy Polyakov: Updated 64k-bind() patch.

    I suppose I fixed the issue with multiple sockets bound to the same local address and port. The likely problem is in the way my patch broke the old code, which assumed that the table may contain only a single reuse socket added via bind() with port 0.

    In the updated version bind() checks whether the selected bucket already contains fast reuse sockets, and in that case runs the whole bind conflict check, which involves address and port; otherwise the socket is just added into the table.

    The patch has not been heavily tested yet, but it passed several trivial lots-of-binds test application runs.
    I will update the patch if production testing reveals any problems.

    December 25, 2008 09:33 PM

    Dave Jones: Linux kernel 2.6.28

    Linus just released the 2.6.28 kernel. It’s already compiling for tomorrow’s rawhide. Fedora 9 & 10 will probably move to it in a few weeks. Typically, we wait until the dust settles and the first -stable release comes out. I was asked recently what bits we’re excited about in .28 for Fedora. To be honest, I didn’t give a great answer. It’s just not an “OMG, THIS RELEASE IS AWESOME” kind of release. There’s nothing in there that I was disappointed not to get into .27 for F10’s release. In fact, lots of the bits in there we were already carrying in the Fedora kernel (the DRM bits for example). Aside from that, it’s the usual churn of bug fixes, new drivers, and probably some interesting new bugs.

    What about F11 ? Looking at the current schedule, we’ll get at least .29 in. I’m not sure we’ll have enough time to pull in .30 at this stage. All depends on how quickly .29 stabilises. Version numbers are so hand-wavy anyway. I wish when people asked me ‘what version is fX going to be’, they’d really ask ‘is feature xyz going to be merged by fX’. But people sure are hung up on numbers.

    People tend not to notice kernel features these days for the most part. Which in a way is a good thing. (means it’s working). Unless it’s something that gets a lot of press like “unified x86 architecture” “tickless kernel” “modesetting”. There are dozens of features every release, but people don’t really get excited about a lot of them, and for good reason. They’re mostly dull from a userspace programmer/end-user perspective.

    December 25, 2008 01:23 AM

    December 24, 2008

    Evgeniy Polyakov: Optimizing Linux bind() performance.

    When I implemented a simple extension to the binding mechanism, which allows binding more than 64k sockets (or a smaller number, depending on sysctl parameters), I got stuck on a problem: we have to traverse the whole bind hash table to find an empty bucket. And while that is not a problem for, say, 32k connections, bind() completion time grows exponentially (since after each successful binding we have to traverse one more bucket to find an empty one) even if we start each time from a random offset inside the hash table.

    So, when the hash table is full and we want to add another socket, we have to traverse the whole table no matter what, so effectively this is the worst-case performance and it is constant.

    Let's see the results.


    bind() time depending on number of already bound sockets

    The green area corresponds to the usual process of binding to port zero, which turns on kernel port selection as described above. The red area is the bind process when the number of bound sockets is not limited by 64k (or sysctl parameters). (Let me remind you that we are only talking here about automatic kernel port selection for sockets which have the reuse option turned on, i.e. those which are bound to different addresses, for example, but want to share the port.) It shows the same exponential growth (hidden by the green area) before the number of ports reaches the sysctl limit.

    At this point the bind hash table has exactly one reuse-enabled socket in each bucket, but it is possible that they have different addresses. Actually the kernel selects the first port to try randomly, so at the beginning bind will take roughly constant time, but over time the number of ports to check after the random start will increase. That growth is exponential, but because of the random selection above, not every port selection will necessarily take longer than the previous one. So we have to consider the whole area below the curve in the graph (if you could zoom in, you would find many different times placed there), since one point can hide another.

    Blue area corresponds to the title of the post: optimized bind() and kernel port selection algorithm.

    This is a rather simple design approach: the hash table now maintains an (imprecise and racily updated) count of currently bound sockets, and when that count becomes greater than a predefined value (I use the maximum port range defined by sysctls), we stop traversing the whole bind hash table and just stop at the first matching bucket after the random start. The limit roughly corresponds to the case when the bind hash table is full and we have turned on the mechanism that allows binding more reuse-enabled sockets, so it does not change the behaviour of other sockets.
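
    The idea can be shown on a toy table. The sketch below is not the patch sent to netdev@ and does not use kernel data structures: struct bind_table, pick_port and NPORTS are invented for illustration, the counter is exact rather than racy, every socket is assumed to be reuse-enabled, and the address conflict check is left out entirely.

```c
#define NPORTS 8  /* toy port range; the real one comes from sysctls */

struct bind_table {
    unsigned int sockets[NPORTS]; /* reuse-enabled sockets per port bucket */
    unsigned int bound;           /* count of bound sockets in the table */
};

/*
 * Pick a bucket for a new reuse-enabled socket, starting at 'start'
 * (which the caller would choose randomly). Stores the number of
 * buckets examined in *steps and returns the chosen bucket index.
 */
int pick_port(struct bind_table *t, unsigned int start, unsigned int *steps)
{
    unsigned int i, idx = start % NPORTS;

    *steps = 1;
    if (t->bound < NPORTS) {
        /* Table not full yet: walk until an empty bucket turns up. */
        for (i = 0; i < NPORTS; i++) {
            idx = (start + i) % NPORTS;
            *steps = i + 1;
            if (t->sockets[idx] == 0)
                break;
        }
    }
    /* Table full: skip the walk and share the first bucket we hit. */
    t->sockets[idx]++;
    t->bound++;
    return (int)idx;
}
```

    While the table is filling up, the number of steps creeps towards NPORTS; once bound reaches the limit, every call returns after looking at a single bucket.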

    The patch was sent to netdev@, but I have just been told about the following issue:

    $ grep -n :60013 netstat.res
    33101:tcp        0      0 local1:60013       remote:80       ESTABLISHED
    33105:tcp        0      0 local1:60013       remote:80       ESTABLISHED
    52084:tcp        0      0 local2:60013       remote:80       ESTABLISHED
    52085:tcp        0      0 local2:60013       remote:80       ESTABLISHED
    58249:tcp        0      0 local3:60013       remote:80       ESTABLISHED

    It is yet to be resolved what it is and how much harm it brings :)

    December 24, 2008 07:10 PM

    December 23, 2008

    Evgeniy Polyakov: Climbing evening: feel like shit. Dejavu.

    But I will survive. To spite everyone :)

    December 23, 2008 09:47 PM

    Dave Jones: blog moving, blahblahblah.

    As should be obvious from this page, I’m in the process of moving my blog from livejournal, to a self-hosted wordpress instance here.

    The various pages that used to be linked on the frontpage codemonkey.org.uk are still here, and should be reachable from the ‘projects’ tab at the top of the page. Let me know if I’ve screwed something up.

    Davej.

    December 23, 2008 07:26 PM

    Evgeniy Polyakov: Elliptics distributed network for the NATed boxes. Request forwarding.

    It is quite simple to create a distributed network for nodes which allow connections to them, i.e. those lucky machines which have a static IP address and no firewall between them and the internet. Unfortunately things are not that simple in real life, where a great many machines do not allow direct connections because they live behind NAT.

    The basic design decision for such a system is to force NATed clients to connect to some external servers and receive store commands via those connections. NATed boxes can fetch data either by connecting to the remote servers (if the connection is allowed by the local firewall) or by connecting to some other host in the elliptics distributed network and asking it to fetch the needed data via its own channels and then resend it to the requesting NATed client.

    While the design looks clear enough, the implementation is rather tricky. Each node in the network has a list of routes it knows how to connect to, each of which includes an address and optionally a cached socket. A NATed node cannot provide an address to connect to, so an internet node relies only on the cached socket, created when the NATed node connected to the system. Since that socket can be used by the commands (and replies) created by the NATed box itself, it should also be used by commands originating from different nodes when they load or store data from/to the network.

    This in turn brings us to the asynchronous nature of the connection: the system is not allowed to wait for some particular packet to be received from the socket, since the received data can be an arbitrary command from a different thread (and, after all, from a different node). To solve the problem I decided to implement a transactional mechanism similar to what is used in POHMELFS and DST, where a dedicated thread (or potentially a set of threads) handles receiving data from the given socket, and no writers wait on the socket itself, but instead embed a waiting primitive into the transaction. When the transaction reply is received from the network and the completion callback is invoked, it may access private transaction data and wake the sleeping writer, which may read the reply back from the transaction structure.
    I also plan to make the actual transaction receiving not a generic receiving function (as it is right now), but a transaction callback that receives the data itself, so that it can be placed directly into the needed object and not copied afterwards.

    For example, a simple state machine for a NATed box which wants to read some data via a given node may look like this:
    1. The NATed node asks the listened node to forward some command (like reading the object with a given ID).
    2. The listened box creates a read transaction in which it puts the route to the NATed box (mainly it only needs the accepted socket), stores it in some tree and sends the request to the network.
    3. The dedicated reading thread(s) will eventually receive a reply (either from a different node that connected to our listened node to provide the data, or from a node that the listened node connected to), find the transaction ID, locate the transaction in the tree and invoke its completion callback.
    4. The transaction completion callback will copy the data into a temporary buffer and send it back to the NATed client via the stored route (the accepted socket).
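
    Steps 2 to 4 above can be sketched in C roughly as follows. This is an illustration only, not elliptics code: struct trans, struct trans_table, trans_create, trans_reply and forward_to_client are invented names, a fixed array stands in for the tree, and nothing is actually written to a socket.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TRANS 16

struct trans;
typedef void (*trans_complete_t)(struct trans *t, const void *data, size_t size);

struct trans {
    unsigned long id;          /* transaction id carried in request and reply */
    int route_fd;              /* accepted socket of the NATed client */
    trans_complete_t complete; /* invoked when the reply arrives */
    int in_use;
    char reply[64];            /* temporary buffer for the reply data */
    size_t reply_size;
};

struct trans_table {
    struct trans slots[MAX_TRANS]; /* stand-in for the transaction tree */
    unsigned long next_id;
};

/* Step 2: create a transaction that remembers the route back to the client. */
struct trans *trans_create(struct trans_table *tbl, int route_fd,
                           trans_complete_t cb)
{
    for (size_t i = 0; i < MAX_TRANS; i++) {
        struct trans *t = &tbl->slots[i];

        if (!t->in_use) {
            memset(t, 0, sizeof(*t));
            t->id = ++tbl->next_id;
            t->route_fd = route_fd;
            t->complete = cb;
            t->in_use = 1;
            return t;
        }
    }
    return NULL; /* table full */
}

/* Step 3: the reading thread locates the transaction by id and completes it. */
int trans_reply(struct trans_table *tbl, unsigned long id,
                const void *data, size_t size)
{
    for (size_t i = 0; i < MAX_TRANS; i++) {
        struct trans *t = &tbl->slots[i];

        if (t->in_use && t->id == id) {
            t->complete(t, data, size);
            t->in_use = 0;
            return 0;
        }
    }
    return -1; /* unknown transaction id */
}

/* Step 4: copy the reply into the temporary buffer, truncating if needed. */
void forward_to_client(struct trans *t, const void *data, size_t size)
{
    if (size > sizeof(t->reply))
        size = sizeof(t->reply);
    memcpy(t->reply, data, size);
    t->reply_size = size;
    /* A real node would now send t->reply over t->route_fd. */
}
```

    The point of keeping route_fd inside the transaction is that the reading thread needs no other state to find its way back to the NATed client: the reply's transaction ID is enough.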

    If a node joined not only to fetch data, but also to allow data to be stored, the NATed box's route will be stored locally on the listened node and will be used when write requests for that node arrive. The same transaction mechanism should handle NATed nodes correctly even if the NATed box is connected to a node whose ID is not a neighbour of the NATed box's ID. In this case, during the joining protocol the NATed box will advertise that it lives not where it is actually located, but at the address of the listened node it is connected to. When the listened node receives commands for the ID of the NATed node, it will forward them via the stored socket.

    The same mechanism will be generically used to forward requests between nodes even when they are allowed to connect to remote ones.

    December 23, 2008 03:38 PM

    David Woodhouse: 23 Dec 2008

    Wheee! Opened my first Christmas present a couple of days ago. A PCI ADSL2+ card, fully supported by Linux.

    Many thanks to the folks at Traverse Technologies and Xrio for the effort they've put into developing this hardware — and for sending me a board.

    At last I'll be able to have a properly supported Linux box, with legal drivers, as the endpoint for my ADSL lines.

    There are a few things to improve; adding DMA support is a priority right now. But we do have the capability to update the FPGA from software now, so they're starting to ship real hardware to customers.

    December 23, 2008 02:46 PM

    Pete Zaitcev: Contributing to Fedora

    So, usbmon is in Fedora now. As a newly-minted packager, I'm shocked by the ease of the procedure. All I had to do was to follow the steps, documented in a clear language. For the spec I simply copied the recommended template. My sponsor approved me in a matter of days, all back-end systems (Fedora Accounts, CVS, Koji) are in place and seem to function fine. Looks like all that's left is to answer bugs and maybe put something into Bodhi when Fedora 11 ships.

    I'm going to go ahead and submit the extended binary API ("-b1") to kernel, and then we may deprecate the text interface in practice, thus removing a stone from under Mackall's debugfs argument. If someone finds a good argument for keeping the text API, we may move it to /dev, but most likely it will not be needed.

    December 23, 2008 01:25 AM

    Harald Welte: Some more progress with the BS11 Abis (BSC) implementation

    Very infrequently I've been reporting about my humble attempts in talking the A-bis protocol to the Siemens BS11 microBTS GSM base station.

    Since Dieter Spaar and myself are going to have a talk about this at the 25C3 in a couple of days, I'm currently working every minute of each day to get that Free Software BSC-side A-bis implementation going.

    While the actual code is getting more and more in shape, I'm now back to fixing the underlying infrastructure: mISDN. The mISDN kernel code base is _really_ hard to understand... if I have problems with it - despite about a decade of experience with network protocols and Linux kernel development - then that probably says quite a bit about it. It would definitely benefit from quite a bit more documentation. Anyway, it's FOSS, so no reason to complain. Use the source, Luke.

    So just about one hour before I had to leave to travel to my parents (where I could not take a 48kg GSM BTS with me) I finally had mISDN in shape to be able to support multiple TEIs with different SAPIs on the D Channel of timeslot 1 of the E1 interface carrying A-bis. My userspace code was happily sending and receiving OML (Organization and Maintenance Layer) and RSL (Radio Signalling Link) frames, while the L2ML (Layer 2 Management Layer) is entirely handled by the slightly patched TEI manager that mISDN has in the kernel.

    Funny enough, after initializing OML and RSL, the first unsolicited message I got was the error event report about the 'intrusion detection' at the BTS, since I was operating it with open connector panel ;)

    So now I've returned to the actual BSC/MSC subset implementation. I'm still confident to f