Kernel Planet

November 29, 2015

Matthew Garrett: What is hacker culture?

Eric Raymond, author of The Cathedral and the Bazaar (an important work describing the effectiveness of open collaboration and development), recently wrote a piece calling for "Social Justice Warriors" to be ejected from the hacker community. The primary thrust of his argument is that by calling for a removal of the "cult of meritocracy", these SJWs are attacking the central aspect of hacker culture - that the quality of code is all that matters.

This argument is simply wrong.

Eric's been involved in software development for a long time. In that time he's seen a number of significant changes. We've gone from computers being the playthings of the privileged few to being nearly ubiquitous. We've moved from the internet being something you found in universities to something you carry around in your pocket. You can now own a computer whose CPU executes only free software from the moment you press the power button. And, as Eric wrote almost 20 years ago, we've identified that the "Bazaar" model of open collaborative development works better than the "Cathedral" model of closed centralised development.

These are huge shifts in how computers are used, how available they are, how important they are in people's lives, and, as a consequence, how we develop software. It's not a surprise that the rise of Linux and the victory of the bazaar model coincided with internet access becoming more widely available. As the potential pool of developers grew larger, development methods had to be altered. It was no longer possible to insist that somebody spend a significant period of time winning the trust of the core developers before being permitted to give feedback on code. Communities had to change in order to accept these offers of work, and the communities were better for that change.

The increasing ubiquity of computing has had another outcome. People are much more aware of the role of computing in their lives. They are more likely to understand how proprietary software can restrict them, how not having the freedom to share software can impair people's lives, how not being able to involve themselves in software development means software doesn't meet their needs. The largest triumph of free software has not been amongst people from a traditional software development background - it's been the fact that we've grown our communities to include people from a huge number of different walks of life. Free software has helped bring computing to under-served populations all over the world. It's aided circumvention of censorship. It's inspired people who would never have considered software development as something they could be involved in to develop entire careers in the field. We will not win because we are better developers. We will win because our software meets the needs of many more people, needs the proprietary software industry either can not or will not satisfy. We will win because our software is shaped not only by people who have a university degree and a six figure salary in San Francisco, but because our contributors include people whose native language is spoken by so few people that proprietary operating system vendors won't support it, people who live in a heavily censored regime and rely on free software for free communication, people who rely on free software because they can't otherwise afford the tools they would need to participate in development.

In other words, we will win because free software is accessible to more of society than proprietary software. And for that to be true, it must be possible for our communities to be accessible to anybody who can contribute, regardless of their background.

Up until this point, I don't think I've made any controversial claims. In fact, I suspect that Eric would agree. He would argue that because hacker culture defines itself through the quality of contributions, the background of the contributor is irrelevant. On the internet, nobody knows that you're contributing from a basement in an active warzone, or from a refuge shelter after escaping an abusive relationship, or with the aid of assistive technology. If you can write the code, you can participate.

Of course, this kind of viewpoint is overly naive. Humans are wonderful at noticing indications of "otherness". Eric even wrote about his struggle to stop having a viscerally negative reaction to people of a particular race. This happened within the past few years, so before then we can assume that he was less aware of the issue. If Eric received a patch from someone whose name indicated membership of this group, would there have been part of his subconscious that reacted negatively? Would he have rationalised this into a more critical analysis of the patch, increasing the probability of rejection? We don't know, and it's unlikely that Eric does either.

Hacker culture has long been concerned with good design, and a core concept of good design is that code should fail safe - ie, if something unexpected happens or an assumption turns out to be untrue, the desirable outcome is the one that does least harm. A command that fails to receive a filename as an argument shouldn't assume that it should modify all files. A network transfer that fails a checksum shouldn't be permitted to overwrite the existing data. An authentication server that receives an unexpected error shouldn't default to granting access. And a development process that may be subject to unconscious bias should have processes in place that make it less likely that said bias will result in the rejection of useful contributions.
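
To make the fail-safe idea concrete, here is a minimal sketch in C of the authentication example - my own illustration rather than anything from the post, with a stubbed-out backend standing in for a real one: any result other than an explicit success is treated as a denial.

#include <stdbool.h>

enum auth_result { AUTH_OK, AUTH_DENIED, AUTH_ERROR };

/* Stand-in for a real backend call; imagine it can also time out or fail. */
static enum auth_result auth_backend_check(const char *user, const char *token)
{
    (void)user;
    (void)token;
    return AUTH_ERROR;          /* simulate an unexpected backend error */
}

bool allow_access(const char *user, const char *token)
{
    switch (auth_backend_check(user, token)) {
    case AUTH_OK:
        return true;            /* the only path that grants access */
    case AUTH_DENIED:
    case AUTH_ERROR:            /* fail safe: errors are treated as denial */
    default:
        return false;
    }
}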

When people criticise meritocracy, they're not criticising the concept of treating contributions based on their merit. They're criticising the idea that humans are sufficiently self-aware that they will be able to identify and reject every subconscious prejudice that will affect their treatment of others. It's not a criticism of a desirable goal, it's a criticism of a flawed implementation. There's evidence that organisations that claim to embody meritocratic principles are more likely to reward men than women even when everything else is equal. The "cult of meritocracy" isn't the belief that meritocracy is a good thing, it's the belief that a project founded on meritocracy will automatically be free of bias.

Projects like the Contributor Covenant that Eric finds so objectionable exist to help create processes that (at least partially) compensate for our flaws. Review of our processes to determine whether we're making poor social decisions is just as important as review of our code to determine whether we're making poor technical decisions. Just as the bazaar overtook the cathedral by making it easier for developers to be involved, inclusive communities will overtake "pure meritocracies" because, in the long run, these communities will produce better output - not just in terms of the quality of the code, but also in terms of the ability of the project to meet the needs of a wider range of people.

The fight between the cathedral and the bazaar came from people who were outside the cathedral. Those fighting against the assumption that meritocracies work may be outside what Eric considers to be hacker culture, but they're already part of our communities, already making contributions to our projects, already bringing free software to more people than ever before. This time it's Eric building a cathedral and decrying the decadent hordes in their bazaar, Eric who's failed to notice the shift in the culture that surrounds him. And, like those who continued building their cathedrals in the 90s, it's Eric who's now irrelevant to hacker culture.

(Edited to add: for two quite different perspectives on why Eric's wrong, see Tim's and Coraline's posts)

November 29, 2015 10:41 PM

November 19, 2015

Matthew Garrett: If it's not practical to redistribute free software, it's not free software in practice

I've previously written about Canonical's obnoxious IP policy and how Mark Shuttleworth admits it's deliberately vague. After spending some time discussing specific examples with Canonical, I've been explicitly told that while Canonical will gladly give me a cost-free trademark license permitting me to redistribute unmodified Ubuntu binaries, they will not tell me what "Any redistribution of modified versions of Ubuntu must be approved, certified or provided by Canonical if you are going to associate it with the Trademarks. Otherwise you must remove and replace the Trademarks and will need to recompile the source code to create your own binaries" actually means.

Why does this matter? The free software definition requires that you be able to redistribute software to other people in either unmodified or modified form without needing to ask for permission first. This makes it clear that Ubuntu itself isn't free software - distributing the individual binary packages without permission is forbidden, even if they wouldn't contain any infringing trademarks[1]. This is obnoxious, but not inherently toxic. The source packages for Ubuntu could still be free software, making it fairly straightforward to build a free software equivalent.

Unfortunately, while true in theory, this isn't true in practice. The issue here is the apparently simple phrase "you must remove and replace the Trademarks and will need to recompile the source code". "Trademarks" is defined later as being the words "Ubuntu", "Kubuntu", "Juju", "Landscape", "Edubuntu" and "Xubuntu" in either textual or logo form. The naive interpretation of this is that you have to remove trademarks where they'd be infringing - for instance, shipping the Ubuntu bootsplash as part of a modified product would almost certainly be clear trademark infringement, so you shouldn't do that. But that's not what the policy actually says. It insists that all trademarks be removed, whether they would embody an infringement or not. If a README says "To build this software under Ubuntu, install the following packages", a literal reading of Canonical's policy would require you to remove or replace the word "Ubuntu" even though failing to do so wouldn't be a trademark infringement. If an email address is present in a changelog, you'd have to change it. You wouldn't be able to ship the juju-core package without renaming it and the application within. If this is what the policy means, it's so impractical to rebuild Ubuntu that it's not free software in any meaningful way.

This seems like a pretty ludicrous interpretation, but it's one that Canonical refuse to explicitly rule out. Compare this to Red Hat's requirements around Fedora - if you replace the fedora-logos, fedora-release and fedora-release-notes packages with your own content, you're good. A policy like this satisfies the concerns that Dustin raised over people misrepresenting their products, but still makes it easy for users to distribute modified code to other users. There's nothing whatsoever stopping Canonical from adopting a similarly unambiguous policy.

Mark has repeatedly asserted that attempts to raise this issue are mere FUD, but he won't answer you if you ask him direct questions about this policy and will insist that it's necessary to protect Ubuntu's brand. The reality is that if Debian had had an identical policy in 2004, Ubuntu wouldn't exist. The effort required to strip all Debian trademarks from the source packages would have been immense[2], and this would have had to be repeated for every release. While this policy is in place, nobody's going to be able to take Ubuntu and build something better. It's grotesquely hypocritical, especially when the Ubuntu website still talks about their belief that people should be able to distribute modifications without licensing fees.

All that's required for Canonical to deal with this problem is to follow Fedora's lead and isolate their trademarks in a small set of packages, then tell users that those packages must be replaced if distributing a modified version of Ubuntu. If they're serious about this being a branding issue, they'll do it. And if I'm right that the policy is deliberately obfuscated so Canonical can encourage people to buy licenses, they won't. It's easy for them to prove me wrong, and I'll be delighted if they do. Let's see what happens.

[1] The policy is quite clear on this. If you want to distribute something other than an unmodified Ubuntu image, you have two choices:

  1. Gain approval or certification from Canonical
  2. Remove all trademarks and recompile the source code
Note that option 2 requires you to rebuild even if there are no trademarks to remove.

[2] Especially when every source package contains a directory called "debian"…

November 19, 2015 10:16 PM

November 13, 2015

Gustavo F. Padovan: Collabora contributions to Linux Kernel 4.2

A total of 63 patches were contributed upstream by Collabora engineers as part of our current projects.

In the ARM multi_v7_defconfig we have the addition of support for Exynos Chromebooks; all options that had a tristate Kconfig option were added as modules. After this change it was found that a few drivers weren’t working properly when built as modules, so this was fixed. This work was done by Javier Martinez.

Javier also added multi EC support as newer Chromebooks have more than one Embedded Controller in the system.

Tomeu Vizoso added EMC (External Memory Controller) support to the Tegra124 platform.

On the DRM side initial support for Atomic Modesetting was added to Exynos devices by Gustavo Padovan. The Atomic Modesetting interface allows all screen updates, such as changing modes, pageflips and setting planes/cursors, to happen in the same IOCTL. Thus everything can be updated atomically. More on that can be found in Daniel Vetter’s post. Another contribution to Atomic Modesetting, from Daniel Stone, was the addition of the CRTC state mode property; it is through this property that userspace configures a modeset that will be updated via an Atomic Modesetting ioctl.

Following is a list of all patches submitted by Collabora for this kernel release:

Daniel Stone (17):

Gustavo Padovan (17):

Javier Martinez Canillas (19):

Tomeu Vizoso (11):

November 13, 2015 09:38 AM

November 12, 2015

Gustavo F. Padovan: Collabora contributions to Linux Kernel 4.3

Collabora developers contributed 48 patches to kernel 4.3 as part of our current projects.

Danilo worked on the kernel doc scripts to add cross-reference links to html documentation and argument documentation in struct bodies, while Sjoerd Simons fixed a clock definition in rockchip and an incorrect udelay usage for the stmmac phy reset delay.

Tomeu fixed gpiolib to defer probe if the pin controller isn’t available, and added another fix to chipidea USB to defer probe if usbmisc hasn’t been probed yet. On Tegra, Tomeu worked on support for the gpio-ranges property. Still on Tegra, cpuidle_state.enter_freeze() was added.

Gustavo Padovan did a lot of exynos DRM work, with the most important changes being improvements to atomic modesetting, including the asynchronous atomic commit in exynos: in async mode we just schedule the atomic update and return right away to userspace, in a similar way to how PageFlips work in the old API. In this release the exynos atomic modesetting interface was enabled for userspace usage. Another important set of patches was the removal of the exynos_drm_display and exynos_drm_encoder layers, which greatly improved the code, making it cleaner and easier to use. Apart from that there are also a few cleanups and fixes.

Danilo Cesar Lemes de Paula (2):

Gustavo Padovan (36):

Javier Martinez Canillas (1):

Sjoerd Simons (2):

Tomeu Vizoso (7):

November 12, 2015 12:20 PM

November 11, 2015

Kees Cook: evolution of seccomp

I’m excited to see other people thinking about userspace-to-kernel attack surface reduction ideas. Theo de Raadt recently published slides describing Pledge. This uses the same ideas that seccomp implements, but with less granularity: seccomp works at the individual syscall level and, in addition to killing processes, allows for signaling, tracing, and errno spoofing. As de Raadt mentions, Pledge could be implemented with seccomp very easily: libseccomp would just categorize syscalls.

I don’t really understand the presentation’s mention of “Optional Security”, though. Pledge, like seccomp, is an opt-in feature. Nothing in the kernel refuses to run “unpledged” programs. I assume his point was that when it gets ubiquitously built into programs (like stack protector), it’s effectively not optional (which is alluded to later as “comprehensive applicability ~= mandatory mitigation”). Regardless, this sensible (though optional) design gets me back to his slide on seccomp, which seems to have a number of misunderstandings:

OpenBSD has some interesting advantages in the syscall filtering department, especially around sockets. Right now, it’s hard for Linux syscall filtering to understand why a given socket is being used. Something like SOCK_DNS seems like it could be quite handy.

Another nice feature of Pledge is the path whitelist feature. As it’s still under development, I hope they expand this to include more things than just paths. Argument inspection is a weak point for seccomp, but under Linux, most of the arguments are ultimately exposed to the LSM layer. Last year I experimented with creating a “seccomp LSM” for path matching where programs could declare whitelists, similar to standard LSMs.

So, yes, Linux “could match this API on seccomp”. It’d just take some extensions to libseccomp to implement pledge(), as I described at the top. With OpenBSD doing a bunch of analysis work on common programs, it’d be excellent to see this usable on Linux too. So far on Linux, only a few programs (e.g. Chrome, vsftpd) have bothered to do this using seccomp, and it could be argued that this is ultimately due to how fine grained it is.
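
As a rough sketch of what that could look like (my own illustration, assuming libseccomp's standard API; the contents of the "stdio" category are made up), a pledge-like wrapper might expand a coarse promise into a set of allowed syscalls and refuse everything else with ENOSYS:

#include <errno.h>
#include <stddef.h>
#include <seccomp.h>

/* Hypothetical pledge("stdio")-style helper: allow a small syscall set,
 * fail everything else with ENOSYS instead of killing the process. */
static int pledge_stdio(void)
{
    int stdio_calls[] = {
        SCMP_SYS(read), SCMP_SYS(write), SCMP_SYS(close),
        SCMP_SYS(fstat), SCMP_SYS(brk), SCMP_SYS(exit_group),
    };
    scmp_filter_ctx ctx;
    size_t i;
    int rc = -1;

    ctx = seccomp_init(SCMP_ACT_ERRNO(ENOSYS));
    if (ctx == NULL)
        return -1;

    for (i = 0; i < sizeof(stdio_calls) / sizeof(stdio_calls[0]); i++)
        if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, stdio_calls[i], 0) < 0)
            goto out;

    rc = seccomp_load(ctx);
out:
    seccomp_release(ctx);
    return rc;
}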

© 2015, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.

November 11, 2015 06:01 PM

November 06, 2015

Dave Jones: Trinity 1.6

As alluded to in my last post, a few days ago I released a new version of Trinity.
The bulk of the work in this release happened prior to my burn out back in July. The combination of everything described in that post, and general unhappiness in my last job etc led to me just wanting to walk away from everything for an indeterminate amount of time.

Distance is good. I’ve continued to poke at trinity in small amounts since then. At last week’s kernel summit, a number of people expressed just how useful they find Trinity and how much they were bummed to find out I wasn’t working on it any more. With that feedback, I felt motivated to clean the decks and get 1.6 out. There’s a short description of most of the bigger changes below, but there were probably a whole bunch more changes made that I forgot to highlight in the shortlog.

With that release wrapped up, and with the fresh perspective of having been ‘away’ from the project for a while, when I was travelling last week I started work on some new features, starting with implementing a generic object cache instead of hard coding a “remember this” set of functionality for every single object type a syscall could return. It’s a relatively small amount of code, and it should make it easier to support recycling syscall results for syscalls other than mmap (which is all that’s implemented right now).
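
A minimal sketch of the idea (my own illustration, not Trinity's actual code or type names): one cache keyed by object kind, rather than a per-syscall "remember this" array for each type.

#include <stdlib.h>

enum objtype { OBJ_MMAP, OBJ_FD, OBJ_FUTEX, OBJ_MAX };  /* hypothetical kinds */

struct object {
    void *ptr;                     /* whatever the syscall handed back */
    struct object *next;
};

static struct object *caches[OBJ_MAX];

/* Remember a result so later syscalls can be fed plausible arguments. */
static void cache_object(enum objtype type, void *ptr)
{
    struct object *obj = malloc(sizeof(*obj));

    if (!obj)
        return;
    obj->ptr = ptr;
    obj->next = caches[type];
    caches[type] = obj;
}

/* Pick a previously seen object of the requested kind, if any. */
static void *get_cached_object(enum objtype type)
{
    return caches[type] ? caches[type]->ptr : NULL;
}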

So,.. while I’m working on this stuff again, it’s not the comeback many would like. I don’t know just how much time I’m going to have to devote to working on Trinity. From time to time, I suspect I’ll find some intersection between my work at Facebook and the sort of targeted testing that Trinity is useful for, but it’s not my primary focus, and probably won’t be again. Additionally, I’ve got a bunch of ideas for new projects I’m itching to work on that spawned from discussions last week, so “spare time” hacking effort might be devoted more to them in future.

tl;dr: Don’t send me feature requests. I’ve got more than enough ideas for stuff *I* want to implement. Diffs speak louder than words.

Summary of some of the bigger changes to Trinity since the last (1.5) tarball release include:

November 06, 2015 04:08 PM

Matthew Garrett: Why improving kernel security is important

The Washington Post published an article today which describes the ongoing tension between the security community and Linux kernel developers. This has been roundly denounced as FUD, with Rob Graham going so far as to claim that nobody ever attacks the kernel.

Unfortunately he's entirely and demonstrably wrong: it's not FUD, and the state of security in the kernel is currently far short of where it should be.

An example. Recent versions of Android use SELinux to confine applications. Even if you have full control over an application running on Android, the SELinux rules make it very difficult to do anything especially user-hostile. Hacking Team, the GPL-violating Italian company who sells surveillance software to human rights abusers, found that this impeded their ability to drop their spyware onto targets' devices. So they took advantage of the fact that many Android devices shipped a kernel with a flawed copy_from_user() implementation that allowed them to copy arbitrary userspace data over arbitrary kernel code, thus allowing them to disable SELinux.

If we could trust userspace applications, we wouldn't need SELinux. But we assume that userspace code may be buggy, misconfigured or actively hostile, and we use technologies such as SELinux or AppArmor to restrict its behaviour. There's simply too much userspace code for us to guarantee that it's all correct, so we do our best to prevent it from doing harm anyway.

This is significantly less true in the kernel. The model up until now has largely been "Fix security bugs as we find them", an approach that fails on two levels:

1) Once we find them and fix them, there's still a window between the fixed version being available and it actually being deployed
2) The forces of good may not be the first ones to find them

This reactive approach is fine for a world where it's possible to push out software updates without having to perform extensive testing first, a world where the only people hunting for interesting kernel vulnerabilities are nice people. This isn't that world, and this approach isn't fine.

Just as features like SELinux allow us to reduce the harm that can occur if a new userspace vulnerability is found, we can add features to the kernel that make it more difficult (or impossible) for attackers to turn a kernel bug into an exploitable vulnerability. The number of people using Linux systems is increasing every day, and many of these users depend on the security of these systems in critical ways. It's vital that we do what we can to avoid their trust being misplaced.

Many useful mitigation features already exist in the Grsecurity patchset, but a combination of technical disagreements around certain features, personality conflicts and an apparent lack of enthusiasm on the side of upstream kernel developers has resulted in almost none of it landing in the kernels that most people use. Kees Cook has proposed a new project to start making a more concerted effort to migrate components of Grsecurity to upstream. If you rely on the kernel being a secure component, either because you ship a product based on it or because you use it yourself, you should probably be doing what you can to support this.

Microsoft received entirely justifiable criticism for the terrible state of security on their platform. They responded by introducing cutting-edge security features across the OS, including the kernel. Accusing anyone who says we need to do the same of spreading FUD is risking free software being sidelined in favour of proprietary software providing more real-world security. That doesn't seem like a good outcome.

November 06, 2015 09:19 AM

November 05, 2015

Pete Zaitcev: Cool hardware in Tokyo

At the Mitaka Summit, we finally got some interesting kit exhibited, after the relatively lean summits in Atlanta and Vancouver. Unfortunately, the lighting in the Marketplace was very weird and pictures came out poorly.

My personal favourite is probably the flash array by SanDisk. It's nothing but JBOF, the host connection is SAS. You'd think any idiot could slap a few flash chips on cards and plug them into backplane... But just look how elegant it is. The capacity of the 2U box is 512 TB, but the whole thing only consumes 700 W maximum. It's brilliant, really.

Unfortunately, I don't have a good picture, but the second best was Ericsson's passive optical backplane. It promises to make your cables last forever: just swap out optronics when new bit rates come along. Even a terabit! Now it may actually be a misguided product. If they cannot get 3rd party vendors to build modules for it, the whole thing comes crashing to the ground. Ditto if they build, but overprice. But the audacity of making something that's different is to be acknowledged. And frankly I'm not a fan of re-cabling when new servers come about.

Intel wins a consolation prize for perseverance. They quietly presented some kind of next-generation multiblock computer, with pieces connected by serial cables. Finally, the future dreamed by the creators of Infiniband is here - only 15 years late, and still we don't know if it is viable.

There was also a bunch of fairly mundane boxes. Various also-ran flash vendors were present, of course. Interestingly, SolidFire had a booth, but without anything eye-catching. Resting on the laurels? IBM brought their newest PowerPC, which was mostly remarkable for still existing. That sort of thing.

November 05, 2015 02:46 AM

November 04, 2015

Dave Jones: kernel summit 2015 wrap-up

Exhausting travel aside, kernel summit in Seoul was a good use of time.
Most of the sessions didn’t feel as interactive as prior years, in part I think because there really wasn’t a lot of objection, even to some
of the more controversial things. Kees’ security talk went over pretty well even if it did depress most of the people in the room. Hopefully something good will come of it. The restartable sequences feature got talked about but didn’t get much (if any) real pushback.

There were a few hallway discussions surrounding various upcoming
kernel functionality that didn’t get ‘airtime’ in the sessions.
The kernel TLS stuff was probably discussed more in depth at netconf, and assorted VM features were covered more at LSFMM
earlier this year. Quite a few people talking excitedly about eBPF, both from a networking point of view, and soon.. tracing.
Quite a few people still seem concerned (rightly) about the upcoming unpriveledged bpf syscall.

It seems that by fracturing the kernel summit into lots of smaller events, the deep-dives into new features/problems happen there, leaving the kernel summit more for executive summary type talks and, as has been the general push over the last decade, more and more process-related discussions.

On process, Sasha’s discussion on stable was probably the most interesting to me personally. GregKH agreed to make 4.4 the next LTS starting a new tradition of “the next LTS is the one after the kernel summit”. We’ll see how that works out.

Chris Mason gave a “what went good/bad when facebook moved to 4.0” talk. Which for the most part, was all good. There are a few small things that are still being shaken out, but it’s by no means awful.

I had a lot of hallway conversations that began “so, trinity..”
The short answer there is that I’m still working on it, though at a much reduced pace than I was a year ago. It was good to hear feedback
from pretty much everyone I talked to that it was something that people value, which was a good motivator. More on that later.

I also had a lot of people asking a lot of questions about my Facebook bootcamp experience. I’ll do a longer write-up of that soon.

November 04, 2015 07:00 PM

November 03, 2015

Grant Likely: Debugging 96Boards I2C

I was originally just going to post this to one of the 96boards mailing lists, but it got sufficiently interesting that I thought I’d make it a blog post instead. I’ve been working on making i2c on the 96Boards sensors adapter work properly and I’ve made some progress. The problem that users have run into is that the Grove RGB LCD module won’t work when connected to one of the baseboard’s I2C busses. I pulled out the oscilloscope today to investigate.

The LCD module is particularly useful for testing because it actually has 2 i2c devices embedded in it; an LCD controller at address 0x3e, and an RGB controller at 0x62. The two devices operate independently with different electrical properties.

On Hikey+sensors (TXS0108 level shifter), the RGB device will work, but only after pulling the ribbon cable apart to reduce crosstalk due to insufficient pullups. However, the LCD causes the entire bus to lock up, and no further transactions will work.

On Hikey+pca9306 the LCD isn’t detected and the RGB works correctly (undetermined if there are crosstalk issues)
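
For reference, the presence check itself can be reproduced from userspace through i2c-dev; here is a minimal sketch (the bus number is an assumption, and the single-byte read is roughly what i2cdetect's read-probe does) that reports whether each address ACKs:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/i2c-dev.h>

static int acks(int fd, int addr)
{
    unsigned char byte;

    if (ioctl(fd, I2C_SLAVE, addr) < 0)
        return 0;
    /* The read only succeeds if the device ACKs its address. */
    return read(fd, &byte, 1) == 1;
}

int main(void)
{
    int fd = open("/dev/i2c-0", O_RDWR);   /* bus number is a guess */

    if (fd < 0)
        return 1;
    printf("LCD (0x3e): %s\n", acks(fd, 0x3e) ? "ACK" : "no response");
    printf("RGB (0x62): %s\n", acks(fd, 0x62) ? "ACK" : "no response");
    close(fd);
    return 0;
}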

The traces below show both sides of the level shifter. Green and blue on the top for the data line. Orange and purple on the bottom with the clock.

First, what I saw on using Hikey+pca9306+RGB:

RGB transaction via PCA9306

And with the LCD:

LCD transaction via PCA9306

In both traces you can see the start condition (data goes low while clock is high), the 7 bits of address (7 rising clock edges), the R/W bit (1 rising clock), and then the acknowledgement bit driven by the device. If the controller doesn’t see the device drive the data line low on the 9th clock, then it decides the device isn’t there and it terminates the transaction. It is easy to recognize the ack bit because the device has a different drive strength and the voltage level is different.

The RGB controller is a happy little device and it jumps at the chance to drive the data line low. It goes down pretty close to 0V. The LCD on the other hand is sulky and doesn’t drive the line quite as low as the controller can. About to 1V. 1V is recognized fine as logic low on a 5V device, but with 1.8V it is not even less than half. The way the pca9306 level shifter works is there are pull-up resistors on either side of the device that draws each side up to its respective high level. In this case, 1.8V and 5V. When either side gets driven low, the level shifter begins to conduct and the other side also gets drawn down to the same voltage, but it can only go as low as the voltage it is driven to. If it only gets driven down to 1V, then it will never get low enough for a 1.8V controller to recognize it as a low state.

It may be that with weaker pull-ups the LCD will be able to drive to a lower voltage level. I’ll need to experiment more, but in the mean time let’s move onto the Sensors board. Back to the traces:

First, here is a transaction to address 0x63 with no device present:

No device

Looks perfectly normal so far. Next, the RGB device at address 0x62:

RGB

Also behaving the same way as it did with the pca9306. Finally, an LCD transaction:

LCD

Again we see the start condition, the 7 address bits and 1 r/w bit, but the ack bit looks weird. The LCD successfully drives the data line low enough to be recognized, but then something weird happens. The data line stays low and the clock stops running. I don’t actually know what is happening here, but I’ve got my suspicions. The LCD is continuing to drive the data line low (you can tell by the slightly different voltage level), but keeping data low should not stop the clock. I suspect the txs0108 is getting confused and driving the clock line high. I’ve come across reports from others having trouble with the txs010x series on i2c. It has ‘one-shot’ accelerators to reduce rise time by driving the line high. I don’t know for sure though.

On the plus side, I now know that the Hikey I2C busses are working correctly. Now I need to decide what to do next. Aside from the i2c problem, Rev B of the sensors board is ready for manufacturing. I either need to make the txs part work, or rework the design to use a pair of pca9306s. I think I’ll try weaker pull-ups on the pca9306 breakout board first and see how that goes. Sadly, I blew up the i2c drivers on my Hikey board while experimenting today, so I need to do the same experiments with my Dragonboard 410c.

Dear lazyweb, do you have any other suggestions on things to try?

November 03, 2015 12:35 AM

October 28, 2015

Pete Zaitcev: Darcy on the future of storage

Quick comment on the following:

Good morning, madam. What kind of storage system would you like me to build for you today?

Scary thought. That means that selling storage products is going to be hard for all of us. We'll be selling components, both hardware and software, or we'll be selling integration and support services. Somebody will always pay to have somebody else assemble the parts, maybe add some light customization, and support the result. There's a nice living to be made there... but no empires.

Why is it a problem that no empires are to be built? It's only a problem for an empire-builder like I dunno... Sam Altman or something. Darcy is an old engineer, not a startup founder. A good one, too. His kids aren't going to go to bed hungry.

We've been at this dance before with Linux. People have been asking if Red Hat was going to be like Microsoft, and I told everyone: nope. We're transferring the wealth that the proprietary lock-in vendors were collecting back to the users. That was the whole idea. In the process, we're collecting less - a more reasonable amount, necessary to put stuff together and make it run. Therefore, we're not going to be as wealthy off users' backs. But the society as a whole benefits.

So cry me a river. Not scary at all. But RTWT, I think he's drawing a truthful outline overall.

P.S. Another thing, what's magical about storage? Why, I can go build spacecraft when storage goes bust. Or whatever. Of course it's a pity for all the storage-specific techniques and skills that I accumulated, but eh. As long as we leave behind the good code (and docs), it's all good.

October 28, 2015 01:41 AM

October 22, 2015

James Morris: LSM Mailing List Being Archived Again

Several folks noticed that all of the known LSM mailing list archives stopped archiving earlier this year.  We don’t know why and generally have not had any luck contacting the owners of several archives, including marc and gmane.  This is a concern, because the list is generally where Linux kernel security takes place and it’s important to have a public record of it.

The good news is that Paul Moore was finally able to re-register the list with, and there is once again an active archive here:

Please update any links you may have!

October 22, 2015 04:58 AM

Andy Grover: iSNS support coming soon for LIO in Fedora

target-isns recently was added to Rawhide, and will be in a future Fedora release. This add-on to LIO allows it to register with an iSNS server, which potential initiators can then query for available targets. (On Fedora, see isns-utils for both the server, and client query tools.) This removes one of the few remaining areas where other target implementations have been ahead of LIO.

Kudos and thanks to Christophe Vu-Brugier for writing this useful program!

October 22, 2015 12:29 AM

October 21, 2015

Andy Grover: Some targetcli and TCMU questions

Just got an email full of interesting questions, I hope the author will be ok with me answering them here so future searches will see them:

I searched on internet and I don’t find some relevant info about gluster api support via tcmu-runner. Can you tell me please if this support will be added to the stable redhat targetcli in the near future? And I want to know also which targetcli is recommended for setup (targetcli or targetcli-fb) and what is the status for targetcli-3.0.

tcmu-runner is a userspace daemon add-on to LIO that allows requests for a device to be handled by a user process. tcmu-runner has early support for using glfs (via gfapi). Both tcmu-runner and its glfs plugin are beta-quality and will need further work before they are ready for stable Fedora, much less a RHEL release. tcmu-runner just landed in Rawhide, but this is really just to make it easier to test.

RHEL & Fedora use targetcli-fb, which is a fork of targetcli, and what I work on. Since I’m working on both tcmu-runner and targetcli-fb, targetcli-fb will see TCMU support very early.

The -fb packages I maintain switched to a “fbXX” version scheme, so I think you must be referring to the other one :-) I don’t have any info about the RTS/Datera targetcli’s status, other than that nobody likes having two versions; the targetcli maintainer and I have discussed unifying them into a common version, but the un-fun work of merging them has not happened yet.

October 21, 2015 09:58 PM

October 20, 2015

Rusty Russell: ccan/mem’s memeqzero iteration

On Thursday I was writing some code, and I wanted to test if an array was all zero.  First I checked if ccan/mem had anything, in case I missed it, then jumped on IRC to ask the author (and overall CCAN co-maintainer) David Gibson about it.

We bikeshedded around names: memallzero? memiszero? memeqz? memeqzero() won by analogy with the already-extant memeq and memeqstr. Then I asked:

rusty: dwg: now, how much time do I waste optimizing?
dwg: rusty, in the first commit, none

Exactly five minutes later I had it implemented and tested.

The Naive Approach: Times: 1/7/310/37064 Bytes: 50

bool memeqzero(const void *data, size_t length)
{
    const unsigned char *p = data;

    while (length) {
        if (*p)
            return false;
        p++;
        length--;
    }
    return true;
}

As a summary, I’ll give the nanoseconds for searching through 1, 8, 512 and 65536 bytes only.

Another 20 minutes, and I had written that benchmark, and an optimized version.

128-byte Static Buffer: Times: 6/8/48/5872 Bytes: 108

Here’s my first attempt at optimization; using a static array of 128 bytes of zeroes and assuming memcmp is well-optimized for fixed-length comparisons.  Worse for small sizes, much better for big.

 const unsigned char *p = data;
 static unsigned long zeroes[16];

 while (length > sizeof(zeroes)) {
     if (memcmp(zeroes, p, sizeof(zeroes)))
         return false;
     p += sizeof(zeroes);
     length -= sizeof(zeroes);
 }
 return memcmp(zeroes, p, length) == 0;

Using a 64-bit Constant: Times: 12/12/84/6418 Bytes: 169

dwg: but blowing a cacheline (more or less) on zeroes for comparison, which isn’t necessarily a win

Using a single zero uint64_t for comparison is pretty messy:

bool memeqzero(const void *data, size_t length)
{
    const unsigned char *p = data;
    const unsigned long zero = 0;
    size_t pre;

    pre = (size_t)p % sizeof(unsigned long);
    if (pre) {
        size_t n = sizeof(unsigned long) - pre;
        if (n > length)
            n = length;
        if (memcmp(p, &zero, n) != 0)
            return false;
        p += n;
        length -= n;
    }
    while (length > sizeof(zero)) {
        if (*(unsigned long *)p != zero)
            return false;
        p += sizeof(zero);
        length -= sizeof(zero);
    }
    return memcmp(&zero, p, length) == 0;
}

And, worse in every way!

Using a 64-bit Constant With Open-coded Ends: Times: 4/9/68/6444 Bytes: 165

dwg: rusty, what colour is the bikeshed if you have an explicit char * loop for the pre and post?

That’s slightly better, but memcmp still wins over large distances, perhaps due to prefetching or other tricks.

Epiphany #1: We Already Have Zeroes: Times 3/5/92/5801 Bytes: 422

Then I realized that we don’t need a static buffer: we know everything we’ve already tested is zero!  So I open coded the first 16 byte compare, then memcmp()ed against the previous bytes, doubling each time.  Then a final memcmp for the tail.  Clever huh?

But it’s no faster than the static buffer case on the high end, and much bigger.

dwg: rusty, that is brilliant. but being brilliant isn’t enough to make things work, necessarily :p

Epiphany #2: memcmp can overlap: Times 3/5/37/2823 Bytes: 307

My doubling logic above was because my brain wasn’t completely in phase: unlike memcpy, memcmp arguments can happily overlap!  It’s still worth doing an open-coded loop to start (gcc unrolls it here with -O3), but after 16 it’s worth memcmping with the previous 16 bytes.  This is as fast as naive with as little as 2 bytes, and the fastest solution by far with larger numbers:

 const unsigned char *p = data;
 size_t len;

 /* Check first 16 bytes manually */
 for (len = 0; len < 16; len++) {
     if (!length)
         return true;
     if (*p)
         return false;
     p++;
     length--;
 }

 /* Now we know that's zero, memcmp with self. */
 return memcmp(data, p, length) == 0;

You can find the final code in CCAN (or on Github) including the benchmark code.

Finally, after about 4 hours of random yak shaving, it turns out lightning doesn’t even want to use memeqzero() any more!  Hopefully someone else will benefit.

October 20, 2015 12:09 AM

October 09, 2015

Paul E. Mc Kenney: Deep Blue vs. Watson Revisited

Some years back, I speculated on the importance of IBM's Watson. Much has happened since then: Watson won Jeopardy, has been applied to medical applications, and has been made available to numerous business partners to enable them to produce Watson-based offerings. In short, it is long past time for a follow-up.

However, The Economist beat me to the punch in their October 3rd print edition. I doubt that I can improve on their article, so I will confine myself to taking the fair-use liberty of quoting their last sentence:

If it [Watson] can pull that off, a truly disturbing possibility looms: that the next TV show featuring Watson might be “America's Got Talent”.

October 09, 2015 02:31 AM

October 08, 2015

Matthew Garrett: Going my own way

Reaction to Sarah's post about leaving the kernel community was a mixture of terrible and touching, but it's still one of those things that almost certainly won't end up making any kind of significant difference. Linus has made it pretty clear that he's fine with the way he behaves, and nobody's going to depose him. That's unfortunate, because earlier today I was sitting in a presentation at Linuxcon and remembering how much I love the technical side of kernel development. "Remembering" is a deliberate choice of word - it's been increasingly difficult to remember that, because instead I remember having to deal with interminable arguments over the naming of an interface because Linus has an undying hatred of BSD securelevel, or having my name forever associated with the deepthroating of Microsoft because Linus couldn't be bothered asking questions about the reasoning behind a design before trashing it.

In the end it's a mixture of just being tired of dealing with the crap associated with Linux development and realising that by continuing to put up with it I'm tacitly encouraging its continuation, but I can't be bothered any more. And, thanks to the magic of free software, it turns out that I can avoid putting up with the bullshit in the kernel community and get to work on the things I'm interested in doing. So here's a kernel tree with patches that implement a BSD-style securelevel interface. Over time it'll pick up some of the power management code I'm still working on, and we'll see where it goes from there. But, until there's a significant shift in community norms on LKML, I'll only be there when I'm being paid to be there. And that's improved my mood immeasurably.

(Edited to add a context link for the "deepthroating of Microsoft" reference)

October 08, 2015 09:22 AM

James Bottomley: Respect and the Linux Kernel Mailing Lists

I recently noticed that Sarah Sharp resigned publicly from the kernel giving a failure to impose a mandatory code of conduct as the reason and citing interaction problems, mainly on the mailing lists.  The net result of this posting, as all these comments demonstrate, is to imply directly that nothing has ever changed.  This implication is incredibly annoying, firstly because it is actually untrue, secondly because it does more to discourage participation than the behaviour that is being complained about and finally because it totally disrespects and ignores the efforts of hundreds of people who, over the last decade or so, have been striving to improve all interactions around Linux … a rather nice irony given that “respect” is listed as one of the issues for the resignation.  I’d just like to remind everyone of the history of these efforts and what the record shows they’ve achieved.

The issue of respect on the Mailing lists goes way back to the beginnings of Linux itself, but after the foundation of the OSDL (precursor to the Linux Foundation) Technical Advisory Board (TAB), one of its first issues from OSDL member companies was the imbalance between Asian and European/American contributions to the kernel.  The problems were partly to do with Management culture and partly because the lack of respect on the various mailing lists was directly counter to the culture of respect in a lot of Asian countries and disproportionately discouraged contributions from that region.  The TAB largely works behind the scenes, but some aspects of the effort filtered into the public domain as can be seen with a session on developer relations at the 2007 kernel summit (and, in fact, at a lot of other kernel summits since then).  Progress was gradual, and influenced by a large number of people, but the climate did improve.  I have to confess that I don’t follow LKML (not because of the flame war issues, simply because it’s too much of a firehose); however, the lists I do participate in (linux-scsi, linux-ide, linux-mm, linux-fsdevel, linux-efi, linux-arch, linux-parisc) haven’t seen any flagrantly disrespectful and personally insulting posts for several years now.  Indeed, when an individual came along who could almost have been flame bait for this with serial efforts to get incorrect and badly thought out patches into the kernel (I won’t give cites here to avoid stigmatising individuals) they met with a large reserve of patience and respectful and helpful advice before finally being banned from the lists for being incorrigible … no insults or flames at all.

Although I’d love to take credit for some of this, I’ve got to say that I think the biggest influencer towards civility is actually the “professionalisation”  of Linux: Employers pay people to work on Linux but the statements of those people become identified with their employers (no matter how many disclaimers they have) … in many ways, Open Source engineers are the new corporate spokespeople.  All employers bear this in mind when they hire and they certainly look over the mailing lists to see how people behave.  The net result is really that the only people who can afford to be rude or abusive are those who don’t think they have much chance of a long term career in Linux.

So, by and large, I’m proud of the achievements we’ve made in civility and the way we have improved over the years.  Are we perfect? by no means (but then perfection in such a large community isn’t a realistic goal).  However, we have passed our stress test: that an individual with bad patches to several mailing lists was met with courtesy and helpful advice, in spite of serially repeating the behaviour.

In conclusion, I’d just like to note that even the thread that gave rise to Sarah’s desire to pursue a code of conduct is now over two years old and try as they might, no-one’s managed to come up with a more recent example and no-one has actually invoked the voluntary code of conflict, which was the compromise for not having a mandatory code of conduct.  If it were me, I’d actually take that as a sign of success …

October 08, 2015 03:47 AM

October 05, 2015

Pete Zaitcev: Pics Up

On a whim I posted this week's pictures to the Aviabaza (Авиабаза) forums. Anglophones are welcome to the pictures at least.

October 05, 2015 07:31 PM

Davidlohr Bueso: acquire/release semantics in the kernel

With the need for better scaling on increasingly larger multi-core systems, we've continued to extend our CPU barriers in the kernel. Two important variants to prevent CPU reordering for lock-free shared memory synchronization are pairs of load/acquire and store/release barriers; also known as LOCK/UNLOCK barriers. These enable threads to cooperate between each other.

Multiple, yet pretty much equivalent, definitions of acquire/release semantics can be found all over the internet, but I like the version from the infamous 'Documentation/memory-barriers.txt' file for three reasons: (i) it is clear and concise, (ii) it explicitly warns that they are the minimum operations and not to assume anything about reordering of loads and stores before or after the acquire or release, respectively. Finally, (iii) it strongly mentions the need for pairing and thus portability:
 (5) ACQUIRE operations.

     This acts as a one-way permeable barrier.  It guarantees that all memory operations after the ACQUIRE operation will appear to happen after the ACQUIRE operation with respect to the other components of the system. ACQUIRE operations include LOCK operations and smp_load_acquire() operations.

     Memory operations that occur before an ACQUIRE operation may appear to happen after it completes.

     An ACQUIRE operation should almost always be paired with a RELEASE operation.

 (6) RELEASE operations.

     This also acts as a one-way permeable barrier.  It guarantees that all memory operations before the RELEASE operation will appear to happen before the RELEASE operation with respect to the other components of the system. RELEASE operations include UNLOCK operations and smp_store_release() operations.

     Memory operations that occur after a RELEASE operation may appear to happen before it completes.

     The use of ACQUIRE and RELEASE operations generally precludes the need for other sorts of memory barrier (but note the exceptions mentioned in the subsection "MMIO write barrier").  In addition, a RELEASE+ACQUIRE pair is -not- guaranteed to act as a full memory barrier.  However, after an ACQUIRE on a given variable, all memory accesses preceding any prior RELEASE on that same variable are guaranteed to be visible.  In other words, within a given variable's critical section, all accesses of all previous critical sections for that variable are guaranteed to have completed.

     This means that ACQUIRE acts as a minimal "acquire" operation and RELEASE acts as a minimal "release" operation.
Thread B's ACQUIRE pairs with Thread A's RELEASE. Copyright (C) IBM.

In lock-speak, all this means is that nothing leaks from the critical region that is protected by the primitive in question. A thread attempting to take a lock will pair its load (ACQUIRE), for instance via an Rmw (cmpxchg), with the last store (RELEASE) done by another thread that is concurrently releasing the lock (for example, setting the counter to 0).

For v4.2, Will Deacon introduced more relaxed extensions of traditional atomic operations (including Rmw) which allow finer-grained control over what used to be full-barrier semantics on both sides of the instruction. This is also true for just about all atomic functions that return a value to the caller, ie: atomic_*_return(). As such, weakly ordered architectures can make use of these -- currently only arm64 makes use of them, but efforts for PPC are being made.
      - *_relaxed: No ordering guarantees. This is similar to what we have already for the non-return atomics (e.g. atomic_add).
      - *_acquire: ACQUIRE semantics, similar to smp_load_acquire.
      - *_release: RELEASE semantics, similar to smp_store_release.
So we now have goodies such as atomic_cmpxchg_acquire() or atomic_add_return_relaxed(). Most recently, aiming for v4.4, I've ported all our locks to make use of these optimizations, which can save almost half the barriers in the kernel's locking code -- which is especially nice under low or regular contention scenarios, where the fastpaths are exercised. There are plenty of other examples of real-world code making use of acquire/release semantics, mostly via smp_load_acquire()/smp_store_release(); other primitives also use these semantics for common building blocks (as esoteric as they can get, ie RCU).
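
As a minimal illustration of the pairing (kernel-style C, my own sketch rather than code from any of the patches mentioned above), a producer publishes a payload with smp_store_release() and a consumer observes it with smp_load_acquire():

/* Kernel-style sketch; assumes the usual <linux/atomic.h>/barrier definitions. */
struct message {
    int payload;
    int ready;
};

static struct message msg;

/* Producer: make the payload visible before publishing the flag. */
static void publish(int value)
{
    msg.payload = value;
    /* RELEASE: everything above happens before ready reads as 1. */
    smp_store_release(&msg.ready, 1);
}

/* Consumer: only look at the payload after acquiring the flag. */
static int consume(int *out)
{
    /* ACQUIRE: pairs with the smp_store_release() above. */
    if (!smp_load_acquire(&msg.ready))
        return 0;
    *out = msg.payload;     /* guaranteed to see the published value */
    return 1;
}

The same pairing is what the _acquire/_release atomic variants buy the locking fastpaths: the ACQUIRE on the lock word pairs with the RELEASE that freed it, without full barriers on either side.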

October 05, 2015 06:54 AM

September 24, 2015

Eric Sandeen: No, XFS won’t steal your money

So, the Inquirer runs a story by Chris Merriman today, titled “GreenDispenser malware threatens to take all your dosh from Linux ATMs” which includes this breathless little gem:

GreenDispenser targets the XFS file system, a popular standard for ATMs, originally designed for IRIX but now widely used in Linux. ATMs that use Windows XP Embedded, which is still supported, are not thought to be at risk.

Of course, I found this interesting, and a bit odd.  Could the XFS filesystem possibly be at fault here?  And is the "large and lots" filesystem really used in ATMs?  Let's see what Proofpoint, the security firm who discovered it, has to say about the subject:

Specifically, GreenDispenser like its predecessors interacts with the XFS middleware [4], which is widely adopted by various ATM vendors.

That handy link & footnote leads us to Wikipedia, which explains that “XFS middleware” refers to CEN/XFS, which is not in any way related to the XFS filesystem, or Linux, and is in fact Microsoft specific:

CEN/XFS or XFS (eXtensions for Financial Services) provides a client-server architecture for financial applications on the Microsoft Windows platform.

Nice job, Inquirer!  Nice job, Chris Merriman!

(As Jeff points out in the comments, The Inquirer has updated the article as of Sep 25, removing references to Linux and the XFS filesystem.)

September 24, 2015 06:49 PM

Matthew Garrett: Filling in the holes in Linux boot chain measurement, and the TPM measurement log

When I wrote about TPM attestation via 2FA, I mentioned that you needed a bootloader that actually performed measurement. I've now written some patches for Shim and Grub that do so.

The Shim code does a couple of things. The obvious one is to measure the second-stage bootloader into PCR 9. The perhaps less expected one is to measure the contents of the MokList and MokSBState UEFI variables into PCR 14. This means that if you're happy simply running a system with your own set of signing keys and just want to ensure that your secure boot configuration hasn't been compromised, you can simply seal to PCR 7 (which will contain the UEFI Secure Boot state as defined by the UEFI spec) and PCR 14 (which will contain the additional state used by Shim) and ignore all the others.

The grub code is a little more complicated because there's more ways to get it to execute code. Right now I've gone for a fairly extreme implementation. On BIOS systems, the grub stage 1 and 2 will be measured into PCR 9[1]. That's the only BIOS-specific part of things. From then on, any grub modules that are loaded will also be measured into PCR 9. The full kernel image will be measured into PCR 10, and the full initramfs will be measured into PCR 11. The command line passed to the kernel is in PCR 12. Finally, each command executed by grub (including those in the config file) is measured into PCR 13.

That's quite a lot of measurement, and there are probably fairly reasonable circumstances under which you won't want to pay attention to all of those PCRs. But you've probably also noticed that several different things may be measured into the same PCR, and that makes it more difficult to figure out what's going on. Thankfully, the spec designers have a solution to this in the form of the TPM measurement log.

Rather than merely extending a PCR with a new hash, software can extend the measurement log at the same time. This is stored outside the TPM and so isn't directly cryptographically protected. In the simplest form, it contains a hash and some form of description of the event associated with that hash. If you replay those hashes you should end up with the same value that's in the TPM, so for attestation purposes you can perform that verification and then merely check that specific log values you care about are correct. This makes it possible to have a system perform an attestation to a remote server that contains a full list of the grub commands that it ran and for that server to make its attestation decision based on a subset of those.
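
A minimal sketch of that replay step (my own illustration, assuming a SHA-1 PCR bank and OpenSSL's SHA1(); real event logs carry more structure than a bare list of digests):

#include <string.h>
#include <openssl/sha.h>

#define PCR_LEN SHA_DIGEST_LENGTH        /* 20 bytes for a SHA-1 PCR */

/* PCR extend: new = SHA1(old || measurement) */
static void pcr_extend(unsigned char pcr[PCR_LEN],
                       const unsigned char digest[PCR_LEN])
{
    unsigned char buf[2 * PCR_LEN];

    memcpy(buf, pcr, PCR_LEN);
    memcpy(buf + PCR_LEN, digest, PCR_LEN);
    SHA1(buf, sizeof(buf), pcr);
}

/* Replay every logged digest for one PCR and compare with the quoted value. */
static int verify_pcr(const unsigned char (*digests)[PCR_LEN], size_t n,
                      const unsigned char expected[PCR_LEN])
{
    unsigned char pcr[PCR_LEN] = { 0 };  /* PCRs start out all-zero */
    size_t i;

    for (i = 0; i < n; i++)
        pcr_extend(pcr, digests[i]);
    return memcmp(pcr, expected, PCR_LEN) == 0;
}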

No promises as yet about PCR allocation being final or these patches ever going anywhere in their current form, but it seems reasonable to get them out there so people can play. Let me know if you end up using them!

[1] The code for this is derived from the old Trusted Grub patchset, by way of Sirrix AG's Trusted Grub 2 tree.

September 24, 2015 01:21 AM

September 20, 2015

Matthew Garrett: The Internet of Incompatible Things

I have an Amazon Echo. I also have a LIFX Smart Bulb. The Echo can integrate with Philips Hue devices, letting you control your lights by voice. It has no integration with LIFX. Worse, the Echo developer program is fairly limited - while the device's built in code supports communicating with devices on your local network, the third party developer interface only allows you to make calls to remote sites[1]. It seemed like I was going to have to put up with either controlling my bedroom light by phone or actually getting out of bed to hit the switch.

Then I found this article describing the implementation of a bridge between the Echo and Belkin Wemo switches, cunningly called Fauxmo. The Echo already supports controlling Wemo switches, and the code in question simply implements enough of the Wemo API to convince the Echo that there's a bunch of Wemo switches on your network. When the Echo sends a command to them asking them to turn on or off, the code executes an arbitrary callback that integrates with whatever API you want.
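
To give a flavour of the trick, here is a stripped-down sketch (in Python) of the discovery half: join the SSDP multicast group and answer the Echo's M-SEARCH probes as if a Belkin Wemo device lived at the advertised LOCATION. The address, port, and header values here are my assumptions, and the follow-up setup.xml/SOAP handling is omitted; the actual Fauxmo code is the reference for what the Echo really expects.

    import socket
    import struct

    SSDP_ADDR, SSDP_PORT = "239.255.255.250", 1900
    SETUP_URL = "http://192.168.1.50:12340/setup.xml"  # hypothetical bridge address

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", SSDP_PORT))
    mreq = struct.pack("4sl", socket.inet_aton(SSDP_ADDR), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    while True:
        data, addr = sock.recvfrom(1024)
        # The Echo searches for Belkin devices; claim to be one.
        if b"M-SEARCH" in data and b"urn:Belkin:device:" in data:
            reply = ("HTTP/1.1 200 OK\r\n"
                     "CACHE-CONTROL: max-age=86400\r\n"
                     "ST: urn:Belkin:device:**\r\n"
                     "USN: uuid:Socket-1_0-fauxmo-1::urn:Belkin:device:**\r\n"
                     "LOCATION: %s\r\n\r\n" % SETUP_URL)
            sock.sendto(reply.encode("ascii"), addr)

The on/off callbacks then hang off the small HTTP server that serves setup.xml, which is where a call into something like Lazylights can live.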

This seemed like a good starting point. There's a free implementation of the LIFX bulb API called Lazylights, and with a quick bit of hacking I could use the Echo to turn my bulb on or off. But the Echo's Hue support also allows dimming of lights, and that seemed like a nice feature to have. Tcpdump showed that asking the Echo to look for Hue devices resulted in similar UPnP discovery requests to it looking for Wemo devices, so extending the Fauxmo code seemed plausible. I signed up for the Philips developer program and then discovered that the terms and conditions explicitly forbade using any information on their site to implement any kind of Hue-compatible endpoint. So that was out. Thankfully enough people have written their own Hue code at various points that I could figure out enough of the protocol by searching Github instead, and now I have a branch of Fauxmo that supports searching for LIFX bulbs and presenting them as Hues[2].

Running this on a machine on my local network is enough to keep the Echo happy, and I can now dim my bedroom light in addition to turning it on or off. But it demonstrates a somewhat awkward situation. Right now vendors have no real incentive to offer any kind of compatibility with each other. Instead they're all trying to define their own ecosystems with their own incompatible protocols with the aim of forcing users to continue buying from them. Worse, they attempt to restrict developers from implementing any kind of compatibility layers. The inevitable outcome is going to be either stacks of discarded devices speaking abandoned protocols or a cottage industry of developers writing bridge code and trying to avoid DMCA takedowns.

The dystopian future we're heading towards isn't Gibsonian giant megacorporations engaging in physical warfare, it's one where buying a new toaster means replacing all your lightbulbs or discovering that the code making your home alarm system work is now considered a copyright infringement. Is there a market where I can invest in IP lawyers?

[1] It also requires an additional phrase at the beginning of a request to indicate which third party app you want your query to go to, so it's much more clumsy to make those requests compared to using a built-in app.
[2] I only have one bulb, so as yet I haven't added any support for groups.

September 20, 2015 09:22 PM

September 18, 2015

Daniel Vetter: XDC 2015: Atomic Modesetting for Drivers

I gave a talk at XDC 2015 about atomic modesetting with a focus on driver writers. Most of the talk is an overview of how an atomic modeset looks and how to implement the different parts in a driver backend. Anyway, for all those who missed it, there are video and slides.

September 18, 2015 03:27 PM

September 11, 2015

Pete Zaitcev: TLS Security In Firefox 40

What do people at Mozilla think is going to happen when I need to access a website and Firefox says that TLS parameters are insecure and thus I cannot? I'm going to use Chrome, that's what. Or maybe even a hacked Midori, where I can adjust build-time parameters of gcr.

That company went way downhill when they kicked Eich out.

September 11, 2015 06:33 PM

September 07, 2015

Daniel Vetter: Neat drm/i915 stuff for 4.3

Kernel 4.2 is already released and the 4.3 merge window is in full swing, so it's time to look at what's in it for the intel graphics driver.

Biggest thing for sure is that Skylake is finally out of preliminary support and enabled by default. The reason for the long hold-up was some ABI fumble - the hardware exposes the topmost plane both through the new universal plane registers and the legacy cursor registers, and because we simply carried the legacy plane code around in the driver we ended up exposing both. This wasn't something big to take care of, but somehow it dragged on forever.

The other big thing is that legacy modesets are now done with the new atomic modesetting code driver-internally. Atomic support in i915.ko isn't fully ready for prime time yet, but this is definitely a big step forward. Besides atomic there are also other cross-platform improvements in the modeset code: Ville fixed up the 12bpc support for HDMI, which is now used by default if the screen supports it. Mika Kahola and Ville also implemented dynamic adjustment of the cdclk, which is the main clock source for display engines on intel graphics. And there's a big difference in the clock speeds needed between e.g. a 4k screen and a 720p TV.

Continuing with power saving features, Rodrigo again spent a lot of time fixing up PSR (panel self refresh). And Paulo did the same by writing patches to improve FBC (framebuffer compression). We have some really solid testcases by now; unfortunately neither feature is ready for enabling by default yet. Especially PSR is still plagued by screen freezes on some random systems. Also there have been some fixes to DRRS (dynamic refresh rate switching) from Ramalingam. DRRS is enabled by default already, where supported. And finally there are some improvements to make the frontbuffer rendering tracking more accurate, which is used by all three of these display power saving features.

And of course there are also tons of improvements to platform code. Display PLL code for Skylake and Valleyview & Cherryview was tuned by Damien and Ville respectively. There's been tons of work on Broxton and DSI support by Imre, Gaurav and others.

Moving on to the rendering side, the big change is how tracking of rendering tasks is handled. In the past the driver just used raw sequence numbers emitted by the hardware, but for cross-driver synchronization and reordering tasks with an eventual gpu scheduler more abstraction is needed. A big step is converting over to the i915 request structure completely, done by John Harrison. The next step will be to switch the internal implementation for i915 requests to the cross-driver fences, but that's for future kernels. As a follow-up cleanup John also removed the OLR, which stands for outstanding lazy request. It was a neat little trick implemented years ago to simplify handling error recovery, but it caused tons of pain with subtle bugs. Making requests more explicit in the driver finally allowed us to remove this trick.

There's also been a pile of platform related features: MOCS programming for Skylake/Broxton (which is used for caching control). Resource streamer support from Abdiel, which is used to offload some of the buffer object tracking for shaders from the cpu to the gpu. And the command parser on Haswell was extended to support atomic instructions in shaders. And finally for Skylake Mika Kuoppala added code to avoid resetting the gpu - in certain cases the hardware would hard-hang the entire system trying to execute the reset. And a dead gpu is still better than a dead system.

September 07, 2015 09:40 AM

September 04, 2015

Andy Grover: RHEL 7.2 has an updated kernel target

As mentioned in the beta release notes, the kernel in RHEL 7.2 contains a rebased LIO kernel target, to the equivalent of the Linux 4.0.stable series.

This is a big update. LIO has improved greatly since 3.10. It has added support for the SCSI features that enable VMware VAAI, as well as data integrity (DIF), and significant iSER work, for those of you using Infiniband. (SRP is also supported, as well as iSCSI and FCoE, of course.)

Note that we still do not ship support for the Fibre Channel qla2xxx fabric. It still seems to be something storage vendors and integrators want, more than a feature our customers are telling us they want in RHEL.

(On a side note, Infiniband hardware is pretty affordable these days! For all you datacenter hobbyists who have a rack in the garage, I might suggest a cheap previous-gen IB setup and either SRP or iSER as the way to go and still get really high IOPs.)

Users of RHEL 7’s SCSI target should find RHEL 7.2 to be a very nice upgrade. Please try the beta out and report any issues you find of course, but it’s looking really good so far.

September 04, 2015 09:50 PM

Pavel Machek: Wifi fun and misc..

(And an apology for the SSD entry some time back. Apparently yes, they can fail to retain data after less than a week... at the very end of their lifetime.)

In the last few weeks I learned that transferring real-time data over WiFi is way more fun than I thought. And that it is possible to communicate from inside a (closed) microwave oven using 2.4GHz WiFi. I don't know about you, but it scares me a little.

N900 and not everything is a file

Pocket computers. We had pocket computers before... the Sharp Zaurus line was a prominent example. They had keyboards and resistive touchscreens... A resistive touchscreen with a stylus is accurate enough to serve as a mouse replacement. Unfortunately, such machines are slowly going extinct. Sure, we have quad-core Full-HD smartphones these days... but they lack keyboards, making ssh from them impossible, they lack an accurate pointing device, and they are really phones, not small computers. The N900 can almost be used as a pocket computer...

New Mer is "broken beyond repair" for the N900, as it uses qt5. qt4 works well (well... a little slow) on the N900, but qt5 needs stable egl drivers. Ok, so that was another nice-looking trap. I'm starting to think that a text-only user interface is the right thing to do on the N900 at this point.

Baking the N900 for 15 minutes at 250C seems to have fixed the "no sim card" problem... for a week. It now seems a bit flaky, but definitely better than before baking. Thanks to everyone at the Czech BrmLab!

To back up the mmc card on the N900, I'd like to rsync root@maemo:/dev/mmcblk1 mmcblk1.img ... but that does not work, as rsync is too clever and refuses to transfer the content of special files. Is there a trick I'm missing?

On the n900 front... it has 256MiB RAM and 800x480 screen. What web browser would you recommend for that? I tried links2, but its support is not good enough for properly working pages... which I'd kind of like.

Linus, please reconsider -rc0

Hmm. There's a big difference between 4.1 (expected to be a pretty stable kernel) and 4.2-rc0 (which is probably going to be as unstable as it gets). Unfortunately, Linus does not change the Makefile before merging, so it is quite tricky to tell whether
Linux amd 4.1.0 #25 SMP Wed Jul 1 11:20:22 CEST 2015 x86_64 GNU/Linux
is the expected-to-be-stable 4.1, or the expected-to-be-very-unstable 4.2-rc0...

It's tempting to name your branches simply "v4.1", "v3.11". Don't. When the -rc's are done, Linus will create the "v4.1" tag, and you'll have fun figuring out what went wrong in your git.

Google play bloatware

I got a very cheap LG Optimus Chic... and Android did improve since the G1 days. It's still Google's spying empire, but... at least it is fluid and mostly works.
Not sure what "Google Play services" are good for, but taking 50MB of internal flash is not funny... and when moved to the SD card, the SD card tends to disconnect. "Google Play Store" still works without them. "My Tracks" needs them, but 60MB of flash is not a reasonable price to pay for GPX recording. "Pubtran" got removed, too. MHDdroid has a strange interface, but perhaps it will not need that much storage.
Do you know a way to search czech public transport without Android and without desktop browser or Opera Mini? leads to "full" version.

And... dear Android, the "force close" dialog is the last thing I want to see after hearing a ringtone. If you could at least add the number to the call log...

Feeling cheated

Wed Jul  1 01:59:58 CEST 2015
Wed Jul  1 01:59:59 CEST 2015
Wed Jul  1 02:00:00 CEST 2015
Wed Jul  1 02:00:01 CEST 2015
Wed Jul  1 02:00:02 CEST 2015
Wed Jul  1 02:00:03 CEST 2015
Different power supply for X60

The Thinkpad X60 is marked as 20V, 3.25A. I wonder if using a 19V, 2.63A power supply is a good idea. The power brick is way smaller, and 65W seems to be a little high for a small notebook (the smaller brick tops out at about 50W).

September 04, 2015 10:04 AM

September 03, 2015

Gustavo F. Padovan: Linux Kernel Engineer opportunity at Collabora!

Collabora is a software consultancy specialising in bringing companies and the open source software community together, and it is currently looking for a Core Software Engineer to work on the Linux kernel and/or all the plumbing around the kernel. In this role the engineer will be part of a worldwide team that works with our clients to solve their Linux kernel and low-level stack technical problems.

Collabora is well-known for its strong relationship with upstream development, so an important part of this role is making significant contributions to upstream projects.

Visit our jobs page or talk to me and I'll put you in contact with our Hiring Team!

September 03, 2015 08:44 PM

Paul E. Mc Kenney: Stupid RCU Tricks: Hand-over-hand traversal of linked list using SRCU

Suppose that a very long linked list was to be protected with SRCU. Let's also make the presumably unreasonable assumption that this list is so long that we don't want to stay in a single SRCU read-side critical section for the whole traversal.

So why not try hand-over-hand SRCU protection, as shown in the following code fragment?

struct foo {
  struct list_head list;
  ...
};

LIST_HEAD(mylist);
struct srcu_struct mysrcu;

void process(void)
{
  int i1, i2;
  struct foo *p;

  i1 = srcu_read_lock(&mysrcu);
  list_for_each_entry_rcu(p, &mylist, list) {
    do_something_with(p);
    i2 = srcu_read_lock(&mysrcu);
    srcu_read_unlock(&mysrcu, i1);
    i1 = i2;
  }
  srcu_read_unlock(&mysrcu, i1);
}

The trick is that on each pass through the loop, we enter a new SRCU read-side critical section, then exit the old one. That way the entire traversal is protected by SRCU, but each SRCU read-side critical section is quite short, covering traversal of but a single element of the list.

As is customary with SRCU, the list is manipulated using list_add_rcu(), list_del_rcu(), and friends.

What are the advantages and disadvantages of this hand-over-hand SRCU list traversal?

September 03, 2015 05:20 AM

August 31, 2015

Matthew Garrett: Working with the kernel keyring

The Linux kernel keyring is effectively a mechanism to allow shoving blobs of data into the kernel and then setting access controls on them. It's convenient for a couple of reasons: the first is that these blobs are available to the kernel itself (so it can use them for things like NFSv4 authentication or module signing keys), and the second is that once they're locked down there's no way for even root to modify them.

But there's a corner case that can be somewhat confusing here, and it's one that I managed to crash into multiple times when I was implementing some code that works with this. Keys can be "possessed" by a process, and have permissions that are granted to the possessor orthogonally to any permissions granted to the user or group that owns the key. This is important because it allows for the creation of keyrings that are only visible to specific processes - if my userspace keyring manager is using the kernel keyring as a backing store for decrypted material, I don't want any arbitrary process running as me to be able to obtain those keys[1]. As described in keyrings(7), keyrings exist at the session, process and thread levels of granularity.

This is absolutely fine in the normal case, but gets confusing when you start using sudo. sudo by default doesn't create a new login session - when you're working with sudo, you're still working with key possession that's tied to the original user. This makes sense when you consider that you often want applications you run with sudo to have access to the keys that you own, but it becomes a pain when you're trying to work with keys that need to be accessible to a user no matter whether that user owns the login session or not.

I spent a while talking to David Howells about this and he explained the easiest way to handle this. If you do something like the following:
$ sudo keyctl add user testkey testdata @u
a new key will be created and added to UID 0's user keyring (indicated by @u). This is possible because the keyring defaults to 0x3f3f0000 permissions, giving both the possessor and the user read/write access to the keyring. But if you then try to do something like:
$ sudo keyctl setperm 678913344 0x3f3f0000
where 678913344 is the ID of the key we created in the previous command, you'll get permission denied. This is because the default permissions on a key are 0x3f010000, meaning that the possessor has permission to do anything to the key but the user only has permission to view its attributes. The cause of this confusion is that although we have permission to write to UID 0's keyring (because the permissions are 0x3f3f0000), we don't possess it - the only permissions we have for this key are the user ones, and the default state for user permissions on new keys only gives us permission to view the attributes, not change them.
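
To make those masks a little less opaque, here is a small decoder written against the bit layout documented in keyrings(7); consider it an illustrative sketch rather than a reimplementation of keyctl.

    # Key permission masks are four bytes: possessor, user, group, other,
    # each holding view/read/write/search/link/setattr bits.
    BITS = [(0x01, "view"), (0x02, "read"), (0x04, "write"),
            (0x08, "search"), (0x10, "link"), (0x20, "setattr")]
    FIELDS = [("possessor", 24), ("user", 16), ("group", 8), ("other", 0)]

    def decode_keyperm(mask):
        parts = []
        for who, shift in FIELDS:
            field = (mask >> shift) & 0xff
            names = [name for bit, name in BITS if field & bit] or ["none"]
            parts.append("%s=%s" % (who, ",".join(names)))
        return " ".join(parts)

    print(decode_keyperm(0x3f3f0000))  # possessor and user can do everything
    print(decode_keyperm(0x3f010000))  # possessor everything, user only view

Run against the two values above, it shows exactly the asymmetry described: the keyring grants the user full access, while a freshly created key only lets the user view its attributes.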

But! There's a way around this. If we instead do:
$ sudo keyctl add user testkey testdata @s
then the key is added to the current session keyring (@s). Because the session keyring belongs to us, we possess any keys within it and so we have permission to modify the permissions further. We can then do:
$ sudo keyctl setperm 678913344 0x3f3f0000
and it works. Hurrah! Except that if we log in as root, we'll be part of another session and won't be able to see that key. Boo. So, after setting the permissions, we should:
$ sudo keyctl link 678913344 @u
which ties it to UID 0's user keyring. Someone who logs in as root will then be able to see the key, as will any processes running as root via sudo. But we probably also want to remove it from the unprivileged user's session keyring, because that's readable/writable by the unprivileged user - they'd be able to revoke the key from underneath us!
$ sudo keyctl unlink 678913344 @s
will achieve this, and now the key is configured appropriately - UID 0 can read, modify and delete the key, other users can't.

This is part of our ongoing work at CoreOS to make rkt more secure. Moving the signing keys into the kernel is the first step towards rkt no longer having to trust the local writable filesystem[2]. Once keys have been enrolled the keyring can be locked down - rkt will then refuse to run any images unless they're signed with one of these keys, and even root will be unable to alter them.

[1] (obviously it should also be impossible to ptrace() my userspace keyring manager)
[2] Part of our Secure Boot work has been the integration of dm-verity into CoreOS. Once deployed this will mean that the /usr partition is cryptographically verified by the kernel at runtime, making it impossible for anybody to modify it underneath the kernel. / remains writable in order to permit local configuration and to act as a data store, and right now rkt stores its trusted keys there.

August 31, 2015 05:18 PM

August 26, 2015

James Morris: Linux Security Summit 2015 – Wrapup, slides

The slides for all of the presentations at last week’s Linux Security Summit are now available at the schedule page.

Thanks to all of those who participated, and to all the events folk at Linux Foundation, who handle the logistics for us each year, so we can focus on the event itself.

As with the previous year, we followed a two-day format, with most of the refereed presentations on the first day, with more of a developer focus on the second day.  We had good attendance, and also this year had participants from a wider field than the more typical kernel security developer group.  We hope to continue expanding the scope of participation next year, as it’s a good opportunity for people from different areas of security, and FOSS, to get together and learn from each other.  This was the first year, for example, that we had a presentation on Incident Response, thanks to Sean Gillespie who presented on GRR, a live remote forensics tool initially developed at Google.

The keynote by sysadmin Konstantin Ryabitsev was another highlight, one of the best talks I’ve seen at any conference.

Overall, it seems the adoption of Linux kernel security features is increasing rapidly, especially via mobile devices and IoT, where we now have billions of Linux deployments out there, connected to everything else.  It’s interesting to see SELinux increasingly play a role here, on the Android platform, in protecting user privacy, as highlighted in Jeffrey Vander Stoep’s presentation on whitelisting ioctls.  Apparently, some major corporate app vendors, who were not named, have been secretly tracking users via hardware MAC addresses, obtained via ioctl.

We’re also seeing a lot of deployment activity around platform Integrity, including TPMs, secure boot and other integrity management schemes.  It’s gratifying to see the work our community has been doing in the kernel security/ tree being used in so many different ways to help solve large scale security and privacy problems.  Many of us have been working for 10 years or more on our various projects  — it seems to take about that long for a major security feature to mature.

One area, though, where I feel we need significantly more work is kernel self-protection: hardening the kernel so that coding flaws are harder to exploit.  I’m hoping that we can find ways to work with the security research community on incorporating more hardening into the mainline kernel.  I’ve proposed this as a topic for the upcoming Kernel Summit, as we need buy-in from core kernel developers.  I hope we’ll have topics to cover on this, then, at next year’s LSS.

We overlapped with Linux Plumbers, so LWN was not able to provide any coverage of the summit.  Paul Moore, however, has published an excellent write-up on his blog. Thanks, Paul!

The committee would appreciate feedback on the event, so we can make it even better for next year.  We may be contacted via email per the contact info at the bottom of the event page.

August 26, 2015 07:09 PM

August 24, 2015

Davidlohr Bueso: LPC 2015: Performance and Scalability MC

This year I had the privilege of leading the Performance and Scalability micro-conference for Linux Plumbers. The goals and motivation behind organizing this track were threefold. First, present relevant work-in-progress ideas that can improve performance in core kernel subsystems and need some face-to-face discussion -- as such, this requires previous debate on lkml. Similarly, learn about real bottlenecks and issues people are running into. And finally, get to know more relevant academic (experimental) work going on in both the kernel and system-level userland. As such, the sessions were grouped as follows:

(i) Fast Bounded-Concurrency Hash Tables. Samy Bahra introduced a novel non-blocking multi-reader/single-writer hash table with strong forward progress guarantees for TSO. Because the common-case fastpath does not incur barriers or atomic operations, this technique allows nearly perfect scaling. While his work is done in userspace, he sees potential for it in the kernel, such as the networking subsystem. In such situations, the use of RCU (readers being the common case) might also be used.

(ii) Improving Transactional Memory Performance with Queued Locking. While transactional memory works nicely in conflict-free setups, it ends up requiring common serialization otherwise. An option is to retry; however, when the number of threads executing in the critical region is larger than the number of completed threads, you can get pileups. Tim Chen presented a solution based on applying a sort of 'aperture' and using principles based on MCS for fair queuing, which can be regulated based on metrics such as the number of threads in the critical region and the abort rate.

(iii) How to Apply Mutation Testing to RCU. Iftekhar Ahmed from OSU summarized his research in overcoming the limitations of mutation testing to identify problems in RCU. As usual, working with Paul McKenney, they have been able to identify a number of mutants, along with making use of rcutorture for specific periods of time. They generated ~3300 mutants from RCU, and rcutorture is doing a good job identifying them. It would be interesting to see this applied along with fuzz testing, which has already uncovered several bugs in RCU in the past.

Scaling track -- LPC'15, Seattle.

(iv) Unfair Queued Spinlocks and Transactional Locks. Waiman Long has been working on extending spinlocks and applying them to solve issues with transactional memory. He presented experiments based on rwlocks and a transactional spinlock (a new primitive) for transactional (reader) and non-transactional (writer) executions. This talk nicely complemented Tim Chen's previous presentation. He also touched on qspinlock performance in virtualized environments and the challenges currently out there. As we already have code for this, it was much easier to discuss face to face. Consensus in the room was that kernel developers are not against improving pv spinlocks, but it was made clear that we will not accept a third primitive.

(v) Do Virtual Machines Really Scale? Sanidhya Kashyap from GA Tech showed us the state of scalability in the cloud, where there is a clear trend of services hitting poor scalability after a certain degree of contention/core count. These are LHP (lock holder preemption) issues, and vmexits/enters cause performance problems at high vcpu counts. He introduced oticket, backed by performing multiple wakeups at once when granting the lock. There was good feedback, and suggestions to overcome some of the presented issues with the approach. This was an extra, short, BoF-like presentation, but there was quite a bit of interest, and the appropriate people were in the room.

Overall I would say that all three objectives were met and the quality of the sessions was high, thus meeting all expectations (if not, please email me with feedback ;-). In fact, there were some highly interesting and relevant presentations that, due to time constraints, had to be left out.

August 24, 2015 09:05 PM

August 19, 2015

Matt Domsch: Dell Desktop / Notebook Linux Engineering position available

Come help Dell ensure Linux “just works!” on Dell notebooks, desktops, and devices! The Dell Client Linux Engineering team has an opening for a Senior Software Engineer. This team works closely with the Linux community, device manufacturers, and Dell engineering teams to provide the best Linux experience across the entire client product line.

Visit the Dell Jobs site to apply. If you’re a friend of mine and are interested, drop me a line and I’ll make sure you get in front of the hiring manager quickly!

August 19, 2015 09:31 PM

LPC 2015: Bird-of-a-Feather Sessions

We have a great slate of bird-of-a-feather (BoF) sessions on Thursday evening! However, there are still a few BoF slots left, so proposals are still welcome here. First come, first served!

August 19, 2015 09:22 PM

August 18, 2015

Matthew Garrett: Canonical's deliberately obfuscated IP policy

I bumped into Mark Shuttleworth today at Linuxcon and we had a brief conversation about Canonical's IP policy. The short summary:

The even shorter summary: Canonical won't clarify their IP policy because they believe they can make more money if they don't.

Why do I keep talking about this? Because Canonical are deliberately making it difficult to create derivative works, and that's one of the core tenets of the definition of free software. Their IP policy is fundamentally incompatible with our community norms, and that's something we should care about rather than ignoring.

August 18, 2015 07:02 PM

August 17, 2015

Andi Kleen: Announcing simple-pt — A simple Processor Trace implementation

Modern Intel Core CPUs (5th and 6th generation) have an Intel Processor Trace (PT) feature to trace branch execution with low overhead. This is useful for performance analysis and debugging.

simple-pt is a simple standalone driver and decoder tool to implement PT on Linux.

Starting with Linux 4.1, the kernel already has an integrated PT implementation in perf. simple-pt is an alternative implementation. It has many disadvantages compared to the perf PT implementation, such as:
- needs to run as root
- no long term tracing or sampling with interrupts
- no support for interactive debugging (use gdb 7.10 on perf for that)
- no support for histograms
- somewhat experimental
- not as well supported as perf

On the positive side simple-pt is:
- simple
- standalone. No kernel changes needed. Could be ported to older kernels or other operating systems
- easy to modify and experiment with
- a more ftrace-like decoding tool
- support for kprobes based triggers
- modular “unix style” design with simple tools that do only one thing each
- BSD licensed

Example output:

        % sptcmd  -c tcall taskset -c 0 ./tcall
        cpu   0 offset 1027688,  1003 KB, writing to ptout.0
        Wrote sideband to ptout.sideband
        % sptdecode --sideband ptout.sideband --pt ptout.0 | less
        frequency 32
        0        [+0]     [+   1] _dl_aux_init+436
                          [+   6] __libc_start_main+455 -> _dl_discover_osversion
                          [+  13] __libc_start_main+446 -> main
                          [+   9]     main+22 -> f1
                          [+   4]             f1+9 -> f2
                          [+   2]             f1+19 -> f2
                          [+   5]     main+22 -> f1
                          [+   4]             f1+9 -> f2
                          [+   2]             f1+19 -> f2
                          [+   5]     main+22 -> f1

Available from

August 17, 2015 04:27 AM

August 16, 2015

Daniel Vetter: Atomic Modesetting Design Overview

After a few years of development the atomic display update IOCTL for drm drivers is finally ready for prime time with the 4.2 pull request from Dave Airlie. It's been a long road, with a lot of drivers already converted over to atomic, even more in progress, and the atomic helper libraries and support code in the drm subsystem sufficiently polished. But what's been missing is a design overview of what the overall atomic infrastructure looks like and why some decisions and details are implemented like they are.

That's now done and published on LWN: Part 1 talks about the problem space, issues with the Android atomic display framework and the basic atomic IOCTL interface. Part 2 goes into more detail about a few specific things like locking, helper library design and the exact semantics of atomic modesetting updates. Happy Reading!

August 16, 2015 01:52 PM

August 15, 2015

Rusty Russell: Broadband Speeds, New Data

Thanks to edmundedgar on reddit I have some more accurate data to update my previous bandwidth growth estimation post: OFCOM UK, who released their November 2014 report on average broadband speeds.  Whereas Akamai numbers could be lowered by the increase in mobile connections, this directly measures actual broadband speeds.

Extracting the figures gives:

  1. Average download speed in November 2008 was 3.6Mbit
  2. Average download speed in November 2014 was 22.8Mbit
  3. Average upload speed in November 2014 was 2.9Mbit
  4. Average upload speed in November 2008 to April 2009 was 0.43Mbit/s

So in 6 years, downloads went up by 6.333 times, and uploads went up by 6.75 times.  That’s an annual increase of 36% for downloads and 37% for uploads; that’s good, as it implies we can use download speed factor increases as a proxy for upload speed increases (as upload speed is just as important for a peer-to-peer network).
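
For anyone who wants to check the arithmetic, a quick sketch using the OFCOM figures above:

    # Compound annual growth from the November 2008 and November 2014 numbers.
    down_2008, down_2014 = 3.6, 22.8   # Mbit/s
    up_2008, up_2014 = 0.43, 2.9       # Mbit/s
    years = 6

    def cagr(old, new, years):
        return (new / old) ** (1.0 / years) - 1

    print("downloads: %.0f%% per annum" % (100 * cagr(down_2008, down_2014, years)))
    print("uploads:   %.0f%% per annum" % (100 * cagr(up_2008, up_2014, years)))
    # prints roughly 36% and 37%, matching the figures quoted above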

This compares with my previous post’s Akamai’s UK numbers of 3.526Mbit in Q4 2008 and 10.874Mbit in Q4 2014: only a factor of 3.08 (26% per annum).  Given how close Akamai’s numbers were to OFCOM’s in November 2008 (a year after the iPhone UK release, but probably too early for mobile to have significant effect), it’s reasonable to assume that mobile plays a large part of this difference.

If we assume Akamai’s numbers reflected real broadband rates prior to November 2008, we can also use them to extend the OFCOM data back a year: this is important since there was almost no bandwidth growth according to Akamai from Q4 2007 to Q4 2008; ignoring that period gives a rosier picture than my last post, and smells of cherrypicking data.

So, let’s say the UK went from 3.265Mbit in Q4 2007 (Akamai numbers) to 22.8Mbit in Q4 2014 (OFCOM numbers).  That’s a factor of 6.98, or 32% increase per annum for the UK. If we assume that the US Akamai data is under-representing Q4 2014 speeds by the same factor (6.333 / 3.08 = 2.056) as the UK data, that implies the US went from 3.644Mbit in Q4 2007 to 11.061 * 2.056 = 22.74Mbit in Q4 2014, giving a factor of 6.24, or 30% increase per annum for the US.

As stated previously, China is now where the US and UK were 7 years ago, suggesting they’re a reasonable model for future growth for that region.  Thus I revise my bandwidth estimates; instead of 17% per annum this suggests 30% per annum as a reasonable growth rate.

August 15, 2015 04:54 AM

August 14, 2015

Pete Zaitcev: Tablet Uber Alles Or Is It

Given the trouble with modern laptops, I'm seriously thinking about whether I should make the jump to a gigantic tablet with a keyboard. You run "make" on a VM. Not enough RAM? Order in the cloud! The idea was planted in my mind by that jerk Atwood, who penned an article claiming the death of the PC. And a month ago I saw someone at a Python meetup using Canopy. It kinda worked, actually. I expect Github Atom to be even better.

Unfortunately, there are problems in 3 broad categories still.

First, the hotspot Internet connectivity sucks. It is plain unreliable. VPN, ssh, and IRC are often blocked; it's necessary to remember "Connectivity Through Anything" lessons and techniques. When it works, it's often slow. These problems extend to venues such as Intel's Executive Briefing Center. If "executives" eating their awesome snacks cannot obtain decent WiFi, what hope do I have? I do not have cellphone data, but I hear bitching about it.

Second, the usual questions about privacy and security apply. Non-proprietary tablets suck immensely, from what I heard.

Third, tablets top out at 10..11 inch. Sorry, but that is not enough to kill laptops while laptops continue to be made. Certainly, Atwood made an argument that as tablets absorb users, PC makers will stop. The day the last one quits, we'll have to use the least shitty tablet regardless of size. But today is not that day.

UPDATE: 3 weeks after this post, Apple unveiled a 12.9" (2732 x 2048) iPad Pro, with a keyboard as a factory option.

August 14, 2015 09:38 PM

Pete Zaitcev: User-facing hardware

New business trip, new hardware pictures.

It's been almost a year, and I'm still looking for a decent laptop, same criteria. I saw a couple of guys using the Lenovo X1 Carbon, which looks good. Most importantly, the left Ctrl now extends to its proper position. Almost a winner, but unfortunately, there are issues. Apparently, the screen on the X1 does not sit flat against the main frame when it's closed, so a bundle of clothing pressing in the middle between the hinges is capable of making a nasty crack in the plastic. Not acceptable for what is a $1,400 laptop even with Amazon's "discount" of $900. Way to go, Lenovo. Almost had me this time.

Meanwhile, a $500 Dell Vostro continues to soldier on. It's showing its age: building Ceph with "make -j${N}" requires more RAM than it has for any reasonable N, and dialog windows started to outgrow its screen (notably, some of the GNOME preferences). I still need a laptop, but can't find a suitable one. The Lenovo X1 tops out at 8GB, which was another strike against it.

I was a little sad when Google stopped making the Nexus 7. I have the 2013 version and it is quite good. In the same meeting, I bumped into a guy with a projected update to the Nexus 7 that became orphaned when Google pulled the plug. ASUS continued to build them and market them as the "MemoPad 7". However, taking a page from the Microsoft playbook with their "Surface" and "Surface Pro", ASUS sells "MemoPad 7" versions ranging from a worthless piece of junk with a 1024x600 screen to actual Nexus 7 replacements with 1920x1200. Allegedly, the battery life and speed are much improved by using Intel's embedded Atom core. Some of the ARM-optimized apps may not work (one example is some kind of music editing thing for podcasters).

August 14, 2015 09:18 PM

August 13, 2015

Dave Jones: The case of the mysterious disappearing I211

Day one of unemployed life saw me finally getting around to the first of several hardware related maintenance items that I’ve been putting off until I’ve had the time.

I got a lot of life out of my desktop machine that I had been using since 2007. Earlier this year, I decided it was long overdue an upgrade, and ended up building a ridiculously over-specced machine in the hopes it too would last me a while. After some research, I ended up with a 6-core Haswell-E i7-5820K, and a frankly ridiculously over-featured motherboard.
Once I had delved through the absurd number of BIOS options to convince it that I *really* didn’t want to overclock my CPU or my RAM, or anything else, it was very stable.

It has exceeded all my expectations. In the time it took my old desktop to build one kernel, I can build kernel .deb’s for every machine I own, and still have time spare. It’s an absolute beast.

One of the features that sold me on this board was the two onboard ethernet ports. I had been wanting to do a bunch of networking experiments, and the possibility of using bonding, without having to screw around with add-in cards was appealing.

So I was a little irked one evening, after updating its BIOS, to notice that the bond only had one interface active. After some investigation, I noticed that the PCI ID of one of the onboard NICs had changed.

What was once

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V (rev 05)
08:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)

Was now

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V (rev 05)
08:00.0 Ethernet controller: Intel Corporation Device 1532 (rev 03)

My I211 had changed its PCI ID, and the e1000 driver wouldn’t bind to this new device.

At first I thought “Cool, some kind of NIC firmware update”, and assumed that e1000 hadn’t been updated yet to support this new feature. Googling for “i211 1532” told a much sadder story however.

If you read the spec update for the i211, you find this interesting table:

I211 Device ID Code                          Vendor ID   Device ID   Revision ID
WGI211AT (not programmed/factory default)    0x8086      0x1532      0x3
WGI211AT (programmed)                        0x8086      0x1539      0x3

Uh, not cool. Somehow the BIOS update procedure had wiped the NVRAM on the NIC.

A long protracted conversation with ASUS support followed, including such gems as “I understand you’re seeing blue screens” and “Have you tried removing the DIMMs, rubbing the contacts with an eraser and replacing them”. Eventually I think they got to the end of their script, and agreed to RMA the board. Somewhat annoying, given there’s probably a tool somewhere that can rewrite the flash, but Intel only seems to make that available to integrators, not end-users, and the ASUS representatives denied all knowledge.

It was gone for about two weeks, and finally returned yesterday. Its PCI ID is 0x1539 again, and it has its old MAC address once more. (I’m now hesitant to ever upgrade the BIOS on this machine again.) So what happened? Anyone’s guess, but this isn’t the first time I’ve seen this happen. We had a bunch of these NICs at Akamai too that occasionally had the same thing happen to them.

The whole thing is reminiscent of a painful old bug where ftrace would corrupt the e1000e ROM. Hopefully Linux isn’t to blame this time.

So, long story short: If you see an i211 with a PCI ID of 1532, you’re looking at an RMA.
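
If you want to check a fleet of machines for the same symptom, the factory-default ID is easy to spot from PCI sysfs. A rough sketch (it just reads the standard vendor/device attributes):

    import glob, os

    # Look for Intel (0x8086) devices reporting the unprogrammed i211 ID 0x1532.
    for dev in glob.glob("/sys/bus/pci/devices/*"):
        try:
            vendor = open(os.path.join(dev, "vendor")).read().strip()
            device = open(os.path.join(dev, "device")).read().strip()
        except IOError:
            continue
        if vendor == "0x8086" and device == "0x1532":
            print("%s: i211 reporting factory-default ID 0x1532 "
                  "(NVRAM probably blank)" % os.path.basename(dev))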

August 13, 2015 04:09 PM

LPC 2015: LPC closing party on the water

The Linux Plumbers Conference will have its closing party on Friday, August 21 at the Palisade Restaurant at Elliott Bay Marina on the waters of Puget Sound. Buses will be leaving from the Sheraton around 18:00 for the 15-minute journey to the restaurant. The evening will start with a champagne and seafood tower reception in the courtyard overlooking the marina. It will include shrimp, lobster, and oysters, along with plenty of vegetarian choices. There will be a buffet inside the restaurant after that with entrees including sushi (with vegetarian selections), salmon, risotto, and steak. All of that will be followed by dessert and coffee. There will be local wine and beer selections and all of the food will be locally sourced as much as possible.

It should make for a fabulous evening, with great views (perhaps even of Mount Rainier), and excellent company. We look forward to seeing you on Friday.

August 13, 2015 02:13 PM

August 12, 2015

LPC 2015: How to find Room WSCC 3AB

This year, because of space constraints, the single Wednesday Microconference track (for LLVM in the morning and the Development Tools Tutorial in the afternoon, see schedule for details) is happening offsite at the Washington State Convention Centre (WSCC).  Breakfast will still be at the Sheraton, but the rest of the Microconference will happen over at WSCC in room 3AB.  To get there from the Sheraton, take the escalators down to the ground floor, exit at the corner of Pike and 6th Avenue. Turn right on to Pike, cross 7th Avenue, continue a little way up Pike then turn right into the Convention Centre. Room 3AB is up the escalators on the third floor.

Wireless in the Washington State Convention Centre is on a different system.  Unfortunately it’s portal based, not WEP key based, so we can’t hide the difference.  The portal details are

SSID: Exhibitor Internet

Which is an open network, then browse to any page and enter the login at the prompt:
USER ID: linux
Password: seattle

August 12, 2015 01:34 AM

August 11, 2015

Dave Jones: Moving on from Akamai.

Today was my last day at Akamai. It’s been brief (Just over seven months), but things weren’t really working out for me there for a number of reasons. I’ve mentioned to a number of people who have known about my decision for a while, that it’s not that it’s a bad place to work, but it never felt like a good fit for me, and I came to realize that I’ve spent most of this last year being in denial of just how unhappy I was, in the hope “things would get better”.

There are a lot of smart people working there, working on really difficult problems, but a lot of those problems just don’t align with my interests, especially when they don’t always involve contributing code back upstream. [clarification: There is some upstream work going on there, just not as much as I’d like].

Add to this my disdain for some of the proprietary tooling that’s prevalent there, and it was becoming clear it was not a matter of “if”, but “when” I was going to leave. As an example; I joked a few months ago to co-workers “next time I’m looking for a job, the first question I ask is ‘do you use perforce’?”. Only it wasn’t really a joke, I was dead serious. User-hostile software has no place in my life.
Even little things like “let’s use git” translating to “let’s license Atlassian stash” rather than “run a git-daemon somewhere” started getting me down.

The final project I worked on there was a continuous rebase strategy for the kernel, moving away from perforce to git. It’s a move in the right direction, but ultimately, not the sort of work that gets me excited, and it’s going to be a multi-year project before it starts really bearing fruit. Given how perforce is ingrained in so many of Akamai’s systems, it would also have been extremely unlikely I’d have been able to purge all knowledge of ever having used it.

The rebase work itself also started to bother me that many of the kernel changes we made had no chance of ever even being submitted, let alone accepted upstream. (In part because many of them are very unique to Akamai’s CDN — you won’t find any of the trickery employed there described in a Richard Stevens book, and they’re unlikely to ever be official RFC’s due to the competitive edge they gain from those changes).
There are exceptions to all of this, and the kernel team is trying to do a better job there with upstreaming most of the newer changes, but many of the older legacy patches are under-documented, and/or understood well by few people, with the original authors no longer around, making it a frustrating exercise to get up to speed; especially when you’re trying to learn what the upstream code is doing at the same time.

Someone with less experience dealing exclusively with open-source for most of their career would probably find many of my reasons for leaving trivial. Those same people would probably find Akamai a great place to work. There are a lot of opportunities there if you have a higher tolerance for such things than I did. It was eye-opening recently, mentoring some of the interns there. Optimism. The unjaded outlook that comes with youth. Not getting bent out of shape at crappy tooling because they don’t know different. It made me realize I wasn’t going to ever be like this here.

On a particularly bad day a few weeks back, a recruiter reached out to me, to find out if I was interested in a second chance at an offer I received last time I was looking for a new job. It worked. Enduring an unhappy situation in the hopes things will get better isn’t a great strategy when there are other options.

So, I start at Facebook in September.

I have no delusions that things are going to be perfect there, but at least from the outside right now, the grass looks greener. I feel bad walking away from problems unfinished, but going home miserable or angry or some other negative emotion every day was really starting to take its toll. It’s not a healthy way to live.

When I was interviewing last December, I read Being Geek to death, so it’s fitting that I’ve picked it up again recently. One paragraph in particular jumps out at me.

My single worst gig was one where I got everything I wanted out of the offer letter, but in my exuberance for being highly valued, I totally forgot that my gut read on the gig was "meh". Ninety days later, I couldn't care less that I got a 15% raise and a sign-on bonus. I couldn't stand the mundanity of the daily work, and I happily resigned a few months later, taking both a pay cut and returning my sign-on bonus for the opportunity to work at Netscape.

Anachronisms and minor details aside, that paragraph played through my head this afternoon as I wrote the check to pay back the remainder of my sign-on bonus. I wasn’t quite thinking “meh”, but I knew I was making compromises on what I really valued from day one.

Walking away from unvested RSUs, giving up this month’s paycheck, and writing that check stings a little, but when I did my exit interview this morning, I knew that I too, was “happily resigning” for a great opportunity.

I’m feeling uncharacteristically optimistic right now. Hopefully it’ll last.

I’ll be in Seattle next week, but due to complications with my registration being transferred to another Akamai employee, I won’t actually be at the Linux plumbers conf. If you’re also going to be there and want to catch up, drop me a mail, or <ahem> hit me up on facebook.

August 11, 2015 10:33 PM

Pete Zaitcev: git submodule

It's a familiar sign to anyone dealing with a project that includes submodules: you run "make" and see something like this:

rgw/ In member function ‘virtual int RGWMongooseFrontend::run()’:
rgw/ error: ‘struct mg_callbacks’ has no member named ‘log_access’
cb.log_access = rgw_civetweb_log_access_callback;

Ah, yes. Submodule civetweb is obviously out of date. Type "git submodule init; git submodule update" and... nothing happens. The goddamn submodules are stuck.

At this point, running "git diff origin" produces an output like:

--- a/ceph-object-corpus
+++ b/ceph-object-corpus
@@ -1 +1 @@
-Subproject commit 20351c6bae6dd4802936a5a9fd76e41b8ce2bad0
+Subproject commit bb3cee6b85b93210af5fb2c65a33f3000e341a11

So yeah, obviously you fetched the right thing from the origin, but you cannot merge or rebase no matter what. You may spend a good part of a hackathon reading man pages for git subcommands, all for naught.

Fortunately, the stuck submodules can be worked around, by looking at the "git diff origin" above, then doing this:

git update-index --replace --cacheinfo 160000,20351c6bae6dd4802936a5a9fd76e41b8ce2bad0,ceph-object-corpus

You get the idea: force the right commit from the origin into the local index. This allows "git submodule update" to clone and checkout the right thing and you're off to the races. The fixups in the index will stick out in "git status", so create an empty commit to get rid of them (but only after "git submodule update").

When you're done, you might want to kick in the nuts whoever chose to use submodules in your project.

P.S. "git --version" yields "git version 2.4.3".

P.P.S. You can verify what you have in the index by running "git ls-files -s ceph-object-corpus" (or src/civetweb). The mode must be 160000 and the hash should match the upstream. Note that "git diff origin" continues to display a disparity until you've run "git submodule update".

August 11, 2015 03:08 AM

Pete Zaitcev: the future is here


10005 zaitcev   20   0  809920 755384  13220 R  99.7 12.5   0:20.47 cc1plus
 9894 zaitcev   20   0 1946748 1.806g  15800 R  99.3 31.4   1:46.60 cc1plus
 9956 zaitcev   20   0 1652076 1.524g  15832 R  99.0 26.5   1:30.64 cc1plus
   72 root      20   0       0      0      0 S   4.0  0.0   0:04.60 kswapd0
 9957 zaitcev   20   0   56648  43536   1436 S   2.7  0.7   0:00.49 as
 9895 zaitcev   20   0   79480  66368   1480 S   2.0  1.1   0:00.89 as
 2870 zaitcev   20   0 1989524 533104 160868 S   1.3  8.9  60:28.10 firefox
 2035 zaitcev   20   0 2018216 166872  20028 S   0.7  2.8  16:50.66 gnome-sh

That's right, boys and girls, a compiler with a bigger resident size than Firefox. Three times bigger.

August 11, 2015 02:12 AM

August 10, 2015

Lucas De Marchi: “Throw away” linux images in seconds

Generating a new rootfs from scratch in order to test changes to early parts of the software stack or just to have a pristine environment is something I needed several times in the past.

Since I use Archlinux on my desktop, something that I like is to have a similar environment in the target test rootfs. I decided to re-use and improve a script from Kay Sievers to create an installer that can be booted as a VM, as a container or on bare metal: Originally it was a script to bootstrap a Fedora image, and I think that with some small changes that would still be possible.

$ time sudo -l ~/vm/test.img
real 0m31.238s
user 0m22.277s
sys 0m2.473s

30 seconds later I have a complete pristine image that can be used as a VM with qemu, as a container with systemd-nspawn or just copied to a pendrive/sdcard to boot for example a Minnow Board Max.


$ sudo systemd-nspawn -b -i ~/vm/test.img


sudo kvm-that ~/vm/test.img

Note: ‘kvm-that’ is also a script available in the same repository so I don’t have to type all the options to qemu.

In order to boot another computer or a board like Minnow Board Max just dd the image to a usb disk or sdcard. You can also generate the image directly to the final destination:

$ sudo -l /dev/mmcblk0

The script has also some nice options to make it easy to customize the final image.  One thing that I’m often doing is giving an overlay directory with configuration files for wpa_supplicant. This way I can already access my WiFi networks in the target image.

If you always need certain packages you can use the  example debug-tools hook that is executed before the image is finalized. By mixing hooks like that and the overlay directory mentioned above it’s possible to add your local repository to pacman.conf and install packages not available in Archlinux. Or packages that you’d like to maintain on your own. In my use cases with Minnow Board Max I maintain my own kernel with configurations suited to run ardupilot on it.

August 10, 2015 03:44 PM

August 08, 2015

Michael Kerrisk (manpages): man-pages-4.02 is released

I've released man-pages-4.02. The release tarball is available on The browsable online pages can be found on The Git repository for man-pages is available on

This release resulted from patches, bug reports, and comments from around 15 contributors. As well as a large number of minor fixes to nearly 400 man pages, the more significant changes in man-pages-4.02 include the following:

August 08, 2015 10:10 PM

Matthew Garrett: Difficult social problems are still difficult problems

After less than a week of complaints, the TODO group have decided to pause development of their code of conduct. This seems to have been triggered by the public response to the changes I talked about here, which TODO appear to have been completely unprepared for.

While disappointing in a bunch of ways, this is probably the correct decision. TODO stumbled into this space with a poor understanding of the problems that they were trying to solve. Nikki Murray pointed out that the initial draft lacked several of the key components that help ensure less privileged groups can feel that their concerns are taken seriously. This was mostly rectified last week, but nobody involved appeared to be willing to stand behind those changes in a convincing way. This wasn't helped by almost all of this appearing to land on Github's plate, with the rest of the TODO group largely missing in action[1]. Where were Google in this? Yahoo? Facebook? Left facing an angry mob with nobody willing to make explicit statements of support, it's unsurprising that Github would try to back away from the situation.

But that doesn't remove their blame for being in the situation in the first place. The statement claims
"We are consulting with stakeholders, community leaders, and legal professionals", which is great. It's also far too late. If an industry body wrote a new kernel from scratch and deployed it without any external review, then discovered that it didn't work and only then consulted any of the existing experts in the field, we'd never take them seriously again. But when an industry body turns up with a new social policy, fucks up spectacularly and then goes back to consult experts, it's expected that we give them a pass.

Why? Because we don't perceive social problems as difficult problems, and we assume that anybody can solve them by simply sitting down and talking for a few hours. When we find out that we've screwed up we throw our hands in the air and admit that this is all more difficult than we imagined, and we give up. We ignore the lessons that people have learned in the past. We ignore the existing work that's been done in the field. We ignore the people who work full time on helping solve these problems.

We wouldn't let an industry body with no experience of engineering build a bridge. We need to accept that social problems are outside our realm of expertise and defer to the people who are experts.

[1] The repository history shows the majority of substantive changes were from Github, with the initial work appearing to be mostly from Twitter.


August 08, 2015 08:09 PM

August 05, 2015

Andi Kleen: Generating Flame graphs with Processor Trace

Everybody loves Flame Graphs. Here is how to generate one with Processor Trace.

Processor Trace allows generating very exact histograms of a program's run time. Normal sampling has shadow effects, which can hide some details. Processor Trace records every branch, so it can be much more accurate than normal sampling.

You need an Intel Broadwell or Skylake CPU, running a 4.1 or later Linux kernel where perf supports PT.
You can verify that the kernel supports PT with

ls /sys/devices/intel_pt

You need the perf user tools built from a tree with PT support
(this should soon no longer be necessary, once the user-tools code is merged into Linux mainline)

Build perf with PT support:

# set up https_proxy as needed
git clone
cd linux-perf/tools/perf
make

Copy the resulting perf binary to where you want to run it

Get the flamegraph code

git clone

Collect data from the workload. It's best not to collect overly long traces, as they take much longer to process and may need a lot of disk space.

perf record -e intel_pt// workload (or -a sleep 1 to collect 1s globally)

Decode the data. This may take quite some time

perf script --itrace=i100usg | /path/to/FlameGraph/stackcollapse-perf.pl > workload.folded

The i100us means the trace decoder samples an instruction every 100us. This can be made more accurate (down to 1ns), at the cost of longer decoding time. The ‘g’ tells the decoder to add callgraphs.

Then generate the Flamegraph with

/path/to/FlameGraph/flamegraph.pl workload.folded > workload.svg

Then view the resulting SVG in an SVG viewer, such as Google Chrome

google-chrome workload.svg

It is possible to click around.
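For repeated measurements it can be convenient to wrap the steps above in a small script. This is only a sketch: it assumes a PT-capable perf binary is on the PATH and that the FlameGraph checkout (with its stackcollapse-perf.pl and flamegraph.pl scripts) lives in $FLAMEGRAPH; the script and file names are placeholders.

#!/bin/sh
# pt-flamegraph.sh - sketch: record a workload with Intel PT and render a flame graph
set -e
FLAMEGRAPH=${FLAMEGRAPH:-$HOME/FlameGraph}

# record the workload ("$@" is the workload command and its arguments)
perf record -e intel_pt// -o workload.data -- "$@"

# decode: sample an instruction every 100us and include call graphs ('g')
perf script -i workload.data --itrace=i100usg |
    "$FLAMEGRAPH/stackcollapse-perf.pl" > workload.folded

# render the SVG
"$FLAMEGRAPH/flamegraph.pl" workload.folded > workload.svg
echo wrote workload.svg

Invoked as, for example, ./pt-flamegraph.sh gcc -O2 -c something.c, it leaves workload.svg behind for viewing.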

Here’s a larger SVG example from a gcc build (2.5MB). It may need Chrome or Firefox to view.

In principle the trace also contains information that normal sampling does not provide, such as the exact run time of individual functions. This is unfortunately not (yet?) supported by the Flame Graph tools.

August 05, 2015 11:13 PM

August 04, 2015

Matthew Garrett: Reverse this

The TODO group is an industry body that appears to be trying to define community best practices or something. I don't really know what their backstory is, or whether they're trying to do meaningful work or just provide a fig leaf of respectability to organisations that dislike being criticised for doing nothing to improve the state of online communities but don't want to have to actually do anything. Their initial work on codes of conduct was, perhaps, suboptimal. But they do appear to be trying to improve things - this commit added a set of inappropriate behaviours, and also clarified that reverseisms were not actionable behaviour.

At which point Reddit lost its shit, because Reddit is garbage. And now the repository is a mess of white men attempting to explain how any policy that could allow them to be criticised is the real racism.

Fuck that shit.

Being a cis white man who's a native English speaker from a fairly well-off background, I'm pretty familiar with privilege. Spending my teenage years as an atheist of Irish Catholic upbringing in a Protestant school in a region of Northern Ireland that made parts of the bible belt look socially progressive, I'm also pretty familiar with the idea that said privilege doesn't shield me from everything bad in life. Having privilege isn't a guarantee that my life will be better, in the same way that avoiding smoking doesn't mean I won't die of lung cancer. But there's an association in both cases, one that's strong enough to alter the statistical likelihood in meaningful ways.

And that inherently affects discussions about race or gender or sexuality. The probability that I've been subject to systematic discrimination because of these traits is vanishingly small. In the communities this policy is intended to cover, I'm the default. It's very difficult for any minority to exercise power over me. "You're white, you wouldn't understand" isn't fundamentally about my colour, it's about the fact that my colour means I haven't been subject to society trying to make my life more difficult at every opportunity. A community that considers saying that to be racist is a community that will never change the default, a community that will never be able to empower people who didn't grow up with that privilege. A code of conduct that makes it clear that "reverse racism" isn't grounds for complaint makes it clear that certain conversations are legitimate and helps ensure we have the framework we need to gradually change that default, and as such is better than one that doesn't.

(comments disabled because I don't trust any of you)


August 04, 2015 09:59 PM

Rusty Russell: The Bitcoin Blocksize: A Summary

There’s a significant debate going on at the moment in the Bitcoin world; there’s a great deal of information and misinformation, and it’s hard to find a cogent summary in one place.  This post is my attempt, though I already know that it will cause me even more trouble than that time I foolishly entitled a post “If you didn’t run code written by assholes, your machine wouldn’t boot”.

The Technical Background: 1MB Block Limit

The bitcoin protocol is powered by miners, who gather transactions into blocks, producing a block every 10 minutes (but it varies a lot).  They get a 25 bitcoin subsidy for this, plus whatever fees are paid by those transactions.  This subsidy halves every 4 years: in about 12 months it will drop to 12.5.

Full nodes on the network check transactions and blocks, and relay them to others.  There are also lightweight nodes which simply listen for transactions which affect them, and trust that blocks from miners are generally OK.

A normal transaction is about 250 bytes, and there’s a hard-coded 1 megabyte limit on the block size.  This limit was introduced years ago as a quick way of preventing a miner from flooding the young network, though the original code could only produce 200kb blocks, and the reference code still defaults to a 750kb limit.
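For a rough sense of scale, using those round numbers: 1,000,000 bytes per block divided by 250 bytes per transaction gives about 4,000 transactions per block, and with a block roughly every 600 seconds that works out to something like 7 transactions per second network-wide, or closer to 5 with the 750kb default.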

In the last few months there have been increasing runs of full blocks, causing backlogs for a few hours.  More recently, someone deliberately flooded the network with normal-fee transactions for several days; any transactions paying lower fees than those had to wait for hours to be processed.

There are 5 people who have commit access to the bitcoin reference implementation (aka. “bitcoin-core”), and they vary significantly in their concerns on the issue.

The Bitcoin Users’ Perspective

From the bitcoin users’ perspective, blocks should be infinite, and fees zero or minimal.  This is the basic position of respected (but non-bitcoin-core) developer Mike Hearn, and has support from bitcoin-core ex-lead Gavin Andresen.  They work on the wallet and end-user side of bitcoin, and they see the issue as the most urgent.  In an excellent post arguing why growth is so important, Mike raises the following points, which I’ve paraphrased:

  1. Currencies have network effects. A currency that has few users is simply not competitive with currencies that have many.
  2. A decentralised currency that the vast majority can’t use doesn’t change the amount of centralisation in the world. Most people will still end up using banks, with all the normal problems.
  3. Growth is a part of the social contract. It always has been.
  4. Businesses will only continue to invest in bitcoin and build infrastructure if they are assured that the market will grow significantly.
  5. Bitcoin needs users, lots of them, for its political survival. There are many people out there who would like to see digital cash disappear, or be regulated out of existence.

At this point, it’s worth mentioning another bitcoin-core developer: Jeff Garzik.  He believes that the bitcoin userbase has been promised that transactions will continue to be almost free.  When a request to change the default mining limit from 750kb to 1M was closed by the bitcoin lead developer Wladimir van der Laan as unimportant, Jeff saw this as a symbolic moment:

Disappointing. New #Bitcoin Core policy: stealth fee increases Zero plan to communicate this to BTC users :(

— Jeff Garzik (@jgarzik) July 21, 2015

What Happens If We Don’t Increase Soon?

Mike Hearn has a fairly apocalyptic view of what would happen if blocks fill.  That was certainly looking likely when the post was written, but due to episodes where the blocks were full for days, wallet designers are (finally) starting to estimate fees for timely processing (miners process larger fee transactions first).  Some wallets and services didn’t even have a way to change the setting, leaving users stranded during high-volume events.

It now seems that the bursts of full blocks will arrive with increasing frequency; proposals are fairly mature now to allow users to post-increase fees if required, which (if all goes well) could make for a fairly smooth transition from the current “fees are tiny and optional” mode of operation to a “there will be a small fee” one.

But even if this rosy scenario is true, it avoids the bigger question of how high fees can become before bitcoin becomes useless.  1c?  5c?  20c?  $1?

So What Are The Problems With Increasing The Blocksize?

In a word, the problem is miners.  As mining has transitioned from a geek pastime to a semi-hobbyist activity, and then to large operations with cheap access to power, it has become more concentrated.

The only difference between bitcoin and previous cryptocurrencies is that instead of a centralized “broker” to ensure honesty, bitcoin uses an open competition of miners. Given bitcoin’s endurance, it’s fair to count this as a vital property of bitcoin.  Mining centralization is the long-term concern of another bitcoin-core developer (and my coworker at Blockstream), Gregory Maxwell.

With control over half the block-producing power, you control who can use bitcoin and can cheat anyone not running a full node themselves.  With control over 2/3, you can force a rule change on the rest of the network by stalling it until enough people give in.  Central control is also a single point at which the network can be shut down; that lets others apply legal or extra-legal pressure to restrict the network.

What Drives Centralization?

Bitcoin mining is more efficient at scale. That was to be expected[7]. However, the concentration has come much faster than expected because of the invention of mining pools.  These pools tell miners what to mine, in return for a small (or in some cases, zero) share of profits.  Pools save setup costs, they’re easy to use, and miners get more regular payouts.  This has caused bitcoin to reel from one centralization crisis to another over the last few years; the number of full nodes has declined precipitously by some measures[5] and continues to fall[6].

Consider the plight of a miner whose network is further away from most other miners.  They find out about new blocks later, and their blocks get built on later.  Both these effects cause them to create blocks which the network ignores, called orphans.  Some orphans are the inevitable consequence of miners racing for the same prize, but the orphan problem is not symmetrical.  Being well connected to the other miners helps, but there’s a second effect: if you discover the previous block, you’ve a head-start on the next one.  This means a pool which has 20% of the hashing power doesn’t have to worry about delays at all 20% of the time.

If the orphan rate is very low (say, 0.1%), the effect can be ignored.  But as it climbs, the pressure to join a pool (the largest pool) becomes economically irresistible, until only one pool remains.

Larger Blocks Are Driving Up Orphan Rates

Large blocks take longer to propagate, increasing the rate of orphans.  This has been happening as blocks have grown.  Blocks with no transactions at all are smallest, and so propagate fastest: they still earn the 25 bitcoin subsidy, though they don’t help bitcoin users much.
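As a rough model (assuming block discovery behaves like a Poisson process with a 600-second mean), the chance that some other miner finds a block during an extra propagation delay of t seconds is about 1 - e^(-t/600), so an extra six seconds of propagation costs roughly a 1% orphan rate.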

Many people assumed that miners wouldn’t overly centralize, lest they cause a clear decentralization failure and drive the bitcoin price into the ground.  That assumption has proven weak in the face of climbing orphan rates.

And miners have been behaving very badly.  Mining pools orchestrate attacks on each other with surprising regularity; DDOS and block withholding attacks are both well documented[1][2].  A large mining pool used their power to double spend and steal thousands of bitcoin from a gambling service[3].  When it was noticed, they blamed a rogue employee.  No money was returned, nor any legal action taken.  It was hoped that miners would leave a pool for another as it approached majority share, but that didn’t happen.

If large blocks can be used as a weapon by larger miners against small ones[8], it’s expected that they will be.

More recently (and quite by accident) it was discovered that over half of the mining power wasn’t verifying transactions in the blocks it built upon[4].  They did this in order to reduce orphans, and one large pool is still doing so.  This is a problem because lightweight bitcoin clients work by assuming that anything in the longest chain of blocks is good; this was how the original bitcoin paper anticipated that most users would interact with the system.

The Third Side Of The Debate: Long Term Network Funding

Before I summarize, it’s worth mentioning the debate beyond the current debate: long-term network support.  The minting of new coins decreases with time; the plan of record (as suggested in the original paper) is that total transaction fees will rise to replace the current mining subsidy.  The schedule for this is unknown, and so far the transition has not happened: free transactions still work.

The block subsidy as I write this is about $7000.  If nothing else changes, miners would want $3500 in fees in 12 months when the block subsidy halves, or about $2 per transaction.  That won’t happen; miners will simply lose half their income.  (Perhaps eventually they form a cartel to enforce a minimum fee, causing another centralization crisis? I don’t know.)
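To make the arithmetic explicit: $7000 for a 25 bitcoin subsidy puts the price at roughly $280 per coin, so the halved subsidy of 12.5 bitcoin is worth about $3500; spread over roughly 1,750 transactions per block, that gives about $2 each, which is consistent with blocks of around 440kb of 250-byte transactions.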

It’s natural for users to try to defer the transition as long as possible, and the practice in bitcoin-core has been to aggressively reduce the default fees as the bitcoin price rises.  Core developers Gregory Maxwell and Pieter Wuille feel that signal was a mistake; that fees will have to rise eventually and users should not be lulled into thinking otherwise.

Mike Hearn in particular has been holding out the promise that it may not be necessary.  On this he is not widely supported: the hope that some users would offer to pay more, so that other users can continue to pay less, is not one many share.

It’s worth noting that some bitcoin businesses rely on the current very low fees and don’t want to change; I suspect this adds bitterness and vitriol to many online debates.


The bitcoin-core developers who deal most with users feel that bitcoin needs to expand quickly or die, that letting fees emerge now will kill expansion, and that the infrastructure will improve over time if it has to.

Other bitcoin-core developers feel that bitcoin’s infrastructure is dangerously creaking, that fees need to emerge anyway, and that if there is a real emergency a blocksize change could be rolled out within a few weeks.

At least until this is resolved, don’t count on future bitcoin fees being insignificant, nor promise others that bitcoin has “free transactions”.

[1] “Bitcoin Mining Pools Targeted in Wave of DDOS Attacks” Coinbase 2015

[2] “Block Withholding Attacks – Recent Research” N T Courtois 2014

[3] “GHash.IO and double-spending against BetCoin Dice” mmtech et al. 2013

[4] “Questions about the July 4th BIP66 fork”

[5] “350,000 full nodes to 6,000 in two years…” P Todd 2015

[6] “Reachable nodes during the last 365 days.”

[7] “Re: Scalability and transaction rate” Satoshi 2010

[8] “[Bitcoin-development] Mining centralization pressure from non-uniform propagation speed” Pieter Wuille 2015

August 04, 2015 02:32 AM

August 03, 2015

LPC 2015: Thursday night reception for LPC

Thanks to the generous sponsorship of Intel, the Linux Plumbers Conference is pleased to announce that there will be an additional social event this year. On Thursday August 20th, we will be gathering at the Seattle Rock Bottom Brewery—just a short walk from the conference venue and hotel—for drinks and dinner in a relaxed setting. The evening’s event will be showcasing local beers, wines, and spirits, but some of the more standard items (like single-malt scotches and cocktails) will also be available.

Since there will be various BoFs and extended microconferences going later in the evening on Thursday, the event has been structured to accommodate that. The event will not have a buffet and will, instead, provide food made to order. It will run until midnight and dinner orders can be placed up until 23:30, so folks can show up any time and still get the food of their choice, hot and fresh. That said, if we all order right at the 18:00 start, the waiting time may get long. So, if you aren’t working late, a walk around Seattle (perhaps after popping in for a drink) would work well to put some space around the food orders. The Rock Bottom is a large venue with lots of tables for discussions and the like, so continuing a conversation there, rather than at the venue, will work out well.

We look forward to seeing everyone at the Rock Bottom on Thursday!

August 03, 2015 10:33 PM

August 02, 2015

Pete Zaitcev: DNF - Debugging Not Finished

It's 100% like CKS said:

[root@kvm-rei zaitcev]# dnf check-update openstack-swift
Last metadata expiration check performed 0:10:02 ago on Sun Aug  2 18:42:13 2015.

openstack-swift.noarch                   2.3.0-2.fc23                    rawhide
[root@kvm-rei zaitcev]# dnf update openstack-swift
Last metadata expiration check performed 0:10:07 ago on Sun Aug  2 18:42:13 2015.
Dependencies resolved.
Nothing to do.
[root@kvm-rei zaitcev]# rpm -q openstack-swift
[root@kvm-rei zaitcev]# 

Searching for a good way to make it unstuck.
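For what it's worth, a couple of hedged things that might be worth trying, on the assumption that the confusion is stale metadata, or simply that the package was never installed, as the empty rpm -q output suggests:

# throw away the cached metadata before retrying
dnf clean metadata
dnf update openstack-swift

# "update" is a no-op for a package that isn't installed at all,
# so an explicit install may be what's actually wanted
dnf install openstack-swift

Neither is guaranteed to help; they are just the obvious first steps.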

August 02, 2015 11:12 PM

July 29, 2015

Pete Zaitcev: Conference submission and voting

Generally I feel that I do not do any work that's important enough to present at conferences. My previous presentation was at OLS back in 2005, concerning usbmon. The usbmon is something a guy learning C would program: it's a circular buffer into which the kernel drops tracing events; Wireshark pulls them out. Hardly conference material, but at the time I thought it was supremely important to proselytize the basic techniques of always-on tracing, because it would improve the quality and the ease of debugging of the kernel overall. I really wanted the FireWire guys to adopt a similar tracing scheme, because it was a hell on a stick debugging juju with just printk(). Needless to say, that was a miserable failure, as was FireWire itself. I don't think anyone who came to listen to my presentation in Ottawa received their money's worth.

Or did they? Recently an epiphany occurred to me. I really should not even think about whether anyone is interested. That is the conference organizers' job, not mine! As a result, I sent a proposal to OpenStack Tokyo, entitled "The Plot to Destroy OpenStack Swift Using C++: Enhancements of Swift API Compatibility in Ceph RADOS Gateway". It's basically a compendium of practical issues that occur when running Swift apps on top of Ceph RGW and what we do to help people do that.

Things are a little different from 10 years ago, because attendees can vote on the submissions. This sounds democratic. I went through all submissions on the storage track and voted on them according to my preference. It took a very long time, and I suspect that I was crowdsourced by the organizers in the best traditions of Web 2.0. I wonder if they'll even read the abstracts. :-)

July 29, 2015 04:23 PM

July 28, 2015

Matthew Garrett: Your Ubuntu-based container image is probably a copyright violation

Update: A Canonical employee responded here, but doesn't appear to actually contradict anything I say below.

I wrote about Canonical's Ubuntu IP policy here, primarily in terms of its broader impact, but I mentioned a few specific cases. People seem to have picked up on the case of container images (especially Docker ones), so here's an unambiguous statement:

If you generate a container image that is not a 100% unmodified version of Ubuntu (ie, you have not removed or added anything), Canonical insist that you must ask them for permission to distribute it. The only alternative is to rebuild every binary package you wish to ship[1], removing all trademarks in the process. As I mentioned in my original post, the IP policy does not merely require you to remove trademarks that would cause infringement, it requires you to remove all trademarks - a strict reading would require you to remove every instance of the word "ubuntu" from the packages.

If you want to contact Canonical to request permission, you can do so here. Or you could just derive from Debian instead.

[1] Other than ones whose license explicitly grants permission to redistribute binaries and which do not permit any additional restrictions to be imposed upon the license grants - so any GPLed material is fine


July 28, 2015 08:06 PM

LPC 2015: Microconference schedule now available

The Linux Plumbers Conference starts in less than three weeks and so the schedule for Microconferences is now available!  Looking forward to seeing you all there!

July 28, 2015 07:40 PM

July 27, 2015

Andi Kleen: Energy efficient servers book review

Energy Efficient Servers – Blueprints for Data Center Optimization by Gough/Steiner/Sanders is a new book on power tuning for servers, recently published by Apress. I got my copy a few weeks ago and read it, and it is great.

Disclaimer: I contributed a few pages to the book, but have no financial interest in its success.

As you probably already know, power efficiency is very important for modern computing. It matters for mobile devices, to extend battery time; it matters for desktops and servers, to avoid exceeding thermal/power capacity and to lower energy costs.

Modern chips cannot run all their transistors at full speed at the same time due to the dark silicon problem. This results in the somewhat paradoxical situation that power management is needed, even if energy costs don’t matter, just to give the best performance (such as reaching the highest Turbo frequencies).

Power management in modern systems is quite complex, with many different moving parts, hardware, operating systems, drivers, firmware, embedded micro-controllers working together to be as efficient as possible. I’m not aware of any good overview of all of this.

There is some lore around — for example you may have heard of race to idle, that is running as fast as possible to go idle again — but nothing really that puts it all into a larger context. BTW race-to-idle is not always a good idea, as the book explains.

The new book makes an attempt to explain all of this together for Intel servers (the basic concepts are similar on other systems and also on client systems).

It starts with a (short) introduction to the underlying physical principles and then moves on to the basic CPU and platform power management techniques, such as frequency scaling, idle states, and thermal management. It has a discussion of modern memory subsystems and describes the trade-offs between different DIMM configurations. It describes the power management differences between larger servers and micro servers. And there is an overview of thermal management and power supply, such as energy-efficient power supplies and voltage regulators.

It then moves on to an overview of the software involved in power management, including firmware, rack-level power management software, and operating systems. There is also an extensive chapter on how to instrument and measure power management.

Finally (and perhaps most valuably) the book lays out a systematic power tuning methodology, starting with measurements and then moving on to concrete steps for optimizing existing workloads for the best power efficiency.

The book is not written as an academic textbook, but is intended for people who solve concrete problems on shipping systems. It is quite readable, explaining any complicated concepts along the way. You can clearly tell the authors have deep knowledge of the topic. While the details are aimed at Intel servers, I would expect the book to be useful even to people working on clients or on other architectures.

One possible issue with the book is that it may be too specific for today’s systems. We’ll see how well it ages to future systems. But right now, as it just came out, it is very up-to-date and a good guide. It has some descriptions of data center design (such as efficient cooling), but these parts are quite short and are clearly not the main focus.

The ebook version is currently available as a free download, either from the publisher after registration or from Amazon as a free Kindle edition; there is also a reasonably priced paperback.

July 27, 2015 06:14 AM

July 24, 2015

James Morris: Linux Security Summit 2015 Update: Free Registration

In previous years, attending the Linux Security Summit (LSS) has required full registration as a LinuxCon attendee.  This year, LSS has been upgraded to a hosted event.  I didn’t realize that this meant that LSS registration was available entirely standalone.  To quote an email thread:

If you are only planning on attending the The Linux Security Summit, there is no need to register for LinuxCon North America. That being said you will not have access to any of the booths, keynotes, breakout sessions, or breaks that come with the LinuxCon North America registration.  You will only have access to The Linux Security Summit.

Thus, if you wish to attend only LSS, then you may register for that alone, at no cost.

There may be a number of people who registered for LinuxCon but who only wanted to attend LSS.   In that case, please contact the program committee at

Apologies for any confusion.

July 24, 2015 03:46 AM