Kernel Planet

December 12, 2017

Matthew Garrett: Eben Moglen is no longer a friend of the free software community

(Note: While the majority of the events described below occurred while I was a member of the board of directors of the Free Software Foundation, I am no longer. This is my personal position and should not be interpreted as the opinion of any other organisation or company I have been affiliated with in any way)

Eben Moglen has done an amazing amount of work for the free software community, serving on the board of the Free Software Foundation and acting as its general counsel for many years, leading the drafting of GPLv3 and giving many forceful speeches on the importance of free software. However, his recent behaviour demonstrates that he is no longer willing to work with other members of the community, and we should reciprocate that.

In early 2016, the FSF board became aware that Eben was briefing clients on an interpretation of the GPL that was incompatible with the FSF's. He later released this position publicly with little coordination with the FSF, and Canonical used it to justify shipping ZFS in a GPL-violating way. He had provided similar advice to Debian, who were confused by the apparent conflict between the FSF's position and Eben's.

This situation was obviously problematic - Eben is clearly free to provide whatever legal opinion he holds to his clients, but his very public association with the FSF caused many people to assume that these positions were held by the FSF and the FSF were forced into the position of publicly stating that they disagreed with legal positions held by their general counsel. Attempts to mediate this failed, and Eben refused to commit to working with the FSF on avoiding this sort of situation in future[1].

Around the same time, Eben made legal threats towards another project with ties to the FSF. These threats were based on a license interpretation that ran contrary to how free software licenses had been interpreted by the community for decades, and were made without any prior discussion with the FSF (2017-12-11 update: page 126 of this document includes the email in which Eben asserts that the Software Freedom Conservancy is engaging in plagiarism by making use of appropriately credited material released under a Creative Commons license). This, in conjunction with his behaviour over the ZFS issue, led to him stepping down as the FSF's general counsel.

Throughout this period, Eben disparaged FSF staff and other free software community members in various semi-public settings. In doing so he harmed the credibility of many people who have devoted significant portions of their lives to aiding the free software community. At Libreplanet earlier this year he made direct threats against an attendee - this was reported as a violation of the conference's anti-harassment policy.

Eben has acted against the best interests of an organisation he publicly represented. He has threatened organisations and individuals who work to further free software. His actions are no longer to the benefit of the free software community and the free software community should cease associating with him.

[1] Contrary to the claim provided here, Bradley was not involved in this process.

(Edit to add: various people have asked for more details of some of the accusations here. Eben is influential in many areas, and publicising details without the direct consent of his victims may put them at professional risk. I'm aware that this reduces my credibility, and it's entirely reasonable for people to choose not to believe me as a result. I will add that I said much of this several months ago, so I'm not making stuff up in response to recent events)


December 12, 2017 05:59 AM

Linux Plumbers Conference: Linux Plumbers Conference 2018 site and dates

We are pleased to announce that the 2018 edition of the Linux Plumbers Conference will take place in Vancouver, British Columbia, Canada at the Sheraton Vancouver Wall Centre. It will be colocated with the Linux Kernel Summit. LPC will run from November 13, 2018 (Tuesday) to November 15, 2018 (Thursday).

We look forward to another great edition of LPC and to seeing you all in Vancouver!

Stay tuned for more information as the Linux Plumbers Conference committee starts planning for the 2018 conference.

The LPC Planning Committee.

December 12, 2017 12:59 AM

December 05, 2017

Pete Zaitcev: Marcan: Debugging an evil Go runtime bug

Fascinating, and a few reactions spring to mind.

First, I have to admit, the resolution simultaneously blew me away and felt very nostalgic. Forgetting that some instructions are not atomic is just the kind of mistake I saw people commit in kernel architecture support (I don't remember if I ever did it myself; it's quite possible, even on sun4c).

Also, my (former) colleague DaveJ (who's now consumed by Facebook -- I remember complaints about useful people "gone to Google and never heard from again", but Facebook is the same hole nowadays) once said, approximately: "Everyone loves to crap on Gentoo hackers for silly optimizations and being otherwise unprofessional, but when it's something interesting it's always (or often) them." The Gentoo crew is underrated, including their userbase.

And finally:

Go also happens to have a (rather insane, in my opinion) policy of reinventing its own standard library, so it does not use any of the standard Linux glibc code to call vDSO, but rather rolls its own calls (and syscalls too).

Usually you hear about this when their DNS resolver blows up, but it can be elsewhere, as in this case.

(h/t to a chatter in #animeblogger)

December 05, 2017 08:20 PM

November 28, 2017

Matthew Garrett: Potential impact of the Intel ME vulnerability

(Note: this is my personal opinion based on public knowledge around this issue. I have no knowledge of any non-public details of these vulnerabilities, and this should not be interpreted as the position or opinion of my employer)

Intel's Management Engine (ME) is a small coprocessor built into the majority of Intel CPU chipsets[0]. Older versions were based on the ARC architecture[1] running an embedded realtime operating system, but from version 11 onwards they've been small x86 cores running Minix. The precise capabilities of the ME have not been publicly disclosed, but it is at minimum capable of interacting with the network[2], display[3], USB, input devices and system flash. In other words, software running on the ME is capable of doing a lot, without requiring any OS permission in the process.

Back in May, Intel announced a vulnerability in the Active Management Technology (AMT) that runs on the ME. AMT offers functionality like providing a remote console to the system (so IT support can connect to your system and interact with it as if they were physically present), remote disk support (so IT support can reinstall your machine over the network) and various other bits of system management. The vulnerability meant that it was possible to log into systems with AMT enabled by supplying an empty authentication token, making it possible to log in without knowing the configured password.

This vulnerability was less serious than it could have been for a couple of reasons - the first is that "consumer"[4] systems don't ship with AMT, and the second is that AMT is almost always disabled (Shodan found only a few thousand systems on the public internet with AMT enabled, out of many millions of laptops). I wrote more about it here at the time.

How does this compare to the newly announced vulnerabilities? Good question. Two of the announced vulnerabilities are in AMT. The previous AMT vulnerability allowed you to bypass authentication, but restricted you to doing what AMT was designed to let you do. While AMT gives an authenticated user a great deal of power, it's also designed with some degree of privacy protection in mind - for instance, when the remote console is enabled, an animated warning border is drawn on the user's screen to alert them.

This vulnerability is different in that it allows an authenticated attacker to execute arbitrary code within the AMT process. This means that the attacker shouldn't have any capabilities that AMT doesn't, but it's unclear where various aspects of the privacy protection are implemented - for instance, if the warning border is implemented in AMT rather than in hardware, an attacker could duplicate that functionality without drawing the warning. If the USB storage emulation for remote booting is implemented as a generic USB passthrough, the attacker could pretend to be an arbitrary USB device and potentially exploit the operating system through bugs in USB device drivers. Unfortunately we don't currently know.

Note that this exploit still requires two things - first, AMT has to be enabled, and second, the attacker has to be able to log into AMT. If the attacker has physical access to your system and you don't have a BIOS password set, they will be able to enable it - however, if AMT isn't enabled and the attacker isn't physically present, you're probably safe. But if AMT is enabled and you haven't patched the previous vulnerability, the attacker will be able to access AMT over the network without a password and then proceed with the exploit. This is bad, so you should probably (1) ensure that you've updated your BIOS and (2) ensure that AMT is disabled unless you have a really good reason to use it.

The AMT vulnerability applies to a wide range of versions - everything from version 6 (which shipped around 2008) onwards. The other vulnerability that Intel describe is restricted to version 11 of the ME, which only applies to much more recent systems. This vulnerability allows an attacker to execute arbitrary code on the ME, which means they can do literally anything the ME is able to do. This probably also means that they are able to interfere with any other code running on the ME. While AMT has been the most frequently discussed part of this, various other Intel technologies are tied to ME functionality.

Intel's Platform Trust Technology (PTT) is a software implementation of a Trusted Platform Module (TPM) that runs on the ME. TPMs are intended to protect access to secrets and encryption keys and record the state of the system as it boots, making it possible to determine whether a system has had part of its boot process modified and denying access to the secrets as a result. The most common usage of TPMs is to protect disk encryption keys - Microsoft Bitlocker defaults to storing its encryption key in the TPM, automatically unlocking the drive if the boot process is unmodified. In addition, TPMs support something called Remote Attestation (I wrote about that here), which allows the TPM to provide a signed copy of information about what the system booted to a remote site. This can be used for various purposes, such as not allowing a compute node to join a cloud unless it's booted the correct version of the OS and is running the latest firmware version. Remote Attestation depends on the TPM having a unique cryptographic identity that is tied to the TPM and inaccessible to the OS.

PTT allows manufacturers to simply license some additional code from Intel and run it on the ME rather than having to pay for an additional chip on the system motherboard. This seems great, but if an attacker is able to run code on the ME then they potentially have the ability to tamper with PTT, which means they can obtain access to disk encryption secrets and circumvent Bitlocker. It also means that they can tamper with Remote Attestation, "attesting" that the system booted a set of software that it didn't or copying the keys to another system and allowing that to impersonate the first. This is, uh, bad.

Intel also recently announced Intel Online Connect, a mechanism for providing the functionality of security keys directly in the operating system. Components of this are run on the ME in order to avoid scenarios where a compromised OS could be used to steal the identity secrets - if the ME is compromised, this may make it possible for an attacker to obtain those secrets and duplicate the keys.

It's also not entirely clear how much of Intel's Software Guard Extensions (SGX) functionality depends on the ME. The ME does appear to be required for SGX Remote Attestation (which allows an application using SGX to prove to a remote site that it's the SGX app rather than something pretending to be it), and again if those secrets can be extracted from a compromised ME it may be possible to compromise some of the security assumptions around SGX. Again, it's not clear how serious this is because it's not publicly documented.

Various other things also run on the ME, including stuff like video DRM (ensuring that high resolution video streams can't be intercepted by the OS). It may be possible to obtain encryption keys from a compromised ME that allow things like Netflix streams to be decoded and dumped. From a user privacy or security perspective, these things seem less serious.

The big problem at the moment is that we have no idea what the actual process of compromise is. Intel state that it requires local access, but don't describe what kind. Local access in this case could simply require the ability to send commands to the ME (possible on any system that has the ME drivers installed), could require direct hardware access to the exposed ME (which would require either kernel access or the ability to install a custom driver) or even the ability to modify system flash (possible only if the attacker has physical access and enough time and skill to take the system apart and modify the flash contents with an SPI programmer). The other thing we don't know is whether it's possible for an attacker to modify the system such that the ME is persistently compromised or whether it needs to be re-compromised every time the ME reboots. Note that even the latter is more serious than you might think - the ME may only be rebooted if the system loses power completely, so even a "temporary" compromise could affect a system for a long period of time.

It's also almost impossible to determine if a system is compromised. If the ME is compromised then it's probably possible for it to roll back any firmware updates but still report that it's been updated, giving admins a false sense of security. The only way to determine for sure would be to dump the system flash and compare it to a known good image. This is impractical to do at scale.

So, overall, given what we know right now it's hard to say how serious this is in terms of real world impact. It's unlikely that this is the kind of vulnerability that would be used to attack individual end users - anyone able to compromise a system like this could just backdoor your browser instead with much less effort, and that already gives them your banking details. The people who have the most to worry about here are potential targets of skilled attackers, which means activists, dissidents and companies with interesting personal or business data. It's hard to make strong recommendations about what to do here without more insight into what the vulnerability actually is, and we may not know that until this presentation next month.

Summary: Worst case here is terrible, but unlikely to be relevant to the vast majority of users.

[0] Earlier versions of the ME were built into the motherboard chipset, but as portions of that were incorporated onto the CPU package the ME followed. Edit: Apparently I was wrong and it's still on the chipset
[1] A descendant of the SuperFX chip used in Super Nintendo cartridges such as Starfox, because why not
[2] Without any OS involvement for wired ethernet and for wireless networks in the system firmware, but requires OS support for wireless access once the OS drivers have loaded
[3] Assuming you're using integrated Intel graphics
[4] "Consumer" is a bit of a misnomer here - "enterprise" laptops like Thinkpads ship with AMT, but are often bought by consumers.


November 28, 2017 03:45 AM

November 26, 2017

Michael Kerrisk (manpages): Next Linux/UNIX System Programming course in Munich, 5-9 February, 2018

There are still some places free for my next 5-day Linux/UNIX System Programming course to take place in Munich, Germany, for the week of 5-9 February 2018.

The course is intended for programmers developing system-level, embedded, or network applications for Linux and UNIX systems, or programmers porting such applications from other operating systems (e.g., proprietary embedded/realtime operating systems or Windows) to Linux or UNIX. The course is based on my book, The Linux Programming Interface (TLPI), and covers topics such as low-level file I/O; signals and timers; creating processes and executing programs; POSIX threads programming; interprocess communication (pipes, FIFOs, message queues, semaphores, shared memory); and network programming (sockets).
     
The course has a lecture+lab format, and devotes substantial time to working on some carefully chosen programming exercises that put the "theory" into practice. Students receive printed and electronic copies of TLPI, along with a 600-page course book that includes all slides presented in the course. A reading knowledge of C is assumed; no previous system programming experience is needed.

Some useful links for anyone interested in the course:

Questions about the course? Email me via training@man7.org.

November 26, 2017 07:38 PM

Michael Kerrisk (manpages): man-pages-4.14 is released

I've released man-pages-4.14. The release tarball is available on kernel.org. The browsable online pages can be found on man7.org. The Git repository for man-pages is available on kernel.org.

This release resulted from patches, bug reports, reviews, and comments from 71 contributors. Nearly 400 commits changed more than 160 pages. In addition, 4 new manual pages were added.

Among the more significant changes in man-pages-4.14 are the following:

November 26, 2017 07:14 PM

Michael Kerrisk (manpages): man-pages-4.13 is released

I've released man-pages-4.13. The release tarball is available on kernel.org. The browsable online pages can be found on man7.org. The Git repository for man-pages is available on kernel.org.

This release resulted from patches, bug reports, reviews, and comments from around 40 contributors. The release is rather larger than average. (The context diff runs to more than 90k lines.) The release includes more than 350 commits and contains some fairly wide-ranging formatting fix-ups that meant that all 1028 existing manual pages saw some change(s). In addition, 5 new manual pages were added.

Among the more significant changes in man-pages-4.13 are the following:

A special thanks to Eugene Syromyatnikov, who contributed 30 patches to this release!

November 26, 2017 10:17 AM

November 25, 2017

Linux Plumbers Conference: Audio Recordings Posted

This year, by way of an experiment, we tried recording the audio of the talks track and one Microconference track (those in Platinum C) through the sound system. Unfortunately, because of technical problems, we have no recordings from Wednesday, but mostly complete ones from Thursday and Friday (missing only TPM Software Stack Status and Managing the Impact of Growing CPU Register State).

To find the audio, go to the full description of the talk or Microconference (click on the title) and scroll down to the bottom of the Abstract (just before the Tags section). The audio is downloadable mp3, so you can either stream it directly in your browser or download it for later offline listening.

If you find the audio useful (or not), please let us know (contact@linuxplumbersconf.org) so we can plan for doing it again next year.

November 25, 2017 03:30 PM

November 24, 2017

Paul E. Mc Kenney: Parallel Programming: November 2017 Update

This USA Thanksgiving holiday weekend features a new release of Is Parallel Programming Hard, And, If So, What Can You Do About It?.

This update includes more formatting and build-system improvements, bibliography updates, and better handling of listings, all courtesy of Akira Yokosawa; numerous fixes and updates from Junchang Wang, Pierre Kuo, SeongJae Park, and Yubin Ruan; a new futures section on quantum computing; updates to the formal-verification section based on recent collaborations; and a full rewrite of the memory-barriers section, which is now its own chapter. This rewrite was of course based on recent work with my partners in memory-ordering crime, Jade Alglave, Luc Maranget, Andrea Parri, and Alan Stern.

As always, git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git will be updated in real time.

November 24, 2017 11:21 PM

November 20, 2017

Davidlohr Bueso: Linux v4.14: Performance Goodies

Last week Linus released the v4.14 kernel with some noticeable performance changes. The following is an unsorted and incomplete list of changes that went in. Note that the term 'performance' can be vague in that some gains in one area can negatively affect another, so take everything with a grain of salt and reach your own conclusions.

sysvipc: scale key management

We began using relativistic hash tables for managing ipc keys, greatly improving on the previous O(N) lookups. As such, ipc_findkey() calls are significantly faster (+800% in some reaim file benchmarks) and we need not iterate over all elements each time. Improvements are seen even in scenarios where the number of keys is but a handful, so this is pretty much a win from any standpoint.
[Commit 0cfb6aee70bd]
 

interval-tree: fast overlap detection

With the new extended rbtree api that caches the smallest (leftmost) node, we always have that pointer available instead of paying an O(logN) walk to the end of the tree. This allows extending and completing the fast overlap detection for interval trees, speeding up (sub)tree searches when the interval is completely to the left or right of the current tree's max interval. In addition, a number of other rbtree users have been updated to use the new rbtree_cached, such as epoll, procfs and cfq.
[Commits cd9e61ed1eeb, 410bd5ecb276, 2554db916586, b2ac2ea6296a, f808c13fd373]
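As a rough illustration of the API shape, here is a minimal sketch assuming the v4.14 interface in include/linux/rbtree.h; the item structure and helpers are hypothetical:

#include <linux/rbtree.h>

struct item {
        struct rb_node node;
        unsigned long key;
};

static struct rb_root_cached my_tree = RB_ROOT_CACHED;

static void item_insert(struct item *new)
{
        struct rb_node **link = &my_tree.rb_root.rb_node, *parent = NULL;
        bool leftmost = true;

        while (*link) {
                struct item *cur = rb_entry(*link, struct item, node);

                parent = *link;
                if (new->key < cur->key) {
                        link = &parent->rb_left;
                } else {
                        link = &parent->rb_right;
                        leftmost = false;       /* not the new minimum */
                }
        }

        rb_link_node(&new->node, parent, link);
        /* the "leftmost" hint lets the rbtree code maintain the cache */
        rb_insert_color_cached(&new->node, &my_tree, leftmost);
}

static struct item *item_min(void)
{
        /* O(1): returns the cached leftmost node, no tree walk */
        struct rb_node *n = rb_first_cached(&my_tree);

        return n ? rb_entry(n, struct item, node) : NULL;
}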

sched: waitqueue bookmarks

A situation where constant NUMA migrations of a hot page caused a large number of page waiters to be awoken exposed some issues in the waitqueue implementation. In such cases, a large number of wakeups occurs while holding a spinlock, which causes significant unbounded latencies. Unlike wake_qs (used in futexes and locks), where batched wakeups are done without the lock, waitqueue bookmarks allow the kernel to pause and stop iterating the wake list so that another process has a chance to acquire the lock, then resume where it left off.
[Commit 3510ca20ece, 2554db916586, 11a19c7b099f]

x86 PCID (Process Context Identifier)

This is a 64-bit hardware feature that allows tagging TLB entries such that upon a context switch only the required entries are flushed. Virtualization (VT-x) has had a similar feature for a while, via VPID; on other archs it is called an address space ID. Linux's support is somewhat special. In order to avoid the x86 limitation of 4096 IDs (or processes), the implementation actually uses a PCID to identify a recently-used mm (process address space) on a per-cpu basis. An mm has no fixed PCID binding at all; instead, it is given a fresh PCID each time it's loaded, except in cases where we want to preserve the TLB, in which case we reuse a recent value. To illustrate: in a workload under kvm that ping-pongs two processes, dTLB misses were reduced by ~17x.
[Commits f39681ed0f48, b0579ade7cd8, 94b1b03b519b, 43858b4f25cf, cba4671af755, 0790c9aad849, 660da7c9228f, 10af6235e0d3]

 

ORC (Oops Rewind Capability) Unwinder

The much acclaimed replacement for frame pointers and the (out of tree) DWARF unwinder. Through simplicity, the end result is faster profiling, such as for perf. Experiments show a 20x performance increase using ORC vs DWARF when calling save_stack_trace 20,000 times via a single vfs_write. With respect to frame pointers, the ORC unwinder is more accurate across interrupt entry frames, and being able to disable frame pointers enables a 5-10% performance improvement across the entire kernel.
[Commit ee9f8fce9964, 39358a033b2e]

mm: choose swap device according to numa node

If the system has more than one swap device and the swap devices have node information, we can make use of it to decide which swap device to use in get_swap_pages() and get better performance. This change replaces the single global swap_avail list with a per-numa-node list: each numa node sees its own priority-based list of available swap devices. A swap device's priority can be promoted on its matching node's swap_avail_list. This shows ~25% improvement on a 2-node box when benchmarking random writes on an mmaped region with SSDs attached to each node, ensuring swapping in and out.
[Commit a2468cc9bfdf]

mm: reduce cost of page allocator

Upon page allocation, the per-zone statistics are updated, introducing overhead in the form of cacheline bouncing, responsible for ~30% of all CPU cycles spent allocating a single page. The networking folks have been known to complain about performance degradation when dealing with the memory management subsystem, particularly the page allocator. The fact that these NUMA-associated counters are rarely used allows the counter threshold that determines the frequency of updating the global counter with the percpu counters (hence cacheline bouncing) to be increased. This means hurting readers, but that's the point.
[Commit 3a321d2a3dde, 1d90ca897cb0, 638032224ed7]

archs: multibyte memset

New calls memset16(), memset32() and memset64() are introduced; they are like memset(), but allow the caller to fill the destination with a value larger than a single byte. There are a number of places in the kernel that can benefit from using an optimized function rather than a loop; sometimes for text size, sometimes for speed, and sometimes both. When supported by the architecture, a single instruction is used, such as stosq (store a quadword) on x86-64; when not available, the calls fall back to the slower loop implementation. Zram shows a 7% performance improvement on x86 with 100MB of non-zero deduplicatable data.
[Commits  3b3c4babd898, 03270c13c5ff, 4c51248533ad, 48ad1abef402]
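For a feel of the interface, here is a minimal sketch assuming the declarations in include/linux/string.h (fill_pixels() is a hypothetical caller, not kernel code):

#include <linux/string.h>

/* fill a 32bpp framebuffer-like buffer with one pixel value */
static void fill_pixels(u32 *fb, size_t npixels, u32 rgba)
{
        /* one wide fill instead of a byte-at-a-time loop; on x86-64
         * this can become a single rep-prefixed store sequence */
        memset32(fb, rgba, npixels);
}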

powerpc: improve TLB flushing

A few optimisations were also added to the radix MMU TLB flushing, mostly to avoid unnecessary Page Walk Cache (PWC) flushes when the structure of the tree is not changing.
[Commit a46cc7a90fd8, 424de9c6e3f8]

There are plenty of other performance optimizations out there, including ext4 parallel file creation and quotas, additional memset improvements in sparc, transparent hugepage migration and swap improvements, ipv6 (ip6_route_output()) optimizations, etc. Again, this list is partial and biased by me. For a fuller list of features, play with 'git log' or visit lwn (part1, part2) and kernelnewbies.

November 20, 2017 03:50 PM

November 15, 2017

Kees Cook: security things in Linux v4.14

Previously: v4.13.

Linux kernel v4.14 was released this last Sunday, and there’s a bunch of security things I think are interesting:

vmapped kernel stack on arm64
Similar to the same feature on x86, Mark Rutland and Ard Biesheuvel implemented CONFIG_VMAP_STACK for arm64, which moves the kernel stack to an isolated and guard-paged vmap area. With traditional stacks, there were two major risks when exhausting the stack: overwriting the thread_info structure (which contained the addr_limit field which is checked during copy_to/from_user()), and overwriting neighboring stacks (or other things allocated next to the stack). While arm64 previously moved its thread_info off the stack to deal with the former issue, this vmap change adds the last bit of protection by nature of the vmap guard pages. If the kernel tries to write past the end of the stack, it will hit the guard page and fault. (Testing for this is now possible via LKDTM’s STACK_GUARD_PAGE_LEADING/TRAILING tests.)

One aspect of the guard page protection that will need further attention (on all architectures) is that if the stack grew because of a giant Variable Length Array on the stack (effectively an implicit alloca() call), it might be possible to jump over the guard page entirely (as seen in the userspace Stack Clash attacks). Thankfully the use of VLAs is rare in the kernel. In the future, hopefully we’ll see the addition of PaX/grsecurity’s STACKLEAK plugin which, in addition to its primary purpose of clearing the kernel stack on return to userspace, makes sure stack expansion cannot skip over guard pages. This “stack probing” ability will likely also become directly available from the compiler as well.
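For illustration, the kind of pattern that makes such a jump possible looks roughly like this (hypothetical code; big_len would need to be attacker-influenced and larger than the guard region):

void handle_request(size_t big_len)
{
        char scratch[big_len];  /* implicit alloca(): moves the stack
                                 * pointer down big_len bytes in one step,
                                 * so it can step over the guard page
                                 * without ever touching it */

        scratch[0] = 0;         /* first write may already land in the
                                 * neighbouring mapping */
}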

set_fs() balance checking
Related to the addr_limit field mentioned above, another class of bug is finding a way to force the kernel into accidentally leaving addr_limit open to kernel memory through an unbalanced call to set_fs(). In some areas of the kernel, in order to reuse userspace routines (usually VFS or compat related), code will do something like: set_fs(KERNEL_DS); ...some code here...; set_fs(USER_DS);. When the USER_DS call goes missing (usually due to a buggy error path or exception), subsequent system calls can suddenly start writing into kernel memory via copy_to_user (where the “to user” really means “within the addr_limit range”).
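The bug class looks roughly like this (read_file_for_kernel() is a made-up example; set_fs()/get_fs() and KERNEL_DS/USER_DS were the real interfaces of that era):

static ssize_t read_file_for_kernel(struct file *f, void *buf,
                                    size_t len, loff_t *pos)
{
        mm_segment_t old_fs = get_fs();
        ssize_t ret;

        set_fs(KERNEL_DS);      /* widen addr_limit to all of memory */
        ret = vfs_read(f, (char __user *)buf, len, pos);
        if (ret < 0)
                return ret;     /* BUG: error path leaks KERNEL_DS */
        set_fs(old_fs);         /* the happy path restores the limit */
        return ret;
}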

Thomas Garnier implemented USER_DS checking at syscall exit time for x86, arm, and arm64. This means that a broken set_fs() setting will not extend beyond the buggy syscall that fails to set it back to USER_DS. Additionally, as part of the discussion on the best way to deal with this feature, Christoph Hellwig and Al Viro (and others) have been making extensive changes to avoid the need for set_fs() being used at all, which should greatly reduce the number of places where it might be possible to introduce such a bug in the future.

SLUB freelist hardening
A common class of heap attacks is overwriting the freelist pointers stored inline in the unallocated SLUB cache objects. PaX/grsecurity developed an inexpensive defense that XORs the freelist pointer with a global random value (and the storage address). Daniel Micay improved on this by using a per-cache random value, and I refactored the code a bit more. The resulting feature, enabled with CONFIG_SLAB_FREELIST_HARDENED, makes freelist pointer overwrites very hard to exploit unless an attacker has found a way to expose both the random value and the pointer location. This should render blind heap overflow bugs much more difficult to exploit.
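The core of the defense is tiny; here is a sketch of the idea described above (names approximate mm/slub.c in v4.14, so treat this as an outline rather than the exact code):

#ifdef CONFIG_SLAB_FREELIST_HARDENED
/* encode/decode a freelist pointer that lives at ptr_addr */
static inline void *freelist_ptr(const struct kmem_cache *s, void *ptr,
                                 unsigned long ptr_addr)
{
        /* XOR is its own inverse: stores and loads both go through this
         * helper, but a blind overwrite now has to know both the
         * per-cache random value and the storage address */
        return (void *)((unsigned long)ptr ^ s->random ^ ptr_addr);
}
#endif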

Additionally, Alexander Popov implemented a simple double-free defense, similar to the “fasttop” check in the GNU C library, which will catch sequential free()s of the same pointer. (And has already uncovered a bug.)

Future work would be to provide similar metadata protections to the SLAB allocator (though SLAB doesn’t store its freelist within the individual unused objects, so it has a different set of exposures compared to SLUB).

setuid-exec stack limitation
Continuing the various additional defenses to protect against future problems related to userspace memory layout manipulation (as shown most recently in the Stack Clash attacks), I implemented an 8MiB stack limit for privileged (i.e. setuid) execs, inspired by a similar protection in grsecurity, after reworking the secureexec handling by LSMs. This complements the unconditional limit to the size of exec arguments that landed in v4.13.

randstruct automatic struct selection
While the bulk of the port of the randstruct gcc plugin from grsecurity landed in v4.13, the last of the work needed to enable automatic struct selection landed in v4.14. This means that the coverage of randomized structures, via CONFIG_GCC_PLUGIN_RANDSTRUCT, now includes one of the major targets of exploits: function pointer structures. Without knowing the build-randomized location of a callback pointer an attacker needs to overwrite in a structure, exploits become much less reliable.

structleak passed-by-reference variable initialization
Ard Biesheuvel enhanced the structleak gcc plugin to initialize all variables on the stack that are passed by reference when built with CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL. Normally the compiler will yell if a variable is used before being initialized, but it silences this warning if the variable’s address is passed into a function call first, as it has no way to tell if the function did actually initialize the contents. So the plugin now zero-initializes such variables (if they hadn’t already been initialized) before the function call that takes their address. Enabling this feature has a small performance impact, but solves many stack content exposure flaws. (In fact at least one such flaw reported during the v4.15 development cycle was mitigated by this plugin.)
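The flaw pattern this closes looks like this (all names hypothetical):

struct info {
        u32 flags;
        u8 pad[12];
};

void fill_info(struct info *out);       /* may not write every byte */

static long get_info_ioctl(void __user *arg)
{
        struct info info;       /* its address escapes into fill_info(),
                                 * so the plugin zero-initializes it */

        fill_info(&info);       /* the compiler cannot prove this wrote
                                 * all fields, let alone padding holes */
        return copy_to_user(arg, &info, sizeof(info)) ? -EFAULT : 0;
}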

improved boot entropy
Laura Abbott and Daniel Micay improved early boot entropy available to the stack protector by both moving the stack protector setup later in the boot, and including the kernel command line in boot entropy collection (since with some devices it changes on each boot).

eBPF JIT for 32-bit ARM
The ARM BPF JIT had been around a while, but it didn’t support eBPF (and, as a result, did not provide constant value blinding, which meant it was exposed to being used by an attacker to build arbitrary machine code with BPF constant values). Shubham Bansal spent a bunch of time building a full eBPF JIT for 32-bit ARM which both speeds up eBPF and brings it up to date on JIT exploit defenses in the kernel.

seccomp improvements
Tyler Hicks addressed a long-standing deficiency in how seccomp could log action results. In addition to creating a way to mark a specific seccomp filter as needing to be logged with SECCOMP_FILTER_FLAG_LOG, he added a new action result, SECCOMP_RET_LOG. With these changes in place, it should be much easier for developers to inspect the results of seccomp filters, and for process launchers to generate logs for their child processes operating under a seccomp filter.

Additionally, I finally found a way to implement an often-requested feature for seccomp, which was to kill an entire process instead of just the offending thread. This was done by creating the SECCOMP_RET_ACTION_FULL mask (née SECCOMP_RET_ACTION) and implementing SECCOMP_RET_KILL_PROCESS.
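Put together, a minimal userspace sketch of both features might look like this, assuming the v4.14 uapi constants (the choice of ptrace as the blocked syscall is arbitrary):

#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int install_filter(void)
{
        struct sock_filter insns[] = {
                /* load the syscall number */
                BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                         offsetof(struct seccomp_data, nr)),
                /* kill the whole process (not just the thread) on ptrace */
                BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_ptrace, 0, 1),
                BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
                BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
                .len = sizeof(insns) / sizeof(insns[0]),
                .filter = insns,
        };

        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
                return -1;
        /* FLAG_LOG asks the kernel to log this filter's actions */
        return syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER,
                       SECCOMP_FILTER_FLAG_LOG, &prog);
}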

That’s it for now; please let me know if I missed anything. The v4.15 merge window is now open!

© 2017, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.

November 15, 2017 05:23 AM

November 14, 2017

James Morris: Save the Dates: Linux Security Summit Events for 2018

There will be a new European version of the Linux Security Summit for 2018, in addition to the established North American event.

The dates and locations are as follows:

Stay tuned for CFP announcements!

 

November 14, 2017 11:24 PM

November 13, 2017

Gustavo F. Padovan: The linuxdev-br conference was a success!

Last Saturday we had the first edition of the Linux Developer Conference Brazil, a conference born from the need for a meeting point in Brazil for the developers, enthusiasts and companies of the FOSS projects that form the core of modern Linux systems, be it in smartphones, the cloud, cars or TVs.

After a few years traveling to conferences around the world, I felt that Brazil lacked any forum like the ones abroad, so I came up with the idea of building one myself. I invited two friends of mine to take on the challenge, Bruno Dilly and João Moreira. We also got help from the University of Campinas, which allowed us to use its space - many thanks to Professor Islene Garcia.

Together we made linuxdev-br a success; the talks were great. Almost 100 people attended the conference, some of them traveling from quite far away places in Brazil. During the day we had João Avelino Bellomo Filho talking about SystemTap, Lucas Villa Real talking about virtualization with GoboLinux's Runner, and Felipe Neves talking about the Zephyr project. In the afternoon we had Fabio Estevam talking about Device Tree, Arnaldo Melo on perf tools and João Moreira on live patching. All videos are available here (in Portuguese).

To finish the day we had a Happy Hour paid by the sponsors of the conference. It was a great opportunity to have some beers and interact with other attendees.

I want to thank everyone who joined us in the first edition; next year it will be even better. By the way, talking about next year: the conference language will be English. We want linuxdev-br to become part of the international cycle of conferences! Stay tuned, and if you want to take part, talk or sponsor, please reach us at contact@linuxdev-br.net.

November 13, 2017 03:33 PM

November 07, 2017

Dave Airlie (blogspot): radv on Ubuntu broken in distro packages

It appears that the Ubuntu mesa 17.2.2 packages that ship radv have patches to enable Mir support. These patches actually just break radv instead. I'd seen some people complain that simple apps don't work on radv, saying radv wasn't ready for use and asking how anyone could think of using it, and I just wondered what they had been smoking, as Fedora was working fine. Hopefully Canonical can sort that out ASAP.

November 07, 2017 07:35 PM

Pete Zaitcev: ProxyFS opened, I think

Not exactly sure if that thing is complete, and I didn't attend the announcement (at the OpenStack Summit in Sydney, presumably), but it appears that SwiftStack has open-sourced ProxyFS. The project was announced to the world a year and a half ago.

UPDATE: The SwiftStack product is called "File Access", but AFAIK the project is still "ProxyFS".

November 07, 2017 02:25 AM

October 26, 2017

Pete Zaitcev: Polite like Sphinx

Exception occurred:
   File "/usr/lib/python2.7/site-packages/sphinx/util/logging.py", line 363, in filter
     raise SphinxWarning(message % record.args)
TypeError: not all arguments converted during string formatting
The full traceback has been saved in /tmp/sphinx-err-SD2Ra4.log, if you want to report the issue to the developers.

Love how modest this package is.

October 26, 2017 08:26 PM

October 21, 2017

Pavel Machek: Prague and Nokia N900s

If you are travelling to Prague for ELCE and have a Nokia N900, N9 or N950, or spare parts for them, please take them with you. I may help you install postmarketOS there (https://wiki.postmarketos.org/wiki/Main_Page), can probably charge an N900 that does not charge, and spare parts would be useful for me. I have a talk about cameras, and will be around... https://osseu17.sched.com/event/ByYH/cheap-complex-cameras-pavel-machek-denx-software-engineering-gmbh .

October 21, 2017 10:25 PM

October 20, 2017

James Morris: Security Session at the 2017 Kernel Summit

For folks attending Open Source Summit Europe next week in Prague, note that there is a security session planned as part of the co-located Kernel Summit technical track.

This year, the Kernel Summit is divided into two components:

  1. An invitation-only maintainer summit of 30 people total, and;
  2. An open kernel summit technical track which is open to all attendees of OSS Europe.

The security session is part of the latter.  The preliminary agenda for the kernel summit technical track was announced by Ted Ts’o here:

There is also a preliminary agenda for the security session, here:

Currently, the agenda includes an update from Kees Cook on the Kernel Self Protection Project, and an update from Jarkko Sakkinen on TPM support.  I’ll provide a summary of the recent Linux Security Summit, depending on available time, perhaps focusing on security namespacing issues.

This agenda is subject to change and if you have any topics to propose, please send an email to the ksummit-discuss list.

 

October 20, 2017 12:23 AM

October 16, 2017

Greg Kroah-Hartman: Linux Kernel Community Enforcement Statement FAQ

Based on the recent Linux Kernel Community Enforcement Statement and the article describing the background and what it means, here are some questions/answers to help clear things up. These are based on questions that came up when the statement was discussed among the initial round of over 200 different kernel developers.

Q: Is this changing the license of the kernel?

A: No.

Q: Seriously? It really looks like a change to the license.

A: No, the license of the kernel is still GPLv2, as before. The kernel developers are providing certain additional promises that they encourage users and adopters to rely on. And by having a specific acking process it is clear that those who ack are making commitments personally (and perhaps, if authorized, on behalf of the companies that employ them). There is nothing that says those commitments are somehow binding on anyone else. This is exactly what we have done in the past when some but not all kernel developers signed off on the driver statement.

Q: Ok, but why have this “additional permissions” document?

A: In order to help address problems caused by current and potential future copyright “trolls” aka monetizers.

Q: Ok, but how will this help address the “troll” problem?

A: “Copyright trolls” use the GPL-2.0’s immediate termination and the threat of an immediate injunction to turn an alleged compliance concern into a contract claim that gives the troll an automatic claim for money damages. The article by Heather Meeker describes this quite well, please refer to that for more details. If even a short delay is inserted for coming into compliance, that delay disrupts this expedited legal process.

By simply saying, "We think you should have 30 days to come into compliance", we undermine that "immediacy" which supports the request to the court for an immediate injunction. The threat of an immediate injunction was used to get companies to sign contracts. Then the troll goes back after the same company for another known violation shortly after and claims they're owed the financial penalty for breaking the contract. Signing contracts to pay damages to financially enrich one individual is completely at odds with our community's enforcement goals.

We are showing that the community is not out for financial gain when it comes to license issues – though we do care about the company coming into compliance.  All we want is the modifications to our code to be released back to the public, and for the developers who created that code to become part of our community so that we can continue to create the best software that works well for everyone.

This is all still entirely focused on bringing the users into compliance. The 30 days can be used productively to determine exactly what is wrong, and how to resolve it.

Q: Ok, but why are we referencing GPL-3.0?

A: By using the terms from the GPLv3 for this, we use a very well-vetted and understood procedure for granting the opportunity to come fix the failure and come into compliance. We benefit from many months of work to reach agreement on a termination provision that worked in legal systems all around the world and was entirely consistent with Free Software principles.

Q: But what is the point of the “non-defensive assertion of rights” disclaimer?

A: If a copyright holder is attacked, we don’t want or need to require that copyright holder to give the party suing them an opportunity to cure. The “non-defensive assertion of rights” is just a way to leave everything unchanged for a copyright holder that gets sued.  This is no different a position than what they had before this statement.

Q: So you are ok with using Linux as a defensive copyright method?

A: There is a current copyright troll problem that is undermining confidence in our community – where a “bad actor” is attacking companies in a way to achieve personal gain. We are addressing that issue. No one has asked us to make changes to address other litigation.

Q: Ok, this document sounds like it was written by a bunch of big companies, who is behind the drafting of it and how did it all happen?

A: Grant Likely, the chairman at the time of the Linux Foundation's Technical Advisory Board (TAB), wrote the first draft of this document when the first copyright troll issue happened a few years ago. He did this as numerous companies and developers approached the TAB asking that the Linux kernel community do something about this new attack on our community. He showed the document to a lot of kernel developers and a few company representatives in order to get feedback on how it should be worded. After the troll seemed to go away, this work got put on the back-burner. When the copyright troll showed back up, along with a few other "copycat" individuals, work on the document was started back up by Chris Mason, the current chairman of the TAB. He worked with the TAB members, other kernel developers, lawyers who have been trying to defend these claims in Germany, and the Linux Foundation's lawyers, in order to rework the document so that it would actually achieve the intended benefits and be useful in stopping these new attacks. The document was then reviewed and revised with input from Linus Torvalds, and finally a document that the TAB agreed would be sufficient was finished. That document was then sent by Greg Kroah-Hartman to over 200 of the most active kernel developers of the past year to see if they, or their company, wished to support the document. That produced the initial "signatures" on the document, and the acks of the patch that added it to the Linux kernel source tree.

Q: How do I add my name to the document?

A: If you are a developer of the Linux kernel, simply send Greg a patch adding your name to the proper location in the document (sorting the names by last name), and he will be glad to accept it.

Q: How can my company show its support of this document?

A: If you are a developer working for a company that wishes to show that they also agree with this document, have the developer put the company name in ‘(’ ‘)’ after the developer’s name. This shows that both the developer, and the company behind the developer are in agreement with this statement.

Q: How can a company or individual that is not part of the Linux kernel community show its support of the document?

A: Become part of our community! Send us patches, surely there is something that you want to see changed in the kernel. If not, wonderful, post something on your company web site, or personal blog in support of this statement, we don’t mind that at all.

Q: I’ve been approached by a copyright troll for Netfilter. What should I do?

A: Please see the Netfilter FAQ here for how to handle this.

Q: I have another question, how do I ask it?

A: Email Greg or the TAB, and they will be glad to help answer it.

October 16, 2017 09:05 AM

Greg Kroah-Hartman: Linux Kernel Community Enforcement Statement

By Greg Kroah-Hartman, Chris Mason, Rik van Riel, Shuah Khan, and Grant Likely

The Linux kernel ecosystem of developers, companies and users has been wildly successful by any measure over the last couple decades. Even today, 26 years after the initial creation of the Linux kernel, the kernel developer community continues to grow, with more than 500 different companies and over 4,000 different developers getting changes merged into the tree during the past year. As Greg always says every year, the kernel continues to change faster this year than the last, this year we were running around 8.5 changes an hour, with 10,000 lines of code added, 2,000 modified, and 2,500 lines removed every hour of every day.

The stunning growth and widespread adoption of Linux, however, also requires ever evolving methods of achieving compliance with the terms of our community’s chosen license, the GPL-2.0. At this point, there is no lack of clarity on the base compliance expectations of our community. Our goals as an ecosystem are to make sure new participants are made aware of those expectations and the materials available to assist them, and to help them grow into our community.  Some of us spend a lot of time traveling to different companies all around the world doing this, and lots of other people and groups have been working tirelessly to create practical guides for everyone to learn how to use Linux in a way that is compliant with the license. Some of these activities include:

Unfortunately the same processes that we use to assure fulfillment of license obligations and availability of source code can also be used unjustly in trolling activities to extract personal monetary rewards. In particular, issues have arisen as a developer from the Netfilter community, Patrick McHardy, has sought to enforce his copyright claims in secret and for large sums of money by threatening or engaging in litigation. Some of his compliance claims are issues that should and could easily be resolved. However, he has also made claims based on ambiguities in the GPL-2.0 that no one in our community has ever considered part of compliance.  

Examples of these claims have been: distributing over-the-air firmware; requiring a cell phone maker to deliver a paper copy of a source code offer letter; claiming the source code server must be set up with a download speed as fast as the binary server, based on the "equivalent access" language of Section 3; requiring the GPL-2.0 to be delivered in a local language; and many others.

How he goes about this activity was recently documented very well by Heather Meeker.

Numerous active contributors to the kernel community have tried to reach out to Patrick to have a discussion about his activities, to no response. Further, the Netfilter community suspended Patrick from contributing for violations of their principles of enforcement. The Netfilter community also published their own FAQ on this matter.

While the kernel community has always supported enforcement efforts to bring companies into compliance, we have never even considered enforcement for the purpose of extracting monetary gain.  It is not possible to know an exact figure due to the secrecy of Patrick’s actions, but we are aware of activity that has resulted in payments of at least a few million Euros.  We are also aware that these actions, which have continued for at least four years, have threatened the confidence in our ecosystem.

Because of this, and to help clarify what the majority of Linux kernel community members feel is the correct way to enforce our license, the Technical Advisory Board of the Linux Foundation has worked together with lawyers in our community, individual developers, and many companies that participate in the development of, and rely on Linux, to draft a Kernel Enforcement Statement to help address both this specific issue we are facing today, and to help prevent any future issues like this from happening again.

A key goal of all enforcement of the GPL-2.0 license has and continues to be bringing companies into compliance with the terms of the license. The Kernel Enforcement Statement is designed to do just that.  It adopts the same termination provisions we are all familiar with from GPL-3.0 as an Additional Permission giving companies confidence that they will have time to come into compliance if a failure is identified. Their ability to rely on this Additional Permission will hopefully re-establish user confidence and help direct enforcement activity back to the original purpose we have all sought over the years – actual compliance.  

Kernel developers in our ecosystem may put their own acknowledgement to the Statement by sending a patch to Greg adding their name to the Statement, like any other kernel patch submission, and it will be gladly merged. Those authorized to ‘ack’ on behalf of their company may add their company name in (parenthesis) after their name as well.

Note, a number of questions did come up when this was discussed with the kernel developer community. Please see Greg’s FAQ post answering the most common ones if you have further questions about this topic.

October 16, 2017 09:00 AM

Pavel Machek: Help time travelers!

Ok, so I have various machines here. It seems only about half of them have a working RTC. Those are the boring ones.

And even the boring ones have pretty imprecise RTCs... For example the Nokia N9. I only power it up from time to time, and I believe it drifts something like a minute per month... For normal use with a SIM card, it can probably correct from the GSM network if you happen to have a cell phone signal, but...

More interesting machines... The old thinkpad is running without a CMOS battery. The ARM OLPC has _three_ RTCs, but not a single working one. The N900 has a working RTC but no, or a dead, backup battery. On these, the RTC driver probably knows the time is not valid, but feeds the garbage into the system time anyway. Ouch. Neither the Sharp Zaurus SL-5500 nor the C-3000 had battery backup on the RTC...

Even in new end-user machines, time quality varies a lot. "First boot, please enter time" is only accurate to seconds, if the user is careful. The RTC is usually not very accurate either... and no one uses adjtime these days. GSM time and ntpdate are probably accurate to milliseconds, GPS can provide time down to nanoseconds... And broken systems are so common that "swclock" is available in init systems to store the time in a file, so it at least does not go backwards.

https (and other crypto) depends on time... so it is important to know at least the approximate month we are in.

Is it time we handle it better?

Could we return both time and log2(expected error) from system calls?

That way we could hide the clock in GUI if time is not available or not precise to minutes, ignore certificate dates when time is not precise to months, and you would not have to send me a "Pavel, are you time traveling, again?" message next time my mailer sends email dated to 1970.
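Nothing like this exists, but as a purely hypothetical sketch of the idea (every name below is invented for illustration):

#include <time.h>

struct timespec_err {
        struct timespec ts;
        int log2_err_ns;        /* log2 of the expected error, in ns */
};

/* hypothetical syscall wrapper and GUI helper */
int clock_gettime_err(clockid_t clk, struct timespec_err *t);
void draw_clock(const struct timespec *ts);

void maybe_draw_clock(void)
{
        struct timespec_err t;

        if (clock_gettime_err(CLOCK_REALTIME, &t))
                return;
        if (t.log2_err_ns > 36)         /* 2^36 ns is roughly a minute */
                return;                 /* hide the clock, as suggested */
        draw_clock(&t.ts);
}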

October 16, 2017 07:38 AM

October 14, 2017

James Bottomley: Using Elliptic Curve Cryptography with TPM2

One of the most significant advances going from TPM1.2 to TPM2 was the addition of algorithm agility: The ability of TPM2 to work with arbitrary symmetric and asymmetric encryption schemes.  In practice, in spite of this much vaunted agile encryption capability, most actual TPM2 chips I’ve seen only support a small number of asymmetric encryption schemes, usually RSA2048 and a couple of Elliptic Curves.  However, the ability to support any Elliptic Curve at all is a step up from TPM1.2.  This blog post will detail how elliptic curve schemes can be integrated into existing cryptographic systems using TPM2.  However, before we start on the practice, we need at least a tiny swing through the theory of Elliptic Curves.

What is an Elliptic Curve?

An Elliptic Curve (EC) is simply the set of points that lie on the curve in the two dimensional plane (x,y) defined by the equation

y² = x³ + ax + b

which means that every elliptic curve can be parametrised by two constants a and b.  The set of all points lying on the curve plus a point at infinity is combined with an addition operation to produce an abelian (commutative) group.  The addition property is defined by drawing straight lines between two points and seeing where they intersect the curve (or picking the infinity point if they don’t intersect).  Wikipedia has a nice diagrammatic description of this here.  The infinity point acts as the identity of the addition rule and the whole group is denoted E.

The utility for cryptography is that you can define an integer multiplier operation which is simply the element added to itself n times, so for P ∈ E, you can always find Q ∈ E such that

Q = P + P + P … = n × P

And, since it’s a simple multiplication like operation, it’s very easy to compute Q.  However, given P and Q it is computationally very difficult to get back to n.  In fact, it can be demonstrated mathematically that trying to compute n is equivalent to the discrete logarithm problem which is the mathematical basis for the cryptographic security of RSA.  This also means that EC keys suffer the same (actually more so) problems as RSA keys: they’re not Quantum Computing secure (vulnerable to the Quantum Shor’s algorithm) and they would be instantly compromised if the discrete logarithm problem were ever solved.

Therefore, for any elliptic curve, E, you can choose a known point G ∈ E, select a large integer d and you can compute a point P = d × G.  You can then publish (P, G, E) as your public key knowing it’s computationally infeasible for anyone to derive your private key d.

For instance, Diffie-Hellman key exchange can be done by agreeing (E, G) and getting Alice and Bob to select private keys dA, dB.  Then knowing Bob’s public key PB, Alice can select a random integer r, which she publishes, and compute a key agreement as a secret point on the Elliptic Curve (r dA) × PB.  Bob can derive the same Elliptic Curve point because

(r d_A) × P_B = (r d_A) d_B × G = (r d_B) d_A × G = (r d_B) × P_A

The agreement is a point on the curve, but you can use an agreed hashing or other mechanism to get from the point to a symmetric key.

Seems simple, but the problem for computing is that we really want to use integers, and as it stands the elliptic curve is defined over all the real numbers, meaning E is of infinite size and the arithmetic involves floating point computations (of rather large precision).

Elliptic Curves over Finite Fields

Fortunately there is a mathematical theory of finite fields (Galois Fields), which allows us to take the field of integers modulo a prime p, denoted GF(p), and compute Elliptic Curve points over this field.  This derivation, which is mathematically rather complicated, is denoted E(GF(p)), where every point (x,y) is represented by a pair of integers between 0 and p-1.  A theorem (Hasse’s) says the number of elements in E(GF(p))

n = |E(GF(p))|

is roughly the same size as p, meaning if you choose a 32 bit prime p, you get a group of roughly 2^32 elements.  For every point P in E(GF(p)) it is also mathematically provable that n × P = 0, where 0 is the zero point (the point at infinity of the real elliptic curve).
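
(For the record, the “roughly the same size” statement is Hasse’s theorem, which bounds the group order n as

|n − (p + 1)| ≤ 2√p

so n can differ from p + 1 by at most about 2√p.)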

This means that you can take any point, G,  in E(GF(p)) and compute a subgroup based on it:

E_G = { m × G : m ∈ Z_n }

If you’re lucky |E_G| = |E(GF(p))| and G is the generator of the entire group.  However, G may only generate a subgroup, in which case |E(GF(p))| = h × |E_G|, where the integer h is called the cofactor.  In general you want the cofactor to be small (preferably less than four) for E_G to be cryptographically useful.

For a computer’s purposes, E_G is the elliptic curve group used for integer arithmetic in the cryptographic algorithms.  The curve and generator are then defined by (p, a, b, G_x, G_y, n, h), which are the published parameters of the key (G_x, G_y represent the x and y co-ordinates of point G).  You select a random number d as your private key, and your public key is P = d × G exactly as above, except now P is easy to compute with integer operations.
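
To make the integer arithmetic concrete, here is a toy sketch in C (my own illustration with made-up parameters far too small to be secure, not anything a real library does): it implements the chord/tangent addition rule mod p, double-and-add scalar multiplication, and the Diffie-Hellman agreement from earlier, over the curve y² = x³ + 2x + 3 mod 97 with generator G = (3, 6).

#include <stdio.h>
#include <stdint.h>

#define P 97    /* toy prime modulus */
#define A 2     /* curve parameter a in y^2 = x^3 + ax + b */

typedef struct { int64_t x, y; int inf; } point;  /* inf: point at infinity */

static int64_t mod(int64_t v) { return ((v % P) + P) % P; }

/* modular inverse via Fermat's little theorem: v^(P-2) mod P */
static int64_t inv(int64_t v)
{
        int64_t r = 1, b = mod(v), e = P - 2;
        for (; e; e >>= 1, b = b * b % P)
                if (e & 1)
                        r = r * b % P;
        return r;
}

/* chord/tangent group addition */
static point add(point p, point q)
{
        point r = { 0, 0, 1 };
        int64_t l;

        if (p.inf) return q;
        if (q.inf) return p;
        if (p.x == q.x && mod(p.y + q.y) == 0) return r;  /* P + (-P) = 0 */
        if (p.x == q.x)  /* doubling: tangent slope (3x^2 + a) / 2y */
                l = mod((3 * p.x * p.x + A) * inv(2 * p.y));
        else             /* distinct points: chord slope */
                l = mod((q.y - p.y) * inv(q.x - p.x));
        r.x = mod(l * l - p.x - q.x);
        r.y = mod(l * (p.x - r.x) - p.y);
        r.inf = 0;
        return r;
}

/* double-and-add computation of n × G */
static point mul(int64_t n, point g)
{
        point r = { 0, 0, 1 };
        for (; n; n >>= 1, g = add(g, g))
                if (n & 1)
                        r = add(r, g);
        return r;
}

int main(void)
{
        point g = { 3, 6, 0 };                   /* generator on the curve */
        int64_t da = 10, db = 25;                /* toy private keys */
        point pa = mul(da, g), pb = mul(db, g);  /* public keys */
        point s1 = mul(da, pb), s2 = mul(db, pa);

        /* both sides derive the same shared point */
        printf("(%lld,%lld) == (%lld,%lld)\n",
               (long long)s1.x, (long long)s1.y,
               (long long)s2.x, (long long)s2.y);
        return 0;
}

Both printed points come out identical, since (d_A d_B) × G is the same point however you bracket it.  Real implementations use 256-bit-plus field elements and constant-time algorithms, but the group law is exactly this.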

Problems with Elliptic Curves

Although I stated above that solving P = d × G is equivalent in difficulty to the discrete logarithm problem, that’s not generally true.  If the discrete logarithm problem were solved, then we’d easily be able to compute d for every generator and curve, but it is possible to pick curves for which d can be easily computed without solving the discrete logarithm problem.  This is the reason why you should never pick your own curve parameters (even if you think you know what you’re doing): it’s very easy to choose a compromised curve.  As a demonstration of the difficulty of the problem: each of the major nation state actors, Russia, China and the US, publishes their own curve parameters for use in their own cryptographic EC implementations, and each of them thinks the parameters published by the others are compromised in a way that allows the respective national security agencies to derive private keys.  So if nation state actors can’t tell whether a curve is compromised or not, you surely won’t be able to either.

Therefore, to be secure in EC cryptography, you pick an existing curve which has been vetted and select some random Generator Point on it.  Of course, if you’re paranoid, that means you won’t be using any of the nation state supplied curves …

Using the TPM2 with Elliptic Curves in Cryptosystems

The initial target for this work was the openssl cryptosystem, whose libraries are widely used to build other systems (like https in apache, or openssh).  Originally, when I did the initial TPM2 enabling of openssl as described in this blog post, I added TPM2 as a patch to the existing TPM 1.2 openssl_tpm_engine.  Unfortunately, openssl_tpm_engine seems to be pretty much defunct at this point, so I started my own openssl_tpm2_engine as a separate git tree to begin experimenting with Elliptic Curve keys (if you don’t use git, you can download the tar file here).  One of the benefits of running my own source tree is that I can add a testing infrastructure that uses the IBM TPM emulator to check that the basic cryptographic operations all work, which means make check functions even when a TPM2 isn’t available.  The current key creation and import algorithms use secured connections to the TPM (to avoid eavesdropping), which means it’s only really possible to construct them using the IBM TSS.  To make all of this easier, I’ve set up an openSUSE Build Service repository which is building for all major architectures and the openSUSE and Fedora distributions (ignore the failures: they’re currently induced by the TPM emulator only working on 64 bit little endian systems, so make check fails elsewhere, but the TPM people at IBM are working on this, so eventually the builds should be complete).

TPM2 itself also has some annoying restrictions, the biggest of which is that it doesn’t allow you to pass in arbitrary elliptic curve parameters: you may only use elliptic curves which the TPM itself knows.  This will be annoying if you have an existing EC key you’re trying to import, because the TPM may reject it as an unknown algorithm.  For instance, openssl can compute with arbitrary EC parameters, but has 39 current elliptic curves parametrised by name.  By contrast, the Nuvoton TPM2 inside my Dell XPS 13 knows precisely two curves:

jejb@jarvis:~> create_tpm2_key --list-curves
prime256v1
bnp256

However, assuming you’ve picked a compatible curve for your EC private key (and you’ve defined a parent key for the storage hierarchy), you can simply import it to a TPM-bound key:

create_tpm2_key -p 81000001 -w key.priv key.tpm

The tool will report an error if it can’t convert the curve parameters to a named elliptic curve known to the TPM:

jejb@jarvis:~> openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:brainpoolP256r1 > key.priv
jejb@jarvis:~> create_tpm2_key -p 81000001 -w key.priv key.tpm
TPM does not support the curve in this EC key
openssl_to_tpm_public failed with 166
TPM_RC_CURVE - curve not supported Handle number unspecified

You can also create TPM-resident private keys simply by specifying the algorithm:

create_tpm2_key -p 81000001 --ecc bnp256 key.tpm

Once you have your TPM-based EC keys, you can use them to create public keys and certificates.  For instance, you can create a self-signed X509 certificate based on the TPM key with:

openssl req -new -x509 -sha256  -key key.tpm -engine tpm2 -keyform engine -out my.crt

Why you should use EC keys with the TPM

The initial attraction is the same as for RSA keys: making it impossible to extract your private key from the system.  However, the mathematical calculations for EC keys are much simpler than for RSA keys and don’t involve finding strong primes, so it’s much easier for the TPM (being a fairly weak calculation machine) to derive private and public EC keys.  For instance, the times taken to derive an EC key and an RSA key from the primary seed differ dramatically:

jejb@jarvis:~> time tsscreateprimary -hi o -ecc bnp256 -st
Handle 80ffffff

real 0m0.111s
user 0m0.000s
sys 0m0.014s

jejb@jarvis:~> time tsscreateprimary -hi o -rsa -st
Handle 80ffffff

real 0m20.473s
user 0m0.015s
sys 0m0.084s

so for a slow system like the TPM, using EC keys is a significant speed advantage.  There are other advantages too.  The standard EC key signature algorithm is a modification of the NIST Digital Signature Algorithm called ECDSA.  However, DSA and ECDSA require a cryptographically strong (and secret) random number for every signature, as Sony found out to their cost in the EC key compromise of the PlayStation 3.  The TPM is a good source of cryptographically strong random numbers, and if it generates the signature internally, you can be absolutely sure the input random number is kept secret.

Why you might want to avoid EC keys altogether

In spite of the many advantages described above, EC keys suffer one additional disadvantage over RSA keys: Elliptic Curves in general are a very hot field of mathematical research, so even if the curve you use today is genuinely not compromised, it’s not impossible that a mathematical advance tomorrow will make the curve you chose (and thus all the private keys you generated) vulnerable.  Of course, the same goes for RSA if anyone ever cracks its underlying hard problem (integer factorisation), but solving that would likely be fully published, to world acclaim and recognition as a significant contribution to the advancement of number theory.  Discovering an attack on a currently used elliptic curve, on the other hand, might be better remunerated by offering to sell it privately to one of the national security agencies …

October 14, 2017 10:52 PM

October 11, 2017

Paul E. Mc Kenney: Stupid RCU Tricks: In the audience for a pair of RCU talks!

I had the privilege of attending CppCon last month. Michael Wong, Maged Michael, and I presented a parallel-programming overview, in which I presented the "Hardware and its Habits" chapter of Is Parallel Programming Hard, And, If So, What Can You Do About It?.

But the highlight for me was actually sitting in the audience for a pair of talks by people who had implemented RCU in C++.

Ansel Sermersheim presented a two-part talk entitled Multithreading is the answer. What is the question?. The second part of this talk covered lockless containers, and used a variant of RCU to implement a low-overhead libGuarded facility in order to more easily avoid deadlocks. The implementation is similar to the Linux-kernel real-time RCU implementation by Jim Houston and Joe Korty in that the counterpart to rcu_read_unlock() actively registers a quiescent state. Ansel's implementation goes further by also driving callback invocation from rcu_read_unlock(). Now I don't recommend this for a general-purpose RCU implementation due to the possibility of deadlock should a resource need to be held across rcu_read_unlock() and acquired within the callback. However, this approach should work just fine in the case where the callbacks just free memory and the memory allocator does not contain too many RCU read-side critical sections.

Fedor Pikus presented a talk entitled Read, Copy, Update, then what? RCU for non-kernel programmers, in which he gave a quite-decent introduction to use of RCU. This introduction included an improved version of my long-standing where-to-use-RCU diagram, which I fully intend to incorporate. I had a number of but-you-could moments, including the usual "put the size in with the array" advice, ways of updating things already exposed to readers, and the fact that RCU really can tolerate multiple writers, along with some concerns about counter overflow. Nevertheless, an impressive amount of great information in a one-hour talk!

It is very good to see more people making use of RCU!

October 11, 2017 09:47 PM

October 04, 2017

Dave Airlie (blogspot): radv: a conformant Vulkan driver (with caveats)

If you take a look at the conformant vulkan list, you might see entry 220.

Software in the Public Interest, Inc. 2017-10-04 Vulkan_1_0 220
AMD Radeon R9 285 Intel i5-4460 x86_64 Linux 4.13 X.org DRI3.

This is radv, and this is the first conformance submission done under the X.org (SPI) membership of the Khronos adopter program.

This submission was a bit of a trial run for the radv developers, using Mesa 17.2 + llvm 5.0 on Bas's R9 285 card.

We can extend this submission to cover all VI GPUs.

In practice we pass all the same tests on CIK and Polaris GPUs, but we will have to do complete submission runs on those when we get a chance.

But a major milestone/rubberstamp has been reached: radv is now a conformant Vulkan driver. Thanks go to Bas and all the other contributors and the people whose code we've leveraged!

October 04, 2017 07:49 PM

October 02, 2017

James Morris: Linux Security Summit 2017 Roundup

The 2017 Linux Security Summit (LSS) was held last month in Los Angeles over the 14th and 15th of September.  It was co-located with Open Source Summit North America (OSSNA) and the Linux Plumbers Conference (LPC).

[Image: LSS 2017 sign at the conference]

Once again we were fortunate to have general logistics managed by the Linux Foundation, allowing the program committee to focus on organizing technical content.  We had a record number of submissions this year and accepted approximately one third of them.  Attendance was very strong, with ~160 attendees — another record for the event.

[Image: LSS 2017 attendees]

On the day prior to LSS, attendees were able to access a day of LPC, which featured two tracks with a security focus: the TPMs microconf in the morning and the Containers microconf in the afternoon.

Many thanks to the LPC organizers for arranging the schedule this way and allowing LSS folk to attend the day!

Realtime notes were made of these microconfs via etherpad.

I was particularly interested in the topic of better integrating LSM with containers, as there is an increasingly common requirement for nesting of security policies, where each container may run its own apparently independent security policy, and also a potentially independent security model.  I proposed the approach of introducing a security namespace, where all security interfaces within the kernel are namespaced, including LSM.  It would potentially solve the container use-cases, and also the full LSM stacking case championed by Casey Schaufler (which would allow entirely arbitrary stacking of security modules).

This would be a very challenging project, to say the least, and one which is further complicated by containers not being a first-class citizen of the kernel.  This leads to security policy boundaries clashing with semantic functional boundaries, e.g. what does it mean from a security policy POV when you have namespaced filesystems but not networking?

Discussion turned to the idea that it is up to the vendor/user to configure containers in a way which makes sense for them, and similarly, they would also need to ensure that they configure security policy in a manner appropriate to that configuration.  I would say this means that semantic responsibility is pushed to the user with the kernel largely remaining a set of composable mechanisms, in relation to containers and security policy.  This provides a great deal of flexibility, but requires those building systems to take a great deal of care in their design.

There are still many issues to resolve, both upstream and at the distro/user level, and I expect this to be an active area of Linux security development for some time.  There were some excellent followup discussions in this area, including an approach which constrains the problem space. (Stay tuned)!

A highlight of the TPMs session was an update on the TPM 2.0 software stack, by Philip Tricca and Jarkko Sakkinen.  The slides may be downloaded here.  We should see a vastly improved experience over TPM 1.x with v2.0 hardware capabilities, and the new software stack.  I suppose the next challenge will be TPMs in the post-quantum era?

There were further technical discussions on TPMs and container security during subsequent days at LSS.  Bringing the two conference groups together here made for a very productive event overall.

TPMs microconf at LPC with Philip Tricca presenting on the 2.0 software stack.

This year, due to the overlap with LPC, we unfortunately did not have any LWN coverage.  There are, however, excellent writeups available from attendees.

There were many awesome talks.

The CII Best Practices Badge presentation by David Wheeler was an unexpected highlight for me.  CII refers to the Linux Foundation’s Core Infrastructure Initiative, a preemptive security effort for Open Source.  The Best Practices Badge Program is a secure development maturity model designed to allow open source projects to improve their security in an evolving and measurable manner.  There’s been very impressive engagement with the project from across open source, and I believe this is a critically important effort for security.

CII Badge Project adoption (from David Wheeler’s slides).

During Dan Cashman’s talk on SELinux policy modularization in Android O,  an interesting data point came up:

Interesting data from the talk: 44% of Android kernel vulns blocked by SELinux due to attack surface reduction. https://t.co/FnU544B3XP

— James Morris (@xjamesmorris) September 15, 2017

We of course expect to see application vulnerability mitigations arising from Mandatory Access Control (MAC) policies (SELinux, Smack, and AppArmor), but if you look closely this refers to kernel vulnerabilities.   So what is happening here?  It turns out that a side effect of MAC policies, particularly those implemented in tightly-defined environments such as Android, is a reduction in kernel attack surface.  It is generally more difficult to reach such kernel vulnerabilities when you have MAC security policies.  This is a side-effect of MAC, not a primary design goal, but nevertheless appears to be very effective in practice!

Another highlight for me was the update on the Kernel Self Protection Project led by Kees, which is now approaching its 2nd anniversary, and continues the important work of hardening the mainline Linux kernel itself against attack.  I would like to also acknowledge the essential and original research performed in this area by grsecurity/PaX, from which this mainline work draws.

From a new development point of view, I’m thrilled to see the progress being made by Mickaël Salaün, on Landlock LSM, which provides unprivileged sandboxing via seccomp and LSM.  This is a novel approach which will allow applications to define and propagate their own sandbox policies.  Similar concepts are available in other OSs such as OSX (seatbelt) and BSD (pledge).  The great thing about Landlock is its consolidation of two existing Linux kernel security interfaces: LSM and Seccomp.  This ensures re-use of existing mechanisms, and aids usability by utilizing already familiar concepts for Linux users.

Mickaël Salaün from ANSSI talking about his Landlock LSM work at #linuxsecuritysummit 2017 pic.twitter.com/wYpbHuLgm2

— LinuxSecuritySummit (@LinuxSecSummit) September 14, 2017

Overall I found it to be an incredibly productive event, with many new and interesting ideas arising and lots of great collaboration in the hallway, lunch, and dinner tracks.

Slides from LSS may be found linked to the schedule abstracts.

We did not have a video sponsor for the event this year, and we’ll work on that again for next year’s summit.  We have discussed holding LSS again next year in conjunction with OSSNA, which is expected to be in Vancouver in August.

We are also investigating a European LSS in addition to the main summit for 2018 and beyond, as a way to help engage more widely with Linux security folk.  Stay tuned for official announcements on these!

Thanks once again to the awesome event staff at LF, especially Jillian Hall, who ensured everything ran smoothly.  Thanks also to the program committee who review, discuss, and vote on every proposal, ensuring that we have the best content for the event, and who work on technical planning for many months prior to the event.  And of course thanks to the presenters and attendees, without whom there would literally and figuratively be no event :)

See you in 2018!

 

October 02, 2017 01:52 AM

September 25, 2017

Pavel Machek: Colorful LEDs

RGB LEDs do not exist according to the Linux LED subsystem. They are modeled as three separate LEDs (red, green and blue); that matches the hardware.

Unfortunately, it has problems. Let's begin with inconsistent naming: some drivers use the :r suffix, some use :red. There's no explicit grouping of the LEDs that form one light -- and thus no place to store parameters common to the whole light. (LEDs could be grouped by name.)

The RGB colorspace is pretty well defined, and people expect to set specific colors. Unfortunately... that does not work well with LEDs. First, LEDs are usually not balanced according to the human perception system, so full power to all three LEDs (255, 255, 255) may not result in white. Second, monitors normally apply gamma correction before displaying a color, so (128, 128, 128) does not correspond to 50% of light being produced. But LEDs normally use PWM, so (128, 128, 128) does correspond to 50% light. The result is that colors are completely off.
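
A minimal sketch of a fix on the userspace side (my own illustration, not code from any driver; 2.2 is only an approximation of the real piecewise sRGB curve):

#include <math.h>
#include <stdint.h>

/* map an 8-bit gamma-encoded (monitor-style) channel value to a
 * linear PWM duty cycle; 2.2 approximates the sRGB curve */
static uint8_t gamma_to_pwm(uint8_t v)
{
        return (uint8_t)(pow(v / 255.0, 2.2) * 255.0 + 0.5);
}

With this, a requested mid grey (128, 128, 128) maps to roughly (56, 56, 56), i.e. about 22% duty, much closer to what a monitor would show; per-LED white balancing would still need separate scaling.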

I tested the HSV colorspace for the LEDs. That would have the advantage that old triggers could still use selected colors... Unfortunately, on the N900, white is something like 15% blue, so balancing for it would significantly reduce the number of white intensities we can display.

September 25, 2017 08:30 AM

September 19, 2017

Pavel Machek: Unicsy phone

For a long time, I wanted a phone that runs Unix. And I got that: first Android, then Maemo on the Nokia N900. With Android I realized that running a Linux kernel is not enough. Android is really far away from a normal Unix machine, and I'd argue away from anything usable, too. Maemo was slightly closer, and probably could have been fixed if it had been open-source.

But I realized the Linux kernel is not really the most important part. There's more to Unix: compatibility with old apps, small programs where each one does one thing well, data in text formats so you can put them in git. Maemo got some parts right; at least you could run old apps in a useful way. But the most important data on the phone (contacts, calendar) were still locked away in sqlite.

And that is something I'd like to change: a phone that is ssh-friendly, text-editor-friendly and git-friendly. I call it the "Unicsy phone". No, I don't want to have to do `cat addressbook | grep Friend | cut -f 1` on the phone... graphical utilities are okay. But the console tools should still be there, and the file formats should be reasonable.

So there is the tui project, and recently the postmarketOS project appeared. The Nokia N900 is mostly supported by the mainline kernel (with the exceptions of bluetooth and camera, everything works). There's work to be done, but it looks doable.

More is missing in the userspace. The phone parts need work, as expected. What is more surprising... there's emacs org mode, with great calendar capabilities, but I could not find a matching application to display the data nicely and provide alerts. The situation is even worse for contacts; emacs org can help there, too, but there does not seem to be agreement that this is the way to go. (And again, graphical applications would be nice.)

September 19, 2017 10:17 PM

September 16, 2017

Pavel Machek: FlightGear fun

How to die in a Boeing 707, quick and easy. Take off, realize that you should set up fuel heating, open the Help menu aiming for the checklists... and hit "auto startup/shutdown" instead. Instantly lose all the engines. Fortunately, you are at 6000', so you start looking for an airport. Then you realize "hmm, perhaps I can do the startup thing now", and hit the menu item once again. But instead of running engines, you get fire warnings on all the engines. That does not look good. Confirm the fire, extinguish all four engines, and resume looking for an airport in range. Trim for best glide. Then number 3 comes up. Then number 4. Number one, and you know it will be easy. Number two as you fly over the runway... go around and do a normal approach.

September 16, 2017 11:41 AM

September 15, 2017

Linux Plumbers Conference: Linux Plumbers Conference Unconference Schedule Announced

Since we only have six proposals, we can schedule them all in the unconference session without any need for actual voting at breakfast.  On a purely random basis, the schedule will be:

Unconference I:

09:30 Test driven development (TDD) in the kernel – Knut Omang
11:00 Support for adding DT based thermal zones at runtime – Moritz Fischer
11:50 Restartable Sequences interaction with debugger single-stepping – Mathieu Desnoyers

Unconference II:

14:00 Automated testing of LKML patches with Clang – Nick Desaulniers
14:50 ktask: multithread cpu-intensive kernel work – Daniel Jordan
16:00 Soft Affinity for Workloads – Rohit Jain

I’ll add these to the Plumbers schedule (if the author doesn’t already have an account, I’ll show up as the speaker, but please take the above list as definitive for the actual speakers).

Looking forward to seeing you all at this exciting new event for Plumbers,

September 15, 2017 04:57 AM

September 14, 2017

Grant Likely: Arcade Panel Construction Time-Lapse Video

September 14, 2017 05:56 PM

Grant Likely: NeoPixel Arcade Buttons

September 14, 2017 05:54 PM

September 13, 2017

Grant Likely: Custom Arcade Control Panels

I’ve started building custom arcade controls for using with classic arcade game emulators. All Open Source and Open Hardware of course, with the source code up on GitHub.

OpenSCAD:Arcade is an arcade panel modeling tool written in OpenSCAD. It designs the arcade panel layout and produces lasercutter output and frame dimensions.

STM32F3-Discovery-Arcade is a prototype USB HID device for arcade controls. It currently supports GPIO joysticks and buttons, quadrature trackballs/spinners, and will drive up to 4 channels of NeoPixel RGB LED strings. The project has both custom STM32 firmware and a custom adaptor PCB designed with KiCad.

Please go take a look.

September 13, 2017 11:39 PM

Gustavo F. Padovan: Slides of my talk at Open Source Summit NA

I just delivered a talk today at Open Source Summit NA, here in LA, about everything we’ve been doing to support explicit synchronization on the Media and Graphics pipeline in the kernel. You can find the slides here.

The DRM side is already mainline, but V4L2 is currently my focus of work along with the linux-media community in the kernel. Blog posts about that should appear soon on this blog.

September 13, 2017 07:56 PM

September 11, 2017

Linux Plumbers Conference: New to Plumbers: Unconference on Friday

The hallway track is always a popular feature of Linux Plumbers Conference.  New ideas and solutions emerge all the time.  But sometimes you start a discussion, and want to pull others in before the conference ends, and just can’t quite make it work.

This year, we’re trying an experiment at Linux Plumbers and reserving a room for an unconference session on Friday, so that ad hoc problem solving sessions can be held for the topics with the most participant interest.

If there is a topic you want to have a 1 hour discussion around, please put it on the etherpad with:

 

Topic:  <something short>
Host(s): <person who will host the discussion>
Description:   <describe problem you want to talk about>

 

We’ll close down the topic page on Thursday night at 8pm, print the collected topics out in the morning, and post them in the room.  During the breakfast period (from 8 to 9am), those wanting to participate will be given four dots to vote with.  Vote by placing a dot on the topics of interest until 8:45am.  Sessions will be scheduled starting with the one with the most dots, in descending order, until we run out of sessions or time.

Schedule will be posted in the room on Friday morning.

September 11, 2017 05:54 PM

September 07, 2017

James Morris: Linux Plumbers Conference Sessions for Linux Security Summit Attendees

Folks attending the 2017 Linux Security Summit (LSS) next week may be also interested in attending the TPMs and Containers sessions at Linux Plumbers Conference (LPC) on the Wednesday.

The LPC TPMs microconf will be held in the morning and led by Matthew Garrett, while the containers microconf will be run by Stéphane Graber in the afternoon.  Several security topics will be discussed in the containers session, including namespacing and stacking of LSM, and namespacing of IMA.

Attendance on the Wednesday for LPC is at no extra cost for registered attendees of LSS.  Many thanks to the LPC organizers for arranging this!

There will be followup BOF sessions on LSM stacking and namespacing at LSS on Thursday, per the schedule.

This should be a very productive week for Linux security development: see you there!

September 07, 2017 01:44 AM

September 06, 2017

Linux Plumbers Conference: Linux Plumbers Conference Preliminary Schedule Published

You can see the schedule by clicking on the ‘schedule’ tab above or by going to this URL:

http://www.linuxplumbersconf.org/2017/ocw/events/LPC2017/schedule

If you’d like any changes, please email contact@linuxplumbersconf.org and we’ll see what we can do to accommodate your request.

Please also remember that the schedule is subject to change.

September 06, 2017 10:01 PM

Greg Kroah-Hartman: 4.14 == This year’s LTS kernel

As the 4.13 release has now happened, the merge window for the 4.14 kernel release is now open. I mentioned this many weeks ago, but as the word doesn’t seem to have gotten very far, based on various emails I’ve received recently, I figured I need to say it here as well.

So, here it is officially, 4.14 should be the next LTS kernel that I’ll be supporting with stable kernel patch backports for at least two years, unless it really is a horrid release and has major problems. If so, I reserve the right to pick a different kernel, but odds are, given just how well our development cycle has been going, that shouldn’t be a problem (although I guess I just doomed it now…)

As always, if people have questions about this, email me and I will be glad to discuss it, or talk to me in person next week at the LinuxCon^WOpenSourceSummit or Plumbers conference in Los Angeles, or at any of the other conferences I’ll be at this year (ELCE, Kernel Recipes, etc.)

September 06, 2017 02:41 PM

September 05, 2017

Kees Cook: security things in Linux v4.13

Previously: v4.12.

Here’s a short summary of some of the interesting security things in Sunday’s v4.13 release of the Linux kernel:

security documentation ReSTification
The kernel has been switching to formatting documentation with ReST, and I noticed that none of the Documentation/security/ tree had been converted yet. I took the opportunity to take a few passes at formatting the existing documentation and, at Jon Corbet’s recommendation, split it up between end-user documentation (which is mainly how to use LSMs) and developer documentation (which is mainly how to use various internal APIs). A bunch of these docs need some updating, so maybe with the improved visibility, they’ll get some extra attention.

CONFIG_REFCOUNT_FULL
Since Peter Zijlstra implemented the refcount_t API in v4.11, Elena Reshetova (with Hans Liljestrand and David Windsor) has been systematically replacing atomic_t reference counters with refcount_t. As of v4.13, there are now close to 125 conversions with many more to come. However, there were concerns over the performance characteristics of the refcount_t implementation from the maintainers of the net, mm, and block subsystems. In order to assuage these concerns and help the conversion progress continue, I added an “unchecked” refcount_t implementation (identical to the earlier atomic_t implementation) as the default, with the fully checked implementation now available under CONFIG_REFCOUNT_FULL. The plan is that for v4.14 and beyond, the kernel can grow per-architecture implementations of refcount_t that have performance characteristics on par with atomic_t (as done in grsecurity’s PAX_REFCOUNT).
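
As a sketch of what such a conversion looks like (an invented struct for illustration, not any specific upstream commit):

#include <linux/refcount.h>
#include <linux/slab.h>

struct foo {
        refcount_t usage;       /* was: atomic_t usage */
};

static void foo_get(struct foo *f)
{
        refcount_inc(&f->usage);                /* was: atomic_inc() */
}

static void foo_put(struct foo *f)
{
        if (refcount_dec_and_test(&f->usage))   /* was: atomic_dec_and_test() */
                kfree(f);
}

With CONFIG_REFCOUNT_FULL enabled, the refcount_* calls saturate and WARN instead of wrapping; with it disabled, they behave like the old atomic_t operations.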

CONFIG_FORTIFY_SOURCE
Daniel Micay created a version of glibc’s FORTIFY_SOURCE compile-time and run-time protection for finding overflows in the common string (e.g. strcpy, strcmp) and memory (e.g. memcpy, memcmp) functions. The idea is that since the compiler already knows the size of many of the buffer arguments used by these functions, it can already build in checks for buffer overflows. When all the sizes are known at compile time, this can actually allow the compiler to fail the build instead of continuing with a proven overflow. When only some of the sizes are known (e.g. destination size is known at compile-time, but source size is only known at run-time) run-time checks are added to catch any cases where an overflow might happen. Adding this found several places where minor leaks were happening, and Daniel and I chased down fixes for them.

One interesting note about this protection is that it only examines the size of the whole object (via __builtin_object_size(..., 0)). If you have a string within a structure, CONFIG_FORTIFY_SOURCE as currently implemented will only make sure that you can’t copy beyond the structure (so you can still overflow the string within the structure). The next step in enhancing this protection is to switch from 0 (above) to 1, which will use the closest surrounding subobject (e.g. the string). However, there are a lot of cases where the kernel intentionally copies across multiple structure fields, which means more fixes are needed before this higher level can be enabled.
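
A small illustration of the two modes (hypothetical struct and calls, purely for demonstration; the first call is deliberately bogus so a fortified build rejects it):

#include <linux/string.h>

struct msg {
        char tag[8];
        int seq;
};

static void fortify_demo(const char *src)
{
        struct msg m;
        char buf[8];

        /* both sizes known at compile time: the fortified memcpy()
         * can fail the build outright, since 16 > sizeof(buf) */
        memcpy(buf, "0123456789abcdef", 16);

        /* __builtin_object_size(m.tag, 0) covers the whole struct, so a
         * run-time overflow of tag[] into seq is not caught; mode 1
         * would limit the check to sizeof(m.tag) and catch it */
        strcpy(m.tag, src);
}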

NULL-prefixed stack canary
Rik van Riel and Daniel Micay changed how the stack canary is defined on 64-bit systems to always make sure that the leading byte is zero. This provides a deterministic defense against overflowing string functions (e.g. strcpy), since they will either stop an overflowing read at the NULL byte, or be unable to write a NULL byte, thereby always triggering the canary check. This does reduce the entropy from 64 bits to 56 bits for overflow cases where NULL bytes can be written (e.g. memcpy), but the trade-off is worth it. (Besides, x86_64’s canary was 32 bits until recently.)
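
The idea in miniature (a simplified sketch, not the kernel’s exact implementation):

#include <linux/random.h>

static unsigned long make_stack_canary(void)
{
        unsigned long canary = get_random_long();

        /* zero the low byte: on a little-endian machine this is the
         * first in-memory byte an overflowing string copy would have
         * to reproduce, and str* functions can't write a NUL mid-copy */
        canary &= ~0xffUL;
        return canary;
}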

IPC refactoring
Partially in support of allowing IPC structure layouts to be randomized by the randstruct plugin, Manfred Spraul and I reorganized the internal layout of how IPC is tracked in the kernel. The resulting allocations are smaller and much easier to deal with, even if I initially missed a few needed container_of() uses.

randstruct gcc plugin
I ported grsecurity’s clever randstruct gcc plugin to upstream. This plugin allows structure layouts to be randomized on a per-build basis, providing a probabilistic defense against attacks that need to know the location of sensitive structure fields in kernel memory (which is most attacks). By moving things around in this fashion, attackers need to perform much more work to determine the resulting layout before they can mount a reliable attack.

Unfortunately, due to the timing of the development cycle, only the “manual” mode of randstruct landed in upstream (i.e. marking structures with __randomize_layout). v4.14 will also have the automatic mode enabled, which randomizes all structures that contain only function pointers.
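
For illustration, the manual mode looks like this (an invented ops structure):

#include <linux/types.h>

struct sensitive_ops {
        int (*open)(void *ctx);
        ssize_t (*read)(void *ctx, char *buf, size_t len);
        void (*close)(void *ctx);
} __randomize_layout;   /* layout shuffled per build by the plugin */

In v4.14’s automatic mode, a structure like this (containing only function pointers) would be randomized even without the annotation.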

A large number of fixes to support randstruct have been landing from v4.10 through v4.13, most of which were already identified and fixed by grsecurity, but many were novel: in newly added drivers, in whitelisted cross-structure casts, in refactorings (like the IPC one noted above), or in a corner case on ARM found during upstream testing.

lower ELF_ET_DYN_BASE
One of the issues identified from the Stack Clash set of vulnerabilities was that it was possible to collide stack memory with the highest portion of a PIE program’s text memory since the default ELF_ET_DYN_BASE (the lowest possible random position of a PIE executable in memory) was already so high in the memory layout (specifically, 2/3rds of the way through the address space). Fixing this required teaching the ELF loader how to load interpreters as shared objects in the mmap region instead of as a PIE executable (to avoid potentially colliding with the binary it was loading). As a result, the PIE default could be moved down to ET_EXEC (0x400000) on 32-bit, entirely avoiding the subset of Stack Clash attacks. 64-bit could be moved to just above the 32-bit address space (0x100000000), leaving the entire 32-bit region open for VMs to do 32-bit addressing, but late in the cycle it was discovered that Address Sanitizer couldn’t handle it moving. With most of the Stack Clash risk only applicable to 32-bit, fixing 64-bit has been deferred until there is a way to teach Address Sanitizer how to load itself as a shared object instead of as a PIE binary.

early device randomness
I noticed that early device randomness wasn’t actually getting added to the kernel entropy pools, so I fixed that to improve the effectiveness of the latent_entropy gcc plugin.

That’s it for now; please let me know if I missed anything. As a side note, I was rather alarmed to discover that due to all my trivial ReSTification formatting, and tiny FORTIFY_SOURCE and randstruct fixes, I made it into the most active 4.13 developers list (by patch count) at LWN with 76 patches: a whopping 0.6% of the cycle’s patches. ;)

Anyway, the v4.14 merge window is open!

© 2017, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

September 05, 2017 11:01 PM

Linux Plumbers Conference: Testing and Fuzzing Microconference Accepted into the Linux Plumbers Conference

We’re pleased to announce that newcomer Microconference Testing and Fuzzing will feature at Plumbers in Los Angeles this year.

The agenda will feature the three fuzzers used for the Linux kernel (Trinity, Syzkaller and Perf), along with discussions of formal verification tools, how to test stable trees, and testing frameworks, plus a discussion and demonstration of the drm/i915 checkin and test infrastructure.

Additionally, we will hold a session aimed at improving the testing process for linux-stable and distro kernels. Please plan to attend if you have input into how to integrate additional testing and make these kernels more reliable. Participants will include Greg Kroah-Hartman and major distro kernel maintainers.

For more details on this, please see this microconference’s wiki page.

We hope to see you there!

September 05, 2017 08:04 AM

August 26, 2017

Matt Domsch: Twilio Voice to Pagerduty alert using Python Flask, Zappa, AWS Lambda & AWS API Gateway

My SaaS product DevOps team at Quest Software uses several monitoring services to notice problems (hopefully before end users see them), and raises alerts for our team using PagerDuty. We also frequently need to integrate with existing company and partner products, for example our internal helpdesk and customer-facing technical-support processes. In this case, the helpdesk team wanted to have a phone number they could call to raise an alert to our team. The first suggestion was to simply put my name down as the 24×7 on-call contact, and make it my problem to alert the right people. I scoffed. We already had PagerDuty in place – why couldn’t we use that too? Simply because we didn’t have a phone number hooked up to PagerDuty. So, let’s fix that.

A few searches quickly turned up a PagerDuty blog where David Hayes had done exactly this. Excellent! However, it was written to use Google App Engine, and my team has their processes predominately in Azure and AWS. I didn’t want to introduce yet another set of cloud services, for something conceptually so simple.

Twilio’s quickstarts do a nice job of showing how to use their API, and these use Flask for the web framework. How can I use Flask apps in AWS Lambda? Here enters Zappa, a tool for deploying Flask and Django apps into AWS Lambda & AWS API Gateway. Slick! Now I have all the pieces I need.

You can find the code on github. I’ve extended the quickstarts slightly, to have the phone response first prompt for the application that is experiencing issues, and record that in a session cookie to be retrieved later. Then it prompts the user to leave a message. With these two pieces of information, we have enough to create the PagerDuty incident for the proper application, including information about the caller gathered from Caller ID (in case the recording is garbled), and a link to the recorded message. Not too shabby for ~125 lines of “my” code, at a cost of ~$1/month to Twilio for the phone number, almost $0.00 for AWS, and a couple pennies if anyone actually calls to raise an alert.

August 26, 2017 07:19 PM

August 24, 2017

Pete Zaitcev: Oh not again

Fedora is mulling dropping 32-bit x86 again, after F26, which means I need to buy a new router. It's not like I cannot afford one... But it's such a hassle to migrate. I'm thinking about installing one in the background and then re-numbering it, in order to minimize issues. Even then, I cannot test, for instance, that VLANs work right until I actually phase the box into production. It's much easier to keep a compatible 32-bit box mirrored and ready on stand-by.

In a sense, the amazing ease of upgrades in modern Fedora lulled me into this. Before, I re-installed anyway, and so could roll 64-bit just as easily.

P.S. According to records at the hoster, my primary public VM was installed as Fedora 15 and continuously upgraded since then.

August 24, 2017 07:21 PM

August 17, 2017

Linux Plumbers Conference: Tracing/BPF Microconference Accepted into the Linux Plumbers Conference

Following on from the successful Tracing Microconference last year, we’re pleased to announce there will be a follow on at Plumbers in Los Angeles this year.

The agenda for this year will not focus only on tracing but will also include several topics around eBPF. eBPF now interacts with tracing, and there is still a lot of work to accomplish, such as building an infrastructure around the current tools to compile and utilize eBPF within the tracing framework. Topics outside of eBPF will include enhancing uprobes and tracing virtualized and layered environments. Of particular interest are new techniques to improve kernel to user space tracing integration. This includes usage of uftrace and better symbol resolution of user space addresses from within the kernel. Additionally, there will be a discussion of the challenges of real world use cases by non-kernel engineers.

For more details on this, please see this microconference’s wiki page.

We hope to see you there!

August 17, 2017 04:51 PM

August 16, 2017

Linux Plumbers Conference: Trusted Platform Module Microconference Accepted into the Linux Plumbers Conference

Following on from the TPM Microconference last year, we’re pleased to announce there will be a follow on at Plumbers in Los Angeles this year.

The agenda for this year will focus on a renewed attempt to unify the 2.0 TSS; cryptosystem integration to make TPMs just work for the average user; the current state of measured boot and where we’re going; using TXT with TPM in Linux and using TPM from containers.

For more details on this, please see this microconference’s wiki page.

We hope to see you there!

August 16, 2017 12:01 AM

August 14, 2017

Dave Airlie (blogspot): radv on SI and CIK GPU - update

I recently acquired an r7 360 (BONAIRE) and spent some time getting radv stable and passing the same set of conformance tests that VI and Polaris pass.

The main missing piece was 10-bit integer format clamping to work around a bug in the SI/CIK fragment shader output hardware, which truncates instead of clamping. The other missing piece was code for handling f16->f32 conversions according to the vulkan spec, which I'd previously fixed for VI.

I also looked at a trace from amdgpu-pro and noticed it was using ds_swizzle for the derivative calculations, which avoids accessing LDS memory. I wrote support to use this path for radv/radeonsi, since LLVM has supported the intrinsic for a while now.

With these fixed CIK is pretty much in the same place as VI/Polaris.

I then plugged in my SI (Tahiti) and got lots of GPU hangs and crashes. I fixed a number of SI-specific bugs (tiling and MSAA handling, stencil tiling). However, even with those fixed I was getting random hangs, and a bunch of people on a bugzilla had noticed the same thing. I eventually discovered that adding a shader pipeline and cache flush at the end of every command buffer fixed it (this took a few days to narrow down exactly). We aren't 100% sure why this is required on SI only; it may be a kernel bug, or a command processor bug, but it does mean radv on SI can now run games without hanging.

There are still a few CTS tests outstanding on SI only, and I'll probably get to them eventually, however I also got an RX Vega and once I get a newer BIOS for it from AMD I shall be spending some time fixing the radv support for it.

August 14, 2017 03:16 AM

August 10, 2017

Linux Plumbers Conference: Scheduler Workloads Microconference Accepted into the Linux Plumbers Conference

New to Linux Plumbers Conference this year, the Scheduler Workloads Microconference will focus on understanding various workloads and their impact on the Linux Kernel Scheduler. The objective is to initiate a cross organizational and architectural discussion involving currently available (or in development) benchmarks and their effectiveness in evaluating the scheduler for these workloads.

The agenda for this year will focus on sharing current workload and benchmark tools and traces and how these can be used to improve the various Linux subsystems, including power management and real time.  Given that benchmarking the Linux scheduler is a controversial topic and often depends on proprietary tools, we’ll also discuss how to develop fully open source tools and benchmarks for this.

For more details on this, please see this microconference’s wiki page.

We hope to see you there!

August 10, 2017 07:10 PM

August 08, 2017

Daniel Vetter: Why Github can't host the Linux Kernel Community

A while back at the awesome maintainerati I chatted with a few great fellow maintainers about how to scale really big open source projects, and how github forces projects into a certain way of scaling. The linux kernel has an entirely different model, which maintainers hosting their projects on github don’t understand, and I think it’s worth explaining why and how it works, and how it’s different.

Another motivation to finally get around to typing this all up is the HN discussion on my “Maintainers Don’t Scale” talk, where the top comment boils down to “… why don’t these dinosaurs use modern dev tooling?”. A few top kernel maintainers vigorously defend mailing lists and patch submissions over something like github pull requests, but at least some folks from the graphics subsystem would love more modern tooling which would be much easier to script. The problem is that github doesn’t support the way the linux kernel scales out to a huge number of contributors, and therefore we can’t simply move, not even just a few subsystems. And this isn’t just about hosting the git data (that part obviously works), but about how pull requests, issues and forks work on github.

Scaling, the Github Way

Git is awesome, because everyone can fork and create branches and hack on the code very easily. And eventually you have something good, and you create a pull request for the main repo and get it reviewed, tested and merged. And github is awesome, because it figured out a UI that makes this complex stuff all nice&easy to discover and learn about, and so makes it a lot simpler for new folks to contribute to a project.

But eventually a project becomes a massive success, and no amount of tagging, labelling, sorting, bot-herding and automating will be able to keep on top of all the pull requests and issues in a repository, and it’s time to split things up into more manageable pieces again. More importantly, with a certain size and age of a project, different parts need different rules and processes: the shiny new experimental library has different stability and CI criteria than the main code, and maybe you have some dumpster pile of deprecated plugins that aren’t supported, but which you can’t yet delete. You need to split up your humongous project into sub-projects, each with their own flavour of process and merge criteria and their own repo with their own pull request and issue tracking. Generally it takes a few tens to a few hundreds of full time contributors until the pain is big enough that such a huge reorganization is necessary.

Almost all projects hosted on github do this by splitting up their monorepo source tree into lots of different projects, each with its distinct set of functionality. Usually that results in a bunch of things that are considered the core, plus piles of plugins and libraries and extensions. All tied together with some kind of plugin or package manager, which in some cases directly fetches stuff from github repos.

Since almost every big project works like this, I don’t think it’s necessary to dwell on the benefits. But I’d like to highlight some of the issues this is causing.

Interlude: Why Pull Requests Exist

The linux kernel is one of the few projects I’m aware of which isn’t split up like this. Before we look at how that works - the kernel is a huge project and simply can’t be run without some sub-project structure - I think it’s interesting to look at why git does pull requests: On github pull request is the one true way for contributors to get their changes merged. But in the kernel changes are submitted as patches sent to mailing lists, even long after git has been widely adopted.

But the very first version of git supported pull requests. The audience of these first, rather rough, releases was kernel maintainers, git was written to solve Linus Torvalds’ maintainer problems. Clearly it was needed and useful, but not to handle changes from individual contributors: Even today, and much more back then, pull requests are used to forward the changes of an entire subsystem, or synchronize code refactoring or similar cross-cutting change across different sub-projects. As an example, the 4.12 network pull request from Dave S. Miller, committed by Linus: It contains 2k+ commits from 600 contributors and a bunch of merges for pull requests from subordinate maintainers. But almost all the patches themselves are committed by maintainers after picking up the patches from mailing lists, not by the authors themselves. This kernel process peculiarity that authors generally don’t commit into shared repositories is also why git tracks the committer and author separately.

Github’s innovation and improvement was then to use pull requests for everything, down to individual contributions. But that wasn’t what they were originally created for.

Scaling, the Linux Kernel Way

At first glance the kernel looks like a monorepo, with everything smashed into one place in Linus’ main repo. But that’s very far from it.

At first this just looks like a complicated way to fill everyone’s disk space with lots of stuff they don’t care about, but there’s a pile of compounding minor benefits that add up.

In short, I think this is a strictly more powerful model, since you can always fall back to doing things exactly as you would with multiple disjoint repositories. Heck, there are even kernel drivers which live in their own repository, disjoint from the main kernel tree, like the proprietary Nvidia driver. Granted, that is just a bit of source code glue around a blob, but since it can’t contain anything from the kernel for legal reasons, it is the perfect example.

This looks like a monorepo horror show!

Yes and no.

At first glance the linux kernel looks like a monorepo because it contains everything. And lots of people learned that monorepos are really painful, because past a certain size they just stop scaling.

But looking closer, it’s very, very far away from a single git repository. Just looking at the upstream subsystem and driver repositories gives you a few hundred. If you look at the entire ecosystem, including hardware vendors, distributions, other linux-based OS and individual products, you easily have a few thousand major repositories, and many, many more in total. Not counting any git repo that’s just for private use by individual contributors.

The crucial distinction is that linux has one single file hierarchy as the shared namespace across everything, but lots and lots of different repos for all the different pieces and concerns. It’s a monotree with multiple repositories, not a monorepo.

Examples, please!

Before I go into explaining why github cannot currently support this workflow, at least if you want to retain the benefits of the github UI and integration, we need some examples of how this works in practice. The short summary is that it’s all done with git pull requests between maintainers.

The simple case is percolating changes up the maintainer hierarchy, until it eventually lands in a tree somewhere that is shipped. This is easy, because the pull request only ever goes from one repository to the next, and so could be done already using the current github UI.

Much more fun are cross-subsystem changes, because then the pull request flow stops being an acyclic graph and morphs into a mesh. The first step is to get the changes reviewed and tested by all the involved subsystems and their maintainers. In the github flow this would be a pull request submitted to multiple repositories simultaneously, with the one single discussion stream shared among them all. Since this is the kernel, this step is done through patch submission with a pile of different mailing lists and maintainers as recipients.

The way it’s reviewed is usually not the way it’s merged, instead one of the subsystems is selected as the leading one and takes the pull requests, as long as all other maintainers agree to that merge path. Usually it’s the subsystem most affected by a set of changes, but sometimes also the one that already has some other work in-flight which conflicts with the pull request. Sometimes also an entirely new repository and maintainer crew is created, this often happens for functionality which spans the entire tree and isn’t neatly contained to a few files and directories in one place. A recent example is the DMA mapping tree, which tries to consolidate work that thus far has been spread across drivers, platform maintainers and architecture support groups.

But sometimes there’s multiple subsystems which would both conflict with a set of changes, and which would all need to resolve some non-trivial merge conflict. In that case the patches aren’t just directly applied (a rebasing pull request on github), but instead the pull request with just the necessary patches, based on a commit common to all subsystems, is merged into all subsystem trees. The common baseline is important to avoid polluting a subsystem tree with unrelated changes. Since the pull is for a specific topic only, these branches are commonly called topic branches.

One example I was involved with added code for audio-over-HDMI support, which spanned both the graphics and sound driver subsystems. The same commits from the same pull request were merged into both the Intel graphics driver tree and the sound subsystem tree.

An entirely different sign that this isn’t insane: the only other relevant general-purpose large-scale OS project in the world also decided to have a monotree, with a commit flow modelled similarly to what’s going on in linux. I’m talking about the folks with such a huge tree that they had to write an entire new GVFS virtual filesystem provider to support it …

Dear Github

Unfortunately github doesn’t support this workflow, at least not natively in the github UI. It can of course be done with just plain git tooling, but then you’re back to patches on mailing lists and pull requests over email, applied manually. In my opinion that’s the one single reason why the kernel community cannot benefit from moving to github. There’s also the minor issue of a few top maintainers being extremely outspoken against github in general, but that’s not really a technical issue. And it’s not just the linux kernel: all huge projects on github struggle with scaling, because github doesn’t really give them the option to scale to multiple repositories while sticking with a monotree.

In short, I have one simple feature request to github:

Please support pull requests and issue tracking spanning different repos of a monotree.

Simple idea, huge implications.

Repositories and Organizations

First, it needs to be possible to have multiple forks of the same repo in one organization. Just look at git.kernel.org: most of these repositories are not personal. And even if you might have different organizations for e.g. different subsystems, requiring an organization for each repo is a silly amount of overkill and just makes access and user management unnecessarily painful. In graphics for example we’d have 1 repo each for the userspace test suite, the shared userspace library, and a common set of tools and scripts used by maintainers and developers; those would work on github. But then we’d have the overall subsystem repo, plus a repository for core subsystem work and additional repositories for each big driver. Those would all be forks, which github doesn’t do. And each of these repos has a bunch of branches, at least one for feature work and another one for bugfixes for the current release cycle.

Combining all branches into one repository wouldn’t do, since the point of splitting repos is that pull requests and issues are separated, too.

Related, it needs to be possible to establish the fork relationship after the fact. For new projects that have always been on github this isn’t a big deal. But Linux will be able to move at most a subsystem at a time, and there are already tons of Linux repositories on github which aren’t proper github forks of one another.

Pull Requests

Pull requests need to be attachable to multiple repos at the same time, while keeping one unified discussion stream. You can already reassign a pull request to a different branch of a repo, but not attach it to multiple repositories at the same time. Reassigning pull requests is really important, since new contributors will just create pull requests against what they think is the main repo. Bots can then shuffle those around to all the repos listed in e.g. a MAINTAINERS file for the set of files and changes a pull request contains. When I chatted with githubbers I originally suggested they implement this directly. But I think as long as it’s all scriptable, that’s better left to individual projects, since there’s no real standard.

There’s a pretty funky UI challenge here, since the patch list might be different depending upon the branch the pull request is against. But that’s not always a user error; one repo might simply have merged a few of the patches already.

Also, the pull request status needs to be different for each repo. One maintainer might close it without merging, since they agreed that the other subsystem will pull it in, while the other maintainer will merge and close the pull. Another tree might even close the pull request as invalid, since it doesn’t apply to that older version or vendor fork. Even more fun, a pull request might get merged multiple times, in each subsystem with a different merge commit.

Issues

Like pull requests, issues can be relevant for multiple repos, and might need to be moved around. An example would be a bug that’s first reported against a distribution’s kernel repository. After triage it’s clear it’s a driver bug still present in the latest development branch and hence also relevant for that repo, plus the main upstream branch and maybe a few more.

Status should again be separate, since a push of the bugfix to one repo doesn’t instantly make it available in all of them. It might even need additional work to get backported to older kernels or distributions, and some might decide that’s not worth it and close it as WONTFIX, even though it’s marked as successfully resolved in the relevant subsystem repository.

Summary: Monotree, not Monorepo

The Linux kernel is not going to move to github. But bringing the Linux way of scaling, a monotree spread over multiple repos, to github as a concept would be really beneficial for all the huge projects already there: it would give them a new and, in my opinion, more powerful way to handle their unique challenges.

August 08, 2017 12:00 AM

August 07, 2017

Paul E. Mc Kenney: Book review: "Antifragile: Things That Gain From Disorder"

This is the fourth and final book in Nassim Taleb's Incerto series, which makes a case for antifragility as a key component of design, taking the art of design one step beyond robustness. An antifragile system is one where variation, chaos, stress, and errors improve the results. For example, within limits, stressing muscles and bones makes them stronger. In contrast, stressing a device made of (say) aluminum will eventually cause it to fail. Taleb gives a lengthy list of examples in Table 1 starting on page 23, some of which seem more plausible than others. One implausible entry lists rule-based systems as fragile, principles-based systems as robust, and virtue-based systems as antifragile. Although I can imagine a viewpoint where this makes sense, any expectation that a significantly large swath of present-day society will agree on a set of principles (never mind virtues!) seems insanely optimistic. The table nevertheless provides much good food for thought.

Taleb states that he has constructed antifragile financial strategies using insurance to control downside risks. But he also states on page 6 “Thou shalt not have antifragility at the expense of the fragility of others.” Perhaps Taleb figures that few will shed tears for any difficulties that insurance companies might get into, perhaps he is taking out policies that are too small to have material effect on the insurance company in question, or perhaps his policies are counter to the insurance company's main business, so that payouts to Taleb are anticorrelated with payouts to the company's other customers. One presumes that he has thought this through carefully, because a bankrupt insurance company might not be all that effective at controlling his downside risks.

Appendix I beginning on page 435 gives a graphical summary of the book's main messages. Figure 28 on page 441 is good grist for the mills of those who would like humanity to become an intergalactic species: After all, confining the human race seems likely to limit its upside. (One counterargument would posit that a finite object might have unbounded value, but such counterarguments typically rely on there being a very large number of human beings interested in that finite object, which some would consider to counter this counterargument.)

The right-hand portion of Figure 30 on page 442 illustrates what the author calls local antifragility and global fragility. To see this, imagine that the x-axis represents variation from nominal conditions, and the y-axis represents payoff, with large positive payoffs being highly desired. The right-hand portion shows something not unrelated to the function x^2-x^4, which gives higher payoffs as you move in either direction from x=0, peaking when x reaches one divided by the square root of two (either positive or negative), dropping back to zero when x reaches +1 or -1, and dropping like a rock as one ventures beyond that range. The author states that this local antifragility and global fragility is the most dangerous of all, but given that he repeatedly stresses that antifragile systems are antifragile only up to a point, this dangerous situation would seem to be the common case. Those of us who believe that life is inherently dangerous should have no problem with this apparent contradiction.
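
A quick check of that shape, in LaTeX notation (the specific function is only the "not unrelated" stand-in mentioned above, so this is purely illustrative):

\[
f(x) = x^2 - x^4, \qquad f'(x) = 2x - 4x^3 = 2x\,(1 - 2x^2) = 0
\;\Rightarrow\; x \in \{0, \pm\tfrac{1}{\sqrt{2}}\},
\]
\[
f\!\left(\pm\tfrac{1}{\sqrt{2}}\right) = \tfrac{1}{2} - \tfrac{1}{4} = \tfrac{1}{4},
\qquad f(\pm 1) = 0,
\qquad f(x) \to -\infty \text{ as } |x| \to \infty.
\]

So the payoff peaks at x = plus or minus one over the square root of two, returns to zero at x = plus or minus one, and collapses beyond that, which is the "dropping like a rock" part.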

But what does all of this have to do with parallel programming???

Well, how about “Is RCU antifragile?”

One case for RCU antifragility is the batching optimizations that allow many (as in thousands) concurrent requests to share the same grace-period computation. Therefore, the heavier the update-side load on RCU, the more efficiently RCU operates.
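
To make the batching concrete, here is a hedged kernel-style sketch (my illustration, not code from RCU itself): each updater simply queues a callback, and RCU amortizes one grace-period computation across every callback queued in the same batch.

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
	int data;
	struct rcu_head rcu;	/* hook for the deferred callback */
};

/* Invoked only after a grace period, one that may be shared with
 * thousands of other callbacks queued around the same time. */
static void free_foo_cb(struct rcu_head *head)
{
	kfree(container_of(head, struct foo, rcu));
}

static void retire_foo(struct foo *fp)
{
	/* Non-blocking: just queue and return. The heavier the
	 * update-side load, the more callbacks share each grace
	 * period, hence the better the per-update efficiency. */
	call_rcu(&fp->rcu, free_foo_cb);
}

Because call_rcu() merely queues work, thousands of updates issued while one grace period is in flight all piggyback on the next one.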

However, load is but one of many aspects of RCU's environment that might be varied. For an extreme example, RCU is exceedingly fragile with respect to small perturbations of the program counter, as Peter Sewell so ably demonstrated, by running emacs, no less. RCU is also fragile with respect to timekeeping anomalies: for example, it can emit false-positive RCU CPU stall warnings if different CPUs have tens-of-seconds disagreements as to the current time. However, the aforementioned bones and muscles are similarly fragile with respect to any number of chemical substances (AKA “poisons”), to say nothing of well-known natural phenomena such as lightning bolts and landslides.

Even when excluding hardware misbehavior such as auto-perturbing program counters and unsynchronized clocks, RCU would still be subject to software aging, and RCU has in fact required multiple interventions from its developers and maintainer in order to keep up with changing hardware, workloads, and usage. One could therefore argue that RCU is fragile with respect to perturbations of time, although the combination of RCU and its developers, reviewers, and maintainer seems to have kept up reasonably well thus far.

On the other hand, perhaps it is unrealistic to evaluate the antifragility of software without including black-hat hackers. Achieving antifragility in that sort of environment is still very much a grand challenge problem, but a challenge that must be faced. Oh, you think RCU is too low-level for this sort of attack? There was a time when I thought so. And then came rowhammer.

So please be careful, and, where possible, antifragile! It is after all a real world out there!!!

August 07, 2017 04:36 AM

August 03, 2017

Linux Plumbers Conference: Book Your Hotel for Plumbers by 18 August

As a reminder, we have a block of rooms at the JW Marriott LA Live
available to attendees at the discounted conference rate of $259/night
(plus applicable taxes). High speed internet is included in the room rate.

Our discounted room rate expires at 5:00 pm PST on August 18. We encourage
you to book today!

Visit our Attend page for additional details.

August 03, 2017 10:15 PM

July 29, 2017

Linux Plumbers Conference: Late Registration Begins Soon

The late registration period for the Linux Plumbers Conference begins on 31 July. If you want to take advantage of the standard registration fees, register now via this link.

Standard registration is $550; late registration will be $650.

July 29, 2017 07:21 PM

Linux Plumbers Conference: Checkpoint-Restart Microconference Accepted into the Linux Plumbers Conference

Following on from the successful Checkpoint-Restart Microconference
last year, we’re pleased to announce that there will be another at
Plumbers in Los Angeles this year.

The agenda this year will focus on specific use cases of Checkpoint-
Restart, such as High Performance Computing, and state-saving uses such
as job scheduling and hot standby.  In addition we’ll be looking at
enhancements such as performance improvements, using userfaultfd for
dirty memory tracking in iterative migration, and what it would take to
support unprivileged checkpoint-restart.  Finally, we’ll have
discussions on checkpoint-restart aware applications and what sort of
testing needs to be applied to the upstream kernel to prevent any
checkpoint-restore API breakage as it evolves.

For more details on this, please see this microconference’s wiki page.

We hope to see you there!

July 29, 2017 04:42 PM

July 21, 2017

Michael Kerrisk (manpages): man-pages-4.12 is released

I've released man-pages-4.12. The release tarball is available on kernel.org. The browsable online pages can be found on man7.org. The Git repository for man-pages is available on kernel.org.

This release resulted from patches, bug reports, reviews, and comments from around 30 contributors. It includes just under 200 commits changing around 90 pages. This is a relatively small release, with one new manual page, ioctl_getfsmap(2). The most significant change in the release consists of a number of additions and improvements in the ld.so(8) page.

July 21, 2017 06:53 PM

July 20, 2017

Paul E. Mc Kenney: Parallel Programming: Getting the English text out of the way

We have been making good progress on the next release of Is Parallel Programming Hard, And, If So, What Can You Do About It?, and hope to have a new release out soonish.

In the meantime, for those of you for whom the English text in this book has simply gotten in the way, there is now an alternative:

[image: perfbook_cn_cover]

On the off-chance that any of you are seriously interested, this is available from
Amazon China, JD.com, Taobao.com, and Dangdang.com. For the rest of you, you have at least seen the picture.  ;–)

July 20, 2017 02:37 AM

July 18, 2017

Matthew Garrett: Avoiding TPM PCR fragility using Secure Boot

In measured boot, each component of the boot process is "measured" (ie, hashed and that hash recorded) in a register in the Trusted Platform Module (TPM) built into the system. The TPM has several different registers (Platform Configuration Registers, or PCRs) which are typically used for different purposes - for instance, PCR0 contains measurements of various system firmware components, PCR2 contains any option ROMs, and PCR4 contains information about the partition table and the bootloader. The allocation of these is defined by the PC Client working group of the Trusted Computing Group. However, once the boot loader takes over, we're outside the spec[1].
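
A side note on why these measurements are tamper-evident: a PCR can never be written directly, only "extended", folding each new measurement into a running hash. A minimal userspace sketch of the arithmetic (my illustration using OpenSSL, not taken from the post or from actual TPM code):

#include <string.h>
#include <openssl/sha.h>

/* Extend: PCR_new = SHA256(PCR_old || SHA256(component)).
 * Order matters, so the final PCR value pins the exact sequence
 * of components measured during boot; there is no operation that
 * resets or rewinds the register. */
static void pcr_extend(unsigned char pcr[SHA256_DIGEST_LENGTH],
		       const unsigned char *component, size_t len)
{
	unsigned char digest[SHA256_DIGEST_LENGTH];
	unsigned char buf[2 * SHA256_DIGEST_LENGTH];

	SHA256(component, len, digest);		/* hash the component */
	memcpy(buf, pcr, SHA256_DIGEST_LENGTH);
	memcpy(buf + SHA256_DIGEST_LENGTH, digest, SHA256_DIGEST_LENGTH);
	SHA256(buf, sizeof(buf), pcr);		/* fold into the PCR */
}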

One important thing to note here is that the TPM doesn't actually have any ability to directly interfere with the boot process. If you try to boot modified code on a system, the TPM will contain different measurements but boot will still succeed. What the TPM can do is refuse to hand over secrets unless the measurements are correct. This allows for configurations where your disk encryption key can be stored in the TPM and then handed over automatically if the measurements are unaltered. If anybody interferes with your boot process then the measurements will be different, the TPM will refuse to hand over the key, your disk will remain encrypted and whoever's trying to compromise your machine will be sad.

The problem here is that a lot of things can affect the measurements. Upgrading your bootloader or kernel will do so. At that point, if you reboot, your disk fails to unlock and you become unhappy. To get around this your update system needs to notice that a new component is about to be installed, generate the new expected hashes and re-seal the secret to the TPM using the new hashes. If there are several different points in the update process where this can happen, it can quite easily go wrong. And if it goes wrong, you're back to being unhappy.

Is there a way to improve this? Surprisingly, the answer is "yes" and the people to thank are Microsoft. Appendix A of a basically entirely unrelated spec defines a mechanism for storing the UEFI Secure Boot policy and the keys used in PCR 7 of the TPM. The idea here is that you trust your OS vendor (since otherwise they could just backdoor your system anyway), so anything signed by your OS vendor is acceptable. If someone tries to boot something signed by a different vendor then PCR 7 will be different. If someone disables secure boot, PCR 7 will be different. If you upgrade your bootloader or kernel, PCR 7 will be the same. This simplifies things significantly.

I've put together a (not well-tested) patchset for Shim that adds support for including Shim's measurements in PCR 7. In conjunction with appropriate firmware, it should then be straightforward to seal secrets to PCR 7 and not worry about things breaking over system updates. This makes tying things like disk encryption keys to the TPM much more reasonable.

However, there's still one pretty major problem, which is that the initramfs (ie, the component responsible for setting up the disk encryption in the first place) isn't signed and isn't included in PCR 7[2]. An attacker can simply modify it to stash any TPM-backed secrets or mount the encrypted filesystem and then drop to a root prompt. This, uh, reduces the utility of the entire exercise.

The simplest solution to this that I've come up with depends on how Linux implements initramfs files. In its simplest form, an initramfs is just a cpio archive. In its slightly more complicated form, it's a compressed cpio archive. And in its peak form of evolution, it's a series of compressed cpio archives concatenated together. As the kernel reads each one in turn, it extracts it over the previous ones. That means that any files in the final archive will overwrite files of the same name in previous archives.
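
As a hedged illustration of why this works (my sketch; all file names here are hypothetical), the bootloader-side operation is nothing more than bytewise concatenation, because the kernel unpacks each archive in order and later files win:

#include <stdio.h>

/* Copy one file onto the end of an already-open output stream. */
static int append_file(FILE *dst, const char *path)
{
	char buf[4096];
	size_t n;
	FILE *src = fopen(path, "rb");

	if (!src)
		return -1;
	while ((n = fread(buf, 1, sizeof(buf), src)) > 0)
		fwrite(buf, 1, n, dst);
	fclose(src);
	return 0;
}

int main(void)
{
	FILE *out = fopen("combined-initramfs.img", "wb");

	if (!out)
		return 1;
	/* Untrusted, locally generated archive first... */
	append_file(out, "user-initramfs.img");
	/* ...then the signed, kernel-shipped archive: its files
	 * overwrite any same-named files from the first one. */
	append_file(out, "trusted-secrets.cpio.gz");
	fclose(out);
	return 0;
}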

My proposal is to generate a small initramfs whose sole job is to get secrets from the TPM and stash them in the kernel keyring, and then measure an additional value into PCR 7 in order to ensure that the secrets can't be obtained again. Later disk encryption setup will then be able to set up dm-crypt using the secret already stored within the kernel. This small initramfs will be built into the signed kernel image, and the bootloader will be responsible for appending it to the end of any user-provided initramfs. This means that the TPM will only grant access to the secrets while trustworthy code is running - once the secret is in the kernel it will only be available for in-kernel use, and once PCR 7 has been modified the TPM won't give it to anyone else. A similar approach for some kernel command-line arguments (the kernel, module-init-tools and systemd all interpret the kernel command line left-to-right, with later arguments overriding earlier ones) would make it possible to ensure that certain kernel configuration options (such as the iommu) weren't overridable by an attacker.

There's obviously a few things that have to be done here (standardise how to embed such an initramfs in the kernel image, ensure that luks knows how to use the kernel keyring, teach all relevant bootloaders how to handle these images), but overall this should make it practical to use PCR 7 as a mechanism for supporting TPM-backed disk encryption secrets on Linux without introducing a hug support burden in the process.

[1] The patchset I've posted to add measured boot support to Grub uses PCRs 8 and 9 to measure various components during the boot process, but other bootloaders may have different policies.

[2] This is because most Linux systems generate the initramfs locally rather than shipping it pre-built. It may also get rebuilt on various userspace updates, even if the kernel hasn't changed. Including it in PCR 7 would reintroduce exactly the fragility this scheme is designed to avoid, defeating the point of all of this.

July 18, 2017 06:48 AM

July 13, 2017

Linux Plumbers Conference: VFIO/IOMMU/PCI Microconference Accepted into Linux Plumbers Conference

Following on from the successful PCI Microconference at Plumbers last year, we’re pleased to announce a follow-on this year with an expanded scope.

The agenda this year will focus on overlap and common development between the VFIO/IOMMU/PCI subsystems, and in particular how consolidation of the shared virtual memory (SVM) API can drive an even tighter coupling between them.

This year we will also focus on user-visible aspects, such as using SVM to share page tables with devices and reporting I/O page faults to userspace, in addition to discussing PCI and IOMMU interfaces and potential improvements.

For more details on this, please see this microconference’s wiki page.

We hope to see you there!

July 13, 2017 05:20 PM

July 11, 2017

Linux Plumbers Conference: Power Management and Energy-awareness Microconference Accepted into Linux Plumbers Conference

Following on from the successful Power Management and Energy-awareness Microconference at Plumbers last year, we’re pleased to announce a follow-on this year.

The agenda this year will focus on a range of topics, including CPUfreq core improvements and schedutil governor extensions, how best to use scheduler signals to balance energy consumption and performance, and user space interfaces to control capacity and utilization estimates.  We’ll also discuss selective throttling in thermally constrained systems, runtime PM for ACPI, CPU cluster idling, and the possibility of implementing resume from hibernation in a bootloader.

For more details on this, please see this microconference’s wiki page.

We hope to see you there!

July 11, 2017 04:15 PM

James Morris: Linux Security Summit 2017 Schedule Published

The schedule for the 2017 Linux Security Summit (LSS) is now published.

LSS will be held on September 14th and 15th in Los Angeles, CA, co-located with the new Open Source Summit (which includes LinuxCon, ContainerCon, and CloudCon).

The cost of LSS for attendees is $100 USD. Register here.

Highlights from the schedule include the following refereed presentations:

There’ll also be the usual Linux kernel security subsystem updates, and BoF sessions (with LSM namespacing and LSM stacking sessions already planned).

See the schedule for full details of the program, and follow the twitter feed for the event.

This year, we’ll also be co-located with the Linux Plumbers Conference, which will include a containers microconference with several security development topics, and likely also a TPMs microconference.

A good critical mass of Linux security folk should be present across all of these events!

Thanks to the LSS program committee for carefully reviewing all of the submissions, and to the event staff at Linux Foundation for expertly planning the logistics of the event.

See you in Los Angeles!

July 11, 2017 11:30 AM

July 10, 2017

Kees Cook: security things in Linux v4.12

Previously: v4.11.

Here’s a quick summary of some of the interesting security things in last week’s v4.12 release of the Linux kernel:

x86 read-only and fixed-location GDT
With kernel memory base randomization, it was still possible to figure out the per-cpu base address via the “sgdt” instruction, since it would reveal the per-cpu GDT location. To solve this, Thomas Garnier moved the GDT to a fixed location. And to remove the risk of an attacker targeting the GDT directly with a kernel bug, he also made it read-only.
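
To see why the old layout leaked, consider this hedged userspace sketch (my example, not from the post; on pre-UMIP x86 hardware, sgdt is not a privileged instruction, so even unprivileged code can execute it):

#include <stdint.h>
#include <stdio.h>

/* The operand written by sgdt: a 16-bit limit followed by the
 * linear base address of the GDT. With a per-cpu GDT at a
 * randomized address, the base field leaked the per-cpu base;
 * with a fixed-location GDT it reveals nothing useful. */
struct __attribute__((packed)) gdt_desc {
	uint16_t limit;
	uint64_t base;
};

int main(void)
{
	struct gdt_desc g;

	__asm__ volatile("sgdt %0" : "=m"(g));
	printf("GDT base: 0x%016llx\n", (unsigned long long)g.base);
	return 0;
}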

usercopy consolidation
After hardened usercopy landed, Al Viro decided to take a closer look at all the usercopy routines and then consolidated the per-architecture uaccess code into a single implementation. The per-architecture implementations were functionally very similar, so it made sense to remove the redundancy. In the process, he uncovered a number of unhandled corner cases in various architectures (which got fixed by the consolidation), and made hardened usercopy available on all remaining architectures.

ASLR entropy sysctl on PowerPC
Continuing to expand architecture support for the ASLR entropy sysctl, Michael Ellerman implemented the calculations needed for PowerPC. This lets userspace choose to crank up the entropy used for memory layouts.
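
The knob itself is just a proc file; here is a hedged sketch of turning it up (my example: whether the file exists and what range it accepts depend on the architecture and kernel config, and writing requires root):

#include <stdio.h>

int main(void)
{
	/* vm.mmap_rnd_bits controls how many random bits go into
	 * mmap base randomization; the arch-specific min/max bounds
	 * are what each architecture port has to wire up. */
	FILE *f = fopen("/proc/sys/vm/mmap_rnd_bits", "w");

	if (!f)
		return 1;
	fprintf(f, "32\n");	/* example value, not a recommendation */
	fclose(f);
	return 0;
}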

LSM structures read-only
James Morris used __ro_after_init to make the LSM structures read-only after boot. This removes them as a desirable target for attackers: since the hooks are called from all kinds of places in the kernel, these structures were a favorite way for attackers to hijack execution of the kernel. (A similar target used to be the system call table, but that has long since been made read-only.) Be wary that CONFIG_SECURITY_SELINUX_DISABLE removes this protection, so make sure that config stays disabled.
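
A hedged sketch of the annotation (a simplified stand-in, not the actual LSM definitions): __ro_after_init data stays writable while hooks register during boot, and the whole section is then flipped read-only before init completes.

#include <linux/cache.h>
#include <linux/init.h>

/* Hypothetical hook table standing in for the real LSM structures. */
struct my_hook_table {
	int (*file_open)(void *file);
};

/* Writable during early boot (so hooks can be registered), then
 * mapped read-only for the lifetime of the system, removing it as
 * a target for write-what-where kernel bugs. */
static struct my_hook_table hooks __ro_after_init = {
	.file_open = NULL,	/* filled in during early boot */
};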

KASLR enabled by default on x86
With many distros already enabling KASLR on x86 with CONFIG_RANDOMIZE_BASE and CONFIG_RANDOMIZE_MEMORY, Ingo Molnar felt the feature was mature enough to be enabled by default.

Expand stack canary to 64 bits on 64-bit systems
The stack canary value used by CONFIG_CC_STACKPROTECTOR is most powerful on x86, since it is different per task. (Other architectures run with a single canary for all tasks.) While the first canary chosen on x86 (and other architectures) was a full unsigned long, the subsequent canaries chosen per-task for x86 were being truncated to 32 bits. Daniel Micay fixed this, so now x86 (and future architectures that gain per-task canary support) have significantly increased entropy for stack-protector.
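
A hedged sketch of the shape of the fix (simplified from the kernel's fork path, not the literal patch):

#include <linux/random.h>
#include <linux/sched.h>

static void init_stack_canary(struct task_struct *tsk)
{
#ifdef CONFIG_CC_STACKPROTECTOR
	/* Previously drawn from get_random_int(), which left the
	 * upper 32 bits of the unsigned long canary zero on 64-bit;
	 * get_random_long() fills the whole value. */
	tsk->stack_canary = get_random_long();
#endif
}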

Expanded stack/heap gap
Hugh Dickins, with input from many other folks, improved the kernel’s mitigation against having the stack and heap crash into each other. This is a stop-gap measure to help defend against the Stack Clash attacks. Additional hardening needs to come from the compiler, which should produce “stack probes” when doing large stack expansions. Any Variable Length Arrays on the stack or alloca() usage needs to have machine code generated to touch each page of memory within those areas, letting the kernel know that the stack is expanding with single-page granularity.
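
To illustrate why the guard gap alone is a stop-gap, consider this hedged sketch (my example): a sufficiently large VLA moves the stack pointer in a single jump, which can leap clean over the guard region unless the compiler emits per-page probes.

#include <string.h>

void build_buffer(size_t n)
{
	/* One stack-pointer adjustment of n bytes: if n is
	 * attacker-influenced and larger than the stack/heap gap,
	 * the first write below may land in the heap without ever
	 * touching the guard pages in between. Compiler-emitted
	 * stack probes would instead touch each intervening page,
	 * so the kernel sees the stack grow page by page. */
	char buf[n];

	memset(buf, 0, n);
}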

That’s it for now; please let me know if I missed anything. The v4.13 merge window is open!

Edit: Brad Spengler pointed out that I failed to mention the CONFIG_SECURITY_SELINUX_DISABLE issue with read-only LSM structures. This has been added now.

© 2017, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.

July 10, 2017 08:24 AM

Dave Airlie (blogspot): radv and the vulkan deferred demo - no fps left behind!

A little while back I took to wondering why one particular demo from the Sascha Willems vulkan demos was a lot slower on radv compared to amdgpu-pro. Like half the speed slow.

I internally titled this my "no fps left behind" project.

The deferred demo does an offscreen rendering pass to three 2048x2048 color attachments and one 2048x2048 D32S8 depth attachment. It then does a rendering pass using those, down to a 1280x720 screen image.

Bas identified that the first cause was probably the fact we were doing clear color eliminations on the offscreen surfaces when we didn't need to. AMD GPUs have a delta color compression (DCC) feature, and with certain clear values you don't need to do the clear color elimination step. This brought me back from about 1/2 the FPS to about 3/4, however it took me quite a while to figure out where the rest of the FPS were hiding.

I took a few diversions in my testing. I pulled in some experimental patches to allow the depth buffer to be texture-cache compatible, so I could bypass the depth decompression pass; however, this didn't seem to budge the numbers much.

I found a bunch of registers we were setting to different values from -pro; nothing much came of these.

I found some places where we were using a compute shader to fill some DCC or htile surfaces to a value, then doing a clear and overwriting the values; not much help.

I noticed the vertex descriptions and buffer attachments on amdgpu-pro were done quite differently to how radv does it. With vulkan you have vertex descriptors and bindings; with radv we generate a set of hw descriptors from the combination of both descriptors and bindings. The pro driver uses typed buffer loads in the shader to embed the descriptor contents in the shader, then only updates the hw descriptors for the buffer bindings. This seems like it might be more efficient; guess what, no help. (LLVM just grew support for typed buffer loads, so we could probably move to this scheme now if we wished.)

I dug out some patches that inline all the push constants and some descriptors so our shaders have less overhead (this really helps our meta shaders have less impact); no help.

I noticed they export the shader results in a different order from the fragment shader, and always at the end (no help). The vertex shader emits pos first (no help). The vertex shader uses off exports for unused channels (no help).

I went on holidays for a week and came back to stare at the traces again, when my brain finally noticed something I'd missed. When binding the 3 color buffers, the addresses given as the base address were unusual. A surface has a 40-bit address; normally, for alignment and tiling, the bottom 16 bits are 0, and we shift 8 of those off completely before writing them. This means the bottom 8 bits of the base address as written should be 0, and the CIK docs from AMD say as much. However, the pro traces didn't have these at 0. It appears from earlier evergreen/cayman documents that these register bits control some tiling offset bits. After writing a hacky patch to set the values, I managed to get back the rest of the FPS I was missing in the deferred demo. I discussed this with AMD developers, and we worked out that the addrlib library has an API for working out these values, and it seems that using them allows better memory bandwidth utilisation. I've written a patch to try and use these values correctly and sent it out along with the DCC avoidance patch.

Now I'm not sure this will help any real apps, we may not be hitting limitations in that area, and I'm never happy with the benchmarks I run myself. I thought I saw some FPS difference with some madmax scenes, but I might be lying to myself. Once the patches land in mesa I'm sure others will run benchmarks and we can see if there is any use case where they have an effect. The AMD radeonsi OpenGL driver can also do the same tweaks, so hopefully there will be some benefit there as well.

Otherwise I can just write this off as making the deferred demo run at parity and removing at least one of the deltas that radv has compared to the pro driver. Some of the other differences I discovered along the way might also have some promise in other scenarios, so I'll keep an eye on them.

Thanks to Bas, Marek and Christian for looking into what the magic meant!

July 10, 2017 08:08 AM

Dave Airlie: Migrating to blogspot

Due to lots of people telling me LJ is bad, mm'kay, I've migrated to blogspot.

New blog is/will be here: https://airlied.blogspot.com

July 10, 2017 06:36 AM

Dave Airlie (blogspot): Migrating my blog here

I'm moving my blog from LJ to blogspot, because people keep telling me LJ is up to no good, like hacking DNC servers and interfering in elections.

July 10, 2017 06:29 AM