Kernel Planet

April 23, 2015

Paul E. Mc Kenney: Verification Challenge 5: Uses of RCU

This is another self-directed verification challenge, this time to validate uses of RCU instead of validating the RCU implementations as in earlier posts. As you can see from Verification Challenge 4, the logic expression corresponding even to the simplest Linux-kernel RCU implementation is quite large, weighing in at tens of thousands of variables and hundreds of thousands of clauses. It is therefore worthwhile to look into the possibility of a trivial model of RCU that could be used for verification.

Because logic expressions do not care about cache locality, memory contention, energy efficiency, CPU hotplug, and a host of other complications that a Linux-kernel implementation must deal with, we can start with extreme simplicity. For example:

static int rcu_read_nesting_global;

/* Enter an RCU read-side critical section: add 2 to the nesting count. */
static void rcu_read_lock(void)
{
  (void)__sync_fetch_and_add(&rcu_read_nesting_global, 2);
}

/* Exit an RCU read-side critical section. */
static void rcu_read_unlock(void)
{
  (void)__sync_fetch_and_add(&rcu_read_nesting_global, -2);
}

/* Complain if any reader is currently within a critical section. */
static inline void assert_no_rcu_read_lock(void)
{
  BUG_ON(rcu_read_nesting_global >= 2);
}

/* If a reader was present at this point, reject the execution by
   suppressing all future assertions. */
static void synchronize_rcu(void)
{
  if (__sync_fetch_and_xor(&rcu_read_nesting_global, 1) < 2)
    return;
  SET_NOASSERT();
  return;
}


The idea is to reject any execution in which synchronize_rcu() does not wait for all readers to be done. As before, SET_NOASSERT() sets a variable that suppresses all future assertions.
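
For reference, the assertion plumbing can be kept very simple. The following is only a sketch under my own assumptions (the no_assert flag and these particular definitions are not taken from the actual model):

#include <assert.h>

/* Illustrative only: once no_assert is set, every subsequent BUG_ON() check
   becomes a no-op. */
static int no_assert;

#define SET_NOASSERT()  ((void)__sync_fetch_and_add(&no_assert, 1))
#define BUG_ON(c)       assert(no_assert || !(c))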

Please note that this model of RCU has some shortcomings:


  1. There is no diagnosis of rcu_read_lock()/rcu_read_unlock() misnesting. (A later version of the model provides limited diagnosis, but under #ifdef CBMC_PROVE_RCU; an illustrative sketch of such a check follows this list.)
  2. The heavyweight operations in rcu_read_lock() and rcu_read_unlock() result in artificial ordering constraints. Even on TSO systems such as x86 or s390, a store in a prior RCU read-side critical section might be reordered with loads in later critical sections, but this model acts as if such reordering were prohibited.
  3. Although synchronize_rcu() is permitted to complete once all pre-existing readers are done, in this model it will instead wait until a point in time at which there are absolutely no readers, whether pre-existing or new. Therefore, this model's idea of an RCU grace period is even heavier weight than in real life.
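
As a purely illustrative guess at what the limited misnesting diagnosis mentioned in item 1 might look like (the actual CBMC_PROVE_RCU code may well differ), rcu_read_unlock() could check for underflow of the nesting count:

static void rcu_read_unlock(void)
{
#ifdef CBMC_PROVE_RCU
  BUG_ON(rcu_read_nesting_global < 2);  /* unlock without a matching lock */
#endif
  (void)__sync_fetch_and_add(&rcu_read_nesting_global, -2);
}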


Nevertheless, this approach will allow us to find at least some RCU-usage bugs, and it fits in well with cbmc's default fully-ordered settings. For example, we can use it to verify a variant of the simple litmus test used previously:

int r_x;
int r_y;

int x;
int y;

void *thread_reader(void *arg)
{
  rcu_read_lock();
  r_x = x;
#ifdef FORCE_FAILURE_READER
  rcu_read_unlock();
  rcu_read_lock();
#endif
  r_y = y;
  rcu_read_unlock();
  return NULL;
}

void *thread_update(void *arg)
{
  x = 1;
#ifndef FORCE_FAILURE_GP
  synchronize_rcu();
#endif
  y = 1;
  return NULL;
}

int main(int argc, char *argv[])
{
  pthread_t tr;

  if (pthread_create(&tr, NULL, thread_reader, NULL))
    abort();
  (void)thread_update(NULL);
  if (pthread_join(tr, NULL))
    abort();

  /* Fail if the reader saw y's update but not x's, which would mean
     that its critical section spanned the grace period. */
  BUG_ON(r_y != 0 && r_x != 1);
  return 0;
}


This model has only 3,032 variables and 8,844 clauses, more than an order of magnitude smaller than for the Tiny RCU verification. Verification takes about half a second, which is almost two orders of magnitude faster than the 30-second verification time for Tiny RCU. In addition, the model successfully flags several injected errors. We have therefore succeeded in producing a simpler and faster model approximating RCU, one that can also handle multi-threaded litmus tests.

A natural next step would be to move to litmus tests involving linked lists. Unfortunately, there appear to be problems with cbmc's handling of pointers in multithreaded situations. On the other hand, cbmc's multithreaded support is quite new, so hopefully there will be fixes for these problems in the near future. After fixes appear, I will give the linked-list litmus tests another try.

In the meantime, the full source code for these models may be found here.

April 23, 2015 05:49 PM

James Bottomley: Getting your old Sync Server to work with New Firefox

Much has been written about Mozilla trying to force people to use their new sync service.  If, like me, you run your own sync server for Firefox, you’ve mostly been ignoring this because there’s still no real way of running your own sync server for the new service (and if you simply keep upgrading, Firefox keeps working with your old server).

However, recently I had cause to want to connect my old sync server to a new installation of firefox without just copying over all the config files (one of the config settings broke google docs and I couldn’t figure out which one it was, so I figured I’d just blow the entire config away and restore from sync).  Long ago Mozilla disabled the ability to connect newer Firefoxes to an old sync server, so this is an exposé of how to do it.  I did actually search the internet for this one, but no-one else seems to have figured it out (or if they have, they’re not known to the search engines).

There are two config files you need to update to get new Firefox to connect to sync (note, I did this with Firefox 37; I’ve not tested it with a different version, but I’m pretty sure it will work).  The first is that you need to put your sync key and weave user login into logins.json.  Since the password and user are encrypted in this file, the easiest way is to use a password manager extension, like the Saved Password Editor add-on.  Then you need two new password entries of type “Annotated” under the host chrome://weave.  For each, your username is your weave username.  For the first, you’re going to add your weave password under the annotation “Mozilla Services Password”.  For the second, add your Firefox sync key with all the dashes removed as the password under the annotation “Mozilla Services Encryption Passphrase”.  If you’ve got all this right, the password manager will show this (my username is jejb):

Next you’re going to close Firefox and manually edit the prefs.js file.  To sync completely from scratch, this just needs three entries, so first strip out every preference that begins with ‘services.sync.’ and then add three new lines:

user_pref("services.sync.account", "<my account>");
user_pref("services.sync.serverURL", "<my weave URL>");
user_pref("services.sync.username", "<my weave user name>");

For most people, the account and weave user name are the same.  Now start Firefox and it should just sync on its own.  To check that you got this right, go to the Sync tab of preferences and you should see something like this.


And that’s it.  You’re all done.

April 23, 2015 02:06 AM

April 21, 2015

Eric Sandeen: Buy Green Power this Earth Day

Clicking the map goes to a DOE webpage, then please come back!

Happy Earth Day!  If you’ve had enough of the same tired suggestions to recycle more or turn off your lights when you leave the room, and feel like maybe that’s just not cutting it these days, I humbly propose that you can do more.  Right now.  Easy, and cheap.

You can run your home on renewable energy right now.

While it’s super fun to have solar on your roof, that’s not the only way to put more clean energy on the grid.  In most parts of the US, you can choose to buy enough renewable energy to cover part, or all, of your electricity bill.  In some utility markets, the program is through your utility; in others with deregulated utility markets, you can choose a new provider which produces clean energy.  The DOE has more information about these programs here.  The map above is shaded based on the number of options available in each state – click your state to see the DOE page which lists your local options, and choose one today!  Really.  I worked hard on that map, make me proud!

How it works in Minnesota…

Here in Minnesota (and other states as well), Xcel offers Windsource, a program which funds contracts for additional wind energy on the grid.  In 2012, Xcel announced that it had sold its billionth kWh of Windsource energy – that’s one billion kWh generated from wind which would not have been generated without subscriber participation.  In 2013, 33,000 Minnesota homes and 264 businesses participated in Windsource.  It’s surprisingly cheap; while there is an extra cost for the program, all fuel cost charges are removed because, after all, the actual wind is free:

As you can see from my January bill above, my net cost was only half a cent per kWh after the fuel charge was removed.

If you’re an Xcel customer and want to sign up for Windsource, you can do it right here, right now.  You can choose all or part of your bill, but I’d go big, and do your whole bill.  It’s cheap, and it feels good.

… and everywhere else

Other utilities and other states have similar programs; click your state in the map above to get details, links, and pricing information for your local options.

Let me know!

Did this post motivate you to sign up for clean power?  Have you already signed up for renewable energy?  What has been your experience with these utility programs?  Let me know in the comments below, and hey – good job doing something more relevant on Earth Day!

April 21, 2015 06:01 AM

April 20, 2015

Daniel Vetter: Neat drm/i915 stuff for 4.1

With Linux kernel v3.20^W v4.0 already out the door, my overview of what's in 4.1 for drm/i915 is way overdue.

First looking at the modeset side of the driver, the big overall thing is all the work to convert i915 to atomic. In this release there's code from Ander Conselvan de Oliveira to have a struct drm_atomic_state allocated for all the legacy modeset code paths in the driver. With that we can switch the internals to start using atomic state objects and gradually convert over everything to atomic on the modeset side. Matt Roper on the other hand was busy preparing the plane code to land the atomic watermark update code. Damien has reworked the initial plane configuration code used for fastboot, which also needs to be adapted to the atomic world.


For more specific feature work there's the DRRS (dynamic refresh rate switching) from Sonika, Vandana and more people, which is now enabled where supported. The idea is to reduce the refresh rate of the panel to save power when nothing changes on the screen. And Paulo Zanoni has provided patches to improve the FBC code, hopefully we can enable that by default soon too. Under the hood Ville has refactored the DP link rate computation and the sprite color key handling, both to prepare for future work and platform enabling. Intermediate link rate support for eDP 1.4 from Sonika built on top of this. Imre Deak has also reworked the Baytrail/Braswell DPLL code to prepare for Broxton.

Speaking of platforms, Skylake has gained runtime PM support from Damien, and RPS (render turbo and sleep states) from Akash. Another SKL exclusive is support for scanout of Y-tiled buffers and for scanning out buffers rotated by 90°/270° (instead of just normal and rotated by 180°) from Tvrtko and Damien. Well, the rotation support didn't quite land yet, but Tvrtko's support for the special pagetable binding needed for that feature did, in the form of rotated GGTT views. Finally Nick Hoath and Damien also submitted a lot of workaround patches for SKL.

Moving on to Braswell/Cherryview, there have been tons of fixes to the DPLL and watermark code from Vijay and Ville, and BSW is no longer hidden behind the preliminary hardware support flag. And also for the SoC platforms, Chris Wilson has supplied a pile of patches to tune the rps code and bring it more in line with the big core platforms.

On the GT side the big ongoing work is dynamic pagetable allocation from Michel Thierry, based upon patches from Ben Widawsky. With per-process address spaces, and even more so with the big address spaces gen8+ supports, it would be wasteful if not impossible to allocate pagetables for the entire address space upfront. But changing the code to handle another possible memory allocation failure point needed a lot of work. Most of that has landed now, but the benefits of enabling bigger address spaces haven't made it into 4.1.

Another big work is XenGT client-side support from Yu Zhang and team. This is paravirtualization to allow virtual machines to tap into the render engines without requiring exclusive access, but also with a lot less overhead than full virtual hardware like vmware or virgil would provide. The host-side code is also submitted already, but needs a bit more work still to integrate cleanly into the driver.

And of course there's been lots of other smaller work all over, as usual. Internal documentation for the shrinker, more dead UMS code removed, the vblank interrupt code cleaned up and more.

April 20, 2015 03:57 AM

April 19, 2015

Michael Kerrisk (manpages): man-pages-3.83 is released

I've released man-pages-3.83. The release tarball is available on kernel.org. The browsable online pages can be found on man7.org. The Git repository for man-pages is available on kernel.org.

This release resulted from patches, bug reports, and comments from 30 contributors. As well as a large number of minor fixes to more than 70 man pages, the more significant changes in man-pages-3.83 include the following:




April 19, 2015 02:55 PM

April 18, 2015

James Bottomley: Squirrelmail and imaps

Somewhere along the way squirrelmail stopped working with my dovecot imap server, which runs only on the secure port (imaps).  I only ever use webmail as a last resort, so the problem may be left over from years ago.  The problem is that I’m getting a connect failure but an error code of zero and no error message.  This is what it actually shows

Error connecting to IMAP server "localhost:993".Server error: (0)

Which is very helpful.  Everything else works with imaps on this system, so why not squirrelmail?

The answer, it seems, is buried deep inside php.  Long ago, when php first started using openssl, it pretty much did no peer verification.  Nowadays it does.  I know I ran into this a long time ago, so the self signed certificate my version of dovecot is using is present in the /etc/ssl/certs directory where php looks for authoritative certificates.  Digging into the sources of squirrelmail, it turns out this php statement (with the variables substituted) is the failing one

$imap_stream = @fsockopen('tls://localhost', 993, $errno, $errstr, 15);

It’s failing because $imap_stream is empty, but, as squirrelmail claims, it’s actually failing with a zero error code.  After several hours of casting about with the fairly useless php documentation, it turns out that php has an interactive mode where it will actually give you all the errors.  Executing this

echo 'fsockopen("tls://localhost",993,$errno,$errmsg,15);'|php -a

finally tells me what’s wrong:

Interactive mode enabled

PHP Warning: fsockopen(): Peer certificate CN=`bedivere.hansenpartnership.com' did not match expected CN=`localhost' in php shell code on line 1
PHP Warning: fsockopen(): Failed to enable crypto in php shell code on line 1
PHP Warning: fsockopen(): unable to connect to tls://localhost:993 (Unknown error) in php shell code on line 1

So that’s it: php has tightened up the certificate verification not only to validate the certificate itself, but also to check that the CN matches the requested service.  In this case, because I’m connecting over the loopback device (localhost) instead of the internet to the DNS name, that CN check has failed and led to the results I’m seeing.  Simply fixing squirrelmail to connect to imaps over the fully qualified hostname instead of localhost gets everything working again.

April 18, 2015 12:06 AM

April 14, 2015

Dave Jones: the more things change.. 4.0


$ ping gelk
PING gelk.kernelslacker.org (192.168.42.30) 56(84) bytes of data.
WARNING: kernel is not very fresh, upgrade is recommended.
...
$ uname -r
4.0.0

Remember that one time the kernel versioning changed and nothing in userspace broke? Me either.

Why people insist on trying to think they can get this stuff right is beyond me.

YOU’RE PING. WHY DO YOU EVEN CARE WHAT KERNEL VERSION IS RUNNING.

update: this was already fixed, almost exactly a year ago in the ping git tree. The (now removed) commentary kind of explains why they cared. Sigh.

the more things change.. 4.0 is a post from: codemonkey.org.uk

April 14, 2015 03:01 PM

April 08, 2015

Rusty Russell: Lightning Networks Part IV: Summary

This is the fourth part of my series of posts explaining the bitcoin Lightning Networks 0.5 draft paper.  See Part I, Part II and Part III.

The key revelation of the paper is that we can have a network of arbitrarily complicated transactions, such that they aren’t on the blockchain (and thus are fast, cheap and extremely scalable), but at every point are ready to be dropped onto the blockchain for resolution if there’s a problem.  This is genuinely revolutionary.

It also vindicates Satoshi’s insistence on the generality of the Bitcoin scripting system.  And though it’s long been suggested that bitcoin would become a clearing system on which genuine microtransactions would be layered, it was unclear that we were so close to having such a system in bitcoin already.

Note that the scheme requires some solution to malleability to allow chains of transactions to be built (this is a common theme, so likely to be mitigated in a future soft fork), but Gregory Maxwell points out that it also wants selective malleability, so transactions can be replaced without invalidating the HTLCs which are spending their outputs.  Thus it proposes new signature flags, which will require active debate, analysis and another soft fork.

There is much more to discover in the paper itself: recommendations for lightning network routing, the node charging model, a risk summary, the specifics of the softfork changes, and more.

I’ll leave you with a brief list of requirements to make Lightning Networks a reality:

  1. A soft-fork is required, to protect against malleability and to allow new signature modes.
  2. A new peer-to-peer protocol needs to be designed for the lightning network, including routing.
  3. Blame and rating systems are needed for lightning network nodes.  You don’t have to trust them, but it sucks if they go down as your money is probably stuck until the timeout.
  4. More refinements (eg. relative OP_CHECKLOCKTIMEVERIFY) to simplify and tighten timeout times.
  5. Wallets need to learn to use this, with UI handling of things like timeouts and fallbacks to the bitcoin network (sorry, your transaction failed, you’ll get your money back in N days).
  6. You need to be online every 40 days to check that an old HTLC hasn’t leaked, which will require some alternate solution for occasional users (shut down channel, have some third party, etc).
  7. A server implementation needs to be written.

That’s a lot of work!  But it’s all simply engineering from here, just as bitcoin was once the paper was released.  I look forward to seeing it happen (and I’m confident it will).

April 08, 2015 03:59 AM

April 06, 2015

Rusty Russell: Lightning Networks Part III: Channeling Contracts

This is the third part of my series of posts explaining the bitcoin Lightning Networks 0.5 draft paper.

In Part I I described how a Poon-Dryja channel uses a single in-blockchain transaction to create off-blockchain transactions which can be safely updated by either party (as long as both agree), with fallback to publishing the latest versions to the blockchain if something goes wrong.

In Part II I described how Hashed Timelocked Contracts allow you to safely make one payment conditional upon another, so payments can be routed across untrusted parties using a series of transactions with decrementing timeout values.

Now we’ll join the two together: encapsulate Hashed Timelocked Contracts inside a channel, so they don’t have to be placed in the blockchain (unless something goes wrong).

Revision: Why Poon-Dryja Channels Work

Here’s half of a channel setup between me and you where I’m paying you 1c: (there’s always a mirror setup between you and me, so it’s symmetrical)

Half a channel: we will invalidate transaction 1 (in favour of a new transaction 2) to send funds.

The system works because after we agree on a new transaction (eg. to pay you another 1c), you revoke this by handing me your private keys to unlock that 1c output.  Now if you ever released Transaction 1, I can spend both the outputs.  If we want to add a new output to Transaction 1, we need to be able to make it similarly stealable.

Adding a 1c HTLC Output To Transaction 1 In The Channel

I’m going to send you 1c now via a HTLC (which means you’ll only get it if the riddle is answered; if it times out, I get the 1c back).  So we replace transaction 1 with transaction 2, which has three outputs: $9.98 to me, 1c to you, and 1c to the HTLC: (once we agree on the new transactions, we invalidate transaction 1 as detailed in Part I)

Our Channel With an Output for an HTLC

Note that you supply another separate signature (sig3) for this output, so you can reveal that private key later without giving away any other output.

We modify our previous HTLC design so that your revealing the sig3 private key would allow me to steal this output. We do this the same way we did for that 1c going to you: send the output via a timelocked mutually signed transaction.  But there are two transaction paths in an HTLC: the got-the-riddle path and the timeout path, so we need to insert those timelocked mutually signed transactions in both of them.  First let’s append a 1 day delay to the timeout path:

Timeout path of HTLC, with locktime so it can be stolen once you give me your sig3.

Similarly, we need to append a timelocked transaction on the “got the riddle solution” path, which now needs my signature as well (otherwise you could create a replacement transaction and bypass the timelocked transaction):

Full HTLC: If you reveal Transaction 2 after we agree it’s been revoked, and I have your sig3 private key, I can spend that output before you can, down either the settlement or timeout paths.

Remember The Other Side?

Poon-Dryja channels are symmetrical, so the full version has a matching HTLC on the other side (except with my temporary keys, so you can catch me out if I use a revoked transaction).  Here’s the full diagram, just to be complete:

A complete lightning network channel with an HTLC, containing a glorious 13 transactions.

Closing The HTLC

When an HTLC is completed, we just update transaction 2, and don’t include the HTLC output.  The funds either get added to your output (R value revealed before timeout) or my output (timeout).

Note that we can have an arbitrary number of independent HTLCs in progress at once, and open and/or close as many in each transaction update as both parties agree to.

Keys, Keys Everywhere!

Each output for a revocable transaction needs to use a separate address, so we can hand the private key to the other party.  We use two disposable keys for each HTLC[1], and every new HTLC will change one of the other outputs (either mine, if I’m paying you, or yours if you’re paying me), so that needs a new key too.  That’s 3 keys, doubled for the symmetry, to give 6 keys per HTLC.

Adam Back pointed out that we can actually implement this scheme without the private key handover, and instead sign a transaction for the other side which gives them the money immediately.  This would permit more key reuse, but means we’d have to store these transactions somewhere on the off chance we needed them.

Storing just the keys is smaller, but more importantly, Section 6.2 of the paper describes using BIP 32 key hierarchies so the disposable keys are derived: after a while, you only need to store one key for all the keys the other side has given you.  This is vastly more efficient than storing a transaction for every HTLC, and indicates the scale (thousands of HTLCs per second) that the authors have in mind.

Next: Conclusion

My next post will be a TL;DR summary, and some more references to the implementation details and possibilities provided by the paper.

 


[1] The new sighash types are fairly loose, and thus allow you to attach a transaction to a different parent if it uses the same output addresses.  I think we could re-use the same keys in both paths if we ensure that the order of keys required is reversed for one, but we’d still need 4 keys, so it seems a bit too tricky.

April 06, 2015 11:21 AM

April 03, 2015

LPC 2015: LLVM Microconference Accepted into 2015 Linux Plumbers Conference

This microconference will cover all things LLVM related to Linux. Discussions will range from progress in compiling the Linux kernel (and changes in clang/LLVM) to support of clang in yocto, and even to compiling an entire distro with clang (while also using the “musl” replacement for glibc and uclibc). The topics will also include LLVM being used for bug hunting and for the extended Berkeley Packet Filter (eBPF). More topics and speakers will of course be added. If you have an LLVM-related discussion you would like to lead, please contact Behan Webster <behanw@converseincode.com>.

April 03, 2015 02:23 PM

April 02, 2015

Daniel Vetter: Community Code of Conduct for intel-gfx

[This is a cross-post from the mail I just sent out to intel-gfx.]

Codes of conduct seem to be in the news a bit recently, and I realized that I've never really documented how we run things. It's different from the kernel's overall CodeOfConflict and also differs from the official Intel/OTC one in small details about handling issues. And for completeness there's also the Xorg Foundation event policy. Anyway, I think this is worth clarifying and here it goes.

It's simple: Be respectful, open and excellent to each other.

Which doesn't mean we want to sacrifice quality to be nice. Striving for technical excellence very much doesn't exclude being excellent to someone else, and in our experience it tends to go hand in hand.

Unfortunately things go south occasionally. So if you feel threatened, personally abused or otherwise uncomfortable, even and especially when you didn't participate in a discussion yourself, then please raise this in private with the drm/i915 maintainers (currently Daniel Vetter and Jani Nikula, see MAINTAINERS for contact information). And the "in private" part is important: humans screw up, and disciplining minor fumbles by tarnishing someone's google-able track record forever is out of proportion.

Still there are some teeth to this code of conduct:

1. First time around minor issues will be raised in private.

2. On repeat cases a public reply in the discussion will enforce that respectful behavior is expected.

3. We'll ban people who don't get it.

And severe cases will be escalated much quicker.

This applies to all community communication channels (irc, mailing list and bugzilla). And as mentioned this really just is a public clarification of the rules already in place - you can't see that though since we never had to go further than step 1.

Let's keep it at that.

And in case you have a problem with an individual drm/i915 maintainer and don't want to raise it with the other one there's the Xorg BoD, linux foundation TAB and the drm upstream maintainer Dave Airlie.

April 02, 2015 07:53 AM

April 01, 2015

Rusty Russell: Lightning Networks Part II: Hashed Timelock Contracts (HTLCs)

In Part I, we demonstrated Poon-Dryja channels; a generalized channel structure which used revocable transactions to ensure that old transactions wouldn’t be reused.

A channel from me<->you would allow me to efficiently send you 1c, but that doesn’t scale since it takes at least one on-blockchain transaction to set up each channel. The solution to this is to route funds via intermediaries;  in this example we’ll use the fictitious “MtBox”.

If I already have a channel with MtBox’s Payment Node, and so do you, that lets me reliably send 1c to MtBox without (usually) needing the blockchain, and it lets MtBox send you 1c with similar efficiency.

But it doesn’t give me a way to force them to send it to you; I have to trust them.  We can do better.

Bonding Unrelated Transactions using Riddles

For simplicity, let’s ignore channels for the moment.  Here’s the “trust MtBox” solution:

I send you 1c via MtBox; simplest possible version, using two independent transactions. I trust MtBox to generate its transaction after I send it mine.

What if we could bond these transactions together somehow, so that when you spend the output from the MtBox transaction, that automatically allows MtBox to spend the output from my transaction?

Here’s one way. You send me a riddle question to which nobody else knows the answer: eg. “What’s brown and sticky?”.  I then promise MtBox the 1c if they answer that riddle correctly, and tell MtBox that you know.

MtBox doesn’t know the answer, so it turns around and promises to pay you 1c if you answer “What’s brown and sticky?”. When you answer “A stick”, MtBox can pay you 1c knowing that it can collect the 1c off me.

The bitcoin blockchain is really good at riddles; in particular “what value hashes to this one?” is easy to express in the scripting language. So you pick a random secret value R, then hash it to get H, then send me H.  My transaction’s 1c output requires MtBox’s signature, and a value which hashes to H (ie. R).  MtBox adds the same requirement to its transaction output, so if you spend it, it can get its money back from me:

Two Independent Transactions, Connected by A Hash Riddle.

Handling Failure Using Timeouts

This example is too simplistic; when MtBox’s PHP script stops processing transactions, I won’t be able to get my 1c back if I’ve already published my transaction.  So we use a familiar trick from Part I, a timeout transaction which after (say) 2 days, returns the funds to me.  This output needs both my and MtBox’s signatures, and MtBox supplies me with the refund transaction containing the timeout:

Hash Riddle Transaction, With Timeout

MtBox similarly needs a timeout in case you disappear.  And it needs to make sure it gets the answer to the riddle from you within that 2 days, otherwise I might use my timeout transaction and it can’t get its money back.  To give plenty of margin, it uses a 1 day timeout:

MtBox Needs Your Riddle Answer Before It Can Answer Mine

Chaining Together

It’s fairly clear to see that longer paths are possible, using the same “timelocked” transactions.  The paper uses 1 day per hop, so if you were 5 hops away (say, me <-> MtBox <-> Carol <-> David <-> Evie <-> you) I would use a 5 day timeout to MtBox, MtBox a 4 day to Carol, etc.  A routing protocol is required, but if some routing doesn’t work two nodes can always cancel by mutual agreement (by creating timeout transaction with no locktime).

The paper refers to each set of transactions as contracts, with the following terms:

The hashing and timelock properties of the transactions are what allow them to be chained across a network, hence the term Hashed Timelock Contracts.

Next: Using Channels With Hashed Timelock Contracts.

The hashed riddle construct is cute, but as detailed above every transaction would need to be published on the blockchain, which makes it pretty pointless.  So the next step is to embed them into a Poon-Dryja channel, so that (in the normal, cooperative case) they don’t need to reach the blockchain at all.

April 01, 2015 11:46 AM

LPC 2015: Android/Mobile Microconference Accepted into 2015 Linux Plumbers Conference

As with 2014 and several years prior, 2015 is the year of the Linux smartphone. There are a number of mobile/embedded environments based on the Linux kernel, the most prominent of course being Android. One consequence of this prominence is a variety of projects derived from the Android Open Source Project (AOSP), which raises the question of how best to manage them, and additionally whether it is possible to run a single binary image of the various software components across a variety of devices. In addition, although good progress has been made upstreaming various Android patches, there is more work to be done for ADF, KMS, and Sync, among others. Migrating from Binder to KDBus is still a challenge, as are a number of other candidates for removal from drivers/staging. There are also issues remaining with ION, cenalloc, and DMA API. Finally, power management is still in need of improvement, with per-process power management being a case in point.

So when is the year of the Linux desktop? It seems that these developers are too busy working on mobile devices to have time to ask that question!

We hope to see you there!

April 01, 2015 07:18 AM

March 31, 2015

Michael Kerrisk (manpages): man-pages-3.82 is released

I've released man-pages-3.82. The release tarball is available on kernel.org. The browsable online pages can be found on man7.org. The Git repository for man-pages is available on kernel.org.

As well as a large number of minor fixes to more than 80 man pages, the more significant changes in man-pages-3.82 include the following:

March 31, 2015 07:00 AM

March 30, 2015

Rusty Russell: Lightning Networks Part I: Revocable Transactions

I finally took a second swing at understanding the Lightning Network paper.  The promise of this work is exceptional: instant reliable transactions across the bitcoin network. The implementation is complex and the draft paper reads like a grab bag of ideas, but it truly rewards close reading!  It doesn’t involve novel crypto, nor fancy bitcoin scripting tricks.

There are several techniques which are used in the paper, so I plan to concentrate on one per post and wrap up at the end.

Revision: Payment Channels

I open a payment channel to you for up to $10

A Payment Channel is a method for sending microtransactions to a single recipient, such as me paying you 1c a minute for internet access.  I create an opening transaction which has a $10 output, which can only be redeemed by a transaction input signed by you and me (or me alone, after a timeout, just in case you vanish).  That opening transaction goes into the blockchain, and we’re sure it’s bedded down.

I pay you 1c in the payment channel. Claim it any time!

Then I send you a signed transaction which spends that opening transaction output, and has two outputs: one for $9.99 to me, and one for 1c to you.  If you want, you could sign that transaction too, and publish it immediately to get your 1c.

Update: now I pay you 2c via the payment channel.

Then a minute later, I send you a signed transaction which spends that same opening transaction output, and has a $9.98 output for me, and a 2c output for you. Each minute, I send you another transaction, increasing the amount you get every time.

This works because:

  1.  Each transaction I send spends the same output; so only one of them can ever be included in the blockchain.
  2. I can’t publish them, since they need your signature and I don’t have it.
  3. At the end, you will presumably publish the last one, which is best for you.  You could publish an earlier one, and cheat yourself of money, but that’s not my problem.

Undoing A Promise: Revoking Transactions?

In the simple channel case above, we don’t have to revoke or cancel old transactions, as the only person who can spend them is the person who would be cheated.  This makes the payment channel one way: if the amount I was paying you ever went down, you could simply broadcast one of the older, more profitable transactions.

So if we wanted to revoke an old transaction, how would we do it?

There’s no native way in bitcoin to have a transaction which expires.  You can have a transaction which is valid after 5 days (using locktime), but you can’t have one which is valid only until 5 days have passed.

So the only way to invalidate a transaction is to spend one of its inputs, and get that input-stealing transaction into the blockchain before the transaction you’re trying to invalidate.  That’s no good if we’re trying to update a transaction continuously (a-la payment channels) without most of them reaching the blockchain.

The Transaction Revocation Trick

But there’s a trick, as described in the paper.  We build our transaction as before (I sign, and you hold), which spends our opening transaction output, and has two outputs.  The first is a $9.99 output for me.  The second is a bit weird: it’s 1c, but it needs two signatures to spend, mine and a temporary one of yours.  Indeed, I create and sign such a transaction which spends this output, and send it to you, but that transaction has a locktime of 1 day:

The first payment in a lightning-style channel.

Now, if you sign and publish that transaction, I can spend my $9.99 straight away, and you can publish that timelocked transaction tomorrow and get your 1c.

But what if we want to update the transaction?  We create a new transaction, with a $9.98 output to me and a 2c output to a transaction signed by both me and another temporary address of yours.  I create and sign a transaction which spends that 2c output, has a locktime of 1 day and has an output going to you, and send it to you.

We can revoke the old transaction: you simply give me the temporary private key you used for that transaction.  Weird, I know (and that’s why you had to generate a temporary address for it).  Now, if you were ever to sign and publish that old transaction, I can spend my $9.99 straight away, and create a transaction using your key and my key to spend your 1c.  Your transaction (1a below) which could spend that 1c output is timelocked, so I’ll definitely get my 1c transaction into the blockchain first (and the paper uses a timelock of 40 days, not 1).

Updating the payment in a lightning-style channel: you sent me your private key for sig2, so I could spend both outputs of Transaction 1 if you were to publish it.

So the effect is that the old transaction is revoked: if you were to ever sign and release it, I could steal all the money.  Neat trick, right?

A Minor Variation To Avoid Timeout Fallback

In the original payment channel, the opening transaction had a fallback clause: after some time, it is all spendable by me.  If you stop responding, I have to wait for this to kick in to get my money back.  Instead, the paper uses a pair of these “revocable” transaction structures.  The second is a mirror image of the first, in effect.

A full symmetric, bi-directional payment channel.

So the first output is $9.99 which needs your signature and a temporary signature of mine.  The second is 1c for you.  You sign the transaction, and I hold it.  You create and sign a transaction which has that $9.99 as input, a 1 day locktime, and send it to me.

Since both your and my “revocable” transactions spend the same output, only one can reach the blockchain.  They’re basically equivalent: if you send yours you must wait 1 day for your money.  If I send mine, I have to wait 1 day for my money.  But it means either of us can finalize the payment at any time, so the opening transaction doesn’t need a timeout clause.

Next…

Now we have a generalized transaction channel, which can spend the opening transaction in any way we both agree on, without trust or requiring on-blockchain updates (unless things break down).

The next post will discuss Hashed Timelock Contracts (HTLCs) which can be used to create chains of payments…

Notes For Pedants:

In the payment channel open I assume OP_CHECKLOCKTIMEVERIFY, which isn’t yet in bitcoin.  It’s simpler.

I ignore transaction fees as an unnecessary distraction.

We need malleability fixes, so you can’t mutate a transaction and break the ones which follow.  But I also need the ability to sign Transaction 1a without a complete Transaction 1 (since you can’t expose the signed version to me).  The paper proposes new SIGHASH types to allow this.

[EDIT 2015-03-30 22:11:59+10:30: We also need to sign the other symmetric transactions before signing the opening transaction.  If we released a completed opening transaction before having the other transactions, we might be stuck with no way to get our funds back (as we don’t have a “return all to me” timeout on the opening transaction)]

March 30, 2015 10:47 AM

March 29, 2015

LPC 2015: Development Tools Tutorial Accepted into 2015 Linux Plumbers Conference

In a departure from prior Plumbers tradition, we are pleased to announce not a Development Tools Microconference, but rather a set of Development Tools tutorials, including interactive tutorials, demos, and short presentations. Topics include Coccinelle (Julia Lawall), testing and debugging tools (Shuah Khan), issues with copying and pasting Linux kernel code (Michael Godfrey), and LLVM/clang and the Linux kernel (Behan Webster).

Given how important tools are to productivity of developers and the quality of their code, the time devoted to these tutorials promises to be time well spent!

Come and find out how to use the tools you have heard about! We hope to see you there!

March 29, 2015 07:37 PM

March 27, 2015

LPC 2015: Registration for LPC 2015 Is Now Open

The 2015 Linux Plumbers Conference organizing committee is pleased to announce that the registration for this year’s conference is now open. The conference will be taking place in Seattle, Washington, USA, August 19th through August 21st. Information on how to register can be found on the ATTEND page. Registration prices and cutoff dates are also published in the ATTEND page. As usual, contact us if you have questions.

March 27, 2015 11:04 PM

March 26, 2015

LPC 2015: Containers Microconference Accepted into 2015 Linux Plumbers Conference

Over the past year, the advent of Docker has further increased the level of Containers excitement. Additional points of Containers interest include the LXC 1.1 release (which includes CRIU checkpoint/restore, in-container systemd support, and feature-set compatibility across systemd, sysvinit, and upstart), the recently announced merger of OpenVZ and Cloud server, and progress in the kernel namespace and cgroups infrastructure.

The goal of this microconference is to get these diverse groups together to discuss the long-standing issue of container device isolation, live migration, security features such as user namespaces, and the dueling-systemd challenges stemming from running systemd both within containers and on the host OS (see “The Grumpy Editor’s guide to surviving the systemd debate” for other systemd-related topics).

Please join us for a timely and important discussion!

March 26, 2015 11:10 AM

March 25, 2015

Matthew Garrett: Python for remote reconfiguration of server firmware

One project I've worked on at Nebula is a Python module for remote configuration of server hardware. You can find it here, but there are a few caveats:

  1. It's not hugely well tested on a wide range of hardware
  2. The interface is not yet guaranteed to be stable
  3. You'll also need this module if you want to deal with IBM (well, Lenovo now) servers
  4. The IBM support is based on reverse engineering rather than documentation, so who really knows how good it is

There's documentation in the README, and I'm sorry for the API being kind of awful (it suffers rather heavily from me writing Python while knowing basically no Python). Still, it ought to work. I'm interested in hearing from anybody with problems, anybody who's interested in getting it on Pypi and anybody who's willing to add support for new HP systems.


March 25, 2015 11:51 PM

March 24, 2015

LPC 2015: Energy-Aware Scheduling and CPU Power Management Microconference Accepted into 2015 Linux Plumbers Conference

Energy efficiency has received considerable attention, for example in the microconference at last year’s Plumbers.  However, despite another year’s worth of vigorous efforts, there is still quite a bit left to be desired in Linux’s power management and in its energy-aware scheduling in particular, hence this year’s microconference.

This microconference will look at progress on frequency/performance scaling, thermal management, ACPI power management, device-tree representation of power-management features, energy-aware scheduling, management of power domains, integration of system-wide with runtime power management, and of course measurement techniques and tools.

We hope to see you there!

March 24, 2015 05:19 PM

LPC 2015: Checkpoint/Restart Microconference Accepted into 2015 Linux Plumbers Conference

Checkpoint/restart technology is the basis for live migration as well as its traditional use to take a snapshot of a long-running job. This microconference will focus on the C/R project called CRIU and will bring together people from Canonical, CloudLinux, Georgia Institute of Technology, Google, Parallels, and Qualcomm to discuss CRIU integration with the various containers projects, its use on Android, performance and testing issues and, of course, to show some live demos.  See the Checkpoint/Restart wiki for more information.

Please join us for a timely and important discussion!

 

March 24, 2015 05:15 PM

March 16, 2015

Dave Airlie: virgil3d local rendering test harness

So I've still been working on the virgil3d project along with part time help from Marc-Andre and Gerd at Red Hat, and we've been making steady progress. This post is about a test harness I just finished developing for adding and debugging GL features.

So one of the more annoying issues with working on virgil has been that while adding 3D renderer features or trying to track down a piglit failure, you generally have to run a full VM to do so. This adds a long round trip to your test/development cycle.

I'd always had the idea to do some sort of local system renderer, but there are some issues with calling GL from inside a GL driver. So my plan was to have a renderer process which loads the renderer library that qemu loads, and a mesa driver that hooks into the software rasterizer interfaces. So instead of running llvmpipe or softpipe I have a virpipe gallium wrapper, that wraps my virgl driver and the sw state tracker via a new vtest winsys layer for virgl.

So the virgl pipe driver sits on top of the new winsys layer, and the new winsys instead of using the Linux kernel DRM apis just passes the commands over a UNIX socket to a remote server process.

The remote server process then uses EGL and the renderer library, forks a new copy for each incoming connection and dies off when the rendering is done.

The final rendered result has to be read back over the socket, and then the sw winsys is used to putimage the rendering onto the screen.

So this system is probably going to be slower in raw speed terms, but for developing features or debugging failures it should provide an easier route without the overheads of the qemu process. I was pleasantly surprised it only took two days to pull most of this test harness together, which was neat; I'd planned much longer for it!

The code lives in two halves.
http://cgit.freedesktop.org/~airlied/virglrenderer
http://cgit.freedesktop.org/~airlied/mesa virgl-mesa-driver

[updated: pushed into the main branches]

Also, the virglrenderer repo is standalone now; it has a bunch of unit tests in it that are run under valgrind, in an attempt to lock down some more corners of the API and test for possible ways to escape the host.

March 16, 2015 06:24 AM

March 13, 2015

Dave Jones: LSF/MM 2015 recap.

It’s been a long week.
Spent Monday/Tuesday at LSFMM. This year it was in Boston, which was convenient in that I didn’t have to travel anywhere, but less convenient in that I had to get up early and do a rush-hour commute to get to the conference location in time. At least the weather got considerably better this week compared to the frankly stupid amount of snow we’ve had over the last month.
LWN did their usual great write-up which covers everything that was talked about in a lot more detail than my feeble mind can remember.

A lot of things from last year’s event seem to still be getting a lot of discussion, SMR drives & persistent memory being the obvious stand-outs. Lots of discussion surrounded various things related to huge pages (so much so that one session overran and replaced a slot I was supposed to share with Sasha, not that I complained. It was interesting stuff, and I learned a few new reasons to dislike the way we handle hugepages & forking), and I lost track of how many times the GFP_NOFAIL discussion came up.

In a passing comment in one session, one of the people Intel sent (Dave Hansen iirc) mentioned that Intel are now shipping an 18 core/36 thread CPU. A bargain at just $4642. Especially when compared to this madness.

A few days before the event, I had been asked if I wanted to do a “how Akamai uses Linux” type talk at LSFMM, akin to what Chris Mason did re: facebook at last year’s event. I declined, given I’m still trying to figure that out myself. Perhaps another time.

Wednesday/Thursday, I attended Vault at the same location.
My takeaways:

I got asked “What are you doing at Akamai?” a lot. (answer right now: trying to bring some coherence to our multiple test infrastructures).
Second most popular question: “What are you going to do after that?”. (answer: unknown, but likely something more related to digging into networking problems rather than fighting shell scripts, perl and Makefiles).

All that, plus a lot of hallway conversations, long lunches, and evening activities that went on possibly a little later than they should have, led to me almost losing my voice today.
Really good use of time though. I had fun, and it’s always good to catch up with various people.

LSF/MM 2015 recap. is a post from: codemonkey.org.uk

March 13, 2015 08:57 PM

March 12, 2015

Paul E. Mc Kenney: Confessions of a Recovering Proprietary Programmer, Part XV

So the Linux kernel now has a Documentation/CodeOfConflict file. As one of the people who provided an Acked-by for this file, I thought I should set down what went through my mind while reading it. Taking it one piece at a time:

The Linux kernel development effort is a very personal process compared to “traditional” ways of developing software. Your code and ideas behind it will be carefully reviewed, often resulting in critique and criticism. The review will almost always require improvements to the code before it can be included in the kernel. Know that this happens because everyone involved wants to see the best possible solution for the overall success of Linux. This development process has been proven to create the most robust operating system kernel ever, and we do not want to do anything to cause the quality of submission and eventual result to ever decrease.

In a perfect world, this would go without saying, give or take the “most robust” chest-beating. But I am probably not the only person to have noticed that the world is not always perfect. Sadly, it is probably necessary to remind some people that “job one” for the Linux kernel community is the health and well-being of the Linux kernel itself, and not their own pet project, whatever that might be.

On the other hand, I was also heartened by what does not appear in the above paragraph. There is no assertion that the Linux kernel community's processes are perfect, which is all to the good, because delusions of perfection all too often prevent progress in mature projects. In fact, in this imperfect world, there is nothing so good that it cannot be made better. On the other hand, there also is nothing so bad that it cannot be made worse, so random wholesale changes should be tested somewhere before being applied globally to a project as important as the Linux kernel. I was therefore quite happy to read the last part of this paragraph: “we do not want to do anything to cause the quality of submission and eventual result to ever decrease.”

If however, anyone feels personally abused, threatened, or otherwise uncomfortable due to this process, that is not acceptable.

That sentence is of course critically important, but must be interpreted carefully. For example, it is all too possible that someone might feel abused, threatened, and uncomfortable by the mere fact of a patch being rejected, even if that rejection was both civil and absolutely necessary for the continued robust operation of the Linux kernel. Or someone might claim to feel that way, if they felt that doing so would get their patch accepted. (If this sounds impossible to you, be thankful, but also please understand that the range of human behavior is extremely wide.) In addition, I certainly feel uncomfortable when someone points out a stupid mistake in one of my patches, but that discomfort is my problem, and furthermore encourages me to improve, which is a good thing. For but one example, this discomfort is exactly what motivated me to write the rcutorture test suite. Therefore, although I hope that we all know what is intended by the words “abused”, “threatened”, and “uncomfortable” in that sentence, the fact is that it will never be possible to fully codify the difference between constructive and destructive behavior.

Therefore, the resolution process is quite important:

If so, please contact the Linux Foundation's Technical Advisory Board at <tab@lists.linux-foundation.org>, or the individual members, and they will work to resolve the issue to the best of their ability. For more information on who is on the Technical Advisory Board and what their role is, please see:

http://www.linuxfoundation.org/programs/advisory-councils/tab

There can be no perfect resolution process, but this one seems to be squarely in the “good enough” category. The timeframes are long enough that people will not be rewarded by complaining to the LF TAB instead of fixing their patches. The composition of the LF TAB, although not perfect, is diverse, consisting of both men and women from multiple countries. The LF TAB appears to be able to manage the inevitable differences of opinion, based on the fact that not all members provided their Acked-by for this Code of Conflict. And finally, the LF TAB is an elected body that has oversight via the LF, so there are feedback mechanisms. Again, this is not perfect, but it is good enough that I am willing to overlook my concerns about the first sentence in the paragraph.

On to the final paragraph:

As a reviewer of code, please strive to keep things civil and focused on the technical issues involved. We are all humans, and frustrations can be high on both sides of the process. Try to keep in mind the immortal words of Bill and Ted, “Be excellent to each other.”

And once again, in a perfect world it would not be necessary to say this. Sadly, we are human beings rather than angels, and so it does appear to be necessary. Then again, if we were all angels, this would be a very boring world.

Or at least that is what I keep telling myself!

March 12, 2015 05:36 PM

Matthew Garrett: Vendors continue to break things

Getting on for seven years ago, I wrote an article on why the Linux kernel responds "False" to _OSI("Linux"). This week I discovered that vendors were making use of another behavioural difference between Linux and Windows to change the behaviour of their firmware and breaking things in the process.

The ACPI spec defines the _REV object as evaluating "to the revision of the ACPI Specification that the specified \_OS implements as a DWORD. Larger values are newer revisions of the ACPI specification", ie you reference _REV and you get back the version of the spec that the OS implements. Linux returns 5 for this, because Linux (broadly) implements ACPI 5.0, and Windows returns 2 because fuck you that's why[1].

(An aside: To be fair, Windows maybe has kind of an argument here because the spec explicitly says "The revision of the ACPI Specification that the specified \_OS implements" and all modern versions of Windows still claim to be Windows NT in \_OS and eh you can kind of make an argument that NT in the form of 2000 implemented ACPI 2.0 so handwave)

This would all be fine except firmware vendors appear to earnestly believe that they should ensure that their platforms work correctly with RHEL 5 even though there aren't any drivers for anything in their hardware and so are looking for ways to identify that they're on Linux so they can just randomly break various bits of functionality. I've now found two systems (an HP and a Dell) that check the value of _REV. The HP checks whether it's 3 or 5 and, if so, behaves like an old version of Windows and reports fewer backlight values and so on. The Dell checks whether it's 5 and, if so, leaves the sound hardware in a strange partially configured state.

And so, as a result, I've posted this patch which sets _REV to 2 on X86 systems because every single more subtle alternative leaves things in a state where vendors can just find another way to break things.

[1] Verified by hacking qemu's DSDT to make _REV calls at various points and dump the output to the debug console - I haven't found a single scenario where modern Windows returns something other than "2"

comment count unavailable comments

March 12, 2015 10:03 AM

March 10, 2015

Pavel Machek: Happy Easter from DRAM vendors

DRAM in about 50% of recent notebooks (and basically 50% of machines without ECC) is so broken it is exploitable. You can get root from a normal user account (and more). But while everyone and their dog wrote about the heartbleed and bash bugs, the press has not noticed yet. I guess it is because the vulnerability does not yet have a logo?

I propose this one:

           +==.
           |   \                                                                                     
  +---+    +====+
 -+   +-     ||
  |DDR|      ||
 -+   +-     ||
  +---+      ||


Memory testing code is at github. Unfortunately, Google did not publish a list of known bad models. If you run the test, can you post the results in comments? My Thinkpad X60 is not vulnerable; my Intel Core 2 Duo E7400-based desktop (with useless DMI information, so I don't know who made the board) is vulnerable.
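
For readers wondering what such a test actually exercises, here is a minimal sketch of the core access pattern, assuming x86 and GCC's __builtin_ia32_clflush(); the choice of addresses (two rows in the same DRAM bank), the iteration count, and the follow-up scan for flipped bits are left out and are my own assumptions, not the code from the github repository:

#include <stdint.h>

/* Repeatedly read two DRAM rows, flushing the cache so each read hits DRAM. */
static void hammer(uint64_t *row_a, uint64_t *row_b, long iters)
{
  while (iters-- > 0) {
    *(volatile uint64_t *)row_a;    /* force an actual load */
    *(volatile uint64_t *)row_b;
    __builtin_ia32_clflush(row_a);  /* evict so the next load hits DRAM again */
    __builtin_ia32_clflush(row_b);
  }
}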

March 10, 2015 09:30 PM

Pavel Machek: Random notes

Time: 1039 Safe: 101

Plane 'X' landed at the wrong airport.

You beat your previous score!

 #:  name      host      game                time  real time  planes safe
 -------------------------------------------------------------------------------
  1:  pavel     duo       default             1040      34:38   101



And some good news: the old Thinkpad X60 can take 3GiB of RAM, making the machine usable a little longer.

gpsd can run non-root, and seems to accept output from a named pipe. Which is good, because it makes it easier to use wifi access points to provide position to clients such as foxtrotgps. Code is in gitorious tui.

March 10, 2015 09:15 PM

March 09, 2015

Paul E. Mc Kenney: Verification Challenge 4: Tiny RCU

The first and second verification challenges were directed to people working on verification tools, and the third challenge was directed at developers. Perhaps you are thinking that it is high time that I stop picking on others and instead direct a challenge at myself. If so, this is the challenge you were looking for!

The challenge is to take the v3.19 Linux kernel code implementing Tiny RCU, unmodified, and use some formal-verification tool to prove that its grace periods are correctly implemented.

This requires a tool that can handle multiple threads. Yes, Tiny RCU runs only on a single CPU, but the proof will require at least two threads. The basic idea is to have one thread update a variable, wait for a grace period, then update a second variable, while another thread accesses both variables within an RCU read-side critical section, and a third parent thread verifies that this critical section did not span a grace period, like this:

 1 int x;
 2 int y;
 3 int r1;
 4 int r2;
 5
 6 void rcu_reader(void)
 7 {
 8   rcu_read_lock();
 9   r1 = x; 
10   r2 = y; 
11   rcu_read_unlock();
12 }
13
14 void *thread_update(void *arg)
15 {
16   x = 1; 
17   synchronize_rcu();
18   y = 1; 
19 }
20
21 . . .
22
23 assert(r2 == 0 || r1 == 1);


Of course, rcu_reader()'s RCU read-side critical section is not allowed to span thread_update()'s grace period, which is provided by synchronize_rcu(). Therefore, rcu_reader() must execute entirely before the end of the grace period (in which case r2 must be zero, keeping in mind C's default initialization to zero), or it must execute entirely after the beginning of the grace period (in which case r1 must be one).

There are a few technical problems to solve:


  1. The Tiny RCU code #includes numerous “interesting” files. I supplied empty files as needed and used “-I .” to focus the C preprocessor's attention on the current directory.
  2. Tiny RCU uses a number of equally interesting Linux-kernel primitives. I stubbed most of these out in fake.h, but copied a number of definitions from the Linux kernel, including IS_ENABLED, barrier(), and bool.
  3. Tiny RCU runs on a single CPU, so the two threads shown above must act as if this was the case. I used pthread_mutex_lock() to provide the needed mutual exclusion, keeping in mind that Tiny RCU is available only with CONFIG_PREEMPT=n. The thread that holds the lock is running on the sole CPU.
  4. The synchronize_rcu() function can block. I modeled this by having it drop the lock and then re-acquire it (see the sketch after this list).
  5. The dyntick-idle subsystem assumes that the boot CPU is born non-idle, but in this case the system starts out idle. After a surprisingly long period of confusion, I handled this by having main() invoke rcu_idle_enter() before spawning the two threads. The confusion eventually proved beneficial, but more on that later.
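
Here is a minimal sketch of how items 3 and 4 can be modeled. This is an illustration rather than the actual fake.h scaffolding, and the cpu_mutex, acquire_cpu(), release_cpu(), and model_blocking() names are my own: the sole CPU is represented by a pthread mutex, and synchronize_rcu()'s ability to block is modeled by dropping and re-acquiring that mutex.

#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t cpu_mutex = PTHREAD_MUTEX_INITIALIZER;

static void acquire_cpu(void)  /* the thread holding this "runs on" the CPU */
{
  if (pthread_mutex_lock(&cpu_mutex))
    abort();
}

static void release_cpu(void)
{
  if (pthread_mutex_unlock(&cpu_mutex))
    abort();
}

static void model_blocking(void)  /* e.g., from within synchronize_rcu() */
{
  release_cpu();  /* let the other thread run for a while ... */
  acquire_cpu();  /* ... then take the CPU back */
}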


The first step is to get the code to build and run normally. You can omit this step if you want, but given that compilers usually generate better diagnostics than do the formal-verification tools, it is best to make full use of the compilers.

I first tried goto-cc, goto-instrument, and satabs [Slide 44 of PDF] and impara [Slide 52 of PDF], but both tools objected strenuously to my code. My copies of these two tools are a bit dated, so it is possible that these problems have since been fixed. However, I decided to download version 5 of cbmc, which is said to have gained multithreading support.

After converting my code to a logic expression with no fewer than 109,811 variables and 457,344 clauses, cbmc -I . -DRUN fake.c took a bit more than ten seconds to announce VERIFICATION SUCCESSFUL. But should I trust it? After all, I might have a bug in my scaffolding or there might be a bug in cbmc.

The usual way to check for this is to inject a bug and see if cbmc catches it. I chose to break up the RCU read-side critical section as follows:

 1 void rcu_reader(void)
 2 {
 3   rcu_read_lock();
 4   r1 = x; 
 5   rcu_read_unlock();
 6   cond_resched();
 7   rcu_read_lock();
 8   r2 = y; 
 9   rcu_read_unlock();
10 }


Why not remove thread_update()'s call to synchronize_rcu()? Take a look at Tiny RCU's implementation of synchronize_rcu() to see why not!

With this change enabled via #ifdef statements, “cbmc -I . -DRUN -DFORCE_FAILURE fake.c” took almost 20 seconds to find a counter-example in a logic expression with 185,627 variables and 815,691 clauses. Needless to say, I am glad that I didn't have to manipulate this logic expression by hand!

Because cbmc catches an injected bug and verifies the original code, we have some reason to hope that the VERIFICATION SUCCESSFUL was in fact legitimate. As far as I know, this is the first mechanical proof of the grace-period property of a Linux-kernel RCU implementation, though admittedly of a rather trivial implementation. On the other hand, a mechanical proof of some properties of the dyntick-idle counters came along for the ride, courtesy of the WARN_ON_ONCE() statements in the Linux-kernel source code. (Previously, researchers at Oxford mechanically validated the relationship between rcu_dereference() and rcu_assign_pointer(), taking the whole of Tree RCU as input, and researchers at MPI-SWS formally validated userspace RCU's grace-period guarantee—manually.)

As noted earlier, I had confused myself into thinking that cbmc did not handle pthread_mutex_lock(). I verified that cbmc handles the gcc atomic builtins, but it turns out to be impractical to build a lock for cbmc's use from atomics. The problem stems from the “b” for “bounded” in “cbmc”, which means cbmc cannot analyze the unbounded spin loops used in locking primitives.

However, cbmc does do the equivalent of a full state-space search, which means it will automatically model all possible combinations of lock-acquisition delays even in the absence of a spin loop. This suggests something like the following:

 1 if (__sync_fetch_and_add(&cpu_lock, 1))
 2   exit();


The idea is to exclude from consideration any executions where the lock cannot be immediately acquired, again relying on the fact that cbmc automatically models all possible combinations of delays that the spin loop might have otherwise produced, but without the need for an actual spin loop. This actually works, but my mis-modeling of dynticks fooled me into thinking that it did not. I therefore made lock-acquisition failure set a global variable and added this global variable to all assertions. When this failed, I had sufficient motivation to think, which caused me to find my dynticks mistake. Fixing this mistake fixed all three versions (locking, exit(), and flag).
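
For concreteness, here is a sketch of that flag-based variant. The names cpu_lock, skip_checks, and CHECK() are illustrative, not the identifiers in the actual -DCBMC code: instead of calling exit(), a failed lock acquisition sets a global flag, and every assertion is weakened to pass trivially when that flag is set, so only executions that acquired the lock immediately are actually checked.

#include <assert.h>

static int cpu_lock;     /* atomic "lock" word, as in the exit() variant */
static int skip_checks;  /* set when the lock could not be acquired at once */

static void cpu_lock_acquire(void)
{
  if (__sync_fetch_and_add(&cpu_lock, 1))
    skip_checks = 1;  /* exclude this execution from the assertions */
}

static void cpu_lock_release(void)
{
  (void)__sync_fetch_and_sub(&cpu_lock, 1);
}

/* Every assertion is weakened so that excluded executions pass trivially. */
#define CHECK(cond) assert(skip_checks || (cond))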

The exit() and flag approaches result in exactly the same number of variables and clauses, which turns out to be quite a bit fewer than the locking approach:

                              exit()/flag                                   locking
Verification                  69,050 variables, 287,548 clauses (output)    109,811 variables, 457,344 clauses (output)
Verification Forced Failure   113,947 variables, 501,366 clauses (output)   185,627 variables, 815,691 clauses (output)


So locking increases the size of the logic expressions by quite a bit, but interestingly enough does not have much effect on verification time. Nevertheless, these three approaches show a few of the tricks that can be used to accomplish real work using formal verification.

The GPL-licensed source for the Tiny RCU validation may be found here. C-preprocessor macros select the various options, with -DRUN being necessary for both real runs and cbmc verification (as opposed to goto-cc or impara verification), -DCBMC forcing the atomic-and-flag substitute for locking, and -DFORCE_FAILURE forcing the failure case. For example, to run the failure case using the atomic-and-flag approach, use:

cbmc -I . -DRUN -DCBMC -DFORCE_FAILURE fake.c


Possible next steps include verifying dynticks and interrupts, dynticks and NMIs, and of course use of call_rcu() in place of synchronize_rcu(). If you try these out, please let me know how it goes!

March 09, 2015 06:42 PM

Lucas De Marchi: Taking maintainership of dolt

For those who don’t know, dolt is a wrapper and replacement for libtool on sane systems that don’t need it at all. It was created some years ago by Josh Triplett to overcome the slowness of libtool.

Nowadays libtool should be much faster, so the necessity for dolt should not be as big anymore. However, as can be seen here, that’s not true (at least in libtool > 2.4.2). Yeah, this seems to be a bug in libtool that should be fixed. But I don’t like it being any more complicated than it should be.

After talking to Josh and to Luca Barbato (who maintains an updated version in github) it seemed a good idea to revive dolt on its original repository. Since this file is supposed to be integrated in each project’s m4 directory, there are several copies out there with different improvements. Now the upstream repository has an updated version that should work for any project — feel free to copy it to your repository, synchronizing with upstream. I don’t really expect much maintenance. Credit where it’s due: most of the missing commits came from the version maintained by Luca.

So, if you want to integrate it in your repository using autotools, it’s pretty simple, feel free to follow the README.md file.

March 09, 2015 05:27 PM

March 06, 2015

Pavel Machek: Position privacy protection

Mozilla maintains an access point (AP) database at location.services.mozilla.com. Location using WiFi is cool: you don't need GPS hardware, you can get a fix quicker and with less battery power in cities, and you can get a fix indoors.

Mozilla will return your position if you know the SSIDs of two nearby access points, using a web service. That has disadvantages: you need a working internet connection, the connection costs you time, power and money, and Mozilla now knows where you are.
The obvious solution would be to publish the AP database, but that has a downside: if you visit Anicka once and learn the SSID of her favourite access point, you could locate Anicka with a simple database query once she moves.

Solution:
first = select N numerically lower (or most commonly seen) access points in the area
second = all access points in the area
for i in first:
      for j in second:
              at position sha1(i, j, salt?) in the database, store GPS coordinates.
If the probability of missing an access point when you are in the right area is P, the probability of not being able to tell your location is P^N. The database will grow approximately N times.
Storing a salt: it will make it harder to see differences between different versions (good). But if we play some tricks with the hash size to artificially introduce collisions, this may make them ineffective.
Problem: There are only 2^48 access points. Someone could brute-force the hash. Solution: store fewer bits of the hash to create collisions?
Problem: If you can guess that Anicka likes the South Pole, and suddenly a new access point appears in the area, you can guess her address. Comment: not a problem, since Anicka would have to take two access points to the South Pole? Or still a problem, since you don't need to know the address of the second AP to locate her?
Problem: If you know Anicka likes Mesto u Chciplyho psa, where no one ever moves and no one ever activates/deactivates APs, you can still locate her. Comment: is it a problem? Are there such places?
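
To make the scheme concrete, here is a minimal sketch of the database-key derivation, assuming OpenSSL's SHA1(), access points identified by 6-byte BSSIDs, and an illustrative KEY_BITS truncation to introduce the collisions discussed above; none of these names come from an actual implementation.

#include <stdint.h>
#include <string.h>
#include <openssl/sha.h>

#define KEY_BITS 40  /* assumption: fewer bits => more collisions => harder brute forcing */

static uint64_t db_key(const uint8_t i_ap[6], const uint8_t j_ap[6],
                       const uint8_t *salt, size_t salt_len)
{
  uint8_t buf[12 + 32];            /* assumes salt_len <= 32 */
  uint8_t md[SHA_DIGEST_LENGTH];
  uint64_t key = 0;
  int k;

  memcpy(buf, i_ap, 6);
  memcpy(buf + 6, j_ap, 6);
  if (salt_len)
    memcpy(buf + 12, salt, salt_len);
  SHA1(buf, 12 + salt_len, md);

  for (k = 0; k < 8; k++)          /* take the first 64 bits of the hash ... */
    key = (key << 8) | md[k];
  return key >> (64 - KEY_BITS);   /* ... and keep only KEY_BITS of them */
}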

Any ideas? Does it work, or did I make a mistake somewhere? Is there solution with lower overhead?

March 06, 2015 10:45 PM

Andy Grover: iSER target should work fine in RHEL 7.1

Contrary to what the RHEL 7.1 release notes might say, RHEL 7.1 should be fine as an iSER target, and it should be fine to use iSER even during the discovery phase. There was significant late-breaking work by our storage partners to fix both of these issues.

Unfortunately, there were multiple Bugzilla entries for the same issues, and while some were properly closed, others were not, and the issues erroneously were mentioned in the release notes.

So, for the hordes out there eager to try the iSER target on RHEL 7.1 and who actually read the release notes: I hope you see this too and know it’s OK to give it a go :-)

March 06, 2015 12:47 AM

March 04, 2015

Matt Domsch: Dell’s Linux Engineering team is hiring

Dell’s Linux Engineering team, based in Austin, TX, is hiring a Senior Principal Engineer. This role is one I’ve previously held and enjoyed greatly – ensuring that Linux (all flavors) “just works” on all Dell PowerEdge servers. It is as much a relationship role (working closely with Dell and partner hardware teams, OS vendors and developers, internal teams, and the greater open source community) as it is technical (device driver work, OS kernel and userspace work). If you’re a “jack of all trades” in Linux and looking for a very senior technical role to continue the outstanding work that ranks Dell at the top of charts for Linux servers, we’d love to speak with you.

The formal job description is on Dell’s job site. If you’d like to speak with me personally about it, drop me a line.

March 04, 2015 03:31 PM

March 03, 2015

Dave Jones: Trinity 1.5 release.

As announced this morning, I decided that things had slowed down enough (to an almost standstill of late) that it was worth making a tarball release of Trinity, to wrap up everything that’s gone in over the last year.

The email linked above covers most of the major changes, but a lot of the change over the last year has actually been groundwork for those features. Things like..

As I mentioned in the announcement, I don’t see myself having a huge amount of time to work on Trinity for at least this year. I’ve had a number of people email me asking about the status of some feature. Hopefully this demarcation point will answer the question.

So, it’s not abandoned, it just won’t be seeing the volume of change it has over the last few years. I expect my personal involvement will be limited to merging patches, and updating the syscall lists when new syscalls get added.

Trinity used to be on roughly a six month release schedule. We’ll see if by the end of the year there’s enough input from other people to justify doing a 1.6 release.

I’m also hopeful that time working on other projects mean I’ll come back to this at some point with fresh eyes. There are a number of features I wanted to implement that needed a lot more thought. Perhaps working on some other things for a while will give me the perspective necessary to realize those features.

Trinity 1.5 release. is a post from: codemonkey.org.uk

March 03, 2015 02:14 AM

March 02, 2015

Michael Kerrisk (manpages): man-pages-3.81 is released

I've released man-pages-3.81. The release tarball is available on kernel.org. The browsable online pages can be found on man7.org. The Git repository for man-pages is available on kernel.org.

The changes in man-pages-3.81 relate exclusively to the (glibc) thread-safety markings in various man pages. More than 400 patches, mainly by Ma Shimiao and Peng Haitao of Fujitsu, brought the following changes:

By now, thanks mainly to the work of Peng Haitao and Ma Shimiao, nearly 400 of the (around 980) pages in man-pages carry thread-safety information.

In addition, a new attributes(7) man page, based on text supplied by Alexandre Oliva (who was responsible for adding thread-safety information to the GNU C Library manual) provides an overview of the thread-safety concepts documented in man-pages, and a description of the notation used in man-pages to describe the thread safety of functions. (Thanks also to Carlos O'Donell for helping us to obtain the permissions needed so that man-pages could recycle this text from the GNU C Library manual.)

March 02, 2015 04:02 PM

Paul E. Mc Kenney: Verification Challenge 3: cbmc

The first and second verification challenges were directed to people working on verification tools, but this one is instead directed at developers.

It turns out that there are a number of verification tools that have seen heavy use. For example, I have written several times about Promela and spin (here, here, and here), which I have used from time to time over the past 20 years. However, this tool requires that you translate your code to Promela, which is not conducive to use of Promela for regression tests.

For those of us working in the Linux kernel, it would be nice to have a verification tool that operated directly on C source code. And there are tools that do just that, for example, the C Bounded Model Checker (cbmc). This tool, which is included in a number of Linux distributions, converts a C-language input file into a (possibly quite large) logic expression. This expression is constructed so that if any combination of variables causes the logic expression to evaluate to true, then (and only then) one of the assertions can be triggered. This logic expression is then passed to a SAT solver, and if this SAT solver finds a solution, then there is a set of inputs that can trigger the assertion. The cbmc tool is also capable of checking for array-bounds errors and some classes of pointer misuse.

Current versions of cbmc can handle some useful tasks. For example, suppose it was necessary to reverse the sense of the if condition in the following code fragment from Linux-kernel RCU:

 1   if (rnp->exp_tasks != NULL ||
 2       (rnp->gp_tasks != NULL &&
 3        rnp->boost_tasks == NULL &&
 4        rnp->qsmask == 0 &&
 5        ULONG_CMP_GE(jiffies, rnp->boost_time))) {
 6     if (rnp->exp_tasks == NULL) 
 7       rnp->boost_tasks = rnp->gp_tasks;
 8     /* raw_spin_unlock_irqrestore(&rnp->lock, flags); */
 9     t = rnp->boost_kthread_task;
10     if (t)   
11       rcu_wake_cond(t, rnp->boost_kthread_status);
12   } else {
13     rcu_initiate_boost_trace(rnp);
14     /* raw_spin_unlock_irqrestore(&rnp->lock, flags); */
15   }


This is a simple application of De Morgan's law, but an error-prone one, particularly if carried out in a distracting environment. Of course, to test a validation tool, it is best to feed it buggy code to see if it detects those known bugs. And applying De Morgan's law in a distracting environment is an excellent way to create bugs, as you can see below:

 1   if (rnp->exp_tasks == NULL &&
 2       (rnp->gp_tasks == NULL ||
 3        rnp->boost_tasks != NULL ||
 4        rnp->qsmask != 0 &&
 5        ULONG_CMP_LT(jiffies, rnp->boost_time))) {
 6     rcu_initiate_boost_trace(rnp);
 7     /* raw_spin_unlock_irqrestore(&rnp->lock, flags); */
 8   } else {
 9     if (rnp->exp_tasks == NULL) 
10       rnp->boost_tasks = rnp->gp_tasks;
11     /* raw_spin_unlock_irqrestore(&rnp->lock, flags); */
12     t = rnp->boost_kthread_task;
13     if (t)   
14       rcu_wake_cond(t, rnp->boost_kthread_status);
15   }


Of course, a full exhaustive test is infeasible, but structured testing would result in a manageable number of test cases. However, we can use cbmc to do the equivalent of a full exhaustive test, despite the fact that the number of combinations is on the order of two raised to the power 1,000. The approach is to create task_struct and rcu_node structures that contain only those fields that are used by this code fragment, but that also contain flags that indicate which functions were called and what their arguments were. This allows us to wrapper both the old and the new versions of the code fragment in their respective functions, and call them in sequence on different instances of identically initialized task_struct and rcu_node structures. These two calls are followed by an assertion that checks that the return value and the corresponding fields of the structures are identical.
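
The recording can be as simple as the following sketch, in which the stubbed-out functions do nothing but note that they were called and with what arguments; the field names here are illustrative, not those in checkiftrans-1.c:

struct task_struct {
  int woken;               /* set by the rcu_wake_cond() stub */
  int wake_status;
};

struct rcu_node {
  struct task_struct *exp_tasks;
  struct task_struct *gp_tasks;
  struct task_struct *boost_tasks;
  struct task_struct *boost_kthread_task;
  unsigned long qsmask;
  unsigned long boost_time;
  int boost_kthread_status;
  int boost_trace_called;  /* set by the rcu_initiate_boost_trace() stub */
};

static void rcu_wake_cond(struct task_struct *t, int status)
{
  t->woken = 1;            /* record the call and its argument */
  t->wake_status = status;
}

static void rcu_initiate_boost_trace(struct rcu_node *rnp)
{
  rnp->boost_trace_called = 1;
}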

This approach results in checkiftrans-1.c (raw C code here). Lines 5-8 show the abbreviated task_struct structure and lines 13-22 show the abbreviated rcu_node structure. Lines 10, 11, 24, and 25 show the instances. Lines 27-31 record a call to rcu_wake_cond() and lines 33-36 record a call to rcu_initiate_boost_trace().

Lines 38-49 initialize a task_struct/rcu_node structure pair. The rather unconventional use of the argv[] array works because cbmc assumes that this array contains random numbers. The old if statement is wrappered by do_old_if() on lines 51-71, while the new if statement is wrappered by do_new_if() on lines 73-93. The assertion is in check() on lines 95-107, and finally the main program is on lines 109-118.

Running cbmc checkiftrans-1.c gives this output, which prominently features VERIFICATION FAILED at the end of the file. On lines 4, 5, 12 and 13 of the file are complaints that neither ULONG_CMP_GE() nor ULONG_CMP_LT() is defined. Lacking definitions for these two functions, cbmc seems to treat them as random-number generators, which could of course cause the two versions of the if statement to yield different results. This is easily fixed by adding the required definitions:

 1 #define ULONG_MAX         (~0UL)
 2 #define ULONG_CMP_GE(a, b)  (ULONG_MAX / 2 >= (a) - (b))
 3 #define ULONG_CMP_LT(a, b)  (ULONG_MAX / 2 < (a) - (b))


This results in checkiftrans-2.c (raw C code here). However, running cbmc checkiftrans-2.c gives this output, which still prominently features VERIFICATION FAILED at the end of the file. At least there are no longer any complaints about undefined functions!

It turns out that cbmc provides a counterexample in the form of a traceback. This traceback clearly shows that the two instances executed different code paths, and a closer examination of the two representations of the if statement shows that I forgot to convert one of the && operators to a ||—that is, the “rnp->qsmask != 0 &&” on line 84 should instead be “rnp->qsmask != 0 ||”. Making this change results in checkiftrans-3.c (raw C code here). The inverted if statement is now as follows:

 1   if (rnp->exp_tasks == NULL &&
 2       (rnp->gp_tasks == NULL ||
 3        rnp->boost_tasks != NULL ||
 4        rnp->qsmask != 0 ||
 5        ULONG_CMP_LT(jiffies, rnp->boost_time))) {
 6     rcu_initiate_boost_trace(rnp);
 7     /* raw_spin_unlock_irqrestore(&rnp->lock, flags); */
 8   } else {
 9     if (rnp->exp_tasks == NULL) 
10       rnp->boost_tasks = rnp->gp_tasks;
11     /* raw_spin_unlock_irqrestore(&rnp->lock, flags); */
12     t = rnp->boost_kthread_task;
13     if (t)   
14       rcu_wake_cond(t, rnp->boost_kthread_status);
15   }


This time, running cbmc checkiftrans-3.c produces this output, which prominently features VERIFICATION SUCCESSFUL at the end of the file. Furthermore, this verification consumed only about 100 milliseconds on my aging laptop. And, even better, because it refused to verify the buggy version, we have at least some reason to believe it!

Of course, one can argue that doing such work carefully and in a quiet environment would eliminate the need for such verification, and 30 years ago I might have emphatically agreed with this argument. I have since learned that ideal work environments are not always as feasible as we might like to think, especially if there are small children (to say nothing of adult-sized children) in the vicinity. Besides which, human beings do make mistakes, even when working in ideal circumstances, and if we are to have reliable software, we need some way of catching these mistakes.

The canonical pattern for using cbmc in this way is as follows:

 1 retref = funcref(...);
 2 retnew = funcnew(...);
 3 assert(retref == retnew && ...);


The ... sequences represent any needed arguments to the calls and any needed comparisons of side effects within the assertion.
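
As a complete toy instance of this pattern (not one of the checkiftrans-*.c files), the following compares an original condition against a hand-applied De Morgan rewrite for all possible inputs. The deliberately undefined nondet_int() is treated by cbmc as returning an arbitrary value, much as the then-undefined ULONG_CMP_GE() and ULONG_CMP_LT() were treated above.

#include <assert.h>

int nondet_int(void);  /* deliberately left undefined */

static int funcref(int a, int b)
{
  return !(a == 0 || b > 10);
}

static int funcnew(int a, int b)  /* De Morgan applied by hand */
{
  return a != 0 && b <= 10;
}

int main(void)
{
  int a = nondet_int();
  int b = nondet_int();
  int retref, retnew;

  retref = funcref(a, b);
  retnew = funcnew(a, b);
  assert(retref == retnew);
  return 0;
}

Running cbmc on this file should report VERIFICATION SUCCESSFUL; reintroduce the classic mistake of leaving one || unconverted and it should instead produce a counterexample.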

Of course, there are limitations:


  1. The “b” in cbmc stands for “bounded.” In particular, cbmc handles neither infinite loops nor infinite recursion. The --unwind and --depth arguments to cbmc allow you to control how much looping and recursion is analyzed. See the manual for more information.
  2. The SAT solvers used by cbmc have improved greatly over the past 25 years. In fact, where a 100-variable problem was at the edge of what could be handled in the 1990s, most ca-2015 solvers can handle more than a million variables. However, the NP-complete nature of SAT does occasionally make its presence known, for example, programs that reduce to a proof involving the pigeonhole principle are not handled well as of early 2015.
  3. Handling of concurrency is available in later versions of cbmc, but is not as mature as is the handling of single-threaded code.


All that aside, everything has its limitations, and cbmc's ease of use is quite impressive. I expect to continue to use it from time to time, and strongly recommend that you give it a try!

March 02, 2015 01:08 AM

February 27, 2015

Matthew Garrett: Actions have consequences (or: why I'm not fixing Intel's bugs any more)

A lot of the kernel work I've ended up doing has involved dealing with bugs on Intel-based systems - figuring out interactions between their hardware and firmware, reverse engineering features that they refuse to document, improving their power management support, handling platform integration stuff for their GPUs and so on. Some of this I've been paid for, but a bunch has been unpaid work in my spare time[1].

Recently, as part of the anti-women #GamerGate campaign[2], a set of awful humans convinced Intel to terminate an advertising campaign because the site hosting the campaign had dared to suggest that the sexism present throughout the gaming industry might be a problem. Despite being awful humans, it is absolutely their right to request that a company choose to spend its money in a different way. And despite it being a dreadful decision, Intel is obviously entitled to spend their money as they wish. But I'm also free to spend my unpaid spare time as I wish, and I no longer wish to spend it doing unpaid work to enable an abhorrently-behaving company to sell more hardware. I won't be working on any Intel-specific bugs. I won't be reverse engineering any Intel-based features[3]. If the backlight on your laptop with an Intel GPU doesn't work, the number of fucks I'll be giving will fail to register on even the most sensitive measuring device.

On the plus side, this is probably going to significantly reduce my gin consumption.

[1] In the spirit of full disclosure: in some cases this has resulted in me being sent laptops in order to figure stuff out, and I was not always asked to return those laptops. My current laptop was purchased by me.

[2] I appreciate that there are some people involved in this campaign who earnestly believe that they are working to improve the state of professional ethics in games media. That is a worthy goal! But you're allying yourself to a cause that disproportionately attacks women while ignoring almost every other conflict of interest in the industry. If this is what you care about, find a new way to do it - and perhaps deal with the rather more obvious cases involving giant corporations, rather than obsessing over indie developers.

For avoidance of doubt, any comments arguing this point will be replaced with the phrase "Fart fart fart".

[3] Except for the purposes of finding entertaining security bugs

comment count unavailable comments

February 27, 2015 12:28 AM

February 23, 2015

Dave Jones: backup solutions.

For the longest time, my backup solution has been a series of rsync scripts that have evolved over time into a crufty mess. Having become spoiled on my mac with time machine, I decided to look into something better that didn’t involve a huge time investment on my part.

The general consensus seemed to be that for ready-to-use home-nas type devices, the way to go was either Synology, or Drobo. You just stick in some disks, and setup NFS/SAMBA etc with a bunch of mouse clicking. Perfect.

I had already decided I was going to roll with a 5-disk RAID6 setup, so I bit the bullet and laid down $1000 for a Synology 8-Bay DS1815+. It came *triple* boxed, unlike the handful of 3TB HGST drives.
I chose the HGSTs after reading Backblaze’s report on failure rates across several manufacturers, and figured that after the RAID6 overhead, 8TB would be more than enough for a long time, even at the rate I accumulate flac and wav files. Also, worst case, I still had 3 spare bays I could expand into later if needed.

Installation was a breeze. The plastic drive caddies felt a little flimsy, but the drives were secure once in them, even if they did feel like they were going to snap as I flexed them to pop them into place. After putting in all the drives, I connected the four ethernet ports and powered it up.
After I connected to its web UI, it wanted to do a firmware update, like just about every internet-connected device does these days. It rebooted, and finally I could get on with setting things up.

On first logging into the device over ssh, I think the first command I typed was uname. Seeing a 3.2 kernel surprised me a little. I got nervous thinking about how many VFS, EXT4, and MD bugfixes hadn’t made their way back to long-term stable, and got the creeps a little. I decided not to think too much about it, and put faith in the Synology people doing backports (though I never got as far as looking into their kernel package).

The web ui is pretty slick, though it felt a little sluggish at times. I set up my RAID6 volume with a bunch of clicks, and then listened as all those disks started clattering away. After creation, it wanted to do an initial parity scan. I set it going, and went to bed. The next morning before going to work, I checked on it, and noticed it wasn’t even at 20% done. I left it going while I went into the office the next day. I spent the night away from home, and so didn’t get back to it until another day later.

When I returned home, the volume was ready, but I noticed the device was now noticeably hotter to the touch than I remembered. I figured it had been hammering the disks non-stop for 24hrs, so go figure, and that it would probably cool off a little as it idled. As the device was now ready for exporting, I set up an nfs export, and then spent some time fighting uid mappings, as you do. The device does have the ability to deal with LDAP and some other stuff that I’ve never had time to set up, so I did things the hard way. Once I had the export mounted, I started my first rsync from my existing backups.

While it was running, I remembered I had intended to set up bonding. A little bit of clicky-clicky later, it was done, and transfers started getting even faster. Very nice. I set up two bonds, with a pair of NICs in each. Given my desktop only has a dual NIC, that was good enough. I figured having a 2nd 2GigE bond was nice in case I had multiple machines wanting to use it while I was doing a backup.

The backup was going to take a while, so I left it running.
A few hours later, I got back to it, and again, it was getting really hot. There are two pretty big fans in the back of the unit, and they were cranking out heat. Then, things started getting really weird. I noticed that the rsync had hung. I ctrl-c’d it, and tried logging into the device as root. It took _minutes_ to get a command prompt. I typed top and waited. About two minutes later top started. Then it spontaneously rebooted.

When it came back up, I logged in, and poked around the log files, and didn’t see anything out of the ordinary.
I restarted the rsync, and let it go for a while. About 20 minutes later, I came back to check on it again, and found that the box had just hung completely. The rsync was stalled, and I couldn’t ssh in. I rebooted the device, cursed a bit, and then decided to think about it for a while, so I never restarted the rsync. I clicked around in the interface to see if there was anything I could turn on/off that would perhaps give me some clues as to wtf was going on.
Then it rebooted spontaneously again.

It was about this time I was ready to throw the damn thing out the window. I bought this thing because I wanted a turn-key solution that ‘just worked’, and had quickly come to realize that with this device when something went bad, I was pretty screwed. Sometimes “It runs Linux” just isn’t enough. For some people, the Synology might be a great solution, but it wasn’t for me. Reading some of the Amazon reviews, it seems there were a few people complaining about their units overheating, which might explain the random reboots I saw. For a device I wanted to leave switched on 24/7 and never think about, something that overheats (especially when I’m not at home) really doesn’t give me feel good vibes. Some of the other reviews on Amazon rave about the DS1815+. It may be that there was a bad batch, and I got unlucky, but I felt burnt on the whole experience, and even if I had got a replacement, I don’t know if I would have felt like I could have trusted this thing with my data.

I ended up returning it to Amazon for a refund, and used the money to buy a motherboard, cpu, ram etc to build a dedicated backup computer. It might not have the fancy web ui, and it might mean I’ll still be using my crappy rsync scripts, but when things go wrong, I generally have a much better chance of fixing the problems.

Other surprises: at one point, when I opened the unit up to install an extra 4GB of RAM (it comes with just 2GB by default), I noticed that it runs off a single 250W power supply, which seemed surprising to me. I thought disks used considerably more power during spin-up, but apparently they’re pretty low power these days.

So, two weeks of wasted time, frustration, and failed experiments. Hopefully by next week I’ll have my replacement solution all set up and can move on to more interesting things instead of fighting appliances.

backup solutions. is a post from: codemonkey.org.uk

February 23, 2015 02:04 AM

February 21, 2015

Michael Kerrisk (manpages): man-pages-3.80 is released

I've released man-pages-3.80. The release tarball is available on kernel.org. The browsable online pages can be found on man7.org. The Git repository for man-pages is available on kernel.org.

Aside from very many small fixes and improvements to various pages (by more than 30 contributors!), the most notable changes in man-pages-3.80 are the following:

February 21, 2015 12:50 PM

Paul E. Mc Kenney: Confessions of a Recovering Proprietary Programmer, Part XIV

Although junk mail, puppies, and patches often are unwelcome, there are exceptions. For example, if someone has been wanting a particular breed of dog for some time, that person might be willing to accept a puppy, even if that means giving it shots, housebreaking it, teaching it the difference between furniture and food, doing bottlefeeding, watching over it day and night, and even putting up with some sleepless nights.

Similarly, if a patch fixes a difficult and elusive bug, the maintainer might be willing to apply the patch by hand, fix build errors and warnings, fix a few bugs in the patch itself, run a full set of tests, fix any style problems, and even accept the risk that the fix might have unexpected side effects, some of which might result in some sleepless nights. This in fact is one of the reasons for the common advice given to open-source newbies: start by fixing bugs.

Other good advice for new contributors can be found here:


  1. Greg Kroah-Hartman's HOWTO do Linux kernel development – take 2 (2005)
  2. Jonathan Corbet's How to Participate in the Linux Community (2008)
  3. Greg Kroah-Hartman's Write and Submit your first Linux kernel Patch (2010)
  4. My How to make a positive difference in a FOSS project (2012)
  5. Daniel Lezcano's What do we mean by working upstream: A long-term contributor’s view


This list is mostly about contributing to the Linux kernel, but most other projects have similar pages giving good new-contributor advice.

February 21, 2015 04:51 AM

February 20, 2015

Pavel Machek: What you might want to know about uloz.to

1. The captcha is not case-sensitive.

2. You can get around the concurrent-downloads limit using an incognito window in Chromium. If you need more downloads, chromium --temp-profile does the trick, too.

What you might want to know about Debian stable

Somehow, Debian stable rules do not apply to the Chromium web browser: you won't get security updates for it. I'd say that Chromium is the most security-critical package on the system, so it is a strange decision to me. In any case, you want to uninstall Chromium, or perhaps update to Debian testing.

February 20, 2015 09:56 AM

February 19, 2015

Matthew Garrett: It has been 0 days since the last significant security failure. It always will be.

So blah blah Superfish blah blah trivial MITM everything's broken.

Lenovo deserve criticism. The level of incompetence involved here is so staggering that it wouldn't be a gross injustice for the company to go under as a result[1]. But let's not pretend that this is some sort of isolated incident. As an industry, we don't care about user security. We will gladly ship products with known security failings and no plans to update them. We will produce devices that are locked down such that it's impossible for anybody else to fix our failures. We will hide behind vague denials, we will obfuscate the impact of flaws and we will deflect criticisms with announcements of new and shinier products that will make everything better.

It'd be wonderful to say that this is limited to the proprietary software industry. I would love to be able to argue that we respect users more in the free software world. But there are too many cases that demonstrate otherwise, even where we should have the opportunity to prove the benefits of open development. An obvious example is the smartphone market. Hardware vendors will frequently fail to provide timely security updates, and will cease to update devices entirely after a very short period of time. Fortunately there's a huge community of people willing to produce updated firmware. Your phone manufacturer is never going to fix the latest OpenSSL flaw? As long as your phone can be unlocked, there's a reasonable chance that there's an updated version on the internet.

But this is let down by a kind of callous disregard for any deeper level of security. Almost every single third-party Android image is either unsigned or signed with the "test keys", a set of keys distributed with the Android source code. These keys are publicly available, and as such anybody can sign anything with them. If you configure your phone to allow you to install these images, anybody with physical access to your phone can replace your operating system. You've gained some level of security at the application level by giving up any real ability to trust your operating system.

This is symptomatic of our entire ecosystem. We're happy to tell people to disable security features in order to install third-party software. We're happy to tell people to download and build source code without providing any meaningful way to verify that it hasn't been tampered with. Install methods for popular utilities often still start with "curl | sudo bash". This isn't good enough.

We can laugh at proprietary vendors engaging in dreadful security practices. We can feel smug about giving users the tools to choose their own level of security. But until we're actually making it straightforward for users to choose freedom without giving up security, we're not providing something meaningfully better - we're just providing the same shit sandwich on different bread.

[1] I don't see any way that they will, but it wouldn't upset me

comment count unavailable comments

February 19, 2015 07:43 PM

February 18, 2015

Pavel Machek: When you are drowning in regressions

..then the first step is to stop more regressions. That's the situation with the Nokia N900 kernel now: it has a lot of hardware, and there's kernel support for most of it, but userland support really is not mature enough. So I added a test script to tui/ofone, which allows testing of battery, audio, LEDs, backlight, GPS, bluetooth and more. It is called "tefone". The "ofone" script (with a gtk gui) can be used to control the modem, place calls and read SMSes. You'll need a library to actually get voice calls with audio.

On a related note, my PC now resumes faster than my monitor turns on. I guess we are doing a good job. (Or maybe Fujitsu did not do such a good job.) Too bad resume broke completely in 3.20-rc0 on the thinkpad.

February 18, 2015 10:50 AM

February 16, 2015

Matthew Garrett: Intel Boot Guard, Coreboot and user freedom

PC World wrote an article on how the use of Intel Boot Guard by PC manufacturers is making it impossible for end-users to install replacement firmware such as Coreboot on their hardware. It's easy to interpret this as Intel acting to restrict competition in the firmware market, but the reality is actually a little more subtle than that.

UEFI Secure Boot as a specification is still unbroken, which makes attacking the underlying firmware much more attractive. We've seen several presentations at security conferences lately that have demonstrated vulnerabilities that permit modification of the firmware itself. Once you can insert arbitrary code in the firmware, Secure Boot doesn't do a great deal to protect you - the firmware could be modified to boot unsigned code, or even to modify your signed bootloader such that it backdoors the kernel on the fly.

But that's not all. Someone with physical access to your system could reflash your system. Even if you're paranoid enough that you X-ray your machine after every border crossing and verify that no additional components have been inserted, modified firmware could still be grabbing your disk encryption passphrase and stashing it somewhere for later examination.

Intel Boot Guard is intended to protect against this scenario. When your CPU starts up, it reads some code out of flash and executes it. With Intel Boot Guard, the CPU verifies a signature on that code before executing it[1]. The hash of the public half of the signing key is flashed into fuses on the CPU. It is the system vendor that owns this key and chooses to flash it into the CPU, not Intel.

This has genuine security benefits. It's no longer possible for an attacker to simply modify or replace the firmware - they have to find some other way to trick it into executing arbitrary code, and over time these will be closed off. But in the process, the system vendor has prevented the user from being able to make an informed choice to replace their system firmware.

The usual argument here is that in an increasingly hostile environment, opt-in security isn't sufficient - it's the role of the vendor to ensure that users are as protected as possible by default, and in this case all that's sacrificed is the ability for a few hobbyists to replace their system firmware. But this is a false dichotomy - UEFI Secure Boot demonstrated that it was entirely possible to produce a security solution that provided security benefits and still gave the user ultimate control over the code that their machine would execute.

To an extent the market will provide solutions to this. Vendors such as Purism will sell modern hardware without enabling Boot Guard. However, many people will buy hardware without consideration of this feature and only later become aware of what they've given up. It should never be necessary for someone to spend more money to purchase new hardware in order to obtain the freedom to run their choice of software. A future where users are obliged to run proprietary code because they can't afford another laptop is a dystopian one.

Intel should be congratulated for taking steps to make it more difficult for attackers to compromise system firmware, but criticised for doing so in such a way that vendors are forced to choose between security and freedom. The ability to control the software that your system runs is fundamental to Free Software, and we must reject solutions that provide security at the expense of that ability. As an industry we should endeavour to identify solutions that provide both freedom and security and work with vendors to make those solutions available, and as a movement we should be doing a better job of articulating why this freedom is a fundamental part of users being able to place trust in their property.

[1] It's slightly more complicated than that in reality, but the specifics really aren't that interesting.

comment count unavailable comments

February 16, 2015 08:44 PM

February 15, 2015

Matt Domsch: s3cmd 1.5.2 – major update coming to Fedora and EPEL

As the new upstream maintainer for the popular s3cmd program, I have been collecting and making fixes all across the codebase for several months. In the last couple of weeks it has finally gotten stable enough to warrant publishing a formal release. Aside from bugfixes, its primary enhancement is adding support for the AWS Signature v4 method, which is required to create S3 buckets in the eu-central-1 (Frankfurt) region, and is a more secure request-signing method usable in all AWS S3 regions. Since the release of s3cmd v1.5.0, Python 2.7.9 (as exemplified in Arch Linux) has added support for SSL certificate validation. Unfortunately, that validation broke for SSL wildcard certificates (e.g. *.s3.amazonaws.com). Ubuntu 14.04 has an intermediate flavor of this validation, which also broke s3cmd. A couple of quick fixes later, and v1.5.2 is published now.

I’ve updated the packages in Fedora 20, 21, and rawhide. The EPEL 6 and 7 epel-testing repositories have these as well. If you use s3cmd on RHEL/CentOS, please upgrade to the package in epel-testing and give it karma. Bug reports are welcome at https://github.com/s3tools/s3cmd.

February 15, 2015 05:05 PM

February 13, 2015

Rusty Russell: lguest Learns PCI

With the 1.0 virtio standard finalized by the committee (though minor non-material corrections and clarifications are still trickling in), Michael Tsirkin did the heavy lifting of writing the Linux drivers (based partly on an early prototype of mine).

But I wanted an independent implementation to test: both because OASIS insists on multiple implementations before standard ratification, but also because I wanted to make sure the code which is about to go into the merge window works well.

Thus, I began the task of making lguest understand PCI.  Fortunately, the osdev wiki has an excellent introduction on how to talk PCI on an x86 machine.  It didn’t take me too long to get a successful PCI bus scan from the guest and set about implementing the virtio parts.
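
For the curious, the legacy mechanism the osdev wiki describes boils down to something like the following sketch (not lguest's actual code, and the helper names are mine): compose a bus/device/function/register address, write it to I/O port 0xCF8, and read the data back from port 0xCFC.

#include <stdint.h>

static inline void outl(uint16_t port, uint32_t val)
{
  asm volatile("outl %0, %1" : : "a"(val), "Nd"(port));
}

static inline uint32_t inl(uint16_t port)
{
  uint32_t val;

  asm volatile("inl %1, %0" : "=a"(val) : "Nd"(port));
  return val;
}

static uint32_t pci_cfg_read(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
{
  /* enable bit, then bus 23:16, device 15:11, function 10:8, register 7:2 */
  uint32_t addr = 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)dev << 11) |
                  ((uint32_t)fn << 8) | (reg & 0xfc);

  outl(0xCF8, addr);   /* CONFIG_ADDRESS */
  return inl(0xCFC);   /* CONFIG_DATA */
}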

The final part (over which I procrastinated for a week) was to step through the spec and document all the requirements in the lguest comments.  I also added checks that the guest driver was behaving sufficiently, but now it’s finally done.

It also resulted in a few minor patches, and some clarification patches for the spec.  No red flags, however, so I’m reasonably confident that 3.20 will have compliant 1.0 virtio support!

February 13, 2015 06:31 AM

James Morris: Save the Date — Linux Security Summit 2015, August 20-21, Seattle WA, USA

The Linux Security Summit for 2015 will be held across 20-21 August, in Seattle, WA, USA.  As with previous events, we’ll be co-located with LinuxCon.

Preliminary event details are available at the event site:

http://events.linuxfoundation.org/events/linux-security-summit

A CFP will be issued soon — stay tuned!

Thank you to Máirín Duffy, who created wonderful logos for the event.


February 13, 2015 03:26 AM

February 11, 2015

Daniel Vetter: Neat drm/i915 Stuff for 3.20

Linux 3.19 was just released and my usual overview of what the next merge window will bring is more than overdue. The big thing overall is certainly all the work around atomic display updates, but read on for what else all has been done.

Let's first start with all the driver internal rework to support atomic. The big thing with atomic is that it requires a clean split between code that checks display updates and the code that commits a new display state to the hardware. The corollary from that is that any derived state that's computed in the validation code and needed in the commit code must be stored somewhere in the state object. Gustavo Padovan and Matt Roper have done all that work to support atomic plane updates. This is the code that's now in 3.20 as a tech preview. The big things missing for proper atomic plane updates are async commit support (which has already landed for 3.21) and support to adjust watermark settings on the fly. Patches from Ville have been around for a long time, but need to be rebased, reviewed and extended for recently added platforms.

On the modeset side Ander Conselvan de Oliveira has done a lot of the necessary work already. Unfortunately converting the modeset code is much more involved, mainly for two reasons: First, there's a lot more derived state that needs to be handled, and the existing code already has structures and code for this. Conceptually the code has been prepared for an atomic world since the big display rewrite and the introduction of CRTC configuration structures. But converting the i915 modeset code to the new DRM atomic structures and interface is still a lot of work. Most of these refactorings have landed in 3.20. The other hold-up is shared resources and the software state to handle that. This is mostly for handling display PLLs, but also other shared resources like the display FIFO buffers. Patches to handle this are still in-flight.

Continuing with modeset work, Jani Nikula has reworked the DSI support to use the shared DSI helpers from the DRM core. Jani also reworked the DSI code in preparation for dual-link DSI support, which Gaurav Singh implemented. Rodrigo Vivi and others provided a lot of patches to improve PSR support and enable it for Baytrail/Braswell. Unfortunately there are still issues with the automated testcase, and so PSR stays disabled by default for now. Rodrigo also wrote some nice DocBook documentation for fbc, another step towards fully documenting i915 driver internals.

Moving on to platform enabling, there has been a lot of work from Ville on Cherryview: RPS/gpu turbo and pipe CRC support (used for automated display testing) are both improved. On Skylake almost all the basic enabling is merged now: PM patches, enabling mmio pageflips and fastboot support from Damien have landed. Tvrtko Ursulin also created the infrastructure for global GTT views. This will be used for some upcoming features on Skylake. And to simplify enabling future platforms, Chris Wilson and Mika Kuoppala have completely rewritten the forcewake domains handling code.

Also really important for Skylake is that the execlist support for gen8+ command submission is finally in good enough shape to be used by default - on Skylake that's the only supported path, as legacy ring submission has been deprecated. Around that feature and a few other closely related ones a lot of code was refactored: John Harrison implemented the conversion from raw sequence numbers to request objects for gpu progress tracking. This is also prep work for landing a gpu scheduler. Nick Hoath removed the special execlist request tracking structures, simplifying the code. The engine initialization code was also refactored for a cleaner split between software and hardware initialization, leading to more robust code for driver load and system resume. Dave Gordon has also reworked the code tracking and computing the ring free space. On top of that we've also enabled full ppgtt again, but for now only where execlists are available, since there are still issues with the legacy ring-based pagetable loading.

For generic GEM work there's the really nice support for write-combine cpu memory mappings from Akash Goel and Chris Wilson. On Atom SoC platforms, which lack the giant LLC that bigger chips have, this gives a really fast way to upload textures. And even with LLC it's useful for uploading to scanout targets, since those are always uncached. But like the special-purpose upload paths for LLC platforms, the cpu mmap views do not detile textures, hence special-purpose fastpaths need to be written in mesa and other userspace to fully exploit this. In other GEM features, the shadow batch copying code for the command parser has now also landed.

Finally there's the redesign from Imre Deak to use the component framework for the snd-hda/i915 interactions. Modern platforms need a lot of coordination between the graphics and sound driver sides for audio over hdmi, and thus far this was done with some ad-hoc dynamic symbol lookups. That resulted in a lot of headaches getting the ordering correct for driver load or system suspend and resume. With the component framework this dependency is now explicit, which means we will be able to get rid of a few hacks. It's also much easier to extend for the future - new platforms tend to integrate different components even more.

February 11, 2015 11:20 PM

Paul E. Mc Kenney: Confessions of a Recovering Proprietary Programmer, Part XIII

True confession: I was once a serial junk mailer. Not mere email spam, but physical bulk-postage-rate flyers, advertising a series of non-technical conferences. It was of course for a good cause, and one of the most difficult things about that task was convincing that cause's leaders that this flyer was in fact junk mail. They firmly believed that anyone with even a scrap of compassion would of course read the flyer from beginning to end, feeling the full emotional impact of each and every lovingly crafted word. They reluctantly came around to my view, which was that we had at most 300 milliseconds to catch the recipient's attention, that being the amount of time that the recipient might glance at the flyer on its way into the trash. Or at least I think that they came around to my view. All I really know is that they stopped disputing the point.

But junk mail for worthy causes is not the only thing that can be less welcome than its sender might like.

For example, Jim Wasko noticed a sign at a daycare center that read: “If you are late picking up your child and have not called us in advance, we will give him/her an espresso and a puppy. Have a great day.”

Which goes to show that although puppies are cute and lovable, and although their mother no doubt went to a lot of trouble to bring them into this world, they are, just like junk mail, not universally welcome. And this should not be too surprising, given the questions that come to mind when contemplating a free puppy. Has it had its shots? Is it housebroken? Has it learned that furniture is not food? Has it been spayed/neutered? Is it able to eat normal dogfood, or does it still require bottlefeeding? Is it willing to entertain itself for long periods? And, last, but most definitely not least, is it willing to let you sleep through the night?

Nevertheless, people are often surprised and bitterly disappointed when their offers of free puppies are rejected.

Other people are just as surprised and disappointed when their offers of free patches are rejected. After all, they put a lot of work into their patches, and they might even get into trouble if the patch isn't eventually accepted.

But it turns out that patches are a lot like junk mail and puppies. They are greatly valued by those who produce them, but often viewed with great suspicion by the maintainers receiving them. You see, the thought of accepting a free patch also raises questions. Does the patch apply cleanly? Does it build without errors and warnings? Does it run at all? Does it pass regression tests? Has it been tested with the commonly used combination of configuration parameters? Does the patch have good code style? Is the patch maintainable? Does the patch provide a straightforward and robust solution to whatever problem it is trying to solve? In short, will this patch allow the maintainer to sleep through the night?

I am extremely fortunate in that most of the RCU patches that I receive are “good puppies.” However, not everyone is so lucky, and I occasionally hear from patch submitters whose patches were not well received. They often have a long list of reasons why their patches should have been accepted, including:


  1. I put a lot of work into that patch, so it should have been accepted! Unfortunately, hard work on your part does not guarantee a perception of value on the maintainer's part.
  2. The maintainer's job is to accept patches. Maybe not: your maintainer might well be an unpaid volunteer.
  3. But my maintainer is paid to maintain! True, but he is probably not being paid to do your job.
  4. I am not asking him to do my job, but rather his/her job, which is to accept patches! The maintainer's job is not to accept any and all patches, but instead to accept good patches that further the project's mission.
  5. I really don't like your attitude! I put a lot of work into making this be a very good patch! It should have been accepted! Really? Did you make sure it applied cleanly? Did you follow the project's coding conventions? Did you make sure that it passed regression tests? Did you test it on the full set of platforms supported by the project? Does it avoid problems discussed on the project's mailing list? Did you promptly update your patch based on any feedback you might have received? Is your code maintainable? Is your code aligned with the project's development directions? Do you have a good reputation with the community? Do you have a track record of supporting your submissions? In other words, will your patch allow the maintainer to sleep through the night?
  6. But I don't have time to do all that! Then the maintainer doesn't have time to accept your patch. And most especially doesn't have time to deal with all the problems that your patch is likely to cause.

As a recovering proprietary programmer, I can assure you that things work a bit differently in the open-source world, so some adjustment is required. But participation in an open-source project can be very rewarding and worthwhile!

February 11, 2015 07:40 PM

Michael Kerrisk (manpages): Linux/UNIX System Programming course scheduled for May 2015, Munich

I've scheduled a further 5-day Linux/UNIX System Programming course to take place in Munich, Germany, for the week of 4-8 May 2015.

The course is intended for programmers developing system-level, embedded, or network applications for Linux and UNIX systems, or programmers porting such applications from other operating systems (e.g., Windows) to Linux or UNIX. The course is based on my book, The Linux Programming Interface (TLPI), and covers topics such as low-level file I/O; signals and timers; creating processes and executing programs; POSIX threads programming; interprocess communication (pipes, FIFOs, message queues, semaphores, shared memory); and network programming (sockets).
     
The course has a lecture+lab format, and devotes substantial time to working on some carefully chosen programming exercises that put the "theory" into practice. Students receive printed and electronic copies of TLPI, along with a 600-page course book that includes all slides presented in the course. A reading knowledge of C is assumed; no previous system programming experience is needed.

Some useful links for anyone interested in the course:

Questions about the course? Email me via training@man7.org.

February 11, 2015 09:06 AM

February 10, 2015

James Bottomley: Anatomy of the UEFI Boot Sequence on the Intel Galileo

The Basics

UEFI boot officially has three phases (SEC, PEI and DXE).  However, the DXE phase is divided into DXEBoot and DXERuntime (the former is eliminated after the call to ExitBootServices(); see the sketch after the list below).  The jobs of each phase are

  1. SEC (SECurity phase). This contains all the CPU initialisation code from the cold boot entry point on.  Its job is to set the system up far enough to find, validate, install and run the PEI.
  2. PEI (Pre-Efi Initialization phase).  This configures the entire platform and then loads and boots the DXE.
  3. DXE (Driver eXecution Environment).  This is where the UEFI system loads drivers for configured devices, if necessary; mounts drives and finds and executes the boot code.  After control is transferred to the boot OS, the DXERuntime stays resident to handle any OS to UEFI calls.
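To make the DXEBoot/DXERuntime boundary concrete, here is a minimal, hedged sketch of the point at which an OS loader leaves boot services behind. It uses the standard UEFI Boot Services table and omits error handling and the usual retry when the memory map key has gone stale.

#include <Uefi.h>
#include <Library/UefiBootServicesTableLib.h>   /* provides gBS */

/* Hedged sketch only: once ExitBootServices() succeeds, the DXEBoot
 * portion is gone and only the DXERuntime services remain resident. */
EFI_STATUS
LeaveBootServices (
  IN EFI_HANDLE  ImageHandle
  )
{
  UINTN                  MapSize = 0;
  UINTN                  MapKey;
  UINTN                  DescSize;
  UINT32                 DescVersion;
  EFI_MEMORY_DESCRIPTOR  *Map = NULL;

  /* First call fails with EFI_BUFFER_TOO_SMALL and reports the size. */
  gBS->GetMemoryMap (&MapSize, Map, &MapKey, &DescSize, &DescVersion);
  MapSize += 2 * DescSize;    /* slack, since the allocation below changes the map */
  gBS->AllocatePool (EfiLoaderData, MapSize, (VOID **)&Map);
  gBS->GetMemoryMap (&MapSize, Map, &MapKey, &DescSize, &DescVersion);

  return gBS->ExitBootServices (ImageHandle, MapKey);
}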

How it works on Quark

This all sounds very simple (and very like the way an OS like Linux boots up).  However, there’s a very crucial difference: The platform really is completely unconfigured when SEC begins.  In particular it won’t have any main memory, so you begin in a read only environment until you can configure some memory.  C code can’t begin executing until you’ve at least found enough writable memory for a stack, so the SEC begins in hand crafted assembly until it can set up a stack.

On all x86 processors (including the Quark), power on begins execution in 16 bit mode at the ResetVector (0xfffffff0). As a helping hand, the default power on bus routing has the top 128KB of memory mapped into the top of SPI flash (read only, of course) via a PCI routing in the Legacy Bridge, meaning that the reset vector executes directly from the SPI Flash (this is actually very slow: SPI means Serial Peripheral Interface, so every byte of SPI flash has to be read serially into the instruction cache before it can be executed).

The hand crafted assembly clears the cache, transitions to Flat32 bit execution mode and sets up the necessary x86 descriptor tables.  It turns out that memory configuration on the Quark SoC is fairly involved and complex so, in order to spare the programmer from having to do this all in assembly, there’s a small (512kB) static ram chip that can be easily configured, so the last assembly job of the SEC is to configure the eSRAM (to a fixed address at 2GB), set the top as the stack, load the PEI into the base (by reconfiguring the SPI flash mapping to map the entire 8MB flash to the top of memory and then copying the firmware volume containing the PEI) and begin executing.

QuarkPlatform Build Oddities

Usually the PEI code is located by the standard Flash Volume code of UEFI and the build time PCDs (Platform Configuration Database entries) which use the values in the Flash Definition File to build the firmware.  However, the current Quark Platform package has a different style because it rips apart and rebuilds the flash volumes, so instead of using PCDs, it uses something it calls Master Flash Headers (MFHs) which are home grown for Quark.  These are a fixed area of the flash that can be read as a database giving the new volume layout (essentially duplicating what the PCDs would normally have done).  Additionally the Quark adds a non-standard signature header occupying 1k to each flash volume which serves two purposes: For the SECURE_LD case, it actually validates the volume, but for the three items in the firmware that don’t have flash headers (the kernel, the initrd and the grub config) it serves to give the lengths of each.

Laying out Flash Rom

This is a really big deal for most embedded systems because the amount of flash available is really limited.  The Galileo board is nice because it supplies 8MB of flash … which is huge in embedded terms.  All flash is divided into Flash Volumes.  If you look at OVMF for instance, it builds its flash as four volumes: three for the three SEC, PEI and DXE phases and one for the EFI variables.  In EdkII, flash files are built by the flash definition file (the one with a .fdf ending).  Usually some part of the flash is compressed and has to be inflated into memory (in OVMF this is PEI and DXE) and some is designed to execute in place (usually SEC).  If you look at the Galileo layout, you see that it has a big SEC phase section (called BOOTROM_OVERRIDE) designed for the top 128KB of the flash, the usual variable area and then five additional sections: two for PEI and DXE and three recovery ones (and, of course, an additional payload section for the OS that boots from flash).

Embedded Recovery Sections

For embedded devices (and even normal computers) recovery in the face of flash failure (whether from component issues or misupdate of the flash) is really important, so the Galileo follows a two-stage fallback process.  The first stage is to detect a critical error, signalled by the platform sticky bit or recovery strap, in the SEC and boot up to the fixed phase recovery, which tries to locate a recovery capsule on the USB media. The other recovery is a simple copy of the PEI image for fallback in case the primary PEI image fails (by now you’ll have realised there are three separate but pretty much identical copies of PEI in the flash rom).  One of the first fixes that can be made to the Quark build is to consolidate all of these into a single build description.

Putting it all together: Implementing a compressed PEI Phase

One of the first things I discovered when trying to update the UEFI version to something more modern is that the size of the PEI phase overflows the allowed size of the firmware volume.  This means either redo the flash layout or compress the PEI image.  I chose the latter and this is the story of how it went.

The first problem is that debug prints don’t work in the SEC phase, which is where the changes are going to have to be.  This is a big stumbling block because without debugging, you never know where anything went wrong.  It turns out that UEFI nicely supports this via a special DebugLib that outputs to the serial console, but that the Galileo firmware build has this disabled by this line:

[LibraryClasses.IA32.SEC]
...
 DebugLib|MdePkg/Library/BaseDebugLibNull/BaseDebugLibNull.inf

The BaseDebugLibNull does pretty much what you expect: throws away all Debug messages.  When this is changed to something that outputs messages, the size of the PEI image explodes again, mainly because Stage1 has all the SEC phase code in it.  The fix here is only to enable debugging inside the QuarkResetVector SEC phase code.  You can do this in the .dsc file with

 QuarkPlatformPkg/Cpu/Sec/ResetVector/QuarkResetVector.inf {
   <LibraryClasses>
     DebugLib|MdePkg/Library/BaseDebugLibSerialPort/BaseDebugLibSerialPort.inf
 }

And now debugging works in the SEC phase!
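With the serial DebugLib linked in, ordinary DEBUG() prints in the SEC-phase C code finally reach the console. Purely as an illustration (this snippet is not from the Quark tree, and PeiCoreEntry is a hypothetical variable):

#include <Library/DebugLib.h>

/* Illustrative only: a print like this would previously have been
 * swallowed by BaseDebugLibNull, but now comes out the serial port. */
VOID
SecReportProgress (
  IN VOID  *PeiCoreEntry
  )
{
  DEBUG ((EFI_D_INFO, "SEC: eSRAM stack up, PEI core at %p\n", PeiCoreEntry));
}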

It turns out that a compressed PEI is possible but somewhat more involved than I imagined, so that will be the subject of the next blog post.  For now I’ll round out with other oddities I discovered along the way.

Quark Platform SEC and PEI Oddities

On the current quark build, the SEC phase is designed to be installed into the bootrom from 0xfffe 0000 to 0xffff ffff.  This contains a special copy of the reset vector (in theory it contains the PEI key validation for SECURE_LD, but in practice the verifiers are hard-coded to return success).  The next oddity is that the stage1 image, which should really just be the PEI core, actually contains another boot-from-scratch SEC phase, except this one is based on the standard IA32 reset vector code plus a magic QuarkSecLib and then the PEI code.  This causes the stage1 bring-up to be different as well, because usually the SEC code locates the PEI core in stage1 and loads, relocates and executes it starting from the entry point PeiCore().  However, quark doesn’t do this at all.  It relies on the Firmware Volume generator populating the first ZeroVector (an area occupying the first 16 bytes of the Firmware Volume Header) with a magic entry (located in the ResetVector via the magic string ‘SPI Entry Point ‘ with the trailing space).  The SEC code indirects through the ZeroVector to this code and effectively re-initialises the stack and begins executing the new SEC code, which then locates the internal copy of the PEI core and jumps to it.
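For reference, the ZeroVector is simply the first field of the standard firmware volume header. Roughly (paraphrasing the PI spec definition in MdePkg’s PiFirmwareVolume.h from memory, so consult that header for the authoritative layout):

/* Rough paraphrase of the PI firmware volume header.  The 16-byte
 * ZeroVector at offset 0 is normally left as zeros; the Quark build
 * plants its 'SPI Entry Point ' hand-off information there so the SEC
 * code can indirect through it.  Types come from PiFirmwareVolume.h. */
typedef struct {
  UINT8                   ZeroVector[16];
  EFI_GUID                FileSystemGuid;
  UINT64                  FvLength;
  UINT32                  Signature;         /* '_FVH' */
  UINT32                  Attributes;        /* EFI_FVB_ATTRIBUTES_2 */
  UINT16                  HeaderLength;
  UINT16                  Checksum;
  UINT16                  ExtHeaderOffset;
  UINT8                   Reserved[1];
  UINT8                   Revision;
  EFI_FV_BLOCK_MAP_ENTRY  BlockMap[1];
} EFI_FIRMWARE_VOLUME_HEADER;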

February 10, 2015 12:58 AM

February 06, 2015

LPC 2015: LPC 2015 Call for Microconference Proposals

The LPC 2015 Planning Committee has opened the submissions for Microconference Proposals. Please review the instructions on our Participate page carefully, because this year we have implemented a few new changes to the process. We look forward to your proposals!

As usual if you have any questions, please contact us.

February 06, 2015 10:21 PM

LPC 2015: LPC 2015 Call for Refereed Presentations

We are pleased to announce that the Call for Refereed Presentations for the Linux Plumbers Conference 2015 is now open. Please follow the instructions on our Participate page. Just like in the past, the Refereed Presentation track is shared with LinuxCon North America which will take place August 17 – 19 at the same location. In particular, the LPC specific refereed presentations will be scheduled on the overlap day, August 19, 2015. Mark your calendar!

If you have any questions, please contact the LPC planning committee.

February 06, 2015 10:13 PM

LPC 2015: Welcome to the 2015 Edition of the Linux Plumbers Conference!

The 2015 LPC planning committee is pleased to announce that the planning for this year’s edition of the Linux Plumbers Conference has started. The conference will take place in Seattle, Washington USA from Wednesday August 19 to Friday August 21, 2015, at the Sheraton Seattle Hotel.
We hope to see you there for a very productive set of discussions!

February 06, 2015 09:57 PM

February 02, 2015

Michael Kerrisk (manpages): man-pages-3.79 is released

I've released man-pages-3.79. The release tarball is available on kernel.org. The browsable online pages can be found on man7.org. The Git repository for man-pages is available on kernel.org.

Aside from many smaller fixes and improvements to various pages, the most notable changes in man-pages-3.79 are the following:

February 02, 2015 08:47 AM

February 01, 2015

Paul E. Mc Kenney: Parallel Programming: January 2015 Update

This release of Is Parallel Programming Hard, And, If So, What Can You Do About It? features a new chapter on SMP real-time programming, an updated formal-verification chapter, removal of a couple of the less-relevant appendices, several new cartoons (along with some refurbishing of old cartoons), and other updates, including contributions from Bill Pemberton, Borislav Petkov, Chris Rorvick, Namhyung Kim, Patrick Marlier, and Zygmunt Bazyli Krynicki.

As always, git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git will be updated in real time.

February 01, 2015 02:36 AM

January 29, 2015

Paul E. Mc Kenney: Verification Challenge 2: RCU NO_HZ_FULL_SYSIDLE

I suppose that I might as well get the “But what about Verification Challenge 1?” question out of the way to start with. You will find Verification Challenge 1 here.

Now, Verification Challenge 1 was about locating a known bug. Verification Challenge 2 takes a different approach: The goal is instead to find a bug if there is one, or to prove that there is no bug. This challenge involves the Linux kernel's NO_HZ_FULL_SYSIDLE functionality, which is supposed to determine whether or not all non-housekeeping CPUs are idle. The normal Linux-kernel review process located an unexpected bug (which was allegedly fixed), so it seemed worthwhile to apply some formal verification. Unfortunately, all of the tools that I tried failed. Not simply failed to verify, but failed to run correctly at all—though I have heard a rumor that one of the tools was fixed, and thus advanced to the “failed to verify” state, where “failed to verify” apparently meant that the tool consumed all available CPU and memory without deigning to express an opinion as to the correctness of the code.

So I fell back to 20-year-old habits and converted my C code to a Promela model and used spin to do a full-state-space verification. After some back and forth, this model did claim verification, and correctly refused to verify bug-injected perturbations of the model. Mathieu Desnoyers created a separate Promela model that made more deft use of temporal logic, and this model also claimed verification and refused to verify bug-injected perturbations. So maybe I can trust them. Or maybe not.

Unfortunately, regardless of whether or not I can trust these models, they are not appropriate for regression testing. How large a change to the C code requires a corresponding change to the Promela models? And how can I be sure that this corresponding change was actually the correct change? And so on: These kinds of questions just keep coming.

It would therefore be nice to be able to validate straight from the C code. So if you have a favorite verification tool, why not see what it can make of NO_HZ_FULL_SYSIDLE? The relevant fragments of the C code, along with both Promela models, can be found here. See the README file for a description of the files, and you know where to find me for any questions that you might have.

If you do give it a try, please let me know how it goes!

January 29, 2015 05:06 AM

January 28, 2015

Daniel Vetter: Update for Atomic Display Updates

Another kernel release is imminent and a lot of things happened since my last big blog post about atomic modeset. Read on for what new shiny things 3.20 will bring to this area.

Atomic IOCTL and Properties


The big thing for sure is that the actual atomic IOCTL from Ville has finally landed for 3.20. That, together with all the work from Rob Clark to add all the new atomic properties to make it functional (there are no IOCTL structs for any standard values, everything is a property now), means userspace can finally start using atomic. Well, it's still hidden behind the experimental module option drm.atomic=1, but otherwise it's all there. There are a few big differences compared to earlier iterations:
Another missing piece that's now resolved is the DPMS support. On a high level this reduces the complexity of the legacy DPMS settings to a simple per-CRTC boolean. Contemporary userspace really wants nothing more, and a simple on/off is the only thing that current hardware can support anyway. Furthermore all the bookkeeping is done in the helpers, which call down into drivers to enable or disable entire display pipelines as needed. Which means that drivers can rip out tons of code for the legacy DPMS support by just wiring up drm_atomic_helper_connector_dpms, along the lines of the sketch below.
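A minimal, hedged sketch of what that wiring might look like; the foo_* callbacks are placeholders, not code from any real driver, and the helper names are the ones the atomic helper library exposes in this timeframe.

#include <drm/drm_atomic_helper.h>
#include <drm/drm_crtc_helper.h>

/* Placeholder callbacks; a real driver does real work here. */
static enum drm_connector_status
foo_connector_detect(struct drm_connector *connector, bool force)
{
	return connector_status_connected;
}

static void foo_connector_destroy(struct drm_connector *connector)
{
	drm_connector_cleanup(connector);
	/* freeing the containing structure is driver-specific */
}

static const struct drm_connector_funcs foo_connector_funcs = {
	.dpms			= drm_atomic_helper_connector_dpms,
	.detect			= foo_connector_detect,
	.fill_modes		= drm_helper_probe_single_connector_modes,
	.destroy		= foo_connector_destroy,
	.reset			= drm_atomic_helper_connector_reset,
	.atomic_duplicate_state	= drm_atomic_helper_connector_duplicate_state,
	.atomic_destroy_state	= drm_atomic_helper_connector_destroy_state,
};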

New Driver Conversions and Backend Hooks


Another big thing for 3.20 is that driver support has moved forward greatly: Tegra now has most of the basic bits ready, and MSM is already converted. Both still lack conversion and testing for DPMS, since that landed very late though. There's a lot of prep work for exynos, but unfortunately nothing yet towards atomic support. And i915 is in the process of being converted to the new structures and will have very preliminary atomic plane update support in 3.20, hidden behind a module option for now.

And all that work resulted in piles of little fixes, like making sure that the legacy cursor IOCTLs don't suddenly stall a lot more, breaking old userspace. But we also added a bunch more hooks for driver backends to simplify driver code:
Finally driver conversions showed that vblank handling requirements imposed by the atomic helpers are a lot stricter than what userspace tends to cope with. i915 has recently reworked all its vblank handling and improved the core helpers with the addition of drm_crtc_vblank_off() and drm_crtc_vblank_on(). If these are used in the CRTC disable and enable hooks the vblank code will automatically reject vblank waits when the pipe is off. Which is the behaviour both the atomic helpers and the transitional helpers expect.

One complication is that on load the driver must manually ensure that the vblank state matches up with the hardware and atomic software state, with an explicit call to these functions. In the simple case where drivers reset everything to off (which is what the reset implementations provided by the atomic helpers presume) this just means calling drm_crtc_vblank_off() somewhere in the driver load code. For drivers that read out the actual hardware state they need to call either _off() or _on(), matching the actual display pipe status.
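A hedged sketch of the pattern described here, with placeholder foo_* hooks that are not taken from a real driver:

#include <drm/drmP.h>

static void foo_crtc_enable(struct drm_crtc *crtc)
{
	/* ... program and start the display pipe ... */

	/* vblank waits may be accepted again from here on */
	drm_crtc_vblank_on(crtc);
}

static void foo_crtc_disable(struct drm_crtc *crtc)
{
	/* reject further vblank waits and wake up anyone still waiting */
	drm_crtc_vblank_off(crtc);

	/* ... shut the display pipe down ... */
}

/* At driver load, if everything is reset to off (as the atomic helpers'
 * reset implementations presume), the vblank state is brought in line
 * once, for example like this: */
static void foo_mark_all_pipes_off(struct drm_device *dev)
{
	struct drm_crtc *crtc;

	list_for_each_entry(crtc, &dev->mode_config.crtc_list, head)
		drm_crtc_vblank_off(crtc);
}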

Future Work


Of course there's still a few things left to do before atomic display updates can be rolled out to the masses. And a few things that would be rather nice to have, too:
It promises to stay interesting!

January 28, 2015 05:18 PM

January 27, 2015

James Bottomley: Secure Boot on the Intel Galileo

The first thing to do is set up a build environment.  The Board support package that Intel supplies comes with a vast set of instructions and a three-stage build process that uses the standard edk2 build to create firmware volumes, rips them apart again, then re-lays them out using spi-flashtools to include the Arduino payload (grub, the linux kernel, initrd and grub configuration file), adds signed headers before creating a firmware flash file with no platform data and then, as a final stage, adds the platform data.  This is amazingly painful once you’ve done it a few times, so I wrote my own build script with just the essentials for building a debug firmware for my board (it also turns out there’s a lot of irrelevant stuff in quarkbuild.sh which I dumped).  I’m also a git person, not really an svn one, so I redid the Quark_EDKII tree as a git tree with full edk2 and FatPkg as git submodules and a single build script (build.sh) which pulls in all the necessary components and delivers a flashable firmware file.  I’ve linked the submodules to the standard tianocore github ones.  However, please note that the edk2 git pack is pretty huge and it’s in the nature of submodules to be volatile (you end up blowing them away and re-pulling the entire repo with git submodule update a lot, especially because the build script ends up patching the submodules), so you’ll want to clone your own edk2 tree and then add the submodule as a reference.  To do this, execute

git submodule init
git submodule update --reference <wherever you put your local edk2 tree> .module/edk2

before running build.sh; it will save a lot of re-cloning of the edk2 tree. You can also do this for FatPkg, but it’s tiny by comparison and not updated by the build scripts.

A note on the tree format: the Intel supplied Quark_EDKII is a bit of a mess: one of the requirements for the edk2 build system is that some files have to have dos line endings and some have to have unix.  Someone edited the Quark_EDKII less than carefully and the result is a lot of files where emacs shows splatterings of ^M, which annoys me incredibly so I’ve sanitised the files to be all dos or all unix line endings before importing.

A final note on the payload:  for the purposes of UEFI testing, it doesn’t matter and it’s another set of code to download if you want the Arduino payload, so the layout in my build just adds the EFI Shell as the payload for easy building.  The Arduino sections are commented out in the layout file, so you can build your own Arduino payload if you want (as long as you have the necessary binaries).  Also, I’m building for Galileo Kips Bay D … if you have a gen 2, you need to fix the platform-data.ini file.

An Aside about vga over serial consoles

If, like me, you’re interested in the secure boot aspect, you’re going to be spending a lot of time interacting with the VGA console over the serial line and you want it to look the part.  The default VGA PC console is very much stuck in an 80s timewarp.  The world has moved on and it hasn’t, leading to console menus that look like this

[screenshot: vga]
This image is directly from the Intel build docs and actually, this is Windows, which is also behind the times; if you fire this up from an xterm, the menu looks even worse.  To get VGA looking nice from an xterm, first you need to use a vga font (these have the line drawing characters in the upper 128 bytes), then you need to turn off UTF-8 (otherwise some of the upper 128 characters get seen as UTF-8 encodings), turn off the C1 control characters and set a keyboard mapping that sends what UEFI is expecting as F7.  This is what I end up with

xterm -fn vga -geometry 80x25 +u8 -kt sco -k8 -xrm "*backarrowKeyIsErase: false"

And don’t expect the -fn vga to work out of the box on your distro either … vga fonts went out with the ark, so you’ll probably have to install it manually.  With all of this done, the result is

[screenshot: vga-xterm]
And all the function keys work.

Back to our regularly scheduled programming: Secure Boot

The Quark build environment comes with a SECURE_LD mode, which, at first sight, does look like secure boot.  However, what it builds is a cryptographic verification system for the firmware payloads (and disables the shell for good measure).  There is also an undocumented SECURE_BOOT build define instead; unfortunately this doesn’t even build (it craps out trying to add a firmware menu: the Quark is embedded, and embedded platforms don’t have firmware menus). Once this is fixed, the image will build but it won’t boot properly.  The first thing to discover is that ASSERT() fails silently on this platform.  Why? Well, if you turn on noisy assert, which prints the file and line of the failure, you’ll find that the addition of all these strings makes the firmware volumes too big to fit into their assigned spaces … Bummer.  However, printing nothing is incredibly useless for debugging, so just add a simple print to let you know when an ASSERT() failure is hit.  It turns out there are a bunch of wrong asserts, including one put in deliberately by Intel to prevent secure boot from working because they haven’t tested it.  Once you get rid of these, it boots and mostly works.

Mostly because there’s one problem: authenticating by enrolled binary hashes mostly fails.  Why is this?  Well, it turns out to be a weakness in the Authenticode spec which UEFI secure boot relies on.  The spec is unclear on padding and alignment requirements (among a lot of other things; after all, why write a clear spec when there’s only going to be one implementation … ?).  In the old days, when we used crcs, padding didn’t really matter because additional zeros didn’t affect the checksum, but in these days of cryptographic hashes it does matter.  The problem is that the signature block appended to the EFI binary has an eight byte alignment.  If you add zeros at the end to achieve this, those zeros become part of the hash.  That means you have to hash IA32 binaries as if those padded zeros existed, otherwise the hash is different for signed and unsigned binaries. X86-64 binaries seem to be mostly aligned, so this isn’t noticeable for them.  This problem is currently inherent in edk2, so it needs patching manually in the edk2 tree to get this working.
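To make the arithmetic concrete, here is a small hedged sketch of just the padding part. This is not the edk2 or efitools code, and the real Authenticode digest also skips the PE checksum and certificate-table directory fields, which is omitted here; digest_update() stands in for whatever hash library is in use.

#include <stddef.h>
#include <stdint.h>

/* Stand-in for a real digest update, e.g. a SHA-256 update; assumed. */
void digest_update(void *ctx, const void *data, size_t len);

/* Hash an image the way the signed layout sees it: including the zero
 * bytes used to pad up to the 8-byte boundary where the signature
 * block gets appended.  Leaving this padding out of the unsigned hash
 * is exactly the signed-vs-unsigned mismatch described above. */
static void hash_with_alignment_pad(void *ctx, const uint8_t *image,
                                    size_t len)
{
        static const uint8_t zeros[8];
        size_t padded = (len + 7) & ~(size_t)7;

        digest_update(ctx, image, len);
        if (padded > len)
                digest_update(ctx, zeros, padded - len);
}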

With this commit, the Quark build system finally builds a working secure boot image.  All you need to do is download the ia32 efitools (version 1.5.2 or newer — turns out I also didn’t compute hashes correctly) to an sd card and you’re ready to play.

January 27, 2015 03:20 AM

January 26, 2015

Dave Jones: So much to learn..

At lunch a few days ago, I was discussing with a coworker how right now I’m feeling a little overwhelmed. Not completely surprising, I suppose, given the volume of new things I need to learn. But as I’m getting acclimated in my new job, it’s becoming clearer which things I either don’t know well enough or am completely unfamiliar with.

I’m only now realizing that not only has the scope of what I’m working on changed, but also how I work on things. During the ten years I worked on Fedora, because we were woefully understaffed and buried alive in bugs, there was never really time to spend a lot of time on an individual bug unless a lot of users were hitting it. Because of this, the things that I ended up fixing were usually fairly small fixes. The occasional NULL pointer dereference. Perhaps a use after free. Maybe linked-list corruption. Pretty basic stuff that anyone with familiarity with any part of the kernel could figure out reasonably quickly. Do enough of this, and it becomes less about engineering, and more about pattern recognition.

Even a lot of the bugs that trinity finds don’t need terribly invasive fixes.
(Screwed-up error paths are probably the most common thing it picks up, because nothing else really ever tests them.)

The more complicated bugs, like a WARN_ON being hit? Those can require a lot more understanding, which can take considerable time to build up.

As Fedora kernels were so close to mainline, a lot of the time we could chase up the maintainer, pass the bug along, and move on to something else. The result of ten years of working this way is that I have a passing understanding of how lots of parts of the kernel interact from a 10,000ft perspective, and perhaps some understanding of some nuances based on past interactions, but no deep architectural understanding of, for example, how an sk_buff traverses the various parts of the network stack. A warning deep in the tcp guts, caused by a packet that’s been through bonding, netfilter, etc.? I’m not your guy. At least not today.

Thankfully I’ve got some time to ramp up my learning on unfamiliar parts of the kernel, and get a better understanding of how things work on a deeper level. I suspect I’ll end up turning at least some of the stuff I learn into future posts here.

In the meantime, I’ve got a lot of reading to do.
I felt a little reassured at least when my coworker responded “yeah, me too”.


January 26, 2015 03:09 PM

Pavel Machek: On limping mice and cool-looking ideas

I got two Logitech mice, and they worked rather nicely. But then the buttons started acting funny on the first, then the second, and I switched to a Genius one (probably with a lower-resolution sensor). I quickly found that I had two mice with working sensors, and one mouse with working buttons. Not good.
Now I got a new Logitech M90... this one has a working sensor, two working buttons, and three legs of the same length. Ouch. It wobbles on the table :-(.
It also shows why the mouse-box is a cool-looking, but not quite cool, idea. A mouse is something that gets damaged over time and needs to be replaced... which is something you don't want to do with your computer. (Plus, a mouse will not be too comfortable to use when it has three cables connected to it that were not designed for bending.)

January 26, 2015 11:19 AM