Kernel Planet
July 30, 2010
My router is a pretty underpowered machine. It has 512MB of RAM, and its ‘disk’ is a 2GB flash card on a CF to ATA adaptor (read as: really slow). But given its job is just routing packets 99% of the time, neither of these deficiencies is an issue.
Aside from one problem. Every time I did a yum update that pulled in an SELinux policy update, it would consistently exhaust all the RAM in the machine. I filed a bug on this, and as usual, Dan Walsh dropped some SELinux knowledge that I had no idea about.
You can customize the bzip block size and “small” flag via
/etc/selinux/semanage.conf. After applying you can add entries like these to
your /etc/selinux/semanage.conf to trade off memory vs disk space (block size)
and to trade off memory vs runtime (small):
bzip-blocksize=4
bzip-small=true
You can also disable bzip compression altogether for your module store
via:
bzip-blocksize=0
Since I put that first tweak in place, it’s survived several policy updates without a hiccup.
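To get a feel for the block size knob, here is a minimal sketch using Python's bz2 module, whose compresslevel (1 to 9) selects the bzip2 block size in units of 100 kB. It assumes bzip-blocksize maps onto that same bzip2 setting; the payload is synthetic, not a real policy module, and Python's bz2 does not expose the "small" flag, so only the block size side is shown.

```python
import bz2

def compressed_sizes(data, levels=(1, 4, 9)):
    """Return {block size level: compressed length in bytes} for the payload."""
    return {level: len(bz2.compress(data, compresslevel=level)) for level in levels}

# Synthetic, repetitive payload standing in for a policy module (an assumption).
payload = b"".join(
    b"allow domain_%d_t file_%d_t:file read;\n" % (i % 500, i % 300)
    for i in range(50000)
)

sizes = compressed_sizes(payload)
for level, size in sizes.items():
    print("blocksize=%d (%d kB blocks): %d bytes" % (level, level * 100, size))

# Whatever block size is used, the data must survive a round trip.
roundtrip_ok = bz2.decompress(bz2.compress(payload, compresslevel=4)) == payload
```

Smaller blocks need less working memory during compression and decompression, usually at the price of a somewhat larger result.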
SELinux on low memory systems. is a post from: codemonkey.org.uk
July 30, 2010 01:15 AM
July 29, 2010
The Augen Android tablet being sold in Kmart stores at the moment is (shockingly) running a 2.6.29 kernel and Android 2.1 on top of that. It's also (shockingly) currently impossible to get hold of the source code for the kernel - Augen (whose corporate address is a small unit in Florida) say that the software comes installed on the units by the OEM and they don't have any access to the source either. This isn't an excuse, of course, and they say that they hope to have it on their website within the next few days - but even so, it seems that the Android device GPL violation trend is still on course. It'll be interesting to see what the long-term outcome of this kind of violation is, especially with these devices increasingly being sold by mainstream stores.
July 29, 2010 09:51 PM
I saw this post in which Satish Jha, the well-fed guy in a suit shown here, says: "I asked several mothers who earn less than $20 a month and they all said that they all live and have learnt to live with hunger."
Yes, that's right, you and your child don't need adequate nutrition. Don't be silly. There's no link between undernourishment and brain damage. What you need is a laptop. Monocle smile.
July 29, 2010 06:53 PM

First month graph of solar power
The solar array was turned on for real 30 days ago; in that time, it’s produced 389 kWh of energy, which covered 96% of our usage for the month. I’m pretty pleased with this! As compared to household use, it was:
- 389 kWh gross production, of which
- 230 kWh net was pushed out to the grid (meaning 389-230=159 kWh were used directly), and
- 245 kWh net was drawn from the grid
So, we used a mere 15 kWh more than we made. I blame it on my niece’s baking in the electric oven (and the need to run the dehumidifier a few days; it was very rainy). 404 kWh for the month was actually a fair bit higher than the last several months; the other big hitter was running the fans a lot due to the heat.
In terms of daily output, we saw:
- Peak power output of 2185W (I think this is due to a 199W limit on each microinverter)
- Maximum daily energy – 18kWh (on the first day of operation!)
- Minimum daily energy – 5kWh
- Average daily energy – 13kWh
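The numbers above fit together as follows; this is just the arithmetic from the post spelled out, with variable names of my own choosing.

```python
gross_production = 389   # kWh generated by the array over the month
exported = 230           # kWh pushed out to the grid
imported = 245           # kWh drawn from the grid
days = 30

used_directly = gross_production - exported        # solar energy consumed on site
total_consumption = used_directly + imported       # everything the house used
shortfall = total_consumption - gross_production   # used more than produced
coverage = gross_production / total_consumption    # fraction of usage covered
average_daily = gross_production / days            # average production per day

print("used directly:     %d kWh" % used_directly)
print("total consumption: %d kWh" % total_consumption)
print("shortfall:         %d kWh" % shortfall)
print("coverage:          %.0f%%" % (coverage * 100))
print("average per day:   %.0f kWh" % average_daily)
```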
What I really miss now, though, is the whole-house energy monitoring that I had; we climbed in usage last month, and I can point to some causes, but I’m flying blind now. I’ll have to break down and buy a Ted 5000 if I don’t manage to put together my own monitor with CTs soon.
July 29, 2010 01:31 AM
July 28, 2010
Things I hate today include:
- Symbian on my Nokia N97 — for
spontaneously rebooting as soon as I got off the ferry.
- Google Maps — for not caching the map tiles
I'd carefully downloaded while I was on the free ferry
wireless, showing my route to the hotel.
- Mobile phone networks — for the insane
amount of money it will have cost me to re-download the same
map tiles again, as I was driving.
It's almost as if it's a conspiracy — especially
between the latter two.
I really need to get myself an N900 and start using
maemo-mapper again. Every time I try to use non-free
software, it hurts.
July 28, 2010 08:08 AM
July 27, 2010
libeblob is a low-level IO library which stores data in huge blob files appending records one after another.
I implemented all missing functionality, added comments and README, and rolled out the first version: 0.0.1
Here is a short changelog:
- defragmentation tool: entries to be deleted are only marked as removed,
eblob_check will iterate over specified blob files and actually remove those blocks
- off-line blob consistency checker:
eblob_check can verify checksums for all records which have them
- run-time sync support: a dedicated thread runs fsync on all files on a timed basis
- added documentation and comments
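To illustrate what a checker and defragmenter of this kind have to do, here is a small sketch. The record layout is hypothetical (it is NOT libeblob's real on-disk format) and the function names are made up for the example; it only shows the idea of verifying per-record checksums and dropping records marked as removed.

```python
import hashlib
import struct

# Hypothetical record layout, invented for this sketch:
# 20-byte key, 1-byte flags, 8-byte data size, 32-byte SHA-256, then the data.
HEADER = struct.Struct("<20sBQ32s")
FLAG_REMOVED = 0x01

def pack_record(key, data, flags=0):
    """Serialize one record: header followed by the data."""
    return HEADER.pack(key, flags, len(data), hashlib.sha256(data).digest()) + data

def iter_records(blob):
    """Yield (key, flags, stored digest, data, raw record bytes) for each record."""
    offset = 0
    while offset < len(blob):
        key, flags, size, digest = HEADER.unpack_from(blob, offset)
        end = offset + HEADER.size + size
        yield key, flags, digest, blob[offset + HEADER.size:end], blob[offset:end]
        offset = end

def check(blob):
    """Return the keys whose stored checksum does not match their data."""
    return [key for key, _, digest, data, _ in iter_records(blob)
            if hashlib.sha256(data).digest() != digest]

def defragment(blob):
    """Copy live records unchanged into a new blob, dropping removed ones."""
    return b"".join(raw for _, flags, _, _, raw in iter_records(blob)
                    if not flags & FLAG_REMOVED)

key_a, key_b, key_c = (bytes([n]) * 20 for n in (1, 2, 3))
blob = (pack_record(key_a, b"first") +
        pack_record(key_b, b"stale", FLAG_REMOVED) +
        pack_record(key_c, b"third"))

compacted = defragment(blob)
live_keys = [key for key, _, _, _, _ in iter_records(compacted)]

corrupted = bytearray(blob)
corrupted[-1] ^= 0xFF                     # flip bits in the last data byte
bad_keys = check(bytes(corrupted))
```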
libeblob can be downloaded from git tree ($ git clone http://www.ioremap.net/git/eblob.git/) or archive.
July 27, 2010 06:51 PM
July 26, 2010
Elliptics network is a very modular key/value (distributed hash table) storage. Among other things, it allows building pluggable low-level IO backends, the entities which actually store data.
Currently elliptics network supports the following IO backends:
- file IO backend, where each transaction is stored as a separate file
- TokyoCabinet database backend - each transaction is stored as a record in the appropriate table. I dropped BerkeleyDB support because of its low performance, even though TC, unlike BDB, does not provide ACID guarantees
And now I added append-only blob storage - libeblob.
The following features are already supported:
- fast append-only updates which do not require disk seeks
- compact index to populate lookup information from disk
- multi-threaded index reading during startup
- O(1) data location lookup time
- ability to lock in-memory lookup index (hash table) to eliminate memory swap
- readahead games with data and index blobs for maximum performance
- multiple blob files support (tested with blob-file-as-block-device too)
- optional sha256 on-disk checksumming
- 2-stage write: prepare (which reserves the space) and commit (which calculates the checksum and updates in-memory and on-disk indexes). One can (re)write data using pwrite() in between without locks
- usual 1-stage write interface
- flexible configuration of hash table size, flags, alignment
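The core ideas in this list (append-only writes, an in-memory hash table for O(1) lookup, and the prepare/commit split) can be sketched in a few lines. This is a toy model with made-up names, not libeblob's API; it assumes a POSIX system (os.pread/os.pwrite) and keeps the index only in memory, whereas the library also maintains an on-disk index.

```python
import hashlib
import os
import tempfile

class BlobStore:
    """Toy append-only blob with an in-memory index (not libeblob's real API)."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT)
        self.end = os.fstat(self.fd).st_size   # next free offset in the blob
        self.index = {}                        # key -> (offset, size, sha256)

    def prepare(self, size):
        """Stage 1: reserve space at the end of the blob, return its offset."""
        offset = self.end
        self.end += size
        return offset

    def commit(self, key, offset, size):
        """Stage 2: checksum the written data and publish it in the index."""
        digest = hashlib.sha256(os.pread(self.fd, size, offset)).digest()
        self.index[key] = (offset, size, digest)

    def write(self, key, data):
        """The usual one-stage write, built from the two stages."""
        offset = self.prepare(len(data))
        os.pwrite(self.fd, data, offset)
        self.commit(key, offset, len(data))

    def read(self, key):
        offset, size, digest = self.index[key]   # O(1) dictionary lookup
        data = os.pread(self.fd, size, offset)
        if hashlib.sha256(data).digest() != digest:
            raise IOError("checksum mismatch")
        return data

    def remove(self, key):
        """Only forget the entry; the bytes stay until defragmentation."""
        del self.index[key]

    def close(self):
        os.close(self.fd)

with tempfile.TemporaryDirectory() as tmp:
    store = BlobStore(os.path.join(tmp, "data.blob"))
    store.write(b"k1", b"hello")
    offset = store.prepare(5)               # reserve, then fill in pieces
    os.pwrite(store.fd, b"wor", offset)
    os.pwrite(store.fd, b"ld", offset + 3)
    store.commit(b"k2", offset, 5)
    first, second = store.read(b"k1"), store.read(b"k2")
    store.remove(b"k1")
    blob_size = os.fstat(store.fd).st_size
    remaining_keys = sorted(store.index)
    store.close()
```

Note that removing k1 does not shrink the file: the space comes back only when something rewrites the blob without the dead records.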
TODO list includes:
- defragmentation tool: entries to be deleted are only marked as removed, we need to have a tool (or embed it into library) to actually remove those blocks from blob files
- proper off-line blob consistency checker: we put a checksum into data blob, someone may want to check if read data matches
- run-time sync support - we should have a dedicated thread to call syncs on a timed basis
Elliptics network uses it as one of its low-level IO backends. Numbers I posted (1, 2, 3) also highlight its advantages.
But during elliptics network integration with libeblob I found how suboptimally the transaction history log was implemented in the storage (and maybe found an answer to why monsters like Cassandra do not support it at all). Maybe it's time to rethink and reinvent it though...
Anyway, there is a set of features I will create to complete this implementation as well as new elliptics network release (with C++ and Python bindings and new IO backend).
Stay tuned!
July 26, 2010 11:13 PM
July 25, 2010
Thanks for all the helpful comments. I got msp-gcc working in chroot, and can now update firmware in my watch. 0wn3r3D!
July 25, 2010 05:14 PM
July 24, 2010
My philosophy for Free/Open Source Software comes down to this: that others can take what I do and do something awesome with it. Since I don’t know what you’ll need, I hand you every capability I have to get you started. Others granted me that very power to get where I am, so I seek to increment that.
It’s not always simple to do: sometimes you build it, and nobody comes. My experience is that clear code, convenient install, documentation and useful functionality all help. An encouraging welcome helps even more, as does getting the software out in front of people. It’s all about avoiding or lowering barriers.
We accept some barriers to modification which are reasonable: if you modify the software, I don’t have to support it. If I make you sign a support deal which says I won’t support the software (even unmodified versions) if you modify or redistribute the software, that’s not reasonable. If you get fired for publishing pre-release modified copyleft source code, that’s reasonable. If you get fired for publishing post-release, that’s not.
The hardware and electricity costs are barriers, but they’re undiscriminating and reasonable, so we accept them. Even the GPL explicitly allows you to charge for costs to make a copy. The software cost of the system is also explicitly allowed as a barrier. The software costs of the build environment are also accepted barriers (though historic for most of us): again even the GPL doesn’t require your software to be buildable with freely available tools.
As this shows, your choice of licensing is among your arsenal in keeping barriers low for co-hackers (who are not necessarily contributors!). I believe copyright gives too much power to the copyright holder, but as copyright is so hard to unmake, I favor the GPL. It tries to use legal power to meet the aims in the first paragraph: to force you to hand onwards all the pieces I handed to you. It’s also well understood by people and that common understanding gels a community.
Yet, as I alluded at the top, there is a realm of barriers which licenses don’t even try to address: the code could be an unmodifiable tangle, the documentation awful, the installation awkward, or the trademarks invasive. A license can’t make coders welcome newcomers, be friendly to upstream, responsive to bugs, write useful code or speak English.
The spectrum of barriers goes roughly from “that’s cool” through “I’m not so comfortable with that” to “that’s not Free/Open Source”. It’s entirely up to your definition of reasonableness; only in the simplest cases will that be the same point at which the license is violated, even if that license is explicitly drafted to defend that freeness!
So, don’t start with an analysis of license clauses. Start with “is that OK?”. Is there a fundamental or only a cosmetic problem? If it’s not OK, ask “does it matter?”. Is it affecting many people, is it setting a bad example, is it harming the project’s future, is it causing consternation among existing developers? If it is, then it’s time to look at the license to see if you can do anything. Remember that the license is merely a piece of text. It can’t stop anything, it can only give you leverage to do so. It certainly can’t make using the law easy or convenient, or even worthwhile pursuing.
To close, I will leave the last word to Kim Weatherall, who once told me: “if you’re spending your time on legal issues, you’re doing it wrong”.
July 24, 2010 03:00 AM
July 23, 2010
sgx535 drivers in today's Meego kernel tree: 3 (GMA600, CE4100, N900)
sgx535 drivers submitted upstream: 1 (Tungsten GMA500 driver, submitted March 2009, rejected due to significant chunks of functionality there purely to support closed userspace)
To be fair, the rest of the Moorestown support code seems to be shaping up fairly nicely. But the lack of a coherent story about what graphics support is going to look like isn't hugely reassuring.
Update:
The Meego kernel tree isn't really a fair comparison here - I should be talking about the Intel MID tree. That's only got the GMA600 and CE4100 drivers at the moment, and I'm told that there's consolidation work going on there.
July 23, 2010 07:22 PM
July 21, 2010
I extended C++ and Python bindings for elliptics library, although python part was a little bit messy at first.
Python is massively ... single-threaded language: GIL is a tricky global lock monster, which does not easily allow to implement not only threads but also async communications. Of course python has threads, but they are internal entities which can not be worked with from the outside system threads.
In contrast, the elliptics network library is multi-threaded, and the main problem related to python was its async completion notifications. When a transaction is finished or being processed, the remote side can send multiple replies about its state (like chunks of data being read, for example), which are processed in a different thread than the original sending one.
Python does not expect itself to be interrupted by those callbacks (even if we properly wrap them into python classes). But still we can (or it can be called a hack) invoke async python callbacks from C/C++ code and external threads.
Python may have multiple execution threads, or states, and at startup we have to select the one which will be used to invoke our C++ callbacks. In older python versions it took quite a bit of effort: stack selection, saving it somewhere in private data, then switching to/from it and so on. In newer python versions it is as simple as calling PyEval_InitThreads(). The Python thread which called it first will be selected as the one to dispatch external callbacks. Then just doing
PyGILState_STATE st = PyGILState_Ensure();
this->get_override("some_virtual_callback_invoked_from_cpp")(its, data);
PyGILState_Release(st);
will schedule C++ callback invocation. It will take care of the thread state and the GIL.
And when I managed to finally implement all wrappers and helpers for async bidirectional C++-to-Python communication, I dropped its support. Just because it is much simpler to read/write data using blocking calls, which is I believe the most common Python programming model.
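The trade-off can be shown in pure Python: a blocking call is just an async call plus a wait for its completion callback. In this sketch async_read is a hypothetical stand-in for a callback-based library call (it is not the elliptics API); it delivers chunks from another thread, and blocking_read hides that behind an ordinary function.

```python
import threading

def async_read(key, on_chunk, on_complete):
    """Hypothetical async call: delivers data chunks and a final status
    from a different thread than the caller's."""
    def worker():
        for chunk in (b"1234", b"5678", b"90"):
            on_chunk(chunk)
        on_complete(0)
    thread = threading.Thread(target=worker)
    thread.start()
    return thread

def blocking_read(key, timeout=5.0):
    """Wrap the callback interface into a call that returns when the
    transaction has completed."""
    chunks = []
    status = []
    done = threading.Event()

    def on_complete(code):
        status.append(code)
        done.set()

    thread = async_read(key, chunks.append, on_complete)
    if not done.wait(timeout):
        raise TimeoutError("no completion received")
    thread.join()
    if status[0] != 0:
        raise IOError("read failed with status %d" % status[0])
    return b"".join(chunks)

result = blocking_read(b"some-key")
print(result)
```

The caller never sees the second thread, which is why this model is so much easier to use from a script.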
That's how this works in python now:
#!/usr/bin/python
from libelliptics_python import *
from array import *
import sys
id = array('B')
for x in xrange(0, 20) :
    id.append(x + 1)
trans = array('B')
for x in xrange(0, 20) :
    trans.append(1)
try:
    log = elliptics_log_file("/dev/stderr", 15)
    n = elliptics_node_python(id.buffer_info()[0], log)
    t = elliptics_transform_openssl("sha1")
    n.add_transform(t)
    # weird thing happens if I write n.add_transform(elliptics_transform_openssl("sha1"))
    # we crash somewhere inside c++ binding, probably because I implemented lazy
    # reference counting model (i.e. not at all :)
    # thus object MUST live after this function is completed
    # this should be fixed of course with proper copy constructors
    # the same applies to logger actually
    n.add_remote("devfs8", 1025)
    #n.write_file(trans.buffer_info()[0], "/tmp/test_file", 0, 0, 0)
    #n.read_file(trans.buffer_info()[0], "/tmp/test_file.read", 0, 0)
    data = array('B', "1234567890")
    n.write_data(trans.buffer_info()[0], data.buffer_info()[0], 0, data.buffer_info()[1])
    read = array('B')
    for x in xrange(0, len(data)) : read.append(0)
    n.read_data(trans.buffer_info()[0], read.buffer_info()[0], 0, read.buffer_info()[1])
    for x in xrange(0, len(data)) :
        print data[x], " ", read[x]
except:
    print "Ooops, error:", sys.exc_info()[0]
$ ./test.py # written and read data from example above
49 49
50 50
51 51
52 52
53 53
54 54
55 55
56 56
57 57
48 48
Also finished proper object copy for logger, it will clone logger and when proper methods are implemented one can create own private python-made loggers. But that's details.
To date I consider python bindings as well as C++ ones fully finished. C++ has async callbacks as well as blocking sync IO operations.
July 21, 2010 09:46 PM
July 19, 2010
So I used Windows machine briefly... and now it should be possible to upgrade firmware over-the-air. That's good, because that should be doable from Control Center... and that runs from Linux.
But I'm now hitting another problem: where to get a gcc-msp430 that works under Linux? I got one from the TinyOS project, but that will not work with the CPU in the Chronos. I tried one from tevp.net, but that depends on binutils-msp430... Ideas?
July 19, 2010 08:45 PM
July 18, 2010
After several bottles of cold beer with this killing heat in Moscow... And brain suddenly starts 'thinking' in the right or better say - alternative direction.
[zbr@baccara lib]$ ./test.py
010203040506: successfully initialized notify hash table (256 entries).
Server is now listening at 0.0.0.0:0.
010203040506: new node has been created at 0.0.0.0:0, id_size: 20.
connected to 127.0.0.1:1025.
123abc000000 reverse lookup -> 127.0.0.1:1025.
123abc000000: node list dump:
id: 123abc000000 [12], addr: 127.0.0.1:1025.
do_transform: transform
calling openssl::transform
transform
000000000000: created trans: 1, cmd: 4, size: 575, offset: 0, local_offset: 0 -> 127.0.0.1:1025.
010101010101: created trans: 2, cmd: 4, size: 64, offset: 0, local_offset: 0 -> 127.0.0.1:1025.
000000000000: transactions sent: 2, error: 0.
010203040506: started resending thread. Timeout: 60 seconds.
000000000000: object write completed: trans: 1, status: 0.
010101010101: object write completed: trans: 2, status: 0.
Successfully wrote file: '/tmp/test_file' into the storage, size: 575.
010101010101: created trans: 3, cmd: 5, size: 0, offset: 0, local_offset: 0 -> 127.0.0.1:1025.
010101010101: read completed: file: '/tmp/test_file.read.history', offset: 0, size: 96, status: 0.
010101010101: read completed: file: '/tmp/test_file.read.history', status: 0, freeing: 1.
/tmp/test_file.read.history: objects: 1, range: 0-18446744073709551615, counting from the most recent.
a13677c93a73: created trans: 4, cmd: 5, size: 575, offset: 0, local_offset: 0 -> 127.0.0.1:1025.
a13677c93a73: reading chunk into file '/tmp/test_file.read', direct: 0, offset: 0/0, size: 575, err: 0.
a13677c93a73: flags: 00000080, offset: 0, size: 575: match: -2, rest: 18446744073709551615
a13677c93a73: read completed: file: '/tmp/test_file.read', offset: 0, size: 575, status: 0.
a13677c93a73: read completed: file: '/tmp/test_file.read', status: 0, freeing: 1.
010203040506: destroying node at 0.0.0.0:0, st: 0x8609f50.
[zbr@baccara lib]$ md5sum /tmp/test_file /tmp/test_file.read
d712a7ccfe66a45bef31892befa250f8 /tmp/test_file
d712a7ccfe66a45bef31892befa250f8 /tmp/test_file.read
[zbr@baccara lib]$
Sort of completed - there are a couple of other functions (read/write data via pointer and not file) to test, but overall it is done. Although I can not say I understand how boost::python works, and why it may crash or fail to compile if I change A to B.
But what do you want, I started to write c++ bindings 2 days ago and wrote first python program today.
After all: program crashes - it's time to make alpha release!
To solve 'unsigned char *' and other non-existing types in Python I use this hack:
class elliptics_node_python : public elliptics_node {
    public:
        elliptics_node_python(unsigned long lptr, elliptics_log &l) :
            elliptics_node((unsigned char *)lptr, &l) {};
        void read_file_by_id(unsigned long lid, const char *file, uint64_t offset, uint64_t size) {
            elliptics_node::read_file((unsigned char *)lid, const_cast<char *>(file), offset, size);
        }
        /* remaining wrappers omitted */
};
I.e. transform 'unsigned long' into pointer in C++ code. I wonder why Python calls it 'int' no matter what :)
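The Python half of this trick can be checked without the C++ library. The sketch below uses ctypes as a stand-in for the C++ cast: array.buffer_info() returns the buffer's address as a plain integer, and casting that integer back to a pointer gives direct access to the same memory. It is written for Python 3, unlike the Python 2 scripts in this post.

```python
import ctypes
from array import array

ident = array('B', range(1, 21))        # 20 bytes: 1, 2, ..., 20
address, length = ident.buffer_info()   # (integer address, number of items)

# Same idea as the C++ wrapper's (unsigned char *)lptr cast:
# treat the integer as a pointer to unsigned bytes.
pointer = ctypes.cast(address, ctypes.POINTER(ctypes.c_ubyte))
seen_through_pointer = [pointer[i] for i in range(length)]

# Writes through the pointer land in the array, so the array object
# must stay alive for as long as anyone holds the address.
pointer[0] = 0xff
print(seen_through_pointer, ident[0])
```

Nothing ties the integer to the array's lifetime, so the caller has to keep the array alive while the address is in use.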
[zbr@baccara lib]$ cat test.py
#!/usr/bin/python
from libelliptics_python import *
from array import *
id = array('B')
for x in xrange(0, 20) :
    id.append(x + 1)
trans = array('B')
for x in xrange(0, 20) :
    trans.append(1)
log = elliptics_log_file("/dev/stderr", 10)
n = elliptics_node_python(id.buffer_info()[0], log)
t = elliptics_transform_openssl("sha1")
n.add_transform(t)
# weird thing happens if I write n.add_transform(elliptics_transform_openssl("sha1"))
# we crash somewhere inside c++ binding, probably because I implemented lazy
# reference counting model (i.e. not at all :)
# thus object MUST live after this function is completed
# this should be fixed of course with proper copy constructors
# the same applies to logger actually
n.add_remote("localhost", 1025)
n.write_file(trans.buffer_info()[0], "/tmp/test_file", 0, 0, 0)
n.read_file(trans.buffer_info()[0], "/tmp/test_file.read", 0, 0)
July 18, 2010 09:49 PM
My study of this language was started today. And while my knowledge base is somewhere between zero and void I already can ask stupid questions, which were not answered on #python / irc.freenode.net
What python type to use when c++ to python boost::python binding requires 'unsigned char *' parameter?
Code snippet will tell much more:
class test_class : public test_class_base {
public:
test_class(const char *path);
virtual ~test_class();
virtual void log(const char *msg);
unsigned char test(unsigned char *ptr) { return *ptr; };
private:
std::ofstream *stream;
};
....
class_ >("test_class", init())
.def("log", &test_class::log, &test_class_wrap::default_log)
.def("test", &test_class::test)
;
I omitted the wrapper class definition, since it is not critical. It works for python-string to c++ 'const char *' transform, but unsigned char pointer fires up an error. I tried both struct.unpack_from("P", array) and array.buffer_info()[0], but python says that they both return 'int' while c++ code expects 'unsigned char *':
id = array('B')
for x in xrange(0, 20) :
    id.append(x)
s = struct.unpack_from("P", id);
print hex(s[0])
t = test_class("/dev/stderr")
print hex(t.test(s[0]))
error:
[zbr@baccara python]$ ./test.py
0x3020100
Traceback (most recent call last):
File "./test.py", line 19, in <module>
print hex(t.test(s[0]))
Boost.Python.ArgumentError: Python argument types in
test_class.test(test_class, int)
did not match C++ signature:
test(test_class {lvalue}, unsigned char*)
Code snippets can be also found at http://paste.pocoo.org/show/239085/.
If no solution is found, I will refactor elliptics python binding to support new classes for missing types like 'elliptics_id' for 'unsigned char *'.
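A side note on the 0x3020100 printed above: struct.unpack_from does not return the address of the array, it decodes the array's first bytes (0, 1, 2, 3) as a number. The sketch below reproduces this in Python 3; it uses the explicit "<I" format instead of the native "P" so that the result does not depend on the pointer width of the machine, assuming the original ran on a 32-bit little-endian system (the i686 paths elsewhere in this feed suggest so).

```python
import struct
from array import array

ident = array('B', range(20))           # bytes 0, 1, 2, ..., 19

# Decode the first four bytes of the array's contents as a little-endian
# 32-bit unsigned integer: 0x00, 0x01, 0x02, 0x03 -> 0x03020100.
(from_contents,) = struct.unpack_from("<I", ident)

# The real address of the buffer is what buffer_info() reports.
address, _ = ident.buffer_info()

print(hex(from_contents))
print(hex(address))
```

So of the two attempts, only array.buffer_info()[0] yields a pointer value; either way Boost.Python sees a Python integer, hence the signature mismatch.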
July 18, 2010 05:18 PM
(sorry, I'm on very slow connection, so I duplicated previous entry by mistake).
So my JF1.43 installation periodically
wanted to update from google, downloading the update file then erasing
it because of signature mismatch. I now updated to JF1.51... which
also downloaded update from google, but at least it asks me what it
should do with it (on each boot)... I guess confirming the update
would result in very non-working device :-(, and just hope I'll not
hit that by accident.
I'm now trying to update CM5 ROM, but the process is more tricky than
I thought.
I'd like to get NavDroyd to work, but that seems to need android1.6; I
actually hate new android versions, because they are slower than
android1.5, but....
Aha, so NavDroyd is no-no -- it is actually $6 application, thus unavailable to me.
Hmm, CM5 actually seems to have tethering built-in. Good. But it does
not seem to use different config than android-wifi-tether...? And the ROM is also very slow on G1. Back to JF1.51.
Is there some android-1.5 based ROM -- that is "reasonably fast rom" -- that does not try to auto-update?
July 18, 2010 12:45 PM
Waze is a nice software... client is even GPLed. Unfortunately, it is of thin-client kind, so all the magic is really done on server.
But... you know... mapping Czech Republic again for
waze after we mapped it for OpenStreetMap is kind of lame. Could you
maybe just use openstreetmap data? Pretty please?
July 18, 2010 12:41 PM
I study C++ effectively for last two days. Well, about 15 years ago I knew it to some degree (without STL, although with templates), but I would not count it now.
That's why my first sexual experience with C++, STL and boost::python pushed me into deep depression.
Any single error and you will get 7-10 pages of compiler errors, which are completely incomprehensible for a newbie like me. Googling them for 30 minutes I found, for example, that std::iostream and its children are non-copyable streams. No need to tell that this was not obvious from things like (5 pages of):
class test_class {
public:
test_class(const char *path) : stream(path) {};
std::ofstream stream;
};
In file included from /usr/lib/gcc/i686-redhat-linux/4.4.4/../../../../include/c++/4.4.4/bits/localefwd.h:43,
from /usr/lib/gcc/i686-redhat-linux/4.4.4/../../../../include/c++/4.4.4/string:45,
from /usr/lib/gcc/i686-redhat-linux/4.4.4/../../../../include/c++/4.4.4/stdexcept:39,
from /usr/include/boost/function/function_base.hpp:14,
from /usr/include/boost/function/detail/prologue.hpp:17,
from /usr/include/boost/function/function_template.hpp:13,
from /usr/include/boost/function/detail/maybe_include.hpp:13,
from /usr/include/boost/function/function0.hpp:11,
from /usr/include/boost/python/errors.hpp:13,
from /usr/include/boost/python/handle.hpp:11,
from /usr/include/boost/python/args_fwd.hpp:10,
from /usr/include/boost/python/args.hpp:10,
from /usr/include/boost/python.hpp:11,
from eee_python.cpp:16:
/usr/lib/gcc/i686-redhat-linux/4.4.4/../../../../include/c++/4.4.4/bits/ios_base.h: In copy constructor
‘std::basic_ios<char, std::char_traits<char> >::basic_ios(const std::basic_ios<char, std::char_traits<char> >&)’:
/usr/lib/gcc/i686-redhat-linux/4.4.4/../../../../include/c++/4.4.4/iosfwd:47: instantiated from
‘boost::python::objects::value_holder::value_holder(PyObject*, A0)
[with A0 = boost::reference_wrapper<const test_class>, Value = test_class]’
I stopped counting such shit error-reporting pages when I tried to export virtual functions via multi-level inheritance through boost::python. Although it was quite simple in the tutorial...
And yet after 5 A.M. I fucking won:
[zbr@baccara lib]$ python
Python 2.6.2 (r262:71600, Jun 4 2010, 18:28:04)
[GCC 4.4.3 20100127 (Red Hat 4.4.3-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from libelliptics_python import elliptics_log_file
>>> w = elliptics_log_file("/dev/stderr", 31)
>>> w.log(31, "qweqweqe\n")
qweqweqe
Tomorrow I will complete Python bindings and happily hopefully forget C++ for another 15 years.
July 18, 2010 01:12 AM
July 17, 2010
In recent days, the story about Motorola locking out its users (and developers)
from their more recent Droid phones has made big news. As it seems, the exact
functionality implemented by eFuses remains unclear, and the behavior of
Motorola might thus not be too different from what has more or less become
the industry standard.
For those of you who are not following the mobile world as closely on a technical
level as people like me do: In the last five years, more and more cellphone
manufacturers have used cryptographic code signing to lock-down the software
that you can run on the phone. Major parts of the system including the software
update mechanism and the bootloader on the device contain a verification process
of those cryptographic signatures to ensure that you can only run software signed
by the phone manufacturer.
I have seen this with the MotoMAGX phones like the ROKR2 v8, various Windows
Mobile handhelds from HTC, the non-developer (non-ADP) version of the
Google/Android G1 and many other phones.
This puts the user into a strange situation where he buys some hardware from
the manufacturer, but yet doesn't have control over what this device does.
Just imagine buying a computer, but being limited to run Windows 98 and Office
97 on it. You could not update to a later version of the operating system, and
you could not install an alternative operating system such as a version of
GNU/Linux. If the computer vendor decides that he will drop support for it,
you will not even be able to install security updates to the operating system.
From my point of view, this is an abusive, anti-competitive behavior by the
manufacturer. For no reason but his ever-growing hunger for power he makes
you completely dependent on his decision. It is not in the control of the user,
what operating system or even applications you can install. It is under the
control of the manufacturer.
I would accept this if the phone was rented. In this case, I would
only pay a small rental fee, but the phone is the property of the manufacturer
and I am only using it. But the manufacturer actually sells the device.
He wants to be paid the full price, but still not actually hand control over
to the buyer.
Compare this with buying a CD-player that has arbitrary restrictions so it
would only play CDs from one of the major music labels/distributors like EMI,
but not CDs from any of the other publishers, for no technical reason whatsoever.
Or buying a TV set that is locked down so you can only watch one TV channel,
while you need to buy another TV for a different channel.
I actually think the antitrust authorities should investigate this behavior
of the mobile phone industry. Simply compare it with the PC situation and look
at how often Microsoft has been judged to have engaged in some kind of
anti-competitive behavior in the PC world. In the mobile phone industry,
the situation is worse than it ever was in the PC world, yet we do not see
big antitrust cases being brought forward.
And please don't buy those pseudo-arguments that this has any relation to
regulatory/FCC approval or the safety of mobile networks themselves. The
entire software stack interacting with the mobile network runs on a separate
processor (the baseband processor) anyway. It doesn't matter what you install
on the application processor. Once again, compare it to laptops: You can
insert a 3G miniPCI, expressCard or USB dongle. Inside this dongle you run
the communications stack on a processor that is completely different from your
main processor that runs your regular OS (be it GNU/Linux, OS X, Windows,
Solaris or whatever makes you happy).
July 17, 2010 02:00 AM
July 16, 2010
Our generic system contains just 4 sata disks combined into software RAID10. Putting BLOB IO backend to ext4 fs and installing two nodes, we got quite surprising skyrocketed results:

2 sata storages, each contains 4-disks software RAID10
There are 10 million records, 87 GB in total. Each node contains about 44 GB of data and has 24 GB of RAM. Although we flush caches prior to each test run, readahead games quickly suck the blob files back into RAM, which I believe explains such results: 700 rps (of completely random IO) within 200 ms, 1000 rps within 300 ms.
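A quick back-of-the-envelope check of the caching explanation, using only the figures above and assuming decimal gigabytes: the average record is small, and each node's RAM is large enough to hold roughly half of its data.

```python
records = 10 * 1000 * 1000      # total number of records
total_bytes = 87 * 10**9        # whole data set (decimal gigabytes assumed)
node_data_bytes = 44 * 10**9    # data stored on each node
node_ram_bytes = 24 * 10**9     # RAM in each node

avg_record_bytes = total_bytes / records
ram_fraction = node_ram_bytes / node_data_bytes

print("average record size: %.0f bytes" % avg_record_bytes)
print("share of a node's data that fits in RAM: %.0f%%" % (ram_fraction * 100))
```

The page cache cannot take all of the RAM, so the real share is lower, but it is still a large part of the working set.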
I wonder why 4-disks SATA setup is close to 16-disks SAS storage. Looks like raid10 requires a serious tuning for larger storages, otherwise I can not explain such major hardware difference and quite similar performance numbers.
July 16, 2010 09:52 PM
It took a little bit more time than we expected: it turns out that people suddenly get other tasks, and of course with higher priority.
So, I decided to write things up myself. To implement python bindings I will use boost::python, but first I have to wrap common elliptics network operations into proper classes.
That's what I ended up with today:
int main()
{
    unsigned char id[DNET_ID_SIZE];
    elliptics_transform_openssl t("sha1");
    elliptics_log_file log("/dev/stderr", 15);
    memset(id, 1, DNET_ID_SIZE);
    elliptics_node n(id, &log);
    n.add_remote("devfs8", 1025, AF_INET);
    n.add_transform(t);
#if 1
    elliptics_callback_io callback(&log);
    memset(id, 0xff, DNET_ID_SIZE);
    n.read_data(id, 0, 0, callback);
#endif
    n.write_file(id, const_cast<char *>("/tmp/some_file.txt.bak"), 0, 0, 0);
    n.read_file(id, const_cast<char *>("/tmp/some_file.txt"), 0, 0);
    n.read_file(reinterpret_cast<void *>(const_cast<char *>("1.xml")), 5,
            const_cast<char *>("/tmp/1.xml.cpp"), 0, 0);
    /* cool, yeah? we have to wait for read_data() to complete actually */
    sleep(10);
}
Most of them can throw numeric exceptions. There are fewer than a dozen classes, which I will put into proper boost::python wrappers.
And in the meantime it is of course possible to use them in c++ code. Current version can be found in git.
July 16, 2010 05:40 PM
I've just been working on Evolution's reply code, and have
added a couple more of those annoying "nag pop-ups",
including this one which I expect a lot of people will
appreciate when they don't get the resulting mail:
It's currently set to trigger if you hit 'Reply to All' on a
message with more than 15 recipients; unless it's a mailing
list message. And of course you can see that it's trivial to
turn it off if you never want to see it again.
I've also taken a moment to write down and post
some thoughts on the 'Reply to All' vs. 'Reply to List'
debate for mailing list messages.
July 16, 2010 02:29 PM
There are plenty of reports in recent days about the level of locking-down
that Motorola is apparently doing on their most recent Android products,
the Droid 2 and the Droid X.
This goes as far as an (I believe unconfirmed) slashdot.org
report claiming that not only is there the more or less typical DRM on
software (i.e. a cryptographic signature validation chain), but there is also an eFuse
that is blown if something goes wrong during the boot process.
To the best of my knowledge (and I've been doing mobile phone reverse engineering for
about 6 years now), this is the first time I've heard of something like this. If true,
it sounds pretty dangerous to me. What if something goes wrong during an update
(such as a power failure during a software update)? What if you really have a
non-correctable multi-bit error in your NAND Flash? In that case,
cryptographic verification of the firmware fails and the eFuse would be blown,
resulting in your device being a brick. This could eventually backfire massively
on Motorola.
The best comment from the slashdot.org thread:
You can legally buy a gun that only shoots in the direction of the person pulling the trigger, but it doesn't mean it's a good idea.
Reading something like this makes me almost depressed. Motorola is
benefitting from the billions of dollars' worth of development in existing Free
Software projects like the Linux kernel, but they now want to take away the
fundamental right to run modified versions of that very software. Somebody
needs to slap them with a very large trout.
I'm not really surprised that they are doing it, though. Motorola showed
that direction years ago when they first used SELinux as part of their
later pre-Android Linux phones (EZX and MAGX). They didn't use it to enhance
the security of the user, but to enhance the security _from_ the user.
Please also note this great
post by Bradley M. Kuhn on the subject matter. If you don't know Bradley,
he's been doing GPL enforcement for the last 12 years - for the Free Software
Foundation and the Software Freedom Law Center. In his post, he actually
thanks Motorola for publicly stating that they actually want to lock their phones
down (as opposed to Apple).
What's even more interesting though is his elaboration on the scripts to
control compilation and installation clause of GPLv2. This is indeed
something that most people tend to overlook when it comes to GPL[v2] compliance
and we see this a lot during our gpl-violations.org work.
And in fact, for a very long time, I have been teaching this fact
during my GPL related talks and trainings: In software specific to embedded
devices, the scripts to control installation are incomplete if you do not provide
a means to install the software onto the actual device. Where else could you
reasonably install the Linux kernel image that is made specifically to work
on such a particular mobile phone model? Due to the custom nature of Linux
kernels for embedded targets, it wouldn't even run anywhere else.
I've never taken any such issue to court so far - but it was a frequent dispute
in out-of-court GPL enforcement we've been doing at gpl-violations.org.
I'm definitely curious to see what will be the first court case addressing that
issue. The ever power-hungry manufacturers of mobile phones seem like they
deserve it.
UPDATE:
Apparently Motorola has released a statement that denies they use eFuses to
brick the device. All it does is render the device unable to boot until
some Motorola-certified/signed/authorized software is loaded on the device
again. They did not specify how that could be done, though. Still, even without
the eFuse bricking, I find it outrageous that the industry (including Motorola)
expects its customers to pay hundreds of dollars for a device that is then
still owned by Motorola rather than that very customer. It's like selling
something but still retaining ownership of it. Doesn't that make you feel
strange, too?
July 16, 2010 02:00 AM
July 15, 2010
A DHT is virtually the same as a hash table spread over multiple nodes. Thus it shares its advantages, like O(1) access when properly configured, and its weak parts, like a complete absence of scalability.
Yes, a hash table does not scale, because to change its size one has to perform lots of tricks which generally end up in a full rehash of the table content. With millions of entries on disk this may take a while...
Thus we can not easily extend hash table storage - each node addition will require table rebuild. Depending on DHT configuration and routing protocol used, it can be full or partial content copying.
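To make the "full or partial content copying" distinction concrete, here is a minimal Python sketch. It is not how elliptics routes requests: the two placement schemes (hash modulo node count, and a ring of hand-picked node IDs where a key belongs to the first node ID at or after its hash), the key names and the node positions are all assumptions chosen for illustration.

```python
import hashlib
from bisect import bisect_left

RING = 2 ** 160  # size of the SHA-1 output space


def key_hash(key):
    """Map a key to an integer position in the SHA-1 space."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")


def modulo_owner(key, node_count):
    """Plain hash-table placement: bucket = hash mod number of nodes."""
    return key_hash(key) % node_count


def ring_owner(key, node_ids):
    """Ring placement: first node ID >= key hash, wrapping to the start."""
    index = bisect_left(node_ids, key_hash(key))
    return node_ids[index % len(node_ids)]


def moved_fraction(keys, before, after):
    """Share of keys whose owner differs between two placements."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


keys = [f"object-{i}" for i in range(20000)]

# Four hypothetical nodes evenly spaced on the ring, then a fifth one
# dropped into the middle of the first arc.
old_ring = [RING // 4 * i for i in range(4)]
new_ring = sorted(old_ring + [RING // 8])

mod_moved = moved_fraction(
    keys, lambda k: modulo_owner(k, 4), lambda k: modulo_owner(k, 5))
ring_moved = moved_fraction(
    keys, lambda k: ring_owner(k, old_ring), lambda k: ring_owner(k, new_ring))

print(f"modulo placement: {mod_moved:.1%} of keys change owner")
print(f"ring placement:   {ring_moved:.1%} of keys change owner")
```

With modulo placement a key keeps its owner only when its hash gives the same remainder for 4 and for 5, so about four fifths of the keys move. On the ring only the half-arc taken over by the new node moves, about one eighth of the keys in this layout.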
In contrast to a hash table, classical distributed storage with a dedicated master server is able to scale without the need to copy data each time a new server is added. The master server will return this new node's address for each new write, then the next node and so on.
This looks shiny and good in theory. Let's face practice.
Data tends to wear down with time - we do not look at old photos and do not read old mails as frequently as we access recently written information. Thus servers which host old data will not be loaded compared to the new nodes in the described master server example.
To fix this issue the master server has to start copying data - old servers move their old, no longer accessed data to some new nodes, and new writes can go to old nodes too. This task is full of non-trivial heuristics about what data to move and what server to use. With time it ends up with a data copy for each new write (or usually each sufficiently large chunk write).
In a distributed hash table this copy is not needed, since by design writes are balanced across the whole storage (when a cryptographically strong hash function and the storage size are configured, of course).
In a DHT there is no problem of data wearing, since each write goes to a (kind of) random node, thus all nodes will contain roughly the same amount of old and new data.
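The balancing claim is easy to check with a small Python sketch. It assumes SHA-1 as the hash, four nodes, and made-up key names written in sequence; the first half of the writes stands in for "old" data and the second half for "new" data.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size


def node_for(key, node_count=NODES):
    """Pick a node from the SHA-1 hash of the key."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % node_count


# Keys in the order they were written: the first half is old data,
# the second half is recent data.
writes = [f"photo-{i}" for i in range(10000)]
old, new = writes[:5000], writes[5000:]

old_per_node = Counter(node_for(k) for k in old)
new_per_node = Counter(node_for(k) for k in new)

for node in range(NODES):
    print(node, old_per_node[node], new_per_node[node])
```

Each node ends up with close to a quarter of the old keys and a quarter of the new ones, so no node is left holding only cold data.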
Drawing the line: in a DHT we have to copy data when a new node is added, to take some load from the neighbours, while in the master-server scenario we have to copy each time new data is added (this can be limited to large chunks, of course). In theory the master-server scenario will copy data into the distribution a DHT provides out of the box, and will still have to copy again when new nodes are added.
In some cases this is not an issue - we may not want to start data redistribution in master-server storage for some reason, or may not foresee that this demand will appear, while the data copy in a DHT when a new node is added is a must - otherwise data will not be accessible because of the changed hash distribution over nodes.
These two cases frequently (if not all the time :) become the most significant corner cases when a DHT is not selected for distributed storage.
July 15, 2010 12:58 PM
July 14, 2010
The previous elliptics test showed how good (or bad) the append-only BLOB backend is when the HTTP proxy issues two reads to handle a single client request.
Namely, it fetched the transaction history log to find out which stored transaction has the version the client asks for.
Now let's see how well we behave when single client request results in single data read from the blob.

700 rps within 100 ms, 900 rps within 300 ms
Surprisingly, the absolute numbers did not change - we still fit 1000 rps within 300 ms, which is rather unacceptable for a single client. But at the beginning we are about 2 times faster than in the described 2-reads case: we handle 700 rps within the 100 ms range.
The test setup is the same as in the previous test: 2 SAS storages attached, each with 16 disks. Ext4 over software RAID10. 2.6.34 kernel. Random requests. 10 million records (about 87 GB total, 44 on each SAS storage).
We also ran the same test but moved the storage blob to a single SAS storage. We also moved it to the block device directly instead of using the usual file on an ext4 filesystem. Results degraded 2 times, as expected: about 600 rps within 200 ms.
There was no difference whether block device or ext4 was used as low-level storage.
In the meantime the blob IO backend got support for loading an index to speed up its startup. The only missing feature is index truncation, aka the ability to compact and remove deleted entries. When this is done, I will start POHMELFS - a POSIX frontend to the elliptics network.
Its initial implementation will be neither performance-centered nor feature-rich; instead I will create a rather simple client, which will allow a trivial deployment procedure.
And I have to start writing elliptics network paper for Linux Kongress. This may take a while though...
Stay tuned!
July 14, 2010 09:36 PM
RCU callbacks are registered via call_rcu(). After an RCU grace period elapses, the callback (which is a C-language function) is invoked. RCU's fundamental guarantee states that once an RCU grace period has elapsed, all RCU read-side critical sections that were executing when the grace period began will have completed. (An RCU read-side critical section is a fragment of code enclosed by rcu_read_lock() and rcu_read_unlock().)
Now, an RCU callback, being a C-language function, has a definite beginning and end. But what about synchronize_rcu(), which blocks until an RCU read-side critical section has elapsed? How does RCU know how long to hold off new RCU read-side critical sections once synchronize_rcu() returns?
July 14, 2010 05:34 AM
July 13, 2010
Yay
Brazil! They're making it illegal to use DRM to
prevent "fair dealing" with copyrighted works, or
access to works which are in the public domain. It's also
legal to "crack" DRM if you're only doing it for the purpose
of "fair dealing".
So, for example, it would be legal for me to crack the DRM
on the eBooks I buy, which is necessary just so that I can
read them. Currently I have to break the law just
to be able to buy and use eBooks.
UK citizens, go here
and add your vote; it's very simple to register if you
haven't already done so.
July 13, 2010 09:29 AM
July 12, 2010
I'm running a power management track at the Linux Plumbers Conference again this November. Unlike most conferences which focus on presenting completed work, Plumbers is an opportunity to focus on unsolved problems and throw around as many half-baked solutions as you want in order to try to find one that seems to stick. The suspend/resume problem in Linux is mostly solved[1], which means that it's time for us to focus on runtime power management and quality of service.
This has been an especially interesting year in the field. We've landed the infrastructure for generic runtime power management, glued that into PCI and started implementing that at the driver level. pm_qos is being reworked to improve performance and scalability as we start seeing more drivers that need to express their own constraints. And, of course, we had the wakelock/suspend blockers conversation that didn't end in a terribly satisfactory manner, although Rafael is now working on an implementation that presents equivalent functionality with a different userspace API. Runtime full-system suspend isn't solved yet either - the current cpuidle-based solution doesn't work well on multicore systems. And maybe we could be more aggressive still by looking at reclocking more system components on the fly even if the existing interfaces don't allow that. Do we have all the hooks we need to identify which system resources are being used? Are we doing the best we can in terms of avoiding trading off performance for power savings?
So if you'd like to talk about any of these things, or if there are any other problems that you don't think have been solved yet, head on over to the call for submissions and help make sure that we can make Linux the most power-efficient OS possible.
[1] Yes, some machines are broken, but those tend to be individual weird bugs which we're gradually tracking down rather than fundamental issues in our core code, so they're not really in the scope of Plumbers
July 12, 2010 01:53 PM
Audio: COMING SOON
For the weekend of the 4th of July 2010, I’m Jon Masters with a summary of today’s LKML traffic.
In today’s issue: Linux 2.6.35-rc4, Btrfs, Defconfig kernel configs, GDB, Timekeeping, and the VM.
*). Linux 2.6.35-rc4. Linus Torvalds announced the release of Linux 2.6.35-rc4 on July 4th 2010 at 8:44pm Best Coast Time (PDT). Linus says he’s been back online for a week and is happy at the relatively small number of changes building up, “having been strict for -rc3”, in his absence. He obviously sees the increased rigidity in enforcing the merge window has been a success, and considers that there will likely be an on time 2.6.35 release, “despite my vacation”. Linus says his vacation was very enjoyable and was the longest time away from the kernel in many years – apparently he did take a cellphone for email, but didn’t do any compiles while he was having “a great time under water.”
*). Btrfs. Edward Shishkin posted a rather scathing technical review of btrfs internal design, criticising variable record size allocations, file system utilization, the balancing algorithms used, and even suggesting that engineers leave the algorithm design up to academics, rather than re-inventing things for their programs. Edward performed various benchmarks and published his results in a thread entitled (variously), “Unbound(?) Internal fragmentation in Btrfs”, “Btrfs: broken file system design”, and “Balancing leaves when walking from top to down”. For his part, Chris Mason was very civil in his reply on a number of occasions, saying that he didn’t see a fundamental design problem existing in Btrfs. Edward “NACKed” Btrfs anyway for enterprise use (even though it’s been in tree for a while).
*). Defconfig kernel configs. Linus Torvalds (in a thread renamed to “ARM defconfig files”) essentially conveyed his discomfort with the continued existence of many dozens (or perhaps hundreds) of “defconfig” files in the architecture directories. These are reference files which are based upon copies of “known good” configuration files. They worked well back in the day, but as Linus says, times have changed and nobody is really making these files by hand any more without using Kconfig. So he proposes replacing them – eating the pain – with single config files per machine type that use Kconfig and source in the particulars for the various chip and architecture families. Russell King pointed out that this is basically what already happens, but the point of the defconfig files is to also handle stuff outside of the architecture – for example, choosing not to use certain “IDE” options on particular boards or systems – as Daniel Walker also pointed out. Daniel noted that those setting up e.g. a BeagleBoard or a Nexus One don’t really want to trawl through thousands of possible kernel options if a good reference set is available to begin with. Daniel also pointed out a previous posting for a boolean SATisfiability solver in the kernel config. Linus thought that was interesting but ‘At the same time, “SAT solver” does scream “over-engineering failure” to me’. Linus later explained that he was looking to either kill the defconfigs or replace them with some templates and a means to generate them, but otherwise preferred them to live some place outside of the kernel.
*). GDB. David Howells posted a patch implementing GDB remote protocol support for the “p” command on FRV. The “p” command is used to transfer information about a single register, as opposed to the “g” command, that transfers data on several. But when a gdb client connects, it will attempt to use “p” or “g” and will then stick with that choice without varying. For this reason, Linus wondered aloud if using single reads would actually slow down clients connecting (since they usually will request a number of registers at a time). Jason Wessel said he had actually done some fairly detailed benchmarking and would share his findings at a later point.
*). Timekeeping. Oleg Nesterov posted a thread entitled “Q: sys_futex() && timespec_valid()”, in which he attempted to summarize some concerns that the glibc folks were having with the Linux implementation of timespec timeouts. Ulrich Drepper replied, explaining that his point was that a negative value for tv_sec in the case of an absolute timeout should not return -EINVAL, but instead -ETIMEDOUT. He contends that a negative relative time in the 1960s is not an invalid time. Linus strongly disagreed, saying, “Ulrich – you’re wrong. Go away.” and then clarified, ‘In the end, it’s quite simple: the kernel doesn’t accept invalid timevals. And negative tv_secs are invalid. It’s that simple. If somebody gives the kernel a timeout from before the epoch [January 1st 1970], that somebody is being a total idiot. We know it’s not a valid absolute timeout, since there’s no way somebody is “waiting” for something that happened in the sixties. Yeah, yeah, maybe you’re waiting for flower power and free sex. Good for you. But if you are, don’t ask the Linux kernel to wait with you. Ok?’ This author wonders what those still waiting for Elvis will do now that this is clarified.
*). VM. Larry Woodman posted a patch entitled “Call cond_resched() at bottom of main loo[sic: s/k/p/] in balance_pgdat()”, which handles a situation on small single CPU systems wherein a task should OOM (Out Of Memory) and call the OOM-killer, but it does not because kswapd is constantly running due to at least one system RAM zone being below the high page watermark. Larry adds a single cond_resched() call that will allow the watchdog, tasks, and OOM killer to run, freeing up the affected resources. Andrew Morton didn’t like this approach – implying he preferred something more specific than a cond_resched and waiting for the OOM killer to get a chance to run – but he could live with it if there were a giant FIXME and/or some documentation at least explaining the essential nature of the specific cond_resched() call as opposed to a regular point of voluntary kernel preemption.
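For readers who want the Timekeeping disagreement above in concrete terms, here is a small Python sketch of the rule Linus describes: a timeout with a negative tv_sec, or with tv_nsec outside one second, is rejected with -EINVAL before any waiting happens. It is modeled on my understanding of the kernel's timespec_valid() check rather than copied from it, and check_timeout() is a hypothetical helper name.

```python
import errno

NSEC_PER_SEC = 1_000_000_000


def timespec_valid(tv_sec, tv_nsec):
    """A timespec is valid if seconds are non-negative and
    nanoseconds are in the range [0, one second)."""
    if tv_sec < 0:
        return False
    if not 0 <= tv_nsec < NSEC_PER_SEC:
        return False
    return True


def check_timeout(tv_sec, tv_nsec):
    """Return 0 for an acceptable timeout, -EINVAL otherwise.

    Under the behaviour argued for in the thread by Ulrich Drepper,
    the negative absolute case would yield -ETIMEDOUT instead.
    """
    return 0 if timespec_valid(tv_sec, tv_nsec) else -errno.EINVAL


print(check_timeout(-1, 0))            # a time before the epoch
print(check_timeout(5, 500_000_000))   # five and a half seconds
```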
In today’s miscellaneous items:
*). Patrick Pannuto proposed a usleep API for the kernel to augment the existing msleep one, and be used as an alternative to udelay so as to allow the CPU to go into lower power C-states. After some dialogue between Patrick and Daniel Walker, in which Walker pointed out that some stats were needed to prove that this was power beneficial for small delays, it seemed that there was a small improvement for 50us delay values.
*). Ronny Tschuter had some issues with tracing power_start events when using the cpuidle framework with a menu governor and an ACPI-based driver to handle idle states. There were no instrumentation points in the processor_idle code, so he posted a patch, but Arjan van de Ven pointed out that the ACPI STATE type is pretty much “useless random garbage” so the poster should set his system to use mwait idle.
*). Dave Jones raised a concern with crypto and device-mapper. A potential regression was introduced somewhere between 2.6.32 and now, and the details are available in Red Hat Bugzilla 610278. Nobody replied to the posting on the list, but the Bugzilla says that one should be using LUKS, and in the case of not using it the default encryption options were changed due to a vulnerability. It is possible to mount the existing device using the instructions provided.
In today’s announcements:
*). Jeff Merkey announced the latest version of his MDB “Merkey’s Kernel Debugger” x86_64 2.6.34 07-01-2010 Release 4. It’s available on googlecode.com. There has been no community discussion thereof. Jeff also posted his Open Cworthy Libraries 07-01-2010.
*). Junio C Hamano announced Git version 1.7.1.1 is now available at: http://www.kernel.org/pub/software/scm/git/ He also announced Git 1.7.2.rc1 is available for review.
*). Karel Zak announced the latest stable release of util-linux-ng 2.18 is now available: http://www.kernel.org/pub/linux/utils/util-linux-ng/
*). Subrata Modak announced that the Linux Test Project for June 2010 has been released. http://ltp.sourceforge.net/
The latest kernel release is 2.6.35-rc4.
That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.
July 12, 2010 01:21 AM
July 11, 2010
Audio: COMING SOON
For the weekend of June 27th 2010, I’m Jon Masters with a summary of today’s LKML traffic.
In today’s issue: Concurrent coredumps, OpenFirmware, and Power management policy.
*). Concurrent coredumps. Edward Allcutt posted, inquiring about placing a limit on the number of concurrent process coredumps that should be allowed to take place on a system. He cited an example Apache-based webserver in which large numbers of CGI processes were crashing, each with a 150-200MB core file that needed writing to disk. He was using a custom patch that would cease dumping cores after a certain number were already concurrently taking place. Roland McGrath and Andrew Morton did not favor this approach, instead preferring either that core dumps would begin to block (but not consuming resources) after a point, or that the blkio_cgroup IO controller be used to limit the IO being consumed. Hiroyuki Kamezawa suggested that distributions like Fedora – which in that case has its own dumping tool called abrt that manages coredumps – could wire up the blkio cgroup prior to beginning the dump process.
*). OpenFirmware. Andres Salomon posted a patch implementing support for making calls into OpenFirmware on x86 OLPC XO systems. The patch works by preserving the necessary page mappings for the OpenFirmware (OFW), which remains in memory at a virtual address. Just the minimum number of mappings are retained, but this does allow calls into the firmware even after Linux has booted. It’s always been interesting to see the XO using OpenFirmware as one of the only x86-based devices doing so.
*). Power management policy. Len Brown posted an RFC patch implementing a new centralized location for userspace to express its power management vs. performance policy preferences to the kernel. In the patch, such expression occurs through the new /sys/power/policy_preference file, which contains 5 different possible levels – ranging from “max_performance”, through “balanced” (the new default), to the “max_powersave” option on the other extreme. The idea is to centralize setting scheduler, cpuidle, governor, and other options.
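The two policies debated in the concurrent coredumps item (skip the dump once too many are in flight, or make later dumps wait their turn) can be contrasted with a user-space toy. This is a sketch in Python threads, not kernel code; DumpLimiter and its fields are hypothetical names, and a short sleep stands in for writing a core file.

```python
import threading
import time


class DumpLimiter:
    """Cap the number of core dumps being written at the same time."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self.active = 0  # dumps currently in progress
        self.peak = 0    # highest value 'active' ever reached

    def dump(self, write_core, block=True):
        """Run write_core() under the limit.

        block=True waits for a free slot (the blocking policy);
        block=False gives up at once (the skip policy).
        Returns True if the dump was written.
        """
        if not self._slots.acquire(blocking=block):
            return False
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        try:
            write_core()
            return True
        finally:
            with self._lock:
                self.active -= 1
            self._slots.release()


# Eight "crashing processes" share a limit of two concurrent dumps.
limiter = DumpLimiter(max_concurrent=2)
crashing = [
    threading.Thread(target=limiter.dump, args=(lambda: time.sleep(0.01),))
    for _ in range(8)
]
for worker in crashing:
    worker.start()
for worker in crashing:
    worker.join()

print("peak concurrent dumps:", limiter.peak)
```

With blocking enabled every dump is eventually written, just never more than two at a time; with block=False the extra dumps are dropped, which is closer to the custom patch described above.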
In today’s miscellaneous items:
*). Dave Chinner posted a 5 part patch series implementing some fixes for emergency filesystem thawing (via sysrq control).
*). Michael Kerrisk posted some man-pages text for the MADV_MERGEABLE and MADV_UNMERGEABLE flags added in 2.6.32 for use with KSM (Kernel Samepage Merging – the kernel support for detecting duplicate pages in guest virtual machines and mapping them to a single shared page instance).
*). Paul E. McKenney concluded that it was sufficient to turn off the CONFIG_PROVE_RCU option in Fedora rawhide kernels since it’s mostly a developer tool, rather than change licensing or otherwise make it available to non-GPL modules with which it is not compatible.
*). Luis R. Rodriguez posted a script and some documentation to implement some rudimentary ASPM (a PCI extension that allows devices to go to an entirely electrically idle bus state) support. For further information: http://wireless.kernel.org/en/users/Documentation/ASPM
*). Konrad Rzeszutek Wilk posted a 19 part patch series implementing PCI pass-through for Paravirtualized Xen guests, using SWIOTLB support.
*). Mike McCormack wasn’t happy with the 32 (NGROUPS_SMALL) group limit on the number shown in /proc/<pid>/status for a given process ID. He and others discussed various ways those who really want more than 32 groups assigned to a process could get the full data through various API changes.
*). Rusty Russell posted the last (hopefully) of his cpumask patches which he says now also means that everyone should be using the cpumask_functions. At least, everyone in kernel is, according to his tests on 32-bit.
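As a rough illustration of the group limit item above, the following Python sketch parses the "Groups:" line of a status file and flags the case where exactly 32 groups are shown while the process really has more. The helper names are made up for this example, and kernels newer than the one discussed here may not truncate the list at all.

```python
import os


def parse_groups(status_text):
    """Return the supplementary group IDs listed on the 'Groups:' line."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return [int(gid) for gid in line.split(":", 1)[1].split()]
    return []


def groups_look_truncated(status_text, real_group_count, limit=32):
    """True if the status file shows exactly 'limit' groups although
    the process is known to belong to more."""
    shown = parse_groups(status_text)
    return len(shown) == limit and real_group_count > limit


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    with open("/proc/self/status") as status:
        text = status.read()
    # os.getgroups() asks the kernel directly and is not limited
    # by what the status file chooses to display.
    print(parse_groups(text))
    print(groups_look_truncated(text, len(os.getgroups())))
```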
In today’s announcements:
*). Mathieu Desnoyers announced that LTTng 0.218 for kernel 2.6.34 is now available. For further information: http://www.lttng.org/
*). Henrik Rydberg announced version 1.0.1 of the mtdev Multitouch Translation Library is now available (released under the MIT license). mtdev does all of the necessary finger tracking pieces in userspace, and separate from the Xorg driver from which it came, as a means to further adoption. This author is still waiting for his Apple Multitouch keypad to work on a Fedora system without having to patch the kernel with a kludge. mtdev is available at: http://bitmath.org/code/mtdev/
*). Len Brown announced the Boston Linux Power Management Mini-Summit will take place concurrently with the Linux Foundation LinuxCon 2010, on the day immediately prior to the beginning of the main events, August 9th. For further information: http://events.linuxfoundation.org/
The latest kernel release was 2.6.35-rc3.
Finally today, Piotr Hosowicz wondered aloud why Linus’ git repository was not being updated, asking if it’s because he’s on vacation. As mentioned before, Linus was indeed on a (well deserved) vacation.
That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.
July 11, 2010 09:32 PM
Audio: COMING SOON
For the weekend of June 20th, I’m Jon Masters with a summary of today’s LKML traffic.
In today’s issue: Panic, Performance Events, Slow-work, and Timekeeping.
*). Panic. Shoichi Tamuki posted version 2 of a patch intended to fix keyboard LED blinking on panic. Existing systems will call mdelay to handle the reboot timeout post-panic, during which time the keyboard LEDs will blink. When a hypervisor is being used, those mdelay calls of 1 second or more will be implemented as spins, in order to avoid timeout accuracy slips, but the side effect is that the keyboard LEDs won’t blink properly. The patch will call panic_blink_enter() between every mdelay call, and it also fixes up the longer mdelays so that the blinking still occurs.
*). Performance Events. Nils Carlson, Andi Kleen, Eric W. Biederman, Tony Luck, and others, discussed the “Hardware Error Kernel Mini-Summit” followup in which it had been proposed to introduce a new hardware error subsystem. They pondered what (mostly) Andi saw as failings of EDAC and the need for a better way to find such things as which DIMM has failed without doing a binary search removal of individual modules (”the way of the 21st century”). Tony Luck proposed some further ideas for a generic subsystem.
*). Slow-work. Ted Ts’o reported that recent 2.6.35 kernels with an Ubuntu userspace would periodically get into a state in which a large amount of CPU time was spent in the kslowd worker threads. It turned out that this was caused by a change to the DRM/KMS code to pull polling of the display connectors into the DRM core. Reverting a specific commit fixed the issue for Nick Bowler, who had also been experiencing this problem.
*). Timekeeping. Suresh Rajashekara inquired as to what appeared to be a problem with timekeeping on his OMAP1 platform with a 2.6.29 kernel. It seemed odd that certain timers were not expiring immediately upon resume on a system that tries to spend most of its time in a suspend state (waking for 35 milliseconds every 4 seconds, apparently). Thomas Gleixner replied, saying that during such suspend operations, only the CLOCK_REALTIME based timers are kept correct (aligned to real time), whereas others won’t expire the moment the system resumes because there may otherwise be a thundering herd problem as many timers expire at the point that the system wakes up from the suspend state.
In today’s miscellaneous items:
*). R. F. Burns inquired as to whether it was possible to “write a kernel module which, when loaded, will blow the PC speaker?”. Alan Cox replied that this wasn’t really likely, and in the absence of the root password and proper expertise, “throwing it out of the window or feeding it iron filings will work just as well.”
*). Lai Jiangshan posted a patch removing the use of a default write bit with EPT page allocations under KVM virtualization. It wasn’t causing a problem now since get_user_pages is always called with write=1 at the moment.
*). Adrian Hunter posted MMC patches adding support for secure erase, trim, and secure trim – all now variants of erase in eMMC v4.4 cards.
*). Peter Zijlstra noted that the historical uses of perf_disable to prevent NMI races in the PMU code were basically now done per-arch, so he suggested that he would remove perf_disable as it did not seem to be really needed.
*). Christoph Hellwig posted the XFS status update for May 2010, in which he noted several of the important features that landed in 2.6.34 (including new inode and quota flushing code). Christoph also posted a patch (not entirely related to XFS) that removed the 4K stacks option on 32-bit x86 systems as it is deemed “too small” these days, even with now mandatory split IRQ/kernel stacks, given the depth of many kernel call chains.
*). A number of objections were raised to the new automated addition of a “+” to the localversion for modified kernel trees, if no other is set. Mark Hills pointed out that this triggers a lengthy modpost step even when doing “casual kernel development” to test out some simple patch.
*). Dan Carpenter posted a patch that changes the output of kernel oops messages such that the previous “cut here” is replaced with a message asking for the entirety of the oops to be sent in to kernel folks.
*). Zachary Amsden (who has been working on this for some time) posted some TSC cleanup patches and documentation for KVM. This should help resolve many of the issues that have been affecting some TSC users under KVM. On that note, Hagen Paul Pfeifer sent a patch that effectively allows for deliberate speeding-up of time for certain guests for testing use.
*). Huang Ying posted a three-part “Unified NMI delayed call mechanism”, which essentially allows the deferment of certain NMI-time processing until the NMI context has been left. Ingo Molnar preferred that the solution be to re-use the existing unified NMI watchdog code. Sadly, the rest of the thread turned into a bit of a flamewar between Andi Kleen and Ingo.
In today’s announcements:
*). Jeff Merkey announced Open Cworthy Libraries 06-19-2010, and ranted about wanting larger stack sizes. He also posted version 2.6.34-06-17-2010 of his “MDB” or “Merkey Debugger”. Nobody replied to any of these threads.
*). Etienne Lorrain announced version 2.8.2 of the gujin GPL bootloader. It contains several bugfixes and improvements – http://gujin.org/
*). James Morris announced the Program Schedule for the Linux Security Summit that will run in conjunction with the 2010 LinuxCon in Boston, on August 9. Further information is available at http://www.linuxfoundation.org/
*). Karel Zak announced that the second util-linux-ng 2.18 release candidate is now available. It contains lots of fixes (e.g. disable DOS mode and cylinders by default now in fdisk). Further information is available at: http://www.kernel.org/pub/linux/utils/util-linux-ng/v2.18/
*). Mathieu Desnoyers announced the release of Userspace RCU 0.4.6. The latest release includes added ARMv7l support. Further information is available at: http://www.lttng.org/urcu/
The latest kernel release was 2.6.35-rc3.
That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.
July 11, 2010 02:59 AM
The protocol by which traditional GSM core network components interact is called
MAP (Mobile Application Part). MAP itself is a user of the TCAP (Transaction
Capabilities Application Part) protocol, which in turn runs on a SS7 protocol
stack (i.e. SCCP over MTP or M3UA or SUA over SCTP).
For those users of OpenBSC who have a need to interoperate with other GSM
networks (roaming), the circuit-switched part of OpenBSC has so far relied on
the use of a proprietary MSC (by means of the A interface). This closed
MSC then talks MAP/TCAP/SS7 to roaming partners.
On the GPRS front, however, we now have OsmoSGSN. As opposed to the BSC
on the circuit-switched side, the SGSN directly interacts with the core GSM
network components (both of the home network and the roaming partners).
So in order to run OsmoSGSN interacting with existing HLRs, we need to add
a MAP/TCAP/SS7 interface to it. Once this has been done for the SGSN, we of
course can do the same for the MSC-part that is currently integrated with
OpenBSC.
As there are existing implementations of SCTP (inside the Linux kernel) and
SUA (sualibrary), TCAP is the next step in the protocol stack that needs
to be implemented. I've been digging into TCAP for the last week(s), and
believe I finally understood every part of its operation.
You can think of TCAP as something that facilitates the transport of
request-response type transactions over a datagram oriented transport layer.
It intends to have lower overhead than a connection-oriented service (e.g.
establishing TCP sessions) and supports features such as aggregating multiple
user-messages (called components) in a single actual transport-layer
message. The idea is to reduce the overhead of message headers and routing.
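To make the aggregation idea concrete, here is a minimal sketch of packing several components of one transaction into a single datagram. The wire layout (4-byte transaction ID, 1-byte count, then invoke ID, opcode, length and payload per component) and the names `struct component` and `pack_message` are invented for illustration; real TCAP messages are ASN.1/BER encoded and look nothing like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified component; not the real TCAP encoding. */
struct component {
	uint8_t invoke_id;
	uint8_t opcode;
	uint8_t len;            /* payload length in bytes */
	const uint8_t *payload;
};

/*
 * Pack n components belonging to transaction tid into one buffer.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t pack_message(uint8_t *buf, size_t buflen, uint32_t tid,
		    const struct component *c, uint8_t n)
{
	size_t off = 0;
	uint8_t i;

	assert(buf != NULL && (c != NULL || n == 0));
	if (buflen < 5)
		return 0;
	/* One shared header for all components: this is where the saving is. */
	buf[off++] = (uint8_t)(tid >> 24);
	buf[off++] = (uint8_t)(tid >> 16);
	buf[off++] = (uint8_t)(tid >> 8);
	buf[off++] = (uint8_t)tid;
	buf[off++] = n;
	for (i = 0; i < n; i++) {
		if (buflen - off < (size_t)3 + c[i].len)
			return 0;
		buf[off++] = c[i].invoke_id;
		buf[off++] = c[i].opcode;
		buf[off++] = c[i].len;
		if (c[i].len)
			memcpy(buf + off, c[i].payload, c[i].len);
		off += c[i].len;
	}
	return off;
}
```

Two small components sent this way pay for the transaction header once instead of twice, which is the overhead reduction described above.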
TCAP is (unfortunately) specified in ASN.1 and thus requires significant
effort to parse and construct. Right now I'm using Lev Walkin's asn1c
ASN.1 C code generator to generate the parser and constructor functions. The
actual TCAP protocol logic is once again implemented in plain C, using the
various concepts and utility functions established in OpenBSC (and now part
of libosmocore).
The implementation is making good progress and I hope I can do some early testing
in about a week from now, and successively move straight to the MAP protocol,
implementing at least those parts that we need for GPRS authentication and
attach / routing area updates.
July 11, 2010 02:00 AM
July 08, 2010
The Schedule of the COSCUP 2010
conference has been posted on the conference homepage. I'm happy to see
such a large number of talks from a wide range of speakers - including many
friends from my time in Taiwan a couple of years back for Openmoko...
As it seems from this Chinese blog
entry, the organizers were overwhelmed by the number of attendee registrations,
with all 610 available seats being occupied within 85 minutes of opening the
registration. It seems they are in need of a bigger venue next year ;)
July 08, 2010 02:00 AM
July 07, 2010
Elliptics network is a quite modular distributed hash table, which allows different IO backends to be implemented and built in, enabled via config. An IO backend is a quite simple entity which just stores data to media and allows reading it back using the provided transaction ID.
BLOB IO backend is a trivial append-only array of variable-sized elements, which are stored one after another on disk. Each entry's offset is stored in a hash table in RAM, indexed by transaction ID. Currently we do not even support an on-disk ID index: to create this hash table in memory, the initialization code runs over the whole file and jumps from entry to entry. With 10 million entries stored on a single node (about 44 GB of data) this takes about 8-9 minutes to initialize, so it is likely a good idea to implement an external index.
The hash table is never swapped to disk if configured to be locked in RAM.
Thus to get an object we ask the hash table for the object's offset and read it directly from the storage (actually, send it via sendfile()). In theory it should be noticeably faster than the filesystem IO backend, where each object is stored in a separate file.
Let's see raw results for 2 SAS storages (each one contains 16 disks) and about 10 million data objects (about 30 million in total, since we have 2 additional history objects for each data one). To handle a single request we have to read two objects from disk: one history object (parse it and get the ID for the selected version) and the data (with the ID read previously). It is possible to disable versioning and get data via a single disk read, of course.
Filesystem is default ext4 on a 2.6.34 kernel. Machine has about 24 GB of RAM. Random requests.
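A minimal sketch of the index rebuild described above, using an in-memory buffer in place of the blob file. The entry header layout, the tiny fixed-size hash table and all names (`entry_hdr`, `index_build`, `index_lookup`) are assumptions made for illustration; the post does not give the real on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INDEX_SLOTS 64	/* power of two, tiny on purpose */

/* Hypothetical entry header: transaction ID plus payload size. */
struct entry_hdr {
	uint64_t id;
	uint32_t size;
};

struct index_slot {
	uint64_t id;
	size_t offset;	/* offset of the entry header inside the blob */
	int used;
};

static size_t slot_for(uint64_t id)
{
	/* Multiplicative hash; the top 6 bits select one of 64 slots. */
	return (size_t)((id * 0x9E3779B97F4A7C15ull) >> 58);
}

/* Append one entry to the blob; returns the new end offset. */
size_t blob_append(uint8_t *blob, size_t end, uint64_t id,
		   const void *data, uint32_t size)
{
	struct entry_hdr h;

	memset(&h, 0, sizeof(h));
	h.id = id;
	h.size = size;
	memcpy(blob + end, &h, sizeof(h));
	memcpy(blob + end + sizeof(h), data, size);
	return end + sizeof(h) + size;
}

/* Jump from entry to entry and record each offset; returns entry count. */
size_t index_build(const uint8_t *blob, size_t end, struct index_slot *idx)
{
	size_t off = 0, count = 0;

	memset(idx, 0, INDEX_SLOTS * sizeof(*idx));
	while (off + sizeof(struct entry_hdr) <= end) {
		struct entry_hdr h;
		size_t s;

		memcpy(&h, blob + off, sizeof(h));
		assert(count < INDEX_SLOTS);	/* sketch: table must not fill up */
		s = slot_for(h.id);
		while (idx[s].used)		/* linear probing */
			s = (s + 1) % INDEX_SLOTS;
		idx[s].id = h.id;
		idx[s].offset = off;
		idx[s].used = 1;
		off += sizeof(h) + h.size;
		count++;
	}
	return count;
}

/* Returns the payload offset for id, or (size_t)-1 if it is not indexed. */
size_t index_lookup(const struct index_slot *idx, uint64_t id)
{
	size_t s = slot_for(id), n;

	for (n = 0; n < INDEX_SLOTS && idx[s].used; n++, s = (s + 1) % INDEX_SLOTS)
		if (idx[s].id == id)
			return idx[s].offset + sizeof(struct entry_hdr);
	return (size_t)-1;
}
```

The rebuild has to touch every header in the file, which is why startup time grows with the number of entries; a lookup afterwards costs one hash probe in RAM plus one read at a known offset.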

600 rps within 200 ms, 1000 rps within 300 ms
And compare BLOB to filesystem IO backends.
Clearly blob is about 2 times faster than filesystem at the beginning (green one is blob, violet is file-per-object aka filesystem backend), but with time they become equal, likely because of filled hardware queues.
Although I play some simple tricks with read-ahead in the blob backend, I still want to test with data stored on a raw block device, thus eliminating potential FS overhead: although ext4 has extents, it may still require multiple seeks to read different blocks in a single file. The direct-IO case can be useful too.
The main problem with blobs is object removal support. While in common web scenarios we can just mark an object as removed and drop it from the index without even trying to 'squeeze' the blob file, in real life some external application (or the IO backend itself, triggered by a timeout or whatever else) should be able to compact blobs.
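One possible shape for such a compaction pass, sketched on an in-memory buffer: entries carry a removed flag, and live entries are slid toward the start of the blob. The header layout, the flag and the function names are hypothetical, and a real implementation would also have to rebuild or patch the in-RAM offset index afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ENTRY_REMOVED 1u

/* Hypothetical entry header with a flags word. */
struct entry_hdr {
	uint64_t id;
	uint32_t size;
	uint32_t flags;
};

/* Append one live entry; returns the new end offset. */
size_t blob_append(uint8_t *blob, size_t end, uint64_t id,
		   const void *data, uint32_t size)
{
	struct entry_hdr h = { id, size, 0 };

	memcpy(blob + end, &h, sizeof(h));
	memcpy(blob + end + sizeof(h), data, size);
	return end + sizeof(h) + size;
}

/* Cheap removal: only flip a flag in the header at offset off. */
void blob_mark_removed(uint8_t *blob, size_t off)
{
	struct entry_hdr h;

	memcpy(&h, blob + off, sizeof(h));
	h.flags |= ENTRY_REMOVED;
	memcpy(blob + off, &h, sizeof(h));
}

/* Squeeze out removed entries; returns the new end offset. */
size_t blob_compact(uint8_t *blob, size_t end)
{
	size_t rd = 0, wr = 0;

	while (rd + sizeof(struct entry_hdr) <= end) {
		struct entry_hdr h;
		size_t len;

		memcpy(&h, blob + rd, sizeof(h));
		len = sizeof(h) + h.size;
		assert(rd + len <= end);	/* truncated entry would be a bug */
		if (!(h.flags & ENTRY_REMOVED)) {
			if (wr != rd)
				memmove(blob + wr, blob + rd, len);
			wr += len;
		}
		rd += len;
	}
	return wr;
}
```

Marking is a single small write, while compaction rewrites everything behind the first hole, which is why it makes sense to run it rarely and out of band.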
Back to drawing board...
July 07, 2010 07:20 PM
I enjoyed Levitt & Dubner’s “Freakonomics”, and picked up the followup “Superfreakonomics” recently at an airport. The last chapter, however, was astonishing. The entire chapter was devoted to a glowing advertisement for Intellectual Ventures, pointing out that they own 20,000 patents “more than all but a few dozen companies in the world”, but of course “there is little hard evidence” that they are patent trolls.
But this bunch of wacky genius billionaires have solved global warming (much of which they dispute anyway) and can control malaria and prevent hurricanes from forming. Unlike the rest of the book which covers analysis of well-known facts and disputes them with insightful economic research, this chapter is so breathless and gushy that it makes me question the rest of the author’s work.
I first came across Intellectual Ventures when The Economist reversed their 100-year opposition to patents, and the only reason I could find was a similarly cheerleading piece about this company. (I had naively expected new research revealing some net positive of patents, or some such revelation).
Side note: when a respected information source covers something where you have on-the-ground experience, the result is often to make you wonder how much fecal matter you’ve swallowed in areas outside your own expertise.
So, what is IV actually doing? Buying up loads of patents and licensing them to companies who calculate it’s not worth the fight is patent trolling 101. Yet the scale they’re operating on puts them on new ground, and opens new opportunities. It seems obvious to get corporate investors on board by promising them immunity from patent claims. With enough patents you stop trying to license them one-by-one and just tax each industry at some non-negotiable rate. No doubt they have more tricks I haven’t even thought of, but these potential devices really do make them a new breed of Super Trolls.
Their efforts to actually attain their own patents could simply be more of the same, but it’s also a relatively cheap but clever PR exercise (as shown by their media treatment). This will help them when (legislative?) efforts are made to shut down patent trolls. I’m fairly confident that they’ll simply license rather than implement anything themselves; actually producing things requires much more work, and simply exposes you to others’ patents.
Without diving deeply into this, they seem to understand two things clearly:
- They learnt from Microsoft that government-enforced monopolies are worth billions. Microsoft had copyright on software, this is patents.
- Development is getting much cheaper, while patents are getting more valuable. Cheaper development is shown clearly by free software, open hardware and hackerspaces. Patent value increases as more of the world becomes a more profitable and enforceable patent target.
Now, I don’t really care if one company leeches off the others. But if they want to tax software, they have to attack free software otherwise people will switch to avoid their patent licensing costs. And if you don’t believe some useful pieces of free software could be effectively banned due to patent violations, you don’t think on the same scale as these guys.
July 07, 2010 10:52 AM
July 05, 2010
In case you're expecting a quick response from me these days, I apologize.
I'm currently having family visiting me in Berlin, and I very much enjoy being
the personal tourist guide for some days...
I shall be back to normal by the end of the week.
July 05, 2010 02:00 AM
July 04, 2010
This is a rather small release (actually it was made several days ago, but I postponed the announcement to allow the binding poll to stick on top) - it does not even break the library or API at all.
Instead it changes the IO server and its backends to use a config file instead of zillions of command line options.
Main purpose of this step was not to simplify deployment life actually, but instead to make a ground for the further extensions: namely automatic network topology configuration and new single-seek backend.
Currently elliptics network servers have to be properly configured before joining the cloud: they need unique IDs, which split the address space according to administration policy. But most of the time the policy is as simple as spreading IDs according to each node's disk space (the bigger the disk space, the larger the ID set it covers). And in the common case of identical nodes, IDs should be spread equally over the covered address space.
This will be made automatic to reduce configuration to network address selection only. It should also allow reconfiguring storage on demand, for example when nodes are added or removed.
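As a sketch of what "spread IDs according to disk space" could mean in code: each node gets the start of a contiguous range whose length is proportional to its disk size. The 32-bit ID space and the names `assign_ranges` and `owner_of` are simplifications chosen for this example, not the actual elliptics ID format or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Give node i the start of its range in a 2^32 ID space, sized in
 * proportion to disk_gb[i]. Node i covers [start[i], start[i+1]),
 * and the last node covers everything up to the end of the space.
 */
void assign_ranges(const uint64_t *disk_gb, size_t n, uint32_t *start)
{
	uint64_t total = 0, acc = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += disk_gb[i];
	/* Keeps (acc << 32) from overflowing 64 bits in this sketch. */
	assert(total > 0 && total < (1ull << 32));
	for (i = 0; i < n; i++) {
		start[i] = (uint32_t)((acc << 32) / total);
		acc += disk_gb[i];
	}
}

/* Index of the node whose range contains id. */
size_t owner_of(const uint32_t *start, size_t n, uint32_t id)
{
	size_t i, owner = 0;

	for (i = 1; i < n; i++)
		if (start[i] <= id)
			owner = i;
	return owner;
}
```

With disks of 1, 1 and 2 units the third node ends up covering half of the ID space, and with identical nodes the ranges come out equal, matching the two cases described above.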
As for the new IO backend, I plan to implement a rather trivial low-level storage, which will operate on huge blobs of data. In some (common) use cases it is supposed to perform only a single seek to get data by its index. The main users will be storages with lots (tens of millions) of rather small objects. Classical databases like the TokyoCabinet IO backend do not work here, since even with several million objects it starts paging out, which drops performance to the floor immediately, which is unacceptable.
The file IO backend is much worse in this scenario. For example, with about 30 million objects (tens of KB each) with versions (i.e. to get a single object we have to read from disk twice) loaded onto SAS and SATA 16-disk raid10 machines, we got the following numbers (random objects are fetched):

SAS raid10 array (2 storages of 16 disks each)
Got 800 rps within 300 ms

SATA raid10 array (2 storages of 16 disks each)
Got 200 rps within 200 ms
Which in the SATA case roughly corresponds to 20 ms per seek, and we make 2 seeks to get an object (4 seeks in total, which halves performance, since we have to read two files from the disk when versions are used). With its uber-large NCQ depth of 32 this is the end of the story: 400 rps from such a setup.
SAS was a little bit (3-4 times) faster - 800 rps within 300 ms and 600 rps within 200 ms.
Again, to handle a single request we have to read two times from the storage (each one in turn resulted in multiple seeks) - one to get version information, and another one to get object with some version itself.
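One way to read the SATA arithmetic, as a small sketch: if the queue keeps 32 requests in flight and each request needs 4 serialized seeks of about 20 ms, the ceiling is 32 / (4 × 0.020 s) = 400 requests per second. The function name and the simple model (no caching, no overlap between the seeks of one request) are assumptions for illustration.

```c
#include <assert.h>

/*
 * Upper bound on requests per second when queue_depth requests are in
 * flight and each needs seeks_per_req seeks of seek_ms milliseconds,
 * one after another.
 */
double max_rps(unsigned queue_depth, unsigned seeks_per_req, double seek_ms)
{
	assert(seeks_per_req > 0 && seek_ms > 0.0);
	return queue_depth * 1000.0 / (seeks_per_req * seek_ms);
}
```

Under the same model, halving the number of seeks per request doubles the ceiling: `max_rps(32, 2, 20.0)` gives 800.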
With the new IO backend I believe we can suddenly increase performance. By the factor of 2.
But so far it is a speculation only, let's first implement the idea...
July 04, 2010 07:39 PM
July 03, 2010
I’m obsessed with music. I can’t imagine a day without it. Regardless of what I’m doing, there’s pretty much always something playing in the background. From time to time I move my work setup from one room to another, just to shake things up, and break some habits. Recently I did this, and it involved using a different machine to usual as my desktop.
After setting up, I noticed that something just didn’t sound right with my music. All the high end frequencies sounded harsh and mashed together. The low end wasn’t anything amazing either. I tried some different speakers. It sounded even worse. At this point I thought I was going crazy, and tried some headphones (my tried and tested Sennheiser HD-280′s; what I like about these is that I’ve used them long enough that I know what to expect from them, so I know when something isn’t sounding right). Again, it sounded lifeless and dull, and high frequencies were almost painful.
What the hell was going on ? I started wondering if I could blame it on software. Maybe there was something in the driver that I could tweak. Maybe Pulseaudio was doing something wrong. I spent an afternoon looking for things to configure, going as far as disabling power management features in the hope that was the cause. In the end, I gave up. I just decided that the “High Definition Audio Controller” built into the ICH7 chipset, or some other components in the audio signal path on the motherboard was crap.
A few months ago, Chris Lee visited, and brought with him a NuForce Icon uDAC. (He also brought a pair of $1500 headphones for which he took much ridicule for being an audiophile). I got the chance to try out his setup at the time, and I admit it did sound great (even with my cheapo $99 headphones).
Remembering all this, I decided to pick up a udac, and give it a shot. As suspected, it worked perfectly. Complete plug and play experience, with no complications, and the crystal clear audio that I wanted. I can hear bass frequencies again. High frequencies are reproduced in a manner that doesn’t sound like tinnitus.
It’s weird. I used to think that the days of add-in sound cards were over with the advent of onboard motherboard sound. For as long as there exist motherboard implementations that sound this bad, I’m thankful that you can still pick up inexpensive quality solutions.
nuforce udac is a post from: codemonkey.org.uk
July 03, 2010 11:33 PM
While the Enphase Enlighten monitoring site is pretty swanky, it’s slow to load and extremely flash-heavy. It has the advantage of being able to do per-panel monitoring, event log monitoring, etc, but I was hoping for something a little more lightweight. Enter pachube.com. Let’s build out the internet of things….
The Envoy system monitor for the Enphase inverters has very basic output monitoring abilities; it shows you current power, and daily, weekly, and lifetime energy production. So, we can screen-scrape this and upload it to pachube, then do what we like with the data.
I have this script on a 10-minute* cron job to get the data. The Envoy doesn’t seem to update faster than 5 minutes, and it’s such a gutless wonder, doing it any more often than that brings it to its knees, and it stops updating the main site! If you have problems, you may want to reduce the updates to 15m or more.
To use the script, first get a Pachube API key, and set up a new Pachube feed. Add 4 datastreams, for instantaneous power, daily production, weekly production, and lifetime production, in that order. Edit the script to add your envoy hostname/IP, your API key, and your feed ID. Then put the script on a 10-minute cron job. You’ll start seeing the data on a Pachube feed page like this. Then you can use some of the apps highlighted on apps.pachube.com to create widgets as in the image above, as seen on this page. You can even get an iPhone app to monitor the data, or create an OSX dashboard widget from the HTML objects!
*Edit: Don’t set the cron job to be more frequent than 10 minutes. I’ve had trouble with the Envoy unit bogging down and not reporting to Enlighten if you hit it more than every 10 minutes (!)
July 03, 2010 02:50 PM
So to follow on from my posting stating my position wrt kernel drivers for closed source userspace drivers, let's take a look at the embedded GPU industry and Linux kernel relationship.
What does the embedded industry get from Linux?
They get a kernel which is royalty free, with 1000s of man-years of development experience and resources. Before Linux these vendors either sourced an OS on a royalty basis from some closed-shop, or rolled their own in-house one.
Now people might say "but the embedded GPU industry has to support Windows as well", but take one look at NVIDIA Tegra One and you can see the embedded windows marketplace is less than important, NVIDIA Tegra Two is all about the Linux, whereas they were pretty much only talking to MS on Tegra one.
So Linux is a great boon for this industry, and means they can produce higher quality products for a lower cost (or lower quality products at a lower cost in some cases). So really there are probably two games in town for these embedded vendors, selling into Apple or selling into Linux centric developments, like Android, Meego, Linaro.
So what are they actually hiding in userspace?
The main thing they seem to be hiding is shader compilers and their GPU assembler code, things that convert from GLES into the assembler code for their GPUs. This stuff isn't rocket science but it probably is where most of their speed up and tricks are hidden.
So why do they think it valuable?
I think all 3D IP vendors dream of becoming Imagination Technologies, they need to learn there is already one Imagination Technologies and the only way to easily disrupt their revenue stream and sell into other SOCs is to be disruptive, not just follow the herd. They also probably had to spend a lot of money writing a decent GPU compiler from scratch, whereas most embedded firmware is a lot more trivial, so they probably think they need to directly recoup the costs from this development instead of giving it away. The thing is they are hw vendors, the sw is a sunk cost, opening it would actually make future maintenance easier. HW companies never do well at SW and they would be best to just open it and try and involve some community development around it.
Is this IP more valuable than what they receive from Linux?
This is the crux of my issue with these vendors, they are receiving the Linux kernel for free, but don't want to contribute anything back. They know they can't sell into anywhere else except Linux driven products, but they insist on keeping their development methodologies from the days of Windows and their own in-house OSes. Those days are gone, but they cling to the idea that for some reason they can produce a better GPU stack on their own than they could in collaboration with others, despite the fact that the kernel that forms the basis for their sales was developed in this fashion. They also all use gcc as the compiler for their CPUs, again proving the insanity.
Isn't it up to them what they do?
Totally, but it's also up to the Linux community to push back against them. The thing is they'd never have opened any code if it wasn't for the GPL making them at least open the kernel portions, they don't care about freedom or GPL, they care about their bottom line, and doing the least amount of work to remain legal and make money. Now they are getting all this wonderful software for free, Linux phone sales are driving their bottom line, but they still don't want to play the game by the rules of the kernel. They want to have their cake and eat it too. (the cake is a lie). Hence they spend their time creating their own solutions in private, releasing what they have to comply with legalese but never actually allowing people the freedom to use their devices.
So shouldn't we give a little?
The thing is two major vendors have been pushing Imagination Technologies for years to open something, these guys are aiming to sell thousands->millions of devices, we have gotten the ugliest kernel shim in the world in 4 years of trying. All the other vendors are only willing to give that little. I don't personally think any of them want to open this stuff and will hide behind IP excuses for ever.
What will make them change their minds?
a) money and lots of it. If Google or OLPC can demand open driver commitments (in contracts, not handwaving agreements) then I suspect these vendors will quickly realise the value of their IP is dwarfed by the value of sales. This probably means a major chance for one of the vendors to control a lot of the space in the Linux world.
b) disruptive vendor, one vendor realises before the others that opening their IP will lead to more sales than keeping it closed and also lead to the chance of more people optimising their technology and leveraging other work in the industry.
So are you saying they should drop all their in-house developed solutions?
No I'm saying that the driver for their hardware is a single entity, and if the whole entity isn't open, then none of it is truly open. So if they don't want to release an open userspace, then they don't get to merge their open kernel bits to support the closed userspace. We have to keep the maintenance burden on them, so it keeps costing them money to track newer kernels, and they don't get community support from other vendors who have committed to doing things right.
So why should they re-write drivers?
This happens in Linux the whole time, with nearly every new technology. Wireless, RAID, SATA for example, all have had vendors trying to push complete stacks of their own writing, you'll notice over time the drivers that are actually written to the current stacks work best, and the crazy vendors' drivers are often horror shows.
What would be nice to happen?
It would be great if there was a hero with time/funding and involvement in the ARM GPU community to take over being maintainer of these solutions, from kernel all the way to userspace. Vendor driver writers could ask this person for advice, and they could have some sort of working group where they develop a stack based around current Linux technologies, like GEM/TTM/DRI2/Mesa/Gallium3D. If you take a look at the mesa stack lately, there has been a lot of work on making it work as an EGL/GLES stack as well as a classic GL stack. Then vendors would supply open drivers compliant with this stack, and just sell lots of chips.
What would be the most likely negative outcome?
We get what we have now, they maintain the 5-6 GPU stacks in their own world, and never talk to each other, and it costs them more and more money going forward to maintain. Some hero reverse engineers one or two of the GPU architectures, maybe some hero writes an open driver stack from docs under NDA or with open docs.
I may update this post as I have more thoughts ;-)
July 03, 2010 01:28 AM
The Internet is made of tubes, right, and there are two kinds of tubes that feed into your living room walls: phone tubes, and cable tubes. My experience with phone Internet tubes is that if you are very patient and pay the phone company to fix their own broken wires, you might get Internet sometimes. Hence, I’ve had cable Internet tubes from Comcast for the last few years.
However, Comcast is not the local cable tube monopoly here in Tucson, Cox Communications is. I went to the Cox web site and thus began my Twilight Zone excursion into a world of cheap convenient Internet service providers. I’m getting 15 Mbps for half what I paid Comcast! The call center guy didn’t try to sell me phone service! I could use my Comcast cable modem! They reprogrammed my modem remotely!
But here’s the best part: This morning, I heard a cat meowing desperately and ran outside in time to see a man climbing down the tree in the courtyard with a kitten clinging to his shoulders. As the kitten jumped down, I realized that the rescuer was wearing a Cox uniform. “Did you just turn on the cable for apartment <mumble>?” “Yes, well, actually, the Internet.” I barely had time to thank him and note his name tag – Michael? – before he dashed off to his truck, no doubt late to turn on someone’s HBO and rescue a drowning puppy.
So there you have it. Cox Communications is the best cable company in the world – they give you fast cheap Internet service AND they save kittens.
July 03, 2010 12:21 AM
July 02, 2010
In the previous posting a code fragment dealt with memory ordering. Strangely enough, the foo_1() function contained only a pair of reads, but separated them with a full smp_mb() barrier. So why not use an smp_rmb() instead? Perhaps something like the following:
int x, y; /* shared variables */
int r1, r2, r3; /* private variables */
void foo_0(void)
{
	ACCESS_ONCE(x) = 1;
}
void foo_1(void)
{
	r1 = x;
	smp_rmb(); /* The only change. */
	r2 = y;
}
void foo_2(void)
{
	y = 1;
	smp_mb();
	r3 = x;
}
After these three functions complete, we have an assertion. Please note that by “complete” I mean that all effects of the functions have become globally visible. One way to ensure this level of completion is for the thread that spawned foo_0(), foo_1(), and foo_2() to do pthread_join() on each of them in turn, and only then execute the following assertion:
assert(!(r1 == 1 && r2 == 0 && r3 == 0));
Can this assertion ever trigger?
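For readers who want to experiment, here is a rough userspace harness for this kind of test, spawning the three functions as threads and joining them before looking at the results. It rests on assumptions: relaxed C11 atomics stand in for ACCESS_ONCE(), an acquire fence stands in for smp_rmb(), and a seq_cst fence stands in for smp_mb(); none of these is an exact equivalent of the kernel primitives. The name `run_trial` is made up for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;		/* shared variables */
static int r1, r2, r3;		/* each written by one thread, read after join */

static void *foo_0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	return NULL;
}

static void *foo_1(void *arg)
{
	(void)arg;
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);	/* assumed stand-in for smp_rmb() */
	r2 = atomic_load_explicit(&y, memory_order_relaxed);
	return NULL;
}

static void *foo_2(void *arg)
{
	(void)arg;
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* assumed stand-in for smp_mb() */
	r3 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/* Run one trial; returns 1 if r1 == 1 && r2 == 0 && r3 == 0 was observed. */
int run_trial(void)
{
	void *(*fn[3])(void *) = { foo_0, foo_1, foo_2 };
	pthread_t t[3];
	int i;

	atomic_store(&x, 0);
	atomic_store(&y, 0);
	r1 = r2 = r3 = -1;
	for (i = 0; i < 3; i++) {
		int rc = pthread_create(&t[i], NULL, fn[i], NULL);
		assert(rc == 0);
		(void)rc;
	}
	/* Joining every thread is what makes all their effects visible here. */
	for (i = 0; i < 3; i++)
		pthread_join(t[i], NULL);
	return r1 == 1 && r2 == 0 && r3 == 0;
}
```

Calling `run_trial()` in a loop and counting hits gives a feel for how rare the interesting interleavings are, since thread start-up time usually dwarfs the race window. A count of zero on one machine says nothing about other architectures or about what the memory model permits.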
July 02, 2010 04:22 AM
Through the last couple of days, I've been in extreme bug-squashing mode for
the GPRS/EDGE code base in OpenBSC (mostly the OsmoSGSN program). I'm now
at a point where I can reliably establish PDP contexts and access the Internet
from a variety of different phones with different baseband chipsets and
GPRS protocol stack implementations. All so-far-known bugs regarding
fragmentation/reassembly, sequence numbering and other issues have been
fixed. There definitely are plenty more, but we first need to find them.
Since it's working reliably now, it's quite fascinating what the various
phones do after connecting to the GPRS network. Like Windows Mobile phones
sending Netbios Name Service updates (and requests), which I think is funny
considering that they are sent to a network that is typically considered
to be the public Internet.
But to be fair and not anti-Windows, my Google/Android G1 also makes some https
connections back to Google - and I don't know what they are for [yet].
In any case, with OpenBSC, OsmoSGSN and OpenGGSN anyone interested in doing
true security (and privacy) research with mobile phones is now able to do so.
Using those programs, you can run your own GPRS+EDGE network and can see
first hand what your phones are doing on a cellular network, what kind of
data they are sending back home. In this setup, there is no packet filtering,
NAT, deep packet inspection and no intrusion detection systems between your PC
and the IP stack on your phone.
July 02, 2010 02:00 AM
July 01, 2010
[I posted this to lkml earlier - discussion should happen there not in comments here, but it's nice to have somewhere easy to point people at].
Now this is just my opinion as maintainer of the drm, and doesn't
reflect anyone or any official policy, I've also no idea if Linus
agrees or not.
We are going to start to see a number of companies in the embedded
space submitting 3D drivers for mobile devices to the kernel. I'd like
to clarify my position once so they don't all come asking the same
questions.
If you aren't going to create an open userspace driver (either MIT or
LGPL) then don't waste time submitting a kernel driver to me.
My reasons are as follows, the thing is you can probably excuse some
of these on a point by point basis, but you need to justify why closed
userspace on all points.
a) licensing, Alan Cox pointed this out before, if you wrote a GPL
kernel driver, then wrote a closed userspace on top, you open up a
whole world of derived work issues. Can the userspace operate on a
non-GPL kernel without major modifications etc. This is a can of worms
I'd rather not enter into, and there are a few workarounds.
b) verifying the sanity of the userspace API.
1. Security: GPUs can do a lot of damage if left at home alone, since
mostly you are submitting command streams unverified into the GPU and
won't tell us what they mean, there is little way we can work out if
the GPU is going to over-write my passwd file to get 5 fps more in
quake. Now newer GPUs have at least started having MMUs, but again
we've no idea if that is the only way they work without docs or a lot
of trust.
2. General API suitability and versioning. How do we check that API is
sane wrt to userspace, if we can't verify the userspace. What happens
if the API has lots of 32/64 compat issues or things like that, and
when we fix them the binary userspace breaks? How do we know, how do
we test etc. What happens if a security issue forces us to break the
userspace API? how do we fix the userspace driver and test to confirm?
c) supplying docs in lieu of an open userspace
If you were to fully document the GPU so we could verify the
security/api aspects it leaves us in the position of writing our own
driver. Now writing that driver on top of the current kernel driver
would probably limit any innovation, and most people would want to
write a new kernel driver from scratch. Now we end up with two drivers
fighting, how do we pick which one to load at boot? can we ever do a
generic distro kernel for that device (assuming ARM ever solves that
issue).
I've also noticed a trend to just reinvent the whole wheel instead of
writing a drm/kms driver and having that as the API, again maintainer
nightmares are made of this.
d) you are placing the maintenance burden in the wrong place
So you've upstreamed the kernel bits, kept the good userspace bits to
yourselves, are stroking them on your lap like some sort of Dr Evil,
now why should the upstream kernel maintainers take the burden when
you won't actually give them the stuff to really make their hardware
work? This goes for nvidia type situations as well, the whole point is
to place the maintainer burden at the feet of the people causing the
problems in an effort to make them change. Allowing even an hour of
that burden to be transferred upstream, means more profit for them,
but nothing in return for us.
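To illustrate the 32/64 compat concern from point b) 2 above, here is a sketch of an ioctl-style argument struct done two ways. The struct names, fields and flag mask are hypothetical and not taken from any real driver; the point is only that pointers and longs change size between 32-bit and 64-bit userspace, while fixed-width fields do not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fragile: size and offsets differ between 32-bit and 64-bit userspace. */
struct submit_fragile {
	void *cmds;
	unsigned long len;
	int flags;
};

/* Fixed layout: identical for 32-bit and 64-bit callers. */
struct submit_fixed {
	uint64_t cmds_ptr;	/* user pointer carried as a 64-bit integer */
	uint32_t len;
	uint32_t flags;
};

_Static_assert(sizeof(struct submit_fixed) == 16, "layout must not vary");
_Static_assert(offsetof(struct submit_fixed, len) == 8, "layout must not vary");

#define SUBMIT_KNOWN_FLAGS 0x3u	/* hypothetical: bits defined so far */

static inline uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

static inline void *u64_to_ptr(uint64_t v)
{
	return (void *)(uintptr_t)v;
}

/* Reject unknown flag bits so later extensions can be told apart. */
int submit_valid(const struct submit_fixed *s)
{
	assert(s != NULL);
	return (s->flags & ~SUBMIT_KNOWN_FLAGS) == 0;
}
```

With an open userspace, a change like moving from the first struct to the second can be made and tested on both sides at once; with a binary-only userspace, nobody upstream can check whether it still works.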
July 01, 2010 11:42 PM
I’ve been following the recent news story about the Russian spy ring that was busted in the US. In particular, I found this blog has some fascinating info on how they were allegedly operating. The bit about a truck pulling up with an ad-hoc wireless network for the spy to connect to intrigued me. It seems like an obvious thing to do reading about it, but it made me realize, I don’t think I’ve ever actually used ad-hoc networking (for spy related activities or otherwise).
Russian spies. Ad-hoc networks. is a post from: codemonkey.org.uk
July 01, 2010 11:08 PM
A hilarious article at LWN reminded me about the value of not being a sore loser. I was very fortunate to receive my own lesson very early on (in 1.3 or early 2.0 days IIRC), when DaveM rejected my console code in favor of Geert's. I liked that console because it was capable of effectively rendering characters of widths other than 8 pixels. It may have been a sizeable corpus of work for me, but it was back when Linux was a bubbling soup of fun, and not important enough for me to go on bloviating at conferences in front of the gullible. So we closed the book and went on hacking something even better (ok, in my case it was floppy.c, but you know what I mean).
P.S. For the love of god, read Rusty Russell's Mortal Kombat Model of Software Development (link).
July 01, 2010 03:37 PM
June 29, 2010
It turns out that it's actually really, really easy to set up an l2tp tunnel. You just need to install xl2tpd, configure some address ranges and then add an authentication entry to chap-secrets. It's just that the entire known universe appears to be more interested in using ipsec as well, and that looks worse than setting up Kerberos and I've already done that enough in my life thanks. I don't care about my connection being encrypted (I've got encrypted protocols for that), so this seems to be an entirely reasonable solution.
June 29, 2010 06:41 PM
Searching for information on setting up an L2TP VPN takes me here, where I get to choose between OpenSWAN, KAME and some OpenBSD port. Searching for information on setting up a PPTP VPN takes me here, where I'm told exactly what I need to do.
Given choices, I chose the one that reduced my choices. THERE IS A LESSON HERE.
(Sadly, I'm now going to have to deal with L2TP anyway because something in the intermediate network is dropping GRE)
June 29, 2010 05:37 PM
I found this article interesting. The power management slides could be retitled “what Linux has done in the last three years”. So Windows 7 didn’t ship with a dynamic timer tick? Surprising. Linux isn’t perfect when it comes to power management, but we’re a lot better than we were, and it’s interesting to see Windows now planning on using some of the same innovations.
The ‘fast startup’ slides are interesting too. The ‘logoff & hibernate’ is the only real ‘new’ feature there, afaics. It’s interesting to see terms used that have become passé in Linux, like ‘cache prefetching’ and ‘parallel startup’.
interesting windows 8 leaked info. is a post from: codemonkey.org.uk
June 29, 2010 04:25 AM
This posting relates to memory barriers in the Linux kernel. We start with smp_mb(), which is a full memory barrier when the kernel is built with CONFIG_SMP=y and is otherwise a compiler barrier that constrains compiler optimizations, but which generates no code.
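For readers who have not looked at the definitions, the following is a minimal userspace sketch of roughly how these primitives are put together. It is an approximation, not the kernel's own headers: mb() is stood in for by the GCC/Clang builtin __sync_synchronize() (an assumption, the kernel uses per-architecture instructions), and store_then_load() is a hypothetical helper that exists only to exercise the macros.

```c
/* Userspace approximation of the kernel macros; not the kernel's headers. */

/* Compiler barrier: emits no instructions, but prevents the compiler from
 * reordering or caching memory accesses across this point. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Stand-in for the kernel's mb(), a full hardware fence. Assumption: the
 * GCC/Clang builtin replaces the per-architecture fence instruction. */
#define mb() __sync_synchronize()

/* CONFIG_SMP is normally set by the kernel build system. */
#ifdef CONFIG_SMP
#define smp_mb() mb()		/* SMP build: real memory barrier */
#else
#define smp_mb() barrier()	/* UP build: compiler barrier only */
#endif

/* Force exactly one load or store of x through a volatile access. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical helper: store, order, then load back (single-threaded). */
static int store_then_load(int *p, int v)
{
	ACCESS_ONCE(*p) = v;
	smp_mb();
	return ACCESS_ONCE(*p);
}
```

Building with -DCONFIG_SMP selects the fence, and building without it leaves only the compiler barrier, which matches the description above.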
Consider the following code fragment, where each function foo_n() runs on CPU n, all concurrently:
int x, y; /* shared variables */
int r1, r2, r3; /* semi-private variables */

void foo_0(void)
{
	ACCESS_ONCE(x) = 1;
}

void foo_1(void)
{
	r1 = ACCESS_ONCE(x);
	smp_mb();
	r2 = ACCESS_ONCE(y);
}

void foo_2(void)
{
	ACCESS_ONCE(y) = 1;
	smp_mb();
	r3 = ACCESS_ONCE(x);
}
Now suppose that the following assertion runs after all of the preceding functions complete.
assert(!(r1 == 1 && r2 == 0 && r3 == 0));
Can this assertion ever trigger? Why or why not?
June 29, 2010 03:44 AM
June 28, 2010
Dear Evgeniy,
Congratulations! The program committee has finished their work and is glad to tell you that your submission for a /Refereed Paper/ with the title
"Elliptics network - a distributed hash table, design and implementation"
was accepted!
Wanna chat? I will give a small presentation at Linux Kongress this year. Back to the writing table now... it's time to write some bits.
June 28, 2010 10:52 PM

I got the meters installed today. It took about 2 minutes of work, after 2 weeks of waiting.
The meter on the left is the “production” meter which measures how much power the panels have made over their lifetime, period. It’s so the utility knows what they got for their rebate money. (Oddly, it’s the exact same Centron C1SC meter I had previously, for usage measurements!) The meter on the right is the net meter. Today it’s running backwards, even though it’s cloudy, because the house is pretty much at base load.
All that’s left is the anti-islanding inspection*, but my installer tells me the utility is OK with having them on prior, so we’re up and running! On a cloudy day, of course.
I don’t yet have the official Enlighten URL for the array, but my homebrew monitoring is here on pachube.com.
*And one concern about the physical install of the panels & cables, which might possibly require removal and re-fastening of the panels, which would just be awful at this point.
Edit: The official monitoring site is now active.
June 28, 2010 07:01 PM
During my work on airprobe and OsmocomBB I've been wondering why you see
paging by IMSI in real-world GSM networks.
A quick recap: The IMSI is the world-wide unique serial number of your SIM.
Since the IMSI makes it easy to identify and track people, the TMSI was
introduced as a temporary identifier that is frequently re-allocated over
encrypted channels. The only reason for the TMSI to exist is to prevent
tracking of a subscriber by watching where his IMSI appears on the paging channel.
According to the theory, the IMSI is only used when first registering to any
GSM network. At that time, a TMSI is allocated to the SIM card in the phone,
and this TMSI is used for the next transaction(s). Later, this TMSI is
re-allocated and re-allocated, but the IMSI shouldn't show up again in any
paging requests.
Even if you switch mobile networks (i.e. in the roaming case), you would
send the IMSI once as part of a LOCATION UPDATE REQUEST or IDENTITY RESPONSE,
but the network has no need to page the SIM by IMSI.
So much for the theory. If you look at the Paging Channel (PCH) of cells in
real-world networks, you see that a significant share (10-20%) of paging requests
contain paging by IMSI. This seems strange at first sight, given the
theory described above.
I have the following plausible explanations for this:
- The VLR keeping the IMSI-TMSI mappings doesn't have non-volatile storage. This
  means that at a VLR restart, all the TMSI allocations will be lost, and the network
  has to resort to paging by IMSI.
- The VLR has a limited amount of RAM, which can store a limited number of IMSI-TMSI
  mappings. Especially if the operator is interested in saving money, the amount of
  memory is insufficient for all subscribers in the network. This means the VLR
  will expire some old entries in the mapping table to store new entries. Thus,
  mobile phones whose last transaction with the GSM network was relatively long ago
  are likely candidates for such VLR expiration. Once a phone for an expired entry
  needs to be paged again, paging will happen by IMSI.
- Last, but not least: GSM networks do not page a phone by the last known cell, but
  by the last known location area of the phone. A location area might be relatively
  big. This means that at any cell you will see a lot of paging messages, even for
  phones that are nowhere near this cell. If there is no response within
  the location area, the MSC might decide to do paging on a larger radius, possibly
  the entire MSC area. Since such MSC-wide paging is likely to occur for phones
  that haven't shown activity for a long time (and thus might have moved or
  disappeared without properly unregistering from the network), those are the exact
  same phones for which the IMSI-TMSI mappings have expired from the VLR. Thus,
  the rate of paging-by-IMSI looks disproportionately high.
So the relatively high percentage of paging by IMSI vs. TMSI should not be
taken as a measurement with regard to the total number of transactions or even
the total number of subscribers. It is simply the mechanics of the network
resulting in a distortion of those figures caused by phones that have never
properly unregistered from the network.
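To make the second explanation a bit more concrete, here is a toy model of a
VLR with a fixed number of mapping slots. It is only a sketch under my own
assumptions, not how any real VLR is implemented: the names (vlr_transaction,
vlr_page), the two-slot table and the oldest-transaction eviction policy are
all hypothetical, and for brevity the TMSI is not re-allocated on every
transaction as it would be in a real network.

```c
#include <stdint.h>
#include <string.h>

#define VLR_SLOTS 2	/* deliberately tiny so eviction is easy to see */
#define IMSI_LEN  16	/* up to 15 digits plus the terminating NUL */

enum page_id { PAGE_BY_TMSI, PAGE_BY_IMSI };

struct vlr_entry {
	char     imsi[IMSI_LEN];
	uint32_t tmsi;
	uint64_t last_seen;	/* logical time of the last transaction */
	int      used;
};

struct vlr {
	struct vlr_entry slot[VLR_SLOTS];
	uint64_t clock;
	uint32_t next_tmsi;
};

static void vlr_init(struct vlr *v)
{
	memset(v, 0, sizeof(*v));
	v->next_tmsi = 1;
}

static struct vlr_entry *vlr_find(struct vlr *v, const char *imsi)
{
	for (int i = 0; i < VLR_SLOTS; i++)
		if (v->slot[i].used && strcmp(v->slot[i].imsi, imsi) == 0)
			return &v->slot[i];
	return NULL;
}

/* A transaction (e.g. a location update) refreshes or creates the mapping.
 * If the table is full, the entry with the oldest transaction is expired. */
static uint32_t vlr_transaction(struct vlr *v, const char *imsi)
{
	struct vlr_entry *e = vlr_find(v, imsi);

	if (!e) {
		e = &v->slot[0];
		for (int i = 0; i < VLR_SLOTS; i++) {
			if (!v->slot[i].used) {
				e = &v->slot[i];
				break;
			}
			if (v->slot[i].last_seen < e->last_seen)
				e = &v->slot[i];
		}
		strncpy(e->imsi, imsi, IMSI_LEN - 1);
		e->imsi[IMSI_LEN - 1] = '\0';
		e->tmsi = v->next_tmsi++;
		e->used = 1;
	}
	e->last_seen = ++v->clock;
	return e->tmsi;
}

/* Decide how to page: by TMSI if a mapping survives, otherwise by IMSI. */
static enum page_id vlr_page(struct vlr *v, const char *imsi,
			     uint32_t *tmsi_out)
{
	struct vlr_entry *e = vlr_find(v, imsi);

	if (e) {
		*tmsi_out = e->tmsi;
		return PAGE_BY_TMSI;
	}
	return PAGE_BY_IMSI;
}
```

With two slots, a third subscriber's transaction expires whichever phone has
been quiet the longest, and the next page for that phone falls back to the
IMSI, which is exactly the population the third explanation says gets paged
over the widest area.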
June 28, 2010 02:00 AM
June 27, 2010
I've just returned from the First OpenBTS workshop held by David Burgess and hosted by Dieter Spaar in south-east Bavaria (Germany). While I'm not involved with OpenBTS so far (except for using it occasionally), I still thought the community surrounding Free Software / Open Source in the GSM field is small enough to make me participate.
At the request of the participants, I also did a short demonstration of both
OpenBSC and OsmocomBB. And just like I managed to crash OpenBTS by
accidentally sending invalid messages, my OpenBSC demo crashed at some point
due to a not-yet-known bug regarding SMS delivery. I suppose the intrusive
changes of the BSC/MSC split are to be blamed for that. But I don't mind,
we need that split...
I definitely had a great time meeting the participants of the workshop. There
definitely is a very diverse crowd with equally diverse reasons for their
interest in using and/or deploying OpenBTS.
Finally, there was a chance to discuss the need for a common 'application interface'
in both OpenBSC and OpenBTS. Using that interface, external applications (e.g.
implementing USSD or RRLP) could be written in a way to work with both OpenBTS
and OpenBSC. I hope we can get started on this soon and remove another bit of
fragmentation in what is already a fairly small special interest community...
Given the excellent weather conditions, the motorbike ride to and from the
venue went fine, despite it being 650 km from my home.
June 27, 2010 02:00 AM
June 26, 2010
I've uploaded man-pages-3.25 into the release directory (or view the online pages). The most notable changes in man-pages-3.25 are the following:
- A new migrate_pages(2) manual page, written by Andi Kleen, documenting the migrate_pages() system call (added to Linux back in kernel 2.6.16).
- A major update of the quotactl(2) manual page. This update incorporates material from the version of this page (mostly written by Jan Kara) that was in the quota-tools package, and also adds new material by me. The quotactl(2) manual page that was in quota-tools has been dropped from that package, so that there is now a single canonical quotactl(2) page--the one in man-pages.
- The mkstemp(3) manual page adds descriptions of the mkstemps() and mkostemps() library functions, which were added to glibc in version 2.11.
- The fcntl(2) man page adds descriptions of the F_SETPIPE_SZ and F_GETPIPE_SZ operations, which are new in Linux 2.6.35.
- The madvise(2) manual page adds descriptions of the following operations: MADV_HWPOISON (new in kernel 2.6.32), MADV_MERGEABLE and MADV_UNMERGEABLE (new in Linux 2.6.32), and MADV_SOFT_OFFLINE (new in Linux 2.6.33).
- The prctl(2) manual page adds descriptions of the hardware poison operations (PR_MCE_KILL and PR_MCE_KILL_GET) added in kernel 2.6.32.
- The sched_setscheduler(2) manual page adds a description of the SCHED_RESET_ON_FORK flag, which was new in kernel 2.6.32.
- The umount(2) manual page adds a description of UMOUNT_NOFOLLOW (new in kernel 2.6.34).
- The socket(7) manual page adds descriptions of the read-only socket operations SO_DOMAIN and SO_PROTOCOL, both new in kernel 2.6.32.
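As a small illustration of one of the newly documented interfaces, here is a sketch that uses mkstemps() to create a temporary file whose name keeps a fixed suffix. The helper name make_temp_txt() and the /tmp/demoXXXXXX.txt template are hypothetical, chosen only for this example; the feature test macro shown is the one used by current glibc (older versions required _BSD_SOURCE or _SVID_SOURCE).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE	/* exposes mkstemps() in glibc 2.19 and later */
#endif
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a uniquely named file ending in ".txt".
 * On success the generated pathname is left in path and an open file
 * descriptor is returned; on failure the result is -1. */
static int make_temp_txt(char *path, size_t len)
{
	static const char tmpl[] = "/tmp/demoXXXXXX.txt";

	if (len < sizeof(tmpl))
		return -1;
	memcpy(path, tmpl, sizeof(tmpl));

	/* The second argument is the suffix length: the 4 characters of
	 * ".txt" that follow the XXXXXX placeholder. */
	return mkstemps(path, 4);
}
```

The caller is responsible for close() and unlink() once the file is no longer needed.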
June 26, 2010 04:59 PM
June 25, 2010
I’m leaving for Tucson in three days! Apartment rented, movers scheduled, etc. Still no time to write something thoughtful and subtle, as I’m trying to get another release of union mounts out today before I go on two weeks of vacation.
I’m going to miss San Francisco, but I’ll miss my friends even more. Thanks to everyone who made my 8 years in California so great.

June 25, 2010 06:17 PM
The former is actually quite a religious question, but there is still a fair amount of technical background to talk about.
The main advantage of a dedicated metadata server is its incredible flexibility and control. To determine where a given object lives, we ask a special server, which can do whatever we want to produce the answer: the system can check permissions, locate the least loaded server, update centralized statistics, or contact an oracle and notify external entities.
Thus the metadata server becomes a complex database that is hard to replicate and, in general, to keep consistent with its copies. And we do need additional copies, since every server fails, and a dedicated metadata server fails too. In some practical scenarios they fail even more frequently than storage servers, although, quite to the contrary, I have an example where they never failed during several years of maintenance and everyday access.
But no matter what, we generally want to replicate the metadata server, preferably in master-master mode, to eliminate the single point of failure. To date I do not know of a production-quality master-master replication solution in either the free or the proprietary world. And by production I mean hundreds of millions of records, with millions of records updated or created per day, across physically separated datacenters with a flaky link between them.
A very elegant solution for metadata servers is ... their complete absence. Distributed hash storage is one such design. But it only solves the access problem: the client can determine the needed server itself, but we still have to implement access control, statistics and notifications somewhere. If we put more complex logic into the storage, the inflexibility caused by the absence of a central control point becomes even more visible.
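To illustrate what "the client can determine the needed server itself" means, here is a minimal sketch of a client-side lookup on a hash ring. This is a generic illustration under my own assumptions, not the scheme any particular storage uses: the FNV-1a hash, the sorted array of node IDs and the names dht_owner() and dht_locate() are all chosen just for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of a NUL-terminated object name. */
static uint32_t fnv1a(const char *s)
{
	uint32_t h = 2166136261u;

	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h;
}

/* ring[] holds node IDs sorted in ascending order. The owner of a key is
 * the first node whose ID is >= the key, wrapping around to node 0. */
static size_t dht_owner(const uint32_t *ring, size_t n, uint32_t key)
{
	for (size_t i = 0; i < n; i++)
		if (ring[i] >= key)
			return i;
	return 0;
}

/* The whole "lookup": hash the name locally, no metadata server involved. */
static size_t dht_locate(const uint32_t *ring, size_t n, const char *name)
{
	return dht_owner(ring, n, fnv1a(name));
}
```

Every client that knows the same list of node IDs computes the same answer, which is why no central server has to be asked. It is also why there is no single place left to hook in permission checks or statistics.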
One such problem is caching. When an object is popular, we want to put it on faster media to satisfy the increased access rate. For example, we can create multiple copies and distribute clients between them.
With a metadata server this is a trivial case: we just update the appropriate database record, so that a connecting client gets a random (or preferred, it doesn't matter) path to the data object. The more popular the content is, the more copies we put into the storage and record in the metadata database.
With distributed storage that has no central metadata server, we have to perform a full lookup to determine whether a given object is present in the storage or not. That means we have to contact a remote server and try to read some data from its media. This slows things down noticeably, especially for unpopular content which does not have hundreds of cached copies.
A simple solution is to move the control entity from the storage to higher levels. In the case of a cache, this means some external storage which contacts the low-level one only when the requested objects are not in the cache. Depending on the cache implementation, this can be a very cheap price to pay for the access problems.
Thus we build a layered system where the DHT is the low-level storage.
Actually this solution will also work for a metadata server, except that for some workloads classical LRU caches do not work. Thus squid or the page cache should not be used for them, and the metadata server solution wins again.
But what wins, and what is really needed, and not only for distributed metadata-free storage, is a cache with content weights: the more frequently a given object is requested, the longer it lives in the cache, even if it is not being requested right now. Last access time does not work in this case.
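A minimal sketch of such a weighted cache follows. It is essentially a plain LFU-style table and only an illustration under my own assumptions, not a design for a real system: the names wcache_access() and struct wentry, the two-slot size and the rule "weight is the hit count" are all hypothetical, and keys are assumed to be shorter than KEY_LEN.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 2	/* tiny on purpose, to make eviction visible */
#define KEY_LEN     32

struct wentry {
	char     key[KEY_LEN];
	uint64_t weight;	/* how many times the object was requested */
	int      used;
};

struct wcache {
	struct wentry slot[CACHE_SLOTS];
};

static void wcache_init(struct wcache *c)
{
	memset(c, 0, sizeof(*c));
}

/* Returns 1 on a cache hit, 0 on a miss (the key is then inserted).
 * On a miss with a full cache, the entry with the lowest weight is
 * evicted, regardless of when it was last accessed. */
static int wcache_access(struct wcache *c, const char *key)
{
	struct wentry *victim = NULL;

	for (int i = 0; i < CACHE_SLOTS; i++) {
		struct wentry *e = &c->slot[i];

		if (e->used && strcmp(e->key, key) == 0) {
			e->weight++;
			return 1;
		}
	}

	for (int i = 0; i < CACHE_SLOTS; i++) {
		struct wentry *e = &c->slot[i];

		if (!e->used) {
			victim = e;
			break;
		}
		if (!victim || e->weight < victim->weight)
			victim = e;
	}

	strncpy(victim->key, key, KEY_LEN - 1);
	victim->key[KEY_LEN - 1] = '\0';
	victim->weight = 1;
	victim->used = 1;
	return 0;
}
```

If "hot" is requested three times, then "cold" once, then "new", an LRU cache would throw away "hot" because it was touched least recently. Here "cold" goes instead, because its weight is lower. A real implementation would also need to age the weights so that formerly popular objects eventually leave the cache.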
To my shame I do not know of such cache systems: the very popular memcached and squid do not support this, IIRC. And they do not allow cache content to be distributed among multiple nodes. Memcached actually has quite nice frontends, which can form a DHT, but pure memcached still lacks some features.
The plan has plotted itself...
June 25, 2010 05:46 PM
The GNOME Foundation released their conference speaker guidelines today. This is an important step not just in helping speakers know what's acceptable, but also in helping audience members understand in advance what the community is likely to find objectionable and ensure that they can feel comfortable in raising concerns.
Of course, guidelines mean little without enforcement. My original draft of these suggested that event runners be able to stop presentations if they felt they were gratuitously in breach of the guidelines. Opinions on this were fairly strongly split, with several people concerned that this effectively allowed individuals to immediately shut down presentations with little oversight. That's a genuine concern, but it does seem to assume bad faith on the part of conference organisers in a way we've rarely (never?) seen. On the other hand, conferences in our field have endured presentations that have contained offensive material from start to finish. If an offended individual is in a minority then it's not easy for them to potentially challenge the audience by vocally expressing their unhappiness, and even standing up and leaving may be a difficult and obvious act.
But I don't set the behavioural standards of the community, and attempting to enforce standards that people don't agree with isn't going to fly. Some people are likely to feel that even the level of enforcement suggested is an unwelcome intrusion into free discussion of some topics, so I think this is a good compromise that is a great signal for our unwillingness to accept inappropriate presentations. With luck we'll see other communities enact similar guidelines and we can come to a broad consensus that covers the majority of our conferences.
June 25, 2010 12:09 AM
June 24, 2010
I've had the opportunity to look into the Joojoo tablet recently. It's an interesting device in various ways, ranging from the screen being connected upside down and everything having to be rotated before display, to the ACPI implementation that's so generic it has no support for actually attaching most embedded controller interrupts to ACPI devices and so relies on a hacked kernel that exposes individual interrupts as ACPI events that are parsed in userspace, to the ChangeOrientation binary that's responsible for switching between landscape and portrait modes containing gems like ps aux | grep fgplayer | grep -v grep and containing references to org.freedesktop.PandaSystem, a somewhat gratuitous namespace grab. Hardware-wise it seems to be little more than battery, generic nvidia reference design board and touchscreen with an accelerometer and LED glued to the chipset's GPIO lines. The entire impression is one of an ambitious project not backed up by the level of technical expertise required to get things done properly. Frankly, I think Michael Arrington came out of this rather better than he could have done - behind the reasonably attractive UI, the entire device is pretty much held together by string and a following wind.
Of course, releasing shoddily put together technology isn't generally illegal and from that point of view Fusion Garage aren't any worse than a number of products I've had the misfortune to actually spend money on. But they're distributing Linux (stock Ubuntu with some additional packages and a modified kernel) without any source or an offer to provide source. I emailed them last week and got the following reply:
Dear Sir,
we are still actively making changes to the joojoo software. We will make
the source release available once we feel we are ready to do so and also
having the resources to get this sorted out and organized for publication.
We seek your kind understanding on our position and appreciate your
patience on this. Thank you.
Best Regards
joojoo Support Team
Strong work, Fusion Garage. Hardware and software may not be your strong points, but you're managing copyright infringement with the best of them.
June 24, 2010 04:10 PM
Content copyright by their respective authors.