author     2018-12-17 16:46:36 +0100
committer  2018-12-17 21:36:30 +0100
commit     98a7b55a5318b72c55410be7d6f5d92b745e1f07
tree       679f798ced84894093bc092a1beffd24b52915d1 /NEWS
parent     NEWS: add missing 'not'
NEWS: document the userns/mknod borkage in 4.18 a bit
Diffstat (limited to 'NEWS')
-rw-r--r-- | NEWS | 28 |
1 files changed, 28 insertions, 0 deletions
@@ -384,6 +384,34 @@ CHANGES WITH 240 in spe:
           SD_ID128_ALLF to test if a 128bit ID is set to all 0xFF bytes, and to initialize one to all 0xFF.
 
+        * KERNEL API BREAKAGE: Linux kernel 4.18 changed behaviour regarding
+          mknod() handling in user namespaces. Previously mknod() would always
+          fail with EPERM in user namespaces. Since 4.18 mknod() will succeed
+          but device nodes generated that way cannot be opened, and attempts to
+          open them result in EPERM. This breaks the "graceful fallback" logic
+          in systemd's PrivateDevices= sand-boxing option. This option is
+          implemented defensively, so that when systemd detects that it runs in
+          a restricted environment (such as a user namespace, or an environment
+          where mknod() is blocked through seccomp or absence of CAP_SYS_MKNOD)
+          where device nodes cannot be created, the effect of PrivateDevices=
+          is bypassed (following the logic that 2nd-level sand-boxing is not
+          essential if the system systemd runs in is itself already sand-boxed
+          as a whole). This logic breaks with 4.18 in container managers where
+          user namespacing is used: suddenly PrivateDevices= succeeds in
+          setting up a private /dev/ file system containing device nodes, but
+          when these are opened they don't work.
+
+          At this point it is recommended that container managers utilizing
+          user namespaces that intend to run systemd in the payload explicitly
+          block mknod() with seccomp or similar, so that the graceful fallback
+          logic works again.
+
+          We are very sorry for the breakage and the requirement to change
+          container configurations for newer kernels. It's purely caused by an
+          incompatible kernel change. The relevant kernel developers have been
+          notified about this userspace breakage quickly, but they chose to
+          ignore it.
+
         Contributions from: afg, Alan Jenkins, Aleksei Timofeyev, Alexander Filippov, Alexander Kurtz, Alexey Bogdanenko, Andreas Henriksson, Andrew Jorgensen, Anita Zhang, apnix-uk, Arkan49, Arseny Maslennikov,
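The "graceful fallback" described in the NEWS entry can be illustrated with a small probe. This is a minimal sketch, not systemd's actual implementation: `classify_devnode_probe()`, `probe_devnode()` and the `DEVNODE_*` values are hypothetical names. The probe creates a /dev/null-style character node (major 1, minor 3) in a caller-supplied scratch directory and then tries to open it. Following the errno values given in the NEWS text, EPERM from mknod() is read as "already sandboxed, skip PrivateDevices=", while a successful mknod() followed by EPERM from open() is the 4.18 user-namespace case that the old logic did not anticipate.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <unistd.h>

/* Hypothetical outcomes of probing whether usable device nodes can be created. */
enum devnode_probe {
        DEVNODE_USABLE,     /* mknod() and open() both worked: set up PrivateDevices= */
        DEVNODE_BLOCKED,    /* mknod() failed with EPERM: already sandboxed, skip it */
        DEVNODE_UNOPENABLE, /* mknod() worked but open() fails with EPERM (4.18+ userns) */
        DEVNODE_ERROR,      /* some unrelated failure */
};

/* Pure decision logic, kept apart from the syscalls so it can be tested.
 * Pass 0 for an errno value when the corresponding call succeeded. */
enum devnode_probe classify_devnode_probe(int mknod_errno, int open_errno) {
        if (mknod_errno == EPERM)
                return DEVNODE_BLOCKED;
        if (mknod_errno != 0)
                return DEVNODE_ERROR;
        if (open_errno == EPERM)
                return DEVNODE_UNOPENABLE;
        if (open_errno != 0)
                return DEVNODE_ERROR;
        return DEVNODE_USABLE;
}

/* Create a /dev/null clone (char 1:3) in dir, try to open it, then clean up. */
enum devnode_probe probe_devnode(const char *dir) {
        char path[4096];
        int mknod_errno = 0, open_errno = 0;

        snprintf(path, sizeof path, "%s/probe-null", dir);
        if (mknod(path, S_IFCHR | 0666, makedev(1, 3)) < 0) {
                mknod_errno = errno;
        } else {
                int fd = open(path, O_RDWR | O_CLOEXEC | O_NOCTTY);
                if (fd < 0)
                        open_errno = errno;
                else
                        close(fd);
                unlink(path);
        }
        return classify_devnode_probe(mknod_errno, open_errno);
}
```

Keeping the errno-to-decision mapping in a pure function makes the 4.18 case easy to exercise without an actual user namespace; only `probe_devnode()` touches the file system.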
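The recommendation to block mknod() with seccomp could look roughly like the following sketch. It installs a classic-BPF seccomp filter in the calling process so that mknod() and mknodat() fail with EPERM; EPERM is an assumption chosen because it matches what mknod() returns without CAP_SYS_MKNOD. `block_mknod()` and `mknod_blocked()` are hypothetical helpers, and real container managers typically express the same rule in their own seccomp profile format or via libseccomp. The filter deliberately omits the `seccomp_data.arch` check that a hardened filter needs, so it only covers the native syscall ABI.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Install a filter making mknod()/mknodat() fail with EPERM.
 * Returns 0 on success or a negative errno value. */
int block_mknod(void) {
        struct sock_filter filter[] = {
                /* Load the syscall number (native ABI only, no arch check). */
                BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
#ifdef __NR_mknod
                /* Not every architecture has plain mknod(), e.g. aarch64. */
                BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mknod, 2, 0),
#endif
                BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mknodat, 1, 0),
                BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
                BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        };
        struct sock_fprog prog = {
                .len = (unsigned short) (sizeof filter / sizeof filter[0]),
                .filter = filter,
        };

        /* Required so that an unprivileged process may install a filter. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
                return -errno;
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) < 0)
                return -errno;
        return 0;
}

/* Check whether mknodat() is currently refused with EPERM. The parent
 * directory is assumed not to exist, so without a filter the call fails
 * with ENOENT and creates nothing. */
int mknod_blocked(void) {
        long r = syscall(__NR_mknodat, (long) AT_FDCWD, "/nonexistent-probe-dir/fifo",
                         (long) (S_IFIFO | 0600), 0L);
        return r < 0 && errno == EPERM;
}
```

A container manager would call something like `block_mknod()` in the child after fork() and before exec()ing the payload's init, since seccomp filters are inherited across both. Linux has no separate mkfifo system call, so this also makes mkfifo() fail inside the payload.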