author     Lennart Poettering <lennart@poettering.net>  2018-12-17 16:46:36 +0100
committer  Lennart Poettering <lennart@poettering.net>  2018-12-17 21:36:30 +0100
commit     98a7b55a5318b72c55410be7d6f5d92b745e1f07
tree       679f798ced84894093bc092a1beffd24b52915d1 /NEWS
parent     NEWS: add missing 'not'
NEWS: document the userns/mknod borkage in 4.18 a bit
Diffstat (limited to 'NEWS')
-rw-r--r-- NEWS 28
1 files changed, 28 insertions, 0 deletions
diff --git a/NEWS b/NEWS
index f74e0f112..31d4b46cc 100644
--- a/NEWS
+++ b/NEWS
@@ -384,6 +384,34 @@ CHANGES WITH 240 in spe:
SD_ID128_ALLF to test if a 128bit ID is set to all 0xFF bytes, and to
initialize one to all 0xFF.
+ * KERNEL API BREAKAGE: Linux kernel 4.18 changed behaviour regarding
+ mknod() handling in user namespaces. Previously mknod() would always
+ fail with EPERM in user namespaces. Since 4.18, mknod() succeeds, but
+ device nodes created that way cannot be opened, and attempts to open
+ them result in EPERM. This breaks the "graceful fallback" logic in
+ systemd's PrivateDevices= sandboxing option. This option is
+ implemented defensively: when systemd detects that it runs in a
+ restricted environment (such as a user namespace, or an environment
+ where mknod() is blocked through seccomp or absence of CAP_MKNOD)
+ where device nodes cannot be created, the effect of PrivateDevices= is
+ bypassed (following the logic that second-level sandboxing is not
+ essential if the system systemd runs in is itself already sandboxed
+ as a whole). This logic breaks with 4.18 in container managers that
+ use user namespacing: PrivateDevices= now succeeds in setting up a
+ private /dev/ file system containing device nodes, but these nodes do
+ not work when opened.
+
+ At this point it is recommended that container managers that use
+ user namespaces and intend to run systemd in the payload explicitly
+ block mknod() with seccomp or similar, so that the graceful fallback
+ logic works again.
+
+ We are very sorry for the breakage and the requirement to change
+ container configurations for newer kernels. It is caused purely by an
+ incompatible kernel change. The relevant kernel developers were
+ notified about this userspace breakage quickly, but they chose to
+ ignore it.
+
Contributions from: afg, Alan Jenkins, Aleksei Timofeyev, Alexander
Filippov, Alexander Kurtz, Alexey Bogdanenko, Andreas Henriksson,
Andrew Jorgensen, Anita Zhang, apnix-uk, Arkan49, Arseny Maslennikov,