

Controlling access to user namespaces
source link: https://lwn.net/Articles/673597/

The user namespaces feature holds an interesting promise for system security: users can be confined within a namespace, given full root privileges within that namespace, and still be unable to adversely affect the system as a whole. The path to better security has, perhaps predictably, proved to be a bit rocky, however. In response, there is now an effort to make the feature configurable by system administrators, but this new configuration knob is proving to be a harder sell than one might expect.
User namespaces are created by passing the CLONE_NEWUSER flag to the clone() or unshare() system calls. Administrators who are nervous about allowing access to this feature currently only have one option: configure out support at kernel build time. That option is not easily available to the many systems running distribution-built kernels, though. Kees Cook set out to create an easier way with this patch set creating a new sysctl knob to control access to the user-namespace feature, saying:
In particular, the patch adds a knob called /proc/sys/kernel/userns_restrict. When it is set to the default value (zero), user namespaces are unrestricted. Setting it to one allows only privileged users to create user namespaces; a setting of two disables user namespaces altogether. In that final case, it is not possible to re-enable user namespaces without rebooting the system.
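The three settings, and the one-way nature of the last one, can be modeled in a few lines. This is only an illustrative sketch of the semantics as described above, not the patch's code: the class name UsernsRestrict, its methods, and the boolean privileged flag (a stand-in for whatever privilege check the kernel would apply) are all hypothetical.

```python
class UsernsRestrict:
    """Toy model of the proposed userns_restrict knob (hypothetical names)."""

    UNRESTRICTED = 0     # default: anyone may create user namespaces
    PRIVILEGED_ONLY = 1  # only privileged users may create them
    DISABLED = 2         # nobody may; the setting sticks until reboot

    def __init__(self) -> None:
        self.value = self.UNRESTRICTED

    def write(self, new_value: int) -> None:
        """Model a write to the knob, refusing to leave the DISABLED state."""
        if new_value not in (0, 1, 2):
            raise ValueError("userns_restrict accepts only 0, 1 or 2")
        if self.value == self.DISABLED and new_value != self.DISABLED:
            raise PermissionError("userns_restrict is locked at 2 until reboot")
        self.value = new_value

    def may_create(self, privileged: bool) -> bool:
        """Would a CLONE_NEWUSER request be allowed under the current value?"""
        if self.value == self.UNRESTRICTED:
            return True
        if self.value == self.PRIVILEGED_ONLY:
            return privileged
        return False
```

In this model a "reboot" corresponds to constructing a fresh UsernsRestrict object, which starts again at zero.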
One of the first issues to be aired had to do with naming: it turns out that Debian currently carries a similar patch, but, on Debian systems, the knob is called unprivileged_userns_clone and doesn't support the "privileged users only" setting. Ben Hutchings agreed that the new naming was probably better and said that, should Kees's patch go upstream, Debian would slowly move over to it.
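To see what a given system currently permits, one could read the Debian knob where it exists and, independently, try the system call itself in a throwaway child process. The following is a minimal sketch under some assumptions: the helper names read_sysctl and can_create_user_namespace are invented for illustration, the knob is assumed to appear under /proc/sys/kernel/, and the probe reaches unshare() through ctypes rather than any dedicated binding. It only reports the raw knob value and does not interpret it.

```python
import ctypes
import os
from pathlib import Path
from typing import Optional

CLONE_NEWUSER = 0x10000000  # flag value from <linux/sched.h>


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the text of a sysctl such as 'kernel.unprivileged_userns_clone',
    or None if this kernel does not provide it."""
    path = Path(root, *name.split("."))
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None


def can_create_user_namespace() -> bool:
    """Fork a child, let it call unshare(CLONE_NEWUSER), and report success.

    The call is made in a child so the calling process never changes
    namespaces itself.
    """
    pid = os.fork()
    if pid == 0:
        status = 1  # assume failure unless unshare() returns 0
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.unshare(CLONE_NEWUSER) == 0:
                status = 0
        finally:
            os._exit(status)  # never return into the parent's code path
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


if __name__ == "__main__":
    print("Debian knob:", read_sysctl("kernel.unprivileged_userns_clone"))
    print("unshare(CLONE_NEWUSER) works:", can_create_user_namespace())
```

A result of False from the probe does not say why creation failed; a sysctl, a kernel built without the feature, or a sandbox around the process would all look the same from here.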
Some developers worried that allowing user namespaces to be turned off would slow the process of finding and fixing any remaining security issues. Additionally, Serge Hallyn suggested that, if application developers could not count on the availability of user namespaces, they wouldn't use them at all. He suggested that, if the knob is accepted, it be marked as a short-term workaround that would eventually be removed.
The strongest opposition, though, came from Eric Biederman, the creator of user namespaces and also the developer who has done the most work on the sysctl code in recent times. He stated flat out that "the code is buggy, and poorly thought through" and would not be merged. In another message he described his objections in detail, starting with a challenge to the idea that user namespaces are a security risk at all:
Others, though, seem to think that, if problems elsewhere are being "amplified," there is indeed a security exposure. Andy Lutomirski described some concerns of his own:
Eric echoed the point that making it possible to disable user namespaces would be a net loss in security, since the feature would not be available on all systems. He cited web browsing with Chrome as a use case; Kees responded that this patch wasn't really aimed at desktop systems in the first place.
Next on Eric's list was a complaint that a system-wide knob was too coarse; he suggested that perhaps the seccomp() mechanism should be used instead if access to user namespaces must really be restricted. Kees's answer here is that it's not really possible to set a global seccomp() policy, that performance would suffer in any case, and that seccomp() is meant for developers to use rather than system administrators. "It's an extraordinarily big hammer for wanting to turn off a single area of the kernel with a long history of problems." He noted that trying to use a Linux security module to achieve this end would have a number of similar problems.
Then, Eric said, the sysctl knob could create "a false sense of security" since it would have no effect on processes that are already running in a user namespace. If a security issue comes to light, just turning off the knob will not be enough to protect a system; a reboot will also be necessary. Eric returned to this point later, calling the patch "fatally flawed" as a result of the "subtlety and nuance" involved in using it.
Kees acknowledged the "corner case" in the sysctl implementation, one that, he said, applies to a number of other, existing knobs as well. But, he said, it really does not matter to an administrator who simply wants to disable the feature outright as a way of reducing the attack surface of a system. Even so, he allowed: "I'm open to having this sysctl kill all CLONE_NEWUSERed process trees," without noting that having a sysctl knob kill off processes might pose some interesting "subtlety and nuance" of its own.
As a sort of postscript, Eric suggested that, perhaps, the desired restriction could be implemented as a resource limit controlling the number of user namespaces that any user would be allowed to create. Setting that number to zero would effectively disable the feature. Kees indicated a willingness to look at this idea; it is the end result he wants, rather than the sysctl knob itself.
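The resource-limit idea can be sketched as a per-user counter checked against a maximum, where a maximum of zero refuses every request. This is a toy model for illustration only; the class UsernsLimit and its methods are hypothetical and say nothing about how such a limit would actually be accounted in the kernel.

```python
from collections import Counter


class UsernsLimit:
    """Toy per-user cap on live user namespaces (hypothetical model)."""

    def __init__(self, max_per_user: int) -> None:
        if max_per_user < 0:
            raise ValueError("limit must be non-negative")
        self.max_per_user = max_per_user
        self._in_use = Counter()  # user -> namespaces currently held

    def create(self, user: str) -> bool:
        """Charge one namespace to `user`; refuse once the cap is reached."""
        if self._in_use[user] >= self.max_per_user:
            return False
        self._in_use[user] += 1
        return True

    def destroy(self, user: str) -> None:
        """Release one namespace previously charged to `user`."""
        if self._in_use[user] > 0:
            self._in_use[user] -= 1
```

With UsernsLimit(0), create() returns False for every user, which is the "effectively disabled" case described above.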
There is an evident desire for the ability to turn off access to user namespaces; various other developers spoke in its favor over the course of the discussion. But this desire is clearly not universal and, as a result, the current patches do not appear to have an easy path into the mainline. It is entirely possible that the concerns blocking this feature may eventually be addressed and overcome, but it also seems possible that, in the end, this knob ends up being part of the patch set carried by distributors and users. It seems that getting security-related changes into the kernel is still a difficult task.
Controlling access to user namespaces
Posted Jan 29, 2016 1:56 UTC (Fri) by zuki (subscriber, #41808) [Link]
Also, I don't really buy the argument that setting the sysctl does not work retroactively and this is terrrrrrible. The same is true for most settings... If I had a setuid binary, dropping the bit only affects the future, running instances are not killed. If I change the permissions on a file, processes which had it open just continue. Etc, etc. For example kernel.modules_disabled=1 follows a similar pattern.
It seems that EB doesn't like that people want to disable some feature which he deeply cares about and loses objectivity. The "shortcomings" of the patch seem like things made up post factum to justify the initial emotional response.
Also a global per-user limit doesn't seem very useful. If there's a vulnerability, just one namespace is enough to exploit it. And otherwise, why would we care how many namespaced processes are running? So only two values of the limit make sense: 0 and infinity. So we're back to the original sysctl patch.