Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[PATCH] Forbid invocation of kexec_load() outside initial PID namespace

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


berrange at redhat

Aug 3, 2012, 3:53 AM

Post #1 of 10 (217 views)
Permalink
[PATCH] Forbid invocation of kexec_load() outside initial PID namespace

From: "Daniel P. Berrange" <berrange [at] redhat>

The following commit

commit cf3f89214ef6a33fad60856bc5ffd7bb2fc4709b
Author: Daniel Lezcano <daniel.lezcano [at] free>
Date: Wed Mar 28 14:42:51 2012 -0700

pidns: add reboot_pid_ns() to handle the reboot syscall

introduced custom handling of the reboot() syscall when invoked
from a non-initial PID namespace. The intent was that a process
in a container can be allowed to keep CAP_SYS_BOOT and execute
reboot() to shutdown/reboot just their private container, rather
than the host.

Unfortunately the kexec_load() syscall also relies on the
CAP_SYS_BOOT capability. So by allowing a container to keep
this capability to safely invoke reboot(), they mistakenly
also gain the ability to use kexec_load(). The solution is
to make kexec_load() return -EPERM if invoked from a PID
namespace that is not the initial namespace

Signed-off-by: Daniel P. Berrange <berrange [at] redhat>
Cc: Serge Hallyn <serge.hallyn [at] canonical>
Cc: Daniel Lezcano <daniel.lezcano [at] free>
Cc: Michael Kerrisk <mtk.manpages [at] gmail>
Cc: "Eric W. Biederman" <ebiederm [at] xmission>
Cc: Tejun Heo <tj [at] kernel>
Cc: Oleg Nesterov <oleg [at] redhat>
---
kernel/kexec.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 0668d58..b152bde 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -947,6 +947,11 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
if (!capable(CAP_SYS_BOOT))
return -EPERM;

+ /* Processes in containers must not be allowed to load a new
+ * kernel, even if they have CAP_SYS_BOOT */
+ if (task_active_pid_ns(current) != &init_pid_ns)
+ return -EPERM;
+
/*
* Verify we have a legal set of flags
* This leaves us room for future extensions.
--
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


richard.weinberger at gmail

Aug 3, 2012, 4:25 AM

Post #2 of 10 (206 views)
Permalink
Re: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace [In reply to]

On Fri, Aug 3, 2012 at 12:53 PM, Daniel P. Berrange <berrange [at] redhat> wrote:
> From: "Daniel P. Berrange" <berrange [at] redhat>
>
> The following commit
>
> commit cf3f89214ef6a33fad60856bc5ffd7bb2fc4709b
> Author: Daniel Lezcano <daniel.lezcano [at] free>
> Date: Wed Mar 28 14:42:51 2012 -0700
>
> pidns: add reboot_pid_ns() to handle the reboot syscall
>
> introduced custom handling of the reboot() syscall when invoked
> from a non-initial PID namespace. The intent was that a process
> in a container can be allowed to keep CAP_SYS_BOOT and execute
> reboot() to shutdown/reboot just their private container, rather
> than the host.
>
> Unfortunately the kexec_load() syscall also relies on the
> CAP_SYS_BOOT capability. So by allowing a container to keep
> this capability to safely invoke reboot(), they mistakenly
> also gain the ability to use kexec_load(). The solution is
> to make kexec_load() return -EPERM if invoked from a PID
> namespace that is not the initial namespace
>
> Signed-off-by: Daniel P. Berrange <berrange [at] redhat>
> Cc: Serge Hallyn <serge.hallyn [at] canonical>
> Cc: Daniel Lezcano <daniel.lezcano [at] free>
> Cc: Michael Kerrisk <mtk.manpages [at] gmail>
> Cc: "Eric W. Biederman" <ebiederm [at] xmission>
> Cc: Tejun Heo <tj [at] kernel>
> Cc: Oleg Nesterov <oleg [at] redhat>

Why is this commit not marked for -stable?

--
Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


ebiederm at xmission

Aug 3, 2012, 5:45 AM

Post #3 of 10 (203 views)
Permalink
Re: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace [In reply to]

The solution is to use user namespaces and to only test ns_capable on the magic reboot path.

For the 3.7 timeframe that should be a realistic solution.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


berrange at redhat

Aug 3, 2012, 5:52 AM

Post #4 of 10 (205 views)
Permalink
Re: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace [In reply to]

On Fri, Aug 03, 2012 at 05:45:40AM -0700, Eric W. Biederman wrote:
> The solution is to use user namespaces and to only test ns_capable on the magic reboot path.
>
> For the 3.7 timeframe that should be a realistic solution.

Hmm, that would imply that if LXC wants to allow reboot()/CAP_SYS_BOOT
they will be forced to use CLONE_NEWUSER. I was rather looking for a way
to allow the container to keep CAP_SYS_BOOT, without also mandating use
of user namespaces.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


ebiederm at xmission

Aug 3, 2012, 6:07 AM

Post #5 of 10 (205 views)
Permalink
Re: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace [In reply to]

"Daniel P. Berrange" <berrange [at] redhat> wrote:

>On Fri, Aug 03, 2012 at 05:45:40AM -0700, Eric W. Biederman wrote:
>> The solution is to use user namespaces and to only test ns_capable on
>the magic reboot path.
>>
>> For the 3.7 timeframe that should be a realistic solution.
>
>Hmm, that would imply that if LXC wants to allow reboot()/CAP_SYS_BOOT
>they will be forced to use CLONE_NEWUSER. I was rather looking for a
>way
>to allow the container to keep CAP_SYS_BOOT, without also mandating use
>of user namespaces.

If we remove the use of CAP_SYS_BOOT on the container reboot path perhaps.

But you have hit one small issue in the huge pile of issues why giving contaners capabilities is generally a bad idea.

This is the reason I have been insisting on a reasonable version of user namespaces for a long time.

When the security issues become important it is time for user namespaces. That is their purpose.

Eric



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


serge at hallyn

Aug 4, 2012, 4:15 PM

Post #6 of 10 (200 views)
Permalink
Re: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace [In reply to]

Eric,

during the container reboot discussion, the agreement was reached that rebooting for real fron non-init pid ns is not safe. Restarting userspace (in pidns caller owns) is. I argue the same reasoning supports this.

I haven't had a chance to review the patch, but the idea gets my ack. I'll look at the patch asap.

I'm also fine with splitting cap_sys_boot into a user and system caps. The former would only be needed targeted to the userns of the init pid, while the latter would be required to init_user_ns. Then containers could safely be given cap_sys_restart or whatever, but not cap_sys_boot which authorizes kexec and machine reset/poweroff.

----- Original message -----
> "Daniel P. Berrange" <berrange [at] redhat> wrote:
>
> > On Fri, Aug 03, 2012 at 05:45:40AM -0700, Eric W. Biederman wrote:
> > > The solution is to use user namespaces and to only test ns_capable on
> > the magic reboot path.
> > >
> > > For the 3.7 timeframe that should be a realistic solution.
> >
> > Hmm, that would imply that if LXC wants to allow reboot()/CAP_SYS_BOOT
> > they will be forced to use CLONE_NEWUSER. I was rather looking for a
> > way
> > to allow the container to keep CAP_SYS_BOOT, without also mandating use
> > of user namespaces.
>
> If we remove the use of CAP_SYS_BOOT on the container reboot path
> perhaps.
>
> But you have hit one small issue in the huge pile of issues why giving
> contaners capabilities is generally a bad idea.
>
> This is the reason I have been insisting on a reasonable version of user
> namespaces for a long time.
>
> When the security issues become important it is time for user
> namespaces.    That is their purpose.
>
> Eric
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel"
> in the body of a message to majordomo [at] vger
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


serge at hallyn

Aug 6, 2012, 12:00 PM

Post #7 of 10 (197 views)
Permalink
Re: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace [In reply to]

Quoting Daniel P. Berrange (berrange [at] redhat):
> From: "Daniel P. Berrange" <berrange [at] redhat>
>
> The following commit
>
> commit cf3f89214ef6a33fad60856bc5ffd7bb2fc4709b
> Author: Daniel Lezcano <daniel.lezcano [at] free>
> Date: Wed Mar 28 14:42:51 2012 -0700
>
> pidns: add reboot_pid_ns() to handle the reboot syscall
>
> introduced custom handling of the reboot() syscall when invoked
> from a non-initial PID namespace. The intent was that a process
> in a container can be allowed to keep CAP_SYS_BOOT and execute
> reboot() to shutdown/reboot just their private container, rather
> than the host.
>
> Unfortunately the kexec_load() syscall also relies on the
> CAP_SYS_BOOT capability. So by allowing a container to keep
> this capability to safely invoke reboot(), they mistakenly
> also gain the ability to use kexec_load(). The solution is
> to make kexec_load() return -EPERM if invoked from a PID
> namespace that is not the initial namespace
>
> Signed-off-by: Daniel P. Berrange <berrange [at] redhat>
> Cc: Serge Hallyn <serge.hallyn [at] canonical>

Acked-by: Serge Hallyn <serge.hallyn [at] canonical>

(Please see my previous email explaining why I believe the pidns
is an appropriate check)

> Cc: Daniel Lezcano <daniel.lezcano [at] free>
> Cc: Michael Kerrisk <mtk.manpages [at] gmail>
> Cc: "Eric W. Biederman" <ebiederm [at] xmission>
> Cc: Tejun Heo <tj [at] kernel>
> Cc: Oleg Nesterov <oleg [at] redhat>
> ---
> kernel/kexec.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 0668d58..b152bde 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -947,6 +947,11 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
> if (!capable(CAP_SYS_BOOT))
> return -EPERM;
>
> + /* Processes in containers must not be allowed to load a new
> + * kernel, even if they have CAP_SYS_BOOT */
> + if (task_active_pid_ns(current) != &init_pid_ns)
> + return -EPERM;
> +
> /*
> * Verify we have a legal set of flags
> * This leaves us room for future extensions.
> --
> 1.7.11.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo [at] vger
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


ebiederm at xmission

Aug 6, 2012, 12:16 PM

Post #8 of 10 (202 views)
Permalink
Re: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace [In reply to]

"Serge E. Hallyn" <serge [at] hallyn> writes:

> Quoting Daniel P. Berrange (berrange [at] redhat):
>> From: "Daniel P. Berrange" <berrange [at] redhat>
>>
>> The following commit
>>
>> commit cf3f89214ef6a33fad60856bc5ffd7bb2fc4709b
>> Author: Daniel Lezcano <daniel.lezcano [at] free>
>> Date: Wed Mar 28 14:42:51 2012 -0700
>>
>> pidns: add reboot_pid_ns() to handle the reboot syscall
>>
>> introduced custom handling of the reboot() syscall when invoked
>> from a non-initial PID namespace. The intent was that a process
>> in a container can be allowed to keep CAP_SYS_BOOT and execute
>> reboot() to shutdown/reboot just their private container, rather
>> than the host.
>>
>> Unfortunately the kexec_load() syscall also relies on the
>> CAP_SYS_BOOT capability. So by allowing a container to keep
>> this capability to safely invoke reboot(), they mistakenly
>> also gain the ability to use kexec_load(). The solution is
>> to make kexec_load() return -EPERM if invoked from a PID
>> namespace that is not the initial namespace
>>
>> Signed-off-by: Daniel P. Berrange <berrange [at] redhat>
>> Cc: Serge Hallyn <serge.hallyn [at] canonical>
>
> Acked-by: Serge Hallyn <serge.hallyn [at] canonical>
>
> (Please see my previous email explaining why I believe the pidns
> is an appropriate check)

Serge as to your objects.

If we define kexec_load in terms of the pid namespace then something
makes sense, but the error should be EINVAL, or something of that sort.

That is what we did with reboot. We defined reboot in terms of the pid
namespace.

Not defining kexec_load in terms of the pid namespace and then returning
EPERM because having we happen to have CAP_SYS_BOOT for other reasons is
semantically horrible.

At the end of the day the effect is the same, but I think it matters a
great deal in how we think about things.

We have CAP_SYS_BOOT in the initial user namespace. We do have
permission to make the system call.

So I continue to see this patch the way it is current constructed as
broken.

Nacked-by: "Eric W. Biederman" <ebiederm [at] xmission>

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


serge at hallyn

Aug 6, 2012, 12:20 PM

Post #9 of 10 (195 views)
Permalink
Re: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace [In reply to]

Quoting Serge Hallyn (serge [at] hallyn):
> Eric,
>
> during the container reboot discussion, the agreement was reached that rebooting for real fron non-init pid ns is not safe. Restarting userspace (in pidns caller owns) is. I argue the same reasoning supports this.
>
> I haven't had a chance to review the patch, but the idea gets my ack. I'll look at the patch asap.
>
> I'm also fine with splitting cap_sys_boot into a user and system caps. The former would only be needed targeted to the userns of the init pid, while the latter would be required to init_user_ns. Then containers could safely be given cap_sys_restart or whatever, but not cap_sys_boot which authorizes kexec and machine reset/poweroff.

Splitting the cap up into CAP_RESTART (restart /sbin/init) and CAP_BOOT
(reboot hardware or kexec kernel) has the advantage that the capabilities
each remain simpler to parse, no 'in this context it means that'.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


serge at hallyn

Aug 6, 2012, 12:24 PM

Post #10 of 10 (193 views)
Permalink
Re: [PATCH] Forbid invocation of kexec_load() outside initial PID namespace [In reply to]

Quoting Eric W. Biederman (ebiederm [at] xmission):
> "Serge E. Hallyn" <serge [at] hallyn> writes:
>
> > Quoting Daniel P. Berrange (berrange [at] redhat):
> >> From: "Daniel P. Berrange" <berrange [at] redhat>
> >>
> >> The following commit
> >>
> >> commit cf3f89214ef6a33fad60856bc5ffd7bb2fc4709b
> >> Author: Daniel Lezcano <daniel.lezcano [at] free>
> >> Date: Wed Mar 28 14:42:51 2012 -0700
> >>
> >> pidns: add reboot_pid_ns() to handle the reboot syscall
> >>
> >> introduced custom handling of the reboot() syscall when invoked
> >> from a non-initial PID namespace. The intent was that a process
> >> in a container can be allowed to keep CAP_SYS_BOOT and execute
> >> reboot() to shutdown/reboot just their private container, rather
> >> than the host.
> >>
> >> Unfortunately the kexec_load() syscall also relies on the
> >> CAP_SYS_BOOT capability. So by allowing a container to keep
> >> this capability to safely invoke reboot(), they mistakenly
> >> also gain the ability to use kexec_load(). The solution is
> >> to make kexec_load() return -EPERM if invoked from a PID
> >> namespace that is not the initial namespace
> >>
> >> Signed-off-by: Daniel P. Berrange <berrange [at] redhat>
> >> Cc: Serge Hallyn <serge.hallyn [at] canonical>
> >
> > Acked-by: Serge Hallyn <serge.hallyn [at] canonical>
> >
> > (Please see my previous email explaining why I believe the pidns
> > is an appropriate check)
>
> Serge as to your objects.
>
> If we define kexec_load in terms of the pid namespace then something
> makes sense, but the error should be EINVAL, or something of that sort.

Makes sense.

> That is what we did with reboot. We defined reboot in terms of the pid
> namespace.
>
> Not defining kexec_load in terms of the pid namespace and then returning
> EPERM because having we happen to have CAP_SYS_BOOT for other reasons is
> semantically horrible.
>
> At the end of the day the effect is the same, but I think it matters a
> great deal in how we think about things.
>
> We have CAP_SYS_BOOT in the initial user namespace. We do have
> permission to make the system call.
>
> So I continue to see this patch the way it is current constructed as
> broken.
>
> Nacked-by: "Eric W. Biederman" <ebiederm [at] xmission>

I do also prefer splitting the capability. Michael Kerrisk, do you
have any good suggestions for better names than CAP_RESTART (for
killing or restarting /sbin/init) and CAP_BOOT (for kexec and/or
hardware resets)? Maybe CAP_RESTART_USER and CAP_RESTART_HW?
(CAP_SYS_BOOT being an alias for both for backward compatibility)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.