
cal.leeming at simplicitymedialtd
May 31, 2011, 5:42 AM
Post #10 of 13
(690 views)
Permalink
|
|
Re: port-linux.c bug with oom_adjust_restore() - causes real bad oom_adj - which can cause DoS conditions.
[In reply to]
|
|
On 31/05/2011 13:25, Gert Doering wrote: > Hi, > > On Tue, May 31, 2011 at 12:11:13PM +0100, Cal Leeming [Simplicity Media Ltd] wrote: >> Could you point out the line of code where oom_adj_save is set to the >> original value, because I've looked everywhere, and from what I can >> tell, it's only ever set to INT_MIN, and no where else is it changed. >> (C is not my strongest language tho, so I most likely have overlooked >> something). This is where I got thrown off. > oom_adjust_setup() does this: > > if ((fp = fopen(oom_adj_path, "r+")) != NULL) { > if (fscanf(fp, "%d",&oom_adj_save) != 1) > verbose("error reading %s: %s", oom_adj_path, > strerror(errno)); > > the "fscanf()" call will read an integer ("%d") from the file named, > and write that number into the variable being pointed to (&oom_adj_save). > > The loop is a bit tricky to read as it takes different paths into > account, and will exit after the first successful update. > > fscanf() will return the number of successful conversions, so if it > was able to read "one number", the return value is "1", and it will > jump to the else block > > else { > rewind(fp); > if (fprintf(fp, "%d\n", value)<= 0) > verbose("error writing %s: %s", > oom_adj_path, strerror(errno)); > else > verbose("Set %s from %d to %d", > oom_adj_path, oom_adj_save, value); > } > > where it will overwrite what is in that file with the new value > ("value"), and then print the "Set ... from -17 to -17" message that > you saw. Ah, thank you for explaining this. Makes a lot more sense now :) > >>> The question here is why sshd is sometimes started with -17 and sometimes >>> with 0 - that's the bug, not that sshd keeps what it's given. >>> >>> (Ask yourself: if sshd had no idea about oom_adj at all, would this make >>> it buggy by not changing the value?) >> This was what I was trying to pinpoint down before. I had came to this >> conclusion myself, sent it to the Debian bug list, and they dismissed >> on the grounds that it was an openssh problem... > I must admit that I have no idea what is causing it, but from the logs, > it very much looks like sshd is started with "-17" in there - but only > in the problem case. > > >> So far, the buck has been passed from kernel-mm to debian-kernel, to >> openssh, and now back to debian-kernel lol. The most annoying thing, >> is that you can't get this bug to happen unless you physically test on >> a machine which requires the bnx2 firmwire, so I get the feeling this >> won't get resolved :/ > And *that* strongly points to a bug in the bnx2 stuff - if other programs > change their behaviour based on the existance of a given driver, that > does not smell very healthy. Agreed.. I was thinking of adding some debug into the fs/proc/ code which does a kprint on every oom_adj read/write, but I couldn't figure out how to extract the pid from the task (pointer?). > [..] >>> Anyway, as a workaround for your system, you can certainly set >>> >>> oom_adj_save = 0; >>> >>> in the beginning of port-linux.c / oom_adjust_restore(), to claim that >>> "hey, this was the saved value to start with" and "restore" oom_adj to 0 >>> then - but that's just hiding the bug, not fixing it. >> I'm disappointed this wasn't the correct fix, I honestly thought I had >> patched it right :( > Well, that's the short hand - "just ignore everything that happened at > init / save time, and forcibly write back '0', no matter what was there > before". > >> But, on the other hand, ssh users should really never have a default >> oom_adj of -17, so maybe 0 should be set as default anyway? If this is >> not the case, could you give reasons why?? > Well, I would say "the default value in there is a matter of local policy", > so what if someone wants to make sure that whatever is run from sshd is > always privileged regarding oom, even if a local firefox etc. is running > amock and you need to ssh-in and kill the GUI stuff... > > One might opt to run sshd (and all its children) at "-5" (slightly special > treatment), or "0" (no special treatment), or even at "-17" - but that's > local policy. Ah, okay that's make sense. > > Mmmh. > > Since this seems to be inherited, it might even work if you just change > the sshd startup script, and insert > > echo 0>/proc/self/oom_adj > > in there, right before it starts the sshd... "local policy at work". Yeah I was going to do this, but then I thought "well if this problem is occurring for openssh, then what else could it be affecting?". As you pointed out above, having the oom_adj changed based on the existence of a driver is really not good. I will paste this convo trail into the debian ticket, and hopefully it'll help convince them this issue needs fixing. > gert Thanks again for taking the time to reply! _______________________________________________ openssh-unix-dev mailing list openssh-unix-dev [at] mindrot https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
|