Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[PATCH] jffs2: Fix memory corruption in jffs2_read_inode_range()

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


avorontsov at ru

Nov 20, 2009, 11:45 AM

Post #1 of 4 (139 views)
Permalink
[PATCH] jffs2: Fix memory corruption in jffs2_read_inode_range()

In 2.6.23 kernel, commit a32ea1e1f925399e0d81ca3f7394a44a6dafa12c
("Fix read/truncate race") fixed a race in the generic code, and as a
side effect, now do_generic_file_read() can ask us to readpage() past
the i_size, which seems to be correctly handled by the block routines
(e.g. block_read_full_page() fills the page with zeroes in case if
somebody is trying to read past the last inode's block).

JFFS2 doesn't handle this, instead jffs2_read_inode_range() is trying
to lookup the fragments which do not belong to anything, and
jffs2_lookup_node_frag() happily returns "the closest smaller match"
which is OK for most cases (I guess), but in our case there wasn't
anything meaningful to lookup in the first place.

After we found the bogus fragment, the code can branch to 'Filling
frag hole' case that does

memset(buf, 0, min(end, frag->ofs + frag->size) - offset);

and since 'frag' is bogus, its ofs + size can be smaller than the real
'offset' (i.e. index << PAGE_CACHE_SHIFT), and so the code turns into

memset(buf, 0, <huge unsigned negative>);

Hopefully, in most cases the corruption is fatal, and quickly causing
random oopses, like this:

root [at] 10:~/ltp-fs-20090531# ./testcases/kernel/fs/ftest/ftest01
Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xc01cd980
Oops: Kernel access of bad area, sig: 11 [#1]
[...]
NIP [c01cd980] rb_insert_color+0x38/0x184
LR [c0043978] enqueue_hrtimer+0x88/0xc4
Call Trace:
[c6c63b60] [c004f9a8] tick_sched_timer+0xa0/0xe4 (unreliable)
[c6c63b80] [c0043978] enqueue_hrtimer+0x88/0xc4
[c6c63b90] [c0043a48] __run_hrtimer+0x94/0xbc
[c6c63bb0] [c0044628] hrtimer_interrupt+0x140/0x2b8
[c6c63c10] [c000f8e8] timer_interrupt+0x13c/0x254
[c6c63c30] [c001352c] ret_from_except+0x0/0x14
--- Exception: 901 at memset+0x38/0x5c
LR = jffs2_read_inode_range+0x144/0x17c
[c6c63cf0] [00000000] (null) (unreliable)

This patch fixes the issue, plus fixes all LTP tests on NAND/UBI with
JFFS2 filesystem that were failing since 2.6.23 (seems like the bug
above also broke the truncation).

Note that the bug is only reproducible with write-buffered MTD devices
(e.g. NAND, UBI), which is a mystery to me (though I didn't look close
enough, possibly there is a race between gc and wbuf code, but I don't
see it and obviously it isn't that fatal).

Signed-off-by: Anton Vorontsov <avorontsov [at] ru>
---
fs/jffs2/file.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/fs/jffs2/file.c b/fs/jffs2/file.c
index 5edc2bf..e1d7a1d 100644
--- a/fs/jffs2/file.c
+++ b/fs/jffs2/file.c
@@ -85,7 +85,13 @@ static int jffs2_do_readpage_nolock (struct inode *inode, struct page *pg)
pg_buf = kmap(pg);
/* FIXME: Can kmap fail? */

- ret = jffs2_read_inode_range(c, f, pg_buf, pg->index << PAGE_CACHE_SHIFT, PAGE_CACHE_SIZE);
+ if (pg->index > ((i_size_read(inode) - 1) >> PAGE_CACHE_SHIFT)) {
+ ret = 0;
+ memset(pg_buf, 0, PAGE_CACHE_SIZE);
+ } else {
+ ret = jffs2_read_inode_range(c, f, pg_buf,
+ pg->index << PAGE_CACHE_SHIFT, PAGE_CACHE_SIZE);
+ }

if (ret) {
ClearPageUptodate(pg);
--
1.6.3.3
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


dwmw2 at infradead

Nov 22, 2009, 2:20 AM

Post #2 of 4 (123 views)
Permalink
Re: [PATCH] jffs2: Fix memory corruption in jffs2_read_inode_range() [In reply to]

On Fri, 2009-11-20 at 12:45 -0700, Anton Vorontsov wrote:
> + if (pg->index > ((i_size_read(inode) - 1) >> PAGE_CACHE_SHIFT)) {
> + ret = 0;
> + memset(pg_buf, 0, PAGE_CACHE_SIZE);
> + } else {
> + ret = jffs2_read_inode_range(c, f, pg_buf,
> + pg->index << PAGE_CACHE_SHIFT, PAGE_CACHE_SIZE);
> + }

Thank you for the excellent diagnosis and the patch.

I think I'd prefer to fix it a little differently though -- I would be
happier to make jffs2_read_inode_range() cope with out-of-file reads,
rather than adding this special case where we don't call it.

That way we aren't at all susceptible to potential races between the
VFS-maintained i_size and our own internal fragtree handling. And
jffs2_read_inode_range() already handles the memset to zero for various
other reasons anyway.

Does this patch look OK to you? It seems to work on the test cases I've
tried.

diff --git a/fs/jffs2/read.c b/fs/jffs2/read.c
index cfe05c1..9640175 100644
--- a/fs/jffs2/read.c
+++ b/fs/jffs2/read.c
@@ -164,12 +164,14 @@ int jffs2_read_inode_range(struct jffs2_sb_info *c, struct jffs2_inode_info *f,

/* XXX FIXME: Where a single physical node actually shows up in two
frags, we read it twice. Don't do that. */
- /* Now we're pointing at the first frag which overlaps our page */
+ /* Now we're pointing at the first frag which overlaps our page
+ * (or perhaps is before it, if we've been asked to read off the
+ * end of the file). */
while(offset < end) {
D2(printk(KERN_DEBUG "jffs2_read_inode_range: offset %d, end %d\n", offset, end));
- if (unlikely(!frag || frag->ofs > offset)) {
+ if (unlikely(!frag || frag->ofs > offset || frag->ofs + frag->size <= offset)) {
uint32_t holesize = end - offset;
- if (frag) {
+ if (frag && frag->ofs > offset) {
D1(printk(KERN_NOTICE "Eep. Hole in ino #%u fraglist. frag->ofs = 0x%08x, offset = 0x%08x\n", f->inocache->ino, frag->ofs, offset));
holesize = min(holesize, frag->ofs - offset);
}


--
David Woodhouse Open Source Technology Centre
David.Woodhouse [at] intel Intel Corporation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


avorontsov at ru

Nov 23, 2009, 5:16 AM

Post #3 of 4 (119 views)
Permalink
Re: [PATCH] jffs2: Fix memory corruption in jffs2_read_inode_range() [In reply to]

On Sun, Nov 22, 2009 at 10:20:59AM +0000, David Woodhouse wrote:
> On Fri, 2009-11-20 at 12:45 -0700, Anton Vorontsov wrote:
> > + if (pg->index > ((i_size_read(inode) - 1) >> PAGE_CACHE_SHIFT)) {
> > + ret = 0;
> > + memset(pg_buf, 0, PAGE_CACHE_SIZE);
> > + } else {
> > + ret = jffs2_read_inode_range(c, f, pg_buf,
> > + pg->index << PAGE_CACHE_SHIFT, PAGE_CACHE_SIZE);
> > + }
>
> Thank you for the excellent diagnosis and the patch.
>
> I think I'd prefer to fix it a little differently though -- I would be
> happier to make jffs2_read_inode_range() cope with out-of-file reads,
> rather than adding this special case where we don't call it.
>
> That way we aren't at all susceptible to potential races between the
> VFS-maintained i_size and our own internal fragtree handling. And
> jffs2_read_inode_range() already handles the memset to zero for various
> other reasons anyway.
>
> Does this patch look OK to you? It seems to work on the test cases I've
> tried.

Yep, it looks good (and works).

Thanks David!

--
Anton Vorontsov
email: cbouatmailru [at] gmail
irc://irc.freenode.net/bd2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


dwmw2 at infradead

Nov 23, 2009, 5:47 AM

Post #4 of 4 (120 views)
Permalink
[PATCH] jffs2: Fix memory corruption in jffs2_read_inode_range() [In reply to]

In 2.6.23 kernel, commit a32ea1e1f925399e0d81ca3f7394a44a6dafa12c
("Fix read/truncate race") fixed a race in the generic code, and as a
side effect, now do_generic_file_read() can ask us to readpage() past
the i_size. This seems to be correctly handled by the block routines
(e.g. block_read_full_page() fills the page with zeroes in case if
somebody is trying to read past the last inode's block).

JFFS2 doesn't handle this; it assumes that it won't be asked to read
pages which don't exist -- and thus that there will be at least _one_
valid 'frag' on the page it's being asked to read. It will fill any
holes with the following memset:

memset(buf, 0, min(end, frag->ofs + frag->size) - offset);

When the 'closest smaller match' returned by jffs2_lookup_node_frag() is
actually on a previous page and ends before 'offset', that results in:

memset(buf, 0, <huge unsigned negative>);

Hopefully, in most cases the corruption is fatal, and quickly causing
random oopses, like this:

root [at] 10:~/ltp-fs-20090531# ./testcases/kernel/fs/ftest/ftest01
Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xc01cd980
Oops: Kernel access of bad area, sig: 11 [#1]
[...]
NIP [c01cd980] rb_insert_color+0x38/0x184
LR [c0043978] enqueue_hrtimer+0x88/0xc4
Call Trace:
[c6c63b60] [c004f9a8] tick_sched_timer+0xa0/0xe4 (unreliable)
[c6c63b80] [c0043978] enqueue_hrtimer+0x88/0xc4
[c6c63b90] [c0043a48] __run_hrtimer+0x94/0xbc
[c6c63bb0] [c0044628] hrtimer_interrupt+0x140/0x2b8
[c6c63c10] [c000f8e8] timer_interrupt+0x13c/0x254
[c6c63c30] [c001352c] ret_from_except+0x0/0x14
--- Exception: 901 at memset+0x38/0x5c
LR = jffs2_read_inode_range+0x144/0x17c
[c6c63cf0] [00000000] (null) (unreliable)

This patch fixes the issue, plus fixes all LTP tests on NAND/UBI with
JFFS2 filesystem that were failing since 2.6.23 (seems like the bug
above also broke the truncation).

Reported-By: Anton Vorontsov <avorontsov [at] ru>
Tested-By: Anton Vorontsov <avorontsov [at] ru>
Signed-off-by: David Woodhouse <David.Woodhouse [at] intel>
Cc: stable [at] kernel

diff --git a/fs/jffs2/read.c b/fs/jffs2/read.c
index cfe05c1..3f39be1 100644
--- a/fs/jffs2/read.c
+++ b/fs/jffs2/read.c
@@ -164,12 +164,15 @@ int jffs2_read_inode_range(struct jffs2_sb_info *c, struct jffs2_inode_info *f,

/* XXX FIXME: Where a single physical node actually shows up in two
frags, we read it twice. Don't do that. */
- /* Now we're pointing at the first frag which overlaps our page */
+ /* Now we're pointing at the first frag which overlaps our page
+ * (or perhaps is before it, if we've been asked to read off the
+ * end of the file). */
while(offset < end) {
D2(printk(KERN_DEBUG "jffs2_read_inode_range: offset %d, end %d\n", offset, end));
- if (unlikely(!frag || frag->ofs > offset)) {
+ if (unlikely(!frag || frag->ofs > offset ||
+ frag->ofs + frag->size <= offset)) {
uint32_t holesize = end - offset;
- if (frag) {
+ if (frag && frag->ofs > offset) {
D1(printk(KERN_NOTICE "Eep. Hole in ino #%u fraglist. frag->ofs = 0x%08x, offset = 0x%08x\n", f->inocache->ino, frag->ofs, offset));
holesize = min(holesize, frag->ofs - offset);
}


--
David Woodhouse Open Source Technology Centre
David.Woodhouse [at] intel Intel Corporation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.