Mailing List Archive: Linux: Kernel

[RFC:Patch: 000/008](memory hotplug) rough idea of pgdat removing

 

 


y-goto at jp

Jul 31, 2008, 4:50 AM

Post #1 of 9
[RFC:Patch: 000/008](memory hotplug) rough idea of pgdat removing

Hello.

This patch set is a first attempt to describe my rough idea of
"how to remove a pgdat".

With this post I would like to confirm whether my current idea is a good approach.
The patches are incomplete and not yet tested. If the idea is sound,
I'll continue developing and testing them.

I think pgdat removal is a difficult issue,
because no code expects a pgdat to be removed, and everything accesses
pgdats without any locking today. But the pgdat remover must wait for those
accesses to finish, because the node may be removed electrically soon afterwards.

My current idea is to use RCU to wait for them,
because it has the least impact on reader performance,
and the pgdat remover can use synchronize_sched() to wait until readers
finish accessing the pgdat that is being removed.

So I made the following read locks for accessing a pgdat:
- pgdat_remove_read_lock()/unlock()
- pgdat_remove_read_lock_sleepable()/unlock_sleepable()
These definitions use rcu_read_lock() and srcu_read_lock().

The writer uses node_set_offline(), which uses clear_bit(),
and build_all_zonelists() with stop_machine_run().
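
The reader/writer protocol described above can be modelled in user space roughly as follows. This is only a single-threaded sketch under stated assumptions, not code from the patch set: the in-kernel rcu_read_lock()/synchronize_sched() pair is replaced by a plain reader counter, and every `model_*` name is hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_MAX_NODES 4

/* Hypothetical stand-in for a pgdat; not the kernel structure. */
struct model_pgdat {
	int node_id;
};

static struct model_pgdat *model_node_data[MODEL_MAX_NODES];
static unsigned long model_online_map;	/* one bit per online node */
static int model_active_readers;	/* stand-in for RCU read-side sections */

/* Reader side: models pgdat_remove_read_lock()/unlock(). */
void model_read_lock(void)   { model_active_readers++; }
void model_read_unlock(void) { model_active_readers--; }

bool model_node_online(int nid)
{
	return (model_online_map >> nid) & 1UL;
}

void model_add_node(int nid)
{
	struct model_pgdat *pgdat = malloc(sizeof(*pgdat));

	if (!pgdat)
		abort();
	pgdat->node_id = nid;
	model_node_data[nid] = pgdat;
	model_online_map |= 1UL << nid;
}

/*
 * Writer, step 1: hide the node from new readers.
 * Models node_set_offline(); the real writer would also rebuild
 * the zonelists at this point.
 */
void model_set_offline(int nid)
{
	model_online_map &= ~(1UL << nid);
}

/*
 * Writer, step 2: free the pgdat only when no reader is still inside
 * a read-side section. The kernel would block in synchronize_sched()
 * here instead of returning false.
 */
bool model_try_free_node(int nid)
{
	if (model_active_readers > 0)
		return false;
	free(model_node_data[nid]);
	model_node_data[nid] = NULL;
	return true;
}
```

The point of the ordering is that the online bit is cleared first, so new readers stop finding the node, and the memory is released only after the readers that were already inside have left.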


There are a few types of pgdat access.

1) via node_online_bitmap.
Much code uses for_each_xxx_node(), for_each_zone(), and so on.
Such code must be bracketed by pgdat_remove_read_lock()/unlock().

2) mempolicy
There is a callback interface that is invoked when memory offlining runs.
mempolicy must use these callbacks to prevent the node from being removed.
This patch set includes a quite simple (sample) patch to point out
what will be required. However, a more detailed specification will be necessary.
(e.g., when the preferred node of a mempolicy is being removed, what should the kernel do?)

3) zonelist
alloc_pages accesses zones via the zonelist. However, a zone may also be removed
by the pgdat remover. Code must check whether zones might have been removed
before accessing the zonelist, and the access must be guarded between
pgdat_remove_read_lock() and unlock().

4) via NODE_DATA() with node_id.
This type of access is called with numa_node_id() in many cases.
Basically, the CPUs on the node being removed must be removed before the node itself.
So I used BUG_ON() when numa_node_id() points to an offlined node.

If the node id is specified some other way, the code must check whether the node
is offline and bail out when it is...
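
As an illustration of case 4, a lookup that checks the online state before handing out a pgdat might look like the sketch below. It is a user-space model under stated assumptions: the `sketch_*` names and the fixed four-node table are hypothetical, and in the proposal the check would sit inside a pgdat_remove_read_lock()/unlock() pair.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_NODES 4

/* Hypothetical stand-in for a pgdat; not the kernel structure. */
struct sketch_pgdat {
	int node_id;
};

static struct sketch_pgdat sketch_nodes[SKETCH_MAX_NODES] = {
	{ 0 }, { 1 }, { 2 }, { 3 },
};
static unsigned long sketch_online_map = 0xfUL;	/* all four nodes online */

static bool sketch_valid_nid(int nid)
{
	return nid >= 0 && nid < SKETCH_MAX_NODES;
}

bool sketch_node_online(int nid)
{
	return sketch_valid_nid(nid) && ((sketch_online_map >> nid) & 1UL);
}

void sketch_set_offline(int nid)
{
	if (sketch_valid_nid(nid))
		sketch_online_map &= ~(1UL << nid);
}

/*
 * Look up a pgdat by an id that did not come from numa_node_id().
 * Returns NULL when the node is offline or the id is out of range,
 * so the caller can bail out instead of touching a pgdat that is
 * being removed.
 */
struct sketch_pgdat *sketch_node_data_checked(int nid)
{
	if (!sketch_node_online(nid))
		return NULL;
	return &sketch_nodes[nid];
}
```

Callers then treat a NULL return as "node gone" and take their fallback path.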


If my idea is a bad approach, the other approaches I can think of are...
- read_write_lock(). (It shouldn't be used...)
- collect pgdats on one node (depends on performance)

If you have a better idea, please let me know.


Note:
- I don't add pgdat_remove_read_lock() to boot code,
because pgdat hot-removal will not work at boot time.
(But I may still have overlooked some places that must use pgdat_remove_read_lock().)


Thanks.


--
Yasunori Goto


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


cl at linux-foundation

Jul 31, 2008, 7:04 AM

Post #2 of 9
Re: [RFC:Patch: 000/008](memory hotplug) rough idea of pgdat removing

Yasunori Goto wrote:

> Current my idea is using RCU feature for waiting them.
> Because it is the least impact against reader's performance,
> and pgdat remover can wait finish of reader's access to pgdat
> which is removing by synchronize_sched().

The use of RCU disables preemption which has implications as to what can be done in a loop over nodes or zones. This would also potentially add more overhead to the page allocator hotpaths.


> If you have better idea, please let me know.

Use stop_machine()? The removal of a zone or node is a pretty rare event after all and it would avoid having to deal with rcu etc etc.


y-goto at jp

Aug 1, 2008, 2:42 AM

Post #3 of 9
Re: [RFC:Patch: 000/008](memory hotplug) rough idea of pgdat removing

> Yasunori Goto wrote:
>
> > Current my idea is using RCU feature for waiting them.
> > Because it is the least impact against reader's performance,
> > and pgdat remover can wait finish of reader's access to pgdat
> > which is removing by synchronize_sched().
>
> The use of RCU disables preemption which has implications as to
> what can be done in a loop over nodes or zones.

Yep. It's one of the (big) cons.

> This would also potentially add more overhead to the page allocator hotpaths.

Agree.

To tell the truth, before this post I tried hackbench with the 3rd patch, which
adds rcu_read_lock() in the hot path, to get a rough estimate of its impact.

%hackbench 100 process 2000

without patch.
39.93

with patch
39.99
(Both are averages of 10 runs)

I guess this result reflects the effect of disabled preemption.
So throughput doesn't look so bad, but latency would probably be worse,
as you are concerned.

Kame-san advised me to run more benchmarks that measure memory
performance. I'll do it next week.

> > If you have better idea, please let me know.
>
> Use stop_machine()? The removal of a zone or node is a pretty rare event
> after all and it would avoid having to deal with rcu etc etc.
>

I thought of that at first, but isn't the following worst case possible?


        CPU 0                               CPU 1
-------------------------------------------------------------------
__alloc_pages()

  parsing_zonelist()
        :
  enter page_reclaim()
  sleep (and remember zone)                 :
                                            :
                                    update zonelist and node_online_map
                                      with stop_machine_run()
                                    free pgdat().
                                    remove the Node electrically.

  wake up and touch remembered
  zone, but it is removed
    (Oops!!!)



Anyway, I'd be happy if there is a better way than my poor idea. :-)

Thanks for your comment.


--
Yasunori Goto



cl at linux-foundation

Aug 1, 2008, 6:51 AM

Post #4 of 9
Re: [RFC:Patch: 000/008](memory hotplug) rough idea of pgdat removing

Yasunori Goto wrote:

> I thought it at first, but are there the following worst case?

Duh. Then the use of RCU would also mean that all of reclaim must be in a rcu period. So reclaim cannot sleep anymore.




y-goto at jp

Aug 1, 2008, 5:16 PM

Post #5 of 9
Re: [RFC:Patch: 000/008](memory hotplug) rough idea of pgdat removing

> Yasunori Goto wrote:
>
> > I thought it at first, but are there the following worst case?
>
> Duh. Then the use of RCU would also mean that all of reclaim must
> be in a rcu period. So reclaim cannot sleep anymore.

I use srcu_read_lock() (the sleepable RCU lock) where the kernel must sleep for
page reclaim. So the basic idea of my patch is the following.


        CPU 0                               CPU 1
-------------------------------------------------------------------
__alloc_pages()

  rcu_read_lock() and check
    online bitmap
  parsing_zonelist()
  rcu_read_unlock()
        :
  enter page_reclaim()
  srcu_read_lock()
  parse zone/zonelist.
  sleep (and remember zone)                 :
                                            :
                                    update zonelist and node_online_map
                                      with stop_machine_run()

  wake up and touch remembered zone,
  srcu_read_unlock()
                                    synchronize_sched().
                                    free_pgdat()
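
The two timelines discussed in this thread (stop_machine only, versus a sleepable read lock held across reclaim) can be replayed with a small deterministic model. This is only a sketch: the struct, the function names and the "deferred free" bookkeeping are assumptions made for illustration, standing in for the remover blocking until sleepable readers have left.

```c
#include <stdbool.h>

enum zone_state { ZONE_PRESENT, ZONE_FREED };

struct timeline {
	enum zone_state zone;
	int sleepable_readers;	/* models srcu_read_lock() holders */
	bool free_pending;	/* remover is waiting to free the pgdat */
};

/* Remover: free at once if nobody holds the sleepable lock, else wait. */
static void remover_free_pgdat(struct timeline *t)
{
	if (t->sleepable_readers > 0)
		t->free_pending = true;
	else
		t->zone = ZONE_FREED;
}

/* Reader leaves its section; a waiting remover may now proceed. */
static void reader_unlock(struct timeline *t)
{
	t->sleepable_readers--;
	if (t->sleepable_readers == 0 && t->free_pending) {
		t->free_pending = false;
		t->zone = ZONE_FREED;
	}
}

/*
 * Replay the two-CPU timeline. Returns true if CPU 0 touched the zone
 * after it had been freed (the "Oops" case).
 */
bool replay_touches_freed_zone(bool use_sleepable_lock)
{
	struct timeline t = { ZONE_PRESENT, 0, false };
	bool oops;

	/* CPU 0: enter reclaim, remember the zone, then sleep. */
	if (use_sleepable_lock)
		t.sleepable_readers++;

	/* CPU 1: update zonelist and online map, then try to free. */
	remover_free_pgdat(&t);

	/* CPU 0: wake up and touch the remembered zone. */
	oops = (t.zone == ZONE_FREED);

	if (use_sleepable_lock)
		reader_unlock(&t);

	return oops;
}
```

With `use_sleepable_lock` false the model reproduces the use-after-free; with it true the free is held back until the sleeping reader has finished.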


Thanks.

--
Yasunori Goto



cl at linux-foundation

Aug 4, 2008, 6:25 AM

Post #6 of 9
Re: [RFC:Patch: 000/008](memory hotplug) rough idea of pgdat removing

Yasunori Goto wrote:

>>> Thanks for your comment.
>> Duh. Then the use of RCU would also mean that all of reclaim must
>> be in a rcu period. So reclaim cannot sleep anymore.
>
> I use srcu_read_lock() (sleepable rcu lock) if kernel must be sleep for
> page reclaim. So, my patch basic idea is followings.

But that introduces more overhead in __alloc_pages.


y-goto at jp

Aug 4, 2008, 11:39 PM

Post #7 of 9
Re: [RFC:Patch: 000/008](memory hotplug) rough idea of pgdat removing

> >> Duh. Then the use of RCU would also mean that all of reclaim must
> >> be in a rcu period. So reclaim cannot sleep anymore.
> >
> > I use srcu_read_lock() (sleepable rcu lock) if kernel must be sleep for
> > page reclaim. So, my patch basic idea is followings.
>
> But that introduces more overhead in __alloc_pages.

Hmmm. I think SRCU should be used only when the kernel has to sleep, and the
sleep time will be larger than SRCU's overhead.....

The following are results of unixbench and lmbench.
I suppose my patch impacts latency rather than throughput.
In these results, the 100fd select and page fault latencies of lmbench became worse.
So I can't say there is no problem in my patches.

Anyway, I'll try again to find another, lower-impact way if there is one,
and compare its benchmark results with this approach.

Bye.

------------

Unixbench
-----

Normal 2.6.27-rc1-mm1


BYTE UNIX Benchmarks (Version 4.1.0)
System -- Linux localhost.localdomain 2.6.27-rc1-mm1 #1 SMP Mon Aug 4 16:08:48 JST 2008 ia64 ia64 ia64 GNU/Linux
Start Benchmark Run: Tue Aug 5 10:24:35 JST 2008
1 interactive users.
10:24:35 up 9 min, 1 user, load average: 0.16, 0.08, 0.03
lrwxrwxrwx 1 root root 4 2008-02-25 15:48 /bin/sh -> bash
/bin/sh: symbolic link to `bash'
/dev/sda5 33792348 18360424 13687672 58% /home
Execl Throughput 2954.0 lps (29.8 secs, 3 samples)
File Read 1024 bufsize 2000 maxblocks 1211570.0 KBps (30.0 secs, 3 samples)
File Write 1024 bufsize 2000 maxblocks 281599.0 KBps (30.0 secs, 3 samples)
File Copy 1024 bufsize 2000 maxblocks 218859.0 KBps (30.0 secs, 3 samples)
File Read 256 bufsize 500 maxblocks 328725.0 KBps (30.0 secs, 3 samples)
File Write 256 bufsize 500 maxblocks 72850.0 KBps (30.0 secs, 3 samples)
File Copy 256 bufsize 500 maxblocks 57095.0 KBps (30.0 secs, 3 samples)
File Read 4096 bufsize 8000 maxblocks 3883690.0 KBps (30.0 secs, 3 samples)
File Write 4096 bufsize 8000 maxblocks 1050752.0 KBps (30.0 secs, 3 samples)
File Copy 4096 bufsize 8000 maxblocks 564703.0 KBps (30.0 secs, 3 samples)
Pipe Throughput 462027.5 lps (10.0 secs, 10 samples)
Pipe-based Context Switching 105824.3 lps (10.0 secs, 10 samples)
Process Creation 2242.9 lps (30.0 secs, 3 samples)
System Call Overhead 1320907.8 lps (10.0 secs, 10 samples)
Shell Scripts (1 concurrent) 4442.1 lpm (60.0 secs, 3 samples)
Shell Scripts (8 concurrent) 1810.0 lpm (60.0 secs, 3 samples)
Shell Scripts (16 concurrent) 1042.7 lpm (60.0 secs, 3 samples)


INDEX VALUES
TEST BASELINE RESULT INDEX

Execl Throughput 43.0 2954.0 687.0
File Copy 1024 bufsize 2000 maxblocks 3960.0 218859.0 552.7
File Copy 256 bufsize 500 maxblocks 1655.0 57095.0 345.0
File Copy 4096 bufsize 8000 maxblocks 5800.0 564703.0 973.6
Pipe Throughput 12440.0 462027.5 371.4
Pipe-based Context Switching 4000.0 105824.3 264.6
Process Creation 126.0 2242.9 178.0
Shell Scripts (8 concurrent) 6.0 1810.0 3016.7
System Call Overhead 15000.0 1320907.8 880.6
=========
FINAL SCORE 565.6



2.6.27-rc1-mm1 with my patch


BYTE UNIX Benchmarks (Version 4.1.0)
System -- Linux localhost.localdomain 2.6.27-rc1-mm1-goto-test #2 SMP Mon Aug 4 18:50:56 JST 2008 ia64 ia64 ia64 GNU/Linux
Start Benchmark Run: Mon Aug 4 20:35:11 JST 2008
1 interactive users.
20:35:11 up 1:37, 1 user, load average: 0.00, 0.29, 0.71
lrwxrwxrwx 1 root root 4 2008-02-25 15:48 /bin/sh -> bash
/bin/sh: symbolic link to `bash'
/dev/sda5 33792348 18360420 13687676 58% /home
Execl Throughput 2949.0 lps (29.7 secs, 3 samples)
File Read 1024 bufsize 2000 maxblocks 1317211.0 KBps (30.0 secs, 3 samples)
File Write 1024 bufsize 2000 maxblocks 282643.0 KBps (30.0 secs, 3 samples)
File Copy 1024 bufsize 2000 maxblocks 220360.0 KBps (30.0 secs, 3 samples)
File Read 256 bufsize 500 maxblocks 361448.0 KBps (30.0 secs, 3 samples)
File Write 256 bufsize 500 maxblocks 73172.0 KBps (30.0 secs, 3 samples)
File Copy 256 bufsize 500 maxblocks 57489.0 KBps (30.0 secs, 3 samples)
File Read 4096 bufsize 8000 maxblocks 3819448.0 KBps (30.0 secs, 3 samples)
File Write 4096 bufsize 8000 maxblocks 1026563.0 KBps (30.0 secs, 3 samples)
File Copy 4096 bufsize 8000 maxblocks 585218.0 KBps (30.0 secs, 3 samples)
Pipe Throughput 482681.7 lps (10.0 secs, 10 samples)
Pipe-based Context Switching 101437.7 lps (10.0 secs, 10 samples)
Process Creation 2237.5 lps (30.0 secs, 3 samples)
System Call Overhead 1282198.4 lps (10.0 secs, 10 samples)
Shell Scripts (1 concurrent) 4447.7 lpm (60.0 secs, 3 samples)
Shell Scripts (8 concurrent) 1812.7 lpm (60.0 secs, 3 samples)
Shell Scripts (16 concurrent) 1041.7 lpm (60.0 secs, 3 samples)


INDEX VALUES
TEST BASELINE RESULT INDEX

Execl Throughput 43.0 2949.0 685.8
File Copy 1024 bufsize 2000 maxblocks 3960.0 220360.0 556.5
File Copy 256 bufsize 500 maxblocks 1655.0 57489.0 347.4
File Copy 4096 bufsize 8000 maxblocks 5800.0 585218.0 1009.0
Pipe Throughput 12440.0 482681.7 388.0
Pipe-based Context Switching 4000.0 101437.7 253.6
Process Creation 126.0 2237.5 177.6
Shell Scripts (8 concurrent) 6.0 1812.7 3021.2
System Call Overhead 15000.0 1282198.4 854.8
=========
FINAL SCORE 566.8





LMBENCH

The first lines are results of normal 2.6.27-rc1-mm1.
The second lines are results with my patch.



L M B E N C H 3 . 0 S U M M A R Y
------------------------------------
(Alpha software, do not distribute)

Basic system parameters
------------------------------------------------------------------------------
Host OS Description Mhz tlb cache mem scal
pages line par load
bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
localhost Linux 2.6.27- ia64-linux-gnu 1600 128 1
localhost Linux 2.6.27- ia64-linux-gnu 1600 128 1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host      OS             Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
localhost Linux 2.6.27- 1600 0.03 0.23 3.12 4.45 6.73 0.27 1.75 227. 463. 2219
localhost Linux 2.6.27- 1600 0.03 0.23 3.13 4.44 6.74 0.27 1.73 207. 448. 2230

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host      OS             2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
localhost Linux 2.6.27-   11.3   11.4   11.5   11.5   12.7    11.8    14.6
localhost Linux 2.6.27-   11.5   11.4   11.5   11.6   12.8    11.9    14.7

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
localhost Linux 2.6.27- 11.3 8.464 28.3 13.4 28.7 46.
localhost Linux 2.6.27- 11.5 8.470 28.3 13.4 32.2 46.

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host      OS               0K File      10K File      Mmap  Prot    Page 100fd
                        Create Delete Create Delete Latency Fault   Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
localhost Linux 2.6.27-   15.1   13.4   45.6   25.4   24.0K 0.384 0.23850 2.804 <---!!!
localhost Linux 2.6.27-   15.8   13.3   43.0   26.0   24.1K 0.401 0.25150 2.835 <---!!!

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host      OS            Pipe AF   TCP   File   Mmap  Bcopy  Bcopy  Mem  Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
localhost Linux 2.6.27- 4814 4100 1188 2087.4  523.2  549.6  274.9 458. 523.5
localhost Linux 2.6.27- 4811 4111 1219 2090.8  523.1  549.4  276.1 458. 523.5
(END)
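
To put the flagged lmbench rows into proportion, the relative change can be worked out as below. This is just a small arithmetic sketch: the numbers are copied from the "File & VM system latencies" table in this message, while the helper names are made up for the example.

```c
#include <stdio.h>

/* Relative change from the unpatched to the patched value, in percent. */
double percent_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}

/* Print the change for the two latencies that got worse (larger is worse). */
void print_lmbench_deltas(void)
{
	printf("page fault:   %+.2f%%\n", percent_change(0.23850, 0.25150));
	printf("100fd select: %+.2f%%\n", percent_change(2.804, 2.835));
}
```

By this measure the page fault latency rises by roughly 5.5% and the 100fd select latency by roughly 1.1%.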



--
Yasunori Goto



mel at csn

Aug 5, 2008, 4:14 AM

Post #8 of 9
Re: [RFC:Patch: 000/008](memory hotplug) rough idea of pgdat removing

On (05/08/08 15:39), Yasunori Goto didst pronounce:
>
> > >> Duh. Then the use of RCU would also mean that all of reclaim must
> > >> be in a rcu period. So reclaim cannot sleep anymore.
> > >
> > > I use srcu_read_lock() (sleepable rcu lock) if kernel must be sleep for
> > > page reclaim. So, my patch basic idea is followings.
> >
> > But that introduces more overhead in __alloc_pages.
>
> Hmmm. I think SRCU should be used when kernel has to sleep, and sleep time
> will be bigger than SRCU's overhead.....
>
> The followings are results of unixbench and lmbench.
> I suppose my patch impacts lantency rather than throghput.
> In these results, 100fd select and page fault latencies of lmbench became worse.
> So I can't say there is no problem in my patches.
>
> Anyway, I'll retry to find other less impact way if there is,
> and compare benchmark results with this way.
>

Maybe I am missing something, but what is wrong with stop_machine during
memory hot-remove?

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


cl at linux-foundation

Aug 5, 2008, 10:08 AM

Post #9 of 9
Re: [RFC:Patch: 000/008](memory hotplug) rough idea of pgdat removing

Mel Gorman wrote:

> Maybe I am missing something, but what is wrong with stop_machine during
> memory hot-remove?

Reclaim can sleep while going down a zonelist. There would need to be some
form of synchronization to avoid removing a zone from the zonelist that we are
just scanning.