
Mailing List Archive: DRBD: Users

Re: monitor and graph "data transfer rate" [howto benchmark using ./dm]

 

 



Lars.Ellenberg at linbit

Aug 9, 2006, 4:04 PM

Post #1 of 2 (563 views)
Re: monitor and graph "data transfer rate" [howto benchmark using ./dm]

/ 2006-08-09 17:50:26 -0400
\ Vampire D:
> I was looking for something like this, but super easy query or command.
> Like Monty said, since it is block level most transactions may not "ramp" up
> enough to really show real throughput, but some way of "testing" what the
> devices are able to keep up would go a long way for benchmarking, scaling,
> sizing, and making sure everything keeps running "up to snuff".

in the drbd tarball in the benchmark subdir,
we have a very simple tool called dm
(I don't remember why those letters).
you can also get it from
http://svn.drbd.org/drbd/trunk/benchmark/dm.c
compile it: gcc -O2 -o dm dm.c

it is basically some variation on the "dd" tool,
but you can switch on "progress" and "throughput" output,
and you can switch on fsync() before close.
it just does sequential io.

to benchmark WRITE throughput,
you use it like this:
./dm -a 0 -b 1M -s 500M -y -m -p -o $out_file_or_device

this will print lots of 'R's (requested by "-m", 500 of them to be
exact, one for each "block" (-b) up to the requested "size" (-s)).
the first of those Rs will print very fast, if you request several Gig
you will see it "hang" for a short time every few "R"s, and finally it
will hang for quite a while (that's the fsync requested by -y).
finally it will tell you the overall throughput.

if you leave off the fsync (-y), you will get very fast writes, as long
as they fit in some of the involved caches... this would be the
"virtual" throughput seen by most processes which don't use fsync.
but these are not very useful to figure out bottlenecks in the drbd
configuration and general setup.
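
if you have no dm at hand, the fsync effect described above can be roughly
reproduced with GNU dd. this is only a sketch and not what dm does
internally: it assumes GNU dd (for bs=1M and conv=fsync), and the function
name and the scratch file are made up for illustration.

```sh
# write_test FILE COUNT_MB [fsync]
# writes COUNT_MB blocks of 1M zeros to FILE; with the third argument
# "fsync", GNU dd flushes the data to stable storage before it exits.
write_test() {
    file=$1
    count=$2
    if [ "${3:-}" = fsync ]; then
        dd if=/dev/zero of="$file" bs=1M count="$count" conv=fsync
    else
        dd if=/dev/zero of="$file" bs=1M count="$count"
    fi
}

# harmless demo on a scratch file (never point this at a device you need)
scratch=$(mktemp)
write_test "$scratch" 4 fsync
rm -f "$scratch"
```

dd reports its own throughput on stderr; comparing a run with and without
the "fsync" argument shows the same "virtual" versus real difference.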

you can tell dm where to write its data: use "-l 378G", and it will (try
to) seek 378G into the device (file would probably result in a sparse
file, which is not of particular interest). so if you have one disk of
400G, and have one partition on it using 400G, you could benchmark the
"inner" 10G, and the "outer" 10G by using different offsets here.

you will notice that the throughput differs significantly when using
inner or outer cylinders of your disks.
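
picking the offset for the far end of the device is plain arithmetic. a
small sketch, assuming sizes in whole G as in the 400G example above (the
helper name is made up, and the real device may be a bit smaller than its
nominal size, so keep the window inside it):

```sh
# tail_offset DEVICE_SIZE_G WINDOW_G
# prints the offset (with a G suffix) at which the last WINDOW_G gigabytes
# of a DEVICE_SIZE_G device start.
tail_offset() {
    echo "$(( $1 - $2 ))G"
}

tail_offset 400 10    # last 10G of a 400G device: prints 390G
```

the printed value is what you would hand to "-l"; for the first 10G, just
leave the offset off.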

example run with "-b 1M -s 2M":
RR
10.48 MB/sec (2097152 B / 00:00.190802)
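
to double check such a figure by hand, divide bytes by seconds. a small
sketch (the function name is made up); judging by the numbers in the
example run, the "MB" in the output is 1024*1024 bytes:

```sh
# mib_per_sec BYTES SECONDS
# prints the throughput in units of 1024*1024 bytes per second, two decimals.
mib_per_sec() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.2f\n", bytes / secs / 1048576 }'
}

mib_per_sec 2097152 0.190802    # figures from the example run: prints 10.48
```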

if you don't like the "R"s, leave off the -m ...

to measure local io bandwidth, you can use it directly on the lower
level device (or an equivalent dummy partition).
!!this is destructive!!
!!you will have to recreate a file system on that thing!!
./dm -a 0 -b 1M -s 500M -y -m -p -o /dev/vg00/dummy

to measure local io bandwidth including file system overhead:
./dm -a 0 -b 1M -s 500M -y -m -p -o /mnt/dummy/dummy-out

to measure drbd performance in disconnected mode:
drbdadm disconnect dummy
./dm -a 0 -b 1M -s 500M -y -m -p -o /dev/drbd9

(be prepared for some additional latency,
drbd housekeeping has to remember which
blocks are dirty now...)

... in connected mode
drbdadm connect dummy
./dm -a 0 -b 1M -s 500M -y -m -p -o /dev/drbd9

still, the first write may be considerably slower than successive runs
of the same command, since the activity log will be "hot" after the
first one (as long as the size fits in the activity log completely)

... with file system
mkfs.xfs /dev/drbd9 ; mount /dev/drbd9 /mnt/drbd9-mount-point
./dm -a 0 -b 1M -s 500M -y -m -p -o /mnt/drbd9-mount-point/dummy-out

if you want to see the effect on power usage when writing 0xff instead
of 0x00, use "-a 0xff" :)

if you want to see the effect of the drbd activity log, use a size
considerably larger than what you configured as al-extents.

maybe you want to use "watch -n1 cat /proc/drbd" at the same time,
so you can see the figures move, the pe go up, the lo go up sometimes,
the ap go up, the dw and ns increase all the time, the al increasing not
too often, finally the pe, lo, and ap fall back to zero...
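
if you would rather log single counters than stare at watch, something
like the following could pull one value out of the text. it is a sketch
that assumes the counters show up as whitespace separated "name:value"
tokens; the sample line is made up for illustration and is not real output.

```sh
# drbd_counter NAME
# reads /proc/drbd style text on stdin and prints the value of every
# "NAME:value" token it finds (one line per device that has it).
drbd_counter() {
    tr -s ' \t' '\n' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# hypothetical sample line, only to show the parsing
sample='    ns:1024 nr:0 dw:2048 dr:0 al:3 bm:0 lo:1 pe:7 ua:0 ap:2'
printf '%s\n' "$sample" | drbd_counter pe    # prints 7
```

on a real node you would feed it with "drbd_counter pe < /proc/drbd".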

if you like, you could use
watch -n1 "cat /proc/drbd ; netstat -tn | grep -e ^Proto -e ':7788\>'"
which would also show you the drbd socket buffer usage, in case 7788 is
your drbd port. if you are curious, you should run this on both nodes.

to see the effect of resync on that, you could invalidate one node
(cause a full sync), and benchmark again.
then play with the sync rate parameter.

to be somewhat more reliable,
you should repeat each command several times.
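
a minimal wrapper for the repetition could look like this (plain POSIX
shell, the function name is made up):

```sh
# repeat_runs COUNT COMMAND [ARGS...]
# runs COMMAND COUNT times and prefixes each run with its number.
repeat_runs() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'run %d: ' "$i"
        "$@"
        i=$((i + 1))
    done
}

repeat_runs 3 echo done    # stand-in command, replace with the dm call
```

for example: repeat_runs 5 ./dm -a 0 -b 1M -s 500M -y -p -o /dev/drbd9
(just as destructive as the single command).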

to benchmark READ throughput, you use
./dm -o /dev/null -b 1M -s 500M -m -p -i /dev/sdx
./dm -o /dev/null -b 1M -s 500M -m -p -i /dev/drbd9
be careful: you'll need to use a size _considerably_ larger than your
RAM, or you'll see the linux caching effects on the second usage.
of course, you could also "shrink" the caches first.
to do so, since 2.6.16, you can
echo 3 > /proc/sys/vm/drop_caches
to get clean read throughput results.
before that, you can allocate and use huge amounts of memory, like this:
perl -e '$x = "X" x (1024*1024*500)'
# would allocate and use about 1 GB, it uses about twice as much as you
# say in those brackets... use as much as you got RAM (as long as you
# have some swap available) and the caches will shrink :)
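
for the drop_caches route, note that the kernel only throws away clean
pages, so it helps to sync first. a small sketch (needs root for the real
knob; the optional argument is an addition made here only so the function
can be tried on an ordinary file):

```sh
# drop_caches [CONTROL_FILE]
# flushes dirty pages with sync, then writes 3 to the control file, which
# asks the kernel to drop page cache, dentries and inodes.
# CONTROL_FILE defaults to the real knob under /proc.
drop_caches() {
    target=${1:-/proc/sys/vm/drop_caches}
    sync
    echo 3 > "$target"
}
```

run "drop_caches" as root right before each read benchmark.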

or, even easier: you can just seek into the input device to some area
where it is unlikely to have been read before:
./dm -o /dev/null -b 1M -s 500M -m -p -k 7G -i /dev/drbd9
"-k 7G" makes it seek 7 gig into the given input "file".

you will notice here, too, that read performance varies considerably
with the "inner" and "outer" cylinders.
this can be as gross as 50MB/sec on the outer and 30MB/sec on the inner cylinders.


you can also benchmark network throughput with dm,
if you utilize netcat. e.g.,
me@x# nc -l -p 54321 -q0 >/dev/null
me@y# dm -a 0 -b 1M -s 500M -m -p -y | nc x 54321 -q0
two of them in reverse directions to see if your full duplex GigE does
what you think it should ...

...

you got the idea.

at least this is what we use to track down problems at customer clusters.
maybe sometime we script something around that, but most of the time we
like the flexibility of using the tool directly.
actually, most of the time we rather use a "data set size" of 800M to 6G...

but be prepared for a slight degradation once you cross the size of the
activity log (al-extents parameter), as then drbd has to do synchronous
updates to its meta data area for every additional 4M.
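
to estimate where that point is, multiply al-extents by the 4M per extent
mentioned above. a small sketch (the function name and the value 257 are
made up for illustration):

```sh
# al_coverage_mb AL_EXTENTS
# prints how many M the activity log covers, assuming 4M per extent.
al_coverage_mb() {
    echo $(( $1 * 4 ))
}

al_coverage_mb 257    # hypothetical al-extents value: prints 1028
```

a benchmark size well below that number stays inside the "hot" area, a
size well above it shows the degradation.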


--
: Lars Ellenberg Tel +43-1-8178292-0 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


vampired at gmail

Aug 9, 2006, 7:42 PM

Post #2 of 2 (540 views)
Re: monitor and graph "data transfer rate" [howto benchmark using ./dm] [In reply to]

wow, this is great info, thanks!

--
"Do the actors on Unsolved Mysteries ever get arrested because they look
just like the criminal they are playing?"

Christopher
