Language: en


Version: $Date:$ (ubuntu - 24/10/10)


Section: 3 (Library functions)


NAME

lmbench - benchmarking toolbox


SYNOPSIS

#include "lmbench.h"

typedef u_long        iter_t

typedef void (*benchmp_f)(iter_t iterations, void* cookie)

void  benchmp(benchmp_f initialize, benchmp_f benchmark, benchmp_f cleanup, int enough, int parallel, int warmup, int repetitions, void* cookie)

uint64        get_n()

void  milli(char *s, uint64 n)

void  micro(char *s, uint64 n)

void  nano(char *s, uint64 n)

void  mb(uint64 bytes)

void  kb(uint64 bytes)


DESCRIPTION

Creating benchmarks using the lmbench timing harness is easy. Because it is so easy to measure performance using lmbench, it is possible to quickly answer questions that arise during system design, development, or tuning. For example, image processing

Two attributes are critical for performance: latency and bandwidth. lmbench's timing harness makes it easy to measure and report results for both. Latency is usually important for frequently executed operations, and bandwidth is usually important when moving large chunks of data.

There are a number of factors to consider when building benchmarks.

The timing harness requires that the benchmarked operation be idempotent so that it can be repeated indefinitely.
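To make the idempotence requirement concrete, here is a minimal sketch that is independent of lmbench; the stack type and all function names are hypothetical. Popping from a stack is not idempotent: repeated iterations eventually empty it. Timing a push followed by a pop is idempotent, because each iteration leaves the stack exactly as it found it. The price is that the measured time covers both operations.

```c
#include <assert.h>
#include <stddef.h>

/* As in the SYNOPSIS, with u_long spelled out as unsigned long. */
typedef unsigned long iter_t;

#define STACK_CAP 64

/* Hypothetical fixed-size stack used only for this illustration. */
struct stack {
	int items[STACK_CAP];
	size_t top;
};

static void push(struct stack *s, int v)
{
	assert(s->top < STACK_CAP);
	s->items[s->top++] = v;
}

static int pop(struct stack *s)
{
	assert(s->top > 0);
	return s->items[--s->top];
}

/*
 * NOT idempotent: every iteration removes an element, so once the
 * stack is empty the next iteration fails (the assert in pop fires).
 */
void benchmark_pop_only(iter_t iterations, void *cookie)
{
	struct stack *s = cookie;
	while (iterations-- > 0)
		(void)pop(s);
}

/*
 * Idempotent: every iteration pushes and then pops, so the stack is
 * left unchanged and the loop can be repeated indefinitely.
 */
void benchmark_push_pop(iter_t iterations, void *cookie)
{
	struct stack *s = cookie;
	while (iterations-- > 0) {
		push(s, (int)iterations);
		(void)pop(s);
	}
}
```

Starting from a stack holding two elements, benchmark_push_pop can run a million iterations and the stack still holds the same two elements. benchmark_pop_only, by contrast, can run at most two.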

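The timing subsystem, benchmp, is passed up to three function pointers: initialize, benchmark, and cleanup. Some benchmarks need only one of them, benchmark. In that case the others may be passed as NULL, as the lrand48 example under USING lmbench does.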
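The following sketch is independent of lmbench. It shows the order in which benchmp is documented to invoke the three callbacks. First, initialize is called with iterations set to 0. Then, for each repetition, initialize is called, benchmark is called and timed, and cleanup is called, all with the loop's iteration count. Finally, cleanup is called with 0. toy_benchmp, struct trace, and the trace_* callbacks are hypothetical. Unlike the real benchmp, this sketch runs in a single process, ignores enough, parallel, and warmup, and uses a caller-supplied iteration count instead of calibrating one. It returns the median time per iteration in microseconds, taking the upper median when the number of repetitions is even.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Types as given in the SYNOPSIS (u_long spelled as unsigned long). */
typedef unsigned long iter_t;
typedef void (*benchmp_f)(iter_t iterations, void *cookie);

/* Microseconds elapsed between two CLOCK_MONOTONIC readings. */
static double elapsed_usecs(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) * 1e6 +
	       (double)(b->tv_nsec - a->tv_nsec) / 1e3;
}

/* qsort comparator for doubles. */
static int cmp_double(const void *x, const void *y)
{
	double a = *(const double *)x, b = *(const double *)y;
	return (a > b) - (a < b);
}

/*
 * Hypothetical single-process stand-in for benchmp(). NULL initialize
 * and cleanup pointers are allowed. Returns the median microseconds per
 * iteration, or -1.0 if the results array cannot be allocated.
 */
double toy_benchmp(benchmp_f initialize, benchmp_f benchmark,
                   benchmp_f cleanup, iter_t iterations,
                   int repetitions, void *cookie)
{
	struct timespec t0, t1;
	double *results, median;
	int i;

	assert(benchmark != NULL && iterations > 0 && repetitions > 0);
	results = malloc((size_t)repetitions * sizeof *results);
	if (results == NULL)
		return -1.0;

	if (initialize)
		initialize(0, cookie);                  /* one-time setup */
	for (i = 0; i < repetitions; i++) {
		if (initialize)
			initialize(iterations, cookie); /* untimed per-interval setup */
		clock_gettime(CLOCK_MONOTONIC, &t0);    /* plays the role of start */
		benchmark(iterations, cookie);
		clock_gettime(CLOCK_MONOTONIC, &t1);    /* plays the role of stop */
		if (cleanup)
			cleanup(iterations, cookie);    /* untimed per-interval teardown */
		results[i] = elapsed_usecs(&t0, &t1) / (double)iterations;
	}
	if (cleanup)
		cleanup(0, cookie);                     /* final teardown */

	qsort(results, (size_t)repetitions, sizeof *results, cmp_double);
	median = results[repetitions / 2];
	free(results);
	return median;
}

/* Hypothetical tracing callbacks that count each kind of call. */
struct trace {
	int init_setup, init_timed, cleanup_timed, cleanup_final;
	iter_t total_iterations;
};

void trace_init(iter_t iterations, void *cookie)
{
	struct trace *t = cookie;
	if (iterations == 0) t->init_setup++; else t->init_timed++;
}

void trace_bench(iter_t iterations, void *cookie)
{
	struct trace *t = cookie;
	t->total_iterations += iterations;
}

void trace_cleanup(iter_t iterations, void *cookie)
{
	struct trace *t = cookie;
	if (iterations == 0) t->cleanup_final++; else t->cleanup_timed++;
}
```

Starting from a zeroed struct trace t, the call toy_benchmp(trace_init, trace_bench, trace_cleanup, 1000, 5, &t) leaves t.init_setup and t.cleanup_final at 1 and the two per-interval counters at 5.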

void      benchmp(initialize, benchmark, cleanup, enough, parallel, warmup, repetitions, cookie)
measures the performance of benchmark repeatedly and reports the median result. benchmp creates parallel sub-processes (as many as the parallel argument specifies), which run benchmark concurrently. This allows lmbench to measure the system's ability to scale as the number of client processes increases. Before starting the benchmarking cycle, each sub-process calls initialize with iterations set to 0. It then calls initialize, benchmark, and cleanup several times, with iterations set to the number of iterations in the timing loop, in order to collect repetitions results. The calls to benchmark are surrounded by start and stop calls, which time how long it takes to perform the benchmarked operation iterations times. After all the benchmark results have been collected, cleanup is called with iterations set to 0 to clean up any resources that may have been allocated by initialize or benchmark. cookie is a void pointer to a block of memory that can be used to store any parameters or state needed by the benchmark.
void*  benchmp_getstate()
returns a void pointer to the lmbench-internal state used during benchmarking. Clients should not use or access this state directly; instead, it is passed to benchmp_interval.
iter_t     benchmp_interval(void* state)
returns the number of times the benchmark should execute its benchmark loop during this timing interval. It is used only by unusual benchmarks that cannot implement the benchmark body as a function that returns, such as the page fault handler. See lat_sig.c for sample usage.
uint64 get_n()
returns the number of times loop_body was executed during the timing interval.
void   milli(char *s, uint64 n)
prints the time per operation in milliseconds, labeled with the string s. n is the number of operations performed during the timing interval. It is passed as a parameter because each loop_body can contain several operations.
void   micro(char *s, uint64 n)
prints the time per operation in microseconds.
void   nano(char *s, uint64 n)
prints the time per operation in nanoseconds.
void   mb(uint64 bytes)
prints the bandwidth in megabytes per second, where bytes is the total number of bytes moved during the timing interval.
void   kb(uint64 bytes)
prints the bandwidth in kilobytes per second.
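The reporting functions take no elapsed-time argument, so they presumably use the time measured by the most recent timing run. The arithmetic behind them can be sketched as follows. The helpers below are hypothetical and take the elapsed time explicitly. They also assume 1 MB = 1024 * 1024 bytes (matching the MB macro in the bcopy example) and 1 KB = 1024 bytes. lmbench's own unit conventions and output format may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: lmbench's uint64 is a 64-bit unsigned integer type. */
typedef uint64_t uint64;

/*
 * Time per operation in microseconds. n counts operations, not loop
 * iterations: a loop body doing k operations per iteration should
 * pass k * iterations.
 */
double usecs_per_op(double elapsed_usecs, uint64 n)
{
	assert(n > 0);
	return elapsed_usecs / (double)n;
}

/* The same quantity scaled for millisecond and nanosecond reports. */
double msecs_per_op(double elapsed_usecs, uint64 n)
{
	return usecs_per_op(elapsed_usecs, n) / 1e3;
}

double nsecs_per_op(double elapsed_usecs, uint64 n)
{
	return usecs_per_op(elapsed_usecs, n) * 1e3;
}

/* Bandwidth in MB/s, taking 1 MB = 1024 * 1024 bytes (assumption). */
double mb_per_sec(uint64 bytes, double elapsed_usecs)
{
	assert(elapsed_usecs > 0.0);
	return ((double)bytes / (1024.0 * 1024.0)) / (elapsed_usecs / 1e6);
}

/* Bandwidth in KB/s, taking 1 KB = 1024 bytes (assumption). */
double kb_per_sec(uint64 bytes, double elapsed_usecs)
{
	assert(elapsed_usecs > 0.0);
	return ((double)bytes / 1024.0) / (elapsed_usecs / 1e6);
}
```

For instance, suppose a loop body performs four operations per iteration and runs 1000 iterations in 2000 microseconds. It should report n = 4000, which gives 0.5 microseconds per operation rather than 2.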

USING lmbench

Here is an example of a simple benchmark that measures the latency of the random number generator lrand48():
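#include <stdlib.h>     /* lrand48() */
#include "lmbench.h"

void
benchmark_lrand48(iter_t iterations, void* cookie) {
      while(iterations-- > 0)   /* one lrand48() call per iteration */
              lrand48();
}

int
main(int argc, char *argv[])
{
      benchmp(NULL, benchmark_lrand48, NULL, 0, 1, 0, TRIES, NULL);
      micro("lrand48()", get_n());
      return 0;
}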

Here is a simple benchmark that measures and reports the bandwidth of bcopy:

#include <stdlib.h>     /* malloc(), free(), exit() */
#include <strings.h>    /* bcopy() */
#include "lmbench.h"

#define MB (1024 * 1024)
#define SIZE (8 * MB)

struct _state {
      int size;
      char* a;
      char* b;
};

void
initialize_bcopy(iter_t iterations, void* cookie) {
      struct _state* state = (struct _state*)cookie;

      /* Skip the iterations == 0 setup call; allocate before each timed interval. */
      if (!iterations) return;
      state->a = malloc(state->size);
      state->b = malloc(state->size);
      if (state->a == NULL || state->b == NULL)
              exit(1);
}

void
benchmark_bcopy(iter_t iterations, void* cookie) {
      struct _state* state = (struct _state*)cookie;

      while(iterations-- > 0)
              bcopy(state->a, state->b, state->size);
}

void
cleanup_bcopy(iter_t iterations, void* cookie) {
      struct _state* state = (struct _state*)cookie;

      /* Free the buffers that initialize_bcopy allocated for this interval. */
      if (!iterations) return;
      free(state->a);
      free(state->b);
}

int
main(int argc, char *argv[])
{
      struct _state state;

      state.size = SIZE;
      benchmp(initialize_bcopy, benchmark_bcopy, cleanup_bcopy,
              0, 1, 0, TRIES, &state);
      mb(get_n() * state.size);
      return 0;
}

A slightly more complex version of the bcopy benchmark might measure bandwidth as a function of memory size and parallelism. The main procedure in this case might look something like this:

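int
main(int argc, char *argv[])
{
      int     size, par;
      struct _state state;

      /* Sizes from 64 bytes up to SIZE; 1, 2, 4, 8 and 16 processes. */
      for (size = 64; size <= SIZE; size <<= 1) {
              for (par = 1; par < 32; par <<= 1) {
                      state.size = size;
                      benchmp(initialize_bcopy, benchmark_bcopy,
                              cleanup_bcopy, 0, par, 0, TRIES, &state);
                      fprintf(stderr, "%d\t%d\t", size, par);  /* needs <stdio.h> */
                      /* par processes, each copying get_n() * size bytes */
                      mb(par * get_n() * state.size);
              }
      }
      return 0;
}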


There are three environment variables that can be used to modify the lmbench timing subsystem: ENOUGH, TIMING_O, and LOOP_O.
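This page does not spell out what ENOUGH, TIMING_O, and LOOP_O control or what units they use. The sketch below therefore shows only the general pattern of an environment variable overriding a built-in default. A hypothetical driver program might use it, for example, to log the settings in effect before running benchmarks. env_ulong and the fallback value are hypothetical. lmbench's own parsing of these variables is not shown here and may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read a non-negative decimal integer from the
 * environment variable `name`. Returns `fallback` if the variable is
 * unset, does not start with a digit, has trailing characters, or
 * overflows unsigned long.
 */
unsigned long env_ulong(const char *name, unsigned long fallback)
{
	const char *s;
	char *end;
	unsigned long v;

	assert(name != NULL);
	s = getenv(name);
	if (s == NULL || *s < '0' || *s > '9')
		return fallback;
	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno != 0 || *end != '\0')
		return fallback;
	return v;
}
```

With ENOUGH=5000 in the environment, env_ulong("ENOUGH", 7) returns 5000. If ENOUGH is unset or holds a value such as "50ms", the call returns the fallback, 7.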


Development of lmbench is continuing.


SEE ALSO

lmbench(8), timing(3), reporting(3), results(3).


AUTHOR

Carl Staelin and Larry McVoy

Comments, suggestions, and bug reports are always welcome.