collectl-lustre

Language: en

Version: 57253 (mandriva - 22/10/07)

Section: 1 (User commands)

Overview

The first thing to understand about lustre reporting is that in most cases, where one has configured the server(s) and just wants to monitor them, all one need do is specify -sl or -sL and collectl will do the right thing. It automatically detects the type of service(s) currently running and either records or displays the appropriate data. If you select -sl and the system doesn't have lustre installed, collectl will warn you and then disable that switch.

Controlling Which Data is Displayed

It turns out that lustre records a wealth of performance data, far more than makes sense to display all the time, so by default collectl tends to display minimal information such as bytes/operations read and written. At the client detail level lustre can differentiate this data at the filesystem and even the OST level! In order to accommodate the broadest flexibility, one is allowed to control the way data is collected/displayed via several complementary switches.

-s

As is normally the case, one can specify '-sl' for summary level data, '-sL' for detail data or combine them to get both. However, since the client detail data can actually be presented at the individual filesystem or OST level, a third option '-sLL' has been included to indicate OST level details, whereas '-sL' aggregates at the filesystem level.
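The choices above can be sketched as a small lookup. This is only an illustration: the function name lustre_subsys and the level names (summary, detail, ost, both) are hypothetical and not part of collectl; only the switch values themselves come from the text.

```shell
# Hypothetical helper: map a descriptive level name to the matching -s value.
lustre_subsys() {
    case "${1:-}" in
        summary) echo "-sl" ;;   # summary level data
        detail)  echo "-sL" ;;   # detail data, aggregated per filesystem
        ost)     echo "-sLL" ;;  # client detail data at the OST level
        both)    echo "-slL" ;;  # summary and detail combined
        *)
            echo "unknown level: '${1:-}'" >&2
            return 1
            ;;
    esac
}

# Print (rather than run) the command line for each level.
for level in summary detail ost both; do
    echo "collectl $(lustre_subsys "$level")"
done
```

The loop only prints the command lines, so it can be tried on a machine without lustre before running any of them for real.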

-O

The '-O' switch is used to provide further detail about the types of data that are to be collected/displayed. There are 4 such values that collectl cares about:
B - rpc buffer level data.
D - disk block statistics, which apply to both MDS and OSS servers. One should also note this is specific to HP SFS and this data is not available in the open source version.
M - client metadata (note that this was the default prior to collectl V1.6.2).
R - read_ahead statistics. Unlike the other options, which generate a lot of data, -OR may be used with brief mode.
As it turns out, nothing is quite as simple as it seems, and while the following case is not typical, it needs to be addressed for completeness. Since collectl allows one to collect one set of data and later display a different set, consider what happens if one were to collect multiple types of lustre data for an OST using -O with the values D and B, but then wanted to play back just the basic OST data, which is collected without specifying -O. By default, playback mode uses the settings the data was collected with, and to change the display one needs to explicitly change those settings. To meet this need, there are 3 additional values one can use with -O, namely c, o and m, to indicate that on playback one wants to see the base data. Naturally these can be combined with other valid values for -O as well for maximum flexibility.

In the spirit of letting the user display whatever they want to, collectl will allow one to select multiple values for -O and it will try to display the results appropriately. However, if you include '-oh' to minimize headers, the result will be ugly.

There are a few combinations of -s and -O that do not make sense and if you choose one, you will be told.
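As a rough sketch of how the -O values fit together, the hypothetical helper below (lustre_opts is not a collectl function) accepts only the letters listed above, B, D, M and R plus the playback values c, o and m, and builds the switch. It does not know which combinations of -s and -O collectl rejects; that check is left to collectl itself.

```shell
# Hypothetical helper: build an -O switch from the letters documented above.
# Rejects an empty value or any character outside B, D, M, R, c, o, m.
lustre_opts() {
    case "${1:-}" in
        "" | *[!BDMRcom]*)
            echo "invalid -O value: '${1:-}'" >&2
            return 1
            ;;
        *)
            echo "-O${1}"
            ;;
    esac
}

# Print (rather than run) a summary-level command with read_ahead statistics,
# the one -O value the text says may be used with brief mode.
echo "collectl -sl $(lustre_opts R)"
```

Combining values works the same way, for example lustre_opts BR yields -OBR.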

What About Playback?

As is always the case with playback, by default collectl will play back its recorded data based on the parameters selected for collection. In other words, if you specify '-OBR' in record mode, collectl will record both RPC buffer and read_ahead stats. When you play the data back, it will then display both as well. However, you also have the option of specifying -O, typically thought of as a collection-only switch, to force the output to what you'd like it to be. If you select a statistics type that hasn't been recorded, that information will be displayed, but as zeros.
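The record-then-playback flow described above might look like the following dry-run sketch. Several things here are assumptions rather than statements from this page: -f and -p are collectl's general record and playback switches, pairing -sL with -OBR is assumed to be an accepted combination, and DEMO_DIR, RAW_FILE and the file name are purely illustrative.

```shell
# Dry-run sketch: each function only prints the command it would run.
# DEMO_DIR and RAW_FILE are hypothetical; substitute your own locations.
DEMO_DIR="${DEMO_DIR:-/tmp/collectl-demo}"
RAW_FILE="${RAW_FILE:-$DEMO_DIR/example.raw.gz}"

record_cmd() {
    # Record lustre detail data with RPC buffer (B) and read_ahead (R) stats.
    echo "collectl -sL -OBR -f $DEMO_DIR"
}

playback_cmd() {
    # $1 is an optional -O value that overrides the recorded settings.
    if [ -n "${1:-}" ]; then
        echo "collectl -p $RAW_FILE -O$1"
    else
        echo "collectl -p $RAW_FILE"
    fi
}

record_cmd       # collection with both B and R
playback_cmd     # playback using the settings recorded in the file
playback_cmd R   # playback forced to read_ahead statistics only
```

Passing a value that was never recorded, for example playback_cmd M against this recording, would correspond to the case where the data is displayed as zeros.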

Recognizing Service Configuration Changes

In some cases lustre services may change after collectl starts. This includes services starting and stopping as well as the configurations of those services themselves changing. For example, one might occasionally mount/umount different lustre filesystems on a client. Not to worry. Collectl periodically checks for configuration changes and automatically adjusts the data it collects as well as anything it may be currently displaying. If you know that the configuration will be limited to only 1 or 2 possible services, you can reduce the overhead of checking for those services by specifying a finite list with -L. However, in most cases this extra overhead is not enough to make it worth bothering with.

The frequency at which collectl checks for configuration changes is controlled by the variable 'LustreConfigInt' in 'collectl.conf' and so can easily be overridden, but this typically shouldn't be necessary. It is also possible to specify this monitoring frequency via -L when collectl is started. One should note that the overhead of monitoring state changes is related to the complexity of the server and has been observed to be less than 0.1 percent when checked every 10 seconds on a minimally configured server. However, since configuration changes are infrequent, one should avoid monitoring at this frequency unless it's really deemed necessary.

Changing the Default Recording/Display Behavior

There are times when you want specific control over what data is recorded or displayed rather than the default behavior. This is typically the case when a system is playing multiple roles by providing more than one service. For example, if a system has been configured as both an OST and a client, every time you run collectl you will collect or display data about both and sometimes this is NOT what you want. There may be other times where you have developed some reports or graphs that expect data in a standard format and you've collected a subset (or superset) of data.

To override this behavior for the lustre portion of the data (remember you can control the displaying of individual subsystems with -s), use -L to specify the types of services you're interested in and collectl will only pay attention to those, both for recording to a file and for display. When in recording mode, this will also limit the types of configuration changes collectl will watch for. One should also note that with -L it is possible to collect data for some services and later display data for a different set. Naturally, when displaying data for services you never collected data on, those services will print as zeros.

If all this sounds confusing, just experiment with various combinations of -s, -L and -O and observe the behavior.

Known Problems

In the process of enhancing lustre data collection in V1.5.0, it was discovered that collectl was not as robust as it should have been with respect to the identification of data files. As a result, -sLL may capture erroneous data which cannot be played back with either the version that collected it or any newer versions.

The number of reads/writes for lustre client side data is wrong! The number of KBs read and written when using the -sLL switch is reported as zero. This is a problem with lustre and has been reported.

AUTHOR

Copyright 2003-2007 Hewlett-Packard Development Company, LP. collectl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the source kit.