mk-archiver

Language: en

Version: 2008-12-29 (fedora - 04/07/09)

Section: 1 (User commands)

NAME

mk-archiver - Archive rows from a MySQL table into another table or a file.

SYNOPSIS

  mk-archiver --source h=oltp_server,D=test,t=tbl --dest h=olap_server \
     --file '/var/log/archive/%Y-%m-%d-%D.%t' --limit 1000 --commit-each
 
 

DESCRIPTION

mk-archiver is the tool I use to archive tables as described in <http://www.xaprb.com/blog/2006/05/02/how-to-write-efficient-archiving-and-purging-jobs-in-sql/>. The goal is a low-impact, forward-only job to nibble old data out of the table without impacting OLTP queries much. You can insert the data into another table, which need not be on the same server. You can also write it to a file in a format suitable for LOAD DATA INFILE. Or you can do neither, in which case it's just an incremental DELETE.

mk-archiver is extensible via a plugin mechanism. You can inject your own code to add advanced archiving logic that could be useful for archiving dependent data, applying complex business rules, or building a data warehouse during the archiving process.

You need to choose values carefully for some options. The most important are ``--limit'', ``--retries'', and ``--txnsize''.

The strategy is to find the first row(s), then scan some index forward-only to find more rows efficiently. Each subsequent query should not scan the entire table; it should seek into the index, then scan until it finds more archivable rows. Specifying the index with the 'i' part of the ``--source'' argument can be crucial for this; use ``--test'' to examine the generated queries and be sure to EXPLAIN them to see if they are efficient (most of the time you probably want to scan the PRIMARY key, which is the default). Even better, profile mk-archiver with mk-query-profiler and make sure it is not scanning the whole table every query.

You can disable the seek-then-scan optimizations partially with ``--ascendfirst'' or wholly with ``--noascend''. Sometimes this may be more efficient for multi-column keys.

ERROR-HANDLING

mk-archiver tries to catch signals and exit gracefully; for example, if you send it SIGTERM, or press Ctrl-C (which sends SIGINT on UNIX-ish systems), it will catch the signal, print a message about the signal, and exit fairly normally. It will not execute ``--analyze'' or ``--optimize'', because these may take a long time to finish. It will run all other code normally, including calling after_finish() on any plugins (see ``EXTENDING'').

In other words, a signal, if caught, will break out of the main archiving loop and skip optimize/analyze.

DOWNLOADING

You can download Maatkit from Google Code at <http://code.google.com/p/maatkit/>, or you can get any of the tools easily with a command like the following:
    wget http://www.maatkit.org/get/toolname
    or
    wget http://www.maatkit.org/trunk/toolname
 
 

Where "toolname" can be replaced with the name (or fragment of a name) of any of the Maatkit tools. Once downloaded, they're ready to run; no installation is needed. The first URL gets the latest released version of the tool, and the second gets the latest trunk code from Subversion.

OPTIONS

Some options are negatable by specifying them in their long form with a --no prefix.
--analyze
Runs ANALYZE TABLE after finishing. The argument is an arbitrary string. If it contains the letter 's', the source will be analyzed. If it contains 'd', the destination will be analyzed. You can specify either or both. For example, the following will analyze both:
   --analyze=ds
 
 

See <http://dev.mysql.com/doc/en/analyze-table.html> for details on ANALYZE TABLE.

This option's short form used to be -A, but that conflicted with ``--charset'' so I changed it to -Z.

--ascendfirst
If you do want to use the ascending index optimization (see ``--noascend''), but do not want to incur the overhead of ascending a large multi-column index, you can use this option to tell mk-archiver to ascend only the leftmost column of the index. This can provide a significant performance boost over not ascending the index at all, while avoiding the cost of ascending the whole index.

See ``EXTENDING'' for a discussion of how this interacts with plugins.

--askpass
Prompt for a password when connecting to MySQL.
--buffer
Disables autoflushing to ``--file'' and flushes ``--file'' to disk only when a transaction commits. This typically means the file is block-flushed by the operating system, so there may be some implicit flushes to disk between commits as well. The default is to flush ``--file'' to disk after every row.

The danger is that a crash might cause lost data.

The performance increase I have seen from using ``--buffer'' is around 5 to 15 percent. Your mileage may vary.

--bulkdel
Delete each chunk of rows in bulk with a single "DELETE" statement. The statement deletes every row between the first and last row of the chunk, inclusive. It implies ``--commit-each'', since it would be a bad idea to "INSERT" rows one at a time and commit them before the bulk "DELETE".

The normal method is to delete every row by its primary key. Bulk deletes might be a lot faster. They also might not be faster if you have a complex "WHERE" clause.

This option completely defers all "DELETE" processing until the chunk of rows is finished. If you have a plugin on the source, its "before_delete" method will not be called. Instead, its "before_bulk_delete" method is called later.

WARNING: if you have a plugin on the source that sometimes doesn't return true from "is_archivable()", you should use this option only if you understand what it does. If the plugin instructs "mk-archiver" not to archive a row, it'll still be deleted by the bulk delete!

--bulkins
Insert each chunk of rows with "LOAD DATA LOCAL INFILE". This may be much faster than inserting a row at a time with "INSERT" statements. It is implemented by creating a temporary file for each chunk of rows, and writing the rows to this file instead of inserting them. When the chunk is finished, it uploads the rows.

To protect the safety of your data, this option forces bulk deletes to be used. It would be unsafe to delete each row as it is found, before inserting the rows into the destination first. Forcing bulk deletes guarantees that the deletion waits until the insertion is successful.

The ``--lpins'', ``--replace'', and ``--ignore'' options work with this option, but ``--delayedins'' does not.

--charset
Enables character set settings in Perl and MySQL. If the value is "utf8", sets Perl's binmode on STDOUT to utf8, passes the "mysql_enable_utf8" option to DBD::mysql, and runs "SET NAMES UTF8" after connecting to MySQL. Any other value sets binmode on STDOUT without the utf8 layer, and runs "SET NAMES" after connecting to MySQL.
--chkcols
Enabled by default; causes mk-archiver to check that the source and destination tables have the same columns. It does not check column order, data type, etc. It just checks that all columns in the source exist in the destination and vice versa. If there are any differences, mk-archiver will exit with an error.
--columns
Specify a comma-separated list of columns to fetch, write to the file, and insert into the destination table. If specified, mk-archiver ignores other columns unless it needs to add them to the "SELECT" statement for ascending an index or deleting rows. It fetches and uses these extra columns internally, but does not write them to the file or to the destination table. It does pass them to plugins.

See also ``--pkonly''.

--commit-each
Commits transactions and flushes ``--file'' after each set of rows has been archived, before fetching the next set of rows, and before sleeping if ``--sleep'' is specified. Disables ``--txnsize''; use ``--limit'' to control the transaction size with ``--commit-each''.

This option is useful as a shortcut to make ``--limit'' and ``--txnsize'' the same value, but more importantly it avoids transactions being held open while searching for more rows. For example, imagine you are archiving old rows from the beginning of a very large table, with ``--limit'' 1000 and ``--txnsize'' 1000. After some period of finding and archiving 1000 rows at a time, mk-archiver finds the last 999 rows and archives them, then executes the next SELECT to find more rows. This scans the rest of the table, but never finds any more rows. It has held open a transaction for a very long time, only to determine it is finished anyway. You can use ``--commit-each'' to avoid this.
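The scenario above can be traced with a small simulation. This is only an illustrative Python sketch, not mk-archiver's code: the function name and its bookkeeping are assumptions made for the example. It records how many archived rows are still uncommitted each time a SELECT starts.

```python
def uncommitted_at_each_select(total_rows, limit, txnsize=1, commit_each=False):
    """For every SELECT issued, report how many archived rows were still
    sitting in an open transaction when that SELECT started.
    (Hypothetical helper; a model of the behavior described in the text.)"""
    remaining = total_rows
    uncommitted = 0
    held_open = []
    while True:
        held_open.append(uncommitted)   # a SELECT starts here
        batch = min(limit, remaining)
        if batch == 0:
            break                       # no more rows: the job is finished
        for _ in range(batch):
            remaining -= 1
            uncommitted += 1
            # --txnsize: commit once this many rows have been processed
            # (a value of 0 behaves like autocommit in this model)
            if not commit_each and uncommitted >= txnsize:
                uncommitted = 0
        if commit_each:
            uncommitted = 0             # --commit-each: commit after every set
    return held_open


# 1999 archivable rows, --limit 1000, --txnsize 1000:
print(uncommitted_at_each_select(1999, 1000, txnsize=1000))
# Same job with --commit-each:
print(uncommitted_at_each_select(1999, 1000, commit_each=True))
```

In this model the final, fruitless SELECT runs with 999 uncommitted rows when only ``--txnsize'' is used, and with none when ``--commit-each'' is used.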

--delayedins
Adds the DELAYED modifier to INSERT or REPLACE statements. See <http://dev.mysql.com/doc/en/insert.html> for details.
--dest
This item specifies a table into which mk-archiver will insert rows archived from ``--source''. It uses the same key=val argument format as ``--source''. Most missing values default to the same values as ``--source'', so you don't have to repeat options that are the same in ``--source'' and ``--dest''. Use the ``--help'' option to see which values are copied from ``--source''.
--file
Filename to write archived rows to. A subset of MySQL's DATE_FORMAT() formatting codes is allowed in the filename, as follows:
    %d    Day of the month, numeric (01..31)
    %H    Hour (00..23)
    %i    Minutes, numeric (00..59)
    %m    Month, numeric (01..12)
    %s    Seconds (00..59)
    %Y    Year, numeric, four digits
 
 

You can use the following extra format codes too:

    %D    Database name
    %t    Table name
 
 

Example:

    --file '/var/log/archive/%Y-%m-%d-%D.%t'
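To make the format codes concrete, here is a minimal Python sketch of how such a template could be expanded. It is not how mk-archiver is implemented (the tool is written in Perl); the function name and its arguments are assumptions chosen for illustration.

```python
import re
from datetime import datetime


def expand_archive_filename(template, database, table, now):
    """Expand the documented --file format codes in a filename template.
    (Hypothetical helper, for illustration only.)"""
    values = {
        "d": f"{now.day:02d}",     # day of the month, 01..31
        "H": f"{now.hour:02d}",    # hour, 00..23
        "i": f"{now.minute:02d}",  # minutes, 00..59
        "m": f"{now.month:02d}",   # month, 01..12
        "s": f"{now.second:02d}",  # seconds, 00..59
        "Y": f"{now.year:04d}",    # four-digit year
        "D": database,             # database name
        "t": table,                # table name
    }
    # Only the documented codes are replaced; anything else is left alone.
    return re.sub(r"%([dHimsYDt])", lambda m: values[m.group(1)], template)


print(expand_archive_filename(
    "/var/log/archive/%Y-%m-%d-%D.%t", "test", "tbl",
    datetime(2008, 12, 29, 7, 18, 53)))
```

With the database "test", the table "tbl", and the date shown, the template from the example expands to /var/log/archive/2008-12-29-test.tbl.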
 
 

The file's contents are in the same format used by SELECT INTO OUTFILE, as documented in the MySQL manual: rows terminated by newlines, columns terminated by tabs, NULL values represented by \N, and special characters escaped by \. This lets you reload a file with LOAD DATA INFILE's default settings.
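As an illustration of that file format, the following Python sketch formats one row the way the paragraph describes. The function name is an assumption, and the sketch only covers the rules stated above (tab, newline, and backslash escaping, plus \N for NULL).

```python
import re


def format_row(row):
    """Format one row for a tab-delimited file loadable by LOAD DATA INFILE
    with default settings. None stands in for SQL NULL.
    (Hypothetical helper, for illustration only.)"""
    fields = []
    for value in row:
        if value is None:
            fields.append("\\N")  # NULL is written as backslash-N
        else:
            # Prefix tabs, newlines, and backslashes with a backslash.
            fields.append(re.sub(r"([\t\n\\])", r"\\\1", str(value)))
    return "\t".join(fields) + "\n"


# repr() makes the tabs and escapes visible.
print(repr(format_row([1, None, "a\tb"])))
```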

If you want a column header at the top of the file, see ``--header''. The file is auto-flushed by default; see ``--buffer''.

--forupdate
Adds the FOR UPDATE modifier to SELECT statements. For details, see <http://dev.mysql.com/doc/en/innodb-locking-reads.html>.
--header
Writes column names as the first line in the file given by ``--file''. If the file exists, does not write headers; this keeps the file loadable with LOAD DATA INFILE in case you append more output to it.
--help
Displays a help message.
--hpselect
Adds the HIGH_PRIORITY modifier to SELECT statements. See <http://dev.mysql.com/doc/en/select.html> for details.
--ignore
Causes INSERTs into ``--dest'' to be INSERT IGNORE.
--limit
Limits the number of rows returned by the SELECT statements that retrieve rows to archive. Default is one row. It may be more efficient to increase the limit, but be careful if you are archiving sparsely, skipping over many rows; this can potentially cause more contention with other queries, depending on the storage engine, transaction isolation level, and options such as ``--forupdate''.
--local
Adds the NO_WRITE_TO_BINLOG modifier to ANALYZE and OPTIMIZE queries. See ``--analyze'' for details.
--lpdel
Adds the LOW_PRIORITY modifier to DELETE statements. See <http://dev.mysql.com/doc/en/delete.html> for details.
--lpins
Adds the LOW_PRIORITY modifier to INSERT or REPLACE statements. See <http://dev.mysql.com/doc/en/insert.html> for details.
--noascend
The default ascending-index optimization causes "mk-archiver" to optimize repeated "SELECT" queries so they seek into the index where the previous query ended, then scan along it, rather than scanning from the beginning of the table every time. This is enabled by default because it is generally a good strategy for repeated accesses.

Large, multiple-column indexes may cause the WHERE clause to be complex enough that this could actually be less efficient. Consider for example a four-column PRIMARY KEY on (a, b, c, d). The WHERE clause to start where the last query ended is as follows:

    WHERE (a > ?)
       OR (a = ? AND b > ?)
       OR (a = ? AND b = ? AND c > ?)
       OR (a = ? AND b = ? AND c = ? AND d >= ?)
 
 

Populating the placeholders with values uses memory and CPU, adds network traffic and parsing overhead, and may make the query harder for MySQL to optimize. A four-column key isn't a big deal, but a ten-column key in which every column allows "NULL" might be.
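The clause shown above follows a regular pattern, which this Python sketch reproduces for any list of index columns. It is an illustration only: the function name is an assumption, and it ignores the extra handling that columns allowing "NULL" would need. The "inclusive" flag switches the last comparison between ``greater than or equal to'' and ``strictly greater than''.

```python
def ascending_where(cols, inclusive=True):
    """Build a WHERE clause that resumes an index scan where the previous
    query ended. (Hypothetical helper; does not handle NULLable columns.)"""
    if not cols:
        raise ValueError("at least one index column is required")
    clauses = []
    for i, col in enumerate(cols):
        is_last = i == len(cols) - 1
        op = ">=" if (is_last and inclusive) else ">"
        # All columns to the left must match the last row exactly;
        # the current column must move past it.
        parts = [f"{c} = ?" for c in cols[:i]] + [f"{col} {op} ?"]
        clauses.append("(" + " AND ".join(parts) + ")")
    return "WHERE " + "\n   OR ".join(clauses)


print(ascending_where(["a", "b", "c", "d"]))
```

For n columns the clause needs 1 + 2 + ... + n placeholders, which is ten for the four-column key and fifty-five for a ten-column key.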

Ascending the index might not be necessary if you know you are simply removing rows from the beginning of the table in chunks, but not leaving any holes, so starting at the beginning of the table is actually the most efficient thing to do.

See also ``--ascendfirst''. See ``EXTENDING'' for a discussion of how this interacts with plugins.

--nodelete
Causes "mk-archiver" not to delete rows after processing them. This disallows ``--noascend'', because enabling them both would cause an infinite loop.

If there is a plugin on the source DSN, its "before_delete" method is called anyway, even though "mk-archiver" will not execute the delete. See ``EXTENDING'' for more on plugins.

--optimize
Runs OPTIMIZE TABLE after finishing. See ``--analyze'' for the option syntax and <http://dev.mysql.com/doc/en/optimize-table.html> for details on OPTIMIZE TABLE.
--pkonly
A shortcut for specifying ``--columns'' with the primary key columns. This is more efficient if you just want to purge rows; it avoids fetching the entire row, when only the primary key columns are needed for "DELETE" statements. See also ``--purge''.
--plugin
Specify the Perl module name of a general-purpose plugin. It is currently used only for statistics (see ``--statistics'') and must have "new()" and a "statistics()" method.

The "new( src => $src, dst => $dst, opts => \%opts )" method gets the source and destination DSNs, and their database connections, just like the connection-specific plugins do. It also gets a hashref of command-line options.

The "statistics(\%stats, $time)" method gets a hashref of the statistics collected by the archiving job, and the time the whole job started.

--progress
Prints current time, elapsed time, and rows archived every X rows.
--purge
Allows archiving without a ``--file'' or ``--dest'' argument, which is effectively a purge since the rows are just deleted.

If you just want to purge rows, consider specifying the table's primary key columns with ``--pkonly''. This will prevent fetching all columns from the server for no reason.

--quickdel
Adds the QUICK modifier to DELETE statements. See <http://dev.mysql.com/doc/en/delete.html> for details. As stated in the documentation, in some cases it may be faster to use DELETE QUICK followed by OPTIMIZE TABLE. You can use ``--optimize'' for this.
--quiet
Suppresses normal output, including the output of ``--statistics'', but doesn't suppress the output from ``--whyquit''.
--replace
Causes INSERTs into ``--dest'' to be written as REPLACE.
--retries
Specifies the number of times mk-archiver should retry when there is an InnoDB lock wait timeout or deadlock. When retries are exhausted, mk-archiver will exit with an error.
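The general shape of such a retry policy can be sketched in Python as follows. This is not mk-archiver's code: the exception class, the function name, and the exact counting of attempts are assumptions. The two error numbers are MySQL's codes for a lock wait timeout (1205) and a deadlock (1213).

```python
LOCK_WAIT_TIMEOUT = 1205  # MySQL ER_LOCK_WAIT_TIMEOUT
DEADLOCK = 1213           # MySQL ER_LOCK_DEADLOCK
RETRYABLE = {LOCK_WAIT_TIMEOUT, DEADLOCK}


class MySQLError(Exception):
    """Stand-in for a database driver error (hypothetical)."""

    def __init__(self, errno, message=""):
        super().__init__(message)
        self.errno = errno


def run_with_retries(action, retries):
    """Call action(); on a lock wait timeout or deadlock, try again up to
    `retries` more times, then give up by re-raising the error.
    (Assumes `retries` counts attempts after the first failure.)"""
    attempts_left = retries
    while True:
        try:
            return action()
        except MySQLError as err:
            if err.errno not in RETRYABLE or attempts_left <= 0:
                raise  # not retryable, or retries exhausted
            attempts_left -= 1
```

Any other error is raised immediately, without consuming a retry.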

Consider carefully what you want to happen when you are archiving between a mixture of transactional and non-transactional storage engines. The INSERT to ``--dest'' and DELETE from ``--source'' are on separate connections, so they do not actually participate in the same transaction even if they're on the same server. However, mk-archiver implements simple distributed transactions in code, so commits and rollbacks should happen as desired across the two connections.

At this time I have not written any code to handle errors with transactional storage engines other than InnoDB. Request that feature if you need it.

--safeautoinc
Adds an extra WHERE clause to prevent mk-archiver from removing the newest row when ascending a single-column AUTO_INCREMENT key. This guards against re-using AUTO_INCREMENT values if the server restarts, and is enabled by default.

The extra WHERE clause contains the maximum value of the auto-increment column as of the beginning of the archive or purge job. If new rows are inserted while mk-archiver is running, it will not see them.

--sentinel
The presence of the file specified by ``--sentinel'' will cause mk-archiver to stop archiving and exit. The default is /tmp/mk-archiver-sentinel. You might find this handy to stop cron jobs gracefully if necessary. See also ``--stop''.
--setvars
Specify any variables you want to be set immediately after connecting to MySQL. These will be included in a "SET" command.
--sharelock
Adds the LOCK IN SHARE MODE modifier to SELECT statements. For details, see <http://dev.mysql.com/doc/en/innodb-locking-reads.html>.
--skipfkchk
Disables foreign key checks with SET FOREIGN_KEY_CHECKS=0.
--sleep
Specifies how long to sleep between SELECT statements. Default is not to sleep at all. Transactions are NOT committed, and the ``--file'' file is NOT flushed, before sleeping. See ``--txnsize'' to control that.

If ``--commit-each'' is specified, committing and flushing happens before sleeping.

--source
Specifies a table to archive from. This argument is specially formatted as a key=value,key=value string. Keys are a single letter. Most options control how mk-archiver connects to MySQL:
    KEY MEANING
    === =======
    h   Connect to host
    P   Port number to use for connection
    S   Socket file to use for connection
    u   User for login if not current user
    p   Password to use when connecting
    F   Only read default options from the given file
 
 

The following options select a table to archive:

    KEY MEANING
    === =======
    D   Database to archive
    t   Table to archive
    i   Index to use
 
 

The following options specify pluggable actions, which an external Perl module can provide:

    KEY MEANING
    === =======
    m   Package name of an external Perl module (see EXTENDING).
 
 

The following actions set other options:

    KEY MEANING
    === =======
    a   Database to set as the connection's default with USE
    b   Disable binary logging with SET SQL_LOG_BIN=0
 
 

The only required part is the table; other parts may be read from various places in the environment (such as options files). Here is an example:

    --source h=my_server,D=my_database,t=my_tbl
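For readers who want to see the structure of such an argument, here is a minimal Python sketch that splits it into its parts. The function name is an assumption, and the sketch makes simplifying assumptions that the real tool may not share: values contain no commas, and every part is written as key=value.

```python
KNOWN_KEYS = set("hPSupFDtimab")  # the single-letter keys listed above


def parse_source(dsn):
    """Split a key=value,key=value string into a dict and check that a
    table ('t') was given. (Hypothetical helper, for illustration only.)"""
    parts = {}
    for item in dsn.split(","):
        key, sep, value = item.partition("=")
        if not sep or key not in KNOWN_KEYS:
            raise ValueError(f"bad DSN part: {item!r}")
        parts[key] = value
    if "t" not in parts:
        raise ValueError("the table (t) part is required")
    return parts


print(parse_source("h=my_server,D=my_database,t=my_tbl"))
```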
 
 

The 'i' part deserves special mention. This tells mk-archiver which index it should scan to archive. This appears in a FORCE INDEX or USE INDEX hint in the SELECT statements used to fetch archivable rows. If you don't specify anything, mk-archiver will auto-discover a good index, preferring a "PRIMARY KEY" if one exists. In my experience this usually works well, so most of the time you can probably just omit the 'i' part.

The index is used to optimize repeated accesses to the table; mk-archiver remembers the last row it retrieves from each SELECT statement, and uses it to construct a WHERE clause, using the columns in the specified index, that should allow MySQL to start the next SELECT where the last one ended, rather than potentially scanning from the beginning of the table with each successive SELECT. If you are using external plugins, please see ``EXTENDING'' for a discussion of how they interact with ascending indexes.

The 'a' and 'b' options allow you to control how statements flow through the binary log. If you specify the 'b' option, binary logging will be disabled on the specified connection. If you specify the 'a' option, the connection will "USE" the specified database, which you can use to prevent slaves from executing the binary log events with "--replicate-ignore-db" options. These two options can be used as different methods to achieve the same goal: archive data off the master, but leave it on the slave. For example, you can run a purge job on the master and prevent it from happening on the slave using your method of choice.

--statistics
Causes mk-archiver to collect timing statistics about what it does. These statistics are available to the plugin specified by ``--plugin''.

Unless you specify ``--quiet'', "mk-archiver" prints the statistics when it exits. The statistics look like this:

  Started at 2008-07-18T07:18:53, ended at 2008-07-18T07:18:53
  Source: D=db,t=table
  SELECT 4
  INSERT 4
  DELETE 4
  Action         Count       Time        Pct
  commit            10     0.1079      88.27
  select             5     0.0047       3.87
  deleting           4     0.0028       2.29
  inserting          4     0.0028       2.28
  other              0     0.0040       3.29
 
 

The first two (or three) lines show times and the source and destination tables. The next three lines show how many rows were fetched, inserted, and deleted.

The remaining lines show counts and timing. The columns are the action, the total number of times that action was timed, the total time it took, and the percent of the program's total runtime. The rows are sorted in order of descending total time. The last row is the rest of the time not explicitly attributed to anything. Actions will vary depending on command-line options.

If ``--whyquit'' is also given, its behavior changes slightly: mk-archiver prints the reason for exiting even when it's just because there are no more rows.

This option requires the standard Time::HiRes module, which is part of core Perl on reasonably new Perl releases.

--stop
Causes mk-archiver to create the sentinel file specified by ``--sentinel'' and exit. This should have the effect of stopping all running instances which are watching the same sentinel file.
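The sentinel mechanism amounts to checking for a file between units of work. The following Python sketch shows the idea; the function names are assumptions, and where exactly mk-archiver performs the check is not specified here.

```python
import os

DEFAULT_SENTINEL = "/tmp/mk-archiver-sentinel"  # the documented default path


def request_stop(sentinel=DEFAULT_SENTINEL):
    """What --stop is described as doing: create the sentinel file."""
    with open(sentinel, "a"):
        pass


def should_stop(sentinel=DEFAULT_SENTINEL):
    """A running job stops once the sentinel file exists."""
    return os.path.exists(sentinel)


def archive_chunks(chunks, sentinel=DEFAULT_SENTINEL):
    """Process chunks until they run out or the sentinel appears.
    Returns how many chunks were processed. (Hypothetical loop.)"""
    done = 0
    for _chunk in chunks:
        if should_stop(sentinel):
            break
        done += 1  # a real job would archive the chunk here
    return done
```

Because the file persists after the jobs exit, it has to be removed before they will run again.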
--test
Causes mk-archiver to exit after printing the filename and SQL statements it will use.
--time
Causes mk-archiver to stop after the specified time has elapsed.
--txnsize
Specifies the size, in number of rows, of each transaction. Default is one row. Zero disables transactions altogether. After mk-archiver processes this many rows, it commits both the ``--source'' and the ``--dest'' if given, and flushes the file given by ``--file''.

This parameter is critical to performance. If you are archiving from a live server, which for example is doing heavy OLTP work, you need to choose a good balance between transaction size and commit overhead. Larger transactions create the possibility of more lock contention and deadlocks, but smaller transactions cause more frequent commit overhead, which can be significant. To give an idea, on a small test set I worked with while writing mk-archiver, a value of 500 caused archiving to take about 2 seconds per 1000 rows on an otherwise quiet MySQL instance on my desktop machine, archiving to disk and to another table. Disabling transactions with a value of zero, which turns on autocommit, dropped performance to 38 seconds per thousand rows.

If you are not archiving from or to a transactional storage engine, you may want to disable transactions so mk-archiver doesn't try to commit.

--version
Output version information and exit.
--where
Specifies a WHERE clause to limit which rows are archived. Do not include the word WHERE. You may need to quote the argument to prevent your shell from interpreting it. For example:
    --where 'ts < current_date - interval 90 day'
 
 

For safety, ``--where'' is required. If you do not require a WHERE clause, use ``--where'' 1=1.

--whyquit
Causes mk-archiver to print a message if it exits for any reason other than running out of rows to archive. This can be useful if you have a cron job with ``--time'' specified, for example, and you want to be sure mk-archiver is finishing before running out of time.

If ``--statistics'' is given, the behavior is changed slightly. It will print the reason for exiting even when it's just because there are no more rows.

This output prints even if ``--quiet'' is given. That's so you can put "mk-archiver" in a "cron" job and get an email if there's an abnormal exit.

EXTENDING

mk-archiver is extensible by plugging in external Perl modules to handle some logic and/or actions. You can specify a module for both the ``--source'' and the ``--dest'', with the 'm' part of the specification. For example:
    --source D=test,t=test1,m=My::Module1 --dest m=My::Module2,t=test2
 
 

This will cause mk-archiver to load the My::Module1 and My::Module2 packages, create instances of them, and then make calls to them during the archiving process.

You can also specify a plugin with ``--plugin''.

The module must provide this interface:

new(dbh => $dbh, db => $db_name, tbl => $tbl_name)
The plugin's constructor is passed a reference to the database handle, the database name, and table name. The plugin is created just after mk-archiver opens the connection, and before it examines the table given in the arguments. This gives the plugin a chance to create and populate temporary tables, or do other setup work.
before_begin(cols => \@cols, allcols => \@allcols)
This method is called just before mk-archiver begins iterating through rows and archiving them, but after it does all other setup work (examining table structures, designing SQL queries, and so on). This is the only time mk-archiver tells the plugin column names for the rows it will pass the plugin while archiving.

The "cols" argument is the column names the user requested to be archived, either by default or by the ``--columns'' option. The "allcols" argument is the list of column names for every row mk-archiver will fetch from the source table. It may fetch more columns than the user requested, because it needs some columns for its own use. When subsequent plugin functions receive a row, it is the full row containing all the extra columns, if any, added to the end.

is_archivable(row => \@row)
This method is called for each row to determine whether it is archivable. This applies only to ``--source''. The argument is the row itself, as an arrayref. If the method returns true, the row will be archived; otherwise it will be skipped.

Skipping a row adds complications for non-unique indexes. Normally mk-archiver uses a WHERE clause designed to target the last processed row as the place to start the scan for the next SELECT statement. If you have skipped the row by returning false from is_archivable(), mk-archiver could get into an infinite loop because the row still exists. Therefore, when you specify a plugin for the ``--source'' argument, mk-archiver will change its WHERE clause slightly. Instead of starting at ``greater than or equal to'' the last processed row, it will start ``strictly greater than.'' This will work fine on unique indexes such as primary keys, but it may skip rows (leave holes) on non-unique indexes or when ascending only the first column of an index.

"mk-archiver" will change the clause in the same way if you specify ``--nodelete'', because again an infinite loop is possible.

If you specify the ``--bulkdel'' option and return false from this method, "mk-archiver" may not do what you want. The row won't be archived, but it will be deleted, since bulk deletes operate on ranges of rows and don't know which rows the plugin selected to keep.

If you specify the ``--bulkins'' option, this method's return value will influence whether the row is written to the temporary file for the bulk insert, so bulk inserts will work as expected. However, bulk inserts require bulk deletes.

before_delete(row => \@row)
This method is called for each row just before it is deleted. This applies only to ``--source''. This is a good place for you to handle dependencies, such as deleting things that are foreign-keyed to the row you are about to delete. You could also use this to recursively archive all dependent tables.

This plugin method is called even if ``--nodelete'' is given, but not if ``--bulkdel'' is given.

before_bulk_delete(first_row => \@row, last_row => \@row)
This method is called just before a bulk delete is executed. It is similar to the "before_delete" method, except its arguments are the first and last row of the range to be deleted. It is called even if ``--nodelete'' is given.
before_insert(row => \@row)
This method is called for each row just before it is inserted. This applies only to ``--dest''. You could use this to insert the row into multiple tables, perhaps with an ON DUPLICATE KEY UPDATE clause to build summary tables in a data warehouse.

This method is not called if ``--bulkins'' is given.

before_bulk_insert(first_row => \@row, last_row => \@row)
This method is called just before a bulk insert is executed. It is similar to the "before_insert" method, except its arguments are the first and last row of the range to be inserted.
custom_sth(row => \@row, sql => $sql)
This method is called just before inserting the row, but after ``before_insert()''. It allows the plugin to specify a different "INSERT" statement if desired. The return value (if any) should be a DBI statement handle. The "sql" parameter is the SQL text used to prepare the default "INSERT" statement. This method is not called if you specify ``--bulkins''.

If no value is returned, the default "INSERT" statement handle is used.

This method applies only to the plugin specified for ``--dest'', so if your plugin isn't doing what you expect, check that you've specified it for the destination and not the source.

custom_sth_bulk(first_row => \@row, last_row => \@row, sql => $sql)
If you've specified ``--bulkins'', this method is called just before the bulk insert, but after ``before_bulk_insert()'', and the arguments are different.

This method's return value and handling are similar to those of the ``custom_sth()'' method.

after_finish()
This method is called after mk-archiver exits the archiving loop, commits all database handles, closes ``--file'', and prints the final statistics, but before mk-archiver runs ANALYZE or OPTIMIZE (see ``--analyze'' and ``--optimize'').

If you specify a plugin for both ``--source'' and ``--dest'', mk-archiver constructs, calls before_begin(), and calls after_finish() on the two plugins in the order ``--source'', ``--dest''.

mk-archiver assumes it controls transactions, and that the plugin will NOT commit or roll back the database handle. The database handle passed to the plugin's constructor is the same handle mk-archiver uses itself. Remember that ``--source'' and ``--dest'' are separate handles.

A sample module might look like this:

    package My::Module;
    
    sub new {
       my ( $class, %args ) = @_;
       return bless(\%args, $class);
    }
    
    sub before_begin {
       my ( $self, %args ) = @_;
       # Save column names for later
       $self->{cols} = $args{cols};
    }
    
    sub is_archivable {
       my ( $self, %args ) = @_;
       # Do some advanced logic with $args{row}
       return 1;
    }
    
    sub before_delete {} # Take no action
    sub before_insert {} # Take no action
    sub custom_sth    {} # Take no action
    sub after_finish  {} # Take no action
    
    1;
 
 

ENVIRONMENT

The environment variable "MKDEBUG" enables verbose debugging output in all of the Maatkit tools:
    MKDEBUG=1 mk-....
 
 

SYSTEM REQUIREMENTS

You need Perl, DBI, DBD::mysql, and some core packages that ought to be installed in any reasonably new version of Perl.

OUTPUT

If you specify ``--progress'', the output is a header row, plus status output at intervals. Each row in the status output lists the current date and time, how many seconds mk-archiver has been running, and how many rows it has archived.

If you specify ``--statistics'', "mk-archiver" outputs timing and other information to help you identify which part of your archiving process takes the most time.

BUGS

Please use Google Code Issues and Groups to report bugs or request support: <http://code.google.com/p/maatkit/>.

Please include the complete command-line used to reproduce the problem you are seeing, the version of all MySQL servers involved, the complete output of the tool when run with ``--version'', and if possible, debugging output produced by running with the "MKDEBUG=1" environment variable.

ACKNOWLEDGEMENTS

Thanks to the following people, and apologies to anyone I've omitted:

Andrew O'Brien,

COPYRIGHT, LICENSE AND WARRANTY

This program is copyright 2007-2008 Baron Schwartz. Feedback and improvements are welcome.

THIS PROGRAM IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2; OR the Perl Artistic License. On UNIX and similar systems, you can issue `man perlgpl' or `man perlartistic' to read these licenses.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.

AUTHOR

Baron Schwartz

VERSION

This manual page documents Ver 1.0.12 Distrib 2725 $Revision: 2311 $.