mk-slave-prefetch

Langue: en

Version: 2008-12-29 (fedora - 04/07/09)

Section: 1 (Commandes utilisateur)

NAME

mk-slave-prefetch - Pipeline relay logs on a MySQL slave to pre-warm caches.

SYNOPSIS

  mk-slave-prefetch
  mk-slave-prefetch --statistics > /path/to/saved/statistics
  mk-slave-prefetch /path/to/saved/statistics
 
 

DESCRIPTION

mk-slave-prefetch reads the slave's relay log slightly ahead of where the slave's SQL thread is reading, converts statements into "SELECT", and executes them. In theory, this should help alleviate the effects of the slave's single-threaded SQL execution. It will help take advantage of multiple CPUs and disks by pre-reading the data from disk, so the data is already in the cache when the slave SQL thread executes the un-modified version of the statement.

Statements that can't be converted into "SELECT" are ignored. However, there is always a chance of bugs. It would be a very good idea to connect as a read-only user. Here is an example of how to grant the necessary privileges:

    GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE ON *.*
    TO 'prefetch'@'%' IDENTIFIED BY 'sp33dmeup!';
 
 

"mk-slave-prefetch" learns how long it takes statements to execute, and doesn't try to execute those that take a very long time. You can ask it to print what it has learned after it executes. You can also specify a filename on the command line. The file should contain the statistics printed by a previous run. These will be used to pre-populate the statistics so it doesn't have to re-learn.

This program is based on concepts I heard Paul Tuckfield explain at the November 2006 MySQL Camp un-conference. However, the code is my own work. I have not seen any other implementation of Paul's idea.

DOES IT WORK?

Does it work? Does it actually speed up the slave?

That depends on your workload, hardware, and other factors. It might work when the following are true:

The slave's data is much larger than memory, and the workload is mostly randomly scattered small (single-row is ideal) changes.
There are lots of high-concurrency "UPDATE" and "DELETE" statements on the master.
The slave SQL thread is I/O-bound, but the slave overall has plenty of spare I/O capacity (definitely more than one disk spindle).
The slave uses InnoDB or another storage engine with row-level locking.

It does not speed up replication on my slaves, which mostly have large queries like "INSERT .. SELECT .. GROUP BY". In my benchmarks it seemed to make no difference at all, positive or negative.

On the wrong workload or slave configuration, this technique might actually make the slaves slower. Your mileage will vary.

User-contributed benchmarks are welcome.

DOWNLOADING

You can download Maatkit from Google Code at <http://code.google.com/p/maatkit/>, or you can get any of the tools easily with a command like the following:
    wget http://www.maatkit.org/get/toolname
    or
    wget http://www.maatkit.org/trunk/toolname
 
 

Where "toolname" can be replaced with the name (or fragment of a name) of any of the Maatkit tools. Once downloaded, they're ready to run; no installation is needed. The first URL gets the latest released version of the tool, and the second gets the latest trunk code from Subversion.

OPTIONS

--print and --daemonize are mutually exclusive.

Specify at least one of --print or --execute.

--askpass
Prompt for a password when connecting to MySQL.
--charset
short form: -A; type: string

Default character set.

Enables character set settings in Perl and MySQL. If the value is "utf8", sets Perl's binmode on STDOUT to utf8, passes the "mysql_enable_utf8" option to DBD::mysql, and runs "SET NAMES UTF8" after connecting to MySQL. Any other value sets binmode on STDOUT without the utf8 layer, and runs "SET NAMES" after connecting to MySQL.

--checkint
short form: -i; type: Array; default: 16,1,1024

How often to check the slave: init,min,max.

How many relay log events should pass before checking the output of "SHOW SLAVE STATUS". The syntax is a three-number range: initial, minimum, and maximum. You should be able to leave this at the defaults.

"mk-slave-prefetch" varies the check interval in powers of two, depending on whether it decides the check was necessary.

--daemonize
Fork and run in the background; POSIX OSes only.
--database
short form: -D; type: string

The database to use for the connection.

Connect to this database. "mk-slave-prefetch" will issue "USE" statements as required by the binary log events.

--defaults-file
short form: -F; type: string

Only read mysql options from the given file. You must give an absolute pathname.

--errors
culumative: yes; default: 0; type: int

Print queries that caused errors. If specified once, at exit; if twice, in realtime.

If you specify this option once, you will see a report at the end of the script execution, showing the normalized queries and the number of times they were seen. If you specify this option twice, you will see the errors printed out as they occur, but no normalized report at the end of execution.

--execute
short form: -x; negatable: yes; default: yes

Execute the transformed queries to warm the caches.

--host
short form: -h; type: string

Host to connect to.

--iolag
short form: -l; type: size; default: 1k

How many bytes to lag the slave I/O thread.

This helps avoid "mysqlbinlog" reading right off the end of the relay log file.

--maxquerytime
short form: -q; type: float; default: 1

Do not run queries longer than this many seconds; fractions allowed.

If "mk-slave-prefetch" predicts the query will take longer to execute, it will skip the query. This is based on the theory that pre-warming the cache is most beneficial for short queries.

"mk-slave-prefetch" learns how long queries require to execute. It keeps an average over the last ``--querysampsize'' samples of each query. The averages are based on an abstracted version of the query, with specific parameters replaced by placeholders. The result is a sort of ``fingerprint'' for the query, not executable SQL. You can see the learned statistics with the ``--statistics'' option.

You can pre-load query fingerprints, and average execution times, from a file. This way you don't have to wait for "mk-slave-prefetch" to learn all over every time you start it. Just specify the file on the command line. The format should be the same as the output from ``--statistics''.

You might also want to filter out some statements completely, or let only some statements through. See the ``--rejectregexp'' and ``--permitregexp'' options.

If "mk-slave-prefetch" hasn't seen a query's fingerprint before, and thus doesn't know how long it will take to execute, it wraps it in a subuery, like this:

    SELECT 1 FROM ( <query> ) AS X LIMIT 1;
 
 

This helps avoid fetching a lot of data back to the client when a query is very large. It requires a version of MySQL that supports subqueries (version 4.1 and newer). If yours doesn't, the subquery trick can't be used, so the query might fetch a lot of data back to the client.

Once a query's fingerprint has been seen, so it's known that the query isn't enormously slow, "mk-slave-prefetch" just rewrites the "SELECT" list for efficiency. (Avoiding the subquery reduces the query's overhead for short queries). The rewritten query will then look like the following;

    SELECT ISNULL(COALESCE(<columns>)) FROM ...
 
 
--numprefix
Abstract away numeric table name prefixes.

This causes the following two queries to ``fingeprint'' to the same thing:

   select from 1_2_users;
   select from 2_3_users;
 
 
--offset
short form: -o; type: size; default: 128

How many bytes "mk-slave-prefetch" will try to stay in front of the slave SQL thread.

It will not execute log events it doesn't think are at least this far ahead of the SQL thread. See also ``--window''.

--password
short form: -p; type: string

The password to use when connecting.

--permitregexp
type: string

Permit queries matching this Perl regexp.

This is a filter for log events. The regular expression is matched against the raw log event, before any transformations are applied. If specified, this option will permit only log events matching the regular expression.

--pid
type: string

Create the given PID file when daemonized.

For example, '--daemonize --pid /tmp/mk-slave-prefetch.pid' would cause mk-slave-prefetch to create the PID file /tmp/mk-slave-prefetch.pid.

/var/run/ is usually not writable by non-root users, therefore /tmp/ is a more reliable alternative.

The PID file is removed when the daemonized instance of mk-slave-prefetch exits.

--port
short form: -P; type: int

Port number to use for connection.

--print
Print the transformed relay log events to standard output.
--printnonrewritten
Print queries that could not be transformed into "SELECT".
--progress
type: int

Print progress information every X events.

The information is the current log file and position, plus a summary of the statistics gathered.

--querysampsize
type: int; default: 4

Average query exec time over this many queries.

The last "N" queries with a given fingerprint are averaged together to get the average query execution time (see ``--maxquerytime'').

--rejectregexp
type: string

Reject queries matching this Perl regexp.

Similar to ``--permitregexp'', but has the opposite effect: log events must not match the regular expression.

--sentinel
type: string; default: /tmp/mk-slave-prefetch-sentinel

Exit if this file exists.

--setvars
type: string; default: wait_timeout=10000

Set these MySQL variables.

Specify any variables you want to be set immediately after connecting to MySQL. These will be included in a "SET" command.

--socket
short form: -S; type: string

Socket file to use for connection.

--statistics
Print execution statistics after exiting.

The statistics are in two sections: counters, and queries. The counters simply count the number of times events occur. You may see the following counters:

    NAME                    MEANING
    ======================  =======================================
    mysqlbinlog             Executed mysqlbinlog to read log events.
    events                  The total number of relay log events.
    not_far_enough_ahead    An event was not at least --offset
                            bytes ahead of the SQL thread.
    too_far_ahead           An event was more than --offset
                            + --window bytes ahead of the SQL thread.
    too_close_to_io_thread  An event was less than --iolag bytes
                            away from the I/O thread's position.
    event_not_allowed       An event wasn't a SET, USE, INSERT,
                            UPDATE, DELETE or REPLACE query.
    event_filtered_out      An event was filtered out because of
                            --permitregexp or --rejectregexp.
    same_timestamp          A SET TIMESTAMP event was ignored because
                            it had the same timestamp as the last one.
    do_query                A transformed event was executed
                            or printed.
    query_error             An executed query had an error.
    query_too_long          An event was not executed because its
                            average query length exceeded
                            --maxquerytime.
    query_not_rewritten     A query could not be rewritten to a
                            SELECT.
    master_pos_wait         The tool waited for the SQL thread to
                            catch up.
    show_slave_status       The tool queried SHOW SLAVE STATUS.
    load_data_infile        The tool found a LOAD DATA INFILE query
                            and unlinked (deleted) the temp file.
    could_not_unlink        The tool failed to unlink a temp file.
    sleep                   The tool slept for a second because the 
                            slave's SQL thread was not running, or
                            because it read past the end of the log.
 
 

After the counters, "mk-slave-prefetch" prints information about each query fingerprint it has seen, two lines per fingerprint. The first line contains the query's fingerprint. The second line contains the number of times the fingerprint was seen, number of times executed, the sum of the execution times, and the average execution time over the last ``--querysampsize'' samples.

--stop
Stop running instances by creating the ``--sentinel'' file.
--time
short form: -t; type: time

How long "mk-slave-prefetch" should run before exiting.

The default is to run forever.

--tmpdir
type: string; default: /dev/null

Where to create temp files for "LOAD DATA INFILE" queries.

The default will cause "mysqlbinlog" to skip the file and the associated "LOAD DATA INFILE" command entirely.

If "mk-slave-prefetch" sees a "LOAD DATA INFILE" command (which it won't, if this is left at the default), it will try to remove the temporary file, then skip the event.

--user
short form: -u; type: string

User for login if not current user.

--window
short form: -w; type: size; default: 4k

The max bytes ahead of the slave "mk-slave-prefetch" should get.

Defines the window within which "mk-slave-prefetch" considers a query OK to execute. The window begins at the slave SQL thread's last known position plus ``--offset'' bytes, and extends for the specified number of bytes.

If "mk-slave-prefetch" sees a log event that is too far in the future, it will increment the "too_far_ahead" counter and wait for the slave SQL thread to catch up (which increments the "master_pos_wait" counter). If an event isn't far enough ahead of the SQL thread, it will be discarded and the "not_far_enough_ahead" counter increments.

Watching the mentioned statistics can help you understand how to tune the window. You want "mk-slave-prefetch" to run just ahead of the SQL thread, not throwing out a lot of events for being too far ahead or not far enough ahead.

SYSTEM REQUIREMENTS

You need Perl, DBI, DBD::mysql, and some core packages that ought to be installed in any reasonably new version of Perl.

ENVIRONMENT

The environment variable "MKDEBUG" enables verbose debugging output in all of the Maatkit tools:
    MKDEBUG=1 mk-....
 
 

BUGS

Please use Google Code Issues and Groups to report bugs or request support: <http://code.google.com/p/maatkit/>.

Please include the complete command-line used to reproduce the problem you are seeing, the version of all MySQL servers involved, the complete output of the tool when run with ``--version'', and if possible, debugging output produced by running with the "MKDEBUG=1" environment variable.

COPYRIGHT, LICENSE AND WARRANTY

This program is copyright 2007-2008 Baron Schwartz. Feedback and improvements are welcome.

THIS PROGRAM IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2; OR the Perl Artistic License. On UNIX and similar systems, you can issue `man perlgpl' or `man perlartistic' to read these licenses.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.

AUTHOR

Baron Schwartz.

VERSION

This manual page documents Ver 1.0.5 Distrib 2725 $Revision: 2311 $.