Transient errors
Some distributed database clusters make use of
transient errors. A transient error is a temporary error that is
likely to disappear soon. By definition, it is safe for a client to
ignore a transient error and retry the failed operation on the same
database server. The retry is free of side effects. Clients are not
forced to abort their work or to fail over to another database
server immediately. They may enter a retry loop to wait for the
error to disappear before giving up on the database server.
Transient errors occur, for example, with MySQL Cluster, but they
are not tied to any specific clustering solution.
PECL/mysqlnd_ms can perform an automatic
retry loop in case of a transient error. This increases
distribution transparency and thus makes it easier to migrate an
application running on a single database server to run on a cluster
of database servers without having to change the source code of the
application.
The automatic retry loop will repeat the requested
operation up to a user-configurable number of times and pause
between the attempts for a configurable amount of time. If the
error disappears during the loop, the application will never see
it. If not, the error is forwarded to the application for
handling.
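The loop described above can be modelled in a few lines of userland PHP. The sketch below is hypothetical: run_with_transient_retries() and its parameters are illustrative names, not part of PECL/mysqlnd_ms, and a database operation is simulated by a callable that returns a MySQL error code (0 for success). It assumes, as this page states, that the pause between attempts is given in milliseconds.

```php
<?php
// Hypothetical userland model of the automatic retry loop; this is not the
// plugin's C implementation. $operation returns a MySQL error code, 0 on success.
function run_with_transient_retries(
    callable $operation,
    array $transientCodes,
    int $maxRetries,
    int $usleepRetry,
    ?callable $sleep = null
): array {
    // Assumption: usleep_retry is in milliseconds, as this page states.
    $sleep ??= static function (int $ms): void {
        usleep($ms * 1000);
    };

    $retries = 0;
    $error = $operation();
    // Retry only errors configured as transient, at most $maxRetries times.
    while ($error !== 0 && in_array($error, $transientCodes, true) && $retries < $maxRetries) {
        $sleep($usleepRetry); // pause before the next attempt
        $retries++;
        $error = $operation(); // repeat the same operation
    }

    // Whatever error remains (0 if none) is forwarded to the application.
    return ['error' => $error, 'retries' => $retries];
}

// Simulated operation that fails twice with error 1297, then succeeds.
$failures = 2;
$flaky = function () use (&$failures): int {
    return $failures-- > 0 ? 1297 : 0;
};

// Record requested pauses instead of sleeping, to keep the demo fast.
$slept = [];
$record = function (int $ms) use (&$slept): void {
    $slept[] = $ms;
};

$result = run_with_transient_retries($flaky, [1297], 2, 100, $record);
printf("error=%d retries=%d\n", $result['error'], $result['retries']);
// prints "error=0 retries=2"

// A duplicate-key error (1062) configured as transient, as in Example #1,
// never goes away: both retries are used and the error is forwarded.
$duplicate = run_with_transient_retries(fn(): int => 1062, [1062], 2, 100, $record);
printf("error=%d retries=%d\n", $duplicate['error'], $duplicate['retries']);
// prints "error=1062 retries=2"
```

In the first case the application never sees the error; in the second it receives 1062 only after the configured number of retries, which is the behaviour the statistics in Example #2 reveal.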
In the example below, a duplicate key error is
provoked to make the plugin retry the failing query two times
before the error is passed to the application. Before each retry,
the plugin sleeps for 100 milliseconds.
Example #1 Provoking a transient error
mysqlnd_ms.enable=1
mysqlnd_ms.collect_statistics=1
{
    "myapp": {
        "master": {
            "master_0": {
                "host": "localhost"
            }
        },
        "slave": {
            "slave_0": {
                "host": "192.168.78.136",
                "port": "3306"
            }
        },
        "transient_error": {
            "mysql_error_codes": [
                1062
            ],
            "max_retries": 2,
            "usleep_retry": 100
        }
    }
}
Example #2 Transient error retry loop
<?php
$mysqli = new mysqli("myapp", "username", "password", "database");
if (mysqli_connect_errno()) {
    /* Of course, your error handling is nicer... */
    die(sprintf("[%d] %s\n", mysqli_connect_errno(), mysqli_connect_error()));
}

if (!$mysqli->query("DROP TABLE IF EXISTS test") ||
    !$mysqli->query("CREATE TABLE test(id INT PRIMARY KEY)") ||
    !$mysqli->query("INSERT INTO test(id) VALUES (1)")) {
    printf("[%d] %s\n", $mysqli->errno, $mysqli->error);
}

/* Retry loop is completely transparent. Checking statistics is
   the only way to know about implicit retries */
$stats = mysqlnd_ms_get_stats();
printf("Transient error retries before error: %d\n", $stats['transient_error_retries']);

/* Provoking duplicate key error to see statistics change */
if (!$mysqli->query("INSERT INTO test(id) VALUES (1)")) {
    printf("[%d] %s\n", $mysqli->errno, $mysqli->error);
}

$stats = mysqlnd_ms_get_stats();
printf("Transient error retries after error: %d\n", $stats['transient_error_retries']);

$mysqli->close();
?>
The above example will output something similar to:
Transient error retries before error: 0
[1062] Duplicate entry '1' for key 'PRIMARY'
Transient error retries after error: 2
Because the execution of the retry loop is
transparent from a user's point of view, the example checks the
statistics provided by the plugin to learn about it.
As the example shows, the plugin can be instructed
to consider any error transient regardless of the database server's
error semantics. The only error that a stock MySQL server considers
temporary has the error code 1297.
When configuring error codes other than 1297,
make sure your configuration reflects the semantics of your
cluster's error codes.
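For a cluster whose only temporary error is the stock 1297, the transient_error section would list just that code. As a hedged sketch, the PHP below builds such a configuration with json_encode(), reusing the hosts of Example #1 and the default max_retries and usleep_retry values quoted at the end of this page. Generating the file from PHP is an illustrative assumption; the plugin only needs the resulting JSON.

```php
<?php
// Illustrative only: the plugin reads the JSON, generating it from PHP is
// just a convenient way to show the structure.
$config = [
    'myapp' => [
        'master' => [
            'master_0' => ['host' => 'localhost'],
        ],
        'slave' => [
            'slave_0' => ['host' => '192.168.78.136', 'port' => '3306'],
        ],
        'transient_error' => [
            // Only the error code a stock MySQL server considers temporary.
            'mysql_error_codes' => [1297],
            // Default values quoted on this page.
            'max_retries' => 1,
            'usleep_retry' => 100,
        ],
    ],
];

echo json_encode($config, JSON_PRETTY_PRINT), "\n";
```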
The following mysqlnd C API calls are monitored by
the plugin to check for transient errors: query(),
change_user(), select_db(),
set_charset(), set_server_option(),
prepare(), execute(), set_autocommit(),
tx_begin(), tx_commit(), tx_rollback(),
tx_commit_or_rollback(). The corresponding user API calls
have similar names.
The maximum time the plugin may sleep during the
retry loop depends on the function in question. A retry loop
for query(), prepare() or execute() will
sleep for up to max_retries * usleep_retry
milliseconds.
However, functions that control connection state
are dispatched to all connections. The retry loop settings are
applied to every connection on which the command is to be run.
Thus, such a function may interrupt program execution for longer
than a function that is run on one server only. For example,
set_autocommit() is dispatched to all open connections and may
sleep for up to (max_retries * usleep_retry) *
number_of_open_connections milliseconds. Keep this in
mind when setting long sleep times and large retry numbers. Using
the default settings of max_retries=1,
usleep_retry=100 and lazy_connections=1, it is
unlikely that you will ever see a delay of more than 1 second.
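Both bounds above are simple products. The hypothetical helper below (its name is illustrative, not a plugin function) evaluates them, taking this page's statement that usleep_retry is given in milliseconds at face value.

```php
<?php
// Hypothetical helper, not part of PECL/mysqlnd_ms: worst-case time the
// retry loop may sleep, in milliseconds, using the formulas on this page.
function max_retry_sleep_ms(int $maxRetries, int $usleepRetry, int $connections = 1): int
{
    // query(), prepare() and execute() run on one server: $connections = 1.
    // Connection-state functions such as set_autocommit() run on every open
    // connection, so the bound grows with the number of open connections.
    return $maxRetries * $usleepRetry * $connections;
}

// Settings of Example #1 (max_retries=2, usleep_retry=100), one server.
printf("query(): up to %d ms\n", max_retry_sleep_ms(2, 100));
// prints "query(): up to 200 ms"

// Defaults (max_retries=1, usleep_retry=100) with 5 open connections.
printf("set_autocommit(): up to %d ms\n", max_retry_sleep_ms(1, 100, 5));
// prints "set_autocommit(): up to 500 ms"
```

With the defaults, the bound for a function dispatched to all connections exceeds one second only when more than ten connections are open; lazy_connections=1 keeps that number low because connections are opened only when they are needed.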