Saturday, September 21, 2013

Tips to Build a Fault-tolerant Database Application

Applications should be written taking into account that errors will eventually happen and, in particular, database application developers usually consider this while writing their applications.

Although the concepts required to write such applications are commonly taught in database courses and to some extent are widely spread, building a reliable and fault-tolerant database application is still not an easy task and hides some pitfalls that we intend to highlight in this post with a set of suggestions or tips.

In what follows, we consider that the execution flow in a database application is characterized by two distinct phases: connection and business logic. In the connection phase, the application connects to a database, sets up the environment and passes the control to the business logic phases. In this phase, it gets inputs from a source, which may be an operator, another application or a component within the same application, and issues statements to the database. Statements may be executed within the context of a transaction or not.

First Tip: Errors may happen so plan your database application taking this into account

So we should catch errors, i.e. exceptions, in both the connection and business logic phase. This idea can be translated into code using Python as follows:

class Application(object): 
    def __init__(self, **db_params): 
         self.__cnx = None 
         self.__cursor = None 
         self.__db_params = db_params 

    def connect(self): 
        try: 
            self.__cnx = MySQLConnection(**self.__db_params) 
            self.__cursor = self.__cnx.cursor() 
        except InterfaceError: 
            print "Error trying to get a new database connection" 

    def business(self, operation): 
        try: 
            self._do_operation(operation) 
        except DatabaseError: 
            print "Error executing operation" 

if main == "__main__": 
    app = Application(get_params()) 
    app.connect() 

    while True: 
        app.business(get_operation())

The InterfaceError class identifies errors in the interface, i.e. connection, between the application and the MySQL Server. The DatabaseError class identifies errors associated with database operations.

In this simple example though, the application may abort after any connection error. For instance, a MySQL Server will automatically close a connection after a period of inactivity thus causing an application error if one tries to use the invalid connection.

Second Tip: Set up the appropriate timeout properties

There are two properties which fall under this suggestion:

  • wait_timeout - It is a MySQL option that defines the interval that must elapse without any communication between an application and a MySQL Server before the MySQL Server closes a connection.
  • connection_timeout - Sets the socket_timeout property in the Connector Python which defines the maximum amount of time that a socket created to connect to a database will wait for an operation to complete before raising an exception.
The wait_timeout must be set according to the application's characteristics. On the other hand, the connection_timeout property is usually set to zero which means that there will be no socket timeout period. In rare cases, such as when applications must execute operations within a fixed interval, we should set it up.

Third Tip: Connection errors may happen at any time so handle them properly

The previous measurements will not circumvent problems related to transient network issues or server failures though. To handle this type of problem, one needs to consider that a connection may fail at any time. This requires to catch connection errors also while executing the business logic and get a fresh connection to proceed with the execution. In other words, this requires to combine the aforementioned two phases. This idea can be translated into code as follows:

class Application(object): 
    def __init__(self, **db_params): 
         self.__cnx = None 
         self.__cursor = None 
         self.__db_params = db_params.copy() 

    def connect(self): 
        try: 
            self.__cnx = MySQLConnection(**self.__db_params) 
            self.__cursor = self.__cnx.cursor() 
        except InterfaceError: 
            print "Error trying to get a new database connection" 

    def business(self, operation): 
        try: 
            self._do_operation(operation) 
        except (AttributeError, InterfaceError) 
            print "Database connection error" 
            self.connect() 
        except DatabaseError: 
            print "Error executing operation" 

if main == "__main__": 
    app = Application(get_params()) 
    app.connect() 

    while True: 
        app.business(get_operation())

In general, connectors cannot hide connection failures from the application because this may lead to data inconsistency. Only the application has enough knowledge to decide what is safe to do and as such any failure, including connection failures, must be reported back to the application. In what follows, we depict a problem that may happen when a connector tries to hide some failures from the application:



When the connection fails, the server rolls back the on-going transaction thus undoing any change made by the first insert statement. However, the connector gets the error and automatically tries to reconnect and succeeds. With a valid connection to the server, it executes the failed statement and succeeds. Unfortunately, the application does not find out about the connection issue and continues the execution as nothing has happened and by consequence a partial transaction is committed thus leaving the database in an inconsistent state.

It is worth noting that if statements are executed in “autocommit” mode, it is still unsafe to hide failures from the application. In this case, an attempt to automatically reconnect and try to execute the statement may lead to the statement being executed twice. This may happen because the connection may have failed after the statement has been successfully executed but before the server has had a chance to reply back to the connector.



Fourth Tip: Guarantee that session information is properly set after getting a connection

From a fault-tolerant perspective the application looks better now. However, we are still missing one key point.

We should use the "my.cnf" configuration file to set up the necessary MySQL's properties (e.g. autocommit, transaction isolation level). However if several applications share the same database server and require different configuration values, they should be defined along with the routine that gets a connection. If you do it in a different place, you may risk forgetting to set the options up when trying to get a new connection after a failure. Our code snippet already follows this rule and you are safe in that sense.

This suggestion is specially important when the applications (i.e. components) share the same address space and use a connection pool.

We should also avoid using temporary tables and/or user-defined variables to transfer data between transactions. Although this is a common technique among developers, this will fail miserably after a reconnection as the session information will be lost and may require an expensive routine to set up the necessary context. So starting every transaction with a “clean slate” is probably the safest and most solid approach.

Fifth Tip: Design all application components taking failures into account

Finally, it is worth noticing that if the database fails the system as whole will be unavailable. So to build a truly resilient solution, we still need to deploy some type of redundancy at the database level. We will discuss possible high availability solutions for MySQL in a different post.

See http://alfranio-distributed.blogspot.com/2013/09/writing-fault-tolerant-database.html

No comments: