Choosing a Database Proxy for MySQL and MariaDB

This whitepaper discusses what database proxies are, what their use is and how to build a highly available and highly controllable MySQL and MariaDB database environment using modern proxies.

Table of contents

  • 1. Introduction
    • 1.1. What Is a Database Proxy?
    • 1.2. Proxy Types
      • 1.2.1. Layer 4 Proxies
      • 1.2.2. SQL-Aware Proxies
  • 2. Features of Advanced, SQL-Aware Proxies
    • 2.1. Query Routing
    • 2.2. Query Rewriting
    • 2.3. Query Caching
    • 2.4. Traffic Control
      • 2.4.1. Kill a Query
      • 2.4.2. Slow Down Traffic
  • 3. Designing Highly Available Proxy Architecture
    • 3.1. Dedicated Proxy Instances
      • 3.1.1. A Proxy with a Virtual IP Assigned
      • 3.1.2. Elastic Load Balancing
      • 3.1.3. Application-Side Proxy Discovery
    • 3.2. Collocating Proxies on Application Hosts
    • 3.3. Silo Approach
  • 4. Managing Large Proxy Deployments
  • 5. Setting up Highly Available Proxy Layers with ClusterControl
    • 5.1. ProxySQL Deployment
    • 5.2. HAProxy Deployment
  • 6. Which Proxy Should I Pick?
    • 6.1. SQL-Aware Proxy or Not?
    • 6.2. Which SQL-Aware Proxy Should Be Chosen?

1. Introduction

These days, high availability is an imperative. Data is distributed across multiple instances, or even multiple datacenters. Clusters can be scaled out on more nodes. Failures can cause cluster reconfigurations. This begs the question - how does the application know which database node to access? How does an application detect that the database topology has changed? How do we shield the applications from the complexity of the underlying databases?

At some point, the concept of man-in-the-middle became popular and database environments started integrating proxies. This whitepaper will discuss what proxies are, what is their use and how to build a highly available and highly controllable database environment using modern proxies.

1.1. What Is a Database Proxy?

A proxy is a software which handles connectivity between two sides. Within a context of databases, a proxy sits in the middle, between application and database. The application connects to a proxy, which forwards connections to the database. Let’s stop here for a second and try to analyze this statement and see what might be the gains of using a proxy? For starters, one, huge gain is that application connects to the proxy only. In the database world, it is not easy to determine where traffic should be directed to. There are writeable or intermediate masters, and read replicas. The replication topology constantly evolves. It is not a good idea to hardcode connectivity patterns. On the other hand, writing code to track topology changes is something that needs to be carefully planned, designed and tested. This is where the proxy comes in. With a use of proxy, applications can connect to it (or to a pool of proxies) and the application may expect that the traffic will be routed to a functioning database.

Since traffic is relayed by the proxy, the latter can be also a great source of information about the traffic itself. It can provide statistics on the traffic, e.g. number of queries executed per second, their execution time, statistical data like 95 percentile, maximum, minimum, average, all based on the collected metrics.

Advanced proxies can also alter the traffic - as everything passes through them, such proxies can provide to an admin a high degree of control over queries - queries can be cached, rewritten, rerouted, stalled or killed. This allows the DBA to shape the traffic and react to the issues immediately, even without requiring an application developer to modify the application and redeploying it.

Finally, proxies can help to scale the environment not only through sending traffic to multiple slaves but also they can help to build sharded setups using traffic routing logic created within the proxy. As you can see, an advanced database proxy is not just a packet routing device but it can be utilized in multiple ways, improving the options of the operations team to manage the database tier.

1.2. Proxy Types

Before we dig into details of how proxies can be utilized, in this chapter we will discuss the two main types of proxies. We will look at examples for each type, and cover the main differences between them.

1.2.1. Layer 4 Proxies

The first and the oldest type are proxies which operate at layer 4 of ISO/OSI network model, the transport layer. Those proxies work on a package level. They receive TCP sessions and they route the traffic to pre-defined backend services.

Such proxy server does not care about what it routes, as long as it can send it to the backend and it’s in line with load balancing policy. Typically you can pick options like round robin, least amount of connections or sticky connections coming from one frontend host to the same background host. The best known example of such proxy is HAProxy, Nginx is another one.

The main problem with this type of proxies is that they operate only at the network layer. They do not understand the MySQL protocol, nor they understand the states that the MySQL or MariaDB backends are in. For replication setups, it would be a master or a slave. For Galera cluster this will become much more complex (Primary, non-Primary, donor or desynced, joining, joiner etc.). External scripts had to be developed to make it possible for such proxies to understand the state of MySQL backends.

An example of such a script is Percona’s clustercheck and all its modifications. The lack of understanding of the MySQL protocol results in more complex connectivity to the proxy. As we mentioned earlier, an application would ideally connect to a proxy and send all its traffic there and the proxy will direct writes to a single host and scale reads across all of MySQL backends.

Unfortunately, if a proxy cannot read the MySQL protocol, it cannot distinguish SELECTs from other queries. This is a serious issue - in a replication environment, you typically have just one host to send your writes to - the master. Galera can work in a multi-writer setup, but sometimes there are cases that mandate applications to direct all of the writes to one node in order to reduce conflicts between writes. As a result, the setup is less transparent to the application as it is the application which has to sort SELECTs and other queries and send them to correct ports on the proxy (assuming one has defined separate write and read-only ports).

1.2.2. SQL-Aware Proxies

The other type of proxy is the SQL-aware proxy. That software can understand MySQL protocol and, typically, offer a range of features related to that fact. First of all, such proxies should be able to understand the MySQL state. They are designed to differentiate between master and slaves. Some of them can also track and understand Galera states. All of this results in faster and more reliable responses to whatever happens to the MySQL topology.

Probably the most welcomed feature is the fact that, given their understanding of MySQL protocol, proxies can perform a read/write split. This makes it possible to implement a transparent proxy layer and ensure that application will not have to track anything related to the database tier. It will just connect to a predefined host and port, and that’s all it needs to know.

Of course, the fact that the proxy handles all the database traffic makes it possible to use the proxy for other things like traffic shaping, query routing, query blocking etc. It’s important to note that not all of the proxies share the same feature set. Some, like MySQL Router, can do the query routing but this is mostly it. Others, like ProxySQL or MaxScale, can be used to perform advanced tasks and can make it possible for a user to significantly alter the way how the traffic is sent to the databases.

SQL-aware proxies, typically, do not use external scripts to monitor or track the state of the databases. They rely on built-in tests which are intended to take care of that. One exception is ProxySQL and Galera cluster monitoring. Up to version 2.0 ProxySQL relied on external script, which was intended to track the state of Galera nodes. The internal support was introduced in ProxySQL v2.0, which, at the time of writing, is in beta state.

Want to read the rest?

Download the full whitepaper for free