Support for Sqlite and PostgreSQL in MailerQ
We have added support for PostgreSQL and Sqlite to MailerQ, as alternatives for MySQL that was already supported. And by doing so, we learned something about the differences between those different engines.
But let me explain why MailerQ needs a database connection in the first place. As you probably know, we have created MailerQ with high delivery performance in mind. And because of that we only use technologies for their performance capabilities: RabbitMQ because it is very good at message queueing, and Couchbase because it is a very fast and reliable key value store. And although traditional databases have many advantages and can be very fast too, they do not offer the best thinkable solution when it comes down to speed. So why do we use it then?
A traditional database is a very good tool for storing data in a well structured manner, and to look up specific data based on certain fields. And because traditional databases are already in use for such a long time, there is support for it in almost every thinkable programming language. And that is exactly why we use a database in MailerQ: to lookup the configuration, and to keep track of statistics so that other programs can easily access that data too. For tasks that require high performance (loading and storing messages) we rely on other technologies, but for statistics and configuration settings we use a database to make this information easily accessible.
But even although we only use the database for non-critical tasks, we still did our best to keep the performance penalty for using a database as small as possible. To prevent that database queries form a bottleneck, MailerQ runs all queries in a non-blocking asynchronous way. This means that when a query is being executed, MailerQ still continues its other duties - it does for example not stop any SMTP communication while a query is running. And besides that, all query results are cached in memory, so that queries do not need to be executed too often. In most cases the required data is already loaded in memory and no query needs to be started. Update operations and insert operations are grouped so that a single query can update many counters at once. This reduces the number of necessary queries even further.
Originally we only supported MySQL databases. And although we did already use a database abstraction layer for running queries on this database, it was not so trivial to add support for the other database engines as the theory makes you believe. The theory says that by using a database abstraction layer, you can simply switch engines, and that it is up to the abstraction layer to deal with this. But it turned out that we still had to rewrite almost every query to make this work. I will summarize the most important changes we made to the queries.
MailerQ automatically creates its own tables by running "create table" and "alter table" queries. This means that when you start up MailerQ you do not have to worry about setting up the database yourself, because this is automatically taken care of by MailerQ. But the "create table" queries that were in use for MySQL turned out not to compatible with Sqlite and PostgreSQL. There were a couple problems with it: PostgreSQL and Sqlite do not support the same type names as MySQL, and the indexes are created in a different manner. PostgreSQL does for example not recognize "binary" as type, but uses "bytea" instead. And MySQL supports creating indexes as part of the "create table" statement, while Sqlite requires a seperate query for this. Further more, in MySQL the name of an index only needs to be unique for the table (two different tables can both have an index with the same name). And specific features like 'auto_increment' are slighly different between the engines too.
That is why we have chosen to create special "create table" queries for all different engines. But that did not come as a surprise, because the data definition languages for all database engines are known to have many differences. The regular "select", "insert" and "update" queries on the other hand are supposed to be standardized SQL and should thus be supported by all engines. Those types of queries have been rewritten in such a way that the same query can be used for accessing all different engines. For almost all queries that we already used we had to do some rewriting because we were using specific MySQL features.
MailerQ stores IPv6 addresses in binary format in the database to keep the column size small. In MySQL there is no difference between escaping/quoting binary data and non-binary data, but for PostgreSQL and Sqlite this does make a difference. We had to change the "insert" and "update" queries to explicitly specify that certain fields contain binary data and need to be escaped differently.
Inside "select" queries we sometimes used the MySQL specific date_format() function. This function is not available in the other engines and we had to rewrite the query to ensure that it uses a different formatting function for the other engines, with other parameters.
MailerQ used "insert into ... on duplicate key" queries for updating the statistics. This is a specific MySQL feature that allowed us to use a single query to either insert a new record into the statistics table, or update an existing one if the record already exists. But this feature is not standardized and thus not supported by other engines. We had to rewrite this single query in a seperate "insert" and a seperate "update" query. And we had to check the result of the initial "insert" query to find out if a second "update" query is necessary, a task that was not trivial for asynchronous queries.
But we have managed to make these changes. It will now probably also be easier to add other database engines too, as we have simplified all our queries so much that only the most common features from SQL are used.
Now that three different engines are supported, the question for the best engine of the three is automatically raised. Well, Sqlite is by far the simplest engine to set up. You only need to set the location of a database file in the MailerQ config file - and you're ready to go. The file does not even have to exist, MailerQ will automatically create it if it is missing. But at the same time, Sqlite is also the slowest engine to use, and it is not possible to access a Sqlite database from scripts that run on different servers. Therefore, if you already have a database server running - either MySQL or PostgreSQL, we advise to use one of those two engines.