Sunday 13 August 2017

How to have a higher chance of success when restoring a big MySQL database

Restoring a MySQL database is fast and easy when you just copy files in datadir when the server is shutdown, or if you use Percona xtrabackup.

But if you for some reason (AWS RDS) only have MySQL protocol available for backup, you usually can have a compressed mysqldump, that is quite slow to restore, not because of the compression or because the decompressed version is a text file that needs to be parsed, but because MySQL is slow to push it through it's disk pipeline, and because it needs to build data indexes while doing a restore.

I've spent multiple days babysitting the process of restoring a 7GB gzip compressed MySQL dump file, and these are results and tips that could help you save some time.


So, make sure that:
- you have enough IO available: For restoring a 66 GB datadir 315.6 GB was written to the drive (as measured with iostat), with a tuned MySQL configuration. For a DB of this size a mechanical drive doesn't cut it, and restore will take multiple days. Use a good SDD.

- your database TRIGGERS all have BEGIN/END statements (even though you can create them without and even thought the bug was supposed to be fixed https://bugs.mysql.com/bug.php?id=16878),  it fails on restore, with all versions of MySQL 5.7/5.6 i tried

- you start with a really empty database in your datadir - DB I worked with had inconsistent data types on a foreign key, when the dependent table with an inconsistent key already exists MySQL will report a foreign key error (MariaDB will be more informative), but if it doesn't it will happily restore the database

- your max_allowed_packet conf value is big enough, or you'll get a MySQL server has gone away message from your client while restoring.

- your innodb_log_file_size is big enough (https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_log_file_size) - if you have large BLOB values in your DB, restore will fail if the value is lower than 10% of your blob field. This setting is important for quick restore too

- you have log-bin turned off in order to minimize your chance to run out of drive space and save IO (log-bin=Off doesn't mean that it's disabled just that the log bin files start with Off, the documentation can be confusing here :)  What worked for me is having all log-bin lines in the mysqld config section commented out


Finally, if you want it to finish quickly, use the fastest SSD you have available, and consider tuning MySQL configuration a bit. I'm also considering using a ramdisk, because it would help both with restore speed and when you need to do some DB transformations. MySQL defaults are not reasonable, especially for innodb_log_file_size, max_allowed_packet.

I used excellent pv for figuring out if the restore process will finish in reasonable time

pv db_dump.gz |gunzip -c |mysql -uroot database_name



Here's a full list of my MySQLd configuration variables that worked for me on my dev laptop

#my dev laptop is low on memory, for prod server you would use a lot more
innodb_buffer_pool_size=512M
innodb_additional_mem_pool_size=256M
innodb_log_buffer_size=256M
innodb_log_file_size=512M
max_allowed_packet=64M

#for saving disk IO, dont use on prod
innodb_flush_log_at_trx_commit = 2
innodb_flush_method=O_DIRECT_NO_FSYNC
skip-innodb_doublewrite




Sunday 30 July 2017

Some thoughts on (Modern) PHP



I have experience in both Java and PHP. Java mostly for traditional desktop apps, embedded UIs and PHP for websites.


Custom PHP frameworks I've built or helped build took into account the way PHP is executed: you are stateless and need to setup everything on every request (runtime is fast to start with FPM and opcode cache). Namespaces based cheap autoloader worked great. We used singletons for getting the configuration and connections to DBs. There was almost no setup code that needed to be run every time other than loading .ini based configuration and connecting to the DB. My webapps responded under 20ms (DB and other services like sphinx included), and I could get it to respond in 1 ms for things where we needed to be quick and didn't have to output HTML with Forms. It was really small and you could read the whole framework code in 1-2 hours. It worked with SQL in a reasonable way. You didn't have to write your SQL for simple CRUD, but for larger things involving joining multiple tables and more complex expressions we wrote native SQL. Caching was done thoughtfully, using  APC user cache (SHM with zero copy). It just felt nice.


I switched jobs recently, and started with Symfony 3. The thing felt like some Java framework, but poorly documented and harder to use than it should be. It had lots and lots of setup code done before handling every request. There's a whole DI framework with it's load of setup code for every component. And you have to do setup even though you don't use the component in that particular request. There are ways of doing setup lazily, but you still waste time to wire that up. Framework overhead can be 30-100ms. Other modern PHP frameworks often have similar overhead. I know that there's PHP-PM, to save some of that work that isn't really $_REQUEST specific, but it doesn't seem to be used much for production. And using Silex (deprecated by Symfony 4?) is really not that different, you still either reuse Symfony components, or rewrite them, but with similar "best practices" that are inspired by Java.


Regarding persistence, Doctrine and it's verbosity feels very ugly to me. I'd much rather use SQL syntax for defining relationships than bunch of PHP with special syntax comments or xml or yaml. And also use real SQL for complex queries.



Everybody is using type hints wherever they can, and it feels as verbose as Java, but without compile time type safety, and you can't really put type information everywhere (class members for example).


So you are almost using a type safe language, but can't get performance or compile time benefits, because inevitably, you'll have to use some dynamic typing or other dynamic language features.

Even though PHP runtime has made great progress with 7.x (it's probably the fastest interpreted language, and it's great it has reference counted garbage collection), it feels like language is struggling to find it's identity, with it taking a lot from Java and still coping with ugly legacy ($, having to use $this inside a class function, php.ini, features for supporting templating even though it's rarely used as a templating language in modern frameworks, https://3v4l.org/mcpi7).


Learning Python and Flask (as an example) was much more enjoyable than switching from a nimble custom PHP framework to Symfony. Using NodeJS and minimalistic components to build my own framework was also nice. I'd love to try GoLang, Swift or Rust in the backend too.


And there's the thing that most of the PHP frameworks try too hard to be full stack, when nowdays it's not rare you only do REST APIs on the backend. So there's a lot of infrastructure and assumptions in place for rendering HTML that you really don't need to use and that gets in the way when learning the framework and is wasteful when the code is executing.


I'd argue you can write fast, simple and maintainable PHP, by using state PHP runtime has setup for you ($_POST, $_GET, $_SERVER etc), namespaces and a namespace based autoloader, trying to use pure functions when you can (using static classes shouldn't be a sin - use it to split your code in sensible parts), and using general good practices for writing readable and maintainable code (avoid long functions, huge classes, too much block nesting, decoupling, naming things in a good way). With some coding conventions you can write a decent and productive framework quickly, but you could do that with a nicer language too, so what's the point?

(Thankfully, I'm not using Symfony on my new job, and Yii2 does suffer from some issues too, it at least feels better for now)