Migration of the Blog to Docker

Documenting the entire process of migrating the blog to Docker.

Although the architecture of this blog is straightforward, it cannot keep up with my frequent tinkering, including but not limited to data center migrations, plugin development and testing, trying out new features, and more. Each time I rebuild the whole system the traditional way, it means pulling code, starting MySQL, configuring Nginx…

As someone who adheres to the DRY principle, I cannot tolerate such repetitive and technically unchallenging tasks.

In the early days of programming, the crudest method I employed was configuring the development environment within a virtual machine and then copying the entire virtual machine file to a USB drive for manual migration. Later, I upgraded to Vagrant. Subsequently, I discovered Docker, and since then, all my development-related environments have been based on Docker. However, all production environments still operate in the traditional manner. Coincidentally, the JD.com data center is expiring soon, so I decided to take this opportunity to transition my blog to Docker.

Architectural Analysis

The blog architecture is relatively simple, following the traditional LNMP architecture. It involves only a few services, along with an additional backup service.

Although the blog contains minimal information, backups are crucial!

Currently, I haven’t considered separating out the data layer, so the architecture does not support horizontal scaling for now. I don’t believe my blog needs it.

Traditional Blog Architecture

Never run multiple programs within a single container! Do not treat Docker as a virtual machine!

One of Docker’s best practices is that a container’s functionality should be singular, focusing on a single application rather than a comprehensive service. A complete service should be realized through orchestration.

Therefore, the entire service will be implemented using four containers:

  1. Web Rule: This layer contains the blog-specific rewrite rules and web server configuration, and serves as the single entry point for all blog traffic.

  2. Blog: This container hosts the PHP environment for the blog.

  3. Data: This container is the blog’s database, exclusively serving the blog.

  4. Backup: This container is responsible for backup services and only interacts with the Data layer for database backups.

The web server on the physical host handles common HTTP concerns such as TLS and HTTP/2, and acts as a reverse proxy in front of the Web Rule container.

The final service orchestration looks like this:

Blog Docker Architecture

Image Creation

Base

Alpine appears to be the trend, as it is a minimal 5MB Docker image with a complete package management toolset. It may have some differences in dependencies compared to traditional Ubuntu images. For instance, if you try to install openssl-dev, you may encounter issues:

ERROR: unsatisfiable constraints:
  openssl-dev-1.0.2o-r1:
    conflicts: libressl-dev-2.5.5-r0[pc:libcrypto=1.0.2k] libressl-dev-2.5.5-r0[pc:libssl=1.0.2k] libressl-dev-2.5.5-r0[pc:openssl=1.0.2k]
    satisfies: build-deps-0[openssl-dev]
  libressl-dev-2.6.5-r0:
    conflicts: openssl-dev-1.0.2k-r0[pc:libcrypto=2.5.5] openssl-dev-1.0.2k-r0[pc:libssl=2.5.5] openssl-dev-1.0.2k-r0[pc:openssl=2.5.5]
    satisfies: qt-dev-4.8.7-r6[libressl-dev]

Since most Alpine packages are built against libressl-dev, and the two packages install files to the same paths, they cannot be installed simultaneously. Installing libressl-dev alone should suffice.
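A minimal sketch of the workaround (the Alpine tag is illustrative; adjust to your release):

```dockerfile
FROM alpine:3.7

# libressl-dev provides the same headers and pkg-config paths as
# openssl-dev, so install it instead to avoid the conflict shown above
RUN apk add --no-cache build-base libressl-dev
```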

However, as Alpine matures, it offers more packages, making it more convenient than before. For instance, when compiling the PHP image, I needed to install the Kafka extension rdkafka. Previously, there was no librdkafka-dev library available, so I had to write the compilation logic in the image. This time, I found that the official package repository had it.
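With librdkafka-dev now in the official repository, the extension build collapses into a few lines. A sketch based on the official PHP Alpine image (the PHP tag is illustrative):

```dockerfile
FROM php:7.2-fpm-alpine

# keep the runtime library, but install build tooling ($PHPIZE_DEPS is
# provided by the official image) only temporarily
RUN apk add --no-cache librdkafka \
    && apk add --no-cache --virtual .build-deps $PHPIZE_DEPS librdkafka-dev \
    && pecl install rdkafka \
    && docker-php-ext-enable rdkafka \
    && apk del .build-deps
```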

Web Rule

This layer comprises Nginx. Because of Nginx’s architecture, third-party modules cannot simply be loaded at runtime; they must be compiled in against the Nginx source. Unfortunately, the official image provides no hook for this. Unlike the official PHP image, which is extremely friendly in this regard, adding a third-party module means recompiling Nginx yourself on top of the Alpine image. For example, I needed the ngx_http_lower_upper_case module this time.
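The rebuild can be sketched roughly as follows; the Nginx version, configure flags, and the module path are placeholders, not the exact build I used:

```dockerfile
FROM alpine:3.7

# compile Nginx from source with the third-party module statically
# compiled in; /path/to/ngx_http_lower_upper_case is a placeholder
RUN apk add --no-cache build-base pcre-dev zlib-dev libressl-dev curl \
    && curl -fSL http://nginx.org/download/nginx-1.14.0.tar.gz | tar xz \
    && cd nginx-1.14.0 \
    && ./configure --with-http_ssl_module \
         --add-module=/path/to/ngx_http_lower_upper_case \
    && make && make install
```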

Blog

This layer represents PHP. It is compiled based on the official PHP image and includes most commonly used extensions. Some extensions are initially disabled and can be enabled later through configuration files, such as Xdebug.

By default, the official image does not enable opcache and PDO.
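The official image ships helper scripts for exactly this, so enabling them is one line in the Dockerfile (the base tag is illustrative):

```dockerfile
FROM php:7.2-fpm-alpine

# opcache and pdo_mysql are not enabled out of the box;
# docker-php-ext-install builds and enables them in one step
RUN docker-php-ext-install opcache pdo_mysql
```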

Data

This layer is MariaDB. Although Alpine’s official package repository includes MariaDB, the MariaDB team does not publish an official Alpine image; several community-maintained ones exist, though. Essentially, the core of any Alpine MariaDB image is the initialization script, which bootstraps the database on first start.
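The idea behind those init scripts can be sketched as below; paths and flags are simplified, and the bootstrap SQL is only indicated in comments:

```shell
#!/bin/sh
# entrypoint sketch: initialize the datadir only on first start,
# then hand the process over to the server
DATADIR=/var/lib/mysql
if [ ! -d "$DATADIR/mysql" ]; then
    mysql_install_db --user=mysql --datadir="$DATADIR"
    # bootstrap step: the CREATE DATABASE / CREATE USER / GRANT
    # statements the blog expects would be fed to mysqld here
fi
exec mysqld --user=mysql --datadir="$DATADIR"
```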

Backup

This layer is also PHP. It is based on the Blog image and adds the MySQL client tools for database backups.
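The image is therefore tiny; a sketch, where `blog:latest` stands in for whatever the Blog image is tagged as:

```dockerfile
# start from the Blog image and add only the client tools
# (mysqldump etc.); on newer Alpine releases the package is
# named mariadb-client rather than mysql-client
FROM blog:latest

RUN apk add --no-cache mariadb-client
```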

Container Orchestration

The core of orchestration involves connecting containers and specifying dependencies.

Dependencies

Connections and dependencies between containers should be realized through networks. Docker no longer recommends using links for this purpose, and it may be removed in future versions. Therefore, I have also switched to using networks.

Warning: The --link flag is a legacy feature of Docker. It may eventually be removed. Unless you absolutely need to continue using it, we recommend that you use user-defined networks to facilitate communication between two containers instead of using --link.

The purpose of networks is to declare that a container belongs to a particular network. It can also define a custom alias within the network. Containers can then access each other using these aliases. For example, in the orchestration below, the PHP container can directly access the MySQL container internally using BLOG_MYSQL. It’s simple.

By switching to networks instead of links, I have removed the explicit specification of dependencies and connections from the docker-compose file. This also avoids the issue of circular dependencies between containers.

Here is the orchestration for my blog service:

version: '3'
services:
  nginx:
    networks:
      blog-network:
  php:
    networks:
      blog-network:
        aliases:
          - BLOG_PHP
  mysql:
    networks:
      blog-network:
        aliases:
          - BLOG_MYSQL
networks:
  blog-network:
    driver: bridge

There is another issue to consider: the inability to explicitly manage dependencies. Previously, with links, you could see the dependencies through aliases. For example, using links, you would know that PHP depends on MySQL. However, with the new approach using networks, PHP can directly access MySQL using BLOG_MYSQL, but the dependency relationship is no longer visible in the docker-compose file.

Similarly, my backup service only needs to join the blog-network to access MySQL for the backup process.

Joining an existing network requires using the external option. Here is the orchestration for the backup service:

version: '3'
services:
  php:
    networks:
      cron-network:
networks:
  cron-network:
    external:
      name: deploy_blog-network

Conflicts

Orphan containers, essentially a conflict issue, may occur when dealing with multiple projects. This is primarily due to conflicts caused by the default COMPOSE_PROJECT_NAME behavior.

WARNING: Found orphan containers (deploy_blog_nginx_1, deploy_blog_php_1, deploy_blog_mysql_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.

By default, docker-compose uses the current directory name as the project prefix, i.e., COMPOSE_PROJECT_NAME. If two projects have the same directory name, conflicts may arise, affecting existing containers and potentially leading to orphaned containers.

Additionally, the network names may also conflict!

For instance:

├── blog
│   └── deploy
│       └── docker-compose.yml
├── backup
│   └── deploy
│       └── docker-compose.yml

If you do not explicitly specify COMPOSE_PROJECT_NAME, both projects will use deploy as the COMPOSE_PROJECT_NAME value, resulting in conflicts.

The key issue is that Docker Compose does not currently support explicitly setting COMPOSE_PROJECT_NAME in the configuration file. It can only be set using the -p option on the command line, which is not very elegant.
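A sketch of the workaround, using the example layout above:

```shell
# give each project an explicit name so the directory-derived
# prefix "deploy" no longer collides
docker-compose -p blog -f blog/deploy/docker-compose.yml up -d
docker-compose -p backup -f backup/deploy/docker-compose.yml up -d
```

Note that the project name also prefixes the network name (blog_blog-network instead of deploy_blog-network), so the external network reference in the backup orchestration has to be kept in sync.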

Discussion on supporting the explicit setting of COMPOSE_PROJECT_NAME in the configuration file has been ongoing in the docker/compose issue tracker since 2014.

Permissions

After containers are up and running, you may need to execute specific scripts to complete service initialization.

version: '3'
services:
  php:
    command: sh /www/init.sh
    networks:
      cron-network:
networks:
  cron-network:
    external:
      name: deploy_blog-network

However, you often encounter a permission issue:

permission denied 

This happens because the scripts are mounted at runtime rather than baked in when the image is built, so they may lack the execute bit. Invoking them through the interpreter, as in sh /www/init.sh, avoids having to chmod them first.

Business Issues

Database (DB)

Previously, the blog’s DB used the utf8 character encoding. Later, to support emojis 🤷‍♂️, I upgraded it to utf8mb4.

This change introduced a pitfall: utf8mb4 uses up to four bytes per character instead of three, so the same index definition now occupies more bytes and can exceed the limit:

Specified key was too long; max key length is 767 bytes

The solution is straightforward:

  1. Modify the index length: shorten the indexed column prefix so it fits within the byte limit.

  2. Update MariaDB to a higher version. I am currently using version 10.1.18; version 10.2 raises this limit (the default InnoDB row format becomes DYNAMIC, allowing index keys up to 3072 bytes).
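For the first option, the fix amounts to shortening the indexed prefix; table and index names here are illustrative:

```sql
-- utf8mb4 needs up to 4 bytes per character, so a 191-character
-- prefix stays within the 767-byte InnoDB limit (191 * 4 = 764)
ALTER TABLE posts
  DROP INDEX idx_slug,
  ADD INDEX idx_slug (slug(191));
```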

Cron

Scheduled tasks are somewhat special. I have seen cron daemons run inside Docker containers, but I recommend the opposite: keep the cron daemon on the host machine and have it invoke Docker to execute each task. Running cron itself inside a container is not advisable.

A better approach is to run a dedicated jobs service inside Docker, which offers better status support and management, eliminating the dependency on Cron.

If your service is simple, please refer to the first piece of advice: Keep it simple.
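Run on the host this way, the crontab entry stays thin while the actual work happens in a throwaway container; paths and the script name below are illustrative:

```shell
# host crontab: nightly backup at 03:00 via a one-off container
0 3 * * * cd /srv/backup/deploy && docker-compose run --rm php sh /www/backup.sh
```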

Recap

Because the blog architecture is simple, and the traffic is low, I used it as the first test subject.

However, if we abstract the blog service and rethink the entire architecture, it consists of three layers: Application (Blog+Backup), Storage (Data), and Gateway (Web Rule). The key idea is that every application should be stateless, and they should be decoupled from each other as much as possible.

In essence, the process mainly helped me become familiar with Docker’s usage. 🤦‍♂️