Startup order in Docker containers

Motivation

I recently dealt with an application composed of multiple services running in containers. Even though every part of this application is properly split into a separate microservice, the independence of each service is not enforced.
This lack of independence has several drawbacks, one of which is that containers must be started following a pre-defined startup order. Otherwise, some containers might terminate due to an application error (the application breaks when an unexpected error occurs, e.g. when it relies on another linked service that is not yet ready to accept connections).

Not all applications suffer from this kind of problem: the application I was dealing with was not born with microservices in mind; rather, it was split and converted into separate containers over its lifetime. It is certainly not the only application with this limitation: plenty of other applications out there have been converted into a Franken-microservice-stein “monster”.

Workarounds

I am going to explore the possible workarounds for defining and following a startup order when launching containerized applications that span multiple containers.

Depending on the scenario, we may not want to (or simply cannot) change the containers and the application itself, for several reasons:

  • the complexity of the application
  • whether the sources are available
  • whether changes to the Dockerfiles are possible (especially to the ENTRYPOINTs)
  • the time required to change the architecture of the application

docker-compose and healthcheck

Using docker-compose, we can specify:

  • a healthcheck: it specifies the test (command) used to check whether the container is working. The test is executed every interval and retried up to retries times:
db:
  image: my-db-image
  container_name: db-management
  ports:
    - 31337:31337
  healthcheck:
    test: ["CMD", "curl", "-fk", "https://localhost:31337"]
    interval: 300s
    timeout: 400s
    retries: 10
  • a depends_on field, to start the container only after its dependencies have been started, together with a restart: on-failure policy:
web:
  image: my-web-image
  restart: on-failure
  depends_on:
    - db
  links:
    - db

What is happening here?

  • docker-compose starts the stack, bringing up the db container first (the web one depends on it)
  • the web container is started shortly afterwards (it does not wait for db to be ready, because it does not know what “ready” means for us). Until the db container is ready to accept connections, the web container keeps being restarted (restart: on-failure).
  • the db service is marked as healthy as soon as curl -fk https://localhost:31337 returns 0 (the db-management image ships with an HTTP controller, and it returns 0 only when the database is ready to accept connections). Marking the service as healthy means that the service is working as expected (because the test returns what we expect). When the service is no longer healthy, the container can be restarted, and other policies and actions can be applied.
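
To observe the health state that Docker tracks, we can query the container directly (using the db-management container name defined above); the status moves from starting to healthy or unhealthy:

docker inspect --format '{{.State.Health.Status}}' db-management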

NOTE: with Compose file format versions < 3, depends_on could also wait for health checks, but starting from version 3 of the Compose file format, depends_on only accepts a list of other services.
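
For reference, with Compose file format 2.1 we could make web wait until db is actually healthy by combining depends_on with the health check defined above:

version: "2.1"
services:
  web:
    image: my-web-image
    depends_on:
      db:
        condition: service_healthy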

This solution is not ideal, as the web container is restarted over and over until the dependency is satisfied: that can be a huge problem if we are using that container for running tests, as a container exiting with a failure can be mistaken for failed tests.

wait-for-it wrapper script

This approach is slightly better than the previous one, but it is still a workaround. We are going to use docker-compose and the wait-for-it script.
In the docker-compose.yml file we insert a depends_on (as described in the previous section) and a command:

db:
  image: my-db-image
  container_name: db-management
  ports:
    - 31337:31337
  healthcheck:
    test: ["CMD", "curl", "-fk", "https://localhost:31337"]
    interval: 300s
    timeout: 400s
    retries: 10

web:
  image: my-web-image
  depends_on:
    - db
  links:
    - db
  command: ["./wait-for-it.sh", "db:31337", "--", "./webapp"]

The wait-for-it script waits for host:port to be open (TCP only). Again, this does not guarantee that the application is ready to serve requests but, compared to the previous workaround, the web container simply waits instead of being restarted until its dependency is ready.
One drawback of this workaround is that it is invasive: it requires the container image to be rebuilt by adding the wait-for-it script (you can use a multi-stage build to do so).
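
As a sketch, assuming my-web-image is built from a hypothetical my-web-base-image (which must provide bash, since wait-for-it requires it), the multi-stage build could look like this:

# Builder stage: download the wait-for-it script from its upstream repository
FROM alpine:3 AS fetcher
RUN apk add --no-cache curl \
 && curl -fsSLo /wait-for-it.sh \
      https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh \
 && chmod +x /wait-for-it.sh

# Final stage: the application image, now shipping the script as well
FROM my-web-base-image
COPY --from=fetcher /wait-for-it.sh ./wait-for-it.sh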

Re-architect the application

This is not a workaround: it is the actual solution, and the best one we can achieve. It takes effort and it might cost a lot: the application architecture needs to be modified to make it resilient against failures, e.g. by retrying connections to its dependencies instead of crashing. There are no general guidelines on how to successfully re-architect an application to be failproof and microservice-ready, even though I strongly suggest following the 12 guidelines expressed on the 12-factor applications website.
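
As a minimal sketch (in Python, with hypothetical names) of what application-level resilience can look like, the service can retry connecting to its dependency with a capped exponential backoff instead of crashing at the first refused connection:

# Hypothetical sketch: retry a dependency connection instead of crashing.
import socket
import time

def wait_for_dependency(host, port, retries=10):
    delay = 1
    for attempt in range(1, retries + 1):
        try:
            # Try to open a TCP connection to the dependency
            with socket.create_connection((host, port), timeout=5):
                return
        except OSError:
            print(f"attempt {attempt}: {host}:{port} not ready, retrying in {delay}s")
            time.sleep(delay)
            delay = min(delay * 2, 30)  # exponential backoff, capped at 30s
    raise RuntimeError(f"{host}:{port} never became ready")

wait_for_dependency("db", 31337)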
