Container (3): The Best Practice Guide for Docker - Managing Data Volumes

Best practices for managing data volumes in Docker

Motivation

This is the third article in the Docker series.

In the previous two articles, we introduced the basic concepts of containers and how to use them, as well as the best practices for using containers.

Those articles did not cover data volumes, yet managing them is an important part of using and managing containers. Generally speaking, the data in a container is temporary: it lives in the container's writable layer, survives a stop and restart, but is lost when the container is removed or recreated, which happens routinely when an image is updated. This mechanism keeps containers lightweight and disposable, and users do not have to worry about leftover container data occupying too much storage space.

However, when using containers to deploy certain services, we need to persist some of their data so that it survives the container being removed or recreated. For example, in the article “Setting Up a Private Image Hosting Service with Chevereto-free”, we set up a private image hosting service whose image data has to be persisted to the host machine, so that the images remain accessible after the container is removed. Persisted data can also be carried along if we later migrate Docker to another machine.

Data volumes are the mechanism in Docker used for data persistence, allowing data in containers to be saved to the host machine or shared between multiple containers.

Prerequisites

  • Docker and docker-compose are installed
  • Understand the basic concepts and basic usage of Docker
  • Use docker-compose to manage containers (if you are not using docker-compose to manage containers, or do not want to use docker-compose, this article is for reference only)

Introduction to Data Volumes

A data volume stores data outside the container's writable layer. The same volume can be mounted into several containers at once, and with a bind mount the data is also directly visible on the host machine.

Data in a volume is not removed together with the container, so it survives the container being deleted or recreated. Volumes can also improve I/O performance for write-heavy workloads, because writes go directly to the host filesystem instead of passing through the storage driver's copy-on-write layer.

When using docker-compose to manage containers, we can define data volumes and their usage in the docker-compose.yml file using the volumes field. Generally, there are several types of data volumes in Docker:

  • Named Volume: Named volumes are the most commonly used type of data volume in Docker. A named volume is created and managed by Docker, is identified by a unique name, and can be mounted into multiple containers. On a default Linux installation, its data is stored under the /var/lib/docker/volumes directory on the host machine.

    When using named volumes in docker-compose, you can define them in the docker-compose.yml file using the volumes field, for example:

    version: '3'
    services:
      app:
        image: nginx
        volumes:
          - my_volume:/usr/share/nginx/html
    volumes:
      my_volume:
    

    Note that a named volume referenced under services must also be declared under the top-level volumes key, as my_volume is in the example above.

  • Bind Mount: A bind mount mounts a directory from the host machine into the container, allowing data to be shared between the host and the container. The data of a bind mount is stored in a specified directory on the host machine.

    When using bind mounts in docker-compose, you can define them in the docker-compose.yml file using the volumes field, for example:

    version: '3'
    services:
      app:
        image: nginx
        volumes:
          - ./data:/usr/share/nginx/html
    

    The above configuration mounts the ./data directory on the host machine (resolved relative to the directory containing docker-compose.yml) to the /usr/share/nginx/html directory in the container.

    Bind mounts are often combined with the read-only option (:ro), which lets the container read the mounted data but not modify it. Read-only mounts are typically used to expose specific host data to the container, such as system timezone files or SSL certificates.

    When using read-only mounts in docker-compose, you can define them in the docker-compose.yml file using the volumes field, for example:

    version: '3'
    services:
      app:
        image: nginx
        volumes:
          - /etc/localtime:/etc/localtime:ro
    

    Here the configuration mounts the /etc/localtime file from the host machine to the /etc/localtime file in the container in read-only mode. This ensures that the timezone in the container is consistent with that of the host machine.

  • tmpfs Mount: A tmpfs mount creates an in-memory filesystem at a path inside the container, so the container uses the host machine's memory as data storage. Nothing is written to disk, and the data is lost when the container stops or is deleted.

    When using tmpfs mounts in docker-compose, you can define them in the docker-compose.yml file using the service-level tmpfs field, for example:

    version: '3'
    services:
      app:
        image: nginx
        tmpfs:
          - /usr/share/nginx/html
    

    Note that a tmpfs mount has no host path: only the path inside the container is listed, and it goes under tmpfs rather than under volumes. Newer Compose file versions also accept the long volumes syntax with type: tmpfs.

  • Anonymous Volume: An anonymous volume is a volume created without a name; Docker assigns it a random ID instead. Anonymous volumes are used less often because they are hard to identify, reuse, or reference from other containers. Like named volumes, their data is stored under the /var/lib/docker/volumes directory on the host machine (on a default Linux installation).

    When using anonymous volumes in docker-compose, you can define them in the docker-compose.yml file using the volumes field, for example:

    version: '3'
    services:
      app:
        image: nginx
        volumes:
          - /usr/share/nginx/html
    

    The above configuration creates an anonymous volume and mounts it at the /usr/share/nginx/html directory in the container.
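
To check which kind of mount a container actually ended up with, the Docker CLI can be queried directly. The helpers below are a minimal sketch: the function names are made up for illustration, and only the docker subcommands themselves are real. Keep in mind that docker-compose usually prefixes volume names with the project name, so the volume from the named-volume example may appear as something like myproject_my_volume.

```shell
# Hypothetical helper names; only the docker subcommands are real.

# Print the host directory that backs a named (or anonymous) volume.
volume_mountpoint() {
    docker volume inspect --format '{{ .Mountpoint }}' "$1"
}

# List each mount of a container as: TYPE SOURCE -> DESTINATION
container_mounts() {
    docker container inspect --format \
        '{{ range .Mounts }}{{ println .Type .Source "->" .Destination }}{{ end }}' "$1"
}

# Usage (volume and container names are assumptions):
#   docker volume ls
#   volume_mountpoint myproject_my_volume
#   container_mounts myproject_app_1
```

Running docker volume ls first is the easiest way to find the exact volume name to pass in.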

Best Practices for Using Data Volumes in Docker

Using data volumes in Docker can be very flexible, but it can also lead to some issues, such as permission problems with data volumes, backup and recovery of data volumes, etc. Therefore, when using data volumes, it is best to follow some principles:

Understand the Purpose of Data Volumes

When creating a data volume, we should understand its purpose. We can categorize data volumes into the following types:

  • Application Data Volume: Stores application data, such as databases, caches, etc.
  • Configuration Data Volume: Stores application configuration files, such as those of nginx or apache.
  • Log Data Volume: Stores application log files, such as the access and error logs of nginx or apache.
  • Temporary Data Volume: Stores temporary files, such as uploaded files, cache files, etc.
  • Shared Data Volume: Stores data shared between multiple containers, such as data shared between nginx and php-fpm.

Classify Data Volumes

Based on the purpose of the data volume, determine the type of mount. Generally, we should classify them as follows:

  • Application Data Volume: Use named volumes or bind mounts.
  • Configuration Data Volume: Use bind mounts.
  • Log Data Volume: Use bind mounts.
  • Temporary Data Volume: Use tmpfs mounts.
  • Shared Data Volume: Use named volumes or bind mounts.
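
Put together, the mapping above might look like the following Compose file. This is a sketch rather than a recommended production setup: the service name web, the volume name app_data, and the host paths ./config and ./logs are assumptions, and the snippet simply writes the file through a shell heredoc so that it can be inspected or validated.

```shell
# Write an illustrative Compose file (file name and paths are hypothetical).
cat > docker-compose.example.yml <<'EOF'
version: '3'
services:
  web:
    image: nginx
    volumes:
      # Application data: named volume managed by Docker
      - app_data:/usr/share/nginx/html
      # Configuration: read-only bind mount (the host file must already exist)
      - ./config/nginx.conf:/etc/nginx/nginx.conf:ro
      # Logs: bind mount so the log files are readable on the host
      - ./logs:/var/log/nginx
    # Temporary data: kept in memory, discarded when the container stops
    tmpfs:
      - /var/cache/nginx
volumes:
  app_data:
EOF

# Optional: let Compose parse the file and print the resolved configuration.
# docker-compose -f docker-compose.example.yml config
```

A shared data volume would follow the same pattern as app_data, with a second service listing the same volume name under its own volumes field.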

Unified Management of Data Volumes

Unified management of data volumes includes several different aspects:

  1. Bind mounts that belong to the same application should be placed under one parent directory on the host, with subdirectories that mirror the paths inside the container. For example:

    version: '3'
    services:
      app:
        image: nextcloud
        volumes:
          - /media/user/docker_data/nextcloud/app/config:/var/www/html/config
          - /media/user/docker_data/nextcloud/app/custom_apps:/var/www/html/custom_apps
          - /media/user/docker_data/nextcloud/app/data:/var/www/html/data
          - /media/user/docker_data/nextcloud/app/themes:/var/www/html/themes
    

    This keeps all Nextcloud data under the /media/user/docker_data/nextcloud/app directory on the host, making it easy to manage these data volumes.

  2. For different applications, we should also try to place their data volumes in the same directory, such as /media/user/docker_data, so that we can easily manage these data volumes.

  3. Important data volumes can be separated from ordinary ones and stored on more reliable storage media, such as a RAID array. In the Nextcloud example above, the most important data is in /var/www/html/data. If the RAID array is mounted at /media/user/raid, we can define the data volume as follows:

    version: '3'
    services:
      app:
        image: nextcloud
        volumes:
          - /media/user/raid/nextcloud/app/data:/var/www/html/data
    

    This way, we can store the data on the RAID array.
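
One way to keep this layout consistent is to create the host directories up front, instead of letting Docker create missing bind-mount sources (which it typically does as root when the short volume syntax is used). The helper below is a sketch: the function name, the DOCKER_DATA variable, and its default value are assumptions based on the paths used in the examples above.

```shell
# Base directory for all bind-mounted data (assumed default from the examples).
DOCKER_DATA="${DOCKER_DATA:-/media/user/docker_data}"

# make_app_dirs APP SUBDIR...
# Creates $DOCKER_DATA/APP/app/SUBDIR for every SUBDIR given.
make_app_dirs() {
    app="$1"
    shift
    for sub in "$@"; do
        mkdir -p "${DOCKER_DATA}/${app}/app/${sub}" || return 1
    done
}

# Usage:
#   make_app_dirs nextcloud config custom_apps data themes
```

Compose also substitutes environment variables in docker-compose.yml, so the same base directory could be referenced there as ${DOCKER_DATA}/nextcloud/app/data:/var/www/html/data.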

Back Up Important Data Volumes

For important data volumes, we should regularly back up the data in the volume to prevent data loss. We should also do so before updating, migrating, or upgrading containers. We can use the docker cp command to copy data from a path in the container, including a path backed by a volume, to the host machine:

docker cp <container_id>:/path/to/data /path/to/backup

Or archive a volume into a tar file by mounting it into a temporary container:

docker run --rm -v <volume_name>:/data:ro -v "$(pwd)":/backup busybox tar cf /backup/backup.tar -C /data .
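
As a sketch of a repeatable backup, the helpers below archive a named volume through a throwaway container and restore it again. The function names, the timestamped archive name, and the choice of the busybox image are assumptions for illustration; any small image that ships tar would do. For a consistent copy, the application writing to the volume should be stopped first.

```shell
# backup_volume VOLUME [DEST_DIR]
# Archives VOLUME into DEST_DIR (must be an absolute path; defaults to $PWD)
# and prints the path of the archive it created.
backup_volume() {
    volume="$1"
    dest_dir="${2:-$PWD}"
    archive="${volume}-$(date +%Y%m%d-%H%M%S).tar"
    mkdir -p "$dest_dir" || return 1
    # The volume is mounted read-only; the archive is written to the bind mount.
    docker run --rm \
        -v "${volume}:/data:ro" \
        -v "${dest_dir}:/backup" \
        busybox tar cf "/backup/${archive}" -C /data . || return 1
    echo "${dest_dir}/${archive}"
}

# restore_volume VOLUME ARCHIVE
# Unpacks ARCHIVE (absolute path) into VOLUME, overwriting files with the same name.
restore_volume() {
    volume="$1"
    archive="$2"
    docker run --rm \
        -v "${volume}:/data" \
        -v "$(dirname "$archive"):/backup:ro" \
        busybox tar xf "/backup/$(basename "$archive")" -C /data
}

# Usage (volume name is an assumption):
#   backup_volume myproject_my_volume /media/user/backups
#   restore_volume myproject_my_volume /media/user/backups/<archive>.tar
```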

Regularly Clean Up Unused Data Volumes

When using Docker, we may create some temporary data volumes that are no longer needed after use. We should regularly clean up these unused data volumes to free up storage space. The docker volume prune command deletes volumes that are not used by any container. Since Docker 23.0 it removes only unused anonymous volumes by default; add the --all flag to also remove unused named volumes.

docker volume prune
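
Because pruning is irreversible, it can help to look at the candidates first. The helper below is a minimal sketch with a made-up function name: it lists volumes that no container references and then hands over to docker volume prune, which asks for confirmation itself.

```shell
# Hypothetical helper: show unused volumes first, then prune.
cleanup_volumes() {
    # dangling=true lists volumes that are not referenced by any container.
    # This may include named volumes that a plain prune will not remove.
    unused="$(docker volume ls --quiet --filter dangling=true)"
    if [ -z "$unused" ]; then
        echo "No unused volumes."
        return 0
    fi
    echo "Unused volumes:"
    echo "$unused"
    # Prompts for confirmation; extra flags such as --all are passed through.
    docker volume prune "$@"
}

# Usage:
#   cleanup_volumes          # prune with Docker's default behaviour
#   cleanup_volumes --all    # also remove unused named volumes (Docker 23.0+)
```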