Motivation
This is the third article in the Docker series. The links to the other articles in this series are as follows:
- Container (1): Introduction to Container-related Knowledge——Containerization, Docker, Docker-compose, Kubernetes / K8s, etc.
- Container (2): The Best Practice Guide for Docker——docker-compose and Portainer
- Container (4): The Best Practice Guide for Docker——Container Update, Upgrade, and Migration
- Container (5): The Best Practice Guide for Docker——Container Update Monitoring Tool WUD (What’s Up Docker)
- Container (6): Misconceptions, Bad Habits, and Issues When Using Docker
The previous two articles introduced the basic concepts of containers and the best practices for using them.
Those articles did not focus on data volumes, yet managing them is an important part of using and managing containers. By default, data written inside a container lives in the container's writable layer: it survives a stop and restart, but it is lost once the container is deleted or recreated, which is what usually happens when an image is updated. This mechanism keeps containers lightweight and disposable, and users do not have to worry about leftover container data occupying too much storage space.
However, when using containers to deploy certain services, we may need to persist some of the data so that it is not lost when the container is deleted or recreated. For example, in the article “Setting Up a Private Image Hosting Service with Chevereto-free”, we introduced how to set up a private image hosting service, where the image data in Chevereto must be persisted to the host machine so that the images remain accessible after the container is removed. Persisted data can also be moved along when we migrate Docker to another machine in the future.
Data volumes are the mechanism in Docker used for data persistence, allowing data in containers to be saved to the host machine or shared between multiple containers.
Prerequisites
- Docker and docker-compose are installed
- Understand the basic concepts and basic usage of Docker
- Use docker-compose to manage containers (if you are not using docker-compose to manage containers, or do not want to use docker-compose, this article is for reference only)
Introduction to Data Volumes
As described above, data volumes persist container data on the host machine. They can be shared between the host machine and a container, or among multiple containers.
Using data volumes avoids losing data when a container is deleted or recreated. Volumes can also improve I/O performance for write-heavy workloads, because reads and writes go directly to the host filesystem instead of passing through the copy-on-write storage driver that backs the container's writable layer.
When using docker-compose to manage containers, we can define data volumes and their usage in the docker-compose.yml file using the volumes field.
Generally, there are several types of data volumes in Docker:
Named Volume: Named volumes are the most commonly used type of data volume in Docker and can be shared between multiple containers. Each named volume has a unique name and is referenced by that name. On a default Linux installation, the data of named volumes is stored in the /var/lib/docker/volumes directory on the host machine.

When using named volumes in docker-compose, you can define them in the docker-compose.yml file using the volumes field, for example:

    version: '3'
    services:
      app:
        image: nginx
        volumes:
          - my_volume:/usr/share/nginx/html
    volumes:
      my_volume:

Note that a named volume referenced under services must also be declared in the top-level volumes field.

Bind Mount: A bind mount mounts a directory or file from the host machine into the container, allowing data to be shared between the host and the container. The data of a bind mount is stored at the specified path on the host machine.
When using bind mounts in docker-compose, you can define them in the docker-compose.yml file using the volumes field, for example:

    version: '3'
    services:
      app:
        image: nginx
        volumes:
          - ./data:/usr/share/nginx/html

The above configuration mounts the ./data directory on the host machine (resolved relative to the docker-compose.yml file) to the /usr/share/nginx/html directory in the container.

Bind mounts also have a special usage called Read-Only Mount, which mounts a directory or file from the host machine into the container in read-only mode, allowing data to be read in the container but not modified. Read-only mounts are typically used for reading specific data from the host machine, such as system timezone files or SSL certificates. The same :ro option also works for named volumes.

When using read-only mounts in docker-compose, you can define them in the docker-compose.yml file by appending :ro to the entry in the volumes field, for example:
    version: '3'
    services:
      app:
        image: nginx
        volumes:
          - /etc/localtime:/etc/localtime:ro

Here the configuration mounts the /etc/localtime file from the host machine to the /etc/localtime file in the container in read-only mode. This ensures that the timezone in the container is consistent with that of the host machine.

tmpfs Mount: A tmpfs mount gives the container a directory that is backed by the host machine’s memory instead of its disk. The data stored in a tmpfs mount is kept in the host machine’s memory, and it will be lost when the container stops or is deleted.
When using tmpfs mounts in docker-compose, you do not use the short volumes syntax. Instead, list the container paths under the service's tmpfs field in the docker-compose.yml file, for example:

    version: '3'
    services:
      app:
        image: nginx
        tmpfs:
          - /usr/share/nginx/html

Note that, unlike a bind mount, a tmpfs mount has no host path: only the path inside the container is given. Appending :tmpfs to a bind-mount entry is not a supported way to request one, because the third field of a short volumes entry is the access mode (such as ro or rw). The long volumes syntax with type: tmpfs is an alternative.

Anonymous Volume: Anonymous volumes are a less commonly used type of data volume in Docker. They have no name (Docker assigns a random ID instead), which makes them harder to reference and reuse than named volumes. On a default Linux installation, the data of anonymous volumes is also stored in the /var/lib/docker/volumes directory on the host machine.

When using anonymous volumes in docker-compose, you can define them in the docker-compose.yml file using the volumes field by giving only the container path, for example:
    version: '3'
    services:
      app:
        image: nginx
        volumes:
          - /usr/share/nginx/html

The above configuration mounts an anonymous volume at the /usr/share/nginx/html directory in the container.
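Three of the mount types above can also be written with Compose's long volumes syntax, which names the type explicitly instead of encoding it in a colon-separated string. The following is a minimal sketch, assuming a recent Docker Compose release that accepts a file without the top-level version key. The tmpfs target /var/cache/nginx and its size are illustrative choices, not values from this article.

```yaml
services:
  app:
    image: nginx
    volumes:
      # Named volume (must be declared under the top-level "volumes" key)
      - type: volume
        source: my_volume
        target: /usr/share/nginx/html
      # Read-only bind mount of a host file
      - type: bind
        source: /etc/localtime
        target: /etc/localtime
        read_only: true
      # tmpfs mount; size is given in bytes (64 MiB here, an arbitrary example)
      - type: tmpfs
        target: /var/cache/nginx
        tmpfs:
          size: 67108864

volumes:
  my_volume:
```

The long form is more verbose, but it makes the mount type visible at a glance and leaves no doubt about which part of an entry is the host path.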
Best Practices for Using Data Volumes in Docker
Using data volumes in Docker can be very flexible, but it can also lead to some issues, such as permission problems with data volumes, backup and recovery of data volumes, etc. Therefore, when using data volumes, it is best to follow some principles:
Understand the Purpose of Data Volumes
When creating a data volume, we should understand its purpose. We can categorize data volumes into the following types:
- Application Data Volume: Stores application data, such as databases, caches, etc.
- Configuration Data Volume: Stores application configuration files, such as the configuration of nginx or apache.
- Log Data Volume: Stores application log files, such as the access and error logs of nginx or apache.
- Temporary Data Volume: Stores temporary files that do not need to survive a restart, such as in-progress uploads or cache files.
- Shared Data Volume: Stores data shared between multiple containers, such as data shared between nginx and php-fpm.
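As an illustration of the last category, the sketch below shares one named volume between an nginx container and a php-fpm container. The service names web and php and the volume name app_code are hypothetical, and the nginx configuration that forwards requests to php-fpm is omitted.

```yaml
services:
  web:
    image: nginx
    volumes:
      # nginx only needs to read the application files
      - app_code:/var/www/html:ro
  php:
    image: php:fpm
    volumes:
      # php-fpm gets read-write access to the same files
      - app_code:/var/www/html

volumes:
  app_code:
```

Mounting the volume read-only in the container that never writes to it limits the damage a misbehaving service can do to shared data.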
Classify Data Volumes
Based on the purpose of the data volume, determine the type of mount. Generally, we should classify them as follows:
- Application Data Volume: Use named volumes or bind mounts.
- Configuration Data Volume: Use bind mounts.
- Log Data Volume: Use bind mounts.
- Temporary Data Volume: Use tmpfs mounts.
- Shared Data Volume: Use named volumes or bind mounts.
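Applied to a single service, this classification might look like the following sketch. The host paths ./config/nginx.conf and ./logs and the volume name app_data are assumptions made for illustration, and ./config/nginx.conf is assumed to exist already (otherwise Docker creates a directory with that name on the host).

```yaml
services:
  app:
    image: nginx
    volumes:
      # Application data: named volume
      - app_data:/usr/share/nginx/html
      # Configuration: bind mount, read-only so the container cannot change it
      - ./config/nginx.conf:/etc/nginx/nginx.conf:ro
      # Logs: bind mount, so they can be read directly on the host
      - ./logs:/var/log/nginx
    # Temporary data: tmpfs mount, discarded when the container stops
    tmpfs:
      - /tmp

volumes:
  app_data:
```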
Unified Management of Data Volumes
Unified management of data volumes includes several different aspects:
Data volumes of the same application that use bind mounts should be placed under one common host directory, with a layout that mirrors the paths inside the container. For example:

    version: '3'
    services:
      app:
        image: nextcloud
        volumes:
          - /media/user/docker_data/nextcloud/app/config:/var/www/html/config
          - /media/user/docker_data/nextcloud/app/custom_apps:/var/www/html/custom_apps
          - /media/user/docker_data/nextcloud/app/data:/var/www/html/data
          - /media/user/docker_data/nextcloud/app/themes:/var/www/html/themes
We can mount all Nextcloud data volumes to the /media/user/docker_data/nextcloud/app directory, making it easy to manage these data volumes.

For different applications, we should also try to place their data volumes in the same parent directory, such as /media/user/docker_data, so that we can easily manage these data volumes.

For certain important data volumes, we can separate them from ordinary data volumes and store them on more reliable storage media, such as RAID arrays. For example, among the Nextcloud data volumes above, the most important one is /var/www/html/data, which can be placed on a RAID array. Assuming the RAID array is mounted at /media/user/raid, we can define the data volume as follows:

    version: '3'
    services:
      app:
        image: nextcloud
        volumes:
          - /media/user/raid/nextcloud/app/data:/var/www/html/data
This way, we can store the data on the RAID array.
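One way to keep these base directories consistent across services is Compose variable interpolation, so that each root path is written once. The sketch below is a suggestion rather than part of the original setup: the variable names DATA_ROOT and RAID_ROOT are hypothetical, and the values after :- are defaults used when the variables are not set in the shell or in an .env file next to the Compose file.

```yaml
services:
  app:
    image: nextcloud
    volumes:
      # Ordinary data under the common data root
      - ${DATA_ROOT:-/media/user/docker_data}/nextcloud/app/config:/var/www/html/config
      - ${DATA_ROOT:-/media/user/docker_data}/nextcloud/app/custom_apps:/var/www/html/custom_apps
      - ${DATA_ROOT:-/media/user/docker_data}/nextcloud/app/themes:/var/www/html/themes
      # Important data on the RAID array
      - ${RAID_ROOT:-/media/user/raid}/nextcloud/app/data:/var/www/html/data
```

When migrating to a machine with a different disk layout, only the two variables need to change, not every volume entry.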
Backup Important Data Volumes
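For important data volumes, we should regularly back up the data in the volume to prevent data loss. Before updating, migrating, or upgrading containers, we should also back up the data in the volume. For bind mounts, the data is already an ordinary directory on the host machine and can be archived with the usual tools. For named volumes, we can use the docker cp command to copy a directory out of a container that mounts the volume, or start a temporary container that mounts the volume and packs its contents into a tar file. The docker volume command itself offers no export subcommand, so the archive has to be created in one of these ways.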
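Since this series manages containers with docker-compose, the temporary-container approach can be expressed as a one-off service. This is a minimal sketch under several assumptions: the service name backup, the ./backups host directory, and the archive name are hypothetical, the volume my_volume is the named volume from the earlier example, and the profiles key requires a Compose release that supports service profiles.

```yaml
services:
  backup:
    image: alpine
    # Only runs when requested explicitly, never with a plain "up"
    profiles: ["backup"]
    volumes:
      # The volume to back up, mounted read-only
      - my_volume:/source:ro
      # Host directory that receives the archive
      - ./backups:/backup
    # Pack the volume contents into a compressed tar file
    command: ["tar", "czf", "/backup/my_volume.tar.gz", "-C", "/source", "."]

volumes:
  my_volume:
```

With this service added to the same Compose file as the application, running docker compose run --rm backup (or docker-compose run --rm backup with the older standalone tool) should write the archive to ./backups. For data that changes constantly, such as a database, stopping the application container first gives a more consistent backup.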
Regularly Clean Up Unused Data Volumes
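When using Docker, we may create some temporary data volumes that are no longer needed after use. We should regularly clean up these unused data volumes to free up storage space. We can use the docker volume prune command to delete unused data volumes. Since Docker 23.0, this command removes only unused anonymous volumes by default; add the --all (-a) flag to also remove unused named volumes. A volume counts as unused when no container references it, so check the list of volumes and back up anything important before pruning.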