Docker for Biologists

June 12, 2024

Docker is discussed in the bioimage analysis world a lot these days. But what is Docker? How does it make life easier? In this blog post, I will try to explain the basics of Docker and why it matters to biologists.

Have you ever faced package dependency issues when creating Python environments or setting up software from the source code? Docker helps to eliminate these issues and we will see how. 

Let’s begin with “What is Docker?”

Docker is an open platform to develop, ship, and run applications. The units it packages and runs are called “containers” (also referred to as “Docker containers”, “OCI containers”, or “Linux containers”), and Docker manages these.

Figure 1: The Docker logo, depicting the actual functionality of Docker containers

As the logo shows (Fig 1), Docker containers are akin to shipping containers, but instead of carrying physical goods, they carry all the information that is needed to run an application. What do they do?

  • Just like physical containers encapsulate their contents and isolate them from the external environment, Docker containers isolate their contents from the host operating system.
  • Containers can be composed: with physical containers this is done by stacking them on top of each other, with software containers it is done through orchestration.
  • Containers are standardized - the physical ones by the ISO (International Organization for Standardization) and the software ones by the Open Container Initiative (OCI).

Since containers encapsulate their contents, they can be easily combined into larger applications. For example, in bioimage analysis, a pipeline of image processing and analysis steps can be combined using a tool called Docker Compose, as sketched below.
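As a rough illustration only (the service names and images below are hypothetical), a Docker Compose file describing a two-step pipeline could look like this:

services:
  preprocessing:
    image: preprocessing:v1    # hypothetical image that prepares the raw images
  analysis:
    image: analysis:v1         # hypothetical image that runs the downstream analysis
    depends_on:
      - preprocessing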

Creating a Docker container:

A user can either create an application on their own or use one of the pre-built ones shared in a registry service called Docker Hub. To do either of these, we first need to install the Docker engine from here.
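For example, once Docker is installed, a pre-built image can be pulled from Docker Hub with a single command (the official Python image is used here purely as an illustration):

docker pull python:3.11-slim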

To create a Docker container, we need to have a file called “Dockerfile”. This is a text file that describes an application’s dependencies and what it should do when a user runs it. The instructions in this file create the different layers of a Docker image. Under the hood, a Docker image is a set of files sitting on the computer’s storage. The Docker engine uses this set of files to create one or more running processes called containers (Fig 2).

Figure 2: A simple workflow showing the steps followed in creating a Docker container

We will now go into the details of creating one. 

Step-by-step workflow to create a Docker container:

  1. Creating a Dockerfile:

Let’s try to create a simple application that prints out “Hello! This is my first Docker trial”. Since I am planning to use Python to print the line, I need to create a .py file, which I am naming `app.py` (Fig 3).


Figure 3: A Python file that will be used in building the Docker image
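In case the screenshot is hard to read, a minimal version of `app.py` is just a one-line script:

# app.py - prints a greeting when the container runs
print("Hello! This is my first Docker trial")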

 

So we have a task to run, and now we can create a text file called `Dockerfile` that defines the layers of a Docker image (Figure 4).


Figure 4: A sample Dockerfile that has the instructions to run the Python file that we created in the previous step
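As a sketch of what such a Dockerfile contains (the base image `python:3.11-slim` is an assumption here; the file in Figure 4 may use a different one):

# Start from a pre-built Python base image
FROM python:3.11-slim
# Set the working directory inside the image
WORKDIR /app
# Copy the files in the current folder (app.py and the Dockerfile) into the working directory
COPY . .
# Command to run when a container is started from the image
CMD ["python", "app.py"]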

 

With the above file, a Docker image is built, with a layer created for each of the instructions. In this text file,

  • `FROM` creates the first layer. We can also specify a pre-built Docker image here to start from something already available in a Docker registry.
  • `WORKDIR` specifies the working directory.
  • `COPY` copies the .py file and the Dockerfile from the host to the image’s working directory.
  • `CMD` specifies the command to be invoked when the container is initialized from the image.

This is a simple example. In the same way, we can save the code/scripts that we use for bioimage analysis as .py files; please find an example .py file which can perform a simple bioimage analysis task (a hypothetical sketch of such a script is shown after this paragraph). The packages that are needed to successfully run a script are its dependencies, and they can be specified in a `requirements.txt` file. We need to make sure that we copy both of these files to the working directory before we build an image (Fig 5).
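As a purely hypothetical sketch (the linked example may differ; the filename nuclei.tif and the use of scikit-image are my assumptions), such a script could look like this:

# count_objects.py - hypothetical example: count objects in a 2D image
from skimage import io, filters, measure

image = io.imread("nuclei.tif")             # load a 2D grayscale image (hypothetical file)
threshold = filters.threshold_otsu(image)   # compute an automatic Otsu threshold
binary = image > threshold                  # keep pixels brighter than the threshold
labels = measure.label(binary)              # label the connected components
print(f"Number of objects detected: {labels.max()}")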

 

Figure 5: A sample Dockerfile if we have to copy the Dockerfile, the requirements file, and install the packages listed in the requirements file
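A sketch of such a Dockerfile, under the same assumptions as above, could be:

FROM python:3.11-slim
WORKDIR /app
# Copy the requirements file first and install the listed packages
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy the remaining files (the script and the Dockerfile) into the working directory
COPY . .
# Run the analysis script when the container starts (hypothetical filename)
CMD ["python", "count_objects.py"]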

 

An example of a `requirements.txt` file (Figure 6),

Figure 6: A sample requirements file with the package versions mentioned. 
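For instance, a requirements file for the hypothetical script above might list (the packages and versions in Figure 6 may differ):

scikit-image==0.22.0
numpy==1.26.4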

 

For more details on how to write a Dockerfile or for advanced functionality, please check this page.   

  2. Building a Docker image:

Once we have the files ready, we can open the terminal on our local machine, navigate to the folder where we have our files saved and build the Docker image. The command that we use is, 

 

docker build . -t firsttrial:v1

`docker build` is the command to build a Docker image, and `-t` lets us provide a name and tag to uniquely identify the built image. Here, we have named the Docker image `firsttrial`, with `v1` as the tag. We can check if the Docker image has been built using the following command,

 

docker images

 

The above command will list all the Docker images that are on our machine. Once a Docker image is created, one or more containers can be created and run from it at any time.

  3. Running a Docker container:

 

docker run firsttrial:v1

 

Running the above command creates a container from the image and, in this example, displays the `print` statement that we had in the .py file.


Figure 7: The action carried out when we run the Docker image.


The image we built can be deposited in an image registry such as Docker Hub, making it available to others. Please follow this page to learn more about sharing Docker images.
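As a rough sketch, assuming a Docker Hub account (the placeholder your-username below is hypothetical), pushing the image could look like this:

# log in to Docker Hub, retag the image under the account name, and push it
docker login
docker tag firsttrial:v1 your-username/firsttrial:v1
docker push your-username/firsttrial:v1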

Advantages of containers:

  • For our purposes, the main advantage of using containers is that they run independently from the rest of the host environment, which eliminates common issues faced during the installation of package dependencies, such as dependency conflicts.
  • Applications can be easily shared, and pre-built ones can be easily pulled from Docker Hub.

BioContainers:

There is a community-driven open-source project called BioContainers which allows users to create, manage, and distribute bioinformatics packages and containers. It is based on three main frameworks: Conda, Docker, and Singularity. For more resources on Docker and BioContainers, please find the links below.

I hope this blog post helped you build a Docker container. You can also find the files that I used in the GitHub link here.

Resources:

References: