Input Modules Tutorial

Barbara Diaz-Rohrer

If you are working with single-channel images, you can simply drag a few images into CellProfiler and start building your pipeline. Most of us, however, have images with multiple channels or more complex image metadata, and CellProfiler needs to be told how to identify them. This tutorial introduces four modules in CellProfiler: Images, Metadata, NamesAndTypes, and Groups (collectively known as the Input modules). These modules are crucial for any CellProfiler pipeline because they define how images are loaded and organized for downstream analysis. CellProfiler can load many different data types (e.g. Z-stacks, time series, image masks), but it’s important to set up the input carefully so that CellProfiler can parse them correctly. In this written tutorial, we’ll go over some best practices and examples of loading many different data types into CellProfiler.

This written tutorial is accompanied by a video tutorial that can be found here. The timestamps in this document correspond to the video tutorial. For each example, the source of the image sets is provided. Some examples required modifying the source images; all the files used in the tutorial can be accessed here.

In CellProfiler, an “image set” is the collection of channels representing a single field of view (e.g. DAPI and GFP images), and an “image group” is a collection of image sets that should be processed together, independently of other groups (e.g. a time-lapse movie, a Z-stack, or a plate).

Introduction

An overview of the input modules (0:00)

The Images module is where you will add the individual files or folders containing the images you wish to analyze; this module also allows you to optionally filter the set of files to use during the analysis.

The Metadata module allows you to extract information about the files. The information can be used in later modules to identify the image, to create categories to group a set of images or to obtain image information (such as genotype or compound treatment) that can be stored with the results of your image analysis.

The NamesAndTypes module is where you will create names for the images to be used during the later analysis modules. The naming can be assigned based on the filename, file extension, type of image, file directory, or image metadata. This module is also where you can specify the type of image that is being loaded, and configure image sets (e.g. different channels from a field of view).

The Groups module splits the images into collections, or subsets, that can be analyzed independently. This approach is helpful for many different experimental setups, including Z-projecting images and analyzing time-lapse data. It is also often used when running CellProfiler on a cluster or in the cloud, to break one large experiment into many smaller, easier-to-analyze pieces (see our blog post on Distributed CellProfiler).

Preparing your data for import into CellProfiler

CellProfiler can work with many file formats. We encourage you to consult the Help manual; the Images module section provides more details regarding digital images and file formats. An underappreciated yet important aspect of any image analysis project is choosing what to name your images and the folders that contain them. Here are a few tips:

Avoid using spaces in filenames and folder names. Instead, separate words using either capital letters (MyData), dashes (My-Data), or underscores (My_Data).
Avoid special characters, such as periods, parentheses, or slashes.
Keep in mind that the names of files and folders can be used as metadata. You may want to include relevant information within the filenames, separated by a dash or underscore (but NOT a space); for example, CellTypeA-TreatmentX_Day5-channel1.tiff. This approach makes metadata extraction and downstream data analysis easier, and it also makes it easier to find the exact file you need later.
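To illustrate why consistent separators help, the Python sketch below pulls metadata out of a name like CellTypeA-TreatmentX_Day5-channel1.tiff using a regular expression with named groups (the same (?P<Name>...) style the Metadata module’s regular expressions use), and flags the characters the tips above advise against. The pattern and the function names are hypothetical and written for this one naming scheme.

```python
import re
from typing import Dict, List, Optional

# Characters the tips above advise against in a file stem:
# spaces, periods, parentheses and slashes (both / and \).
DISALLOWED = re.compile(r"[ ().\\/]")

# Hypothetical pattern for names such as CellTypeA-TreatmentX_Day5-channel1.tiff,
# written with Python-style named groups.
NAME_PATTERN = re.compile(
    r"^(?P<CellType>[^-_]+)-(?P<Treatment>[^-_]+)_(?P<Day>Day\d+)"
    r"-channel(?P<Channel>\d+)\.tiff?$"
)


def naming_problems(filename: str) -> List[str]:
    """Return the discouraged characters found in the name (extension excluded)."""
    stem, dot, _ext = filename.rpartition(".")
    if not dot:  # no extension at all: check the whole name
        stem = filename
    return sorted(set(DISALLOWED.findall(stem)))


def extract_metadata(filename: str) -> Optional[Dict[str, str]]:
    """Split a well-formed file name into metadata fields, or return None."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None
```

Applied to the example name, extract_metadata returns CellType, Treatment, Day and Channel fields; a name such as “My Data (1).tiff” fails the pattern and is flagged by naming_problems.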

The first step in working with your images is to drag and drop your images or image folders onto the “Drop files and folders here” pane in the Images module. What you do next depends on the type of images you are working with. In the next sections we provide numerous examples of the types of data you might be working with; while the list is extensive, it is not exhaustive.

Importing 2D images (5:21)

These examples show the simplest and most common cases for loading data into CellProfiler - cases where you are working with 2D images and have either one channel or a full set of matching channels for each field of view.

Example 1: Load and configure an image set consisting of three channels saved as separate images.

In the next three examples we use the images from the ExampleHuman tutorial. In the original data each file contains a single channel, but for Example 2 and Example 3 we have combined all the channels into a single file, using two different structures. An RGB image (Example 2) combines red, green, and blue values for each pixel to depict true color; in contrast, in a composite image (Example 3) each component or channel is kept separate as a layer within the image (see this manual for more information).

In the Images module you can optionally double-click on each file to preview the image. In this example three channels make up an “image set”, and because the channel information is already part of each file name, we can use the file name to determine which channel a file corresponds to.
Configure NamesAndTypes to recognize the channels: change “Assign a name to” from “All images” (which we would use if our channels were contained in a single file) to “Images matching rules”. We can now match our channels using rules: any file that has d0 in the name (File does contain d0) is recognized as a DNA image, any file that has d1 in the name (File does contain d1) is a GFP image, and any file that has d2 in the name (File does contain d2) is an RFP image.
You can add more than one rule to fit your particular situation (e.g. if two images have the same channel metadata, another metadata value can be used to distinguish between them).
By default, image sets are matched by order; alternatively, you can match them based on metadata. Examples of this are discussed in later sections (Matching images shared across image sets - Example 1, and Introduction to the Groups module - Example 2).
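The rule-based naming and the default order matching can be sketched in plain Python as below. This is not CellProfiler’s code: the rules dictionary, the function names and the assumption that files are paired after an alphabetical sort are illustrative only.

```python
from typing import Dict, List

# Hypothetical rules mirroring the setup above ("File does contain d0" -> DNA, ...).
RULES = {"DNA": "d0", "GFP": "d1", "RFP": "d2"}


def assign_names(filenames: List[str]) -> Dict[str, List[str]]:
    """Put each file under every channel name whose substring rule it satisfies."""
    named: Dict[str, List[str]] = {name: [] for name in RULES}
    for fname in sorted(filenames):  # this sketch assumes alphabetical order
        for name, token in RULES.items():
            if token in fname:
                named[name].append(fname)
    return named


def match_by_order(named: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Pair the n-th file of every channel into the n-th image set."""
    sizes = {name: len(files) for name, files in named.items()}
    if len(set(sizes.values())) != 1:
        raise ValueError(f"channels have different numbers of files: {sizes}")
    return [dict(zip(named, files)) for files in zip(*named.values())]
```

Because “contains d0” is a plain substring test, pick tokens that cannot appear elsewhere in a file name. If one channel is missing a file, pairing by position would misalign every later set, so the sketch refuses to pair channels with unequal counts.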

Example 2: Load a color RGB image (10:25)

CellProfiler can work with different types of image files; for this example we use an RGB image saved as a .png file.
Since in this case all of our channels are contained in a single file, set “Assign a name to” to “All images”.
Designate the image type as a “Color Image” in the NamesAndTypes module.
Add the ColorToGray module as the first module in the pipeline. ColorToGray can either combine all the color components into one grayscale image or split them into separate grayscale images; in this case we split the color components into single channels.
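To make the split-versus-combine choice concrete, here is a minimal pure-Python sketch that treats a color image as rows of (R, G, B) tuples. The equal default weights and the normalization by their sum are assumptions of this sketch; ColorToGray exposes its own weight settings, so check the module help for its exact behavior.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (red, green, blue)
ColorImage = List[List[Pixel]]
GrayImage = List[List[float]]


def split_rgb(img: ColorImage) -> Tuple[GrayImage, GrayImage, GrayImage]:
    """'Split' behavior: one grayscale image per color component."""
    red, green, blue = ([[px[c] for px in row] for row in img] for c in range(3))
    return red, green, blue


def combine_rgb(img: ColorImage, weights: Sequence[float] = (1.0, 1.0, 1.0)) -> GrayImage:
    """'Combine' behavior: weighted average of the three components."""
    total = sum(weights)
    return [[sum(w * v for w, v in zip(weights, px)) / total for px in row] for row in img]
```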

Example 3: Load a multichannel file (13:10)

There are two ways to work with composite images.
One option is to use the ColorToGray module as in the previous example, but with “Channels” instead of “RGB” as the “Image type”; this allows you to use as many channels as you have.
The other option is to use the Metadata module to identify each channel and import the channels as individual grayscale images. Choose “Extract from image file headers”; this option can also extract other information if available (e.g. time point or Z plane). Then configure NamesAndTypes to recognize the images based on the metadata (“Have a C matching” “0” is recognized as a DNA image, “1” as a GFP image, and “2” as an RFP image).
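The metadata-based naming in the second option boils down to a lookup from the C value read from the file header to a channel name. The sketch below assumes the headers have already been read into (file name, C) pairs; the pair format and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

# Mirrors the rules above: C=0 -> DNA, C=1 -> GFP, C=2 -> RFP.
C_TO_NAME = {0: "DNA", 1: "GFP", 2: "RFP"}


def image_sets_from_planes(planes: List[Tuple[str, int]]) -> Dict[str, Dict[str, int]]:
    """Return, for each composite file, which plane (C index) to load for each name."""
    sets: Dict[str, Dict[str, int]] = {}
    for fname, c in planes:
        if c not in C_TO_NAME:
            raise ValueError(f"{fname}: no naming rule for C={c}")
        sets.setdefault(fname, {})[C_TO_NAME[c]] = c
    return sets
```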

Example 4: Loading a color histology image (16:50)

(Image from the Broad Bioimage Benchmark Collection, image set BBBC041v1)

In the NamesAndTypes module, set “Assign a name to” to “All images”, and designate the image type as a “Color Image”.
Use UnmixColors to separate the stains, and output each one as a grayscale image. The module has several preset stains as well as a Custom stain option; you can enter the absorbances if they are known, or estimate them from a standard image or from a region of your image.
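UnmixColors is based on color deconvolution: each stain has an absorbance vector across red, green and blue, and the optical density of a pixel is modeled as a mix of those vectors. The sketch below shows that idea for a single pixel. The three stain vectors are illustrative values (the hematoxylin, eosin and DAB vectors published by Ruifrok and Johnston and also used by scikit-image), not necessarily CellProfiler’s presets, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

# Illustrative absorbance vectors (R, G, B) for three stains.
HEMATOXYLIN = (0.65, 0.70, 0.29)
EOSIN = (0.07, 0.99, 0.11)
DAB = (0.27, 0.57, 0.78)


def normalize(v: Sequence[float]) -> Vec:
    n = math.sqrt(sum(x * x for x in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def inverse3(m: List[Vec]) -> List[Vec]:
    """Inverse of a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("stain vectors are linearly dependent")
    return [
        ((e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det),
        ((f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det),
        ((d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det),
    ]


def unmix_pixel(rgb: Sequence[float], stains: List[Vec], i0: float = 255.0) -> Vec:
    """Estimate how much of each stain is present in one RGB pixel."""
    m = [normalize(s) for s in stains]  # one row per stain
    inv = inverse3(m)
    # Beer-Lambert: optical density per color channel (floor avoids log10(0)).
    od = [-math.log10(max(v, 1e-6) / i0) for v in rgb]
    # od = amounts @ m (row vectors), so amounts = od @ inverse(m).
    a0, a1, a2 = (sum(od[k] * inv[k][j] for k in range(3)) for j in range(3))
    return (a0, a1, a2)
```

Applied to every pixel, each of the three amounts forms one grayscale image per stain, which roughly corresponds to the images the module outputs.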

Matching images shared across image sets (19:46)

Thus far, the image sets we’ve created consist of entirely unique images. However, sometimes an image set consists of some unique images (e.g. channels in an imaging experiment like DAPI and GFP) and non-unique images shared by all of the image sets (e.g. a mask or illumination correction function).

Example 1: Matching images shared across image sets for illumination correction

In this section, we have two image sets from the Example Human C-N translocation data set. Each image set contains a GFP channel and a nuclear dye channel (“DNA”). In addition, in this dataset illumination functions were created on a per-plate basis (see this paper for more information), so every image set needs to load the same illumination function images to apply them downstream.

Use a regular expression in the Metadata module (see the module help for resources on learning to write them) to extract the channel number and Site from the image file names.
Configure NamesAndTypes using the metadata to name the GFP and DNA images.
In the Images module, change the filter to a custom filter that accepts both image files and .npy files (the illumination functions).
Add a second regular expression in the Metadata module to extract the channel number or name from the illumination function file names.
Update the NamesAndTypes module so that it now has 4 channels: DNA has a Channel of 1 and contains “tif”, GFP has a Channel of 2 and contains “tif”, Illum_DNA has a Channel of 1 and contains “.npy”, and Illum_GFP has a Channel of 2 and contains “.npy”. Make sure to also set the correct image type (grayscale image vs illumination function) for each channel!
In the NamesAndTypes module, set the matching method to “Metadata” and match DNA and GFP by Site, but set Illum_DNA and Illum_GFP to “None”. This tells CellProfiler that the DNA and GFP images should be paired by Site, while each illumination correction function should be used for all image sets.
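The matching logic above can be sketched as follows: DNA and GFP files are paired through their Site metadata, while the two illumination functions, whose matching metadata is set to None, are attached unchanged to every image set. The file-name patterns (SiteN_ChN.tif, Illum_ChN.npy) and the function name are hypothetical; substitute your own regular expressions.

```python
import re
from typing import Dict, List

# Hypothetical naming schemes; replace with the expressions used in your Metadata module.
IMAGE_RE = re.compile(r"Site(?P<Site>\d+)_Ch(?P<Channel>\d)\.tif$")
ILLUM_RE = re.compile(r"Illum_Ch(?P<Channel>\d)\.npy$")
CHANNEL_NAMES = {"1": "DNA", "2": "GFP"}


def match_with_illum(files: List[str]) -> List[Dict[str, str]]:
    per_site: Dict[str, Dict[str, str]] = {}
    illum: Dict[str, str] = {}
    for fname in files:
        img = IMAGE_RE.search(fname)
        if img:
            # DNA and GFP are matched on Site.
            per_site.setdefault(img["Site"], {})[CHANNEL_NAMES[img["Channel"]]] = fname
            continue
        fn = ILLUM_RE.search(fname)
        if fn:
            # Illumination functions have no Site: one file per channel.
            illum["Illum_" + CHANNEL_NAMES[fn["Channel"]]] = fname
    # Every image set receives the same illumination functions.
    return [{**per_site[site], **illum} for site in sorted(per_site, key=int)]
```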

Example 2: Matching images shared across image sets to apply the same mask to many image sets (37:08)

In this section, we use images from the Example Yeast Colony Classification pipeline to demonstrate loading a single image that will be added to all image sets. In this case, the single image is a mask that eliminates pixels outside of the well.

Extract metadata from the file name (in this case, the plate number).
Configure NamesAndTypes to name the images as “SamplePlate”.
Within NamesAndTypes, select “Add a single image” and add the mask image as a “Binary mask” type.
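Conceptually, the single mask image is reused for every image set. The sketch below applies one binary mask (True inside the well) to each plate image by setting excluded pixels to a fill value; the data layout, the fill-value approach and the function names are assumptions for illustration, and how a given CellProfiler module treats masked pixels is described in that module’s help.

```python
from typing import Dict, List

Image = List[List[float]]
Mask = List[List[bool]]


def apply_mask(image: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Keep pixels where the mask is True (inside the well); set the rest to `fill`."""
    if len(image) != len(mask) or any(len(r) != len(mr) for r, mr in zip(image, mask)):
        raise ValueError("mask and image have different shapes")
    return [[v if keep else fill for v, keep in zip(row, mrow)] for row, mrow in zip(image, mask)]


def mask_all(plates: Dict[str, Image], mask: Mask) -> Dict[str, Image]:
    """The one mask image is shared by every plate's image set."""
    return {plate: apply_mask(img, mask) for plate, img in plates.items()}
```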

Introduction to the Groups module (42:45)

Thus far, none of our examples have used the Groups module, because each field of view was independent and could be processed before or after any other field of view. When that is NOT the case (such as “making a maximum projection” or “tracking multiple time points”), we can use the Groups module to tell CellProfiler, for each field of view, which other fields of view it must be processed alongside, and in which order. In this example, we’ll demo loading and grouping of time-lapse data. The data and pipeline are available from Example Object Tracking.

Example 1: Loading and processing time-lapse data

In this example, each movie consists of a single image for each frame. Three unique movies are contained within three different folders (Sequence1, Sequence2, and Sequence3). Note that there are no spaces in the file or folder names.

Extract metadata from the file names (to get Specimen, Stain, and FrameNumber) and folder names (to get information about which “Run” or movie each image belongs to).
In NamesAndTypes, assign a single image name to all the images, using rules that match the extracted metadata.
Group the images into one group for each timelapse movie in the Groups module using the “Run” information extracted in the Metadata module. The images are grouped with the other images in the same folder.
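The Metadata and Groups steps above amount to parsing Run from the folder name and FrameNumber from the file name, then collecting each Run’s frames in time order. The path layout Sequence1/SpecA_GFP_t1.tif is hypothetical (the example data set uses its own naming), and so are the function and pattern names.

```python
import re
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical layout: <Run folder>/<Specimen>_<Stain>_t<FrameNumber>.tif
PATH_RE = re.compile(
    r"(?P<Run>Sequence\d+)/(?P<Specimen>[^_/]+)_(?P<Stain>[^_/]+)_t(?P<FrameNumber>\d+)\.tif$"
)


def group_by_run(paths: List[str]) -> Dict[str, List[str]]:
    """Return Run -> paths of that movie's frames, ordered by FrameNumber."""
    groups: DefaultDict[str, List[Tuple[int, str]]] = defaultdict(list)
    for path in paths:
        m = PATH_RE.search(path.replace("\\", "/"))  # accept Windows separators too
        if m is None:
            continue  # not a movie frame; the Images filter would normally exclude it
        groups[m["Run"]].append((int(m["FrameNumber"]), path))
    return {run: [p for _, p in sorted(frames)] for run, frames in sorted(groups.items())}
```

Frame numbers are compared as integers here, so t10 sorts after t2; zero-padding frame numbers in file names (t002, t010) avoids that ambiguity in tools that sort alphabetically.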

This timelapse data is now configured for analysis in CellProfiler.

Example 2: Loading and processing time-lapse data with a constant frame across images (46:15)

Another common challenge is loading a sequence that includes one constant image or object set that should be matched to every frame in the time-lapse. In this example, we use the same data, but we have, in a separate workflow, created CellProfiler objects from the first frame. This would allow us, for example, to track intensity in a constant region over time.

Extract metadata from the file names (to get Specimen, Stain, and FrameNumber) and folder names (to get information about which “Run” or movie each image belongs to).
In NamesAndTypes, assign an image name to our GFP image and to our object image (“Nucleus”). In this case, the type for the Nucleus image will be “Objects”.
Under “Image set matching method”, select “Metadata”; in the first row, match both GFP and Nucleus using the extracted “Run” metadata.
Add a second row for metadata matching: choose the FrameNumber metadata for the channel(s) we want to measure over time (GFP), and set the Nucleus object set (which will be matched to all frames of the sequence) to None.
Group the images into one group for each “Run”/movie in the Groups module using the information extracted in the Metadata module. The images are grouped with the other images in the same folder.
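The two-row metadata matching can be thought of as follows: within each Run, every GFP frame gets its own image set, and the one Nucleus objects file for that Run (FrameNumber set to None) is added to all of them. A minimal sketch, with hypothetical input dictionaries and function name:

```python
from typing import Dict, List


def pair_with_constant(
    gfp_frames: Dict[str, List[str]], nucleus: Dict[str, str]
) -> Dict[str, List[Dict[str, str]]]:
    """gfp_frames: Run -> GFP frame files in time order.
    nucleus: Run -> the single Nucleus objects file for that Run."""
    sets: Dict[str, List[Dict[str, str]]] = {}
    for run, frames in gfp_frames.items():
        if run not in nucleus:
            raise KeyError(f"no Nucleus objects for {run}")
        # Row 1 (Run) picks the Nucleus file; row 2 (FrameNumber=None) reuses it for every frame.
        sets[run] = [{"GFP": frame, "Nucleus": nucleus[run]} for frame in frames]
    return sets
```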

3D Processing (48:14)

If your data was acquired with multiple Z planes, you might want to do one of a couple of different things with it: work with it as a 3D stack, OR create a projection image, such as a maximum or average projection. If you want to create a projection image, you must do so in a separate pipeline before running any other CellProfiler steps like segmentation or measurement; we demonstrate this in Example 1.

If you want to analyze your data in 3D, CellProfiler currently requires one .tiff image per channel containing all the Z planes; so if you had 3 channels and 20 Z planes, you’d need 3 .tiff files, each with 20 planes. If your data is already in that format, great, you can proceed to Example 3; otherwise, Example 2 shows how to generate files in that format by running an initial CellProfiler pipeline.

Example 1: Importing a multichannel 3D image and Z projecting each channel

Here, we’ll show how to use the Groups and MakeProjection modules to appropriately group and Z-project a multichannel 3D image.

Extract metadata from image file headers and “Update” the image list in the Metadata module.
Configure NamesAndTypes to give each image the same name.
Configure Groups to group each Z plane of each channel in a 3D image into one group using the metadata extracted as well as the file location.
Use one MakeProjection module per channel to create a Z projection of that channel.
Save the projected image using one SaveImages module per channel - set the “When to save” option to save the projected image on the last cycle of the group. You can use the metadata information extracted to name the images to be saved.
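Because MakeProjection sees one plane per cycle, its result is only complete on the last cycle of the group, which is why SaveImages is set to save at that point. The sketch below mimics that behavior for a maximum projection (MakeProjection offers other projection types too); the class and function names are hypothetical, and planes are plain nested lists.

```python
from typing import Dict, List, Optional

Plane = List[List[float]]


class MaxProjector:
    """Accumulates planes one cycle at a time; the result is complete after the last plane."""

    def __init__(self) -> None:
        self.acc: Optional[Plane] = None

    def add(self, plane: Plane) -> None:
        if self.acc is None:
            self.acc = [row[:] for row in plane]
        else:
            self.acc = [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(self.acc, plane)]


def project_groups(groups: Dict[str, List[Plane]]) -> Dict[str, Plane]:
    """One projection per group (e.g. per channel of one stack), 'saved' on the last cycle."""
    saved: Dict[str, Plane] = {}
    for key, planes in groups.items():
        proj = MaxProjector()
        for cycle, plane in enumerate(planes, start=1):
            proj.add(plane)
            if cycle == len(planes):  # analogous to "last cycle of the group"
                saved[key] = proj.acc
    return saved
```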

Example 2: Importing a multichannel 3D image and saving each channel as a .tiff file (53:45)

It’s quite common to have multiple channels and Z stacks contained within a single file. If you’d like to analyze this data in 3D in CellProfiler, you will need a single file for each channel. You can import the original multichannel Z-stack data and then save the single-channel images as .tif files using the SaveImages module, which we demonstrate here.

Extract metadata from the file headers (Channels, Z plane) and add a second metadata extractor to extract the FileName (keep it simple and capture the whole file name).
Configure NamesAndTypes to give each channel a name.
Configure the Groups module to group the images by the metadata category FileName.
Save each channel as a single channel .tiff file using the Movie/Stack option.
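Regrouping the header metadata into one stack per channel can be sketched as below: planes are collected per (FileName, C) and ordered by Z, which is the order they would be written into each channel’s stack. The record format, the check for 0-based contiguous Z indices, and the output naming scheme are assumptions of this sketch.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


def stacks_per_channel(planes: List[Tuple[str, int, int]]) -> Dict[Tuple[str, int], List[int]]:
    """planes: (FileName, C, Z) records. Returns (FileName, C) -> Z indices in stack order."""
    stacks: DefaultDict[Tuple[str, int], List[int]] = defaultdict(list)
    for fname, c, z in planes:
        stacks[(fname, c)].append(z)
    result: Dict[Tuple[str, int], List[int]] = {}
    for key, zs in stacks.items():
        zs.sort()
        # Assumes 0-based Z indices; a gap or duplicate means a plane is missing.
        if zs != list(range(len(zs))):
            raise ValueError(f"{key}: unexpected Z planes {zs}")
        result[key] = zs
    return result


def output_name(fname: str, c: int) -> str:
    """Hypothetical naming scheme for the saved stack: <stem>_C<c>.tiff"""
    stem = fname.rsplit(".", 1)[0]
    return f"{stem}_C{c}.tiff"
```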

Example 3: Importing 3D single channel .tiff files (59:10)

Next, we’ll load individual channels into CellProfiler for 3D analysis. When working in three dimensions in CellProfiler, you must import the data as single-channel .tiff files. This process is very similar to importing 2D single-channel images, except that you must select “Yes” for “Process as 3D?”. In this example, we’ll use the single-channel .tiff files created in Example 2 of this section.

Configure the NamesAndTypes module to recognize the data as 3D and to recognize each channel.
Enter the relative pixel spacing (this can be obtained from the image information; if it is not available, the spacing can be set to 1, but this will affect volumetric measurements).
Configure NamesAndTypes to give each channel a name.
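To see why the spacing matters, here is a small worked example in Python with illustrative numbers: with 0.5 µm XY pixels and 2 µm between Z planes, the relative Z spacing is 2 / 0.5 = 4, so an object covering 1,000 voxels has a volume of 4,000 in units of XY pixel size cubed (500 µm³), but only 1,000 if the spacing is left at 1. The helper names are hypothetical.

```python
from typing import Tuple


def relative_spacing(xy_um: float, z_um: float) -> Tuple[float, float, float]:
    """Spacing in X, Y and Z relative to the XY pixel size (assumes square pixels)."""
    return (1.0, 1.0, z_um / xy_um)


def volume(voxel_count: int, spacing: Tuple[float, float, float]) -> float:
    """Volume in units of (XY pixel size)^3."""
    x, y, z = spacing
    return voxel_count * x * y * z


def to_cubic_microns(vol: float, xy_um: float) -> float:
    """Convert a volume in (XY pixel size)^3 to cubic microns."""
    return vol * xy_um ** 3
```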

Throughout this tutorial, we have shown several examples of how to use the four input modules (Images, Metadata, NamesAndTypes, and Groups), along with other modules used to convert images into a useful format. Although we have tried to be as comprehensive as possible, if you have further questions or specific problems, you can refer to the online manual, which can be accessed from the CellProfiler welcome screen or at https://cellprofiler.org/manuals. You can also submit your questions at https://forum.image.sc/tag/cellprofiler.