NSF CAREER project page

NSF CAREER Award Project
(March 2012–February 2017) [Project abstract at nsf.gov] [Project Outcomes Report (search here, using Fed Award ID 1148823 (PDF))]

Non-technical description of the problem:

Biologists want to test genetic and chemical treatments in environments that more closely mimic the human body, to increase the likelihood that their experiments will uncover causes and cures for human disease. Increasingly, they are able to create complex cell systems in a dish, mixing two or more cell types such as liver cells and their supporting fibroblasts. These systems are proven to more accurately reflect tissues from the human body, but it has been challenging to measure key features of cells in mixtures.

Project goals:

We aimed to identify and validate automated image analysis approaches to extract information from fluorescence microscopy images of co-cultured cell systems, while educating students, scientists, and the public about the theory, practice, and societal impact of biological image analysis. Biologists increasingly use co-culture systems, where two or more cell types are grown together in order to more accurately model their native environment. In order to use these powerful co-culture systems to tackle a wide range of basic biological research questions, the remaining challenge was to accurately extract quantitative measurements from each cell in microscopy images of such co-cultures, given that cell types with diverse morphologies are present. We aimed to give the scientific community a validated, open-source software toolbox of image processing and machine learning algorithms readily usable by biologists. The education and outreach efforts of the project produced validated, engaging educational materials that can be freely implemented by high school teachers around the world.

Project results:
as of project end, February 2017

We successfully developed algorithms to distinguish and quantify cell type content in images where two cell types are co-cultured. The approach involves a combination of model-based segmentation and supervised, iterative machine learning.

We developed an initial approach (model-based segmentation followed by supervised machine learning of the identified objects, Publication 1) to accurately count hepatocytes using image sets from two different laboratories and two different co-culture systems (hepatocytes/fibroblasts and hematopoietic stem cells/fibroblasts). We then developed an improved approach (pixel-based machine learning to distinguish the two cell types followed by model-based segmentation) that allows improved counting of fibroblasts as well as more accurate delineation of both cell types (Publication 7). All of our work is available through our open-source software project, CellProfiler (www.cellprofiler.org).
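
As a rough illustration of the improved workflow described above (classify each pixel by cell type, then identify and count objects), here is a minimal Python sketch. It is not the CellProfiler implementation: the two per-pixel features, the nearest-centroid classifier, and the connected-component counting are simplifying assumptions that stand in for the trained pixel classifier and the model-based segmentation, and all names and values are hypothetical.

```python
from collections import deque

# Hypothetical per-pixel features: (nuclear stain intensity, local texture score).
HEP, FIB, BG = (0.9, 0.2), (0.3, 0.8), (0.05, 0.05)

# Toy 4x5 "image" in which every pixel already carries its feature vector.
image = [
    [HEP, HEP, BG, FIB, BG],
    [HEP, BG,  BG, FIB, BG],
    [BG,  BG,  BG, BG,  BG],
    [FIB, BG,  HEP, HEP, BG],
]

# A few hand-annotated pixels per class (the supervised training data).
training = {
    "hepatocyte": [(0.85, 0.25), (0.95, 0.15)],
    "fibroblast": [(0.25, 0.75), (0.35, 0.85)],
    "background": [(0.0, 0.0), (0.1, 0.1)],
}


def train_centroids(labeled_pixels):
    """Return the mean feature vector of each annotated class."""
    centroids = {}
    for label, samples in labeled_pixels.items():
        n = len(samples)
        centroids[label] = tuple(sum(col) / n for col in zip(*samples))
    return centroids


def classify_pixel(features, centroids):
    """Assign a pixel to the class whose centroid is closest (squared distance)."""
    return min(
        centroids,
        key=lambda lab: sum((f - c) ** 2 for f, c in zip(features, centroids[lab])),
    )


def count_objects(class_map, target):
    """Count 4-connected regions of pixels labeled `target` (breadth-first search)."""
    rows, cols = len(class_map), len(class_map[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if class_map[r][c] != target or (r, c) in seen:
                continue
            count += 1
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and class_map[ny][nx] == target):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
    return count


centroids = train_centroids(training)
class_map = [[classify_pixel(px, centroids) for px in row] for row in image]
hepatocyte_count = count_objects(class_map, "hepatocyte")
fibroblast_count = count_objects(class_map, "fibroblast")
print(hepatocyte_count, fibroblast_count)
```

On this toy grid the sketch finds two hepatocyte regions and two fibroblast regions. Real images would need many more features per pixel and a segmentation step that can split touching cells, which simple connected components cannot do.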

Working closely with liver disorder researchers, we tested this software in experiments that identified chemicals that can cause liver cells to grow (Publication 1), which could be developed into therapeutics or used in the laboratory to generate renewable sources of functional human liver cells, potentially for transplants. We also uncovered new cellular pathways involved in liver functions (Publication 10). With cancer researchers, we discovered chemical compounds that inhibit the growth of leukemia cells without damaging normal blood-making cells, including a class of drugs called statins that are already used for patients with high cholesterol (Publication 2).

In these experiments, biologists aimed to measure one particular cellular response relevant to their goals. But microscopy experiments can measure hundreds of biologically valuable features from each cell and capture rich mechanistic information about cell state; this information is rarely mined to its full potential. We formally demonstrated this in a literature review indicating most image-based screens pay attention to only one or two measured features of cells (Publication 9). Some of this information is, in fact, not even noticeable to the human eye, yet we hypothesized that accurate measurements of individual cells can reveal highly reproducible and relevant differences between two treatment conditions.

We therefore worked out computational algorithms that allow us to identify the function of an unknown chemical by grouping it with known chemicals (Publications 3 and 4). We did this by extracting 1,400+ features from each cell’s image, then comparing the resulting “fingerprint” of the cell’s appearance with fingerprints from other treatments. We tested the usefulness of profiles in an experiment to create a performance-diverse chemical library (Publication 5). We made several methodological improvements and discoveries (Publications 6 and 8).
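
The fingerprint comparison described above can be sketched in a few lines of Python. This is an illustrative simplification under stated assumptions, not the method from Publications 3 and 4: it collapses per-cell measurements into a per-treatment profile using a feature-wise median, compares profiles with Pearson correlation, and assigns an unknown compound to its most similar known compound. The four-feature vectors and compound names are hypothetical; real profiles have 1,400+ features and are normalized first.

```python
from math import sqrt
from statistics import median


def aggregate_profile(cells):
    """Collapse per-cell feature vectors into one profile (feature-wise median)."""
    return [median(column) for column in zip(*cells)]


def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denom == 0:
        return 0.0  # a constant profile carries no shape information
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom


def nearest_known(unknown_profile, known_profiles):
    """Return the name of the known treatment with the most similar profile."""
    return max(
        known_profiles,
        key=lambda name: pearson(unknown_profile, known_profiles[name]),
    )


# Hypothetical measurements: three cells per treatment, four features per cell.
known_cells = {
    "compound_A": [[1.0, 5.0, 2.0, 0.5], [1.2, 4.8, 2.1, 0.4], [0.9, 5.1, 1.9, 0.6]],
    "compound_B": [[4.0, 1.0, 3.0, 2.0], [4.2, 0.9, 3.1, 2.2], [3.9, 1.1, 2.8, 1.9]],
}
unknown_cells = [[1.1, 4.9, 2.2, 0.5], [1.0, 5.2, 2.0, 0.3], [1.3, 5.0, 1.8, 0.6]]

known_profiles = {name: aggregate_profile(c) for name, c in known_cells.items()}
unknown_profile = aggregate_profile(unknown_cells)
best_match = nearest_known(unknown_profile, known_profiles)
print(best_match)
```

Here the unknown compound is matched to `compound_A`, whose profile has nearly the same shape. The median is one of several possible aggregation choices; it is used here only because it is robust to outlier cells.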

Then, by grouping similar genetic treatments together, we were able to determine which corresponding genes share similar functions. We proved the principle by creating an initial morphological map of gene function in which 110 genetic treatments were grouped (Publication 11). This revealed many known-to-be-related genes grouping near each other. It also uncovered in human cells a new connection between two cancer signaling pathways, both of which critically regulate tumor initiation and progression. This provides evidence that a full map of all human genes would be useful to uncover other novel gene functions and connections between genes. Our work also lays the foundation for a personalized medicine approach, wherein gene variants found in patients with various diseases, for example cancer, would be tested to reveal similarities among individual patients’ tumors, thus guiding personalized treatments suited to the particular abnormality of the cells.
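
To make the idea of grouping treatments by profile similarity concrete, the following Python sketch links any two treatments whose profiles correlate above a threshold and reports the resulting groups. It is a stand-in chosen for brevity (connected components with a union-find structure), not the clustering procedure used in Publication 11; the gene names, four-feature profiles, and the 0.9 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    return 0.0 if denom == 0 else sum(x * y for x, y in zip(dev_a, dev_b)) / denom


def group_by_similarity(profiles, threshold=0.9):
    """Group treatments linked by profile correlation >= threshold."""
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical per-treatment profiles (already aggregated over cells).
profiles = {
    "gene_A1": [1.0, 2.0, 3.0, 4.0],
    "gene_A2": [1.1, 2.1, 2.9, 4.2],
    "gene_B1": [4.0, 3.0, 2.0, 1.0],
    "gene_B2": [4.1, 2.8, 2.2, 0.9],
    "gene_C": [1.0, 3.0, 1.0, 3.0],
}

groups = group_by_similarity(profiles)
print(groups)
```

With these toy profiles the two `gene_A` treatments form one group, the two `gene_B` treatments form another, and `gene_C` stays on its own. A threshold-based grouping is sensitive to the cutoff chosen, so a real analysis would typically inspect the full similarity structure rather than a single cut.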

We also explored whether deep learning, a powerful approach recently successful in solving many practical problems, might be applicable to the problem of comparing genetic or chemical treatments. We found that a deep learning network that was “taught” using natural images (of cars, animals, people, etc.) was successful at comparing images of cell populations treated with various chemicals (Publication 12). Although our current iteration only matches the accuracy of prior, classical methods (which explicitly locate and measure cells), we expect future improvements to surpass those approaches.

To help other researchers build on our work, we strive to provide access to raw data and reproducible code in public repositories. We published a fully detailed protocol, complete with troubleshooting, to enable other researchers to implement the Cell Painting assay (Publication 13). We publicly released the full image set from a screen of 30,000 small molecules to aid others in reusing these data (Publication 14). Lastly, the NSF CAREER award supported the PI’s effort contributing to key review articles in the fields of bioimage analysis (Publication 15) and image-based profiling (Publication 16).

Publication 1: [link] The Shan, et al. (Nat. Chem. Biol. 2013) hepatocyte proliferation project used our image analysis pipeline to identify chemical compounds that induce proliferation of primary human hepatocytes and others that induce the maturation of hepatocytes. Both should be useful in generating mature, stable, and renewable sources of this cell type for experiments involving realistic in vitro liver systems, including for liver bioengineering and for basic and applied research involving the liver.

Publication 2: [link] The Hartwell, et al. (Nat. Chem. Biol. 2013) leukemic stem cell project used our image analysis pipeline to identify compounds, including statins, that inhibit the growth of leukemia stem cells without damaging normal blood-making cells.

Publication 3: [link] In Ljosa, et al. (JBS 2013), we implemented and tested a variety of existing image-based profiling methods and performed, for the first time, a side-by-side comparison of them. The aim of the experiment used as the test bed for this work was to classify chemical treatments on the basis of the morphology each induces on a population of cells. This lays the groundwork for our aim to develop large-scale machine learning approaches to identify subtle morphological changes between two cell populations.

Publication 4: [link] In Gustafsdottir, et al. (PLoS One 2013), we designed a novel multiplex assay specifically for cytological profiling. The goal was to “paint the cell” as richly as possible in order to detect diverse changes in cellular state and morphology. The NSF CAREER award specifically supported our work to use image-based profiling methods to cluster chemical treatments based on morphological similarity. We plan to use this assay toward our aim of developing large-scale machine learning approaches to identify subtle morphological changes between two cell populations.

Publication 5: [link] In Wawer, et al. (PNAS 2014), we contributed to the finding that a single, multiplexed microscopy assay provides sufficient information about cellular impact to enable choosing small molecules from a large set to create a smaller, performance-diverse library. An exciting outcome of this work (beyond accomplishing its scientific goal of producing better screening libraries) is the finding that image-based profiling is even more powerful than gene expression profiling. This heightens our enthusiasm for using microscopy images as a source of quantitative information about cell state.

Publication 6: [link] In Singh, et al. (J. Microsc. 2014), we validated an image processing pipeline that improves data quality when making sensitive measurements from cells in images for profiling.

Publication 7: [link] In Logan, et al. (Methods 2015), we described and validated our improved computational pipeline that distinguishes primary human hepatocytes from mouse fibroblasts growing in co-culture. We confirmed our hypothesis that a workflow using pixel-based machine learning would show advantages in speed, user-friendliness, and accuracy in the identification and counting of hepatocytes and fibroblasts, for high-throughput imaging experiments.

Publication 8: [link] In Singh, et al. (PLoS One 2015), we discovered that RNA interference for genetic perturbation presents challenges when attempting to analyze the >100-dimensional profiles that result from image-based profiling (a.k.a. morphological profiling). We carefully documented that the Cell Painting assay (which we previously developed) yields high-quality morphological measurements that are sensitive and quite reproducible. Unfortunately, we found that the magnitude and prevalence of off-target effects via the RNAi seed-based mechanism make morphological profiles of RNAi reagents targeting the same gene generally look no more similar than reagents targeting different genes. Pairs of RNAi reagents that share the same seed sequence produce image-based profiles that are much more similar to each other than profiles from pairs designed to target the same gene.

Publication 9: [link] In Singh, et al. (JBS 2014), we systematically reviewed high-content screening experiments published in the literature. Although these image-based experiments are increasingly popular, the information content lags behind what is possible. The majority of high-content screens published so far (60−80%) made use of only one or two image-based features measured from each sample and disregarded the distribution of those features among each cell population. We discuss why this might be so, and potential solutions.

Publication 10: [link] In Shan, et al. (J Biomol Screen 2016), we applied our computational pipeline to a high-throughput screen to identify genetic perturbations (by RNAi) that could influence hepatocytes growing in co-culture. The study uncovered 12 gene products that may be important for hepatocyte viability and/or liver identity in vitro.

Publication 11: [link] In Rohban, et al. (eLife 2017), we discovered that genes can be systematically functionally annotated using morphological profiling of cDNA expression constructs, via the microscopy-based Cell Painting assay. We discovered a connection in human cells between the Hippo and NFkB pathways, both involved in cancer initiation and progression.

Publication 12: [link] In Pawlowski, et al. (BioRxiv 2016, NIPS Workshop on Machine Learning in Computational Biology 2016), we demonstrated that deep learning networks trained on natural images are useful feature extractors for classifying images of cells treated with various small molecules, eliminating the need for classical image segmentation and hand-designed feature extraction.

Publication 13: [link] In Bray, et al. (Nat Protocols 2016), we provided technical details and troubleshooting guidance to assist researchers in implementing the Cell Painting assay initially described in Publication 4.

Publication 14: [link] In Bray, et al. (Gigascience 2017), we described a large-scale chemical screen data set including images and morphological profiles for 30,000 small molecules tested in the Cell Painting assay.

Publication 15: [link] In Meijering, et al. (Nat Biotech 2016), together with 4 senior colleagues in the field, I reviewed trends and challenges for the future of bioimage analysis.

Publication 16: [link] In Caicedo, et al. (Curr Opin Biotech 2016), my laboratory reviewed progress and successes in the field of image-based profiling, including applications in drug discovery and functional genomics.


  1. Shan J, Schwartz RE, Ross NT, Logan DJ, Thomas D, Duncan SA, North TE, Goessling W, Carpenter AE, Bhatia SN (2013). Identification of small molecules for human hepatocyte expansion and iPS differentiation. Nature Chemical Biology 9(8):514–520 / doi. PMID: 23728495 PMCID: PMC3720805 (Research article) [pdf]
  2. Hartwell KA, Miller PG, Mukherjee S, Kahn AR, Stewart AL, Logan DJ, Negri JM, Duvet M, Järås M, Puram R, Dancik V, Al-Shahrour F, Kindler T, Tothova Z, Chattopadhyay S, Hasaka T, Narayan R, Dai M, Huang C, Shterental S, Chu LP, Haydu JE, Shieh JH, Steensma DP, Munoz B, Bittker JA, Shamji AF, Clemons PA, Tolliday NJ, Carpenter AE, Gilliland DG, Stern AM, Moore MAS, Scadden DT, Schreiber SL, Ebert BL, Golub TR (2013) Niche-based screening identifies small-molecule inhibitors of leukemia stem cells. Nature Chemical Biology 9(12):840-848 / doi. PMID: 24161946 PMCID: In process (Research article) [pdf]
  3. Ljosa V, Caie PD, ter Horst R, Sokolnicki KL, Jenkins EL, Daya S, Roberts ME, Jones TR, Singh S, Genovesio A, Clemons PA, Carragher NO, Carpenter AE (2013) Comparison of Methods for Image-Based Profiling of Cellular Morphological Responses to Small-Molecule Treatment. Journal of Biomolecular Screening 18:1321-1329 / doi. PMID: 24045582 PMCID: PMC3884769 (Research article) [pdf]
  4. Gustafsdottir SM, Ljosa V, Sokolnicki KL, Wilson JA, Walpita D, Kemp MM, Seiler KP, Carrel HA, Golub TR, Schreiber SL, Clemons PA, Carpenter AE, Shamji AF (2013) Multiplex cytological profiling assay to measure diverse cellular states. PLoS One 8(12):e80999 / doi. PMID: 24312513 PMCID: PMC3847047 (Research article) [pdf]
  5. Wawer MJ, Li K, Gustafsdottir SM, Ljosa V, Bodycombe NE, Marton MA, Sokolnicki KL, Bray M-A, Kemp MM, Winchester E, Taylor B, Grant GB, Hon CSY, Duvall JR, Wilson JA, Bittker JA, Dančík V, Narayan R, Subramanian A, Winckler W, Golub TR, Carpenter AE, Shamji AF, Schreiber SL, Clemons PA (2014). Toward performance-diverse small-molecule libraries for cell-based phenotypic screening using multiplexed high-dimensional profiling. PNAS. 111(30):10911-10916 / doi. PMID: 25024206 PMCID: PMC4121832 (Research article) [pdf]
  6. Singh S, Bray M-A, Jones TR, Carpenter AE (2014). Pipeline for illumination correction of images for high-throughput microscopy. Journal of Microscopy 256(3):231-236 / doi. PMID: 25228240 PMCID: PMC4359755 (Research Article) [pdf]
  7. Logan DJ, Shan J, Bhatia SN, Carpenter AE (2015). Quantifying co-cultured cell phenotypes in high-throughput using pixel-based classification. Methods: In Press / doi. PMID: In Press PMCID: In Press (Research Article)
  8. Singh S, Wu X, Ljosa V, Bray MA, Piccioni F, Root DE, Doench JG, Boehm JS, Carpenter AE (2015). Morphological Profiles of RNAi-Induced Gene Knockdown Are Highly Reproducible but Dominated by Seed Effects. PLoS One 10(7):e0131370 / doi. PMID: 26197079 PMCID: PMC4511418 (Research Article) [pdf]
  9. Singh S, Carpenter AE, Genovesio A (2014). Increasing the Content of High-Content Screening: An Overview. Journal of Biomolecular Screening 19(5):640-650 / doi. PMID: 24710339 PMCID: PMC4230961 (Research Article) [pdf]
  10. Shan J, Logan DJ, Root DE, Carpenter AE, Bhatia SN (2016). High-Throughput Platform for Identifying Molecular Factors Involved in Phenotypic Stabilization of Primary Human Hepatocytes In Vitro. Journal of Biomolecular Screening. 21(9):897-911 / doi. PMID: 27650791. PMCID: in process. (Research article) [pdf]
  11. Rohban MH, Singh S, Wu X, Berthet JB, Bray M-A, Shrestha Y, Varelas X, Boehm JS, Carpenter AE (2017). Systematic morphological profiling of human gene and allele function reveals Hippo-NF-κB pathway connectivity. eLife. 10.7554/eLife.24060 / doi. [link to bioRxiv pre-print]
  12. Pawlowski N, Caicedo JC, Singh S, Carpenter AE, Storkey A (2016). Automating Morphological Profiling with Generic Deep Convolutional Networks. doi [link to bioRxiv pre-print]
  13. Bray MA, Singh S, Han H, Davis CT, Borgeson B, Hartland C, Kost-Alimova M, Gustafsdottir SM, Gibson CC, Carpenter AE (2016). Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nature Protocols. 11(9):1757-74 / doi. PMID: 27560178. PMCID: PMC5223290 (Research article) [link to bioRxiv pre-print]
  14. Bray MA, Gustafsdottir SM, Ljosa V, Singh S, Sokolnicki KL, Bittker JA, Bodycombe NE, Dancík V, Hasaka TP, Hon CS, et al (2017). A dataset of images and morphological profiles of 30,000 small-molecule treatments using the Cell Painting assay. Gigascience / doi. PMID: 28327978. [pdf]
  15. Meijering E, Carpenter AE, Peng H, Hamprecht F, Olivo-Marin JC (2016). Imagining the future of bioimage analysis. Nature Biotechnology. 34(12):1250-55 / doi. PMID: 27926723 PMCID: in process (Research article) [pdf]
  16. Caicedo JC, Singh S, Carpenter AE (2016). Applications in image-based profiling of perturbations. Current Opinion in Biotechnology 39:134-142 / doi. PMID: 27089218. PMCID: In Process. (Review article) [pdf]