Let’s talk about normalization: why we normalize data and how we choose a method for normalizing data in Cell Painting experiments.

We’ll consider a theoretical example experiment with two plates, each with a different cell type (neurons and fibroblasts). Both plates include similar perturbations (different concentrations of Drugs A, B, C, and D), and the main question we’d like to answer concerns the differences and similarities between the two cell types’ responses to these perturbations. For example, how do neurons respond to Drug A, and in what ways is that response similar to or different from how fibroblasts respond to Drug A? We’ll call this our two-cell-type experiment.

Normalization has a few definitions, but we can think about it generally as stretching or adjusting data that come from different places onto a common scale, so that we can compare or combine them more accurately. This is important when our measurements have very different units or scales and we want to ensure they contribute equally to an analysis. Consider a simple example where you measure the height of two people both in feet and in centimeters. The absolute difference in centimeters looks larger, but both measurements represent exactly the same difference in height, just on different scales: a difference of 1 foot is the same as a difference of 30.48 centimeters. We also normalize to reduce the variation in data caused by technical effects (more on this later).

The default pycytominer method for normalization is called “standardize,” which scales the data so that the mean is 0 and the standard deviation (roughly, the typical distance of data points from the overall mean) is 1. This is computed by taking each data point (x), subtracting the mean value of the data (μ), and then dividing by the standard deviation (s) of the data:

$$z = \frac{x - \mu}{s}$$

If you’re familiar with statistics, you might recognize this as a z score.

In our Cell Painting analysis workflows, we typically use a slightly different normalization method called “RobustMAD.” RobustMAD uses medians instead of means and is therefore less affected by outliers in the data than standardization. In RobustMAD normalization, the median of all data points X is subtracted from each data point (x), and the result is divided by the median absolute deviation (mad), which is the median of the absolute differences between each data point and the median of all data points:

$$x' = \frac{x - \operatorname{median}(X)}{\operatorname{mad}(X)}, \qquad \operatorname{mad}(X) = \operatorname{median}_i\left(\left| x_i - \operatorname{median}(X) \right|\right)$$

In other words, the scaled output for each feature measurement of each sample is obtained by subtracting the median value across all samples considered (as chosen below) and dividing by those samples’ median absolute deviation. This form of normalization results in a scaled dataset where the median is 0 and the mad is 1. You can see the code for the function here.

How do we apply normalization to Cell Painting datasets? It is important to think about how to apply this normalization to our measurements. In the descriptions of the normalization methods above, we need to compute statistics (mean, standard deviation, median, mad, etc.) from a reference pool of data. How do we choose what this reference should be? While there are many different ways one could normalize, I’ll discuss three in detail below: whole-plate normalization, normalizing to negative controls, and between-plate normalization.

Our standard choice is to normalize measurements within each plate individually, also called ‘whole-plate’ normalization. This means the median and mad used to normalize a sample are computed from all samples on that same plate. This is easy to do computationally, and it does a great job of correcting for the many tiny “issues” each plate encounters on its journey to the microscope (see below). However, for multiple-plate experiments you need to think carefully about how treatments are distributed across plates. This strategy works well if you have a random sampling of treatments on each plate and if all plates have similar proportions of unusual/active samples. For typical chemical and genetic screens, this criterion is met. By contrast, consider a hypothetical case where all drugs on Plate 1 cause a G1 cell cycle arrest and all drugs on Plate 2 cause a G2 cell cycle arrest.
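To make the feet-versus-centimeters point and the “standardize” formula concrete, here is a minimal sketch in plain Python (standard library only). The `standardize` helper and the heights are hypothetical illustrations, not pycytominer’s code. The sketch assumes the population standard deviation (ddof=0), which is the convention scikit-learn’s `StandardScaler` uses.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Z-score each value: subtract the mean, then divide by the standard deviation.

    Uses the population standard deviation (ddof=0).
    """
    mu = mean(values)
    s = pstdev(values)
    return [(x - mu) / s for x in values]


# Hypothetical heights of three people, recorded once in feet and once in centimeters.
heights_ft = [5.0, 5.5, 6.0]
heights_cm = [h * 30.48 for h in heights_ft]  # 1 foot = 30.48 cm exactly

z_ft = standardize(heights_ft)
z_cm = standardize(heights_cm)

# The raw numbers differ by a factor of 30.48, but the z scores agree
# up to floating-point rounding, so both units carry the same information.
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(z_ft, z_cm))
print(z_ft)
print(same)  # True
```

Because dividing by s cancels the unit, standardized heights in feet and in centimeters are interchangeable, and each standardized list has mean 0 and standard deviation 1.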
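The difference between standardization and RobustMAD is easiest to see on data with an outlier. The sketch below applies both formulas to a single hypothetical feature measured in five wells, one of which is extreme. `robust_mad` is a simplified stand-in written for illustration. pycytominer’s own RobustMAD may differ in details (for example, a constant rescaling of the mad or a small epsilon in the denominator), so check its source before relying on exact values.

```python
from statistics import mean, median, pstdev


def standardize(values):
    """Z score: (x - mean) / population standard deviation."""
    mu, s = mean(values), pstdev(values)
    return [(x - mu) / s for x in values]


def robust_mad(values, epsilon=1e-18):
    """RobustMAD-style scaling: (x - median) / mad.

    mad is the median of |x - median|. `epsilon` is a hypothetical guard
    against dividing by zero when more than half the values are identical.
    """
    med = median(values)
    mad = median(abs(x - med) for x in values)
    return [(x - med) / (mad + epsilon) for x in values]


# Hypothetical feature values for five wells; the last well is an outlier.
feature = [1.0, 2.0, 3.0, 4.0, 100.0]

z = standardize(feature)
r = robust_mad(feature)

print([round(v, 2) for v in z])  # [-0.54, -0.51, -0.49, -0.46, 2.0]
print(r)                         # [-2.0, -1.0, 0.0, 1.0, 97.0]
```

The outlier inflates both the mean and the standard deviation, so standardization squeezes the four typical wells into a range of less than 0.1. RobustMAD keeps them one unit apart and leaves the outlier clearly separated.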
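Whole-plate normalization amounts to computing the median and mad separately for each plate, then scaling every sample with its own plate’s statistics. Here is a minimal sketch. It assumes samples are stored as dicts with a `"plate"` key and one numeric feature. The field names, plate labels, and values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def whole_plate_normalize(samples, feature, epsilon=1e-18):
    """Scale `feature` within each plate by that plate's median and mad.

    The reference pool for each sample is every sample on the same plate.
    `epsilon` is a hypothetical guard against a zero mad.
    """
    # Collect the feature values for each plate.
    by_plate = defaultdict(list)
    for s in samples:
        by_plate[s["plate"]].append(s[feature])

    # Compute the per-plate median and median absolute deviation.
    stats = {}
    for plate, values in by_plate.items():
        med = median(values)
        mad = median(abs(v - med) for v in values)
        stats[plate] = (med, mad)

    # Scale every sample using its own plate's statistics (inputs are not modified).
    normalized = []
    for s in samples:
        med, mad = stats[s["plate"]]
        normalized.append({**s, feature: (s[feature] - med) / (mad + epsilon)})
    return normalized


# Hypothetical data: Plate2 reads brighter and more spread out than Plate1
# (standing in for a technical effect), but the within-plate pattern is the same.
samples = [{"plate": "Plate1", "well": f"A0{i + 1}", "intensity": v}
           for i, v in enumerate([10.0, 11.0, 12.0, 13.0, 14.0])]
samples += [{"plate": "Plate2", "well": f"A0{i + 1}", "intensity": v}
            for i, v in enumerate([20.0, 22.0, 24.0, 26.0, 28.0])]

result = whole_plate_normalize(samples, "intensity")
for row in result:
    print(row["plate"], row["well"], row["intensity"])
```

Both plates end up as -2.0, -1.0, 0.0, 1.0, 2.0, so the plate-level offset and spread are removed. With pandas, the same idea is usually expressed as a `groupby` on the plate column followed by a `transform`.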
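The hypothetical G1/G2 case can be made concrete with a toy calculation. The numbers below are invented: a DNA-content feature whose plate-wide level differs because of biology rather than technical noise (cells arrested in G2 carry roughly twice the DNA of cells arrested in G1). Each plate is then normalized against itself, as in whole-plate normalization. `robust_mad` is a simplified illustrative helper, not pycytominer’s implementation.

```python
from statistics import median


def robust_mad(values, epsilon=1e-18):
    """(x - median) / mad, computed within one group (here, one plate)."""
    med = median(values)
    mad = median(abs(x - med) for x in values)
    return [(x - med) / (mad + epsilon) for x in values]


# Hypothetical DNA-content measurements (arbitrary units), one per drug.
plate1_g1 = [1.0, 1.1, 0.9, 1.05, 0.95]  # every drug on Plate 1 causes G1 arrest
plate2_g2 = [2.0, 2.2, 1.8, 2.1, 1.9]    # every drug on Plate 2 causes G2 arrest

print("raw medians:", median(plate1_g1), median(plate2_g2))  # 1.0 2.0

# Whole-plate normalization: each plate is its own reference pool.
norm1 = robust_mad(plate1_g1)
norm2 = robust_mad(plate2_g2)
print("plate 1:", [round(v, 3) for v in norm1])  # [0.0, 2.0, -2.0, 1.0, -1.0]
print("plate 2:", [round(v, 3) for v in norm2])  # [0.0, 2.0, -2.0, 1.0, -1.0]
```

The raw medians differ twofold, yet after per-plate scaling the two plates are numerically the same. The G1-versus-G2 difference between them is no longer visible, which is why the distribution of treatments across plates matters when choosing a reference pool.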