% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Dropout.R
\name{NormalizeForDropout}
\alias{NormalizeForDropout}
\title{Normalize for experimental/bioinformatic dropout of labeled RNA.}
\usage{
NormalizeForDropout(
  obj,
  normalize_across_tls = FALSE,
  grouping_factors = NULL,
  features = NULL,
  populations = NULL,
  fraction_design = NULL,
  repeatID = NULL,
  exactMatch = TRUE,
  read_cutoff = 25
)
}
\arguments{
\item{obj}{An EZbakRFractions object, which is an EZbakRData object on which
you have run \code{EstimateFractions()}.}

\item{normalize_across_tls}{If TRUE, samples from different label times will
be normalized together by identifying the sample with the highest inferred
degradation rate constant (kdeg) and using it as the reference. The degradation
kinetics of this reference are then used to infer the reference fraction news
at each label time.}

\item{grouping_factors}{Which sample-detail columns in the metadf should be used
to further group the samples that are normalized together? The default value of
\code{NULL} will cause no sample-detail columns to be used for this grouping.}

\item{features}{Character vector of the set of features that reads were stratified
by when estimating the proportions of each RNA population; used to identify the
fractions table to normalize. The default of \code{NULL} will expect there to be
only one fractions table in the EZbakRFractions object.}

\item{populations}{Mutational populations that were analyzed to generate the
fractions table to use. For example, this would be "TC" for a standard
s4U-based nucleotide recoding experiment.}

\item{fraction_design}{"Design matrix" specifying which RNA populations exist
in your samples. By default, this will be created automatically and will assume
that all combinations of the \code{mutrate_populations} you requested to analyze are
present in your data. If this is not the case for your data, then you will have
to create one manually. See the docs for \code{EstimateFractions()} (run
\code{?EstimateFractions}) for more details.}

\item{repeatID}{If multiple \code{fractions} tables exist with the same metadata,
then this is the numerical index by which they are distinguished.}

\item{exactMatch}{If TRUE, then \code{features} must exactly match the \code{features}
metadata for a given fractions table for it to be used. This means that, by default,
you cannot specify a subset of features. Set this to FALSE if you would like
to specify a feature subset.}

\item{read_cutoff}{Minimum number of reads a feature must have to be used when
fitting the dropout model.}
}
\value{
An \code{EZbakRData} object with the specified "fractions" table replaced
with a dropout-corrected table.
}
\description{
Normalizes for dropout using the strategy described
\href{https://simonlabcode.github.io/bakR/articles/Dropout.html}{here}, which is
similar to that originally presented in
\href{https://pubmed.ncbi.nlm.nih.gov/38381903/}{Berg et al. 2024}.
Normalizing for dropout means identifying a reference sample with low dropout
and estimating dropout in each sample relative to that reference.
}
\details{
\code{NormalizeForDropout()} has several advantages relative to
\code{CorrectDropout()}:
\itemize{
\item \code{NormalizeForDropout()} doesn't require -label control data.
\item \code{NormalizeForDropout()} compares an internally normalized quantity
(fraction new) across samples, which has some advantages over the
absolute dropout estimates derived from comparisons of normalized read
counts in \code{CorrectDropout()}.
\item \code{NormalizeForDropout()} may be used to normalize half-life estimates
across very different biological contexts (e.g., different cell types).
}

There are also some caveats to be aware of when using \code{NormalizeForDropout()}:
\itemize{
\item Be careful using \code{NormalizeForDropout()} when you have multiple different
label times. Dropout normalization requires that each sample be compared to a reference
sample with the same label time. Thus, normalization will be performed
separately for each group of samples sharing a label time. If the extent
of dropout differs between the references for different label times, there
will still be unaccounted-for dropout biases between some of the samples.
\item \code{NormalizeForDropout()} effectively assumes that there are no true global
differences in turnover kinetics of RNA. If such differences actually exist
(e.g., half-lives in one context are on average truly lower than those in
another), \code{NormalizeForDropout()} risks normalizing away these real
differences. This is similar to how statistical normalization strategies
implemented in differential expression analysis software like DESeq2 assume
that there are no global differences in RNA levels.
}

By default, all samples with the same label time are normalized with respect
to a reference sample chosen from among them. If you want to further separate
the groups of samples that are normalized together, specify the columns of
your metadf by which you want to additionally group samples in the \code{grouping_factors}
parameter. The default separation by label time can be changed by setting
\code{normalize_across_tls} to \code{TRUE}, which will normalize samples from
different label times together, using the sample with the highest inferred kdeg
as the reference.
}
\examples{

# Simulate data to analyze
simdata <- EZSimulate(30)

# Create EZbakR input
ezbdo <- EZbakRData(simdata$cB, simdata$metadf)

# Estimate Fractions
ezbdo <- EstimateFractions(ezbdo)

# Normalize for dropout
ezbdo <- NormalizeForDropout(ezbdo)
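
# Illustrative sketch (base R only; NOT EZbakR's internal implementation):
# how a single reference kdeg could imply a reference fraction new at each
# label time, in the spirit of normalize_across_tls = TRUE. This assumes
# simple exponential turnover, fn = 1 - exp(-kdeg * tl). The helper name,
# the kdeg value, and the label times below are all hypothetical.
reference_fraction_new <- function(kdeg, tl) 1 - exp(-kdeg * tl)

kdeg_ref <- 0.2            # hypothetical reference kdeg (1/hr)
label_times <- c(1, 2, 4)  # hypothetical label times (hr)
fn_ref <- reference_fraction_new(kdeg_ref, label_times)

# Under this assumption, longer label times imply a higher reference
# fraction new for the same kdeg
print(round(fn_ref, 3))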

}
