% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Isoforms.R
\name{DeconvolveFractions}
\alias{DeconvolveFractions}
\title{Deconvolve multi-feature fractions.}
\usage{
DeconvolveFractions(
  obj,
  feature_type = c("gene", "isoform"),
  features = NULL,
  populations = NULL,
  fraction_design = NULL,
  repeatID = NULL,
  exactMatch = TRUE,
  fraction_name = NULL,
  quant_name = NULL,
  gene_to_transcript = NULL,
  overwrite = TRUE,
  TPM_min = 1,
  count_min = 10
)
}
\arguments{
\item{obj}{An \code{EZbakRData} object}

\item{feature_type}{Either \code{"gene"} (if deconvolving gene-level fraction news,
i.e., for resolving fusion gene and component gene fraction news) or
\code{"isoform"} (if deconvolving transcript isoform fraction news).}

\item{features}{Character vector of the set of features you want to stratify
reads by and estimate proportions of each RNA population. The default of "all"
will use all feature columns in the \code{obj}'s cB.}

\item{populations}{Mutational populations that were analyzed to generate the
fractions table to use. For example, this would be "TC" for a standard
s4U-based nucleotide recoding experiment.}

\item{fraction_design}{"Design matrix" specifying which RNA populations exist
in your samples. By default, this will be created automatically and will assume
that all combinations of the \code{populations} you have requested to analyze are
present in your data. If this is not the case for your data, then you will have
to create one manually. See the documentation for \code{EstimateFractions()} (run \code{?EstimateFractions}) for more details.}

\item{repeatID}{If multiple \code{fractions} tables exist with the same metadata,
then this is the numerical index by which they are distinguished.}

\item{exactMatch}{If \code{TRUE}, then \code{features} and \code{populations} have to exactly match
those for a given fractions table for that table to be used. Because the default is
\code{TRUE}, you cannot specify only a subset of features or populations unless you
set this to \code{FALSE}.}

\item{fraction_name}{Name of the fraction estimate table to use. Should be stored in the
\code{obj$fractions} list under this name. Alternatively, specify \code{features} and/or \code{populations}
and let \code{EZget()} find it.}

\item{quant_name}{Name of the transcript isoform quantification table to use. Should be stored
in the \code{obj$readcounts} list under this name. Use \code{ImportIsoformQuant()} to create
this table. If \code{quant_name} is \code{NULL}, the function will search for tables containing the string
"isoform_quant" in their name, as that is the naming convention used by \code{ImportIsoformQuant()}.
If more than one such table exists, an error will be thrown and you will have to specify
the exact name in \code{quant_name}.}

\item{gene_to_transcript}{Table with columns \code{transcript_id} and all feature-related
columns that appear in the relevant fractions table. This is only relevant as a hack
to deal with the case where STAR includes in its transcriptome alignment transcripts
on the opposite strand from where the RNA actually originated. This table will be used
to filter out such transcript-feature combinations that should not exist.}

\item{overwrite}{If \code{TRUE} and a fractions estimate output already exists that
would possess the same metadata (features analyzed, populations analyzed,
and \code{fraction_design}), then it will get overwritten with the new output. Otherwise,
it will be saved as a separate output with the same name + "_#" where "#" is a
numerical ID to distinguish the similar outputs.}

\item{TPM_min}{Minimum TPM for a transcript to be kept in analysis.}

\item{count_min}{Minimum expected_count for a transcript to be kept in analysis.}
}
\value{
An \code{EZbakRData} object with an additional table under the "fractions"
list. Has the same form as the output of \code{EstimateFractions()}, and will have the
feature column "transcript_id".
}
\description{
Combines the output of \code{EstimateFractions} with feature
quantification performed by an outside tool (e.g., RSEM, kallisto, or salmon)
to infer fraction news for features that reads cannot always be assigned to
unambiguously. This is a generalization of \code{EstimateIsoformFractions},
which performs this deconvolution for transcript isoforms.
}
\details{
\code{DeconvolveFractions} expects as input a "fractions" table with estimates
for fraction news of at least one convolved feature set. A convolved feature
set is one where some reads cannot be unambiguously assigned to one instance
of that feature type. For example, it is often impossible to assign short
reads to a single transcript isoform. Thus, something like the "TEC" assignment
provided by fastq2EZbakR is an instance of a convolved feature set, as it
is an assignment of reads to transcript isoforms with which they are compatible.
Another example is the assignment of reads to the exonic regions of genes when fusion
genes are present (where a read may be consistent with both the putative fusion gene
and one of the fusion components).

\code{DeconvolveFractions} deconvolves fraction news
by fitting a linear mixing model to the convolved fraction new estimates and the
feature abundance estimates. In other words, each convolved fraction new (data) is modeled
as a weighted average of single-feature fraction news (parameters to estimate),
with the weights determined by the relative abundances of the features
in the convolved set (data). The convolved fraction new is modeled as following a beta distribution with mean
equal to the weighted average of the feature fraction news and variance related
to the number of reads in the convolved feature set.
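
As a sketch of this model, using notation introduced here for illustration only
(the symbols below are assumptions, not names used in the package): let \eqn{S}
denote a convolved feature set, \eqn{a_i} the estimated abundance of feature
\eqn{i} in \eqn{S}, and \eqn{\theta_i}{theta_i} the fraction new of feature \eqn{i}.
The expected convolved fraction new is then the abundance-weighted average

\deqn{\theta_S = \frac{\sum_{i \in S} a_i \theta_i}{\sum_{i \in S} a_i}}{theta_S = sum_{i in S} (a_i * theta_i) / sum_{i in S} a_i}

For example, if two isoforms contribute to \eqn{S} with relative abundances 0.75
and 0.25 and fraction news 0.8 and 0.4, then
\eqn{\theta_S = 0.75 \times 0.8 + 0.25 \times 0.4 = 0.7}{theta_S = 0.75 * 0.8 + 0.25 * 0.4 = 0.7}.
One parameterization consistent with the description above, shown only as an
assumption about how the variance might depend on the number of reads
\eqn{n_S} in the set, is

\deqn{\hat{\theta}_S \sim \mathrm{Beta}(\theta_S n_S, (1 - \theta_S) n_S)}{theta_S_hat ~ Beta(theta_S * n_S, (1 - theta_S) * n_S)}

which has mean \eqn{\theta_S}{theta_S} and a variance that shrinks as \eqn{n_S} grows.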

Features with an estimated TPM less than \code{TPM_min} (1 by default) or an estimated read
count less than \code{count_min} (10 by default) are removed from convolved feature sets and will
not have their fraction news estimated.
}
\examples{

# Load dependencies
library(dplyr)

# Simulate a single sample's worth of data
simdata_iso <- SimulateIsoforms(nfeatures = 30)

# We have to manually create the metadf in this case
metadf <- tibble(sample = 'sampleA',
                 tl = 4,
                 condition = 'A')

ezbdo <- EZbakRData(simdata_iso$cB,
                    metadf)

ezbdo <- EstimateFractions(ezbdo)

### Hack in the true, simulated isoform levels
reads <- simdata_iso$ground_truth \%>\%
  dplyr::select(transcript_id, true_count, true_TPM) \%>\%
  dplyr::mutate(sample = 'sampleA',
                effective_length = 10000) \%>\%
  dplyr::rename(expected_count = true_count,
                TPM = true_TPM)

# Name of table needs to have "isoform_quant" in it
ezbdo[['readcounts']][['simulated_isoform_quant']] <- reads

### Perform deconvolution
ezbdo <- DeconvolveFractions(ezbdo, feature_type = "isoform")

}
