% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utils.hamming.r
\name{utils.hamming}
\alias{utils.hamming}
\title{Calculates the Hamming distance between two DArT trimmed DNA sequences}
\usage{
utils.hamming(str1, str2, r = 4)
}
\arguments{
\item{str1}{String containing the first sequence [required].}

\item{str2}{String containing the second sequence [required].}

\item{r}{Number of bases in the restriction enzyme recognition sequence
[default 4].}
}
\value{
Hamming distance between the two strings
}
\description{
WARNING: UTILITY SCRIPTS ARE FOR INTERNAL USE ONLY AND SHOULD NOT BE USED BY END USERS AS THEIR USE OUT OF CONTEXT COULD LEAD TO UNPREDICTABLE OUTCOMES.
The algorithm is that of Johann de Jong; see
\url{https://johanndejong.wordpress.com/2015/10/02/faster-hamming-distance-in-r-2/}.
}
\details{
Hamming distance is the number of positions at which two sequences differ; it
can be expressed as a count or as a proportion. It is usually calculated
between two sequences of equal length. DArT trimmed sequences differ in length
but are anchored on the left by the restriction enzyme recognition sequence,
so it is sensible to compare them starting immediately after the common
recognition sequence and ending at the last base of the shorter sequence.
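
The comparison described above can be sketched with a direct base-by-base
count. This is an illustration only, not the package implementation: the
helper name \code{hamming_trimmed} is hypothetical, and returning 0 when
nothing remains after the recognition sequence is an assumption.

\preformatted{
# Hypothetical helper for illustration; not part of the package.
hamming_trimmed <- function(str1, str2, r = 4) {
  # Compare only up to the last base of the shorter sequence
  n <- min(nchar(str1), nchar(str2))
  # Assumption: nothing left to compare after the recognition sequence
  if (n <= r) return(0L)
  # Skip the first r bases (the common recognition sequence)
  a <- strsplit(substr(str1, r + 1, n), "")[[1]]
  b <- strsplit(substr(str2, r + 1, n), "")[[1]]
  # Count the positions at which the bases differ
  sum(a != b)
}

# After the first 4 bases, GACGT is compared with GTCGT: one difference
hamming_trimmed("TGCAGACGT", "TGCAGTCGTAA")  # 1
}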

The Hamming distance between the rows of a binary matrix can be computed
quickly with matrix multiplication. The dot product of two binary vectors x
and (1-y) counts the positions where x is 1 and y is 0, and the dot product of
(1-x) and y counts the positions where x is 0 and y is 1, so their sum is the
number of positions at which x and y differ. The same approach extends to
matrices with more than two possible values, and to other element types, such
as DNA sequences.
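
As a small illustration of the binary case, the sketch below computes all
pairwise row distances of a 0/1 matrix by adding the two dot products (x with
1-y, and 1-x with y). The matrix \code{X} and its row names are made up for
the example; \code{tcrossprod(A, B)} is the base R function that multiplies A
by the transpose of B.

\preformatted{
# Made-up binary data: three rows to compare
X <- rbind(x = c(1, 0, 1, 1, 0),
           y = c(1, 1, 0, 1, 0),
           z = c(0, 0, 0, 0, 0))

# Positions where row i is 1 and row j is 0, plus the reverse case
D <- tcrossprod(X, 1 - X) + tcrossprod(1 - X, X)

D["x", "y"]  # 2, the same as sum(X["x", ] != X["y", ])
}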

The algorithm calculates the Hamming distance between all columns of a
matrix X, or between the columns of two matrices X and Y. Again matrix
multiplication is used, this time to count, for two columns x and y, the
number of positions at which corresponding elements have the same value
(e.g. A, C, G or T). This count is made for each possible value in turn and
the results are summed. The total, H, is the number of corresponding elements
that are the same, i.e. the complement of the Hamming distance. The last step
is therefore to subtract H from the maximum possible distance, which is the
number of rows of matrix X.
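
A minimal sketch of this counting scheme for a single matrix is given below.
It follows the description above rather than reproducing the package code:
the function name \code{hamming_columns} and the example sequences are
hypothetical, and \code{crossprod(U)} is the base R function that multiplies
the transpose of U by U.

\preformatted{
# Hypothetical helper for illustration; columns of X are the sequences.
hamming_columns <- function(X) {
  matches <- matrix(0, ncol(X), ncol(X))
  for (v in unique(as.vector(X))) {
    # 0/1 indicator of where each column holds the value v
    U <- (X == v) * 1
    # Count, for every pair of columns, the rows where both equal v
    matches <- matches + crossprod(U)
  }
  # Differences = number of rows minus number of matching positions
  nrow(X) - matches
}

# Made-up sequences of equal length, one per column
seqs <- c("GACGT", "GTCGT", "GACGA")
X <- do.call(cbind, strsplit(seqs, ""))
D <- hamming_columns(X)
# D[1, 2] is 1, D[1, 3] is 1 and D[2, 3] is 2
}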

If the two DNA sequences differ in length, the longer one is truncated to the
length of the shorter. The initial common restriction enzyme recognition
sequence (the first r bases) is ignored.
}
\seealso{
Other utilities: 
\code{\link{gl.alf}()},
\code{\link{utils.check.datatype}()},
\code{\link{utils.collapse.matrix}()},
\code{\link{utils.dart2genlight}()},
\code{\link{utils.dist.binary}()},
\code{\link{utils.flag.start}()},
\code{\link{utils.het.pop}()},
\code{\link{utils.impute}},
\code{\link{utils.is.fixed}()},
\code{\link{utils.jackknife}()},
\code{\link{utils.n.var.invariant}()},
\code{\link{utils.plot.save}()},
\code{\link{utils.read.fasta}()},
\code{\link{utils.read.ped}()},
\code{\link{utils.recalc.avgpic}()},
\code{\link{utils.recalc.callrate}()},
\code{\link{utils.recalc.freqhets}()},
\code{\link{utils.recalc.freqhomref}()},
\code{\link{utils.recalc.freqhomsnp}()},
\code{\link{utils.recalc.maf}()},
\code{\link{utils.reset.flags}()},
\code{\link{utils.transpose}()},
\code{\link{utils.vcfr2genlight.polyploid}()}
}
\author{
Custodian: Arthur Georges (Post to
 \url{https://groups.google.com/d/forum/dartr})
}
\concept{utilities}
