Type: Package
Title: Biomonitoring and Bioassessment Calculations
Version: 1.2.4
Maintainer: Erik W. Leppo <Erik.Leppo@tetratech.com>
Description: An aid for manipulating data associated with biomonitoring and bioassessment. Calculations include metric calculation, marking of excluded taxa, subsampling, and multimetric index calculation. Targeted communities are benthic macroinvertebrates, fish, periphyton, and coral. As described in the Revised Rapid Bioassessment Protocols (Barbour et al. 1999) https://archive.epa.gov/water/archive/web/html/index-14.html.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Depends: R (≥ 3.5)
Imports: dplyr, maps, rlang, stats, tidyselect, tidyr
Suggests: DataExplorer, DT, ggplot2, knitr, lazyeval, readxl, reshape2, rmarkdown, testthat, shiny, shinydashboard, shinydashboardPlus, shinyjs, shinyWidgets, utils, writexl, shinyalert
URL: https://github.com/leppott/BioMonTools
BugReports: https://github.com/leppott/BioMonTools/issues
VignetteBuilder: knitr
RoxygenNote: 7.3.3
NeedsCompilation: no
Packaged: 2025-10-06 18:17:29 UTC; Erik.Leppo
Author: Erik W. Leppo [aut, cre], Jen Stamp [ctb], John van Sickles [ctb], Ben Block [ctb]
Repository: CRAN
Date/Publication: 2025-10-09 12:40:02 UTC

Taxa Observation Maps

Description

Map taxonomic observations from a data frame. The input data frame must contain columns for SampID, TaxaID, TaxaCount, Latitude, and Longitude. Other arguments set the output format (jpg or pdf), the file name prefix, and the output directory. Files are saved with the prefix "maps.taxa" by default.

Usage

MapTaxaObs(
  df_obs,
  SampID,
  TaxaID,
  TaxaCount,
  Lat,
  Long,
  output_dir = NULL,
  output_prefix = "maps.taxa",
  output_type = "pdf",
  database,
  regions,
  map_grp = NULL,
  leg_loc = "right",
  verbose = FALSE,
  ...
)

Arguments

df_obs

Observation data frame

SampID

df_obs column name, Unique Sample identifier

TaxaID

df_obs column name, Unique Taxa identifier

TaxaCount

df_obs column name, Number of individuals for TaxaID and SampID

Lat

df_obs column name, Latitude

Long

df_obs column name, Longitude

output_dir

Directory to save output. Default is working directory.

output_prefix

Prefix to TaxaID for each file. Default = "maps.taxa"

output_type

File format for output; jpg or pdf.

database

maps::map function database; world, usa, state, county

regions

maps::map function regions. Names pertinent to database.

map_grp

Map grouping variable from df_obs. Will generate legend and color code the points on the map. Default = NULL

leg_loc

Legend location text. Default = "right". Other values may not work properly.

verbose

Logical value indicating whether status messages are output to the console. Default = FALSE

...

Optional arguments to be passed to methods.

Details

The user passes arguments for the maps::map function, which draws the base map (for example, 'database' and 'regions'). Without these arguments no map will be created.

The map will have all points and colored points for each taxon. In addition the map will include the number of samples by taxon.

The example data is fish but can be used for benthic macroinvertebrates as well.

If a grouping variable is used, colors are from grDevices::rainbow().
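As a rough illustration of how a grouping variable can be turned into point colors with grDevices::rainbow(), the sketch below assigns one color per group level. The vector grp and the other object names are made up for this example; the code is not taken from MapTaxaObs itself.

```r
# Hypothetical grouping values, one per observation
grp <- factor(c("BOSTON HARBOR", "CAPE COD BAY", "BOSTON HARBOR"))

# One rainbow color per group level
pal <- grDevices::rainbow(nlevels(grp))

# Look up the color for each observation by its level index
pt_col <- pal[as.integer(grp)]
pt_col
```

Observations in the same group share a color, and the same palette can be reused for the legend by pairing levels(grp) with pal.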

Jpg file names replace all non-alphanumeric characters with "_".
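The file-name rule above can be sketched with a single regular expression. The helper name sanitize_taxa_name is hypothetical and the package's internal code may differ; this only shows the kind of substitution described.

```r
# Hypothetical helper (not part of BioMonTools):
# replace every non-alphanumeric character with an underscore
sanitize_taxa_name <- function(x) {
  gsub("[^[:alnum:]]", "_", x)
}

sanitize_taxa_name("Morone saxatilis (juv.)")
```

Spaces, parentheses, and periods are each replaced, so consecutive special characters produce consecutive underscores.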

The R package maps is required for this function.

Value

Taxa maps to user defined directory as jpg or pdf.

Examples

df_obs     <- data_Taxa_MA
SampID     <- "estuary"
TaxaID     <- "TaxaName"
TaxaCount  <- "Count"
Lat        <- "Latitude"
Long       <- "Longitude"
output_dir <- tempdir()
output_prefix <- "maps.taxa."
output_type   <- "pdf"

myDB     <- "state"
myRegion <- "massachusetts"
myXlim   <- c(-(73+(30/60)), -(69+(56/60)))
myYlim   <- c((41+(14/60)),(42+(53/60)))

# Run function with extra arguments for map
MapTaxaObs(df_obs[1:500, ],
           SampID,
           TaxaID,
           TaxaCount,
           Lat,
           Long,
           output_dir,
           output_prefix,
           output_type,
           database = myDB,
           regions = myRegion,
           map_grp = "estuary",
           leg_loc = "bottomleft",
           xlim = myXlim,
           ylim = myYlim,
           verbose = FALSE)
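
The xlim and ylim values in the example are written as degrees plus minutes divided by 60, with west longitudes negated. A small helper can make that conversion explicit. The name dm_to_dd and its arguments are hypothetical and are not part of BioMonTools.

```r
# Hypothetical helper: degrees and minutes to decimal degrees
dm_to_dd <- function(deg, min, negative = FALSE) {
  dd <- deg + min / 60
  if (negative) -dd else dd
}

# Same limits as in the example above (west longitudes are negative)
myXlim <- c(dm_to_dd(73, 30, negative = TRUE),
            dm_to_dd(69, 56, negative = TRUE))
myYlim <- c(dm_to_dd(41, 14), dm_to_dd(42, 53))
```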

TaxaMaster_Ben_BCG_PacNW

Description

Example data

Usage

TaxaMaster_Ben_BCG_PacNW

Format

A data frame with 684 observations on the following 20 variables.

TaxaID

a character vector

Phylum

a character vector

SubPhylum

a character vector

Class

a character vector

SubClass

a character vector

Order

a character vector

SuperFamily

a character vector

Family

a character vector

Tribe

a character vector

Genus

a character vector

SubGenus

a character vector

Species

a character vector

BCG_Attr

a character vector

NonTarget

a logical vector

Thermal_Indicator

a character vector

Long_Lived

a character vector

FFG

a character vector

Habit

a character vector

Life_Cycle

a character vector

TolVal

a numeric vector

Source

example master taxa from BCG Pacific Northwest


Assign Index_Class

Description

Assign Index_Class based on user input fields. If the name of an existing field is used, the information in that field will be overwritten.

Multiple criteria are treated as "AND" so all must be met to be assigned to a particular Index_Class.
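
The "AND" behavior can be illustrated in base R. The site values, thresholds, and class label below are invented for the illustration and do not reflect the structure of the criteria file shipped with the package.

```r
# Hypothetical site data
sites <- data.frame(SITEID    = c("Site_A", "Site_B", "Site_C"),
                    GRADIENT  = c(0.6, 1.2, 0.6),
                    ELEVATION = c(720, 760, 790),
                    stringsAsFactors = FALSE)

# Both conditions must hold for a site to receive the class (logical AND)
meets_all <- sites$GRADIENT < 1 & sites$ELEVATION < 750

# Sites failing any one criterion are left unassigned (NA)
sites$INDEX_CLASS <- ifelse(meets_all, "LoGrad-LoElev", NA_character_)
sites
```

Site_C meets the gradient condition but not the elevation condition, so it is not assigned to the class.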

Internally uses 'tidyr' and 'dplyr'

If Index_Class is included in data then it is renamed Index_Class_Orig and returned in the output data frame.

Usage

assign_IndexClass(
  data,
  criteria,
  name_indexclass = "INDEX_CLASS",
  name_indexname = "INDEX_NAME",
  name_siteid = "SITEID",
  data_shape = "WIDE"
)

Arguments

data

Data frame (wide format) with metric values to be evaluated.

criteria

Data frame of metric thresholds to check.

name_indexclass

Name for new Index_Class column. Default = INDEX_CLASS

name_indexname

Name for Index Name column. Default = INDEX_NAME

name_siteid

Name for Site ID column. Default = SITEID

data_shape

Shape of data; wide or long. Default is 'WIDE'

Details

Requires use of reference file with criteria.

Value

Returns a data frame with new column added.

Examples

# Packages
library(readxl)

# EXAMPLE 1
# Create Example Data
df_data <- data.frame(SITEID = paste0("Site_", LETTERS[1:10]),
                      INDEX_NAME = "BCG_MariNW_Bugs500ct",
                      GRADIENT = round(stats::runif(10, 0.5, 1.5), 1),
                      ELEVATION = round(stats::runif(10, 700, 800), 1))

# Import Checks
df_criteria <- read_excel(system.file("extdata/IndexClass.xlsx",
                                      package = "BioMonTools"),
                          sheet = "Index_Class")

# Run Function
df_results <- assign_IndexClass(df_data, df_criteria, "INDEX_CLASS")

# Results
df_results


Estuary taxa data

Description

A dataset with example fish taxa data and locations for mapping.

Usage

data_Taxa_MA

Format

A data frame with 2,675 observations on the following 15 variables.

estuary

a factor with levels BOSTON HARBOR, BUZZARDS BAY, CAPE COD BAY, MASSACHUSETTS BAY, WAQUOIT BAY

CommonName

a factor with levels ALEWIFE, AMERICAN EEL, AMERICAN LOBSTER, AMERICAN PLAICE, AMERICAN SAND LANCE, AMERICAN SHAD, ATLANTIC COD, ATLANTIC CROAKER, ATLANTIC HERRING, ATLANTIC MACKEREL, ATLANTIC MENHADEN, ATLANTIC ROCK CRAB, ATLANTIC SALMON, ATLANTIC STINGRAY, ATLANTIC STURGEON, ATLANTIC TOMCOD, BAY ANCHOVY, BAY SCALLOP, BLACK DRUM, BLACK SEA BASS, BLUE CRAB, BLUE MUSSEL, BLUEBACK HERRING, BLUEFISH, BROWN SHRIMP, BUTTERFISH, CHANNEL CATFISH, COWNOSE RAY, CUNNER, DAGGERBLADE GRASS SHRIMP, EASTERN OYSTER, FOURSPINE STICKLEBACK, GOBIES, GREEN CRAB, GREEN SEA URCHIN, GRUBBY, HADDOCK, HOGCHOKER, JONAH CRAB, KILLIFISHES, LONGHORN SCULPIN, MULLETS, MUMMICHOG, NINESPINE STICKLEBACK, NORTHERN KINGFISH, NORTHERN PIPEFISH, NORTHERN SEAROBIN, NORTHERN SHRIMP, OCEAN POUT, OYSTER TOADFISH, PINFISH, POLLOCK, QUAHOG, RAINBOW SMELT, RED DRUM, RED HAKE, ROCK GUNNEL, SCUP, SEA SCALLOP, SEVENSPINE BAY SHRIMP, SHEEPSHEAD MINNOW, SHORTHORN SCULPIN, SHORTNOSE STURGEON, SILVER HAKE, SILVERSIDES, SKATES, SMOOTH FLOUNDER, SOFTSHELL CLAM, SPINY DOGFISH, SPOT, SPOTTED SEATROUT, STRIPED BASS, SUMMER FLOUNDER, TAUTOG, THREESPINE STICKLEBACK, WEAKFISH, WHITE HAKE, WHITE PERCH, WINDOWPANE FLOUNDER, WINTER FLOUNDER, YELLOW PERCH, YELLOWTAIL FLOUNDER

LifeStage

a factor with levels ADULTS, EGGS, JUVENILES, LARVAE, MATING, PARTURITION, SPAWNING

SalZone

a factor with levels >25 ppt, 0.5-25 ppt

Winter

a numeric vector

Spring

a numeric vector

Summer

a numeric vector

Fall

a numeric vector

All

a numeric vector

TaxaName

Taxa Names for mapping

State

a factor with levels MA

Latitude

a numeric vector

Longitude

a numeric vector

Count

a numeric vector

PctDensity

a numeric vector

Source

example data


Benthic macroinvertebrate taxa data; MBSS

Description

A data set with example benthic macroinvertebrate data. Calculate metrics then statistics. Data from MBSS.

Usage

data_benthos_MBSS

Format

A data frame with 5,666 observations on the following 40 variables.

INDEX_NAME

a character vector

SAMPLEID

a character vector

DATE

a character vector

TAXAID

a character vector

N_TAXA

a numeric vector, count

N_GRIDS

a numeric vector, number of grids in subsample (max = 30)

EXCLUDE

a character vector, whether taxon should be excluded from taxa richness metrics

INDEX_CLASS

a character vector, index region

Phylum

a character vector

Class

a character vector

Order

a character vector

Family

a character vector

Genus

a character vector

Other_Taxa

a character vector

Tribe

a character vector

FFG

a character vector

FAM_TV

a numeric vector

Habit

a character vector

TOLVAL

a numeric vector

TOLVAL2

a numeric vector

UFC

a numeric vector

UFC_Comment

a character vector

SUBPHYLUM

a character vector

SUBCLASS

a character vector

INFRAORDER

a character vector

SUBFAMILY

a character vector

LIFE_CYCLE

a character vector

BCG_ATTR

a character vector

THERMAL_INDICATOR

a character vector

LONGLIVED

a character vector

NOTEWORTHY

a character vector

FFG2

a character vector

HABITAT

a character vector

ELEVATION_ATTR

a character vector

GRADIENT_ATTR

a character vector

WSAREA_ATTR

a character vector

HABSTRUCT

a character vector

BCG_ATTR2

a character vector

NONTARGET

a logical vector

AIRBREATHER

a logical vector

Source

example data from MBSS


Benthic macroinvertebrate taxa data; Pacific Northwest

Description

A dataset with example (demonstration only) taxa data and attributes for calculating metric values.

This dataset is an example only. DO NOT USE for any analyses.

Usage

data_benthos_PacNW

Format

A data frame with 598 observations on the following 38 variables.

INDEX_NAME

a character vector

INDEX_CLASS

a character vector

SampleID

a character vector

TaxaID

a character vector

N_TAXA

a numeric vector

Exclude

a logical vector

NonTarget

a logical vector

Phylum

a character vector

Class

a character vector

Order

a character vector

Family

a character vector

Subfamily

a character vector

Tribe

a character vector

Genus

a character vector

BCG_Attr

a numeric vector

Thermal_Indicator

a character vector

FFG

a character vector

Clinger

a character vector

LongLived

a logical vector

Noteworthy

a logical vector

Habitat

a character vector

SubPhylum

a character vector

InfraOrder

a character vector

Habit

a logical vector

Life_Cycle

a logical vector

TolVal

a logical vector

FFG2

a logical vector

TolVal2

a logical vector

UFC

a character vector

UFC_Comment

a numeric vector

SubClass

a character vector

Elevation_Attr

a character vector

Gradient_Attr

a character vector

WSArea_Attr

a character vector

HabStruct

a character vector

BCG_Attr2

a character vector

AirBreather

a logical vector

Source

example data


rarify example data

Description

A dataset with example benthic macroinvertebrate data (600 count) to be used with the rarify function. Includes 12 samples.

Usage

data_bio2rarify

Format

A data frame with 223 rows and 28 variables:

SampleID

Sample ID

TaxaID

unique taxonomic identifier

N_Taxa

number of individuals in sample

Source

example data


Coral taxa data; Florida BCG.

Description

A data set with example coral data. Calculate metrics. Data from Florida BCG providers.

Usage

data_coral_bcg_metric_dev

Format

A data frame with 2138 observations on the following 25 variables.

DataSource

a character vector

SampleID

a character vector

TotTranLngth_m

a numeric vector

SampDate

a Date

TAXAID

a character vector

CommonName

a character vector

Juvenile

a logical vector

DiamMax_cm

a numeric vector

DiamPerp_cm

a numeric vector

Height_cm

a numeric vector

TotMort_pct

a numeric vector

BCG_ATTR

a character vector

Weedy

a character vector

LRBC

a logical vector

MorphConvFact

a numeric vector

Phylum

a character vector

Class

a character vector

SubClass

a character vector

Order

a character vector

Family

a character vector

Genus

a character vector

SubGenus

a character vector

Species

a character vector

INDEX_NAME

a character vector

INDEX_CLASS

a character vector

Source

example coral data from Florida BCG


Coral metric value data; Florida BCG.

Description

A data set with coral metric value data. Used to compare to metric value calculations. Data from Florida BCG providers.

Usage

data_coral_bcg_metric_qc

Format

A data frame with 100 observations on the following 19 variables.

SAMPLEID

a character vector

INDEX_NAME

a character vector

INDEX_CLASS

a character vector

transect_area_m2

a numeric vector

ncol_total

a numeric vector

lcol_total

a numeric vector

nt_total

a numeric vector

ncol_Acropora

a numeric vector

ncol_AcroOrbi_m2

a numeric vector

pcol_Acropora

a numeric vector

nt_BCG_att123

a numeric vector

nt_BCG_att1234

a numeric vector

nt_BCG_att5

a numeric vector

pt_BCG_att5

a numeric vector

LCSA3D_samp_m2

a numeric vector

LCSA3D_BCG_att1234_m2

a numeric vector

LCSA3D_LRBC_m2

a numeric vector

ncol_SmallWeedy

a numeric vector

pcol_SmallWeedy

a numeric vector

Source

example coral metric results from Florida BCG


Diatom taxa data; Indiana DEM

Description

A data set with example diatom data. Calculate metrics. Data from IDEM.

Usage

data_diatom_mmi_dev

Format

A data frame with 24797 observations on the following 38 variables.

INDEX_NAME

a character vector

INDEX_CLASS

a character vector

STATIONID

a character vector

COLLDATE

a Date

SAMPLEID

a character vector

TAXAID

a character vector

EXCLUDE

a logical vector

NONTARGET

a logical vector

N_TAXA

a numeric vector

ORDER

a character vector

FAMILY

a character vector

GENUS

a character vector

BC_USGS

a character vector

TROPHIC_USGS

a character vector

SAP_USGS

a character vector

PT_USGS

a character vector

O_USGS

a character vector

SALINITY_USGS

a character vector

BAHLS_USGS

a character vector

P_USGS

a character vector

N_USGS

a character vector

HABITAT_USGS

a character vector

N_FIXER_USGS

a character vector

MOTILITY_USGS

a character vector

SIZE_USGS

a character vector

HABIT_USGS

a character vector

MOTILE2_USGS

a character vector

TOLVAL

a numeric vector

DIATOM_ISA

a character vector

DIAT_CL

a numeric vector

POLL_TOL

a numeric vector

BEN_SES

a numeric vector

DIATAS_TP

a numeric vector

DIATAS_TN

a numeric vector

DIAT_COND

a numeric vector

DIAT_CA

a numeric vector

MOTILITY

a numeric vector

NF

a numeric vector

PHYLUM

a character vector

Source

example data from IDEM


Diatom metric value data; Indiana DEM

Description

A data set with diatom metric value data. Used to compare to metric value calculations. Data from IDEM.

Usage

data_diatom_mmi_qc

Format

A data frame with 497 observations on the following 250 variables.

SAMPLEID

a character vector

INDEX_NAME

a character vector

INDEX_CLASS

a character vector

ni_total

a numeric vector

li_total

a numeric vector

nt_total

a numeric vector

nt_Achnan_Navic

a numeric vector

nt_LOW_N

a numeric vector

nt_HIGH_N

a numeric vector

nt_LOW_P

a numeric vector

nt_HIGH_P

a numeric vector

nt_BC_1

a numeric vector

nt_BC_2

a numeric vector

nt_BC_3

a numeric vector

nt_BC_4

a numeric vector

nt_BC_5

a numeric vector

nt_BC_12

a numeric vector

nt_BC_45

a numeric vector

nt_PT_1

a numeric vector

nt_PT_2

a numeric vector

nt_PT_3

a numeric vector

nt_PT_4

a numeric vector

nt_PT_5

a numeric vector

nt_PT_12

a numeric vector

nt_SALINITY_1

a numeric vector

nt_SALINITY_2

a numeric vector

nt_SALINITY_3

a numeric vector

nt_SALINITY_4

a numeric vector

nt_SALINITY_12

a numeric vector

nt_SALINITY_34

a numeric vector

nt_O_1

a numeric vector

nt_O_2

a numeric vector

nt_O_3

a numeric vector

nt_O_4

a numeric vector

nt_O_5

a numeric vector

nt_O_345

a numeric vector

nt_SESTONIC_HABIT

a numeric vector

nt_BENTHIC_HABIT

a numeric vector

nt_BAHLS_1

a numeric vector

nt_BAHLS_2

a numeric vector

nt_BAHLS_3

a numeric vector

nt_TROPHIC_1

a numeric vector

nt_TROPHIC_2

a numeric vector

nt_TROPHIC_3

a numeric vector

nt_TROPHIC_4

a numeric vector

nt_TROPHIC_5

a numeric vector

nt_TROPHIC_6

a numeric vector

nt_TROPHIC_7

a numeric vector

nt_TROPHIC_456

a numeric vector

nt_SAP_1

a numeric vector

nt_SAP_2

a numeric vector

nt_SAP_3

a numeric vector

nt_SAP_4

a numeric vector

nt_SAP_5

a numeric vector

nt_NON_N_FIXER

a numeric vector

nt_N_FIXER

a numeric vector

nt_HIGHLY_MOTILE

a numeric vector

nt_MODERATELY_MOTILE

a numeric vector

nt_NON_MOTILE

a numeric vector

nt_SLIGHTLY_MOTILE

a numeric vector

nt_WEAKLY_MOTILE

a numeric vector

nt_BIG

a numeric vector

nt_SMALL

a numeric vector

nt_MEDIUM

a numeric vector

nt_VERY_BIG

a numeric vector

nt_VERY_SMALL

a numeric vector

nt_ADNATE

a numeric vector

nt_STALKED

a numeric vector

nt_HIGHLY_MOTILE.1

a numeric vector

nt_ARAPHID

a numeric vector

nt_DIAT_CL_1

a numeric vector

nt_DIAT_CL_2

a numeric vector

nt_BEN_SES_1

a numeric vector

nt_BEN_SES_2

a numeric vector

nt_DIAT_CA_1

a numeric vector

nt_DIAT_CA_2

a numeric vector

nt_DIAT_COND_1

a numeric vector

nt_DIAT_COND_2

a numeric vector

nt_DIATAS_TN_1

a numeric vector

nt_DIATAS_TN_2

a numeric vector

nt_DIATAS_TP_1

a numeric vector

nt_DIATAS_TP_2

a numeric vector

nt_MOTILITY_1

a numeric vector

nt_MOTILITY_2

a numeric vector

nt_NF_1

a numeric vector

nt_NF_2

a numeric vector

pi_Achnan_Navic

a numeric vector

pi_HIGH_N

a numeric vector

pi_LOW_N

a numeric vector

pi_HIGH_P

a numeric vector

pi_LOW_P

a numeric vector

pi_BC_1

a numeric vector

pi_BC_2

a numeric vector

pi_BC_3

a numeric vector

pi_BC_4

a numeric vector

pi_BC_5

a numeric vector

pi_PT_1

a numeric vector

pi_PT_2

a numeric vector

pi_PT_3

a numeric vector

pi_PT_4

a numeric vector

pi_PT_5

a numeric vector

pi_PT_45

a numeric vector

pi_SALINITY_1

a numeric vector

pi_SALINITY_2

a numeric vector

pi_SALINITY_3

a numeric vector

pi_SALINITY_4

a numeric vector

pi_O_1

a numeric vector

pi_O_2

a numeric vector

pi_O_3

a numeric vector

pi_O_4

a numeric vector

pi_O_5

a numeric vector

pi_SESTONIC_HABIT

a numeric vector

pi_BENTHIC_HABIT

a numeric vector

pi_BAHLS_1

a numeric vector

pi_BAHLS_2

a numeric vector

pi_BAHLS_3

a numeric vector

pi_TROPHIC_1

a numeric vector

pi_TROPHIC_2

a numeric vector

pi_TROPHIC_3

a numeric vector

pi_TROPHIC_4

a numeric vector

pi_TROPHIC_5

a numeric vector

pi_TROPHIC_6

a numeric vector

pi_TROPHIC_7

a numeric vector

pi_SAP_1

a numeric vector

pi_SAP_2

a numeric vector

pi_SAP_3

a numeric vector

pi_SAP_4

a numeric vector

pi_SAP_5

a numeric vector

pi_NON_N_FIXER

a numeric vector

pi_N_FIXER

a numeric vector

pi_HIGHLY_MOTILE

a numeric vector

pi_MODERATELY_MOTILE

a numeric vector

pi_NON_MOTILE

a numeric vector

pi_SLIGHTLY_MOTILE

a numeric vector

pi_WEAKLY_MOTILE

a numeric vector

pi_BIG

a numeric vector

pi_SMALL

a numeric vector

pi_MEDIUM

a numeric vector

pi_VERY_BIG

a numeric vector

pi_VERY_SMALL

a numeric vector

pi_ADNATE

a numeric vector

pi_STALKED

a numeric vector

pi_HIGHLY_MOTILE.1

a numeric vector

pi_ARAPHID

a numeric vector

pi_DIAT_CL_1

a numeric vector

pi_DIAT_CL_1_ASSR

a numeric vector

pi_DIAT_CL_2

a numeric vector

pi_BEN_SES_1

a numeric vector

pi_BEN_SES_2

a numeric vector

pi_DIAT_CA_1

a numeric vector

pi_DIAT_CA_2

a numeric vector

pi_DIAT_COND_1

a numeric vector

pi_DIAT_COND_2

a numeric vector

pi_DIATAS_TN_1

a numeric vector

pi_DIATAS_TN_2

a numeric vector

pi_DIATAS_TP_1

a numeric vector

pi_DIATAS_TP_2

a numeric vector

pi_MOTILITY_1

a numeric vector

pi_MOTILITY_2

a numeric vector

pi_NF_1

a numeric vector

pi_NF_2

a numeric vector

pt_Achnan_Navic

a numeric vector

pt_HIGH_N

a numeric vector

pt_LOW_N

a numeric vector

pt_HIGH_P

a numeric vector

pt_LOW_P

a numeric vector

pt_BC_1

a numeric vector

pt_BC_2

a numeric vector

pt_BC_3

a numeric vector

pt_BC_4

a numeric vector

pt_BC_5

a numeric vector

pt_BC_12

a numeric vector

pt_BC_45

a numeric vector

pt_PT_1

a numeric vector

pt_PT_2

a numeric vector

pt_PT_3

a numeric vector

pt_PT_4

a numeric vector

pt_PT_5

a numeric vector

pt_PT_12

a numeric vector

pt_SALINITY_1

a numeric vector

pt_SALINITY_2

a numeric vector

pt_SALINITY_3

a numeric vector

pt_SALINITY_4

a numeric vector

pt_SALINITY_34

a numeric vector

pt_O_1

a numeric vector

pt_O_2

a numeric vector

pt_O_3

a numeric vector

pt_O_4

a numeric vector

pt_O_5

a numeric vector

pt_O_345

a numeric vector

pt_SESTONIC_HABIT

a numeric vector

pt_BENTHIC_HABIT

a numeric vector

pt_BAHLS_1

a numeric vector

pt_BAHLS_2

a numeric vector

pt_BAHLS_3

a numeric vector

pt_TROPHIC_1

a numeric vector

pt_TROPHIC_2

a numeric vector

pt_TROPHIC_3

a numeric vector

pt_TROPHIC_4

a numeric vector

pt_TROPHIC_5

a numeric vector

pt_TROPHIC_6

a numeric vector

pt_TROPHIC_7

a numeric vector

pt_TROPHIC_456

a numeric vector

pt_SAP_1

a numeric vector

pt_SAP_2

a numeric vector

pt_SAP_3

a numeric vector

pt_SAP_4

a numeric vector

pt_SAP_5

a numeric vector

pt_NON_N_FIXER

a numeric vector

pt_N_FIXER

a numeric vector

pt_HIGHLY_MOTILE

a numeric vector

pt_MODERATELY_MOTILE

a numeric vector

pt_NON_MOTILE

a numeric vector

pt_SLIGHTLY_MOTILE

a numeric vector

pt_WEAKLY_MOTILE

a numeric vector

pt_BIG

a numeric vector

pt_SMALL

a numeric vector

pt_MEDIUM

a numeric vector

pt_VERY_BIG

a numeric vector

pt_VERY_SMALL

a numeric vector

pt_ADNATE

a numeric vector

pt_STALKED

a numeric vector

pt_HIGHLY_MOTILE.1

a numeric vector

pt_ARAPHID

a numeric vector

pt_DIAT_CL_1

a numeric vector

pt_DIAT_CL_2

a numeric vector

pt_BEN_SES_1

a numeric vector

pt_BEN_SES_2

a numeric vector

pt_DIAT_CA_1

a numeric vector

pt_DIAT_CA_2

a numeric vector

pt_DIAT_COND_1

a numeric vector

pt_DIAT_COND_2

a numeric vector

pt_DIATAS_TN_1

a numeric vector

pt_DIATAS_TN_2

a numeric vector

pt_DIATAS_TP_1

a numeric vector

pt_DIATAS_TP_2

a numeric vector

pt_MOTILITY_1

a numeric vector

pt_MOTILITY_2

a numeric vector

pt_NF_1

a numeric vector

pt_NF_2

a numeric vector

nt_Sens_810

a numeric vector

nt_RefIndicators

a numeric vector

nt_Tol_13

a numeric vector

pi_Sens_810

a numeric vector

pi_RefIndicators

a numeric vector

pi_Tol_13

a numeric vector

pt_Sens_810

a numeric vector

pt_RefIndicators

a numeric vector

pt_Tol_13

a numeric vector

wa_POLL_TOL

a numeric vector

Source

example metric value data from IDEM


Fish data, MBSS

Description

A dataset with example fish taxa data for metric calculation.

Usage

data_fish_MBSS

Format

A data frame with 1694 observations on the following 30 variables.

SAMPLEID

a character vector

TAXAID

a character vector

N_TAXA

a numeric vector

TYPE

a character vector

TOLER

a character vector

NATIVE

a character vector

TROPHIC

a character vector

SILT

a character vector

INDEX_CLASS

a character vector

SAMP_LENGTH_M

a numeric vector

SAMP_WIDTH_M

a numeric vector

SAMP_BIOMASS

a numeric vector

INDEX_NAME

a character vector

EXCLUDE

a logical vector

BCG_ATTR

a character vector

DA_MI2

a numeric vector

N_ANOMALIES

a numeric vector

FAMILY

a character vector

GENUS

a character vector

THERMAL_INDICATOR

a character vector

ELEVATION_ATTR

a character vector

GRADIENT_ATTR

a character vector

WSAREA_ATTR

a character vector

REPRODUCTION

a character vector

HABITAT

a character vector

CONNECTIVITY

a logical vector

SCC

a logical vector

HYBRID

a logical vector

BCGATTR2

a character vector

TOLVAL2

a numeric vector

Source

example data


data_metval_scmb_ibi

Description

Example data metrics

A data frame with 20 observations on the following 13 variables.

INDEX_NAME

a character vector

INDEX_REGION

a character vector

SampID

a character vector

nt_total

a numeric vector

nt_Mol

a numeric vector

ni_Noto

a numeric vector

pi_intol

a numeric vector

qc_nt_total

a numeric vector

qc_nt_Mol

a numeric vector

qc_ni_Noto

a numeric vector

qc_pi_intol

a numeric vector

qc_sum

a numeric vector

qc_nar

a character vector

Usage

data_metval_scmb_ibi

Format

An object of class data.frame with 20 rows and 13 columns.

Source

example data


Metric data for metric stats for mmi development

Description

A data set with example benthic macroinvertebrate data. Calculate metrics then statistics.

Usage

data_mmi_dev

Format

A data frame with 10,574 observations on the following 34 variables.

Class

a character vector

Ref_v1

a character vector

CalVal_Class4

a character vector

Unique_ID

a character vector

BenSampID

a character vector

CollDate

a character vector

CollMeth

a character vector

TaxaID

a character vector

Individuals

a numeric vector

Exclude

a logical vector

NonTarget

a character vector

Phylum

a character vector

Benthic_MasterTaxa.Class

a character vector

Order

a character vector

Family

a character vector

Subfamily

a character vector

Tribe

a character vector

Genus

a character vector

TolVal

a character vector

FFG

a character vector

Habit

a character vector

INDEX_NAME

a character vector

SUBPHYLUM

a character vector

CLASS

a character vector

SUBCLASS

a character vector

INFRAORDER

a character vector

LIFE_CYCLE

a character vector

BCG_ATTR

a character vector

THERMAL_INDICATOR

a character vector

LONGLIVED

a character vector

NOTEWORTHY

a character vector

FFG2

a character vector

TOLVAL2

a character vector

HABITAT

a numeric vector

Source

example data


Metric data for metric stats for mmi development

Description

A data set with example benthic macroinvertebrate data. Calculate metrics then statistics.

Usage

data_mmi_dev_small

Format

A data frame with 1,374 observations on the following 34 variables.

Class

a character vector

Ref_v1

a character vector

CalVal_Class4

a character vector

Unique_ID

a character vector

BenSampID

a character vector

CollDate

a character vector

CollMeth

a character vector

TaxaID

a character vector

Individuals

a numeric vector

Exclude

a logical vector

NonTarget

a character vector

Phylum

a character vector

Benthic_MasterTaxa.Class

a character vector

Order

a character vector

Family

a character vector

Subfamily

a character vector

Tribe

a character vector

Genus

a character vector

TolVal

a character vector

FFG

a character vector

Habit

a character vector

INDEX_NAME

a character vector

SUBPHYLUM

a character vector

CLASS

a character vector

SUBCLASS

a character vector

INFRAORDER

a character vector

LIFE_CYCLE

a character vector

BCG_ATTR

a character vector

THERMAL_INDICATOR

a character vector

LONGLIVED

a character vector

NOTEWORTHY

a character vector

FFG2

a character vector

TOLVAL2

a character vector

HABITAT

a numeric vector

Source

example data


Mark "exclude" (non-distinct / non-unique / ambiguous) taxa

Description

Takes as input a data frame with Sample ID, Taxa ID, and phylogenetic name fields and returns a similar data frame with a column for "exclude" taxa (TRUE or FALSE).

Exclude taxa are referred to by multiple names: ambiguous, non-distinct, and non-unique. The "exclude" name was chosen to be consistent with "non-target" taxa. That is, taxa marked as "TRUE" are treated as undesirables. Exclude taxa are those present in a sample when other taxa of the same group, identified to a finer level, are present in the same sample. That is, the parent is marked as exclude when child taxa are present in the same sample.
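
The parent/child rule can be sketched for a single pair of levels (family and genus) in base R. This is a simplified illustration with made-up data and is not the algorithm used by markExcluded, which works across all supplied TaxaLevels.

```r
# Hypothetical data: S1 has a family-level record and a genus-level child;
# S2 has only the family-level record
df <- data.frame(SampleID = c("S1", "S1", "S2"),
                 TaxaID   = c("Baetidae", "Baetis", "Baetidae"),
                 Family   = c("Baetidae", "Baetidae", "Baetidae"),
                 Genus    = c(NA, "Baetis", NA),
                 stringsAsFactors = FALSE)

# Per sample and family, count the rows identified to genus
n_child <- ave(as.integer(!is.na(df$Genus)), df$SampleID, df$Family,
               FUN = sum)

# A family-level row is "exclude" only when a genus-level child
# occurs in the same sample
df$Exclude <- is.na(df$Genus) & n_child > 0
df
```

Baetidae is marked TRUE in S1, where Baetis is also present, but stays FALSE in S2, where it is the finest identification available.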

Usage

markExcluded(
  df_samptax,
  SampID = "SAMPLEID",
  TaxaID = "TAXAID",
  TaxaCount = "N_TAXA",
  Exclude = "EXCLUDE",
  TaxaLevels,
  Exceptions = NA,
  verbose = FALSE
)

Arguments

df_samptax

Input data frame.

SampID

Column name in df_samptax for sample identifier. Default = "SAMPLEID".

TaxaID

Column name in df_samptax for organism identifier. Default = "TAXAID".

TaxaCount

Column name in df_samptax for organism count. Default = "N_TAXA".

Exclude

Column name for Exclude Taxa results in returned data frame. Default = "EXCLUDE".

TaxaLevels

Column names in df_samptax for the phylogenetic names to be evaluated. Need to be in order from coarse to fine (i.e., Phylum to Species).

Exceptions

NA or two-column data frame of synonyms or other exceptions. Default = NA. Column 1 is the name used in the TaxaID column of df_samptax. Column 2 is the name used in the TaxaLevels columns of df_samptax.

verbose

Logical value indicating whether status messages are output to the console. Default = FALSE

Details

The exclude taxa are referenced in the metric values function. These taxa are removed from the taxa richness metrics. This is because these are coarser level taxa when fine level taxa are present in the same sample.

Exceptions is a two-column data frame of synonyms or other exceptions. Column 1 is the name used in the TaxaID column of the input data frame (df_samptax). Column 2 is the name used in the TaxaLevels columns of the input data frame (df_samptax). The phylogenetic columns (TaxaLevels) will be modified from Column 2 of the Exceptions data frame to match Column 1 of the Exceptions data frame. This ensures that the algorithm for markExcluded works properly. The changes will not be stored and the original names provided in the input data frame (df_samptax) will be returned in the final result. The function example below includes a practical case.
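
The substitution described above can be sketched as follows. The input rows are invented, only a single Family column is handled, and the actual implementation inside markExcluded may differ. The edit is made on a working copy so the original names are left untouched.

```r
# Exceptions: column 1 = name used in TaxaID, column 2 = name used in TaxaLevels
exceptions <- data.frame(TaxaID  = "Sphaeriidae",
                         PhyloID = "Pisidiidae",
                         stringsAsFactors = FALSE)

# Hypothetical input rows
df <- data.frame(TaxaID = c("Sphaeriidae", "Pisidium"),
                 Family = c("Pisidiidae", "Pisidiidae"),
                 stringsAsFactors = FALSE)

# Edit a working copy so the input names are preserved
work <- df
idx  <- match(work$Family, exceptions$PhyloID)
hit  <- !is.na(idx)
work$Family[hit] <- exceptions$TaxaID[idx[hit]]
work$Family
```

After the substitution the family-level TaxaID (Sphaeriidae) matches the Family value of its child rows, which is what a parent/child comparison needs.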

Taxa Levels are phylogenetic names that are to be checked. They should be listed in order from coarse (kingdom) to fine (species). Names not appearing in the data will be skipped.
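
One way to picture the skipping of absent names is to filter the requested levels against the column names of the data while keeping the coarse-to-fine order. The data frame below is hypothetical and the package may implement the check differently.

```r
# Requested levels, coarse to fine
TaxaLevels <- c("Kingdom", "Phylum", "Class", "Order",
                "Family", "Genus", "Species")

# Hypothetical data with only some of the level columns
df <- data.frame(SampleID = "S1",
                 TaxaID   = "Baetis",
                 Order    = "Ephemeroptera",
                 Family   = "Baetidae",
                 Genus    = "Baetis",
                 stringsAsFactors = FALSE)

# Keep only the levels present in the data; order of TaxaLevels is preserved
levels_used <- TaxaLevels[TaxaLevels %in% names(df)]
levels_used
```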

The spelling of names must be consistent (including case) for this function to produce the intended output.

Value

Returns a data frame of df_samptax with an additional column, Exclude.

Examples


# Packages
library(readxl)
library(dplyr)
library(lazyeval)
library(knitr)

# Data
df_samps_bugs <- read_excel(system.file("./extdata/Data_Benthos.xlsx",
                                        package="BioMonTools"),
                            guess_max=10^6)

# Variables
SampID     <- "SampleID"
TaxaID     <- "TaxaID"
TaxaCount  <- "N_Taxa"
Exclude    <- "Exclude_New"
TaxaLevels <- c("Kingdom",
                "Phylum",
                "SubPhylum",
                "Class",
                "SubClass",
                "Order",
                "SubOrder",
                "SuperFamily",
                "Family",
                "SubFamily",
                "Tribe",
                "Genus",
                "SubGenus",
                "Species",
                "Variety")
# Taxa that should be treated as equivalent
Exceptions <- data.frame("TaxaID" = "Sphaeriidae",
                         "PhyloID" = "Pisidiidae")

# EXAMPLE 1
df_tst <- markExcluded(df_samps_bugs,
                       SampID = "SampleID",
                       TaxaID = "TaxaID",
                       TaxaCount = "N_Taxa",
                       Exclude = "Exclude_New",
                       TaxaLevels = TaxaLevels,
                       Exceptions = Exceptions)

# Compare
df_compare <- dplyr::summarise(dplyr::group_by(df_tst, SampleID),
                               Exclude_Import = sum(Exclude),
                               Exclude_R = sum(Exclude_New))
df_compare$Diff <- df_compare$Exclude_Import - df_compare$Exclude_R
#
tbl_diff <- table(df_compare$Diff)
kable(tbl_diff)
# sort
df_compare <- df_compare %>% arrange(desc(Diff))

# Number with issues
sum(abs(df_compare$Diff))
# total samples
nrow(df_compare)

# confusion matrix
tbl_results <- table(df_tst$Exclude, df_tst$Exclude_New, useNA = "ifany")
#
# Show differences
kable(tbl_results)
knitr::kable(df_compare[1:10, ])
knitr::kable(df_compare[672:678, ])
# samples with differences
samp_diff <- as.data.frame(df_compare[df_compare[,"Diff"] != 0, "SampleID"])
# results for only those with differences
df_tst_diff <- df_tst[df_tst[,"SampleID"] %in% samp_diff$SampleID, ]
# add diff field
df_tst_diff$Exclude_Diff <- df_tst_diff$Exclude - df_tst_diff$Exclude_New

# Classification Performance Metrics
class_TP <- tbl_results[2,2] # True Positive
class_FN <- tbl_results[2,1] # False Negative
class_FP <- tbl_results[1,2] # False Positive
class_TN <- tbl_results[1,1] # True Negative
class_n <- sum(tbl_results)  # total
#
# sensitivity (recall); TP / (TP+FN); measure model to ID true positives
class_sens <- class_TP / (class_TP + class_FN)
# precision; TP / (TP+FP); accuracy of model positives
class_prec <- class_TP / (class_TP + class_FP)
# specificity; TN / (TN + FP); measure model to ID true negatives
class_spec <- class_TN  / (class_TN + class_FP)
# overall accuracy; (TP + TN) / all cases; accuracy of all classifications
class_acc <- (class_TP + class_TN) / class_n
# F1; 2 * (class_prec*class_sens) / (class_prec+class_sens)
## balance of precision and recall
class_F1 <- 2 * (class_prec * class_sens) / (class_prec + class_sens)
#
results_names <- c("Sensitivity (Recall)",
                   "Precision",
                   "Specificity",
                   "Overall Accuracy",
                   "F1")
results_values <- c(class_sens,
                    class_prec,
                    class_spec,
                    class_acc,
                    class_F1)
#
tbl_class <- data.frame(results_names, results_values)
names(tbl_class) <- c("Performance Metrics", "Percent")
tbl_class$Percent <- round(tbl_class$Percent * 100, 2)
kable(tbl_class)

#~~~~~~~~~~~~~~~~~~~~~~~~~~

# EXAMPLE 2
## No Exceptions

df_tst2 <- markExcluded(df_samps_bugs,
                        SampID = "SampleID",
                        TaxaID = "TaxaID",
                        TaxaCount = "N_Taxa",
                        Exclude = "Exclude_New",
                        TaxaLevels = TaxaLevels,
                        Exceptions = NA)

# Compare
df_compare2 <- dplyr::summarise(dplyr::group_by(df_tst2, SampleID),
                                Exclude_Import = sum(Exclude),
                                Exclude_R = sum(Exclude_New))
df_compare2$Diff <- df_compare2$Exclude_Import - df_compare2$Exclude_R
#
tbl_diff2 <- table(df_compare2$Diff)
kable(tbl_diff2)
# sort
df_compare2 <- df_compare2 %>% arrange(desc(Diff))

# Number with issues
sum(abs(df_compare2$Diff))
# total samples
nrow(df_compare2)

# confusion matrix
tbl_results2 <- table(df_tst2$Exclude, df_tst2$Exclude_New, useNA = "ifany")
#
# Show differences
kable(tbl_results2)
knitr::kable(df_compare2[1:10, ])
knitr::kable(tail(df_compare2))
# samples with differences
(samp_diff2 <- as.data.frame(df_compare2[df_compare2[, "Diff"] != 0,
                                         "SampleID"]))
# results for only those with differences
df_tst_diff2 <- filter(df_tst2, SampleID %in% samp_diff2$SampleID)
# add diff field
df_tst_diff2$Exclude_Diff <- df_tst_diff2$Exclude - df_tst_diff2$Exclude_New

# Classification Performance Metrics
class_TP2 <- tbl_results2[2,2] # True Positive
class_FN2 <- tbl_results2[2,1] # False Negative
class_FP2 <- tbl_results2[1,2] # False Positive
class_TN2 <- tbl_results2[1,1] # True Negative
class_n2 <- sum(tbl_results2)  # total
#
# sensitivity (recall); TP / (TP+FN); measure model to ID true positives
class_sens2 <- class_TP2 / (class_TP2 + class_FN2)
# precision; TP / (TP+FP); accuracy of model positives
class_prec2 <- class_TP2 / (class_TP2 + class_FP2)
# specificity; TN / (TN + FP); measure model to ID true negatives
class_spec2 <- class_TN2 / (class_TN2 + class_FP2)
# overall accuracy; (TP + TN) / all cases; accuracy of all classifications
class_acc2 <- (class_TP2 + class_TN2) / class_n2
# F1; 2 * (class_prec*class_sens) / (class_prec+class_sens)
## balance of precision and recall
class_F12 <- 2 * (class_prec2 * class_sens2) / (class_prec2 + class_sens2)
#
results_names2 <- c("Sensitivity (Recall)",
                    "Precision",
                    "Specificity",
                    "Overall Accuracy",
                    "F1")
results_values2 <- c(class_sens2,
                     class_prec2,
                     class_spec2,
                     class_acc2,
                     class_F12)
#
tbl_class2 <- data.frame(results_names2, results_values2)
names(tbl_class2) <- c("Performance Metrics", "Percent")
tbl_class2$Percent <- round(tbl_class2$Percent * 100, 2)
kable(tbl_class2)

Score metrics

Description

This function calculates metric scores based on a Thresholds data frame. Can generate scores for categories n=3 (e.g., 1/3/5, ScoreRegime="Cat_135") or n=4 (e.g., 0/2/4/6, ScoreRegime="Cat_0246") or continuous (e.g., 0-100, ScoreRegime="Cont_0100").

Usage

metric.scores(
  DF_Metrics,
  col_MetricNames,
  col_IndexName,
  col_IndexClass,
  DF_Thresh_Metric,
  DF_Thresh_Index,
  col_ni_total = "ni_total",
  col_IndexRegion = NULL
)

Arguments

DF_Metrics

Data frame of metric values (as columns), Index Name, and Index Region (strata).

col_MetricNames

Names of columns of metric values.

col_IndexName

Name of column with index (e.g., MBSS.2005.Bugs)

col_IndexClass

Name of column with relevant bioregion or site class (e.g., COASTAL).

DF_Thresh_Metric

Data frame of Scoring Thresholds for metrics (INDEX_NAME, INDEX_CLASS, METRIC_NAME, Direction, Thresh_Lo, Thresh_Mid, Thresh_Hi, ScoreRegime, SingleValue_Add, NormDist_Tail_Lo, NormDist_Tail_Hi, CatGrad_xvar, CatGrad_InfPt, CatGrad_Lo_m, CatGrad_Lo_b, CatGrad_Mid_m, CatGrad_Mid_b, CatGrad_Hi_m, CatGrad_Hi_b).

DF_Thresh_Index

Data frame of Scoring Thresholds for indices (INDEX_NAME, INDEX_CLASS, METRIC_NAME, ScoreRegime, Thresh01, Thresh02, Thresh03, Thresh04, Thresh05, Thresh06, Thresh07, Nar01, Nar02, Nar03, Nar04, Nar05, Nar06).

col_ni_total

Name of column with total number of individuals. Used for cases where a sample was collected but no organisms were collected. Default = "ni_total".

col_IndexRegion

Name of column with relevant bioregion or site class (e.g., COASTAL). Default = NULL. DEPRECATED

Details

The R library dplyr is needed for this function.

For all ScoreRegime cases at the index level a "sum_Index" field is computed that is the sum of all metric scores. Valid "ScoreRegime" values are:

* SUM = all metric scores added together.

* AVERAGE = all metric scores added and divided by the number of metrics. The index is on the same scale as the individual metric scores.

* AVERAGE_100 = AVERAGE is scaled 0 to 100.
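
As an illustration of the three index-level regimes, the following base R sketch aggregates a few hypothetical metric scores. The column names and the maximum metric score of 6 are assumptions made for this example, and the rescaling used for AVERAGE_100 is one plausible reading of "scaled 0 to 100", not necessarily the package's exact formula.

```r
# Hypothetical metric scores for three samples on a 0/2/4/6 scale.
df_scores <- data.frame(
  SC_nt_total = c(6, 2, 4),
  SC_nt_EPT   = c(4, 0, 6),
  SC_pi_EPT   = c(6, 2, 2)
)
score_max <- 6  # assumed maximum score of a single metric

# Sum of all metric scores per sample
sum_Index <- unname(rowSums(df_scores))

# SUM: all metric scores added together
index_sum <- sum_Index

# AVERAGE: sum divided by the number of metrics (same scale as the metrics)
index_avg <- sum_Index / ncol(df_scores)

# AVERAGE_100 (assumption: rescale AVERAGE by the maximum metric score)
index_avg100 <- 100 * index_avg / score_max

data.frame(sum_Index, index_sum, index_avg, index_avg100)
```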

FIX, 2024-01-29, v1.0.0.9060: Rename col_IndexRegion to col_IndexClass. Add col_IndexRegion as variable at end to avoid breaking existing code. Later remove it as an input variable but add code in the function to accept

Value

vector of scores

Examples

# Example data

library(readxl)
library(reshape2)

# Thresholds
fn_thresh <- file.path(system.file(package = "BioMonTools"),
                       "extdata",
                       "MetricScoring.xlsx")
df_thresh_metric <- read_excel(fn_thresh, sheet = "metric.scoring")
df_thresh_index <- read_excel(fn_thresh, sheet = "index.scoring")

#~~~~~~~~~~~~~~~~~~~~~~~~
# Pacific Northwest, BCG Level 1 Indicator Taxa Index
df_samps_bugs <- read_excel(system.file("extdata/Data_Benthos.xlsx"
                                        , package = "BioMonTools")
                            , guess_max = 10^6)

myIndex <- "BCG_PacNW_L1"
df_samps_bugs$Index_Name   <- myIndex
df_samps_bugs$Index_Class <- "ALL"
(myMetrics.Bugs <- unique(
  as.data.frame(df_thresh_metric)[df_thresh_metric[,
                                  "INDEX_NAME"] == myIndex, "METRIC_NAME"]))
# Run Function
df_metric_values_bugs <- metric.values(df_samps_bugs,
                                       "bugs",
                                       fun.MetricNames = myMetrics.Bugs)

# index to BCG.PacNW.L1
df_metric_values_bugs$INDEX_NAME <- myIndex
df_metric_values_bugs$INDEX_CLASS <- "ALL"

# SCORE Metrics
df_metric_scores_bugs <- metric.scores(df_metric_values_bugs,
                                       myMetrics.Bugs,
                                       "INDEX_NAME",
                                       "INDEX_CLASS",
                                       df_thresh_metric,
                                       df_thresh_index)

# QC, table
table(df_metric_scores_bugs$Index, df_metric_scores_bugs$Index_Nar)
# QC, plot
hist(df_metric_scores_bugs$Index,
     main = "PacNW BCG Example Data",
     xlab = "Level 1 Indicator Taxa Index Score")
abline(v = c(21,30), col = "blue")
text(21 + c(-2, +2), 200, c("Low", "Medium"), col = "blue")

Calculate metric statistics

Description

This function calculates metric statistics for use with developing a multi-metric index.

Inputs are a data frame with

Usage

metric.stats(
  fun.DF,
  col_metrics,
  col_SampID = "SAMPLEID",
  col_RefStatus = "Ref_Status",
  RefStatus_Ref = "Ref",
  RefStatus_Str = "Str",
  RefStatus_Oth = "Oth",
  col_DataType = "Data_Type",
  DataType_Cal = "Cal",
  DataType_Ver = "Ver",
  col_Subset = NULL,
  Subset_Value = NULL
)

Arguments

fun.DF

Data frame.

col_metrics

Column names for metrics.

col_SampID

Column name for unique sample identifier. Default = "SAMPLEID".

col_RefStatus

Column name for Reference Status. Default = "Ref_Status"

RefStatus_Ref

Reference Status name for Reference used in col_RefStatus. Default = "Ref". Use NULL if you don't use this value.

RefStatus_Str

Reference Status name for Stressed used in col_RefStatus. Default = "Str". Use NULL if you don't use this value.

RefStatus_Oth

Reference Status name for Other used in col_RefStatus. Default = "Oth". Use NULL if you don't use this value.

col_DataType

Column name for Data Type – Validation vs. Calibration. Default = "Data_Type"

DataType_Cal

Datatype name for Calibration used in col_DataType. Default = "Cal". Use NULL if you don't use this value.

DataType_Ver

Datatype name for Verification used in col_DataType. Default = "Ver". Use NULL if you don't use this value.

col_Subset

Column name to subset the data and run on each subset. Default = NULL. If NULL then no subset will be generated.

Subset_Value

Subset name to be used for creating subset. Default = NULL.

Details

Summary statistics for the data are calculated.

The data are filtered on the column given in col_Subset for the single value given in Subset_Value. If further subsets are needed, re-run the function. If no subset is given the entire data set is used.

Statistics will be generated for up to 6 combinations for RefStatus (Ref, Oth, Str) and DataType (Cal, Ver).

The resulting dataframe will have the statistics in columns with the first 4 columns as: INDEX_CLASS (if col_Subset not provided), col_RefStatus, col_DataType, and Metric_Name.

The following statistics are generated with na.rm = TRUE.

* n = number

* min = minimum

* max = maximum

* mean = mean

* median = median

* range = range (max - min)

* sd = standard deviation

* cv = coefficient of variation (sd/mean)

* q05 = quantile, 5

* q10 = quantile, 10

* q25 = quantile, 25

* q50 = quantile, 50

* q75 = quantile, 75

* q90 = quantile, 90

* q95 = quantile, 95
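
For reference, the statistics listed above can be reproduced for a single metric with base R. This is a sketch on hypothetical values, not the package's internal code. It assumes that n counts non-missing values and that quantiles use R's default method (type 7), neither of which is stated above.

```r
# Hypothetical metric values for one RefStatus / DataType combination.
x <- c(12, 15, NA, 9, 20, 18, 11)

# Quantiles named to match the list above (default quantile type assumed)
probs <- c(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)
q <- stats::quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
names(q) <- c("q05", "q10", "q25", "q50", "q75", "q90", "q95")

stats_x <- c(
  n      = sum(!is.na(x)),                      # assumed: non-missing count
  min    = min(x, na.rm = TRUE),
  max    = max(x, na.rm = TRUE),
  mean   = mean(x, na.rm = TRUE),
  median = stats::median(x, na.rm = TRUE),
  range  = max(x, na.rm = TRUE) - min(x, na.rm = TRUE),
  sd     = stats::sd(x, na.rm = TRUE),
  cv     = stats::sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE),
  q
)
round(stats_x, 2)
```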

Value

data frame of metrics (rows) and statistics (columns). This is in long format with columns for INDEX_CLASS, RefStatus, and DataType.

Examples

# data, benthos
df_bugs <- data_mmi_dev_small

# Munge Names
names(df_bugs)[names(df_bugs) %in% "BenSampID"]   <- "SAMPLEID"
names(df_bugs)[names(df_bugs) %in% "TaxaID"]      <- "TAXAID"
names(df_bugs)[names(df_bugs) %in% "Individuals"] <- "N_TAXA"
names(df_bugs)[names(df_bugs) %in% "Exclude"]     <- "EXCLUDE"
names(df_bugs)[names(df_bugs) %in% "Class"]       <- "INDEX_CLASS"
names(df_bugs)[names(df_bugs) %in% "Unique_ID"]   <- "SITEID"

# Add Missing Columns
df_bugs$ELEVATION_ATTR <- NA_character_
df_bugs$GRADIENT_ATTR  <- NA_character_
df_bugs$WSAREA_ATTR    <- NA_character_
df_bugs$HABSTRUCT      <- NA_character_
df_bugs$BCG_ATTR2      <- NA_character_
df_bugs$AIRBREATHER    <- NA
df_bugs$UFC            <- NA_real_

# Calc Metrics
cols_keep <- c("Ref_v1",
               "CalVal_Class4",
               "SITEID",
               "CollDate",
               "CollMeth")
# INDEX_NAME and INDEX_CLASS kept by default
df_metval <- metric.values(df_bugs, "bugs", fun.cols2keep = cols_keep)

# Calc Stats
col_metrics   <- names(df_metval)[9:ncol(df_metval)]
col_SampID    <- "SAMPLEID"
col_RefStatus <- "REF_V1"
RefStatus_Ref <- "Ref"
RefStatus_Str <- "Strs"
RefStatus_Oth <- "Other"
col_DataType  <- "CALVAL_CLASS4"
DataType_Cal  <- "cal"
DataType_Ver  <- "verif"
col_Subset    <- "INDEX_CLASS"
Subset_Value  <- "CentralHills"

df_stats <- metric.stats(df_metval,
                         col_metrics,
                         col_SampID,
                         col_RefStatus,
                         RefStatus_Ref,
                         RefStatus_Str,
                         RefStatus_Oth,
                         col_DataType,
                         DataType_Cal,
                         DataType_Ver,
                         col_Subset,
                         Subset_Value)


# Save Results
write.table(df_stats,
            file.path(tempdir(), "metric.stats.tsv"),
            col.names = TRUE,
            row.names = FALSE,
            sep = "\t")


Secondary metric statistics

Description

This function calculates secondary statistics (DE and z-score) on metric statistics for use with developing a multi-metric index.

Usage

metric.stats2(
  data_metval,
  data_metstat,
  col_metval_RefStatus = "RefStatus",
  col_metval_DataType = "DataType",
  col_metval_Subset = "INDEX_CLASS",
  col_metstat_RefStatus = "RefStatus",
  col_metstat_DataType = "DataType",
  col_metstat_Subset = "INDEX_CLASS",
  RefStatus_Ref = "Ref",
  RefStatus_Str = "Str",
  RefStatus_Oth = "Oth",
  DataType_Cal = "Cal",
  DataType_Ver = "Ver",
  Subset_Value = NULL
)

Arguments

data_metval

Data frame of metric values.

data_metstat

Data frame of metric statistics

col_metval_RefStatus

Column name for Reference Status in data_metval. Default = "RefStatus"

col_metval_DataType

Column name for Data Type – Validation vs. Calibration in data_metval. Default = "DataType"

col_metval_Subset

Column name for INDEX_CLASS in data_metval. Default = "INDEX_CLASS"

col_metstat_RefStatus

Column name for Reference Status in data_metstat. Default = "RefStatus"

col_metstat_DataType

Column name for Data Type – Validation vs. Calibration in data_metstat. Default = "DataType"

col_metstat_Subset

Column name for INDEX_CLASS in data_metstat. Default = "INDEX_CLASS"

RefStatus_Ref

RefStatus value for Reference. Default = "Ref"

RefStatus_Str

RefStatus value for Stressed. Default = "Str"

RefStatus_Oth

RefStatus value for Other. Default = "Oth"

DataType_Cal

DataType value for Calibration. Default = "Cal"

DataType_Ver

DataType value for Verification. Default = "Ver"

Subset_Value

Subset value of INDEX_CLASS (site class). Default = NULL

Details

Secondary metrics statistics for the data are calculated.

Inputs are metric values and metric stats outputs.

Metric values is a wide format with columns for each metric. Assumes only a single Subset.

Metrics stats is a wide format with columns for each statistic with metrics in a single column. Assumes only a single Subset.

Required fields are RefStatus, DataType, and INDEX_CLASS. The user is allowed to enter their own values for these fields for each input file.

The two statistics calculated are z-score and discrimination efficiency (DE) for each metric within each DataType (cal / val).

Z-scores are calculated using the calibration (or development) data set for a given INDEX_CLASS (or Site Class).

* (mean Ref - mean Str) / sd Ref

DE is calculated without knowing the expected direction of response for each metric for a given INDEX_CLASS (or Site Class). DE is the percentage (0-100) of stressed samples that fall below the 25th quantile (for decreaser metrics, e.g., total taxa) or above the 75th quantile (for increaser metrics, e.g., HBI) of the reference samples.
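
The two statistics can be sketched in base R for one metric as follows. The reference and stressed values are hypothetical, the object names are chosen for this example only, and the default quantile method of R is an assumption.

```r
# Hypothetical values of one metric in the calibration data set.
ref_vals <- c(20, 22, 25, 27, 30, 24, 26, 28)  # reference samples
str_vals <- c(10, 14, 18, 21, 26, 12)          # stressed samples

# z-score: (mean Ref - mean Str) / sd Ref
z_score <- (mean(ref_vals) - mean(str_vals)) / stats::sd(ref_vals)

# Reference quantiles used as DE cut points
q25_ref <- stats::quantile(ref_vals, 0.25, names = FALSE)
q75_ref <- stats::quantile(ref_vals, 0.75, names = FALSE)

# DE25: percent of stressed samples below the reference 25th quantile
DE25 <- 100 * mean(str_vals < q25_ref)
# DE75: percent of stressed samples above the reference 75th quantile
DE75 <- 100 * mean(str_vals > q75_ref)

c(z_score = z_score, DE25 = DE25, DE75 = DE75)
```

Since the direction of response is not assumed, both DE25 and DE75 are reported and the user picks the one that matches the metric (DE25 for decreasers, DE75 for increasers).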

A data frame of the metric.stats input is returned with new columns (z_score, DE25 and DE75). The z-score is added for each Ref_Status. DE25 and DE75 are only added where Ref_Status is labeled as Stressed.

Value

A data frame of the metric.stats input is returned with new columns (z_score, DE25 and DE75).

Examples

# data, benthos
df_bugs <- data_mmi_dev_small

# Munge Names
names(df_bugs)[names(df_bugs) %in% "BenSampID"]   <- "SAMPLEID"
names(df_bugs)[names(df_bugs) %in% "TaxaID"]      <- "TAXAID"
names(df_bugs)[names(df_bugs) %in% "Individuals"] <- "N_TAXA"
names(df_bugs)[names(df_bugs) %in% "Exclude"]     <- "EXCLUDE"
names(df_bugs)[names(df_bugs) %in% "Class"]       <- "INDEX_CLASS"
names(df_bugs)[names(df_bugs) %in% "Unique_ID"]   <- "SITEID"

# Add Missing Columns
df_bugs$ELEVATION_ATTR <- NA_character_
df_bugs$GRADIENT_ATTR  <- NA_character_
df_bugs$WSAREA_ATTR    <- NA_character_
df_bugs$HABSTRUCT      <- NA_character_
df_bugs$BCG_ATTR2      <- NA_character_
df_bugs$AIRBREATHER    <- NA
df_bugs$UFC            <- NA_real_

# Calc Metrics
cols_keep <- c("Ref_v1",
               "CalVal_Class4",
               "SITEID",
               "CollDate",
               "CollMeth")
# INDEX_NAME and INDEX_CLASS kept by default
df_metval <- metric.values(df_bugs, "bugs", fun.cols2keep = cols_keep)

# Calc Stats
col_metrics   <- names(df_metval)[9:ncol(df_metval)]
col_SampID    <- "SAMPLEID"
col_RefStatus <- "REF_V1"
RefStatus_Ref <- "Ref"
RefStatus_Str <- "Strs"
RefStatus_Oth <- "Other"
col_DataType  <- "CALVAL_CLASS4"
DataType_Cal  <- "cal"
DataType_Ver  <- "verif"
col_Subset    <- "INDEX_CLASS"
Subset_Value  <- "CentralHills"
df_stats <- metric.stats(df_metval,
                         col_metrics,
                         col_SampID,
                         col_RefStatus,
                         RefStatus_Ref,
                         RefStatus_Str,
                         RefStatus_Oth,
                         col_DataType,
                         DataType_Cal,
                         DataType_Ver,
                         col_Subset,
                         Subset_Value)

# Calc Stats2 (z-scores and DE)
data_metval           <- df_metval
data_metstat          <- df_stats
col_metval_RefStatus  <- "REF_V1"
col_metval_DataType   <- "CALVAL_CLASS4"
col_metval_Subset     <- "INDEX_CLASS"
col_metstat_RefStatus <- "REF_V1"
col_metstat_DataType  <- "CALVAL_CLASS4"
col_metstat_Subset    <- "INDEX_CLASS"
RefStatus_Ref         <- "Ref"
RefStatus_Str         <- "Strs"
RefStatus_Oth         <- "Other"
DataType_Cal          <- "cal"
DataType_Ver          <- "verif"
Subset_Value          <- "CentralHills"
df_stats2 <- metric.stats2(data_metval,
                           data_metstat,
                           col_metval_RefStatus,
                           col_metval_DataType,
                           col_metval_Subset,
                           col_metstat_RefStatus,
                           col_metstat_DataType,
                           col_metstat_Subset,
                           RefStatus_Ref,
                           RefStatus_Str,
                           RefStatus_Oth,
                           DataType_Cal,
                           DataType_Ver,
                           Subset_Value)


# Save Results
write.table(df_stats2,
            file.path(tempdir(), "metric.stats2.tsv"),
            col.names = TRUE,
            row.names = FALSE,
            sep = "\t")


Calculate metric values

Description

This function calculates metric values for bugs, fish, algae, and coral. Inputs are a data frame with SampleID and taxa with phylogenetic and autecological information (see below for required fields by community). The dplyr package is used to generate the metric values.

Usage

metric.values(
  fun.DF,
  fun.Community,
  fun.MetricNames = NULL,
  boo.Adjust = FALSE,
  fun.cols2keep = NULL,
  boo.marine = FALSE,
  boo.Shiny = FALSE,
  verbose = FALSE,
  metric_subset = NULL,
  taxaid_dni = NULL
)

Arguments

fun.DF

Data frame of taxa (see Details for required fields)

fun.Community

Community name for which to calculate metric values (bugs, fish, algae, or coral)

fun.MetricNames

Optional vector of metric names to be returned. If none are supplied then all will be returned. Default=NULL

boo.Adjust

Optional boolean value on whether to perform adjustments of values prior to scoring. Default = FALSE but may be TRUE for certain metrics.

fun.cols2keep

Column names of fun.DF to retain in the output. Uses column names.

boo.marine

Should estuary/marine metrics be included. Ignored if fun.MetricNames is not null. Default = FALSE.

boo.Shiny

Boolean value for if the function is accessed via Shiny. Default = FALSE.

verbose

Include messages to track progress. Default = FALSE

metric_subset

Subset of metrics to be generated. Internal function. Default = NULL

taxaid_dni

Taxa names to be included in DNI (Do Not Include) metrics (n = 3) but dropped for all other metrics. Only for benthic metrics. Default = NULL

Details

All percent metric results are 0-100.

No manipulations of the taxa are performed by this routine. All benthic macroinvertebrate taxa should be identified to the appropriate operational taxonomic unit (OTU).

Any non-count taxa should be identified in the "Exclude" field as "TRUE". These taxa will be excluded from taxa richness metrics (but will count for all others).

Any non-target taxa should be identified in the "NonTarget" field as "TRUE". Non-target taxa are those that are not part of your intended capture list; e.g., fish, herps, water column taxa, or water surface taxa in a benthic sample. The target list will vary by program. The non-target taxa will be removed prior to any calculations.

Excluded taxa are ambiguous taxa (on a sample basis), i.e., the parent taxa when child taxa are present. For example, the parent taxon Chironomidae would be excluded when the child taxon Tanytarsini is present. Both would be excluded when Tanytarsus is present. The markExcluded function can be used to populate this field.
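
The parent/child rule can be illustrated with a small base R sketch that reproduces the Chironomidae example. This is not the algorithm used by markExcluded; the data frame and the loop are a hypothetical, simplified version that only checks three taxonomic levels.

```r
# Hypothetical taxa records: sample S1 holds a family, a tribe, and a genus.
df <- data.frame(
  SAMPLEID = c("S1", "S1", "S1", "S2"),
  TAXAID   = c("Chironomidae", "Tanytarsini", "Tanytarsus", "Chironomidae"),
  FAMILY   = c("Chironomidae", "Chironomidae", "Chironomidae", "Chironomidae"),
  TRIBE    = c(NA, "Tanytarsini", "Tanytarsini", NA),
  GENUS    = c(NA, NA, "Tanytarsus", NA),
  stringsAsFactors = FALSE
)
tax_levels <- c("FAMILY", "TRIBE", "GENUS")  # coarse to fine

df$EXCLUDE <- FALSE
for (i in seq_len(nrow(df))) {
  same_samp <- df$SAMPLEID == df$SAMPLEID[i]
  other     <- df$TAXAID != df$TAXAID[i]
  for (lev in tax_levels) {
    # Rows whose phylogeny lists this TAXAID at level 'lev'
    is_child <- !is.na(df[[lev]]) & df[[lev]] == df$TAXAID[i]
    # A different taxon in the same sample is a child: mark the parent
    if (any(same_samp & other & is_child)) df$EXCLUDE[i] <- TRUE
  }
}
df[, c("SAMPLEID", "TAXAID", "EXCLUDE")]
```

In sample S1 both Chironomidae and Tanytarsini are marked because Tanytarsus is present, while the lone Chironomidae record in sample S2 is kept.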

There are a number of required fields (see below) for metric calculation. If any fields are missing the user will be prompted as to which are missing and if the user wants to continue or quit. If the user continues the missing fields will be added but will be filled with zero or NA (as appropriate). Any metrics based on the missing fields will not be valid.
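
Before calling the function it can help to check for missing fields yourself. The sketch below uses the fields required for all communities and fills missing ones with NA; it only mimics the behaviour described above and is not the package's own check. The input data frame is hypothetical.

```r
# Hypothetical taxa data lacking the index columns.
df_taxa <- data.frame(SAMPLEID = "S1",
                      TAXAID   = "Baetis",
                      N_TAXA   = 5,
                      stringsAsFactors = FALSE)

# Fields required for all communities
req_all <- c("SAMPLEID", "TAXAID", "N_TAXA", "INDEX_NAME", "INDEX_CLASS")

# Which required fields are absent (comparison is case sensitive here)
col_missing <- setdiff(req_all, names(df_taxa))

if (length(col_missing) > 0) {
  message("Missing fields: ", paste(col_missing, collapse = ", "))
  # Add the missing fields as NA; metrics that rely on them are not valid
  df_taxa[, col_missing] <- NA
}
names(df_taxa)
```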

A future update may turn these fields into function parameters. This would allow the user to tweak the function inputs to match their data rather than having to update their data to match the function.

Required fields, all communities:

* SAMPLEID (character or number, must be unique)

* TAXAID (character or number, must be unique)

* N_TAXA

* INDEX_NAME

* INDEX_CLASS (BCG or MMI site category; e.g., for BCG PacNW valid values are "hi" or "lo")

Additional Required fields, bugs:

* EXCLUDE (valid values are TRUE and FALSE)

* NONTARGET (valid values are TRUE and FALSE)

* PHYLUM, SUBPHYLUM, CLASS, SUBCLASS, INFRAORDER, ORDER, FAMILY, SUBFAMILY, TRIBE, GENUS

* FFG, HABIT, LIFE_CYCLE, TOLVAL, BCG_ATTR, THERMAL_INDICATOR, FFG2, TOLVAL2, LONGLIVED, NOTEWORTHY, HABITAT, UFC, ELEVATION_ATTR, GRADIENT_ATTR, WSAREA_ATTR, HABSTRUCT

Additional Required fields, fish:

* N_ANOMALIES

* SAMP_BIOMASS (biomass total for sample, function uses max in case entered for all taxa in sample)

* NATIVE: NATIVE or other text values

* DA_MI2, SAMP_WIDTH_M, SAMP_LENGTH_M, TYPE, TOLER, TROPHIC, SILT, FAMILY, GENUS, HYBRID, BCG_ATTR, THERMAL_INDICATOR, ELEVATION_ATTR, GRADIENT_ATTR, WSAREA_ATTR, REPRODUCTION, HABITAT, CONNECTIVITY, SCC

Additional Required fields, algae:

* EXCLUDE, NONTARGET, PHYLUM, ORDER, FAMILY, GENUS, BC_USGS, TROPHIC_USGS, SAP_USGS, PT_USGS, O_USGS, SALINITY_USGS, BAHLS_USGS, P_USGS, N_USGS, HABITAT_USGS, N_FIXER_USGS, MOTILITY_USGS, SIZE_USGS, HABIT_USGS, MOTILE2_USGS, TOLVAL, DIATOM_ISA, DIAT_CL, POLL_TOL, BEN_SES, DIATAS_TP, DIATAS_TN, DIAT_COND, DIAT_CA, MOTILITY, NF

Valid values for fields:

* FFG: CG, CF, PR, SC, SH

* HABIT: BU, CB, CN, SP, SW

* LIFE_CYCLE: UNI, SEMI, MULTI

* THERMAL_INDICATOR: STENOC, COLD, COOL, WARM, STENOW, EURYTHERMAL, COWA, NA

* LONGLIVED: TRUE, FALSE

* NOTEWORTHY: TRUE, FALSE

* HABITAT: BRAC, DEPO, GENE, HEAD, RHEO, RIVE, SPEC, UNKN

* UFC: integers 1:6 (taxonomic uncertainty frequency class)

* ELEVATION_ATTR: LOW, HIGH

* GRADIENT_ATTR: LOW, MOD, HIGH

* WSAREA_ATTR: SMALL, MEDIUM, LARGE, XLARGE

* REPRODUCTION: BROADCASTER, SIMPLE NEST, COMPLEX NEST, BEARER, MIGRATORY

* CONNECTIVITY: TRUE, FALSE

* SCC (Species of Conservation Concern): TRUE, FALSE

'Columns to keep' are additional fields in the input file that the user wants retained in the output. Fields need to be those that are unique per sample and not associated with the taxa. For example, the fields used in qc.check(); Area_mi2, SurfaceArea, Density_m2, and Density_ft2.

If fun.MetricNames is provided only those metrics will be returned in the provided order. This variable can be used to sort the metrics per the user's preferences. By default the metric names will be returned in the groupings that were used for calculation.

The fields TOLVAL2 and FFG2 are provided to allow the user to calculate metrics based on alternative scenarios. For example, including both HBI and NCBI where the NCBI uses a different set of tolerance values (TOLVAL2).

If TAXAID is 'NONE' and N_TAXA is '0' then metrics will be calculated with that record. Other values for TAXAID with N_TAXA = 0 will be removed before calculations.
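
The zero-count rule amounts to a simple row filter, sketched here in base R on hypothetical data. The object names are for illustration only and the package may implement the step differently.

```r
# Hypothetical records: S2 was collected but contained no organisms.
df_taxa <- data.frame(
  SAMPLEID = c("S1", "S1", "S2"),
  TAXAID   = c("Baetis", "Hydropsyche", "NONE"),
  N_TAXA   = c(12, 0, 0),
  stringsAsFactors = FALSE
)

# Keep records with individuals, plus the 'NONE' placeholder records
keep    <- df_taxa$N_TAXA != 0 | df_taxa$TAXAID == "NONE"
df_calc <- df_taxa[keep, ]
df_calc
```

Keeping the 'NONE' record means sample S2 still appears in the output rather than dropping out of the results entirely.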

For 'Oligochaete' metrics either Class or Subclass is required for calculation.

The parameter boo.Shiny can be set to TRUE when accessing this function in Shiny. Normally the QC check for required fields is interactive. Setting boo.Shiny to TRUE will always continue. The default is FALSE.

The parameter 'taxaid_dni' denotes taxa to be included in Do Not Include (DNI) metrics but dropped from all other metrics. Only for benthic metrics.

Breaking change from 0.5 to 0.6 with change from Index_Name to Index_Class.

Value

data frame of SampleID and metric values

Examples

# Example 1, data already in R

df_metval <- metric.values(BioMonTools::data_benthos_PacNW,
                           "bugs")

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Example 2, specific metrics or metrics in a specific order
## reuse data_benthos_PacNW from above

# metric names to keep (in this order)
myMetrics <- c("ni_total",
               "nt_EPT",
               "nt_Ephem",
               "pi_tv_intol",
               "pi_Ephem",
               "nt_ffg_scrap",
               "pi_habit_climb")

# Run Function
df_metval_myMetrics <- metric.values(BioMonTools::data_benthos_PacNW,
                                     "bugs",
                                     fun.MetricNames = myMetrics)


Calculate metric values, Algae

Description

Subfunction of metric.values for use with Algae.

Usage

metric.values.algae(
  myDF,
  MetricNames = NULL,
  boo.Adjust = FALSE,
  cols2keep = NULL,
  MetricSort = NA,
  boo.Shiny = FALSE,
  verbose
)

Arguments

myDF

Data frame of taxa.

MetricNames

Optional vector of metric names to be returned.

boo.Adjust

Optional boolean value on whether to perform adjustments of values prior to scoring. Default = FALSE but may be TRUE for certain metrics.

cols2keep

Column names of myDF to retain in the output. Uses column names.

MetricSort

How metric names should be sorted; NA = as is, AZ = alphabetical. Default = NA.

boo.Shiny

Boolean value for if the function is accessed via Shiny. Default = FALSE.

verbose

Include messages to track progress. Default = FALSE

Details

For internal use only. Called from metric.values().

Value

Data frame


Calculate metric values, Bugs

Description

Subfunction of metric.values for use with Benthic Macroinvertebrates

Usage

metric.values.bugs(
  myDF,
  MetricNames = NULL,
  boo.Adjust = FALSE,
  cols2keep = NULL,
  MetricSort = NA,
  boo.marine = FALSE,
  boo.Shiny,
  verbose,
  metric_subset,
  taxaid_dni = NULL
)

Arguments

myDF

Data frame of taxa.

MetricNames

Optional vector of metric names to be returned.

boo.Adjust

Optional boolean value on whether to perform adjustments of values prior to scoring. Default = FALSE but may be TRUE for certain metrics.

cols2keep

Column names of myDF to retain in the output. Uses column names.

MetricSort

How metric names should be sorted; NA = as is, AZ = alphabetical. Default = NA.

boo.marine

Should estuary/marine metrics be included. Ignored if fun.MetricNames is not null. Default = FALSE.

boo.Shiny

Boolean value for if the function is accessed via Shiny. Default = FALSE.

verbose

Include messages to track progress. Default = FALSE

metric_subset

Subset of metrics to be generated. Internal function. Default = NULL

taxaid_dni

Taxa names to be included in DNI (Do Not Include) metrics (n = 3) but dropped for all other metrics. Only for benthic metrics. Default = NULL

Details

For internal use only. Called from metric.values().

Value

Data frame


Calculate metric values, coral

Description

Subfunction of metric.values for use with coral

Usage

metric.values.coral(
  myDF,
  MetricNames = NULL,
  boo.Adjust = FALSE,
  cols2keep = NULL,
  MetricSort = NA,
  boo.Shiny = FALSE,
  verbose
)

Arguments

myDF

Data frame of taxa.

MetricNames

Optional vector of metric names to be returned.

boo.Adjust

Optional boolean value on whether to perform adjustments of values prior to scoring. Default = FALSE but may be TRUE for certain metrics.

cols2keep

Column names of myDF to retain in the output. Uses column names.

MetricSort

How metric names should be sorted; NA = as is, AZ = alphabetical. Default = NA.

boo.Shiny

Boolean value for if the function is accessed via Shiny. Default = FALSE.

verbose

Include messages to track progress. Default = FALSE

Details

For internal use only. Called from metric.values().

Value

Data frame


Calculate metric values, Fish

Description

Subfunction of metric.values for use with Fish.

Usage

metric.values.fish(
  myDF,
  MetricNames = NULL,
  boo.Adjust = FALSE,
  cols2keep = NULL,
  boo.Shiny,
  verbose
)

Arguments

myDF

Data frame of taxa.

MetricNames

Optional vector of metric names to be returned.

boo.Adjust

Optional boolean value on whether to perform adjustments of values prior to scoring. Default = FALSE but may be TRUE for certain metrics.

cols2keep

Column names of myDF to retain in the output. Uses column names.

boo.Shiny

Boolean value for if the function is accessed via Shiny. Default = FALSE.

verbose

Include messages to track progress. Default = FALSE

Details

For internal use only. Called from metric.values().

Value

Data frame


Metric values Groups to Excel

Description

The output of metric.values() is saved to Excel with different groups of metrics on different worksheets.

Usage

metvalgrpxl(
  fun.DF.MetVal,
  fun.DF.xlMetNames = NULL,
  fun.Community,
  fun.MetVal.Col2Keep = c("SAMPLEID", "INDEX_NAME", "INDEX_CLASS"),
  fun.xlGrpCol = "Sort_Group",
  file.out = NULL
)

Arguments

fun.DF.MetVal

Data frame of metric values.

fun.DF.xlMetNames

Data frame of metric names and groups. Default (NULL) will use the version of MetricNames.xlsx that is in the BioMonTools package.

fun.Community

Community name of calculated metric values (bugs, fish, or algae)

fun.MetVal.Col2Keep

Column names in metric values to keep. Default = c("SAMPLEID", "INDEX_NAME", "INDEX_CLASS")

fun.xlGrpCol

Column name from Excel metric names to use for Groupings. Default = Sort_Group

file.out

Output file name. Default (NULL) will generate a file name based on the date and time (e.g., MetricValuesGroups_bugs_20220201.xlsx)

Details

This function will save the output of metric.values() into groups by worksheet as defined by the user.

The Excel file MetricNames.xlsx provided in the extdata folder has a column named 'Groups' that can be used as default groupings. If no groupings are provided (the default) all metrics are saved to a single worksheet. Within each group the 'sort_order' is used to sort the metrics. If this column is blank then the metrics are sorted in the order they appear in the output from metric.values() (i.e., in fun.DF).

The MetricNames data frame must include the following fields:

* Metric_Name

* Community

* Sort_Group (user defined)
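
As a minimal sketch of the required structure, a metric names data frame could be built by hand as below. The object name df_metnames_min, the metric names, and the group labels are illustrative assumptions, not values taken from MetricNames.xlsx.

```r
# Hypothetical, hand-built metric names table with the three required fields.
df_metnames_min <- data.frame(
  Metric_Name = c("nt_total", "pi_EPT", "nt_EPT"),
  Community   = "bugs",
  Sort_Group  = c("Richness", "Composition", "Richness"),
  stringsAsFactors = FALSE
)

# Confirm the required fields are present before passing to metvalgrpxl()
req_cols <- c("Metric_Name", "Community", "Sort_Group")
stopifnot(all(req_cols %in% names(df_metnames_min)))

# Number of metrics in each user-defined group
table(df_metnames_min$Sort_Group)
```

A table like this could be supplied as fun.DF.xlMetNames in place of the packaged file, provided the metric names match those in the metric values data frame.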

Value

Saves Excel file with metrics grouped by worksheet

Examples

# Example 1, bugs
## Community
comm <- "bugs"
## Calculate Metrics
df_metval <- metric.values(BioMonTools::data_benthos_PacNW, comm)
## Metric Names and Groups
df_metnames <- readxl::read_excel(system.file("extdata/MetricNames.xlsx",
                                              package="BioMonTools"),
                                  guess_max = 10^6,
                                  sheet = "MetricMetadata",
                                  skip = 4)
## Columns to Keep
col2keep <- c("SAMPLEID", "INDEX_NAME", "INDEX_CLASS")
## Grouping Column
col_Grp <- "Sort_Group"
## File Name
file_out <- file.path(tempdir(), paste0("MetValGrps_", comm, ".xlsx"))
## Run Function
metvalgrpxl(df_metval, df_metnames, comm, col2keep, col_Grp, file_out)


QC checks on metric values

Description

Apply "QC checks" on calculated metrics and station/sample attributes to "flag" samples for the user. Examples include watershed size or total number of individuals. Can have checks for both high and low values. Checks are stored in a separate file. For structure see df.checks in example.

Usage

qc.checks(df.metrics, df.checks, input.shape = "wide")

Arguments

df.metrics

Wide data frame with metric values to be evaluated.

df.checks

Data frame of metric thresholds to check.

input.shape

Shape of df.metrics; wide or long. Default is wide.

Details

Uses the reshape2 package.

Value

Returns a data frame of SampleID checks and results; Pass and Fail.
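
The idea of a low/high threshold check can be sketched in base R as below. This is not the code used by qc.checks(); the object names, the check table columns (Metric, Low, High), the PASS/FAIL labels, and the values are assumptions for illustration only. See df.checks in the example for the structure the function expects.

```r
# Toy wide metric values: one row per sample (values are made up)
df_met_demo <- data.frame(SAMPLEID = c("S1", "S2", "S3"),
                          ni_total = c(85, 120, 640),
                          stringsAsFactors = FALSE)

# Toy check table with hypothetical column names
df_chk_demo <- data.frame(Metric = "ni_total",
                          Low    = 100,
                          High   = 600,
                          stringsAsFactors = FALSE)

# Flag values that fall outside of the Low to High range
vals <- df_met_demo[, df_chk_demo$Metric]
df_flag_demo <- data.frame(
  SAMPLEID = df_met_demo$SAMPLEID,
  Metric   = df_chk_demo$Metric,
  Result   = ifelse(vals < df_chk_demo$Low | vals > df_chk_demo$High,
                    "FAIL", "PASS"),
  stringsAsFactors = FALSE
)
df_flag_demo
```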

Examples

library(readxl)

# Calculate Metrics
df.samps.bugs <- read_excel(system.file("./extdata/Data_Benthos.xlsx",
                                        package="BioMonTools"),
                            guess_max = 10^6)

# Columns to keep
myCols <- c("Area_mi2", "SurfaceArea", "Density_m2", "Density_ft2")

# Run Function
myDF <- df.samps.bugs
df.metric.values.bugs <- metric.values(myDF, "bugs", fun.cols2keep = myCols)

# Import Checks
df.checks <- read_excel(system.file("./extdata/MetricFlags.xlsx",
                                    package="BioMonTools"),
                        sheet="Flags")

# Run Function
df.flags <- qc.checks(df.metric.values.bugs, df.checks)

# Summarize Results
table(df.flags[, "CHECKNAME"], df.flags[, "FLAG"], useNA = "ifany")

Quality Control Check on User Data Against Master Taxa List

Description

This function compares the user's data frame to a data frame with the official (or user supplied) master taxa list (benthic macroinvertebrates).

Usage

qc_taxa(
  DF_User,
  DF_Official = NULL,
  fun.Community = NULL,
  useOfficialTaxaInfo = "only_Official"
)

Arguments

DF_User

User taxa data.

DF_Official

Official master taxa list. Can be a local file or from a URL. Default is NULL. A NULL value will use the official online files.

fun.Community

Community name for which to compare the master taxa list (bugs or fish).

useOfficialTaxaInfo

Select how to handle new/different taxa. See 'Details' for more information. Valid values are "only_Official", "only_user", "add_new". Default = "only_Official".

Details

Output is a data frame with matches.

Messages are output to the console with the number of matches and which user taxa did not match the official list.

The official list is stored online but the user can input their own saved copy.

Any columns in the user input file that match the official master taxa list will be renamed with the "_NonOfficial" suffix.

New/different taxa in the user data are handled by the 'useOfficialTaxaInfo' parameter. For taxa that did not match the master taxa list the user has options on how to handle the differences for the phylogeny (e.g., columns for phylum, class, family, etc.) and autecology (e.g., columns for FFG, habit, tolerance value, etc.). The options are below.

* only_Official = use only official master taxa information. Any non-matching taxa will not have any master taxa information.

* only_user = only use the information provided by the user. Information from the 'Official' will not be used. This should only be used for non-official calculations.

* add_new = hybrid approach that uses official master taxa information, when present, but includes user information for non-matching taxa if the column names match.
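
The difference between the three options can be sketched with a toy merge in base R. This is a hedged illustration of the described behavior, not the code inside qc_taxa(); the object names and all data values are made up.

```r
# Toy user data and toy official list (all values are made up)
df_user_demo <- data.frame(TAXON = c("Baetis", "Newtaxon"),
                           FFG   = c("SC", "PR"),
                           stringsAsFactors = FALSE)
df_off_demo  <- data.frame(TAXON = "Baetis",
                           FFG   = "CG",
                           stringsAsFactors = FALSE)

# Left join; the shared user column is renamed with a suffix
df_m <- merge(df_user_demo, df_off_demo, by = "TAXON", all.x = TRUE,
              suffixes = c("_NonOfficial", ""))

# only_Official: official value only; non-matching taxa stay NA
ffg_only_official <- df_m$FFG
# only_user: user value for every taxon
ffg_only_user <- df_m$FFG_NonOfficial
# add_new: official value when present, user value otherwise
ffg_add_new <- ifelse(is.na(df_m$FFG), df_m$FFG_NonOfficial, df_m$FFG)

data.frame(TAXON = df_m$TAXON, ffg_only_official, ffg_only_user, ffg_add_new)
```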

Default master taxa lists are saved as CSV files online at:

https://github.com/leppott/MBSStools_SupportFiles

The files can be downloaded with the following code.

**Benthic Macroinvertebrates**

url_mt_bugs <- "https://github.com/leppott/MBSStools_SupportFiles/raw/master/Data/CHAR_Bugs.csv"
df_mt_bugs <- read.csv(url_mt_bugs)

The master taxa files are periodically updated. Update dates will be logged on the GitHub repository.

Expected fields include:

**Benthic Macroinvertebrates**

+ TAXON, Phylum, Class, Order, Family, Genus, Other_Taxa, Tribe, FFG, FAM_TV, Habit, FinalTolVal07, Comment

Value

Input data frame with master taxa information added to it.

Examples

# Example 1, Master Taxa List, Bugs
url_mt_bugs <- "https://github.com/leppott/MBSStools_SupportFiles/raw/master/Data/CHAR_Bugs.csv"
df_mt_bugs  <- read.csv(url_mt_bugs)

# User data
DF_User <- data_benthos_MBSS
DF_Official <- NULL   # NULL uses the official online list; or supply df_mt_bugs
fun.Community <- "bugs"
useOfficialTaxaInfo <- "only_Official"
# modify taxa id column
DF_User[, "TAXON"] <- DF_User[, "TAXAID"]

df_qc_taxa_bugs <- qc_taxa(DF_User,
                           DF_Official,
                           fun.Community,
                           useOfficialTaxaInfo)

# QC input/output
dim(DF_User)
dim(df_qc_taxa_bugs)
names(DF_User)
names(df_qc_taxa_bugs)

Rarify (subsample) biological sample to fixed count

Description

Takes as an input a 3 column data frame (SampleID, TaxonID, Count) and returns a similar data frame with revised Counts.

The other inputs are subsample size (target number of organisms in each sample) and seed. The seed is given so the results can be reproduced from the same input file. If no seed is given a random seed is used.

Usage

rarify(inbug, sample.ID, abund, subsiz, mySeed = NA, verbose = FALSE)

Arguments

inbug

Input data frame. Needs 3 columns (SampleID, taxonomicID, Count).

sample.ID

Column name in inbug for sample identifier.

abund

Column name in inbug for organism count.

subsiz

Target subsample size for each sample.

mySeed

Seed for random number generator. If provided, repeated runs on the same inbug data will produce the same results. Default = NA (a random seed will be used).

verbose

Boolean value for if status messages are output to the console. Default = FALSE

Details

rarify function: R function to rarify (subsample) a macroinvertebrate sample down to a fixed count; by John Van Sickle, USEPA. email: VanSickle.John@epa.gov ; Version 1.0, 06/10/05;

Value

Returns a data frame with the same three columns but the abund field has been modified so the total count for each sample is no longer above the target (subsiz).
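
The general idea of subsampling to a fixed count can be sketched in base R for a single toy sample. This is a simplified illustration and not the rarify() source code; the object names and counts are made up, and the approach (expand to individuals, then draw without replacement) is an assumption about one reasonable way to do it.

```r
# Toy sample: counts per taxon (values are made up)
df_demo <- data.frame(SampleID = "S1",
                      TaxaID   = c("A", "B", "C"),
                      N_Taxa   = c(300, 200, 100),
                      stringsAsFactors = FALSE)
target <- 500

set.seed(17760704)  # fixed seed so the draw is reproducible
# Expand to one entry per individual, then draw without replacement
indiv <- rep(df_demo$TaxaID, times = df_demo$N_Taxa)
if (length(indiv) > target) {
  indiv <- sample(indiv, size = target, replace = FALSE)
}

# Tabulate the retained individuals back to counts per taxon
n_sub <- table(factor(indiv, levels = df_demo$TaxaID))
df_demo$N_Taxa_Sub <- as.integer(n_sub[df_demo$TaxaID])

# Total after subsampling equals the target; no taxon count increases
sum(df_demo$N_Taxa_Sub)
```

Samples with a total at or below the target are left unchanged by the if() condition, which matches the description that totals are "no longer above the target".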

Examples

# Subsample to 500 organisms (from over 500 organisms) for 12 samples.

# load bio data
df_biodata <- data_bio2rarify
dim(df_biodata)

# subsample
mySize  <- 500
Seed_OR <- 18590214
Seed_WA <- 18891111
Seed_US <- 17760704
bugs_mysize <- rarify(inbug = df_biodata,
                      sample.ID = "SampleID",
                      abund = "N_Taxa",
                      subsiz = mySize,
                      mySeed = Seed_US,
                      verbose = FALSE)

# view results
dim(bugs_mysize)

# Compare pre- and post- subsample counts
df_compare <- merge(df_biodata,
                    bugs_mysize,
                    by = c("SampleID", "TaxaID"),
                    suffixes = c("_Orig","_500"))
df_compare <- df_compare[, c("SampleID",
                             "TaxaID",
                             "N_Taxa_Orig",
                             "N_Taxa_500")]

# compare totals
tbl_totals <- aggregate(cbind(N_Taxa_Orig, N_Taxa_500) ~ SampleID,
                        df_compare,
                        sum)


# save the data
write.table(bugs_mysize,
            file.path(tempdir(), paste("bugs", mySize, "txt", sep = ".")),
            sep = "\t")


Taxa Translate

Description

Convert user taxa names to those in an official project-based name list.

Usage

taxa_translate(
  df_user = NULL,
  df_official = NULL,
  df_official_metadata = NULL,
  taxaid_user = "TAXAID",
  taxaid_official_match = NULL,
  taxaid_official_project = NULL,
  taxaid_drop = NULL,
  col_drop = NULL,
  sum_n_taxa_boo = FALSE,
  sum_n_taxa_col = NULL,
  sum_n_taxa_group_by = NULL,
  trim_ws = FALSE,
  match_caps = FALSE
)

Arguments

df_user

User taxa data

df_official

Official project taxa data (master taxa list).

df_official_metadata

Metadata for official project taxa data. Default is NULL

taxaid_user

Taxonomic identifier in user data. Default is "TAXAID".

taxaid_official_match

Taxonomic identifier in official data used to match with user data. This is not the project taxonomic identifier.

taxaid_official_project

Taxonomic identifier in official data that is specific to a project, e.g., after operational taxonomic unit (OTU) applied.

taxaid_drop

Official taxonomic identifier that signals a record should be dropped; e.g., DNI (Do Not Include) or -999. Default = NULL

col_drop

Columns to remove in output. Default = NULL

sum_n_taxa_boo

Boolean value for if the results should be summarized. Default = FALSE. DEPRECATED, values will be ignored.

sum_n_taxa_col

Column name for number of individuals for user data when summarizing. This column will be summed. Default = NULL (suggestion = N_TAXA). DEPRECATED, values will be ignored.

sum_n_taxa_group_by

Column names for user data to use for grouping the data when summarizing the user data. Suggestions are SAMPID and TAXA_ID. Default = NULL. DEPRECATED, values will be ignored.

trim_ws

Boolean value for taxaid to have leading and trailing white space removed. Non-breaking spaces (e.g., from ITIS) are also removed (including inside text). Default = FALSE

match_caps

Boolean value to match user and official TaxaIDs after converting to ALL CAPS. Default = FALSE

Details

Merges user file with official file. The official file has phylogeny, autecology, and other project specific fields.

The inputs for the function use existing data frames (or tibbles).

For any fields that match between the user file and the official file, the 'official' version of the column is retained.

The 'col_drop' parameter can be used to remove unwanted columns; e.g., the other taxa id fields in the 'official' data file.

By default, taxa are not collapsed to the official taxaid. That is, if multiple taxa in a sample have the same name the rows will not be combined. If collapsing is desired, set the parameter 'sum_n_taxa_boo' to TRUE and also provide 'sum_n_taxa_col' and 'sum_n_taxa_group_by'. This feature was DEPRECATED in v1.0.2.9040 (2024-06-12). The parameters will remain and could be reinstituted in a future version.

This function is slightly different than 'qc_taxa': 'taxa_translate' has no options for using one field over another and is more generic.

The parameter 'taxaid_drop' is used to drop records that matched to a new name that should not be included in the results. Examples include "999" or "DNI" (Do Not Include). Default is NULL so no action is taken. "NA"s are always removed.
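
The matching and dropping steps can be sketched in base R with a toy translation table. This is an illustration of the described behavior, not the taxa_translate() source; the object names, the column OTU_demo, and all data values are made up.

```r
# Toy user data and toy translation table (all values are made up)
df_user_demo <- data.frame(TAXAID = c("Agapetus", "Oligochaeta", "Unknownus"),
                           N_TAXA = c(10, 5, 2),
                           stringsAsFactors = FALSE)
df_off_demo <- data.frame(Taxon_orig = c("Agapetus", "Oligochaeta"),
                          OTU_demo   = c("Agapetus", "DNI"),
                          stringsAsFactors = FALSE)

# Look up the project name for each user taxon
idx <- match(df_user_demo$TAXAID, df_off_demo$Taxon_orig)
df_user_demo$TAXAID_new <- df_off_demo$OTU_demo[idx]

# User taxa with no match in the official table
nonmatch <- df_user_demo$TAXAID[is.na(idx)]

# Drop NA records and records flagged with the drop value
drop_val <- "DNI"
keep <- !is.na(df_user_demo$TAXAID_new) & df_user_demo$TAXAID_new != drop_val
df_keep <- df_user_demo[keep, ]

nonmatch
df_keep
```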

Optional parameter 'trim_ws' is used to invoke the function 'trimws' to remove from the taxa matching field any leading and trailing white space. Default is FALSE (no action). All horizontal and vertical white space characters are removed. See ?trimws for additional information. Additionally, non-breaking spaces (nbsp) inside the text string will be replaced with a normal space. This cuts down on the number of permutations that need to be added to the translation table.

Optional parameter 'match_caps' is used to convert user and official taxaid values to ALL CAPS before matching. Any non-ascii characters will cause this to fail. A message is output to the console for any taxaid values that contain non-ascii characters. In the event that 'match_caps' is set to TRUE and non-ascii characters are present the matching will be done without converting to upper case as this would cause the function to fail.
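
The two optional clean-up steps can be sketched in base R as below. This is a simplified illustration under stated assumptions, not the package code: the object names and taxa strings are made up, the default trimws() white space set is used rather than the full horizontal and vertical set, and the non-ASCII check via utf8ToInt() is one possible approach.

```r
# Toy taxa names with padding and a non-breaking space (made up)
taxa_raw <- c("  Baetis ", "Agapetus\u00a0sp.", "Zavrelimyia")

# Replace non-breaking spaces with normal spaces, then trim both ends
taxa_trim <- trimws(gsub("\u00a0", " ", taxa_raw, fixed = TRUE))

# Check for non-ASCII characters before converting to upper case
has_nonascii <- vapply(taxa_trim,
                       function(s) any(utf8ToInt(s) > 127L),
                       logical(1),
                       USE.NAMES = FALSE)

# Fall back to the unconverted values if any non-ASCII characters remain
taxa_match <- if (any(has_nonascii)) taxa_trim else toupper(taxa_trim)
taxa_match
```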

The taxa list and metadata file names will be added to the results as two new columns.

Another output is the unique taxa with old and new names.

Value

A list with four elements. The first (merge) is the user data frame with additional columns from the official data appended to it. Names from the user data that overlap with the official data have the suffix '_User'. The second element (nonmatch) of the list is a vector of the non-matching taxa from the user data. The third element (metadata) includes the metadata for the official data (if provided). The fourth element (unique) is a data frame of the unique taxa names old and new.

Examples

# Example 1, PacNW
## Input Parameters
df_user <- BioMonTools::data_benthos_PacNW
fn_official <- file.path(system.file("extdata", package = "BioMonTools"),
                         "taxa_official",
                         "ORWA_TAXATRANSLATOR_20221219b.csv")
df_official <- read.csv(fn_official)
fn_official_metadata <- file.path(system.file("extdata",
                                              package = "BioMonTools"),
                                  "taxa_official",
                                  "ORWA_ATTRIBUTES_METADATA_20221117.csv")
df_official_metadata <- read.csv(fn_official_metadata)
taxaid_user <- "TaxaID"
taxaid_official_match <- "Taxon_orig"
taxaid_official_project <- "OTU_MTTI"
taxaid_drop <- "DNI"
col_drop <- c("Taxon_v2", "OTU_BCG_MariNW") # non desired ID cols in Official
sum_n_taxa_boo <- TRUE
sum_n_taxa_col <- "N_TAXA"
sum_n_taxa_group_by <- c("INDEX_NAME", "INDEX_CLASS", "SampleID", "TaxaID")
## Run Function

taxatrans <- taxa_translate(df_user,
                            df_official,
                            df_official_metadata,
                            taxaid_user,
                            taxaid_official_match,
                            taxaid_official_project,
                            taxaid_drop,
                            col_drop,
                            sum_n_taxa_boo,
                            sum_n_taxa_col,
                            sum_n_taxa_group_by)
## View Results
taxatrans$nonmatch


#~~~~~
# Example 2, Multiple Stages
# Create data
TAXAID <- c(rep("Agapetus", 3), rep("Zavrelimyia", 2))

N_TAXA <- c(rep(33, 3), rep(50, 2))
STAGE <- c("A", "L", "P", "X", "")
df_user <- data.frame(TAXAID, N_TAXA, STAGE)
df_user[, "INDEX_NAME"]  <- "BCG_MariNW_Bugs500ct"
df_user[, "INDEX_CLASS"] <- "HiGrad-HiElev"
df_user[, "SAMPLEID"]    <- "Test2023"
df_user[, "STATIONID"]   <- "Test"
df_user[, "DATE"]        <- "2023-01-16"
## Input Parameters
fn_official <- file.path(system.file("extdata", package = "BioMonTools"),
                         "taxa_official",
                         "ORWA_TAXATRANSLATOR_20221219b.csv")
df_official <- read.csv(fn_official)
fn_official_metadata <- file.path(system.file("extdata",
                                              package = "BioMonTools"),
                                  "taxa_official",
                                  "ORWA_ATTRIBUTES_20221212.csv")
df_official_metadata <- read.csv(fn_official_metadata)
taxaid_user <- "TAXAID"
taxaid_official_match <- "Taxon_orig"
taxaid_official_project <- "OTU_BCG_MariNW"
taxaid_drop <- NULL
col_drop <- c("Taxon_v2", "OTU_MTTI") # non desired ID cols in Official
sum_n_taxa_boo <- TRUE
sum_n_taxa_col <- "N_TAXA"
sum_n_taxa_group_by <- c("INDEX_NAME", "INDEX_CLASS", "SAMPLEID", "TAXAID")
## Run Function
taxatrans <- taxa_translate(df_user,
                            df_official,
                            df_official_metadata,
                            taxaid_user,
                            taxaid_official_match,
                            taxaid_official_project,
                            taxaid_drop,
                            col_drop,
                            sum_n_taxa_boo,
                            sum_n_taxa_col,
                            sum_n_taxa_group_by)
## View Results (before and after)
df_user
taxatrans$merge