Help for package perumammals

Title:

Taxonomic Backbone and Name Validation Tools for Mammals of Peru

Version:

0.0.0.2

Maintainer:

Paul E. Santos Andrade <paulefrens@gmail.com>

Description:

Provides a curated taxonomic backbone of mammal species recorded in Peru, based on the checklist published by Pacheco and collaborators (2021) <doi:10.15381/rpb.v28i4.21019>. The package includes standardized species data, occurrence records by ecological regions, endemic status, and tools for validating and matching scientific names through exact and approximate string procedures. It is designed as a lightweight and reliable reference for ecological, environmental, biogeographical, and conservation workflows that require verified species information for Peruvian mammals.

License:

MIT + file LICENSE

URL:

https://github.com/PaulESantos/perumammals, https://paulesantos.github.io/perumammals/

BugReports:

https://github.com/PaulESantos/perumammals/issues

Depends:

R (≥ 4.1)

Encoding:

UTF-8

LazyData:

true

Config/testthat/edition:

RoxygenNote:

7.3.3

Imports:

assertthat, cli, dplyr, fuzzyjoin, progress, purrr, readr, stringr, tibble, memoise

Suggests:

ggplot2, knitr, rmarkdown, testthat (≥ 3.0.0), tidyr, ggtext

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2026-02-24 02:55:38 UTC; PC

Author:

Paul E. Santos Andrade [aut, cre], Fiorella N. Gonzales Guillen [ctb]

Repository:

CRAN

Date/Publication:

2026-02-24 09:30:14 UTC

Attach Metadata to Results

Description

Attach Metadata to Results

Usage

.attach_metadata_peru(tbl, n_input, n_matched, n_fuzzy_genus, n_fuzzy_species)

Check for binomial names in species list

Description

Internal function to verify that species names are at the binomial level (genus + species) and to identify any genus-level names or NA values. The peru_mammals database contains only binomial names (including "sp." cases).

Usage

.check_binomial(splist_class, splist)

Arguments

splist_class

Classified species matrix from .splist_classify

splist

Original species list (character vector)

Value

Integer vector with positions of problematic names

Classification algorithm for a single name

Description

Internal algorithm to parse a single species name into its components. Handles regular binomials and special cases like "Genus sp. identifier" (e.g., "Akodon sp. Ancash").

Usage

.classify_algo(x_split_i)

Arguments

x_split_i

Character vector with split name parts

Value

Character vector with classified components (genus, species, author)
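
The parsing described above can be sketched in base R as follows. This is a minimal illustration, not the package's .classify_algo(); the function name classify_name_sketch and the rule that exactly one token follows "sp." are assumptions made for the example.

```r
# Hypothetical helper: split name parts into genus, species and author.
classify_name_sketch <- function(parts) {
  genus <- parts[1]
  if (length(parts) < 2) {
    # Genus-only input: no epithet and no author
    return(c(genus = genus, species = NA_character_, author = NA_character_))
  }
  if (tolower(parts[2]) %in% c("sp.", "sp") && length(parts) >= 3) {
    # "Genus sp. identifier": keep "sp. identifier" together as the epithet
    species <- paste(parts[2], parts[3])
    rest <- parts[-(1:3)]
  } else {
    species <- parts[2]
    rest <- parts[-(1:2)]
  }
  # Anything left over is treated as authorship
  author <- if (length(rest) > 0) paste(rest, collapse = " ") else NA_character_
  c(genus = genus, species = species, author = author)
}

classify_name_sketch(strsplit("Akodon sp. Ancash", " ", fixed = TRUE)[[1]])
classify_name_sketch(strsplit("Panthera onca (Linnaeus, 1758)", " ", fixed = TRUE)[[1]])
```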

Classify Input Species Names

Description

Classify Input Species Names

Usage

.classify_inputs_peru(splist)

Combine Matched Nodes

Description

Combine Matched Nodes

Usage

.combine_matched_nodes_peru(pipe)

Combine Unmatched Nodes

Description

Combine Unmatched Nodes

Usage

.combine_unmatched_nodes_peru(pipe, invalid_df)

Compute Matched Rank for Peru Mammals

Description

Compute Matched Rank for Peru Mammals

Usage

.compute_matched_rank_peru(df)

Consolidate Ambiguous Match Attributes

Description

Consolidate Ambiguous Match Attributes

Usage

.consolidate_ambiguous_attrs_peru(output, pipe)

Detect trinomial names (3+ taxonomic elements)

Description

Detect trinomial names (3+ taxonomic elements)

Usage

.detect_trinomial(orig_names)

Create Empty Output Template

Description

Create Empty Output Template

Usage

.empty_output_peru(splist_class)

Final Validation of Results

Description

Final Validation of Results

Usage

.final_assertions_peru(splist_class, output)

Finalize Output Format

Description

Finalize Output Format

Usage

.finalize_output_peru(df)

Format Matched Names for Display

Description

Format Matched Names for Display

Usage

.format_matched_names_peru(df)

Get mammals species by genus from peru_mammals

Description

Internal function to filter species by genus from peru_mammals data frame. This function is memoised for performance.

Usage

.get_mammals_genus(genus_sub, target_df = NULL)

Arguments

genus_sub

Character vector of genus names (case-insensitive)

target_df

Data frame (peru_mammals) with genus and species columns

Value

Data frame filtered by genus

Initialize Matching Columns

Description

Initialize Matching Columns

Usage

.init_matching_columns_peru(df)

Invalidate trinomial matches in validation results

Description

Invalidate trinomial matches in validation results

Usage

.invalidate_trinomials(results)

Join Additional Database Information

Description

Join Additional Database Information

Usage

.join_database_info_peru(df, target_df)

Load Peru Mammals Database

Description

Load Peru Mammals Database

Usage

.load_target_peru(quiet)

Map with optional progress bar

Description

Internal wrapper for purrr::map_dfr with optional progress tracking. Progress bars are only shown in interactive sessions.

Usage

.map_dfr_progress(.x, .f, ..., .id = NULL, .progress = interactive())

Arguments

.x

A list or vector to iterate over

.f

A function to apply

...

Additional arguments passed to .f

.id

Column name for row identification

.progress

Logical. Show progress bar? Default is interactive()

Value

Data frame with combined results

Standardize species names for matching with Peru mammals database

Description

Internal function to standardize species names before matching against the peru_mammals database. Handles common formatting issues and removes hybrid indicators. Note: peru_mammals does not include infraspecific taxa.

Usage

.names_standardize(splist)

Arguments

splist

Character vector of species names to standardize

Value

Character vector of standardized names
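
A rough base R sketch of this kind of standardization is shown below. The help page does not list the exact rules, so the steps here (underscores to spaces, removal of a standalone hybrid marker, whitespace squeezing) and the name standardize_names_sketch are assumptions for illustration only.

```r
# Hypothetical helper: tidy raw names before matching.
standardize_names_sketch <- function(x) {
  x <- gsub("_", " ", x, fixed = TRUE)          # underscores used as separators
  x <- gsub("(^|\\s)[x\u00d7](\\s|$)", " ", x)  # standalone hybrid marker ("x" or the multiplication sign)
  x <- gsub("\\s+", " ", x)                     # collapse repeated whitespace
  trimws(x)
}

standardize_names_sketch(c("  Panthera   onca ", "Panthera_onca", "Panthera x onca"))
```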

Matching Pipeline - Hierarchical Strategy

Description

Implements hierarchical matching for peru_mammals:

Node 1: Direct exact match (genus + species)
Node 2: Genus exact match
Node 3: Genus fuzzy match
Node 4: Species fuzzy match within matched genus

Usage

.pipeline_nodes_peru(df, target_df, quiet)
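
The four nodes can be illustrated for a single name with base R only. This is a simplified sketch, not the package's pipeline: the toy backbone, the function match_name_sketch, the node labels, and the use of the same maximum distance for species as for genus are all assumptions. Names are assumed to be standardized already.

```r
# Toy backbone with three species (illustrative subset only)
backbone <- data.frame(
  genus = c("Panthera", "Tremarctos", "Puma"),
  species = c("onca", "ornatus", "concolor"),
  stringsAsFactors = FALSE
)

# Hypothetical helper walking through the four nodes for one name
match_name_sketch <- function(genus, species, backbone, max_dist = 1) {
  # Node 1: direct exact match on genus + species
  if (any(backbone$genus == genus & backbone$species == species)) {
    return(c(genus = genus, species = species, node = "direct"))
  }
  # Node 2: exact genus match
  genera <- unique(backbone$genus)
  matched_genus <- if (genus %in% genera) genus else NA_character_
  # Node 3: fuzzy genus match (Levenshtein distance via utils::adist)
  if (is.na(matched_genus)) {
    d <- drop(utils::adist(genus, genera))
    if (min(d) <= max_dist) matched_genus <- genera[which.min(d)]
  }
  if (is.na(matched_genus)) {
    return(c(genus = NA_character_, species = NA_character_, node = "unmatched"))
  }
  # Node 4: fuzzy species match restricted to the matched genus
  candidates <- backbone$species[backbone$genus == matched_genus]
  d <- drop(utils::adist(species, candidates))
  if (min(d) <= max_dist) {
    return(c(genus = matched_genus, species = candidates[which.min(d)], node = "fuzzy"))
  }
  c(genus = matched_genus, species = NA_character_, node = "genus_only")
}

match_name_sketch("Pantera", "onca", backbone)
match_name_sketch("Tremarctos", "orrnatus", backbone)
```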

Classify species names into taxonomic components

Description

Internal wrapper function to classify multiple species names into their taxonomic components (genus, species, author). The peru_mammals database does not include infraspecific taxa, but this function handles "sp." notations for undescribed species (e.g., "Akodon sp. Ancash").

Automatic normalization: Empty strings ("", " ", etc.) are automatically converted to NA before processing, as they represent missing values and cannot match any names in the database.

Usage

.splist_classify(x)

Arguments

x

Character vector of species names

Value

Matrix with classified name components
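
The empty-string normalization mentioned above could look like the following base R sketch. The helper name normalize_empty_sketch is hypothetical and the package may implement this step differently.

```r
# Hypothetical helper: treat empty or whitespace-only strings as missing.
normalize_empty_sketch <- function(x) {
  x[!is.na(x) & trimws(x) == ""] <- NA_character_
  x
}

normalize_empty_sketch(c("Puma concolor", "", "   ", NA))
```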

Split Valid and Invalid Names

Description

Split Valid and Invalid Names

Usage

.split_valid_invalid_peru(splist_class)

Convert to sentence case (first letter uppercase, rest lowercase)

Description

Internal utility to convert text to sentence case for matching with peru_mammals database format.

Usage

.str_to_simple_cap(text)

Arguments

text

Character vector

Value

Character vector in sentence case
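
A minimal base R equivalent of this conversion, given as an illustration rather than the package's own code (the name to_sentence_case_sketch is hypothetical):

```r
# Hypothetical helper: first letter uppercase, remainder lowercase.
to_sentence_case_sketch <- function(text) {
  out <- paste0(toupper(substr(text, 1, 1)), tolower(substring(text, 2)))
  out[is.na(text)] <- NA_character_  # paste0() would otherwise turn NA into "NANA"
  out
}

to_sentence_case_sketch(c("PANTHERA ONCA", "puma concolor", NA))
```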

Transform and structure classified names

Description

Internal function to transform the classification matrix into a structured data frame. Simplified for peru_mammals which only has binomial names (and some "sp." cases) without infraspecific categories.

Important: This function distinguishes between:

Original NAs from the input (expected missing values)
Malformed names that failed rank assignment (problematic inputs)

Only the latter trigger warnings to avoid false positives.

Usage

.transform_split_classify(df)

Arguments

df

Data frame or matrix from .splist_classify

Value

Data frame with transformed names and rank

Validate Input Parameters

Description

Validate Input Parameters

Usage

.validate_inputs_peru(splist, quiet)

Validate Target Database Schema

Description

Validate Target Database Schema

Usage

.validate_target_schema_peru(target_df)

Check if taxonomic backbone needs updating

Description

Checks whether a newer version of the Pacheco et al. mammal checklist might be available based on the publication year.

Usage

check_backbone_update(backbone_year)

Arguments

backbone_year

Numeric or character year of the current backbone.

Value

A list with components:

update_available – logical indicating if update may be available.
message – character string with information message.

Direct Match Species Names Against Peru Mammals Database

Description

Performs direct matching of species names against the peru_mammals database. Matches binomial names (genus + species) and handles special "sp." cases (e.g., "Akodon sp. Ancash"). The peru_mammals database does not include infraspecific taxa.

Usage

direct_match(df, target_df = NULL)

Arguments

df

A data frame or tibble containing the species data to be matched. Must include columns: Orig.Genus, Orig.Species, Rank

target_df

A data frame representing the peru_mammals database. Must include columns: genus, species

Details

This function only matches Rank 2 (binomial) names since peru_mammals does not include infraspecific taxa. It handles:

Regular binomials: "Panthera onca"
Special "sp." cases: "Akodon sp. Ancash", "Oligoryzomys sp. B"

Names at Rank 1 (genus only) are not matched by this function; use genus_match() instead.

Value

A tibble with an additional logical column direct_match indicating whether the name was successfully matched (TRUE) or not (FALSE), plus columns Matched.Genus and Matched.Species for matched records.
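
To make the input and output columns concrete, here is a base R sketch of an exact match on genus plus species. It is not the package's direct_match(); the helper name direct_match_sketch and the case-insensitive comparison are assumptions, and the input rows are toy data.

```r
# Hypothetical helper: flag Rank 2 names that match the backbone exactly.
direct_match_sketch <- function(df, target_df) {
  key_in  <- paste(toupper(df$Orig.Genus), toupper(df$Orig.Species))
  key_ref <- paste(toupper(target_df$genus), toupper(target_df$species))
  # Only binomials (Rank 2) are eligible for a direct match
  df$direct_match <- df$Rank %in% 2 & key_in %in% key_ref
  df$Matched.Genus <- ifelse(df$direct_match, df$Orig.Genus, NA_character_)
  df$Matched.Species <- ifelse(df$direct_match, df$Orig.Species, NA_character_)
  df
}

input <- data.frame(
  Orig.Genus = c("Panthera", "Felis", "Akodon"),
  Orig.Species = c("onca", "catus", NA),
  Rank = c(2, 2, 1),
  stringsAsFactors = FALSE
)
reference <- data.frame(
  genus = c("Panthera", "Puma"),
  species = c("onca", "concolor"),
  stringsAsFactors = FALSE
)

direct_match_sketch(input, reference)
```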

Quick check: Is species found in Peru?

Description

Simplified boolean check for species presence in Peru mammals database. Useful for filtering and logical operations.

Usage

found_in_peru(splist, exact_only = FALSE)

Arguments

splist

Character vector of species names

exact_only

Logical. If TRUE, only exact matches return TRUE (default: FALSE)

Value

Logical vector (TRUE = found, FALSE = not found)

Examples


species <- c("Panthera onca", "Tremarctos orrnatus",
             "Tremarctos orrnatos", "Felis catus")

# Check presence (includes fuzzy matches)
found_in_peru(species)

tibble::tibble(splist = species) |>
  dplyr::mutate(found = found_in_peru(splist))

Fuzzy Match Genus Name Against Peru Mammals Database

Description

Performs fuzzy matching of genus names against the peru_mammals database using string distance (Levenshtein) to account for slight spelling variations. Maximum distance is set to 1 character difference.

This implementation uses a staged approach to avoid warnings when no matches are found:

Perform stringdist_left_join to get all candidates
Split into valid (finite distance) and invalid (NA distance)
Process only valid matches to find best candidates

Usage

fuzzy_match_genus(df, target_df = NULL)

Arguments

df

A data frame containing the genus names to be matched. Must include column: Orig.Genus

target_df

A data frame representing peru_mammals database. Must include column: genus

Details

If multiple genera match with the same string distance (ambiguous matches), a warning is issued and the first match is automatically selected. To examine ambiguous matches, use get_ambiguous_matches(result, type = "genus").

Ambiguous match information is stored as an attribute and includes:

Original genus
All matched genera with tied distances
Family information from peru_mammals
Number of species per genus

Value

A tibble with three additional columns:

fuzzy_match_genus: Logical indicating if genus was matched
fuzzy_genus_dist: Numeric distance for each match (lower = better)
Matched.Genus: The matched genus name
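
The distance threshold and the tie handling described above can be sketched with utils::adist(), which computes Levenshtein distance in base R. The package itself relies on a stringdist_left_join, so this is only an approximation of the idea; the helper name fuzzy_genus_sketch and the toy genus vectors are assumptions.

```r
# Hypothetical helper: best genus within max_dist, with tie detection.
fuzzy_genus_sketch <- function(orig_genus, genera, max_dist = 1) {
  d <- drop(utils::adist(toupper(orig_genus), toupper(genera)))
  best <- min(d)
  if (best > max_dist) {
    return(list(matched = NA_character_, dist = NA_real_, ambiguous = character(0)))
  }
  tied <- genera[d == best]
  if (length(tied) > 1) {
    # Several genera at the same distance: warn and keep the first one
    warning("Ambiguous genus match for '", orig_genus, "': ",
            paste(tied, collapse = ", "))
  }
  list(
    matched = tied[1],
    dist = best,
    ambiguous = if (length(tied) > 1) tied else character(0)
  )
}

genera <- c("Myotis", "Mazama", "Puma")
fuzzy_genus_sketch("Miotis", genera)  # one candidate at distance 1
fuzzy_genus_sketch("Felis", genera)   # nothing within distance 1

# Toy genus list chosen only to force a tie
suppressWarnings(fuzzy_genus_sketch("Nus", c("Mus", "Sus")))
```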

Fuzzy Match Species within Genus in Peru Mammals Database

Description

Performs fuzzy matching of species names within a matched genus using string distance to account for spelling variations. The peru_mammals database does not include infraspecific taxa.

Usage

fuzzy_match_species_within_genus(df, target_df = NULL)

Arguments

df

A data frame containing species data to be matched. Must include columns: Orig.Species, Matched.Genus

target_df

A data frame representing peru_mammals database. Must include columns: genus, species

Details

This function processes each matched genus separately for efficiency. If multiple species match with the same distance, a warning is issued and the first match is selected. Use get_ambiguous_matches(result, type = "species") to examine ambiguous cases.

Special handling for "sp." cases:

"Akodon sp. Ancash" is treated as a complete specific epithet
Fuzzy matching will work on the entire "SP. ANCASH" string

Value

A tibble with additional columns:

fuzzy_match_species_within_genus: Logical indicating match success
fuzzy_species_dist: Numeric distance for each match
Matched.Species: The matched species name

Helper: Fuzzy Match Species within Genus

Description

Helper function that performs fuzzy matching for a single genus.

This implementation uses a staged approach to avoid issues with empty groups when filtering NAs:

Perform stringdist_left_join to get all candidates
Split into matched (finite distance) and unmatched (NA distance)
Process matched candidates to find best matches
Recombine for final output

Usage

fuzzy_match_species_within_genus_helper(df, target_df)

Arguments

df

Data frame for a single matched genus

target_df

Peru mammals database

Value

Data frame with fuzzy match results

Match Genus Names Against Peru Mammals Database

Description

Performs direct matching of genus names against the unique genera listed in the peru_mammals database. Useful for Rank 1 (genus-only) names.

Usage

genus_match(df, target_df = NULL)

Arguments

df

A data frame or tibble containing the genus names to be matched. Must include column: Orig.Genus

target_df

A data frame representing the peru_mammals database. Must include column: genus

Details

This function is typically used for names submitted at the genus level (Rank 1). When a genus is matched, all species of that genus in peru_mammals can be retrieved for further processing (e.g., suggesting possible species to the user).

Value

A tibble with an additional logical column genus_match indicating whether the genus was successfully matched (TRUE) or not (FALSE), plus column Matched.Genus for matched records.

Retrieve Ambiguous Match Information for Peru Mammals

Description

Extracts information about ambiguous matches (multiple candidates with tied distances) from matching results. Useful for quality control and manual curation. Adapted for peru_mammals (genus and species only).

Usage

get_ambiguous_matches(
  match_result,
  type = c("genus", "species", "all"),
  save_to_file = FALSE,
  output_dir = tempdir()
)

Arguments

match_result

A tibble returned by matching functions.

type

Character. Type of ambiguous matches to retrieve:

"genus" (default): Ambiguous genus-level matches
"species": Ambiguous species-level matches
"all": Both types

save_to_file

Logical. If TRUE, saves results to CSV. Default is FALSE (CRAN compliant).

output_dir

Character. Directory to save file if save_to_file = TRUE. Defaults to tempdir().

Details

During fuzzy matching, multiple candidates may have identical string distances. The matching algorithm automatically selects the first candidate, but this function allows you to review all alternatives for quality control.

Value

A tibble with ambiguous match details, or NULL if none exist. Includes original names, matched names, distances, and database metadata.

Get taxonomic and common name information for Peru mammals

Description

Returns taxonomic classification and common names for species validated against the Peru mammals database.

Usage

get_common_names_peru(splist, return_details = FALSE)

Arguments

splist

Character vector of species names

return_details

Logical. If TRUE, includes full taxonomic information (default: FALSE)

Value

If return_details = FALSE: Character vector with common names.
If return_details = TRUE: Tibble with taxonomic and common name information.

Examples


species <- c("Panthera onca", "Tremarctos ornatus",
             "Puma concolor", "Myotis bakeri")

# Get common names
# Vector
get_common_names_peru(species)
# tibble
tibble::tibble(splist = species) |>
  dplyr::mutate(common_name = get_common_names_peru(splist))

# Get full taxonomic information
taxonomy <- get_common_names_peru(species, return_details = TRUE)
taxonomy

Get All Species for Matched Genera from Peru Mammals

Description

Helper function to retrieve all species belonging to matched genera from the peru_mammals database. Useful for suggesting possible species when only genus is provided.

Usage

get_species_for_genera(matched_genera, target_df = NULL)

Arguments

matched_genera

Character vector of matched genus names (uppercase)

target_df

A data frame representing the peru_mammals database

Value

A data frame with genus and species columns for all species in the matched genera.

Check if species are endemic to Peru

Description

Simplified wrapper specifically for checking endemism status of mammals in Peru. Only evaluates species that are confirmed to occur in Peru.

Usage

is_endemic_peru(splist, return_logical = FALSE, filter_exact = FALSE)

Arguments

splist

Character vector of species names

return_logical

Logical. If TRUE, returns logical vector (TRUE/FALSE/NA). If FALSE, returns descriptive character vector (default: FALSE)

filter_exact

Logical. If TRUE, only considers exact matches (default: FALSE)

Value

If return_logical = FALSE: Character vector with endemism status.
If return_logical = TRUE: Logical vector (TRUE = endemic, FALSE = not endemic, NA = not found or endemism unknown).

Examples


species <- c("Panthera onca",
             "Atelocynus microtis",
             "Felis catus",
             "Myotis bakeri")

is_endemic_peru(species)
# Descriptive output
tibble::tibble(splist = species) |>
  dplyr::mutate(endemic = is_endemic_peru(splist))

Check if species are Peru mammals

Description

Main wrapper function that validates species names against the Peru mammals database with various output options for match quality, endemism status, and detailed information.

Usage

is_peru_mammal(
  splist,
  return_details = FALSE,
  match_type = "status",
  filter_exact = FALSE
)

Arguments

splist

Character vector of species names to check

return_details

Logical. If TRUE, returns full validation tibble. If FALSE, returns simplified status vector (default: FALSE)

match_type

Character. Type of information to return when return_details = FALSE:

"status": Returns "Found" or "Not found" (default)
"match_quality": Returns match quality ("Exact", "Fuzzy", or "Not found")
"endemic": Returns endemism status ("Endemic", "Not endemic", or "Not found")

filter_exact

Logical. If TRUE, only returns exact matches (genus_dist = 0 AND species_dist = 0). Fuzzy matches are treated as "Not found" (default: FALSE)

Details

This function wraps validate_peru_mammals() to provide flexible output formats for different use cases:

Basic presence/absence checking
Match quality assessment (exact vs fuzzy)
Endemism status queries

The function handles taxonomic matching with fuzzy string matching to accommodate minor spelling variations while maintaining data quality.

When filter_exact = TRUE, only matches with zero edit distance in both genus and species names are considered valid matches. All fields related to fuzzy matches are set to NA or "—" to maintain consistency.

Value

If return_details = FALSE: Character vector with requested information.
If return_details = TRUE: Tibble with complete validation information.

Examples


species <- c(
  "Panthera onca",       # Exact match
  "Pantera onca",        # Fuzzy match (genus misspelled)
  "Tremarctos orrnatus", # Fuzzy match (species misspelled)
  "Felis domesticus",     # Not in Peru
  "Myotis bakeri"
)

# Check if species are found (includes fuzzy matches)
is_peru_mammal(species)

# Check with exact matches only
is_peru_mammal(species, filter_exact = TRUE)

# Check match quality
is_peru_mammal(species, match_type = "match_quality")

# Check endemism
is_peru_mammal(species, match_type = "endemic")

# Get detailed information
is_peru_mammal(species, return_details = TRUE)

# Get detailed information with exact matches only
is_peru_mammal(species, return_details = TRUE, filter_exact = TRUE)

Get match quality for Peru mammal names

Description

Returns the quality of taxonomic name matching (exact vs fuzzy) for species validated against the Peru mammals database.

Usage

match_quality_peru(splist, return_details = FALSE)

Arguments

splist

Character vector of species names

return_details

Logical. If TRUE, includes distance metrics and matching information (default: FALSE)

Details

Match quality categories:

"Exact": Perfect match with no spelling differences (genus_dist = 0, species_dist = 0)
"Fuzzy": Match found with minor spelling variations (genus_dist > 0 or species_dist > 0)
"Not found": No match in database

The function uses string distance metrics to quantify matching quality:

genus_dist: Edit distance for genus name
species_dist: Edit distance for species epithet
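
The three categories follow directly from the two distances. The sketch below shows the rule in base R; it assumes that unmatched names carry NA distances, which the help page does not state, and the helper name match_quality_sketch is hypothetical.

```r
# Hypothetical helper: derive the quality label from the two edit distances.
match_quality_sketch <- function(genus_dist, species_dist) {
  ifelse(
    is.na(genus_dist) | is.na(species_dist), "Not found",
    ifelse(genus_dist == 0 & species_dist == 0, "Exact", "Fuzzy")
  )
}

match_quality_sketch(genus_dist = c(0, 1, 0, NA), species_dist = c(0, 0, 1, NA))
```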

Value

If return_details = FALSE: Character vector with match quality.
If return_details = TRUE: Tibble with detailed matching information.

Examples


species <- c(
  "Panthera onca",      # Exact
  "Tremarctos orrnatus", # Fuzzy (spelling error)
  "Felis domesticus",   # Not found
  "Myotis bakeri"
)

# Simple quality check
match_quality_peru(species)

# Detailed information with edit distances
details <- match_quality_peru(species, return_details = TRUE)
details

Mammal species of Peru based on Pacheco et al. (2021)

Description

A backbone of the terrestrial and marine mammal species known for Peru, compiled from Pacheco et al. (2021) "Lista actualizada de la diversidad de los mamíferos del Perú y una propuesta para su actualización".

Usage

data("peru_mammals")

Format

A tibble with 573 rows and 12 variables:

pm_id: Character. Internal stable identifier for the species, combining the original numeric id and an abbreviation of the genus. Intended for internal linking between tables.
order: Character. Taxonomic order (e.g. Didelphimorphia, Rodentia, Chiroptera).
family: Character. Taxonomic family.
genus: Character. Genus name.
species: Character. Specific epithet.
scientific_name: Character. Binomial scientific name (Genus species), without authorship. This is the main field used for name validation.
scientific_name_full: Character. Full scientific name including authorship and year, as provided in the original annex.
author: Character. Authorship and year of the species name.
common_name: Character. Common name in Spanish, when available.
endemic: Logical. TRUE if the species is considered endemic to Peru in Pacheco et al. (2021), FALSE otherwise.
ecoregions: Character. Comma-separated codes of Peruvian ecoregions where the species occurs, using the abbreviations defined by Pacheco et al. (2021) (e.g. "YUN, SB, SP"). See peru_mammals_ecoregions_meta for code definitions.
reference: Character. Bibliographic notes or specific references supporting the presence or taxonomy of the species.

Details

Each row corresponds to a single species as listed in the original annex of the paper. This dataset is the main taxonomic backbone used by the perumammals package.

Source

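Pacheco Torres, V. R., Diaz, S., Graham Angeles, L. A., Flores-Quispe, M., Calizaya-Mamani, G., Ruelas, D., & Sánchez-Vendizú, P. (2021). Lista actualizada de la diversidad de los mamíferos del Perú y una propuesta para su actualización. Revista Peruana De Biología, 28(4), e21019. doi:10.15381/rpb.v28i4.21019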

Summary information on the perumammals taxonomic backbone

Description

A one-row tibble with metadata about the taxonomic backbone used in perumammals, including its bibliographic source, year, number of species and the date when the internal data objects were created.

Usage

data("peru_mammals_backbone")

Format

A tibble with 1 row and 4 variables:

source: Character. Short bibliographic reference to the backbone source (Pacheco et al. 2021).
source_year: Integer. Publication year of the backbone source (2021).
n_species: Integer. Number of species included in the backbone (as rows in peru_mammals).
created_at: Date. Date when the backbone data objects were generated (in the package build process).

Details

This object is intended for internal bookkeeping and for functions that report the origin and version of the backbone.

Mammal species by Peruvian ecoregion

Description

A long-format table linking each mammal species to the Peruvian ecoregions where it occurs, based on Pacheco et al. (2021).

Usage

data("peru_mammals_ecoregions")

Format

A tibble with one row per species–ecoregion combination and 3 variables:

pm_id: Character. Internal species identifier, matching peru_mammals.
scientific_name: Character. Binomial scientific name (Genus species).
ecoregion_code: Character. Abbreviation of the ecoregion where the species occurs (e.g. "YUN", "SB", "COS"). See peru_mammals_ecoregions_meta for code definitions.

Details

Each row corresponds to a single combination of species and ecoregion. This dataset is derived from the ecoregions field of peru_mammals.
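
The derivation from the comma-separated ecoregions field can be sketched in base R as shown below. The identifiers, species names and ecoregion assignments are placeholders invented for the example, not records from the dataset.

```r
# Placeholder rows in the shape of peru_mammals (values are not real records)
species_tbl <- data.frame(
  pm_id = c("id_1", "id_2"),
  scientific_name = c("Genus alpha", "Genus beta"),
  ecoregions = c("YUN, SB, SP", "YUN, PUN"),
  stringsAsFactors = FALSE
)

# Split each comma-separated field, then repeat the species columns per code
codes <- strsplit(species_tbl$ecoregions, ",\\s*")
long_tbl <- data.frame(
  pm_id = rep(species_tbl$pm_id, lengths(codes)),
  scientific_name = rep(species_tbl$scientific_name, lengths(codes)),
  ecoregion_code = unlist(codes),
  stringsAsFactors = FALSE
)

long_tbl
```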

Source

Pacheco et al. (2021).

Metadata for Peruvian mammal ecoregions

Description

Definitions of the ecoregion codes used in peru_mammals and peru_mammals_ecoregions. The codes follow the abbreviations used by Pacheco et al. (2021), based on Peruvian ecoregion schemes.

Usage

data("peru_mammals_ecoregions_meta")

Format

A tibble with one row per ecoregion code and 2 variables:

ecoregion_code

Character. Ecoregion abbreviation. The codes used in the dataset are:

"OCE" – Oceánica
"BPP" – Bosque Pluvial del Pacífico
"BSE" – Bosque Seco Ecuatorial
"COS" – Costa
"VOC" – Vertiente Occidental
"PAR" – Páramo
"PUN" – Puna
"YUN" – Yungas
"SB" – Selva Baja
"SP" – Sabana de Palmera

ecoregion_label

Character. Human-readable label/description of the ecoregion in Spanish.

Source

Pacheco et al. (2021).

Display taxonomic backbone metadata for Peruvian mammals

Description

Displays summary information about the taxonomic backbone used in perumammals. The backbone is based on the taxonomic checklist published by Pacheco et al. (2021), which was digitised from the original PDF publication into a structured tibble format.

Usage

pm_backbone_info()

Value

Invisibly returns a one-row tibble containing the backbone metadata, with the same structure as peru_mammals_backbone. The function is called primarily for its side effect of printing the summary information.

References

Pacheco Torres, V. R., Diaz, S., Graham Angeles, L. A., Flores-Quispe, M., Calizaya-Mamani, G., Ruelas, D., & Sánchez-Vendizú, P. (2021). Lista actualizada de la diversidad de los mamíferos del Perú y una propuesta para su actualización. Revista Peruana De Biología, 28(4), e21019. doi:10.15381/rpb.v28i4.21019

Examples

# Display backbone information
pm_backbone_info()

# Access the data invisibly returned
backbone_data <- pm_backbone_info()
backbone_data$n_species

List species by ecoregion

Description

Convenience wrapper to list species occurring in one or more Peruvian ecoregions. This function uses pm_species() internally and therefore supports the same taxonomic and endemism filters.

Usage

pm_by_ecoregion(
  ecoregion,
  order = NULL,
  family = NULL,
  genus = NULL,
  endemic = NULL
)

Arguments

ecoregion

Character vector with one or more ecoregion codes (e.g. "YUN", "SB", "COS"). At least one code must be provided. Invalid codes will generate a warning.

order

Optional character vector with one or more taxonomic orders to keep. If NULL (default), no filter is applied by order.

family

Optional character vector with one or more families to keep. If NULL (default), no filter is applied by family.

genus

Optional character vector with one or more genera to keep. If NULL (default), no filter is applied by genus.

endemic

Optional logical. If TRUE, only endemic species are returned; if FALSE, only non-endemic species are returned; if NULL (default), no filter is applied by endemism.

Value

A tibble with a subset of rows from peru_mammals corresponding to species present in at least one of the requested ecoregions. Returns an empty tibble if no species match the criteria.
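
The "at least one of the requested ecoregions" rule can be illustrated with a base R filter over the comma-separated ecoregions field. This is a sketch under assumptions: the helper has_any_ecoregion_sketch is hypothetical, the rows are placeholders, and the package may filter through the long-format table instead.

```r
# Hypothetical helper: TRUE when a species occurs in any requested ecoregion.
has_any_ecoregion_sketch <- function(ecoregions, codes) {
  vapply(
    strsplit(ecoregions, ",\\s*"),
    function(x) any(x %in% codes),
    logical(1)
  )
}

# Placeholder rows (not real records)
toy_species <- data.frame(
  scientific_name = c("Genus alpha", "Genus beta", "Genus gamma"),
  ecoregions = c("YUN, SB", "COS", "PUN, VOC"),
  stringsAsFactors = FALSE
)

keep <- has_any_ecoregion_sketch(toy_species$ecoregions, c("COS", "VOC"))
toy_subset <- toy_species[keep, ]
toy_subset
```

Splitting the field before comparing codes avoids accidental partial matches that a plain substring search could produce.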

Examples

# All species in Yungas
pm_by_ecoregion("YUN")

# Endemic species in Selva Baja (SB)
pm_by_ecoregion("SB", endemic = TRUE)

# Rodents in Costa and Vertiente Occidental
pm_by_ecoregion(c("COS", "VOC"), order = "Rodentia")

# Bats in multiple ecoregions
pm_by_ecoregion(c("YUN", "SB"), order = "Chiroptera")
pm_by_ecoregion(c("YUN", "SB"), order = "Chiroptera",
                endemic = TRUE)

Summary of species richness by ecoregion

Description

Computes a summary of species richness and endemism for each ecoregion in the Peruvian mammal backbone.

Usage

pm_ecoregion_summary(sort_by = c("code", "species", "endemic", "label"))

Arguments

sort_by

Character string indicating how to sort the results. Options are:

"code" (default) – sort alphabetically by ecoregion code.
"species" – sort by number of species (descending).
"endemic" – sort by number of endemic species (descending).
"label" – sort alphabetically by ecoregion label.

Details

The summary is based on the long-format table peru_mammals_ecoregions and joins metadata from peru_mammals_ecoregions_meta and endemism information from peru_mammals.

Value

A tibble with one row per ecoregion and the following columns:

ecoregion_code – ecoregion abbreviation.
ecoregion_label – ecoregion description in Spanish.
n_species – total number of species recorded in the ecoregion.
n_endemic – number of endemic species recorded in the ecoregion.
pct_endemic – percentage of endemic species in the ecoregion.
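
How the output columns relate to one another can be shown with a small base R aggregation. The input rows are placeholders, and the computation is a sketch of the idea rather than the package's implementation; it assumes one row per species and ecoregion combination.

```r
# Placeholder long-format table: one row per species and ecoregion
toy_long <- data.frame(
  pm_id = c("a", "a", "b", "c", "c"),
  ecoregion_code = c("YUN", "SB", "YUN", "SB", "PUN"),
  stringsAsFactors = FALSE
)
# Placeholder endemism flags keyed by species id
endemic_flag <- c(a = TRUE, b = FALSE, c = FALSE)
toy_long$endemic <- unname(endemic_flag[toy_long$pm_id])

# Count species and endemic species per ecoregion
n_species <- tapply(toy_long$pm_id, toy_long$ecoregion_code,
                    function(x) length(unique(x)))
n_endemic <- tapply(toy_long$endemic, toy_long$ecoregion_code, sum)

toy_summary <- data.frame(
  ecoregion_code = names(n_species),
  n_species = as.integer(n_species),
  n_endemic = as.integer(n_endemic),
  stringsAsFactors = FALSE
)
toy_summary$pct_endemic <- 100 * toy_summary$n_endemic / toy_summary$n_species

# Equivalent of sort_by = "species": richest ecoregions first
toy_summary[order(-toy_summary$n_species), ]
```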

Examples

# Get summary for all ecoregions (sorted by code)
pm_ecoregion_summary()

# Sort by species richness
pm_ecoregion_summary(sort_by = "species")

# Sort by number of endemic species
pm_ecoregion_summary(sort_by = "endemic")

# Find ecoregion with highest species richness
eco_summary <- pm_ecoregion_summary(sort_by = "species")
eco_summary[1, ]

# Ecoregions with more than 100 species
eco_summary <- pm_ecoregion_summary()
subset(eco_summary, n_species > 100)

# Compare richness between lowland and highland ecoregions
eco_summary <- pm_ecoregion_summary(sort_by = "species")
lowland <- eco_summary[eco_summary$ecoregion_code %in% c("SB", "SP"), ]
highland <- eco_summary[eco_summary$ecoregion_code %in% c("PUN", "PAR"), ]

List endemic mammal species of Peru

Description

Returns endemic species from the Peruvian mammal backbone, with optional filters by order, family, genus, and/or ecoregion.

Usage

pm_endemics(order = NULL, family = NULL, genus = NULL, ecoregion = NULL)

Arguments

order

Optional character vector with one or more taxonomic orders to keep. If NULL (default), no filter is applied by order.

family

Optional character vector with one or more families to keep. If NULL (default), no filter is applied by family.

genus

Optional character vector with one or more genera to keep. If NULL (default), no filter is applied by genus.

ecoregion

Optional character vector with one or more ecoregion codes (e.g. "YUN", "SB", "COS"). If supplied, only species occurring in at least one of the given ecoregions are returned.

Details

This is a convenience wrapper around pm_species() with endemic = TRUE.

Value

A tibble with endemic species (subset of peru_mammals).

Examples


# All endemic species
pm_endemics()

# Endemic rodents
pm_endemics(order = "Rodentia")

# Endemic species in Yungas (YUN)
pm_endemics(ecoregion = "YUN")

Display ecoregion metadata for Peruvian mammals

Description

Displays summary information about the ecoregions used in the Peruvian mammal backbone. Ecoregions follow the Brack-Egg (1986) classification system used in Peruvian biogeography to describe the distribution of mammal species across different ecological regions.

Usage

pm_list_ecoregions(include_endemic = FALSE)

Arguments

include_endemic

Logical. If TRUE, includes columns showing the number and percentage of endemic species per ecoregion. Default is FALSE.

Details

The ecoregion classification follows Brack-Egg (1986), a widely used biogeographic framework for Peru that recognizes 10 distinct ecological regions based on climate, vegetation, and elevation. This classification is used in Pacheco et al. (2021) to document the distribution patterns of Peruvian mammals.

The function prints a formatted summary to the console and invisibly returns the complete data for further analysis.

Value

A tibble with one row per ecoregion, arranged in descending order by species richness, with the following columns:

ecoregion_code: Abbreviated ecoregion code (e.g., "SB", "YUN")
ecoregion_label: Full ecoregion name in Spanish
n_species: Total number of mammal species recorded in the ecoregion
pct_species: Percentage of Peru's total mammal diversity (0-100)
n_endemic: (Only if include_endemic = TRUE) Number of endemic species in the ecoregion
pct_endemic: (Only if include_endemic = TRUE) Percentage of endemic species relative to total species in the ecoregion (0-100)

References

Brack-Egg, A. (1986). Ecología de un país complejo. In J. Mejía Baca (Ed.), Gran Geografía del Perú: Naturaleza y Hombre (Vol. 2, pp. 175-319). Barcelona: Manfer-Mejía Baca.

Examples

# Display ecoregion information
pm_list_ecoregions()

# Include endemic species information
pm_list_ecoregions(include_endemic = TRUE)

# Access the data for further analysis
ecoregion_data <- pm_list_ecoregions()

# Ecoregions with highest species richness (rows are already sorted in descending order)
head(ecoregion_data, 3)

List endemic mammal species by taxonomic order

Description

Summarises the diversity of endemic mammal species in Peru, grouped by taxonomic order. Provides counts of families, genera, and species that are endemic to Peru within each order. Optionally includes endemism rates relative to total species richness.

Usage

pm_list_endemic(include_rate = FALSE)

Arguments

include_rate

Logical. If TRUE, includes additional columns showing total species richness and endemism rate for each order. Default is FALSE.

Details

This function focuses exclusively on species that are endemic to Peru (i.e., species found nowhere else in the world). Orders without any endemic species are not included in the output.

When include_rate = FALSE (default), results are sorted by the number of endemic species in descending order, highlighting which orders have the highest endemic diversity.

When include_rate = TRUE, results are sorted by total species richness in descending order, and include endemism rates to show what proportion of each order's diversity is endemic to Peru. A summary row labeled "Total" is appended to show overall statistics.

Value

A tibble with one row per order containing endemic species, arranged in descending order by number of endemic species (or by total species richness, followed by a "Total" row, when include_rate = TRUE), with the following columns:

order: Taxonomic order
n_families: Number of families with endemic species in the order
n_genera: Number of genera with endemic species in the order
n_endemic: Number of endemic species in the order
n_species: (Only if include_rate = TRUE) Total number of species in the order
endemic_rate: (Only if include_rate = TRUE) Proportion of endemic species (0-1)
endemic_pct: (Only if include_rate = TRUE) Percentage of endemic species (0-100)
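
The relationship between n_endemic, n_species, endemic_rate and endemic_pct can be illustrated with a small base R sketch. The counts below are invented for illustration and are not taken from the package data; computing the "Total" row as a ratio of sums is an assumption about how the summary row is derived.

```r
# Hypothetical per-order counts; NOT the real perumammals figures.
toy <- data.frame(
  order     = c("Rodentia", "Chiroptera", "Primates"),
  n_endemic = c(40L, 8L, 6L),
  n_species = c(200L, 180L, 40L)
)

# Summary row built from column sums, so that the overall rate is a
# ratio of sums rather than a mean of per-order rates (assumption).
total <- data.frame(
  order     = "Total",
  n_endemic = sum(toy$n_endemic),
  n_species = sum(toy$n_species)
)

# Sort by total richness, as described for include_rate = TRUE,
# then append the summary row.
toy <- toy[order(-toy$n_species), ]
out <- rbind(toy, total)

out$endemic_rate <- out$n_endemic / out$n_species     # proportion, 0-1
out$endemic_pct  <- round(100 * out$endemic_rate, 1)  # percentage, 0-100
out
```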

Examples

# Summary of endemic species by order
pm_list_endemic()

# Include endemism rates
pm_list_endemic(include_rate = TRUE)

List taxonomic families in the Peruvian mammal backbone

Description

Summarises the number of genera, species and endemic species per family. Optionally filters the output to one or more taxonomic orders.

Usage

pm_list_families(order = NULL)

Arguments

order

Optional character vector specifying one or more taxonomic orders to include. If NULL (default), all orders are included. Order names are case-sensitive (e.g., "Rodentia", "Chiroptera").

Value

A tibble with one row per family, arranged by order and family name, with the following columns:

order: Taxonomic order
family: Family name
n_genera: Number of genera in the family
n_species: Number of species in the family
n_endemic: Number of species endemic to Peru in the family

Examples

# All families
pm_list_families()

# Only families within Rodentia
pm_list_families(order = "Rodentia")

# Multiple orders
pm_list_families(order = c("Rodentia", "Chiroptera"))

List genera in the Peruvian mammal backbone

Description

Summarises the number of species and endemic species per genus. Optionally restricts the output to one or more orders and/or families. Genera with missing values are excluded from the results.

Usage

pm_list_genera(order = NULL, family = NULL)

Arguments

order

Optional character vector with one or more taxonomic orders to keep. If NULL (default), no filter is applied by order. Invalid order names will generate a warning.

family

Optional character vector with one or more families to keep. If NULL (default), no filter is applied by family. Invalid family names will generate a warning.

Details

The function validates input parameters and warns if invalid order or family names are provided. It also warns if the filters result in an empty dataset.
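
As a rough illustration of this kind of input validation, the following base R sketch warns about unknown names and keeps only the valid ones. The helper check_values() and the vector known_orders are hypothetical and are not part of the package; the comparison is case-sensitive, as the documentation of pm_list_families() states for order names.

```r
# Toy subset of valid order names (assumption, for illustration only).
known_orders <- c("Rodentia", "Chiroptera", "Primates")

# Hypothetical helper: warn about values not found in `valid`
# and return only the recognised ones.
check_values <- function(x, valid, what) {
  bad <- setdiff(x, valid)
  if (length(bad) > 0) {
    warning("Unknown ", what, ": ", paste(bad, collapse = ", "), call. = FALSE)
  }
  intersect(x, valid)
}

# "rodentia" differs in case, so it is reported and dropped.
kept <- suppressWarnings(
  check_values(c("Rodentia", "rodentia"), known_orders, "order")
)
kept
```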

Value

A tibble with one row per genus and the following columns:

order – taxonomic order.
family – family name.
genus – genus name.
n_species – number of species in the genus.
n_endemic – number of endemic species in the genus.

Returns an empty tibble with the same structure if no records match the specified filters.

Examples

# All genera
pm_list_genera()

# Genera within Chiroptera (bats)
pm_list_genera(order = "Chiroptera")

# Multiple orders
pm_list_genera(order = c("Didelphimorphia", "Chiroptera"))

# Genera within a specific family
bat_genera <- pm_list_genera(family = "Phyllostomidae")

# Count total endemic species in a family
sum(bat_genera$n_endemic)

# Combination of filters
pm_list_genera(order = "Chiroptera", family = "Phyllostomidae")

List taxonomic orders in the Peruvian mammal backbone

Description

Summarises the number of families, genera, species and endemic species per order in peru_mammals.

Usage

pm_list_orders()

Value

A tibble with one row per order and the following columns:

order – taxonomic order.
n_families – number of families in the order.
n_genera – number of genera in the order.
n_species – number of species in the order.
n_endemic – number of endemic species in the order.

Examples


pm_list_orders()

Filter mammal species from the Peruvian backbone

Description

Convenience wrapper around peru_mammals to subset species by taxonomic group, endemism and/or ecoregion.

Usage

pm_species(
  order = NULL,
  family = NULL,
  genus = NULL,
  endemic = NULL,
  ecoregion = NULL
)

Arguments

order

Optional character vector with one or more taxonomic orders to keep. If NULL (default), no filter is applied by order.

family

Optional character vector with one or more families to keep. If NULL (default), no filter is applied by family.

genus

Optional character vector with one or more genera to keep. If NULL (default), no filter is applied by genus.

endemic

Optional logical. If TRUE, only endemic species are returned; if FALSE, only non-endemic species are returned; if NULL (default), no filter is applied by endemism.

ecoregion

Optional character vector with one or more ecoregion codes (e.g. "YUN", "SB", "COS"). If supplied, only species occurring in at least one of the given ecoregions are returned.

Value

A tibble with a subset of rows from peru_mammals.
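
The filter semantics described in the arguments (NULL means no filter; ecoregion keeps species found in at least one of the given codes) can be sketched in base R as follows. The tables, the placeholder species names and the function filter_species() are hypothetical; in particular, the one-row-per-species-and-ecoregion layout is an assumption and may differ from how the package stores occurrences.

```r
# Toy backbone: one row per species (placeholder names).
species <- data.frame(
  scientific_name = c("Species a", "Species b", "Species c"),
  order           = c("Rodentia", "Chiroptera", "Rodentia"),
  endemic         = c(TRUE, FALSE, FALSE)
)

# Toy occurrence table: one row per species-ecoregion pair (assumed layout).
occ <- data.frame(
  scientific_name = c("Species a", "Species a", "Species b", "Species c"),
  ecoregion_code  = c("YUN", "SB", "COS", "SB")
)

# Hypothetical filter: each non-NULL argument narrows the selection.
filter_species <- function(order = NULL, endemic = NULL, ecoregion = NULL) {
  keep <- rep(TRUE, nrow(species))
  if (!is.null(order))   keep <- keep & species$order %in% order
  if (!is.null(endemic)) keep <- keep & species$endemic == endemic
  if (!is.null(ecoregion)) {
    # A species qualifies if it occurs in ANY of the requested ecoregions.
    hits <- unique(occ$scientific_name[occ$ecoregion_code %in% ecoregion])
    keep <- keep & species$scientific_name %in% hits
  }
  species[keep, , drop = FALSE]
}

filter_species(ecoregion = c("YUN", "COS"))
```

With several codes the result is a union: a species recorded only in "COS" is still returned for c("YUN", "COS").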

Examples

# All species
pm_species()

# Only Rodentia
pm_species(order = "Rodentia")

# Endemic bats (Chiroptera)
pm_species(order = "Chiroptera", endemic = TRUE)

# Species present in Yungas (YUN) or Selva Baja (SB)
pm_species(ecoregion = c("YUN", "SB"))

Match Species Names Against Peru Mammals Database

Description

Matches given species names against the official list of mammal species of Peru (Pacheco et al. 2021). Uses a hierarchical matching strategy that includes direct matching, genus-level matching, and fuzzy matching to maximize successful matches while maintaining accuracy.

Peru Mammals Database:

575 mammal species
Binomial nomenclature only (no infraspecific taxa)
Includes 6 undescribed species ("sp." cases)
Fields: genus, species, scientific_name, common_name, family, order, endemic

Usage

validate_peru_mammals(splist, quiet = TRUE)

Arguments

splist

A character vector containing the species names to be matched. Names can be in any format (uppercase, lowercase, with underscores, etc.). Duplicate names are preserved in the output.

quiet

Logical, default TRUE. If FALSE, prints informative messages about the matching progress.

Details

Matching Strategy: The function implements a hierarchical matching pipeline:

Node 1 - Direct Match: Exact matching of binomial names (genus + species)
Node 2 - Genus Match: Exact matching at genus level
Node 3 - Fuzzy Genus: Fuzzy matching for genus with typos (max distance = 1)
Node 4 - Fuzzy Species: Fuzzy matching for species within matched genus
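
The four nodes can be approximated with a short base R sketch built on utils::adist(). This is not the package's implementation: the function match_one(), the three-row reference table, and the use of the same distance threshold for species as for genus are all assumptions made for illustration.

```r
# Tiny uppercase reference table; a stand-in for the real backbone.
ref <- data.frame(
  genus   = c("PANTHERA", "PUMA", "TREMARCTOS"),
  species = c("ONCA", "CONCOLOR", "ORNATUS")
)

match_one <- function(name, max_dist = 1) {
  parts    <- strsplit(toupper(trimws(name)), "\\s+")[[1]]
  no_match <- list(matched = NA_character_, genus_dist = NA, species_dist = NA)
  if (length(parts) == 0) return(no_match)
  genus   <- parts[1]
  epithet <- if (length(parts) >= 2) parts[2] else NA_character_

  # Node 1: exact match of the full binomial.
  if (!is.na(epithet) && any(ref$genus == genus & ref$species == epithet)) {
    return(list(matched = paste(genus, epithet), genus_dist = 0, species_dist = 0))
  }

  # Nodes 2 and 3: exact genus (distance 0) or fuzzy genus (<= max_dist).
  genera <- unique(ref$genus)
  g_dist <- drop(utils::adist(genus, genera))
  best_g <- which.min(g_dist)
  if (g_dist[best_g] > max_dist) return(no_match)
  m_genus    <- genera[best_g]
  genus_only <- list(matched = m_genus, genus_dist = g_dist[best_g],
                     species_dist = NA)
  if (is.na(epithet)) return(genus_only)

  # Node 4: fuzzy species, searched only within the matched genus.
  cand   <- ref$species[ref$genus == m_genus]
  s_dist <- drop(utils::adist(epithet, cand))
  best_s <- which.min(s_dist)
  if (s_dist[best_s] > max_dist) return(genus_only)
  list(matched = paste(m_genus, cand[best_s]),
       genus_dist = g_dist[best_g], species_dist = s_dist[best_s])
}

match_one("Pumma concolor")$matched     # fuzzy genus, exact species
match_one("Tremarctos ornatu")$matched  # exact genus, fuzzy species
```

Restricting the species search to the matched genus keeps the candidate set small and avoids pairing an epithet with an unrelated genus.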

Special Cases:

Handles "sp." cases: "Akodon sp. Ancash", "Oligoryzomys sp. B", etc.
Case-insensitive matching
Removes common qualifiers (CF., AFF.)
Standardizes spacing and formatting
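
A minimal sketch of this kind of name standardization, using only base R, might look as follows. The function standardize_name() is hypothetical and the exact rules applied by the package may differ; the regular expression for qualifiers is an assumption.

```r
# Hypothetical helper: normalise case, separators, qualifiers and spacing.
standardize_name <- function(x) {
  x <- toupper(x)                        # case-insensitive comparison
  x <- gsub("_", " ", x, fixed = TRUE)   # underscores act as separators
  # Drop the qualifiers CF. and AFF. when they stand alone as a token.
  x <- gsub("\\b(CF|AFF)\\.?(\\s+|$)", "", x, perl = TRUE)
  x <- gsub("\\s+", " ", x)              # collapse repeated whitespace
  trimws(x)
}

cleaned <- standardize_name(c(
  "panthera_onca",
  "  Puma   cf. concolor ",
  "Tremarctos aff. ornatus",
  "Akodon sp. Ancash"
))
cleaned
```

The "sp." token is left untouched, so undescribed species such as "Akodon sp. Ancash" keep their full label.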

Rank System:

Rank 1: Genus level only (e.g., "Panthera")
Rank 2: Binomial (genus + species, e.g., "Panthera onca")

Ambiguous Matches: When multiple candidates have identical fuzzy match scores, a warning is issued and the first match is selected. Use get_ambiguous_matches() to examine these cases.
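
The tie situation can be reproduced with a small base R sketch. The function pick_with_tie_warning() and the placeholder strings are hypothetical; the sketch only shows how identical edit distances lead to a warning and a first-candidate choice, not how the package records ambiguous cases.

```r
# Hypothetical helper: choose the closest candidate and warn on ties.
pick_with_tie_warning <- function(query, candidates) {
  d    <- drop(utils::adist(query, candidates))
  best <- which(d == min(d))
  if (length(best) > 1) {
    warning("Ambiguous match for '", query, "': ",
            paste(candidates[best], collapse = ", "),
            "; using the first.", call. = FALSE)
  }
  candidates[best[1]]
}

# "ABCI" is one substitution away from both "ABCA" and "ABCE".
chosen <- suppressWarnings(
  pick_with_tie_warning("ABCI", c("ABCA", "ABCE", "XYZ"))
)
chosen
```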

Input Requirements:

Species names must be provided as binomials (Genus species) WITHOUT:

Author information: "Panthera onca Linnaeus"
Infraspecific taxa: "Panthera onca onca"
Parenthetical authors: "Panthera onca (Linnaeus, 1758)"

Valid formats:

Standard binomial: "Panthera onca"
Undescribed species: "Akodon sp. Ancash"
Case-insensitive: "PANTHERA ONCA" or "panthera onca"

Names with three or more elements, other than the "sp." cases listed above, are automatically rejected with a warning.
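
These input rules can be sketched as a simple token count in base R. The function classify_input() and its labels are hypothetical; treating a second token equal to "sp." as the only accepted name with three or more elements is an assumption based on the valid formats listed above.

```r
# Hypothetical classifier based on the number of whitespace-separated tokens.
classify_input <- function(x) {
  tokens <- strsplit(trimws(x), "\\s+")
  vapply(tokens, function(tk) {
    n <- length(tk)
    if (n == 1) {
      "rank 1 (genus)"
    } else if (n == 2) {
      "rank 2 (binomial)"
    } else if (n >= 3 && toupper(tk[2]) == "SP.") {
      "rank 2 (sp. case)"
    } else {
      "rejected"
    }
  }, character(1))
}

labels <- classify_input(c(
  "Panthera",
  "Panthera onca",
  "Akodon sp. Ancash",
  "Panthera onca onca",
  "Panthera onca (Linnaeus, 1758)"
))
labels
```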

Value

A tibble with the following columns:

sorter: Integer. Original position in input vector
Orig.Name: Character. Original input name (standardized)
Matched.Name: Character. Matched name from database or "—"
Match.Level: Character. Quality of match ("Exact rank", "No match", etc.)
matched: Logical. Whether a match was found
Rank: Integer. Input taxonomic rank (1 or 2)
Matched.Rank: Integer. Matched taxonomic rank (1 or 2)
Comp.Rank: Logical. Whether ranks match exactly
valid_rank: Logical. Whether match is valid at correct rank
Orig.Genus: Character. Input genus (uppercase)
Orig.Species: Character. Input species (uppercase)
Author: Character. Taxonomic authority if provided
Matched.Genus: Character. Matched genus (uppercase)
Matched.Species: Character. Matched species (uppercase)
genus_dist: Integer. Edit distance for genus (0=exact, >0=fuzzy, NA=no match)
species_dist: Integer. Edit distance for species (0=exact, >0=fuzzy, NA=no match or genus-only)
scientific_name: Character. Scientific name from peru_mammals
common_name: Character. Common name in Spanish
family: Character. Family
order: Character. Order
endemic: Logical. Endemic to Peru?

Attributes: The output includes metadata accessible via attr():

target_database: "peru_mammals"
matching_date: Date of matching
n_input: Number of input names
n_matched: Number of successful matches
match_rate: Percentage of successful matches
n_fuzzy_genus: Number of fuzzy genus matches
n_fuzzy_species: Number of fuzzy species matches
ambiguous_genera: Ambiguous genus matches (if any)
ambiguous_species: Ambiguous species matches (if any)
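
How such metadata can travel with a result table is shown in the base R sketch below. The toy data frame stands in for the real output, and the match_rate formula (100 * n_matched / n_input, rounded to two decimals) is an assumption rather than the package's documented computation.

```r
# Toy result table standing in for the output of a matching run.
res <- data.frame(
  Orig.Name = c("PANTHERA ONCA", "PUMMA CONCOLOR", "FELIS CATUS"),
  matched   = c(TRUE, TRUE, FALSE)
)

# Attach summary metadata as attributes of the table.
attr(res, "target_database") <- "peru_mammals"
attr(res, "matching_date")   <- Sys.Date()
attr(res, "n_input")         <- nrow(res)
attr(res, "n_matched")       <- sum(res$matched)
attr(res, "match_rate")      <- round(100 * sum(res$matched) / nrow(res), 2)

# Read individual values back with attr().
attr(res, "n_matched")
attr(res, "match_rate")
```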

Examples


# Basic usage
species_list <- c("Panthera onca", "Tremarctos ornatus", "Puma concolor")
results <- validate_peru_mammals(species_list)

# Check results
table(results$matched)
table(results$Match.Level)

# View matched species
results |>
  dplyr::filter(matched) |>
  dplyr::select(Orig.Name, Matched.Name, common_name, endemic)

# With typos (fuzzy matching)
typos <- c("Pumma concolor", "Tremarctos ornatu")  # Spelling errors
results_fuzzy <- validate_peru_mammals(typos, quiet = FALSE)

# Check for ambiguous matches
get_ambiguous_matches(results_fuzzy, type = "genus")

# Access metadata
attr(results, "match_rate")
attr(results, "n_fuzzy_genus")

# With special "sp." cases
sp_cases <- c("Akodon sp. Ancash", "Oligoryzomys sp. B")
results_sp <- validate_peru_mammals(sp_cases)
# Should match exactly