| Title: | Taxonomic Backbone and Name Validation Tools for Mammals of Peru |
| Version: | 0.0.0.1 |
| Maintainer: | Paul E. Santos Andrade <paulefrens@gmail.com> |
| Description: | Provides a curated taxonomic backbone of mammal species from Peru based on Pacheco et al. (2021) "Lista actualizada de la diversidad de los mamíferos del Perú y una propuesta para su actualización" <doi:10.15381/rpb.v28i4.21019>. The package includes standardized species data, occurrence by ecoregions, endemism status, and tools for validating and matching scientific names through exact and fuzzy procedures. It is designed as a lightweight and dependable reference for ecological, environmental, biogeographic, and conservation workflows that require reliable species information for Peruvian mammals. |
| License: | MIT + file LICENSE |
| URL: | https://github.com/PaulESantos/perumammals, https://paulesantos.github.io/perumammals/ |
| BugReports: | https://github.com/PaulESantos/perumammals/issues |
| Depends: | R (≥ 4.1) |
| Encoding: | UTF-8 |
| LazyData: | true |
| Config/testthat/edition: | 3 |
| RoxygenNote: | 7.3.3 |
| Imports: | assertthat, cli, dplyr, fuzzyjoin, progress, purrr, readr, stringr, tibble, memoise |
| Suggests: | ggplot2, knitr, rmarkdown, testthat (≥ 3.0.0), tidyr, ggtext |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2025-12-20 03:28:39 UTC; PC |
| Author: | Paul E. Santos Andrade |
| Repository: | CRAN |
| Date/Publication: | 2026-01-06 11:30:02 UTC |
Attach Metadata to Results
Description
Attach Metadata to Results
Usage
.attach_metadata_peru(tbl, n_input, n_matched, n_fuzzy_genus, n_fuzzy_species)
Check for binomial names in species list
Description
Internal function to verify that species names are at the binomial level (genus + species) and to identify any names at genus level or NA values. The Peru mammals database contains only binomial names (including "sp." cases).
Usage
.check_binomial(splist_class, splist)
Arguments
splist_class |
Classified species matrix from .splist_classify |
splist |
Original species list (character vector) |
Value
Integer vector with positions of problematic names
Classification algorithm for a single name
Description
Internal algorithm to parse a single species name into its components. Handles regular binomials and special cases like "Genus sp. identifier" (e.g., "Akodon sp. Ancash").
Usage
.classify_algo(x_split_i)
Arguments
x_split_i |
Character vector with split name parts |
Value
Character vector with classified components (genus, species, author)
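The parsing described above can be sketched in base R. This is a minimal illustration, not the package's implementation: classify_name_sketch() is a hypothetical name, and the rule that everything after "sp." belongs to the epithet is an assumption based on the "Akodon sp. Ancash" example.

```r
# Hypothetical helper, not the package's .classify_algo(): splits one name into
# genus, species and author, keeping "sp. <identifier>" together as the epithet.
classify_name_sketch <- function(name) {
  parts <- strsplit(trimws(name), "\\s+")[[1]]
  genus <- parts[1]
  species <- NA_character_
  author <- NA_character_
  if (length(parts) >= 2) {
    if (tolower(parts[2]) %in% c("sp", "sp.") && length(parts) >= 3) {
      # "Akodon sp. Ancash": everything after the genus forms the epithet
      species <- paste(parts[-1], collapse = " ")
    } else {
      species <- parts[2]
      # Anything after the epithet is treated as authorship
      if (length(parts) > 2) author <- paste(parts[-(1:2)], collapse = " ")
    }
  }
  c(genus = genus, species = species, author = author)
}

classify_name_sketch("Akodon sp. Ancash")
classify_name_sketch("Panthera onca (Linnaeus, 1758)")
```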
Classify Input Species Names
Description
Classify Input Species Names
Usage
.classify_inputs_peru(splist)
Combine Matched Nodes
Description
Combine Matched Nodes
Usage
.combine_matched_nodes_peru(pipe)
Combine Unmatched Nodes
Description
Combine Unmatched Nodes
Usage
.combine_unmatched_nodes_peru(pipe, invalid_df)
Compute Matched Rank for Peru Mammals
Description
Compute Matched Rank for Peru Mammals
Usage
.compute_matched_rank_peru(df)
Consolidate Ambiguous Match Attributes
Description
Consolidate Ambiguous Match Attributes
Usage
.consolidate_ambiguous_attrs_peru(output, pipe)
Detect trinomial names (3+ taxonomic elements)
Description
Detect trinomial names (3+ taxonomic elements)
Usage
.detect_trinomial(orig_names)
Create Empty Output Template
Description
Create Empty Output Template
Usage
.empty_output_peru(splist_class)
Final Validation of Results
Description
Final Validation of Results
Usage
.final_assertions_peru(splist_class, output)
Finalize Output Format
Description
Finalize Output Format
Usage
.finalize_output_peru(df)
Format Matched Names for Display
Description
Format Matched Names for Display
Usage
.format_matched_names_peru(df)
Get mammals species by genus from peru_mammals
Description
Internal function to filter species by genus from peru_mammals data frame. This function is memoised for performance.
Usage
.get_mammals_genus(genus_sub, target_df = NULL)
Arguments
genus_sub |
Character vector of genus names (case-insensitive) |
target_df |
Data frame (peru_mammals) with genus and species columns |
Value
Data frame filtered by genus
Initialize Matching Columns
Description
Initialize Matching Columns
Usage
.init_matching_columns_peru(df)
Invalidate trinomial matches in validation results
Description
Invalidate trinomial matches in validation results
Usage
.invalidate_trinomials(results)
Join Additional Database Information
Description
Join Additional Database Information
Usage
.join_database_info_peru(df, target_df)
Load Peru Mammals Database
Description
Load Peru Mammals Database
Usage
.load_target_peru(quiet)
Map with optional progress bar
Description
Internal wrapper for purrr::map_dfr with optional progress tracking. Progress bars are only shown in interactive sessions.
Usage
.map_dfr_progress(.x, .f, ..., .id = NULL, .progress = interactive())
Arguments
.x |
A list or vector to iterate over |
.f |
A function to apply |
... |
Additional arguments passed to .f |
.id |
Column name for row identification |
.progress |
Logical. Show progress bar? Default is interactive() |
Value
Data frame with combined results
Standardize species names for matching with Peru mammals database
Description
Internal function to standardize species names before matching against the peru_mammals database. Handles common formatting issues and removes hybrid indicators. Note that peru_mammals does not include infraspecific taxa.
Usage
.names_standardize(splist)
Arguments
splist |
Character vector of species names to standardize |
Value
Character vector of standardized names
.onAttach hook
Description
Hook function that runs when the package is attached via library().
It displays the package version and information about the taxonomic backbone.
Usage
.onAttach(libname, pkgname)
Arguments
libname |
A character string indicating the path to the library. |
pkgname |
A character string with the name of the package. |
.onLoad hook
Description
Hook function that runs when the package is loaded. It sets default options for the package.
Usage
.onLoad(libname, pkgname)
Arguments
libname |
A character string with the name of the library directory. |
pkgname |
A character string with the name of the package. |
Matching Pipeline - Hierarchical Strategy
Description
Implements hierarchical matching for peru_mammals:
Node 1: Direct exact match (genus + species)
Node 2: Genus exact match
Node 3: Genus fuzzy match
Node 4: Species fuzzy match within matched genus
Usage
.pipeline_nodes_peru(df, target_df, quiet)
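The four nodes can be illustrated for a single name with base R. This is a sketch under stated assumptions: match_hierarchical_sketch() and the three-row backbone are hypothetical, utils::adist() stands in for whatever distance routine the package uses, and the distance limit of 1 for species is an assumption (the manual states that limit only for genera).

```r
# Toy backbone for illustration; the package itself uses peru_mammals.
backbone <- data.frame(
  genus   = c("Panthera", "Puma", "Tremarctos"),
  species = c("onca", "concolor", "ornatus"),
  stringsAsFactors = FALSE
)

# Hypothetical walk through the four nodes for one genus + species pair.
match_hierarchical_sketch <- function(genus, species, backbone, max_dist = 1) {
  # Node 1: direct exact match on genus + species
  if (any(backbone$genus == genus & backbone$species == species)) {
    return(c(genus = genus, species = species, node = "direct"))
  }
  genera <- unique(backbone$genus)
  # Node 2: exact genus match; Node 3: fuzzy genus match as a fallback
  matched_genus <- if (genus %in% genera) genus else {
    d <- utils::adist(genus, genera)[1, ]
    if (min(d) <= max_dist) genera[which.min(d)] else NA_character_
  }
  if (is.na(matched_genus)) {
    return(c(genus = NA, species = NA, node = "unmatched"))
  }
  # Node 4: fuzzy species match restricted to the matched genus
  candidates <- backbone$species[backbone$genus == matched_genus]
  d <- utils::adist(species, candidates)[1, ]
  if (min(d) <= max_dist) {
    return(c(genus = matched_genus, species = candidates[which.min(d)],
             node = "fuzzy"))
  }
  c(genus = matched_genus, species = NA, node = "genus_only")
}

match_hierarchical_sketch("Pantera", "onca", backbone)
match_hierarchical_sketch("Tremarctos", "orrnatus", backbone)
```

Restricting the species search to the matched genus keeps the candidate set small and avoids matching an epithet that belongs to an unrelated genus.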
Classify species names into taxonomic components
Description
Internal wrapper function to classify multiple species names into their taxonomic components (genus, species, author). The Peru mammals database does not include infraspecific taxa, but this function handles "sp." notations for undescribed species (e.g., "Akodon sp. Ancash").
Automatic normalization: Empty strings ("", " ", etc.) are automatically converted to NA before processing, as they represent missing values and cannot match any names in the database.
Usage
.splist_classify(x)
Arguments
x |
Character vector of species names |
Value
Matrix with classified name components
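The automatic normalization of empty strings mentioned in the Description might look like the following base R sketch. blank_to_na() is a hypothetical name used only for illustration.

```r
# Hypothetical helper: blank or whitespace-only entries become NA,
# because they cannot match any name in the database.
blank_to_na <- function(x) {
  x <- as.character(x)
  x[!is.na(x) & trimws(x) == ""] <- NA_character_
  x
}

blank_to_na(c("Panthera onca", "", "   ", NA))
```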
Split Valid and Invalid Names
Description
Split Valid and Invalid Names
Usage
.split_valid_invalid_peru(splist_class)
Convert to sentence case (first letter uppercase, rest lowercase)
Description
Internal utility to convert text to sentence case for matching with peru_mammals database format.
Usage
.str_to_simple_cap(text)
Arguments
text |
Character vector |
Value
Character vector in sentence case
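A sentence-case conversion of this kind can be written in a few lines of base R. The helper below is a hypothetical sketch, not the package's code; it keeps NA values as NA.

```r
# Hypothetical helper: first letter uppercase, rest lowercase, NA preserved.
to_sentence_case <- function(text) {
  ifelse(
    is.na(text),
    NA_character_,
    paste0(toupper(substr(text, 1, 1)), tolower(substring(text, 2)))
  )
}

to_sentence_case(c("PANTHERA ONCA", "puma concolor", NA))
```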
Transform and structure classified names
Description
Internal function to transform the classification matrix into a structured data frame. Simplified for peru_mammals which only has binomial names (and some "sp." cases) without infraspecific categories.
Important: This function distinguishes between:
Original NAs from the input (expected missing values)
Malformed names that failed rank assignment (problematic inputs)
Only the latter trigger warnings to avoid false positives.
Usage
.transform_split_classify(df)
Arguments
df |
Data frame or matrix from .splist_classify |
Value
Data frame with transformed names and rank
Validate Input Parameters
Description
Validate Input Parameters
Usage
.validate_inputs_peru(splist, quiet)
Validate Target Database Schema
Description
Validate Target Database Schema
Usage
.validate_target_schema_peru(target_df)
Check if taxonomic backbone needs updating
Description
Checks whether a newer version of the Pacheco et al. mammal checklist might be available based on the publication year.
Usage
check_backbone_update(backbone_year)
Arguments
backbone_year |
Numeric or character year of the current backbone. |
Value
A list with components:
- update_available: logical indicating whether an update may be available.
- message: character string with an informational message.
Direct Match Species Names Against Peru Mammals Database
Description
Performs direct matching of species names against the peru_mammals database. Matches binomial names (genus + species) and handles special "sp." cases (e.g., "Akodon sp. Ancash"). The Peru mammals database does not include infraspecific taxa.
Usage
direct_match(df, target_df = NULL)
Arguments
df |
A data frame or tibble containing the species data to be matched. Must include columns: Orig.Genus, Orig.Species, Rank |
target_df |
A data frame representing the peru_mammals database. Must include columns: genus, species |
Details
This function only matches Rank 2 (binomial) names since peru_mammals does not include infraspecific taxa. It handles:
Regular binomials: "Panthera onca"
Special "sp." cases: "Akodon sp. Ancash", "Oligoryzomys sp. B"
Names at Rank 1 (genus only) are not matched by this function; use
genus_match() instead.
Value
A tibble with an additional logical column direct_match indicating whether
the name was successfully matched (TRUE) or not (FALSE), plus columns
Matched.Genus and Matched.Species for matched records.
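As a rough illustration of exact matching restricted to Rank 2 names, the sketch below builds a genus + species key on both sides and compares them. The toy data frames are hypothetical, and comparing in uppercase is an assumption about how case differences are handled.

```r
# Toy input in the column layout described under Arguments.
input <- data.frame(
  Orig.Genus   = c("PANTHERA", "FELIS", "AKODON"),
  Orig.Species = c("ONCA", "CATUS", NA),
  Rank         = c(2L, 2L, 1L),
  stringsAsFactors = FALSE
)
# Toy target standing in for peru_mammals.
target <- data.frame(
  genus   = c("Panthera", "Puma"),
  species = c("onca", "concolor"),
  stringsAsFactors = FALSE
)

# Compare case-insensitively on a combined genus + species key (assumption).
key_input  <- toupper(paste(input$Orig.Genus, input$Orig.Species))
key_target <- toupper(paste(target$genus, target$species))

# Only Rank 2 (binomial) names are eligible for a direct match.
input$direct_match    <- input$Rank == 2L & key_input %in% key_target
input$Matched.Genus   <- ifelse(input$direct_match, input$Orig.Genus, NA_character_)
input$Matched.Species <- ifelse(input$direct_match, input$Orig.Species, NA_character_)
input
```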
Quick check: Is species found in Peru?
Description
Simplified boolean check for species presence in Peru mammals database. Useful for filtering and logical operations.
Usage
found_in_peru(splist, exact_only = FALSE)
Arguments
splist |
Character vector of species names |
exact_only |
Logical. If TRUE, only exact matches return TRUE (default: FALSE) |
Value
Logical vector (TRUE = found, FALSE = not found)
Examples
species <- c("Panthera onca", "Tremarctos orrnatus",
             "Tremarctos orrnatos", "Felis catus")
# Check presence (includes fuzzy matches)
found_in_peru(species)
# Add the result as a logical column
tibble::tibble(splist = species) |>
  dplyr::mutate(found = found_in_peru(splist))
Fuzzy Match Genus Name Against Peru Mammals Database
Description
Performs fuzzy matching of genus names against the peru_mammals database using string distance (Levenshtein) to account for slight spelling variations. Maximum distance is set to 1 character difference.
This implementation uses a two-step approach to avoid warnings when no matches are found:
Perform stringdist_left_join to get all candidates
Split into valid (finite distance) and invalid (NA distance)
Process only valid matches to find best candidates
Usage
fuzzy_match_genus(df, target_df = NULL)
Arguments
df |
A data frame containing the genus names to be matched. Must include column: Orig.Genus |
target_df |
A data frame representing peru_mammals database. Must include column: genus |
Details
If multiple genera match with the same string distance (ambiguous matches),
a warning is issued and the first match is automatically selected. To
examine ambiguous matches, use get_ambiguous_matches(result, type = "genus").
Ambiguous match information is stored as an attribute and includes:
Original genus
All matched genera with tied distances
Family information from peru_mammals
Number of species per genus
Value
A tibble with three additional columns:
- fuzzy_match_genus: Logical indicating if genus was matched
- fuzzy_genus_dist: Numeric distance for each match (lower = better)
- Matched.Genus: The matched genus name
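The tie handling described in Details can be sketched with base R's utils::adist(), which computes Levenshtein distance. fuzzy_genus_sketch() is a hypothetical helper; the package itself is described as using stringdist_left_join, so this only mirrors the logic (maximum distance 1, warn on ties, keep the first candidate).

```r
# Hypothetical helper: best genus within max_dist, with tie detection.
fuzzy_genus_sketch <- function(orig_genus, genera, max_dist = 1) {
  d <- utils::adist(orig_genus, genera)[1, ]
  best <- min(d)
  if (best > max_dist) {
    return(list(matched = NA_character_, dist = NA_real_, ties = character(0)))
  }
  ties <- genera[d == best]
  if (length(ties) > 1) {
    warning("Ambiguous genus match for '", orig_genus, "': ",
            paste(ties, collapse = ", "), ". Using the first.")
  }
  list(matched = ties[1], dist = best, ties = ties)
}

genera <- c("Puma", "Lama", "Panthera")
fuzzy_genus_sketch("Pantera", genera)$matched
# "Pama" is one edit away from both "Puma" and "Lama"; the warning is
# suppressed here only to keep the demo quiet.
suppressWarnings(fuzzy_genus_sketch("Pama", genera))$ties
```

Returning the tied candidates alongside the chosen match is what makes a later review step, such as the one get_ambiguous_matches() provides, possible.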
Fuzzy Match Species within Genus in Peru Mammals Database
Description
Performs fuzzy matching of species names within a matched genus using string distance to account for spelling variations. The Peru mammals database does not include infraspecific taxa.
Usage
fuzzy_match_species_within_genus(df, target_df = NULL)
Arguments
df |
A data frame containing species data to be matched. Must include columns: Orig.Species, Matched.Genus |
target_df |
A data frame representing peru_mammals database. Must include columns: genus, species |
Details
This function processes each matched genus separately for efficiency.
If multiple species match with the same distance, a warning is issued
and the first match is selected. Use get_ambiguous_matches(result, type = "species")
to examine ambiguous cases.
Special handling for "sp." cases:
"Akodon sp. Ancash" is treated as a complete specific epithet
Fuzzy matching will work on the entire "SP. ANCASH" string
Value
A tibble with additional columns:
- fuzzy_match_species_within_genus: Logical indicating match success
- fuzzy_species_dist: Numeric distance for each match
- Matched.Species: The matched species name
Helper: Fuzzy Match Species within Genus
Description
Helper function that performs fuzzy matching for a single genus.
This implementation uses a two-step approach to avoid issues with empty groups when filtering NAs:
Perform stringdist_left_join to get all candidates
Split into matched (finite distance) and unmatched (NA distance)
Process matched candidates to find best matches
Recombine for final output
Usage
fuzzy_match_species_within_genus_helper(df, target_df)
Arguments
df |
Data frame for a single matched genus |
target_df |
Peru mammals database |
Value
Data frame with fuzzy match results
Match Genus Names Against Peru Mammals Database
Description
Performs direct matching of genus names against the unique genera listed in the peru_mammals database. Useful for Rank 1 (genus-only) names.
Usage
genus_match(df, target_df = NULL)
Arguments
df |
A data frame or tibble containing the genus names to be matched. Must include column: Orig.Genus |
target_df |
A data frame representing the peru_mammals database. Must include column: genus |
Details
This function is typically used for names submitted at the genus level (Rank 1). When a genus is matched, all species of that genus in peru_mammals can be retrieved for further processing (e.g., suggesting possible species to the user).
Value
A tibble with an additional logical column genus_match indicating whether
the genus was successfully matched (TRUE) or not (FALSE), plus column
Matched.Genus for matched records.
Retrieve Ambiguous Match Information for Peru Mammals
Description
Extracts information about ambiguous matches (multiple candidates with tied distances) from matching results. Useful for quality control and manual curation. Adapted for peru_mammals (genus and species only).
Usage
get_ambiguous_matches(
match_result,
type = c("genus", "species", "all"),
save_to_file = FALSE,
output_dir = tempdir()
)
Arguments
match_result |
A tibble returned by matching functions. |
type |
Character. Type of ambiguous matches to retrieve: one of "genus", "species", or "all". |
save_to_file |
Logical. If TRUE, saves results to CSV. Default is FALSE (CRAN compliant). |
output_dir |
Character. Directory to save file if save_to_file = TRUE. Defaults to tempdir(). |
Details
During fuzzy matching, multiple candidates may have identical string distances. The matching algorithm automatically selects the first candidate, but this function allows you to review all alternatives for quality control.
Value
A tibble with ambiguous match details, or NULL if none exist. Includes original names, matched names, distances, and database metadata.
Get taxonomic and common name information for Peru mammals
Description
Returns taxonomic classification and common names for species validated against the Peru mammals database.
Usage
get_common_names_peru(splist, return_details = FALSE)
Arguments
splist |
Character vector of species names |
return_details |
Logical. If TRUE, includes full taxonomic information (default: FALSE) |
Value
If return_details = FALSE: Character vector with common names.
If return_details = TRUE: Tibble with taxonomic and common name information.
Examples
species <- c("Panthera onca", "Tremarctos ornatus",
"Puma concolor", "Myotis bakeri")
# Get common names
# Vector
get_common_names_peru(species)
# tibble
tibble::tibble(splist = species) |>
  dplyr::mutate(common_name = get_common_names_peru(splist))
# Get full taxonomic information
taxonomy <- get_common_names_peru(species, return_details = TRUE)
taxonomy
Get All Species for Matched Genera from Peru Mammals
Description
Helper function to retrieve all species belonging to matched genera from the peru_mammals database. Useful for suggesting possible species when only genus is provided.
Usage
get_species_for_genera(matched_genera, target_df = NULL)
Arguments
matched_genera |
Character vector of matched genus names (uppercase) |
target_df |
A data frame representing the peru_mammals database |
Value
A data frame with genus and species columns for all species in the matched genera.
Check if species are endemic to Peru
Description
Simplified wrapper specifically for checking endemism status of mammals in Peru. Only evaluates species that are confirmed to occur in Peru.
Usage
is_endemic_peru(splist, return_logical = FALSE, filter_exact = FALSE)
Arguments
splist |
Character vector of species names |
return_logical |
Logical. If TRUE, returns logical vector (TRUE/FALSE/NA). If FALSE, returns descriptive character vector (default: FALSE) |
filter_exact |
Logical. If TRUE, only considers exact matches (default: FALSE) |
Value
If return_logical = FALSE: Character vector with endemism status.
If return_logical = TRUE: Logical vector (TRUE = endemic, FALSE = not endemic, NA = not found or endemism unknown).
Examples
species <- c("Panthera onca",
"Atelocynus microtis",
"Felis catus",
"Myotis bakeri")
is_endemic_peru(species)
# Descriptive output
tibble::tibble(splist = species) |>
dplyr::mutate(endemic = is_endemic_peru(splist))
Check if species are Peru mammals
Description
Main wrapper function that validates species names against the Peru mammals database with various output options for match quality, endemism status, and detailed information.
Usage
is_peru_mammal(
splist,
return_details = FALSE,
match_type = "status",
filter_exact = FALSE
)
Arguments
splist |
Character vector of species names to check |
return_details |
Logical. If TRUE, returns full validation tibble. If FALSE, returns simplified status vector (default: FALSE) |
match_type |
Character. Type of information to return when return_details = FALSE. The values used in the Examples are "status" (the default), "match_quality", and "endemic". |
filter_exact |
Logical. If TRUE, only returns exact matches (genus_dist = 0 AND species_dist = 0). Fuzzy matches are treated as "Not found" (default: FALSE) |
Details
This function wraps validate_peru_mammals() to provide flexible output
formats for different use cases:
Basic presence/absence checking
Match quality assessment (exact vs fuzzy)
Endemism status queries
The function handles taxonomic matching with fuzzy string matching to accommodate minor spelling variations while maintaining data quality.
When filter_exact = TRUE, only matches with zero edit distance in both genus and species names are considered valid matches. All fields related to fuzzy matches are set to NA or "—" to maintain consistency.
Value
If return_details = FALSE: Character vector with requested information.
If return_details = TRUE: Tibble with complete validation information.
Examples
species <- c(
"Panthera onca", # Exact match
"Pantera onca", # Fuzzy match (genus misspelled)
"Tremarctos orrnatus", # Fuzzy match (species misspelled)
"Felis domesticus", # Not in Peru
"Myotis bakeri"
)
# Check if species are found (includes fuzzy matches)
is_peru_mammal(species)
# Check with exact matches only
is_peru_mammal(species, filter_exact = TRUE)
# Check match quality
is_peru_mammal(species, match_type = "match_quality")
# Check endemism
is_peru_mammal(species, match_type = "endemic")
# Get detailed information
is_peru_mammal(species, return_details = TRUE)
# Get detailed information with exact matches only
is_peru_mammal(species, return_details = TRUE, filter_exact = TRUE)
Get match quality for Peru mammal names
Description
Returns the quality of taxonomic name matching (exact vs fuzzy) for species validated against the Peru mammals database.
Usage
match_quality_peru(splist, return_details = FALSE)
Arguments
splist |
Character vector of species names |
return_details |
Logical. If TRUE, includes distance metrics and matching information (default: FALSE) |
Details
Match quality categories:
"Exact": Perfect match with no spelling differences (genus_dist = 0, species_dist = 0)
"Fuzzy": Match found with minor spelling variations (genus_dist > 0 or species_dist > 0)
"Not found": No match in database
The function uses string distance metrics to quantify matching quality:
genus_dist: Edit distance for genus name
species_dist: Edit distance for species epithet
Value
If return_details = FALSE: Character vector with match quality.
If return_details = TRUE: Tibble with detailed matching information.
Examples
species <- c(
"Panthera onca", # Exact
"Tremarctos orrnatus", # Fuzzy (spelling error)
"Felis domesticus", # Not found
"Myotis bakeri"
)
# Simple quality check
match_quality_peru(species)
# Detailed information with edit distances
details <- match_quality_peru(species, return_details = TRUE)
details
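The three match quality categories listed under Details follow directly from the two edit distances. The sketch below is a hypothetical base R illustration; it assumes that unmatched names carry NA distances, which the manual does not state explicitly.

```r
# Hypothetical helper: derive the category from genus and species distances.
classify_quality_sketch <- function(genus_dist, species_dist) {
  ifelse(
    is.na(genus_dist) | is.na(species_dist),
    "Not found",                                   # assumption: NA = no match
    ifelse(genus_dist == 0 & species_dist == 0, "Exact", "Fuzzy")
  )
}

classify_quality_sketch(genus_dist = c(0, 1, 0, NA),
                        species_dist = c(0, 0, 1, NA))
```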
Mammal species of Peru based on Pacheco et al. (2021)
Description
A backbone of the terrestrial and marine mammal species known for Peru, compiled from Pacheco et al. (2021) "Lista actualizada de la diversidad de los mamíferos del Perú y una propuesta para su actualización".
Usage
data("peru_mammals")
Format
A tibble with 573 rows and 12 variables:
- pm_id
Character. Internal stable identifier for the species, combining the original numeric id and an abbreviation of the genus. Intended for internal linking between tables.
- order
Character. Taxonomic order (e.g. Didelphimorphia, Rodentia, Chiroptera).
- family
Character. Taxonomic family.
- genus
Character. Genus name.
- species
Character. Specific epithet.
- scientific_name
Character. Binomial scientific name (Genus species), without authorship. This is the main field used for name validation.
- scientific_name_full
Character. Full scientific name including authorship and year, as provided in the original annex.
- author
Character. Authorship and year of the species name.
- common_name
Character. Common name in Spanish, when available.
- endemic
Logical. TRUE if the species is considered endemic to Peru in Pacheco et al. (2021), FALSE otherwise.
- ecoregions
Character. Comma-separated codes of Peruvian ecoregions where the species occurs, using the abbreviations defined by Pacheco et al. (2021) (e.g. "YUN, SB, SP"). See peru_mammals_ecoregions_meta for code definitions.
- reference
Character. Bibliographic notes or specific references supporting the presence or taxonomy of the species.
Details
Each row corresponds to a single species as listed in the original annex of the paper. This dataset is the main taxonomic backbone used by the perumammals package.
Source
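Pacheco Torres, V. R., Diaz, S., Graham Angeles, L. A., Flores-Quispe, M., Calizaya-Mamani, G., Ruelas, D., & Sánchez-Vendizú, P. (2021). Lista actualizada de la diversidad de los mamíferos del Perú y una propuesta para su actualización. Revista Peruana De Biología, 28(4), e21019. doi:10.15381/rpb.v28i4.21019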
Summary information on the perumammals taxonomic backbone
Description
A one-row tibble with metadata about the taxonomic backbone used in perumammals, including its bibliographic source, year, number of species and the date when the internal data objects were created.
Usage
data("peru_mammals_backbone")
Format
A tibble with 1 row and 4 variables:
- source
Character. Short bibliographic reference to the backbone source (Pacheco et al. 2021).
- source_year
Integer. Publication year of the backbone source (2021).
- n_species
Integer. Number of species included in the backbone (as rows in peru_mammals).
- created_at
Date. Date when the backbone data objects were generated (in the package build process).
Details
This object is intended for internal bookkeeping and for functions that report the origin and version of the backbone.
See Also
Mammal species by Peruvian ecoregion
Description
A long-format table linking each mammal species to the Peruvian ecoregions where it occurs, based on Pacheco et al. (2021).
Usage
data("peru_mammals_ecoregions")
Format
A tibble with one row per species–ecoregion combination and 3 variables:
- pm_id
Character. Internal species identifier, matching peru_mammals.
- scientific_name
Character. Binomial scientific name (Genus species).
- ecoregion_code
Character. Abbreviation of the ecoregion where the species occurs (e.g. "YUN", "SB", "COS"). See peru_mammals_ecoregions_meta for code definitions.
Details
Each row corresponds to a single combination of species and ecoregion.
This dataset is derived from the ecoregions field of
peru_mammals.
Source
Pacheco et al. (2021).
See Also
peru_mammals,
peru_mammals_ecoregions_meta
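Deriving a long-format table from a comma-separated ecoregions field can be sketched in base R as follows. The rows, identifiers and species names are placeholders invented for illustration, not records from peru_mammals.

```r
# Placeholder rows mimicking the pm_id, scientific_name and ecoregions columns.
species <- data.frame(
  pm_id           = c("id_1", "id_2"),
  scientific_name = c("Genus alpha", "Genus beta"),
  ecoregions      = c("YUN, SB, SP", "COS, YUN"),
  stringsAsFactors = FALSE
)

# Split each field on commas, then repeat the species columns once per code.
codes <- strsplit(species$ecoregions, ",\\s*")
long <- data.frame(
  pm_id           = rep(species$pm_id, lengths(codes)),
  scientific_name = rep(species$scientific_name, lengths(codes)),
  ecoregion_code  = unlist(codes),
  stringsAsFactors = FALSE
)
long
```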
Metadata for Peruvian mammal ecoregions
Description
Definitions of the ecoregion codes used in peru_mammals
and peru_mammals_ecoregions. The codes follow the
abbreviations used by Pacheco et al. (2021), based on Peruvian
ecoregion schemes.
Usage
data("peru_mammals_ecoregions_meta")
Format
A tibble with one row per ecoregion code and 2 variables:
- ecoregion_code
Character. Ecoregion abbreviation. The codes used in the dataset are:
"OCE" – Oceánica
"BPP" – Bosque Pluvial del Pacífico
"BSE" – Bosque Seco Ecuatorial
"COS" – Costa
"VOC" – Vertiente Occidental
"PAR" – Páramo
"PUN" – Puna
"YUN" – Yungas
"SB" – Selva Baja
"SP" – Sabana de Palmera
- ecoregion_label
Character. Human-readable label/description of the ecoregion in Spanish.
Source
Pacheco et al. (2021).
See Also
peru_mammals,
peru_mammals_ecoregions
Display taxonomic backbone metadata for Peruvian mammals
Description
Displays summary information about the taxonomic backbone used in perumammals. The backbone is based on the taxonomic checklist published by Pacheco et al. (2021), which was digitised from the original PDF publication into a structured tibble format.
Usage
pm_backbone_info()
Value
Invisibly returns a one-row tibble containing the backbone metadata, with
the same structure as peru_mammals_backbone.
Called primarily for its side effect of printing the summary information.
References
Pacheco Torres, V. R., Diaz, S., Graham Angeles, L. A., Flores-Quispe, M., Calizaya-Mamani, G., Ruelas, D., & Sánchez-Vendizú, P. (2021). Lista actualizada de la diversidad de los mamíferos del Perú y una propuesta para su actualización. Revista Peruana De Biología, 28(4), e21019. doi:10.15381/rpb.v28i4.21019
See Also
peru_mammals_backbone for the complete backbone data.
Examples
# Display backbone information
pm_backbone_info()
# Access the data invisibly returned
backbone_data <- pm_backbone_info()
backbone_data$n_species
List species by ecoregion
Description
Convenience wrapper to list species occurring in one or more Peruvian
ecoregions. This function uses pm_species() internally and
therefore supports the same taxonomic and endemism filters.
Usage
pm_by_ecoregion(
ecoregion,
order = NULL,
family = NULL,
genus = NULL,
endemic = NULL
)
Arguments
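ecoregion |
Character vector with one or more ecoregion codes (e.g. "YUN", "SB"). |
order |
Optional character vector with one or more taxonomic orders to keep. If NULL (the default), no filter by order is applied. |
family |
Optional character vector with one or more families to keep. If NULL (the default), no filter by family is applied. |
genus |
Optional character vector with one or more genera to keep. If NULL (the default), no filter by genus is applied. |
endemic |
Optional logical. If TRUE, only endemic species are kept (see Examples). If NULL (the default), no endemism filter is applied. |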
Value
A tibble with a subset of rows from peru_mammals
corresponding to species present in at least one of the requested
ecoregions. Returns an empty tibble if no species match the criteria.
See Also
pm_list_ecoregions() to see available ecoregion codes,
pm_species() for the underlying function.
Examples
# All species in Yungas
pm_by_ecoregion("YUN")
# Endemic species in Selva Baja (SB)
pm_by_ecoregion("SB", endemic = TRUE)
# Rodents in Costa and Vertiente Occidental
pm_by_ecoregion(c("COS", "VOC"), order = "Rodentia")
# Bats in multiple ecoregions
pm_by_ecoregion(c("YUN", "SB"), order = "Chiroptera")
pm_by_ecoregion(c("YUN", "SB"), order = "Chiroptera",
endemic = TRUE)
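The "present in at least one of the requested ecoregions" rule from the Value section can be illustrated with a small base R filter over a comma-separated field. The helper name and the placeholder rows are hypothetical; pm_by_ecoregion() may implement the filter differently.

```r
# Placeholder rows; the codes are illustrative, not real distribution records.
toy <- data.frame(
  scientific_name = c("Genus alpha", "Genus beta", "Genus gamma"),
  ecoregions      = c("YUN, SB", "COS", "SB, SP"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when a row lists at least one requested code.
in_any_ecoregion <- function(ecoregions, requested) {
  codes <- strsplit(ecoregions, ",\\s*")
  vapply(codes, function(x) any(x %in% requested), logical(1))
}

toy[in_any_ecoregion(toy$ecoregions, c("YUN", "SP")), ]
```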
Summary of species richness by ecoregion
Description
Computes a summary of species richness and endemism for each ecoregion in the Peruvian mammal backbone.
Usage
pm_ecoregion_summary(sort_by = c("code", "species", "endemic", "label"))
Arguments
sort_by |
Character string indicating how to sort the results. Options are "code" (the default), "species", "endemic", and "label". |
Details
The summary is based on the long-format table
peru_mammals_ecoregions and joins metadata from
peru_mammals_ecoregions_meta and endemism information
from peru_mammals.
Value
A tibble with one row per ecoregion and the following columns:
- ecoregion_code: ecoregion abbreviation.
- ecoregion_label: ecoregion description in Spanish.
- n_species: total number of species recorded in the ecoregion.
- n_endemic: number of endemic species recorded in the ecoregion.
- pct_endemic: percentage of endemic species in the ecoregion.
See Also
pm_list_ecoregions() for ecoregion metadata,
pm_by_ecoregion() to list species by ecoregion.
Examples
# Get summary for all ecoregions (sorted by code)
pm_ecoregion_summary()
# Sort by species richness
pm_ecoregion_summary(sort_by = "species")
# Sort by number of endemic species
pm_ecoregion_summary(sort_by = "endemic")
# Find ecoregion with highest species richness
eco_summary <- pm_ecoregion_summary(sort_by = "species")
eco_summary[1, ]
# Ecoregions with more than 100 species
eco_summary <- pm_ecoregion_summary()
subset(eco_summary, n_species > 100)
# Compare richness between lowland and highland ecoregions
eco_summary <- pm_ecoregion_summary(sort_by = "species")
lowland <- eco_summary[eco_summary$ecoregion_code %in% c("SB", "SP"), ]
highland <- eco_summary[eco_summary$ecoregion_code %in% c("PUN", "PAR"), ]
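The aggregation behind a summary of this shape can be sketched in base R. The long table and endemism flags below are invented placeholders, and the rounding to one decimal is an assumption; the sketch only shows how n_species, n_endemic and pct_endemic relate to each other.

```r
# Placeholder species-ecoregion pairs and endemism flags (not real data).
long <- data.frame(
  ecoregion_code = c("YUN", "YUN", "SB", "SB", "SB"),
  pm_id          = c("a", "b", "a", "c", "d"),
  stringsAsFactors = FALSE
)
endemic_flag <- c(a = FALSE, b = TRUE, c = TRUE, d = FALSE)
long$endemic <- unname(endemic_flag[long$pm_id])

# Count distinct species and endemic species per ecoregion.
n_species <- tapply(long$pm_id, long$ecoregion_code, function(x) length(unique(x)))
n_endemic <- tapply(long$endemic, long$ecoregion_code, sum)

summary_tbl <- data.frame(
  ecoregion_code = names(n_species),
  n_species      = as.integer(n_species),
  n_endemic      = as.integer(n_endemic[names(n_species)]),
  stringsAsFactors = FALSE
)
summary_tbl$pct_endemic <- round(100 * summary_tbl$n_endemic / summary_tbl$n_species, 1)

# Sort by species richness, highest first.
summary_tbl <- summary_tbl[order(-summary_tbl$n_species), ]
summary_tbl
```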
List endemic mammal species of Peru
Description
Returns endemic species from the Peruvian mammal backbone, with optional filters by order, family, genus and/or ecoregion.
Usage
pm_endemics(order = NULL, family = NULL, genus = NULL, ecoregion = NULL)
Arguments
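order |
Optional character vector with one or more taxonomic orders to keep. If NULL (the default), no filter by order is applied. |
family |
Optional character vector with one or more families to keep. If NULL (the default), no filter by family is applied. |
genus |
Optional character vector with one or more genera to keep. If NULL (the default), no filter by genus is applied. |
ecoregion |
Optional character vector with one or more ecoregion codes (e.g. "YUN"). If NULL (the default), no filter by ecoregion is applied. |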
Details
This is a convenience wrapper around pm_species() with
endemic = TRUE.
Value
A tibble with endemic species (subset of peru_mammals).
Examples
# All endemic species
pm_endemics()
# Endemic rodents
pm_endemics(order = "Rodentia")
# Endemic species in Yungas (YUN)
pm_endemics(ecoregion = "YUN")
Display ecoregion metadata for Peruvian mammals
Description
Displays summary information about the ecoregions used in the Peruvian mammal backbone. Ecoregions follow the Brack-Egg (1986) classification system used in Peruvian biogeography to describe the distribution of mammal species across different ecological regions.
Usage
pm_list_ecoregions(include_endemic = FALSE)
Arguments
include_endemic |
Logical. If TRUE, the n_endemic and pct_endemic columns are included in the output. Default is FALSE. |
Details
The ecoregion classification follows Brack-Egg (1986), a widely used biogeographic framework for Peru that recognizes 10 distinct ecological regions based on climate, vegetation, and elevation. This classification is used in Pacheco et al. (2021) to document the distribution patterns of Peruvian mammals.
The function prints a formatted summary to the console and invisibly returns the complete data for further analysis.
Value
A tibble with one row per ecoregion, arranged in descending order by species richness, with the following columns:
- ecoregion_code
Abbreviated ecoregion code (e.g., "SB", "YUN")
- ecoregion_label
Full ecoregion name in Spanish
- n_species
Total number of mammal species recorded in the ecoregion
- pct_species
Percentage of Peru's total mammal diversity (0-100)
- n_endemic
(Only if include_endemic = TRUE) Number of endemic species in the ecoregion
- pct_endemic
(Only if include_endemic = TRUE) Percentage of endemic species relative to total species in the ecoregion (0-100)
References
Brack-Egg, A. (1986). Ecología de un país complejo. In J. Mejía Baca (Ed.), Gran Geografía del Perú: Naturaleza y Hombre (Vol. 2, pp. 175-319). Barcelona: Manfer-Mejía Baca.
See Also
peru_mammals_ecoregions_meta for the complete ecoregion metadata,
peru_mammals_ecoregions for species-ecoregion associations,
pm_by_ecoregion() to filter species by ecoregion,
pm_ecoregion_summary() for species richness summaries by ecoregion.
Examples
# Display ecoregion information
pm_list_ecoregions()
# Include endemic species information
pm_list_ecoregions(include_endemic = TRUE)
# Access the data for further analysis
ecoregion_data <- pm_list_ecoregions()
# Ecoregions with highest species richness
ecoregion_data
List endemic mammal species by taxonomic order
Description
Summarises the diversity of endemic mammal species in Peru, grouped by taxonomic order. Provides counts of families, genera, and species that are endemic to Peru within each order. Optionally includes endemism rates relative to total species richness.
Usage
pm_list_endemic(include_rate = FALSE)
Arguments
include_rate |
Logical. If |
Details
This function focuses exclusively on species that are endemic to Peru (i.e., species found nowhere else in the world). Orders without any endemic species are not included in the output.
When include_rate = FALSE (default), results are sorted by the
number of endemic species in descending order, highlighting which orders
have the highest endemic diversity.
When include_rate = TRUE, results are sorted by total species
richness in descending order, and include endemism rates to show what
proportion of each order's diversity is endemic to Peru. A summary row
labeled "Total" is appended to show overall statistics.
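The rate columns and the "Total" row can be illustrated with a few lines of base R. This is a sketch on made-up counts (`toy_counts` is hypothetical), and it assumes the rate is simply n_endemic divided by n_species; the package may round or format the values differently.

```r
# Made-up counts per order; not taken from the package data.
toy_counts <- data.frame(
  order = c("Chiroptera", "Rodentia", "Primates"),
  n_endemic = c(2L, 6L, 1L),
  n_species = c(10L, 20L, 4L)
)

# Proportion (0-1) and percentage (0-100) of endemic species per order.
toy_counts$endemic_rate <- toy_counts$n_endemic / toy_counts$n_species
toy_counts$endemic_pct <- 100 * toy_counts$endemic_rate

# Sort by total species richness, descending.
toy_counts <- toy_counts[order(-toy_counts$n_species), ]

# Append an overall summary row.
total_row <- data.frame(
  order = "Total",
  n_endemic = sum(toy_counts$n_endemic),
  n_species = sum(toy_counts$n_species)
)
total_row$endemic_rate <- total_row$n_endemic / total_row$n_species
total_row$endemic_pct <- 100 * total_row$endemic_rate

toy_summary <- rbind(toy_counts, total_row)
```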
Value
A tibble with one row per order containing endemic species, arranged in descending order by number of endemic species (or by total species richness when include_rate = TRUE), with the following columns:
- order
Taxonomic order
- n_families
Number of families with endemic species in the order
- n_genera
Number of genera with endemic species in the order
- n_endemic
Number of endemic species in the order
- n_species
(Only if include_rate = TRUE) Total number of species in the order
- endemic_rate
(Only if include_rate = TRUE) Proportion of endemic species (0-1)
- endemic_pct
(Only if include_rate = TRUE) Percentage of endemic species (0-100)
Examples
# Summary of endemic species by order
pm_list_endemic()
# Include endemism rates
pm_list_endemic(include_rate = TRUE)
List taxonomic families in the Peruvian mammal backbone
Description
Summarises the number of genera, species and endemic species per family. Optionally filters the output to one or more taxonomic orders.
Usage
pm_list_families(order = NULL)
Arguments
order |
Optional character vector specifying one or more taxonomic
orders to include. If |
Value
A tibble with one row per family, arranged by order and family name, with the following columns:
- order
Taxonomic order
- family
Family name
- n_genera
Number of genera in the family
- n_species
Number of species in the family
- n_endemic
Number of species endemic to Peru in the family
Examples
# All families
pm_list_families()
# Only families within Rodentia
pm_list_families(order = "Rodentia")
# Multiple orders
pm_list_families(order = c("Rodentia", "Chiroptera"))
List genera in the Peruvian mammal backbone
Description
Summarises the number of species and endemic species per genus. Optionally restricts the output to one or more orders and/or families. Genera with missing values are excluded from the results.
Usage
pm_list_genera(order = NULL, family = NULL)
Arguments
order |
Optional character vector with one or more taxonomic orders
to keep. If |
family |
Optional character vector with one or more families to keep.
If |
Details
The function validates input parameters and warns if invalid order or family names are provided. It also warns if the filters result in an empty dataset.
Value
A tibble with one row per genus and the following columns:
- order
Taxonomic order
- family
Family name
- genus
Genus name
- n_species
Number of species in the genus
- n_endemic
Number of endemic species in the genus
Returns an empty tibble with the same structure if no records match the specified filters.
Examples
# All genera
pm_list_genera()
# Genera within Chiroptera (bats)
pm_list_genera(order = "Chiroptera")
# Multiple orders
pm_list_genera(order = c("Didelphimorphia", "Chiroptera"))
# Genera within a specific family
bat_genera <- pm_list_genera(family = "Phyllostomidae")
# Count total endemic species in a family
sum(bat_genera$n_endemic)
# Combination of filters
pm_list_genera(order = "Chiroptera", family = "Phyllostomidae")
List taxonomic orders in the Peruvian mammal backbone
Description
Summarises the number of families, genera, species and endemic species
per order in peru_mammals.
Usage
pm_list_orders()
Value
A tibble with one row per order and the following columns:
- order
Taxonomic order
- n_families
Number of families in the order
- n_genera
Number of genera in the order
- n_species
Number of species in the order
- n_endemic
Number of endemic species in the order
Examples
pm_list_orders()
Filter mammal species from the Peruvian backbone
Description
Convenience wrapper around peru_mammals to subset species by
taxonomic group, endemism and/or ecoregion.
Usage
pm_species(
order = NULL,
family = NULL,
genus = NULL,
endemic = NULL,
ecoregion = NULL
)
Arguments
order |
Optional character vector with one or more taxonomic orders
to keep. If |
family |
Optional character vector with one or more families to keep.
If |
genus |
Optional character vector with one or more genera to keep.
If |
endemic |
Optional logical. If |
ecoregion |
Optional character vector with one or more ecoregion
codes (e.g. |
Value
A tibble with a subset of rows from peru_mammals.
Examples
# All species
pm_species()
# Only Rodentia
pm_species(order = "Rodentia")
# Endemic bats (Chiroptera)
pm_species(order = "Chiroptera", endemic = TRUE)
# Species present in Yungas (YUN) and Selva Baja (SB)
pm_species(ecoregion = c("YUN", "SB"))
Determine whether to show progress bar
Description
Returns logical TRUE/FALSE depending on package options and whether the session is interactive.
Usage
show_progress()
Value
Logical indicating whether progress bars should be shown.
Match Species Names Against Peru Mammals Database
Description
Matches given species names against the official list of mammal species of Peru (Pacheco et al. 2021). Uses a hierarchical matching strategy that includes direct matching, genus-level matching, and fuzzy matching to maximize successful matches while maintaining accuracy.
Peru Mammals Database:
- 575 mammal species
- Binomial nomenclature only (no infraspecific taxa)
- Includes 6 undescribed species ("sp." cases)
- Fields: genus, species, scientific_name, common_name, family, order, endemic
Usage
validate_peru_mammals(splist, quiet = TRUE)
Arguments
splist |
A character vector containing the species names to be matched. Names can be in any format (uppercase, lowercase, with underscores, etc.). Duplicate names are preserved in the output. |
quiet |
Logical, default TRUE. If FALSE, prints informative messages about the matching progress. |
Details
Matching Strategy: The function implements a hierarchical matching pipeline:
- Node 1 - Direct Match: Exact matching of binomial names (genus + species)
- Node 2 - Genus Match: Exact matching at genus level
- Node 3 - Fuzzy Genus: Fuzzy matching for genus with typos (max distance = 1)
- Node 4 - Fuzzy Species: Fuzzy matching for species within matched genus
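The four nodes can be sketched in base R using utils::adist() for edit distances. This is a minimal illustration, not the package's implementation: the reference table `ref` and the helpers `closest()` and `match_one()` are hypothetical, and the species distance limit of 1 is an assumption (only the genus limit of 1 is stated above).

```r
# Toy reference names in uppercase; a stand-in for the real backbone.
ref <- data.frame(
  genus = c("PUMA", "PANTHERA", "TREMARCTOS"),
  species = c("CONCOLOR", "ONCA", "ORNATUS"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: closest candidate within max_dist edits, else NA.
closest <- function(x, candidates, max_dist = 1) {
  d <- drop(utils::adist(x, candidates))
  if (min(d) > max_dist) return(NA_character_)
  candidates[which.min(d)]  # on ties, the first candidate is kept
}

match_one <- function(name) {
  parts <- strsplit(toupper(trimws(name)), "\\s+")[[1]]
  if (length(parts) != 2) return(NA_character_)  # binomials only
  genus <- parts[1]
  species <- parts[2]

  # Node 1: direct match on genus + species.
  if (any(ref$genus == genus & ref$species == species)) {
    return(paste(genus, species))
  }

  # Nodes 2 and 3: exact genus match, otherwise fuzzy genus match.
  genera <- unique(ref$genus)
  g <- if (genus %in% genera) genus else closest(genus, genera)
  if (is.na(g)) return(NA_character_)

  # Node 4: fuzzy species match restricted to the matched genus.
  s <- closest(species, ref$species[ref$genus == g])
  if (is.na(s)) return(NA_character_)
  paste(g, s)
}

matched_names <- vapply(
  c("Pumma concolor", "Tremarctos ornatu", "Felis catus"),
  match_one, character(1), USE.NAMES = FALSE
)
```

Restricting Node 4 to species of the already matched genus keeps the candidate pool small, which is one plausible reason for ordering the nodes this way.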
Special Cases:
- Handles "sp." cases: "Akodon sp. Ancash", "Oligoryzomys sp. B", etc.
- Case-insensitive matching
- Removes common qualifiers (CF., AFF.)
- Standardizes spacing and formatting
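A rough idea of this kind of clean-up, written in base R. The function `standardize_name()` is hypothetical and the exact rules the package applies are not shown in this manual, so treat the regular expressions as assumptions.

```r
# Hypothetical clean-up step; the package's own rules may differ.
standardize_name <- function(x) {
  x <- toupper(x)                                   # case-insensitive
  x <- gsub("_", " ", x, fixed = TRUE)              # underscores to spaces
  x <- gsub("\\b(CF|AFF)\\.?(\\s|$)", " ", x, perl = TRUE)  # drop qualifiers
  x <- gsub("\\s+", " ", x)                         # collapse repeated spaces
  trimws(x)
}

cleaned <- standardize_name(
  c("panthera_onca", "  Puma  cf. concolor ", "Akodon aff. mollis")
)
```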
Rank System:
- Rank 1: Genus level only (e.g., "Panthera")
- Rank 2: Binomial (genus + species, e.g., "Panthera onca")
Ambiguous Matches:
When multiple candidates have identical fuzzy match scores, a warning is
issued and the first match is selected. Use get_ambiguous_matches()
to examine these cases.
Input Requirements:
Species names must be provided as binomials (Genus species) WITHOUT:
- Author information: "Panthera onca Linnaeus"
- Infraspecific taxa: "Panthera onca onca"
- Parenthetical authors: "Panthera onca (Linnaeus, 1758)"
Valid formats:
- Standard binomial: "Panthera onca"
- Undescribed species: "Akodon sp. Ancash"
- Case-insensitive: "PANTHERA ONCA" or "panthera onca"
Apart from the "sp." forms listed above, names with three or more elements are rejected automatically with a warning.
Value
A tibble with the following columns:
- sorter
Integer. Original position in input vector
- Orig.Name
Character. Original input name (standardized)
- Matched.Name
Character. Matched name from database or "—"
- Match.Level
Character. Quality of match ("Exact rank", "No match", etc.)
- matched
Logical. Whether a match was found
- Rank
Integer. Input taxonomic rank (1 or 2)
- Matched.Rank
Integer. Matched taxonomic rank (1 or 2)
- Comp.Rank
Logical. Whether ranks match exactly
- valid_rank
Logical. Whether match is valid at correct rank
- Orig.Genus
Character. Input genus (uppercase)
- Orig.Species
Character. Input species (uppercase)
- Author
Character. Taxonomic authority if provided
- Matched.Genus
Character. Matched genus (uppercase)
- Matched.Species
Character. Matched species (uppercase)
- genus_dist
Integer. Edit distance for genus (0=exact, >0=fuzzy, NA=no match)
- species_dist
Integer. Edit distance for species (0=exact, >0=fuzzy, NA=no match or genus-only)
- scientific_name
Character. Scientific name from peru_mammals
- common_name
Character. Common name in Spanish
- family
Character. Family
- order
Character. Order
- endemic
Logical. Endemic to Peru?
Attributes:
The output includes metadata accessible via attr():
- target_database: "peru_mammals"
- matching_date: Date of matching
- n_input: Number of input names
- n_matched: Number of successful matches
- match_rate: Percentage of successful matches
- n_fuzzy_genus: Number of fuzzy genus matches
- n_fuzzy_species: Number of fuzzy species matches
- ambiguous_genera: Ambiguous genus matches (if any)
- ambiguous_species: Ambiguous species matches (if any)
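Metadata of this kind is stored and read with base R's attr(). The sketch below uses a toy data frame (`toy_result` is hypothetical); only the attribute names come from the list above, and computing match_rate as 100 * n_matched / n_input is an assumption.

```r
# Toy result table; not produced by validate_peru_mammals().
toy_result <- data.frame(
  Orig.Name = c("PUMA CONCOLOR", "FELIS CATUS"),
  matched = c(TRUE, FALSE)
)

# Attach summary metadata as attributes.
attr(toy_result, "n_input") <- nrow(toy_result)
attr(toy_result, "n_matched") <- sum(toy_result$matched)
attr(toy_result, "match_rate") <-
  100 * attr(toy_result, "n_matched") / attr(toy_result, "n_input")

# Read a single attribute back.
attr(toy_result, "match_rate")
```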
See Also
get_ambiguous_matches to retrieve ambiguous match details
Examples
# Basic usage
species_list <- c("Panthera onca", "Tremarctos ornatus", "Puma concolor")
results <- validate_peru_mammals(species_list)
# Check results
table(results$matched)
table(results$Match.Level)
# View matched species
results |>
dplyr::filter(matched) |>
dplyr::select(Orig.Name, Matched.Name, common_name, endemic)
# With typos (fuzzy matching)
typos <- c("Pumma concolor", "Tremarctos ornatu") # Spelling errors
results_fuzzy <- validate_peru_mammals(typos, quiet = FALSE)
# Check for ambiguous matches
get_ambiguous_matches(results_fuzzy, type = "genus")
# Access metadata
attr(results, "match_rate")
attr(results, "n_fuzzy_genus")
# With special "sp." cases
sp_cases <- c("Akodon sp. Ancash", "Oligoryzomys sp. B")
results_sp <- validate_peru_mammals(sp_cases)
# Should match exactly