The PhenotypeR package helps us assess the research-readiness of a set of cohorts we have defined. This assessment includes database diagnostics, codelist diagnostics, cohort diagnostics, and population diagnostics.
You can install PhenotypeR from CRAN:
install.packages("PhenotypeR")
Or you can install the development version from GitHub:
# install.packages("remotes")
remotes::install_github("OHDSI/PhenotypeR")
To illustrate the functionality of PhenotypeR, let’s create a cohort using the Eunomia Synpuf dataset. We’ll first load the required packages and create the cdm reference for the data.
library(dplyr)
library(CohortConstructor)
library(PhenotypeR)
library(CodelistGenerator)
library(duckdb)
library(CDMConnector)
library(DBI)
# Connect to the database and create the cdm object
con <- dbConnect(duckdb(), dbdir = eunomiaDir("synpuf-1k", "5.3"))
cdm <- CDMConnector::cdmFromCon(con = con,
                                cdmName = "Eunomia Synpuf",
                                cdmSchema = "main",
                                writeSchema = "main",
                                achillesSchema = "main")
Note that we’ve included Achilles results in our cdm reference. Where we can, we’ll use these precomputed counts to speed up our analysis.
cdm
#>
#> ── # OMOP CDM reference (duckdb) of Eunomia Synpuf ─────────────────────────────
#> • omop tables: person, observation_period, visit_occurrence, visit_detail,
#> condition_occurrence, drug_exposure, procedure_occurrence, device_exposure,
#> measurement, observation, death, note, note_nlp, specimen, fact_relationship,
#> location, care_site, provider, payer_plan_period, cost, drug_era, dose_era,
#> condition_era, metadata, cdm_source, concept, vocabulary, domain,
#> concept_class, concept_relationship, relationship, concept_synonym,
#> concept_ancestor, source_to_concept_map, drug_strength, cohort_definition,
#> attribute_definition
#> • cohort tables: -
#> • achilles tables: achilles_analysis, achilles_results, achilles_results_dist
#> • other tables: -
# Create the codelists
codes <- list("warfarin" = c(1310149L, 40163554L),
              "acetaminophen" = c(1125315L, 1127078L, 1127433L, 40229134L, 40231925L, 40162522L, 19133768L),
              "morphine" = c(1110410L, 35605858L, 40169988L),
              "measurements_cohort" = c(40660437L, 2617206L, 4034850L, 2617239L, 4098179L))
# Instantiate cohorts with CohortConstructor
cdm$my_cohort <- conceptCohort(cdm = cdm,
                               conceptSet = codes,
                               exit = "event_end_date",
                               overlap = "merge",
                               name = "my_cohort")
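Before instantiating cohorts, it can help to sanity-check the codelists themselves. The following is a minimal base R sketch, not part of PhenotypeR or CohortConstructor, that counts the concepts in each codelist and looks for concept IDs that are listed more than once. The helper variable names (`n_concepts`, `repeated_ids`, `names_ok`) are illustrative only.

```r
# Codelists copied from the example above
codes <- list(
  "warfarin" = c(1310149L, 40163554L),
  "acetaminophen" = c(1125315L, 1127078L, 1127433L, 40229134L,
                      40231925L, 40162522L, 19133768L),
  "morphine" = c(1110410L, 35605858L, 40169988L),
  "measurements_cohort" = c(40660437L, 2617206L, 4034850L, 2617239L, 4098179L)
)

# Number of concept IDs in each codelist
n_concepts <- lengths(codes)

# Concept IDs that appear more than once across all codelists
all_ids <- unlist(codes, use.names = FALSE)
repeated_ids <- unique(all_ids[duplicated(all_ids)])

# Every codelist should have a non-empty, unique name
names_ok <- !is.null(names(codes)) &&
  all(nzchar(names(codes))) &&
  !anyDuplicated(names(codes))

print(n_concepts)
print(repeated_ids)
print(names_ok)
```

A repeated concept ID is not necessarily an error, but it is worth knowing about before comparing cohorts built from the codelists.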
We can easily run all the analyses explained above (database diagnostics, codelist diagnostics, cohort diagnostics, and population diagnostics) using phenotypeDiagnostics():
result <- phenotypeDiagnostics(cdm$my_cohort, survival = TRUE)
You can also create a table with the expected results, so you can compare later with the actual results.
expectations <- tibble(
  "cohort_name" = c("warfarin", "acetaminophen", "morphine", "measurements_cohort"),
"estimate" = c("Male percentage", "Survival probability after 5y", "Median age", "Median age"),
"value" = c("56%", "96%", "57-58", "42-45"),
"source" = c("A clinician", "A clinician", "A clinician", "A clinician"),
"diagnostic" = c("cohort_characteristics", "cohort_survival", "cohort_characteristics", "cohort_characteristics")
)
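Because the expectations are matched to cohorts by name, a quick consistency check can catch typos early. This is a small base R sketch under two assumptions: the cohorts are named after the codelist entries (as the table above suggests), and a plain `data.frame` is used here as a stand-in for the tibble so the snippet runs without extra packages.

```r
# Stand-in for the expectations table above, built with base R
expectations <- data.frame(
  cohort_name = c("warfarin", "acetaminophen", "morphine", "measurements_cohort"),
  estimate = c("Male percentage", "Survival probability after 5y",
               "Median age", "Median age"),
  value = c("56%", "96%", "57-58", "42-45"),
  source = rep("A clinician", 4),
  diagnostic = c("cohort_characteristics", "cohort_survival",
                 "cohort_characteristics", "cohort_characteristics"),
  stringsAsFactors = FALSE
)

# Assumed cohort names (the names of the codelist entries)
cohort_names <- c("warfarin", "acetaminophen", "morphine", "measurements_cohort")

# Expectations that refer to a cohort we did not define
unmatched <- setdiff(expectations$cohort_name, cohort_names)

# Cohorts for which no expectation was recorded
missing_expectation <- setdiff(cohort_names, expectations$cohort_name)

# How many expectations target each diagnostic
per_diagnostic <- table(expectations$diagnostic)

print(unmatched)
print(missing_expectation)
print(per_diagnostic)
```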
Alternatively, you can use AI to generate the expectations:
library(ellmer)
# Note that you may need to generate a Google Gemini API key at https://aistudio.google.com/app/apikey and add it to your R environment:
# usethis::edit_r_environ()
# GEMINI_API_KEY = "your API key"
chat <- chat("google_gemini")

expectations <- getCohortExpectations(chat = chat,
                                      phenotypes = result)
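Since the AI-generated expectations depend on an API key being available, it may be worth confirming that the key is visible to the R session before creating the chat object. A minimal base R sketch, assuming the key is stored under the `GEMINI_API_KEY` environment variable as described in the comments above:

```r
# Sys.getenv() returns an empty string when the variable is not set
api_key <- Sys.getenv("GEMINI_API_KEY")
has_key <- nzchar(api_key)

if (has_key) {
  message("GEMINI_API_KEY found.")
} else {
  message("GEMINI_API_KEY is not set. Add it to your .Renviron file and restart R.")
}
```

Changes to `.Renviron` only take effect in a new R session, so restart R after editing the file.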
Once we have our results, we can quickly view them in an interactive application. Here we’ll apply a minimum cell count of 2 to our results and save our Shiny app to a temporary directory.
shinyDiagnostics(result = result, minCellCount = 2, directory = tempdir(), expectations = expectations)
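To give an intuition for what a minimum cell count does, the sketch below masks small counts in a plain integer vector. It is a conceptual illustration in base R, not PhenotypeR's own implementation: the function name `suppress_small_counts`, the threshold of 5, and the choice to hide only counts strictly between zero and the threshold are all assumptions made for this example.

```r
# Replace counts that are above zero but below the threshold with NA
suppress_small_counts <- function(counts, min_cell_count) {
  to_hide <- !is.na(counts) & counts > 0 & counts < min_cell_count
  counts[to_hide] <- NA
  counts
}

example_counts <- c(0L, 1L, 4L, 5L, 120L)
suppressed <- suppress_small_counts(example_counts, min_cell_count = 5)

print(suppressed)
```

With this rule the counts 1 and 4 are masked, while 0, 5, and 120 are left unchanged.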
See the Shiny app generated from the example cohort here.
To see more details regarding each one of the analyses, please refer to the package vignettes.