This vignette covers all topics concerned with flattening FHIR resources in some depth. If you are interested in a quick overview, please have a look at the fhircrackr:intro vignette.
Before running any of the following code, you need to load the
fhircrackr package:
library(fhircrackr)
In the vignette fhircrackr: Download FHIR resources you
saw how to download FHIR resources into R. Now we’ll have a look at how
to flatten them into data.frames/data.tables. For the rest of the
vignette, we’ll work with the two example data sets from
fhircrackr, which can be made accessible like this:
pat_bundles <- fhir_unserialize(bundles = patient_bundles)
med_bundles <- fhir_unserialize(bundles = medication_bundles)
See ?patient_bundles and
?medication_bundles for the FHIR search requests that
generated them.
There are two extraction scenarios when you want to flatten FHIR
bundles: Either you want to extract just one resource type, or you want
to extract several resource types. Because the structure of different
resource types is quite dissimilar, it makes sense to create one table
per resource type. Therefore the result of the flattening process in
fhircrackr can be either a single table (when extracting
just one resource type) or a list of tables (when extracting more than
one resource type). Both scenarios are realized with a call to
fhir_crack(). We will now explain the two scenarios
individually.
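As a rough illustration of the two scenarios, the following sketch cracks a tiny hand-written bundle once with a single table description and once with a design. The bundle content and the names Patients and Observations are made up for this example; building the bundle list with fhir_bundle_list() from an xml2 document is an assumption about how to create test input without a server.

```r
library(fhircrackr)

# Hypothetical bundle with two resource types (not real data)
bundle <- xml2::read_xml(
  "<Bundle>
     <entry><resource>
       <Patient><id value='p1'/><gender value='female'/></Patient>
     </resource></entry>
     <entry><resource>
       <Patient><id value='p2'/><gender value='male'/></Patient>
     </resource></entry>
     <entry><resource>
       <Observation><id value='o1'/><status value='final'/></Observation>
     </resource></entry>
   </Bundle>"
)
bundles <- fhir_bundle_list(list(bundle))

pat <- fhir_table_description(
  resource = "Patient",
  cols     = c(id = "id", gender = "gender")
)
obs <- fhir_table_description(
  resource = "Observation",
  cols     = c(id = "id", status = "status")
)

# Scenario 1: one resource type, the result is a single table
single <- fhir_crack(bundles = bundles, design = pat, verbose = 0)

# Scenario 2: several resource types, the result is a named list of tables
several <- fhir_crack(
  bundles = bundles,
  design  = fhir_design(Patients = pat, Observations = obs),
  verbose = 0
)

class(single)
names(several)
```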
We’ll start with pat_bundles, which only contains
Patient resources. To transform them into a table, we will use
fhir_crack(). The most important argument
fhir_crack() takes is bundles, an object of
class fhir_bundle_list that is returned by
fhir_search(). The second important argument is
design, which tells the function which data to extract from
the bundle. When we want to extract just one resource type, we can use a
fhir_table_description in the argument design.
fhir_crack() then returns a single data.frame or data.table
(if argument data.table = TRUE).
We’ll show you an example of how it works first and then go on to
explain the fhir_table_description in more detail.
pat_table_description <- fhir_table_description(
    resource = "Patient",
    cols     = list(
        id     = "id",
        gender = "gender",
        name   = "name/family",
        city   = "address/city"
    )
)
table <- fhir_crack(
    bundles = pat_bundles,
    design  = pat_table_description,
    verbose = 0
)
head(table)
#        id gender               name          city
# 1 2072744 female           Nordmann          <NA>
# 2 2431578   male              Smith          <NA>
# 3 2431568   male Chalmers:::Windsor PleasantVille
# 4 2431577   male            Malekar          <NA>
# 5 2431757   male             murali          <NA>
# 6 2431759   male                XYZ       Phoenix
A fhir_table_description holds all the information
fhir_crack() needs to create a table from resources of a
certain type. It is created with fhir_table_description()
by providing the following arguments:
The resource argument is a string that defines the resource type (e.g. Patient or Observation) to extract. It is the only argument that you must provide, and you set it like this:
fhir_table_description(resource = "Patient")
# A fhir_table_description with the following elements: 
# 
# resource: Patient
# 
# cols: 
# An empty fhir_columns object
# 
# sep:           ':::'
# brackets:      no brackets
# rm_empty_cols: FALSE
# format:        'compact'
# keep_attr:     FALSE
Internally, fhir_table_description() calls
fhir_resource_type(), which checks the type you provided
against the list of all currently available resource types, which can be
found at
https://hl7.org/FHIR/resourcelist.html.
Case errors are corrected automatically and the function throws a
warning if the resource type doesn’t match the list under hl7.org.
As you can see in the above output, there are more elements in a
fhir_table_description which are filled automatically by
fhir_table_description().
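The automatic case correction mentioned above can be tried out directly. This is a minimal sketch that assumes fhir_resource_type() accepts the type as its first argument and returns an object that can be converted with as.character(); the lower-case input is chosen only for illustration.

```r
library(fhircrackr)

# Lower-case input; the case is expected to be corrected automatically
rt <- fhir_resource_type("medicationstatement")

# Convert to a plain string to inspect the corrected type
as.character(rt)
```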
The cols argument takes the column names and XPath (1.0)
expressions defining the columns to create from the FHIR resources. The
XPath expression has to be built relatively to the root of the resource
tree. If the cols element is empty,
fhir_crack() will extract all available elements of the
resource and name the columns automatically. To explicitly define
columns, you can provide a (named) character or a (named) list with
XPath expressions like this:
fhir_table_description(
    resource = "Patient",
    cols     = list(
        gender = "gender",
        name   = "name/family",
        city   = "address/city"
    )
)
# A fhir_table_description with the following elements: 
# 
# resource: Patient
# 
# cols: 
# ------------ -----------------
# column name | xpath expression
# ------------ -----------------
# gender      | gender
# name        | name/family
# city        | address/city
# ------------ -----------------
# 
# sep:           ':::'
# brackets:      no brackets
# rm_empty_cols: FALSE
# format:        'compact'
# keep_attr:     FALSE
In this case a table with three columns called gender, name and city
will be created. They will be filled with the element that can be found
under the respective xpath expression in the resource. The element will
be extracted regardless of the attribute that is used (in FHIR this is
mostly @value but can also be @id or
@url in rare cases). If you are interested in keeping the
attribute information, you can set keep_attr = TRUE, in
which case the attribute will be attached to the column name.
Internally, fhir_table_description() calls
fhir_columns() to check the validity of the XPath
expressions and assign column names. You can provide the XPath
expressions in a named or unnamed character vector or a named or unnamed
list. If you choose the unnamed version, the names will be set
automatically and reflect the respective XPath expression:
#custom column names
fhir_columns(
    xpaths = c(
        gender = "gender",
        name   = "name/family",
        city   = "address/city"
    )
)
# ------------ -----------------
# column name | xpath expression
# ------------ -----------------
# gender      | gender
# name        | name/family
# city        | address/city
# ------------ -----------------
#automatic column names
fhir_columns(xpaths = c("gender", "name/family", "address/city"))
# ------------- -----------------
# column name  | xpath expression
# ------------- -----------------
# gender       | gender
# name.family  | name/family
# address.city | address/city
# ------------- -----------------
A fhir_columns object that is created explicitly like
this can of course also be used in the cols argument of
fhir_table_description(). We strongly advise using only
fully specified relative XPath expressions here,
e.g. "ingredient/strength/numerator/code", and not search
paths like "//code", as those can generate unexpected
results, especially if the searched element appears on different levels
of the resource.
In the XPath expression it is also possible to use so-called
predicates to find elements that contain a specific value. For example,
when your resources contain several code/coding elements
and you are interested in LOINC codes only, the expression
code/coding[system[@value='http://loinc.org']]/code will
extract the code only from coding elements with a LOINC system. A more
detailed example of this can be found in the paragraph on multiple
entries further down.
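To see what such a predicate selects, the expression can be evaluated with xml2 on a single resource. The Observation below is hand-written for illustration, and the second coding system (http://example.org/local) is a made-up placeholder.

```r
library(xml2)

# Hypothetical Observation with one LOINC coding and one local coding
obs <- read_xml(
  "<Observation>
     <code>
       <coding>
         <system value='http://loinc.org'/>
         <code value='718-7'/>
       </coding>
       <coding>
         <system value='http://example.org/local'/>
         <code value='HB'/>
       </coding>
     </code>
   </Observation>"
)

# Without a predicate, the codes of both codings match
all_codes <- xml_attr(xml_find_all(obs, "code/coding/code"), "value")

# With the predicate, only the coding whose system is LOINC matches
loinc_codes <- xml_attr(
  xml_find_all(obs, "code/coding[system[@value='http://loinc.org']]/code"),
  "value"
)

all_codes    # "718-7" "HB"
loinc_codes  # "718-7"
```

The same expression string can be used as an entry of cols in a fhir_table_description.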
While the resource and cols arguments control
what is extracted from the bundles, the remaining elements of a
fhir_table_description control how the resulting
table looks. These elements for example control how
fhir_crack() deals with multiple entries for the same
element and with columns that are completely empty, i.e. have only
NA values. Furthermore you can select the shape of the
output tables and how column names are generated:
The sep element is a string defining the separator used
when multiple entries to the same attribute are pasted together. This
could for example happen if there is more than one address entry in a
Patient resource. Examples of this are shown further down under the
heading Multiple entries.
The brackets element is either an empty character
vector (of length 0) or a character vector of length 2. If it is empty,
multiple entries will be pasted together without indices. If it is of
length 2, the two strings provided here are used as brackets for
automatically generated indices to sort out multiple entries (see
paragraph Multiple Entries). For example, brackets = c("[", "]")
will lead to indices like [1.1].
The rm_empty_cols flag can be TRUE or
FALSE. If TRUE, columns containing only
NA values will be removed; if FALSE, these
columns will be kept.
The format element takes the values compact or
wide, which specify the shape of the output table. In a
compact table multiple entries are written into the same
cell/column separated by sep. In a wide table
multiple entries are spread over several indexed columns. See the
paragraph on multiple entries for more information.
The keep_attr flag controls whether the XML tag
attributes of the FHIR element should be attached to the end of the
column name or not. For the column extracted by name/given,
the name would result in name.given if
keep_attr = FALSE and name.given@value if
keep_attr = TRUE.
All five style elements can also be controlled directly by the
fhir_crack() arguments sep,
brackets, remove_empty_columns,
format and keep_attr. If the function
arguments are NULL (their default), the values provided in
fhir_table_description are used. If they are not NULL, they
will overwrite any values in fhir_table_description.
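The precedence rule can be illustrated with the sep element. This is a sketch on a hand-written bundle with made-up patient data; creating the bundle list with fhir_bundle_list() from an xml2 document is an assumption about how to build test input without a server.

```r
library(fhircrackr)

# Hypothetical patient with two family names (not real data)
bundle <- xml2::read_xml(
  "<Bundle>
     <entry><resource>
       <Patient>
         <id value='p1'/>
         <name><family value='Chalmers'/></name>
         <name><family value='Windsor'/></name>
       </Patient>
     </resource></entry>
   </Bundle>"
)
bundles <- fhir_bundle_list(list(bundle))

desc <- fhir_table_description(
  resource = "Patient",
  cols     = c(id = "id", name = "name/family"),
  sep      = ":::"
)

# sep is NULL in fhir_crack(), so the value from the table description is used
from_description <- fhir_crack(bundles = bundles, design = desc, verbose = 0)

# sep passed to fhir_crack() overwrites the value in the table description
overridden <- fhir_crack(bundles = bundles, design = desc, sep = " | ", verbose = 0)

from_description$name
overridden$name
```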
A fully specified table description for Patient resources would
look like this:
table_description <- fhir_table_description(
    resource = "Patient",
    cols     = list(
        gender = "gender",
        name   = "name/family",
        city   = "address/city"
    ),
    sep           = "||",
    brackets      = c("[", "]"),
    rm_empty_cols = FALSE,
    format        = "compact",
    keep_attr     = FALSE
)
We will now work through examples using
fhir_table_descriptions of different complexity.
Let’s start with an example where we only provide the (mandatory)
resource component of the table_description. In this case,
fhir_crack() will extract all available attributes and use
default values for the other elements:
#define a table_description
table_description1 <- fhir_table_description(resource = "Patient")
#convert resources
#pass table_description1 to the design argument
table <- fhir_crack(bundles = pat_bundles, design = table_description1, verbose = 0)
#have a look at part of the results
table[1:5,1:5]
#   active  address.city address.district   address.line address.period.start
# 1   <NA>          <NA>             <NA>           <NA>                 <NA>
# 2   <NA>          <NA>             <NA>           <NA>                 <NA>
# 3   true PleasantVille          Rainbow 534 Erewhon St           1974-12-25
# 4   true          <NA>             <NA>           <NA>                 <NA>
# 5   true          <NA>             <NA>           <NA>                 <NA>
#see the full result with:
#View(table)
As you can see, this can easily become a rather wide and sparse data
frame. This is due to the fact that every element appearing in at
least one of the resources will be turned into a variable (i.e. column),
even if none of the other resources contain this element. For those
resources, the value on that element will be set to NA.
Depending on the variability of the resources, the resulting table can
contain a lot of NA values. If a resource has multiple
entries for an element (e.g. several addresses in a Patient resource),
these entries will be pasted together using the string provided in
sep as a separator. The column names in this option are
automatically generated by pasting together the path to the respective
element, e.g. name.given.
If we know which elements we want to extract, we can specify them in
a named list and provide it in the cols component of the
table description:
#define a table_description
table_description2 <- fhir_table_description(
    resource = "Patient",
    cols     = list(
        PID         = "id",
        use_name    = "name/use",
        given_name  = "name/given",
        family_name = "name/family",
        gender      = "gender",
        birthday    = "birthDate"
    )
)
#convert resources
table <- fhir_crack(bundles = pat_bundles, design = table_description2, verbose = 0)
#have a look at the results
head(table)
#       PID                  use_name                          given_name
# 1 2072744                  official                            K:::Kari
# 2 2431578                  official                               Roman
# 3 2431568 official:::usual:::maiden Peter:::James:::Jim:::Peter:::James
# 4 2431577                  official                    Ganpat:::Malekar
# 5 2431757                       old                                <NA>
# 6 2431759                  official                                 ABC
#          family_name gender   birthday
# 1           Nordmann female 2018-09-12
# 2              Smith   male 2021-07-19
# 3 Chalmers:::Windsor   male 1974-12-25
# 4            Malekar   male 1996-02-07
# 5             murali   male       <NA>
# 6                XYZ   male 1998-01-03
This option will return more tidy and clear data frames, because you have full control over the extracted columns including their name in the resulting table. You should always extract the resource id, because this is used to link to other resources you might also extract.
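To illustrate why the id matters for linking, here is a small base R sketch that joins two flattened tables. Both data frames are made up and only stand in for tables returned by fhir_crack(); the subject column and its "Patient/<id>" format are assumptions about how the reference to the patient was extracted.

```r
# Hypothetical flattened tables; in practice these would come from fhir_crack()
patients <- data.frame(
  PID    = c("p1", "p2"),
  gender = c("female", "male"),
  stringsAsFactors = FALSE
)
med_statements <- data.frame(
  ms_id   = c("m1", "m2", "m3"),
  subject = c("Patient/p1", "Patient/p2", "Patient/p1"),
  stringsAsFactors = FALSE
)

# Strip the resource type prefix so the reference matches the Patient id
med_statements$PID <- sub("^Patient/", "", med_statements$subject)

# Join each MedicationStatement to its Patient via the id
linked <- merge(med_statements, patients, by = "PID", all.x = TRUE)
linked
```

Without the PID column in the Patient table, there would be nothing to join on.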
If you are not sure which elements are available or where they are
located in the resource, it can be helpful to start by extracting all
available elements. If you are more comfortable with XML, you can also
use xml2::xml_structure() on one of the bundles from your
bundle list; this will print the complete XML structure to your
console. You can then get an overview of the available elements and
their locations and continue with a second, more targeted extraction
to get your final table.
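As a small illustration of the XML route, the following sketch prints the structure of a hand-written bundle with made-up content. With the example data of this vignette, the call would presumably be xml2::xml_structure(pat_bundles[[1]]).

```r
library(xml2)

# Hypothetical bundle with a single patient (not real data)
bundle <- read_xml(
  "<Bundle>
     <entry><resource>
       <Patient>
         <id value='p1'/>
         <name><family value='Chalmers'/><given value='Peter'/></name>
         <address><city value='PleasantVille'/></address>
       </Patient>
     </resource></entry>
   </Bundle>"
)

# Prints the nesting of elements together with their attribute names
xml_structure(bundle)
```

The printed nesting shows, for instance, that the family name sits under name/family relative to the Patient element, which is the path to use in cols.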
If you want to see the design that was actually
used in the last call to fhir_crack(), you can retrieve it
with fhir_canonical_design().
fhir_canonical_design()
# A fhir_table_description with the following elements: 
# 
# resource: Patient
# 
# cols: 
# ------------ -----------------
# column name | xpath expression
# ------------ -----------------
# PID         | id
# use_name    | name/use
# given_name  | name/given
# family_name | name/family
# gender      | gender
# birthday    | birthDate
# ------------ -----------------
# 
# sep:           ':::'
# brackets:      no brackets
# rm_empty_cols: FALSE
# format:        'compact'
# keep_attr:     FALSE
Of course the previous example is using just one resource type. If
you are interested in several types of resources, you need one
fhir_table_description per resource type. You can bundle a
bunch of fhir_table_descriptions in a
fhir_design. This is basically a named list of
fhir_table_descriptions, and when you pass it to
fhir_crack(), the result will be a named list of tables
with the same names as the design. Consider an example where we have
downloaded MedicationStatements referring to a certain medication as
well as the Patient resources these MedicationStatements are linked
to.
The design to extract both resource types could look like this:
#all attributes defined explicitly
meds <- fhir_table_description(
    resource = "MedicationStatement",
    cols     = list(
        ms_id       = "id",
        status_text = "text/status",
        status      = "status",
        med_system  = "medicationCodeableConcept/coding/system",
        med_code    = "medicationCodeableConcept/coding/code"
    ),
    sep           = "|",
    brackets      = NULL,
    rm_empty_cols = FALSE,
    format        = 'compact',
    keep_attr     = FALSE 
)
#automatic extraction/default values
pat <- fhir_table_description(resource = "Patient")
#combine both table_descriptions
design <- fhir_design(meds, pat)
In this example, we have spelled out the table_description for MedicationStatement completely, while we have used a short form for Patient. The design looks like this:
design
# A fhir_design with 2 table descriptions:
# A fhir_table_description with the following elements: 
# 
# resource: MedicationStatement
# 
# cols: 
# ------------ ----------------------------------------
# column name | xpath expression
# ------------ ----------------------------------------
# ms_id       | id
# status_text | text/status
# status      | status
# med_system  | medicationCodeableConcept/coding/system
# med_code    | medicationCodeableConcept/coding/code
# ------------ ----------------------------------------
# 
# sep:           '|'
# brackets:      no brackets
# rm_empty_cols: FALSE
# format:        'compact'
# keep_attr:     FALSE
# A fhir_table_description with the following elements: 
# 
# resource: Patient
# 
# cols: 
# An empty fhir_columns object
# 
# sep:           ':::'
# brackets:      no brackets
# rm_empty_cols: FALSE
# format:        'compact'
# keep_attr:     FALSE
As you can see, each table_description is identified by a name, which
will also be the name of the corresponding table in the result of
fhir_crack().
You can assign the names explicitly, if you prefer:
design <- fhir_design(Medications = meds, Patients = pat)
design
# A fhir_design with 2 table descriptions:
# A fhir_table_description with the following elements: 
# 
# resource: MedicationStatement
# 
# cols: 
# ------------ ----------------------------------------
# column name | xpath expression
# ------------ ----------------------------------------
# ms_id       | id
# status_text | text/status
# status      | status
# med_system  | medicationCodeableConcept/coding/system
# med_code    | medicationCodeableConcept/coding/code
# ------------ ----------------------------------------
# 
# sep:           '|'
# brackets:      no brackets
# rm_empty_cols: FALSE
# format:        'compact'
# keep_attr:     FALSE
# A fhir_table_description with the following elements: 
# 
# resource: Patient
# 
# cols: 
# An empty fhir_columns object
# 
# sep:           ':::'
# brackets:      no brackets
# rm_empty_cols: FALSE
# format:        'compact'
# keep_attr:     FALSE
And you can also extract single table_descriptions by their name:
design$Patients
# A fhir_table_description with the following elements: 
# 
# resource: Patient
# 
# cols: 
# An empty fhir_columns object
# 
# sep:           ':::'
# brackets:      no brackets
# rm_empty_cols: FALSE
# format:        'compact'
# keep_attr:     FALSE
We can use the design for fhir_crack():
list_of_tables <- fhir_crack(bundles = med_bundles, design = design, verbose = 0)
head(list_of_tables$Medications)
#    ms_id status_text status            med_system  med_code
# 1  30233   generated active http://snomed.info/ct 429374003
# 2  42091   generated active http://snomed.info/ct 429374003
# 3  45724   generated active http://snomed.info/ct 429374003
# 4  59597   generated active http://snomed.info/ct 429374003
# 5  69117   generated active http://snomed.info/ct 429374003
# 6 591941   generated active http://snomed.info/ct 429374003
head(list_of_tables$Patients)
#   address.city address.country address.district
# 1         <NA>            <NA>             <NA>
# 2         <NA>            <NA>             <NA>
# 3         <NA>            <NA>             <NA>
# 4         <NA>            <NA>             <NA>
# 5         <NA>            <NA>             <NA>
# 6     Westford              US             <NA>
#                                     address.extension
# 1                                                <NA>
# 2                                                <NA>
# 3                                                <NA>
# 4                                                <NA>
# 5                                                <NA>
# 6 http://hl7.org/fhir/StructureDefinition/geolocation
#   address.extension.extension address.extension.extension.valueDecimal
# 1                        <NA>                                     <NA>
# 2                        <NA>                                     <NA>
# 3                        <NA>                                     <NA>
# 4                        <NA>                                     <NA>
# 5                        <NA>                                     <NA>
# 6        latitude:::longitude    42.58942256332994:::-71.3827654850569
#        address.line address.postalCode address.state address.text address.type
# 1              <NA>               <NA>          <NA>         <NA>         <NA>
# 2              <NA>               <NA>          <NA>         <NA>         <NA>
# 3              <NA>               <NA>          <NA>         <NA>         <NA>
# 4              <NA>               <NA>          <NA>         <NA>         <NA>
# 5              <NA>               <NA>          <NA>         <NA>         <NA>
# 6 378 Krajcik Lodge               <NA> Massachusetts         <NA>         <NA>
#   address.use  birthDate communication.language.coding.code
# 1        <NA> 2020-03-23                               <NA>
# 2        <NA> 2020-03-24                               <NA>
# 3        <NA> 1979-10-08                               <NA>
# 4        <NA> 2019-11-10                               <NA>
# 5        <NA> 1970-01-10                               <NA>
# 6        <NA> 1946-03-29                              en-US
#   communication.language.coding.display communication.language.coding.system
# 1                                  <NA>                                 <NA>
# 2                                  <NA>                                 <NA>
# 3                                  <NA>                                 <NA>
# 4                                  <NA>                                 <NA>
# 5                                  <NA>                                 <NA>
# 6                               English                      urn:ietf:bcp:47
#   communication.language.text
# 1                        <NA>
# 2                        <NA>
# 3                        <NA>
# 4                        <NA>
# 5                        <NA>
# 6                     English
#                                                                                                                                                                                                                                                                                                                                                                                                                                                                               extension
# 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  <NA>
# 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  <NA>
# 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  <NA>
# 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  <NA>
# 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  <NA>
# 6 http://hl7.org/fhir/us/core/StructureDefinition/us-core-race:::http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity:::http://hl7.org/fhir/StructureDefinition/patient-mothersMaidenName:::http://hl7.org/fhir/us/core/StructureDefinition/us-core-birthsex:::http://hl7.org/fhir/StructureDefinition/patient-birthPlace:::http://synthetichealth.github.io/synthea/disability-adjusted-life-years:::http://synthetichealth.github.io/synthea/quality-adjusted-life-years
#                       extension.extension extension.extension.valueCoding.code
# 1                                    <NA>                                 <NA>
# 2                                    <NA>                                 <NA>
# 3                                    <NA>                                 <NA>
# 4                                    <NA>                                 <NA>
# 5                                    <NA>                                 <NA>
# 6 ombCategory:::text:::ombCategory:::text                      2106-3:::2186-5
#   extension.extension.valueCoding.display
# 1                                    <NA>
# 2                                    <NA>
# 3                                    <NA>
# 4                                    <NA>
# 5                                    <NA>
# 6          White:::Not Hispanic or Latino
#                              extension.extension.valueCoding.system
# 1                                                              <NA>
# 2                                                              <NA>
# 3                                                              <NA>
# 4                                                              <NA>
# 5                                                              <NA>
# 6 urn:oid:2.16.840.1.113883.6.238:::urn:oid:2.16.840.1.113883.6.238
#   extension.extension.valueString extension.valueAddress.city
# 1                            <NA>                        <NA>
# 2                            <NA>                        <NA>
# 3                            <NA>                        <NA>
# 4                            <NA>                        <NA>
# 5                            <NA>                        <NA>
# 6  White:::Not Hispanic or Latino                      Boston
#   extension.valueAddress.country extension.valueAddress.state
# 1                           <NA>                         <NA>
# 2                           <NA>                         <NA>
# 3                           <NA>                         <NA>
# 4                           <NA>                         <NA>
# 5                           <NA>                         <NA>
# 6                             US                Massachusetts
#   extension.valueCode                extension.valueDecimal
# 1                <NA>                                  <NA>
# 2                <NA>                                  <NA>
# 3                <NA>                                  <NA>
# 4                <NA>                                  <NA>
# 5                <NA>                                  <NA>
# 6                   M 4.160702818392717:::67.83929718160728
#   extension.valueString gender generalPractitioner.reference      id
# 1                  <NA>   male                          <NA>  697738
# 2                  <NA>   male                          <NA>  697934
# 3                  <NA> female                          <NA>   42024
# 4                  <NA> female                          <NA>   59530
# 5                  <NA>   male                          <NA> 1162779
# 6   Kristyn560 Lesch175   male           Practitioner/636228  636226
#                                                                                                                                                                                           identifier.system
# 1                                                                                                                                                                                                      <NA>
# 2                                                                                                                                                                                                      <NA>
# 3                                                                                                                                                                                                      <NA>
# 4                                                                                                                                                                                                      <NA>
# 5                                                                                                                                                                                                      <NA>
# 6 http://www.interopx.com:::http://hospital.smarthealthit.org:::http://hl7.org/fhir/sid/us-ssn:::urn:oid:2.16.840.1.113883.4.3.25:::http://standardhealthrecord.org/fhir/StructureDefinition/passportNumber
#   identifier.type.coding.code
# 1                        <NA>
# 2                        <NA>
# 3                        <NA>
# 4                        <NA>
# 5                        <NA>
# 6          MR:::SS:::DL:::PPN
#                                                        identifier.type.coding.display
# 1                                                                                <NA>
# 2                                                                                <NA>
# 3                                                                                <NA>
# 4                                                                                <NA>
# 5                                                                                <NA>
# 6 Medical Record Number:::Social Security Number:::Driver's License:::Passport Number
#                                                                                                                                                                   identifier.type.coding.system
# 1                                                                                                                                                                                          <NA>
# 2                                                                                                                                                                                          <NA>
# 3                                                                                                                                                                                          <NA>
# 4                                                                                                                                                                                          <NA>
# 5                                                                                                                                                                                          <NA>
# 6 http://terminology.hl7.org/CodeSystem/v2-0203:::http://terminology.hl7.org/CodeSystem/v2-0203:::http://terminology.hl7.org/CodeSystem/v2-0203:::http://terminology.hl7.org/CodeSystem/v2-0203
#                                                                  identifier.type.text
# 1                                                                                <NA>
# 2                                                                                <NA>
# 3                                                                                <NA>
# 4                                                                                <NA>
# 5                                                                                <NA>
# 6 Medical Record Number:::Social Security Number:::Driver's License:::Passport Number
#                                                                       identifier.value
# 1                                                                                 <NA>
# 2                                                                                 <NA>
# 3                                                                                 <NA>
# 4                                                                                 <NA>
# 5                                                                                 <NA>
# 6 411669:::41166989-975d-4d17-b9de-17f94cb3eec1:::999-17-8717:::S99933732:::X75257608X
#   managingOrganization.reference maritalStatus.coding.code
# 1                           <NA>                      <NA>
# 2                           <NA>                      <NA>
# 3                           <NA>                      <NA>
# 4                           <NA>                      <NA>
# 5                           <NA>                      <NA>
# 6            Organization/636227                         M
#   maritalStatus.coding.display
# 1                         <NA>
# 2                         <NA>
# 3                         <NA>
# 4                         <NA>
# 5                         <NA>
# 6                            M
#                              maritalStatus.coding.system maritalStatus.text
# 1                                                   <NA>               <NA>
# 2                                                   <NA>               <NA>
# 3                                                   <NA>               <NA>
# 4                                                   <NA>               <NA>
# 5                                                   <NA>               <NA>
# 6 http://terminology.hl7.org/CodeSystem/v3-MaritalStatus                  M
#                meta.lastUpdated       meta.source meta.versionId
# 1 2020-03-23T16:12:33.294+00:00 #LUrUNxAhdZrFHftu              1
# 2 2020-03-24T06:19:22.991+00:00 #F6KTacg6zpZSnNLM              1
# 3 2020-08-07T14:25:55.860+00:00 #S9uv5jD2iAAi5WFP              3
# 4 2021-07-13T08:44:06.580+00:00 #qWzI8wftDtgcB3LC              6
# 5 2021-10-15T09:10:37.409+00:00 #JnxgDwhILtrVk2uc              5
# 6 2020-03-04T07:48:04.171+00:00 #zQK1HhDgSQFBnN3C              1
#   multipleBirthBoolean   name.family name.given name.prefix     name.text
# 1                 <NA>        Cooper     Xavier        <NA> Xavier Cooper
# 2                 <NA>           Hay      Harry        <NA>     Harry Hay
# 3                 <NA>        Walker      Pippa        <NA>  Pippa Walker
# 4                 <NA>        Singh2      Anna         <NA>          <NA>
# 5                 <NA>           Doe       John        <NA>      John Doe
# 6                false Stiedemann542   Aaron697         Mr.          <NA>
#   name.use telecom.system telecom.use telecom.value text.status
# 1 official           <NA>        <NA>          <NA>   generated
# 2 official           <NA>        <NA>          <NA>   generated
# 3 official           <NA>        <NA>          <NA>   generated
# 4     <NA>           <NA>        <NA>          <NA>   generated
# 5 official           <NA>        <NA>          <NA>   generated
# 6 official          phone        home  555-213-2064   generated
As you can see, the result is a list of tables, one for Patient
resources and one for MedicationStatement resources. When you use
fhir_crack() with a fhir_design instead of a
fhir_table_description, the result is an object of class
fhir_df_list or fhir_dt_list that also has the
design attached. You can extract the design from such a list using
fhir_design():
fhir_design(list_of_tables)
# A fhir_design with 2 table descriptions:
# A fhir_table_description with the following elements: 
# 
# resource: MedicationStatement
# 
# cols: 
# ------------ ----------------------------------------
# column name | xpath expression
# ------------ ----------------------------------------
# ms_id       | id
# status_text | text/status
# status      | status
# med_system  | medicationCodeableConcept/coding/system
# med_code    | medicationCodeableConcept/coding/code
# ------------ ----------------------------------------
# 
# sep:           '|'
# brackets:      no brackets
# rm_empty_cols: FALSE
# format:        'compact'
# keep_attr:     FALSE
# A fhir_table_description with the following elements: 
# 
# resource: Patient
# 
# cols: 
# An empty fhir_columns object
# 
# sep:           ':::'
# brackets:      no brackets
# rm_empty_cols: FALSE
# format:        'compact'
# keep_attr:     FALSE
Note that this doesn’t work on single tables created with a
fhir_table_description.
If you want to save a design for later or to share with others, you
can do so using fhir_save_design(). This function takes
a design and saves it as an xml file:
To read the design back into R, you can use
fhir_load_design():
fhir_load_design(paste0(temp_dir, "/design.xml"))
# A fhir_design with 2 table descriptions:
# A fhir_table_description with the following elements: 
# 
# resource: MedicationStatement
# 
# cols: 
# ------------ ----------------------------------------
# column name | xpath expression
# ------------ ----------------------------------------
# ms_id       | id
# status_text | text/status
# status      | status
# med_system  | medicationCodeableConcept/coding/system
# med_code    | medicationCodeableConcept/coding/code
# ------------ ----------------------------------------
# 
# sep:           '|'
# brackets:      no brackets
# rm_empty_cols: FALSE
# format:        'compact'
# keep_attr:     FALSE
# A fhir_table_description with the following elements: 
# 
# resource: Patient
# 
# cols: 
# An empty fhir_columns object
# 
# sep:           ':::'
# brackets:      no brackets
# rm_empty_cols: FALSE
# format:        'compact'
# keep_attr:     FALSE
A particularly complicated problem in flattening FHIR resources is caused by the fact that there can be multiple entries to an element. The profile according to which your FHIR resources have been built defines how often a particular element can appear in a resource. This is called the cardinality of the element. For example the Patient resource defined here can have zero or one birth dates but arbitrarily many addresses.
In the default setting, fhir_crack() will paste multiple
entries for the same element together in the table, using the separator
provided by the sep argument. In most cases this will work
just fine, but there are some special cases that require a little more
attention.
Let’s have a look at an example bundle containing just three Patient resources. You can make it available in your workspace like this:
This is how the xml looks:
<Bundle>
  <type value='searchset'/>
  <entry>
    <resource>
        <Patient>
            <id value='id1'/>
            <address>
                <use value='home'/>
                <city value='Amsterdam'/>
                <type value='physical'/>
                <country value='Netherlands'/>
            </address>
            <name>
                <given value='Marie'/>
            </name>
        </Patient>
    </resource>
  </entry>
  
  <entry>
    <resource>
        <Patient>
            <id value='id2'/>
            <address>
                <use value='home'/>
                <city value='Rome'/>
                <type value='physical'/>
                <country value='Italy'/>
            </address>
            <address>
                <use value='work'/>
                <city value='Stockholm'/>
                <type value='postal'/>
                <country value='Sweden'/>
            </address>
            <name>
                <given value='Susie'/>
            </name>
        </Patient>
    </resource>
  </entry>
  
  <entry>
    <resource>
        <Patient>
            <id value='id3'/>
            <address>
                <use value='home'/>
                <city value='Berlin'/>
            </address>
            <address>
                <type value='postal'/>
                <country value='France'/>
            </address>
            <address>
                <use value='work'/>
                <city value='London'/>
                <type value='postal'/>
                <country value='England'/>
            </address>
            <name>
                <given value='Frank'/>
            </name>
            <name>
                <given value='Max'/>
            </name>
        </Patient>
    </resource>
  </entry>
</Bundle>
This bundle contains three Patient resources. The first resource has just one entry for the address attribute. The second Patient resource has two entries containing the same elements for the address attribute. The third Patient resource has a rather messy address attribute, with three entries containing different elements and also two entries for the name attribute.
Let’s see what happens if we extract all attributes:
desc1 <- fhir_table_description(resource = "Patient", sep = " | ")
df1 <- fhir_crack(bundles = bundle, design = desc1, verbose = 0)
df1
#       address.city  address.country      address.type address.use  id
# 1        Amsterdam      Netherlands          physical        home id1
# 2 Rome | Stockholm   Italy | Sweden physical | postal home | work id2
# 3  Berlin | London France | England   postal | postal home | work id3
#    name.given
# 1       Marie
# 2       Susie
# 3 Frank | Max
As you can see, multiple entries for the same attribute (address and
name) are pasted together. This works fine for Patient 2, but for
Patient 3 you can see a problem with the number of entries that are
displayed. The original Patient resource had three (incomplete)
address entries, but because the first two of them use
complementary elements (use and city
vs. type and country), the resulting pasted
entries look like there had just been two entries for the
address attribute.
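The effect can be reproduced without fhircrackr. The following base R sketch is only an illustration, not the package's implementation: the address data frame and the helper paste_present() are hypothetical and simply mimic the three address entries of Patient 3, with NA standing for an element that is missing from an entry.

```r
# Hypothetical re-creation of the three address entries of patient id3.
# NA marks an element that is absent from that entry.
address <- data.frame(
  use     = c("home", NA, "work"),
  city    = c("Berlin", NA, "London"),
  type    = c(NA, "postal", "postal"),
  country = c(NA, "France", "England"),
  stringsAsFactors = FALSE
)

# Paste the values that are present in a column, skipping the gaps
# (hypothetical helper, not part of fhircrackr).
paste_present <- function(x, sep = " | ") {
  paste(x[!is.na(x)], collapse = sep)
}

pasted <- vapply(address, paste_present, character(1))
pasted

# Count the values left in each pasted cell.
n_values <- lengths(strsplit(pasted, " | ", fixed = TRUE))
n_values
```

Every pasted cell ends up with two values although there were three entries, so the position of each value within the original resource can no longer be recovered from the table.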
You can counter this problem by setting brackets:
desc2 <- fhir_table_description(
    resource = "Patient",
    sep      = " | ",
    brackets = c("[", "]")
)
df2 <- fhir_crack(bundles = bundle, design = desc2, verbose = 0)
df2
#                 address.city            address.country
# 1             [1.1]Amsterdam           [1.1]Netherlands
# 2 [1.1]Rome | [2.1]Stockholm   [1.1]Italy | [2.1]Sweden
# 3  [1.1]Berlin | [3.1]London [2.1]France | [3.1]England
#                  address.type           address.use     id
# 1               [1.1]physical             [1.1]home [1]id1
# 2 [1.1]physical | [2.1]postal [1.1]home | [2.1]work [1]id2
# 3   [2.1]postal | [3.1]postal [1.1]home | [3.1]work [1]id3
#              name.given
# 1            [1.1]Marie
# 2            [1.1]Susie
# 3 [1.1]Frank | [2.1]Max
Now the indices display the entry the value belongs to. That way you
can see that Patient resource 3 had three entries for the attribute
address and you can also see which attributes belong to
which entry.
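To make the reading of an indexed cell concrete, here is a minimal base R sketch that takes one cell apart by hand. It is an illustration only and assumes the bracket and separator settings used above; the variable names are made up for this example. The first number of an index is read as the position of the address entry within the resource.

```r
# One indexed cell as it appears in the address.city column for patient id3.
cell <- "[1.1]Berlin | [3.1]London"

# Split the cell at the separator that was used for cracking.
entries <- strsplit(cell, " | ", fixed = TRUE)[[1]]

# Separate the index inside the brackets from the value behind it.
index <- sub("^\\[([0-9.]+)\\].*$", "\\1", entries)
value <- sub("^\\[[0-9.]+\\]", "", entries)

# The first number of the index identifies the address entry.
address_entry <- as.integer(sub("\\..*$", "", index))

parsed <- data.frame(index, address_entry, value, stringsAsFactors = FALSE)
parsed
```

The parsed table shows that Berlin comes from the first and London from the third address entry, which is exactly the information that gets lost without brackets.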
If you set the format argument to wide, the
entries are spread over multiple columns and the indices are attached to
the column names:
df3 <- fhir_crack(bundles = bundle, design = desc2, format = "wide", verbose = 0)
df3
#   [1.1]address.city [2.1]address.city [3.1]address.city [1.1]address.country
# 1         Amsterdam              <NA>              <NA>          Netherlands
# 2              Rome         Stockholm              <NA>                Italy
# 3            Berlin              <NA>            London                 <NA>
#   [2.1]address.country [3.1]address.country [1.1]address.type [2.1]address.type
# 1                 <NA>                 <NA>          physical              <NA>
# 2               Sweden                 <NA>          physical            postal
# 3               France              England              <NA>            postal
#   [3.1]address.type [1.1]address.use [2.1]address.use [3.1]address.use [1]id
# 1              <NA>             home             <NA>             <NA>   id1
# 2              <NA>             home             work             <NA>   id2
# 3            postal             home             <NA>             work   id3
#   [1.1]name.given [2.1]name.given
# 1           Marie            <NA>
# 2           Susie            <NA>
# 3           Frank             Max
Of course the above example is a very specific case that only occurs if your resources have multiple entries with complementary elements. In the majority of cases multiple entries in one resource will have the same structure, thus making numbering of those entries superfluous. But the indices also help to disentangle those entries and put them in separate rows, as you’ll see in the next paragraph.
To avoid multiple entries in your table altogether, you can select which of the multiple elements you want to keep during the cracking process. You can achieve this using predicates in your Xpath expressions.
In the following table description, address elements are only
taken from addresses that have the value “physical” in
address/type and the value “home” in
address/use.
desc3 <- fhir_table_description(
    resource = "Patient",
    cols = c(id = "id",
             name = "name/given",
             address.city = "address[type[@value='physical'] and use[@value='home']]/city",
             address.country = "address[type[@value='physical'] and use[@value='home']]/country"
             )
)
df_selected <- fhir_crack(bundles = bundle, design = desc3, verbose = 0)
df_selected
#    id        name address.city address.country
# 1 id1       Marie    Amsterdam     Netherlands
# 2 id2       Susie         Rome           Italy
# 3 id3 Frank:::Max         <NA>            <NA>
The general formulation is
element[filterChildElement[@value="filterValue"]]/childElement,
where
element is the FHIR element that occurs multiple times (here address)
filterChildElement is a child of element that is used for filtering (here address/type and address/use)
filterValue is the value to filter by (here 'physical' and 'home')
childElement is the element that is actually extracted (here address/city and address/country)
Another example is the following bundle of Observation resources that carry both LOINC and SNOMED codes. It can be cracked into a table that only contains the LOINC codes:
<Bundle>
  <type value="searchset"/>
  <entry>
    <resource>
      <Observation>
        <id value="obs1"/>
        <code>
          <coding>
            <system value="http://loinc.org"/>
            <code value="29463-7"/>
            <display value="Body Weight"/>
          </coding>
          <coding>
            <system value="http://snomed.info/sct"/>
            <code value="27113001"/>
            <display value="Body weight"/>
          </coding>
        </code>
        <subject>
          <reference value="Patient/id2"/>
        </subject>
      </Observation>
    </resource>
  </entry>
  <entry>
    <resource>
      <Observation>
        <id value="obs2"/>
        <code>
          <coding>
            <system value="http://loinc.org"/>
            <code value="8302-2"/>
            <display value="Body Height"/>
          </coding>
          <coding>
            <system value="http://snomed.info/sct"/>
            <code value="50373000"/>
            <display value="Body height measure"/>
          </coding>
        </code>
        <subject>
          <reference value="Patient/id2"/>
        </subject>
      </Observation>
    </resource>
  </entry>
</Bundle>
bundle2 <- fhir_unserialize(bundles = example_bundles5)
desc4 <- fhir_table_description(resource = "Observation",
                                cols = c(
                                    id = "id",
                                    code = "code/coding[system[@value='http://loinc.org']]/code",
                                    display = "code/coding[system[@value='http://loinc.org']]/display")
                                     )
df_selected2 <- fhir_crack(bundles = bundle2,
                    design = desc4,
                    verbose = 0)
df_selected2
#     id    code     display
# 1 obs1 29463-7 Body Weight
# 2 obs2  8302-2 Body Height
In some cases, you won’t be able to filter elements during the
cracking process, e.g. because you don’t know what to filter for
beforehand. In that case, the table produced by
fhir_crack() will contain multiple entries, which you’ll
probably want to divide into distinct cells at some point. Apart from
directly spreading those values over multiple columns by using a
wide cracking format, fhircrackr gives you two options
to get from a compact table with multiple entries to either a long or a
wide format: fhir_melt() and fhir_cast(). The
former spreads the entries across rows, creating a long format; the
latter spreads them across columns, creating a wide format.
fhir_melt() takes an indexed data frame with multiple
entries in one or several columns and spreads (aka melts)
the entries in columns over several rows:
 fhir_melt(
    indexed_data_frame = df2,
    columns            = "address.city",
    brackets           = c("[", "]"),
    sep                = " | ",
    all_columns        = TRUE
 )
#   address.city            address.country                address.type
# 1 [1]Amsterdam           [1.1]Netherlands               [1.1]physical
# 2      [1]Rome   [1.1]Italy | [2.1]Sweden [1.1]physical | [2.1]postal
# 3 [1]Stockholm   [1.1]Italy | [2.1]Sweden [1.1]physical | [2.1]postal
# 4    [1]Berlin [2.1]France | [3.1]England   [2.1]postal | [3.1]postal
# 5         <NA> [2.1]France | [3.1]England   [2.1]postal | [3.1]postal
# 6    [1]London [2.1]France | [3.1]England   [2.1]postal | [3.1]postal
#             address.use     id            name.given resource_identifier
# 1             [1.1]home [1]id1            [1.1]Marie                   1
# 2 [1.1]home | [2.1]work [1]id2            [1.1]Susie                   2
# 3 [1.1]home | [2.1]work [1]id2            [1.1]Susie                   2
# 4 [1.1]home | [3.1]work [1]id3 [1.1]Frank | [2.1]Max                   3
# 5 [1.1]home | [3.1]work [1]id3 [1.1]Frank | [2.1]Max                   3
# 6 [1.1]home | [3.1]work [1]id3 [1.1]Frank | [2.1]Max                   3
The new variable resource_identifier records which rows
in the created data frame belong to which row (usually equivalent to one
resource) in the original data frame. brackets and
sep should be given the same character vectors that have
been used to build the indices in fhir_crack().
columns is a character vector with the names of the
variables you want to melt. You can provide more than one column here
but it makes sense to only have variables from the same repeating
attribute together in one call to fhir_melt():
cols <- c("address.city", "address.use", "address.type", "address.country")
 
fhir_melt(
    indexed_data_frame = df2,
    columns            = cols,
    brackets           = c("[", "]"), 
    sep                = " | ",
    all_columns        = TRUE
)
#   address.city address.country address.type address.use     id
# 1 [1]Amsterdam  [1]Netherlands  [1]physical     [1]home [1]id1
# 2      [1]Rome        [1]Italy  [1]physical     [1]home [1]id2
# 3 [1]Stockholm       [1]Sweden    [1]postal     [1]work [1]id2
# 4    [1]Berlin            <NA>         <NA>     [1]home [1]id3
# 5         <NA>       [1]France    [1]postal        <NA> [1]id3
# 6    [1]London      [1]England    [1]postal     [1]work [1]id3
#              name.given resource_identifier
# 1            [1.1]Marie                   1
# 2            [1.1]Susie                   2
# 3            [1.1]Susie                   2
# 4 [1.1]Frank | [2.1]Max                   3
# 5 [1.1]Frank | [2.1]Max                   3
# 6 [1.1]Frank | [2.1]Max                   3
If the names of the variables in your data frame have been generated
automatically with fhir_crack() you can find all variable
names belonging to the same attribute with
fhir_common_columns():
cols <- fhir_common_columns(data_frame = df2, column_names_prefix = "address")
cols
# [1] "address.city"    "address.country" "address.type"    "address.use"
With the argument all_columns you can control whether
the resulting data frame contains only the molten columns or all columns
of the original data frame:
fhir_melt(
    indexed_data_frame = df2,
    columns            = cols,
    brackets           = c("[", "]"), 
    sep                = " | ",
    all_columns        = FALSE
)
#   resource_identifier address.city address.country address.type address.use
# 1                   1 [1]Amsterdam  [1]Netherlands  [1]physical     [1]home
# 2                   2      [1]Rome        [1]Italy  [1]physical     [1]home
# 3                   2 [1]Stockholm       [1]Sweden    [1]postal     [1]work
# 4                   3    [1]Berlin            <NA>         <NA>     [1]home
# 5                   3         <NA>       [1]France    [1]postal        <NA>
# 6                   3    [1]London      [1]England    [1]postal     [1]work
Values on the other variables will just repeat in the newly created rows.
If you try to melt several variables that don’t belong to the same
element in one call to fhir_melt(), this will cause
problems, because the different elements won’t be combined
correctly:
cols <- c(cols, "name.given")
fhir_melt(
    indexed_data_frame = df2,
    columns            = cols,
    brackets           = c("[", "]"), 
    sep                = " | ",
    all_columns        = TRUE
)
#   address.city address.country address.type address.use     id name.given
# 1 [1]Amsterdam  [1]Netherlands  [1]physical     [1]home [1]id1   [1]Marie
# 2      [1]Rome        [1]Italy  [1]physical     [1]home [1]id2   [1]Susie
# 3 [1]Stockholm       [1]Sweden    [1]postal     [1]work [1]id2       <NA>
# 4    [1]Berlin            <NA>         <NA>     [1]home [1]id3   [1]Frank
# 5         <NA>       [1]France    [1]postal        <NA> [1]id3     [1]Max
# 6    [1]London      [1]England    [1]postal     [1]work [1]id3       <NA>
#   resource_identifier
# 1                   1
# 2                   2
# 3                   2
# 4                   3
# 5                   3
# 6                   3
Instead, melt the attributes one after another:
cols <- fhir_common_columns(data_frame = df2, column_names_prefix = "address")
molten_1 <- fhir_melt(
    indexed_data_frame = df2,
    columns            = cols,
    brackets           = c("[", "]"),
    sep                = " | ",
    all_columns        = TRUE
)
molten_1
#   address.city address.country address.type address.use     id
# 1 [1]Amsterdam  [1]Netherlands  [1]physical     [1]home [1]id1
# 2      [1]Rome        [1]Italy  [1]physical     [1]home [1]id2
# 3 [1]Stockholm       [1]Sweden    [1]postal     [1]work [1]id2
# 4    [1]Berlin            <NA>         <NA>     [1]home [1]id3
# 5         <NA>       [1]France    [1]postal        <NA> [1]id3
# 6    [1]London      [1]England    [1]postal     [1]work [1]id3
#              name.given resource_identifier
# 1            [1.1]Marie                   1
# 2            [1.1]Susie                   2
# 3            [1.1]Susie                   2
# 4 [1.1]Frank | [2.1]Max                   3
# 5 [1.1]Frank | [2.1]Max                   3
# 6 [1.1]Frank | [2.1]Max                   3
molten_2 <- fhir_melt(
    indexed_data_frame = molten_1,
    columns            = "name.given",
    brackets           = c("[", "]"),
    sep                = " | ",
    all_columns        = TRUE
)
molten_2
#   address.city address.country address.type address.use     id name.given
# 1 [1]Amsterdam  [1]Netherlands  [1]physical     [1]home [1]id1   [1]Marie
# 2      [1]Rome        [1]Italy  [1]physical     [1]home [1]id2   [1]Susie
# 3 [1]Stockholm       [1]Sweden    [1]postal     [1]work [1]id2   [1]Susie
# 4    [1]Berlin            <NA>         <NA>     [1]home [1]id3   [1]Frank
# 5    [1]Berlin            <NA>         <NA>     [1]home [1]id3     [1]Max
# 6         <NA>       [1]France    [1]postal        <NA> [1]id3   [1]Frank
# 7         <NA>       [1]France    [1]postal        <NA> [1]id3     [1]Max
# 8    [1]London      [1]England    [1]postal     [1]work [1]id3   [1]Frank
# 9    [1]London      [1]England    [1]postal     [1]work [1]id3     [1]Max
#   resource_identifier
# 1                   1
# 2                   2
# 3                   3
# 4                   4
# 5                   4
# 6                   5
# 7                   5
# 8                   6
# 9                   6
This will give you the appropriate product of all multiple entries.
Once you have sorted out the multiple entries, you might want to get
rid of the indices in your table. This can be achieved using
fhir_rm_indices():
fhir_rm_indices(indexed_data_frame = molten_2, brackets = c("[", "]"))
#   address.city address.country address.type address.use  id name.given
# 1    Amsterdam     Netherlands     physical        home id1      Marie
# 2         Rome           Italy     physical        home id2      Susie
# 3    Stockholm          Sweden       postal        work id2      Susie
# 4       Berlin            <NA>         <NA>        home id3      Frank
# 5       Berlin            <NA>         <NA>        home id3        Max
# 6         <NA>          France       postal        <NA> id3      Frank
# 7         <NA>          France       postal        <NA> id3        Max
# 8       London         England       postal        work id3      Frank
# 9       London         England       postal        work id3        Max
#   resource_identifier
# 1                   1
# 2                   2
# 3                   3
# 4                   4
# 5                   4
# 6                   5
# 7                   5
# 8                   6
# 9                   6
Again, brackets should be given the
same character vector that was used for fhir_crack() and
fhir_melt().
If you want to melt all multiple entries in a table regardless of
their origin, you can use the function fhir_melt_all():
fhir_melt_all(indexed_data_frame = df2, brackets = c("[", "]"), sep = " | ")
#   address.city address.country address.type address.use  id name.given
# 1    Amsterdam     Netherlands     physical        home id1      Marie
# 2         Rome           Italy     physical        home id2      Susie
# 3    Stockholm          Sweden       postal        work id2      Susie
# 4       Berlin            <NA>         <NA>        home id3      Frank
# 5       Berlin            <NA>         <NA>        home id3        Max
# 6         <NA>          France       postal        <NA> id3      Frank
# 7         <NA>          France       postal        <NA> id3        Max
# 8       London         England       postal        work id3      Frank
# 9       London         England       postal        work id3        Max
This function performs the above steps automatically and repeatedly
calls fhir_melt() on groups of columns that belong to the
same FHIR element (e.g. address.city,
address.country and address.type) until every
cell contains a single value. If there is more than one FHIR element
with multiple values (e.g. multiple address elements and multiple name
elements), every possible combination of the two elements will appear in
the resulting table.
Caution! This creates something like a cross product of all values and can multiply the number of rows from the original table considerably.
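The growth in rows can be estimated before melting. The following base R sketch is a back-of-the-envelope calculation for the three example patients; the entries data frame is typed in by hand for illustration and is not produced by any fhircrackr function.

```r
# Number of address and name entries per patient in the example bundle
# (counted by hand from the xml shown above).
entries <- data.frame(
  id        = c("id1", "id2", "id3"),
  n_address = c(1L, 2L, 3L),
  n_name    = c(1L, 1L, 2L)
)

# Every address entry is combined with every name entry of the same patient.
entries$rows_after_melt <- entries$n_address * entries$n_name
entries

# Total number of rows in the fully molten table.
total_rows <- sum(entries$rows_after_melt)
total_rows
```

Patient id3 alone accounts for six of the nine rows, and every further repeating element multiplies the count again.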
Instead of spreading the entries across rows, you can also spread
them across columns. As you’ve seen above,
this can be achieved directly by setting format = "wide" in
fhir_crack(). If you already have a compact table, however,
you can turn it into a wide table afterwards with
fhir_cast(). It takes a compact table with
multiple entries and the brackets and separator that have been used in
fhir_crack() as input:
fhir_cast(df2, brackets = c("[", "]"), sep = " | ", verbose = 0)
#   [1.1]address.city [2.1]address.city [3.1]address.city [1.1]address.country
# 1         Amsterdam              <NA>              <NA>          Netherlands
# 2              Rome         Stockholm              <NA>                Italy
# 3            Berlin              <NA>            London                 <NA>
#   [2.1]address.country [3.1]address.country [1.1]address.type [2.1]address.type
# 1                 <NA>                 <NA>          physical              <NA>
# 2               Sweden                 <NA>          physical            postal
# 3               France              England              <NA>            postal
#   [3.1]address.type [1.1]address.use [2.1]address.use [3.1]address.use [1]id
# 1              <NA>             home             <NA>             <NA>   id1
# 2              <NA>             home             work             <NA>   id2
# 3            postal             home             <NA>             work   id3
#   [1.1]name.given [2.1]name.given
# 1           Marie            <NA>
# 2           Susie            <NA>
# 3           Frank             Max
Contrary to fhir_melt(), this function requires all
column names to reflect the XPath expression of the respective
attribute. The column containing information on
address/city, for example, has to be named
address.city because the information of the indices is
incorporated in those names to avoid duplicate column names. This column
naming scheme is automatically used when you don’t give explicit column
names in the table description/design for fhir_crack(), so
it makes sense to only cast tables that have automatically generated
column names.
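How the indices end up in the column names can be sketched in base R for a single column. This is only an illustration of the naming scheme seen in the output above, not the way fhir_cast() is implemented; all variable names are chosen for this example.

```r
# The address.city column of the indexed compact table, one cell per patient.
cells  <- c("[1.1]Amsterdam", "[1.1]Rome | [2.1]Stockholm", "[1.1]Berlin | [3.1]London")
column <- "address.city"

# Split every cell into its entries, then into bracketed index and value.
split_cells <- strsplit(cells, " | ", fixed = TRUE)
indices <- lapply(split_cells, function(x) sub("^(\\[[0-9.]+\\]).*$", "\\1", x))
values  <- lapply(split_cells, function(x) sub("^\\[[0-9.]+\\]", "", x))

# One wide column per distinct index, named index + original column name.
wide_names <- paste0(sort(unique(unlist(indices))), column)
wide <- matrix(
  NA_character_,
  nrow = length(cells),
  ncol = length(wide_names),
  dimnames = list(NULL, wide_names)
)

# Put every value into the column that carries its index.
for (i in seq_along(cells)) {
  wide[i, paste0(indices[[i]], column)] <- values[[i]]
}
wide
```

Because the index is combined with the original column name, a custom name such as city would yield wide names that no longer say which FHIR element the index refers to.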
The tables produced by fhir_crack(..., format = "wide")
and fhir_cast() can also be used to recreate the resources
that were cracked in the first place, as you’ll see in the vignette
about recreation of resources.
In some cases, you don’t want to split up multiple entries but collapse them into one value in a suitable way. Consider the following example bundle:
<Bundle>
    <type value='searchset'/>
    <entry>
    <resource>
        <Patient>
            <id value='id1'/>
            <name>
                <given value='Marie'/>
                <given value='Luise'/>
                <family value = 'Smith'/>
                <use value = 'official'/>
            </name>
            <name>
                <given value = 'Lea'/>
                <given value = 'Sophie'/>
                <given value = 'Anna'/>
                <family value = 'Baker'/>
                <use value = 'nickname'/>
            </name>
        </Patient>
     </resource>
  </entry>
  <entry>
    <resource>
        <Patient>
            <id value='id2'/>
            <name>
                <given value='Max'/>
                <family value = 'Brown'/>
                <use value = 'official'/>
            </name>
            <name>
                <given value = 'Anton'/>
                <given value = 'John'/>
                <family value = 'Jones'/>
                <use value = 'nickname'/>
            </name>
        </Patient>
    </resource>
  </entry>
</Bundle>
In this example, you would want to collapse all given names into one
value instead of dividing them across multiple rows. The official name
and the nickname, however, should stay separated. This can be achieved
with the function fhir_collapse(). First we crack the
example resources:
#unserialize example
bundles <- fhir_unserialize(bundles = example_bundles7)
#Define sep and brackets
sep <- "|"
brackets <- c("[", "]")
#crack fhir resources
table_desc <- fhir_table_description(
    resource = "Patient",
    brackets = brackets,
    sep = sep
)
df <- fhir_crack(bundles = bundles, design = table_desc, verbose = 0)
df
#       id           name.family
# 1 [1]id1 [1.1]Smith|[2.1]Baker
# 2 [1]id2 [1.1]Brown|[2.1]Jones
#                                             name.given
# 1 [1.1]Marie|[1.2]Luise|[2.1]Lea|[2.2]Sophie|[2.3]Anna
# 2                        [1.1]Max|[2.1]Anton|[2.2]John
#                      name.use
# 1 [1.1]official|[2.1]nickname
# 2 [1.1]official|[2.1]nickname
Then we collapse the given names. The function uses the information in the indices to make sure it only collapses given names within the same name element (official vs. nickname):
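As an aside, the role of the indices can be pictured with a small base R sketch. It is not how fhircrackr implements fhir_collapse(); the helper collapse_cell() and its glue argument are hypothetical names, and the sketch assumes single-character brackets. Entries are grouped by the part of the index before the last dot (the parent name element), and only entries within the same group are pasted together.

```r
# Illustrative sketch in base R; NOT the fhircrackr implementation.
# collapse_cell() and its glue argument are hypothetical names.
# Assumes single-character brackets.
collapse_cell <- function(cell, sep = "|", brackets = c("[", "]"), glue = " ") {
  # split the cell into its indexed entries
  entries <- strsplit(cell, sep, fixed = TRUE)[[1]]
  # position of the closing bracket in each entry
  close_pos <- as.integer(regexpr(brackets[2], entries, fixed = TRUE))
  # index without brackets, e.g. "1.1", "1.2", "2.1"
  index <- substr(entries, 2, close_pos - 1)
  # value after the index
  value <- substr(entries, close_pos + 1, nchar(entries))
  # parent index: everything before the last dot, e.g. "1", "1", "2"
  parent <- sub("\\.[^.]*$", "", index)
  # group entry positions by parent, keeping the original order
  groups <- split(seq_along(entries), factor(parent, levels = unique(parent)))
  # paste values within each group and keep the index of its first entry
  collapsed <- vapply(groups, function(i) {
    paste0(brackets[1], index[i[1]], brackets[2],
           paste(value[i], collapse = glue))
  }, character(1))
  # groups with different parents stay separated by sep
  paste(collapsed, collapse = sep)
}

collapse_cell("[1.1]Marie|[1.2]Luise|[2.1]Lea|[2.2]Sophie|[2.3]Anna")
# [1] "[1.1]Marie Luise|[2.1]Lea Sophie Anna"
```

With fhircrackr itself, the whole step is a single call to fhir_collapse():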
#name.given elements from the same name (i.e. the official vs. the nickname) 
#should be collapsed
df2 <- fhir_collapse(df, columns = "name.given", sep = sep, brackets = brackets)
df2
#       id           name.family                            name.given
# 1 [1]id1 [1.1]Smith|[2.1]Baker [1.1]Marie Luise [2.1]Lea Sophie Anna
# 2 [1]id2 [1.1]Brown|[2.1]Jones              [1.1]Max [2.1]Anton John
#                      name.use
# 1 [1.1]official|[2.1]nickname
# 2 [1.1]official|[2.1]nickname
After collapsing the given names, we can melt the table to split apart the official name and the nickname:
df2_molten <- fhir_melt(indexed_data_frame =  df2, 
                        brackets = brackets, 
                        sep = sep, 
                        columns = fhir_common_columns(df2,"name"),
                        all_columns = TRUE
                        )
df2_molten
#       id name.family         name.given    name.use resource_identifier
# 1 [1]id1    [1]Smith    [1]Marie Luise  [1]official                   1
# 2 [1]id1    [1]Baker [1]Lea Sophie Anna [1]nickname                   1
# 3 [1]id2    [1]Brown            [1]Max  [1]official                   2
# 4 [1]id2    [1]Jones      [1]Anton John [1]nickname                   2
And finish off by removing the indices: