Introduction to PHEindicatormethods

Georgina Anderson

Introduction

This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.

The following packages must be installed and loaded if not already available

library(PHEindicatormethods)
library(dplyr)

Package functions

This vignette covers the following core functions available within PHEindicatormethods:

Function Type Description
phe_proportion Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_rate Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_mean Aggregate Performs a calculation on each grouping set
calculate_dsr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRatio Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRate Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs

Other functions are introduced in separate vignettes.

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:

df <- data.frame(
        area = rep(c("Area1","Area2","Area3","Area4"), 2),
        year = rep(2015:2016, each = 4),
        obs = sample(100, 2 * 4, replace = TRUE),
        pop = sample(100:200, 2 * 4, replace = TRUE))
df
#>    area year obs pop
#> 1 Area1 2015  14 148
#> 2 Area2 2015  76 174
#> 3 Area3 2015  98 125
#> 4 Area4 2015  97 109
#> 5 Area1 2016   1 150
#> 6 Area2 2016  45 150
#> 7 Area3 2016  65 105
#> 8 Area4 2016  23 141

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop       value     lowercl    uppercl confidence
#> 1 Area1 2015  14 148 0.094594595 0.057185755 0.15251625        95%
#> 2 Area2 2015  76 174 0.436781609 0.365238623 0.51105569        95%
#> 3 Area3 2015  98 125 0.784000000 0.703972958 0.84709190        95%
#> 4 Area4 2015  97 109 0.889908257 0.817377280 0.93589196        95%
#> 5 Area1 2016   1 150 0.006666667 0.001177801 0.03679284        95%
#> 6 Area2 2016  45 150 0.300000000 0.232408294 0.37757980        95%
#> 7 Area3 2016  65 105 0.619047619 0.523517011 0.70617488        95%
#> 8 Area4 2016  23 141 0.163120567 0.111224465 0.23288598        95%
#>         statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence = 99.8)
#>    area year obs pop       value      lowercl   uppercl confidence
#> 1 Area1 2015  14 148 0.094594595 0.0430418185 0.1952930      99.8%
#> 2 Area2 2015  76 174 0.436781609 0.3268909758 0.5532504      99.8%
#> 3 Area3 2015  98 125 0.784000000 0.6523743704 0.8753123      99.8%
#> 4 Area4 2015  97 109 0.889908257 0.7642746772 0.9527252      99.8%
#> 5 Area1 2016   1 150 0.006666667 0.0005819015 0.0718065      99.8%
#> 6 Area2 2016  45 150 0.300000000 0.1992211268 0.4247201      99.8%
#> 7 Area3 2016  65 105 0.619047619 0.4685581525 0.7496880      99.8%
#> 8 Area4 2016  23 141 0.163120567 0.0890127738 0.2799657      99.8%
#>         statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson

# specify multiplier to output proportions as percentages
phe_proportion(df, obs, pop, multiplier = 100)
#>    area year obs pop      value    lowercl   uppercl confidence  statistic
#> 1 Area1 2015  14 148  9.4594595  5.7185755 15.251625        95% percentage
#> 2 Area2 2015  76 174 43.6781609 36.5238623 51.105569        95% percentage
#> 3 Area3 2015  98 125 78.4000000 70.3972958 84.709190        95% percentage
#> 4 Area4 2015  97 109 88.9908257 81.7377280 93.589196        95% percentage
#> 5 Area1 2016   1 150  0.6666667  0.1177801  3.679284        95% percentage
#> 6 Area2 2016  45 150 30.0000000 23.2408294 37.757980        95% percentage
#> 7 Area3 2016  65 105 61.9047619 52.3517011 70.617488        95% percentage
#> 8 Area4 2016  23 141 16.3120567 11.1224465 23.288598        95% percentage
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify multiplier for proportion, confidence level and remove metadata columns
phe_proportion(df, obs, pop, confidence = 99.8, multiplier = 100, type = "standard")
#>    area year obs pop      value     lowercl  uppercl
#> 1 Area1 2015  14 148  9.4594595  4.30418185 19.52930
#> 2 Area2 2015  76 174 43.6781609 32.68909758 55.32504
#> 3 Area3 2015  98 125 78.4000000 65.23743704 87.53123
#> 4 Area4 2015  97 109 88.9908257 76.42746772 95.27252
#> 5 Area1 2016   1 150  0.6666667  0.05819015  7.18065
#> 6 Area2 2016  45 150 30.0000000 19.92211268 42.47201
#> 7 Area3 2016  65 105 61.9047619 46.85581525 74.96880
#> 8 Area4 2016  23 141 16.3120567  8.90127738 27.99657

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop      value     lowercl    uppercl confidence
#> 1 Area1 2015  14 148  9459.4595  5167.25047  15872.296        95%
#> 2 Area2 2015  76 174 43678.1609 34412.14672  54670.448        95%
#> 3 Area3 2015  98 125 78400.0000 63647.38292  95545.409        95%
#> 4 Area4 2015  97 109 88990.8257 72163.80324 108562.320        95%
#> 5 Area1 2016   1 150   666.6667    16.87854   3714.429        95%
#> 6 Area2 2016  45 150 30000.0000 21880.16900  40143.308        95%
#> 7 Area3 2016  65 105 61904.7619 47774.46449  78903.941        95%
#> 8 Area4 2016  23 141 16312.0567 10337.14233  24477.170        95%
#>         statistic method
#> 1 rate per 100000  Byars
#> 2 rate per 100000  Byars
#> 3 rate per 100000  Byars
#> 4 rate per 100000  Byars
#> 5 rate per 100000  Exact
#> 6 rate per 100000  Byars
#> 7 rate per 100000  Byars
#> 8 rate per 100000  Byars

# specify multiplier for rate and confidence level
phe_rate(df, obs, pop, confidence = 99.8, multiplier = 100)
#>    area year obs pop      value      lowercl    uppercl confidence    statistic
#> 1 Area1 2015  14 148  9.4594595 3.483328e+00  20.204526      99.8% rate per 100
#> 2 Area2 2015  76 174 43.6781609 2.980402e+01  61.499290      99.8% rate per 100
#> 3 Area3 2015  98 125 78.4000000 5.617104e+01 106.107951      99.8% rate per 100
#> 4 Area4 2015  97 109 88.9908257 6.364223e+01 120.623331      99.8% rate per 100
#> 5 Area1 2016   1 150  0.6666667 6.670002e-04   6.155609      99.8% rate per 100
#> 6 Area2 2016  45 150 30.0000000 1.803481e+01  46.574638      99.8% rate per 100
#> 7 Area3 2016  65 105 61.9047619 4.083834e+01  89.524087      99.8% rate per 100
#> 8 Area4 2016  23 141 16.3120567 7.752330e+00  29.828996      99.8% rate per 100
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Exact
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify multiplier for rate, confidence level and remove metadata columns
phe_rate(df, obs, pop, type = "standard", confidence = 99.8, multiplier = 100)
#>    area year obs pop      value      lowercl    uppercl
#> 1 Area1 2015  14 148  9.4594595 3.483328e+00  20.204526
#> 2 Area2 2015  76 174 43.6781609 2.980402e+01  61.499290
#> 3 Area3 2015  98 125 78.4000000 5.617104e+01 106.107951
#> 4 Area4 2015  97 109 88.9908257 6.364223e+01 120.623331
#> 5 Area1 2016   1 150  0.6666667 6.670002e-04   6.155609
#> 6 Area2 2016  45 150 30.0000000 1.803481e+01  46.574638
#> 7 Area3 2016  65 105 61.9047619 4.083834e+01  89.524087
#> 8 Area4 2016  23 141 16.3120567 7.752330e+00  29.828996

These functions can also return aggregate data if the input dataframes are grouped:

# default proportion - grouped
df %>%
  group_by(year) %>%
  phe_proportion(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop value lowercl uppercl confidence statistic       method
#>   <int> <int> <int> <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   285   556 0.513   0.471   0.554 95%        proportion of 1 Wilson
#> 2  2016   134   546 0.245   0.211   0.283 95%        proportion of 1 Wilson

# default rate - grouped
df %>%
  group_by(year) %>%
  phe_rate(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop  value lowercl uppercl confidence statistic       method
#>   <int> <int> <int>  <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   285   556 51259.  45480.  57569. 95%        rate per 100000 Byars 
#> 2  2016   134   546 24542.  20563.  29067. 95%        rate per 100000 Byars



Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified

# default mean
phe_mean(df,obs)
#>   value_sum value_count   stdev  value  lowercl  uppercl confidence statistic
#> 1       419           8 37.4545 52.375 21.06226 83.68774        95%      mean
#>                     method
#> 1 Student's t-distribution

# multiple means in a single execution with 99.8% confidence
df %>%
    group_by(year) %>%
        phe_mean(obs, confidence = 0.998)
#> # A tibble: 2 × 10
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl confidence statistic
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl> <chr>      <chr>    
#> 1  2015       285           4  39.5  71.2   -130.    273. 99.8%      mean     
#> 2  2016       134           4  27.6  33.5   -108.    175. 99.8%      mean     
#> # ℹ 1 more variable: method <chr>

# multiple means in a single execution with 99.8% confidence and data-only output
df %>%
    group_by(year) %>%
        phe_mean(obs, type = "standard", confidence = 0.998)
#> # A tibble: 2 × 7
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl>
#> 1  2015       285           4  39.5  71.2   -130.    273.
#> 2  2016       134           4  27.6  33.5   -108.    175.

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

df_std <- data.frame(
            area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
            year = rep(2006:2010, each = 19 * 2),
            sex = rep(rep(c("Male", "Female"), each = 19), 5),
            ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
                           50,55,60,65,70,75,80,85,90), times = 10),
            obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
            pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
head(df_std)
#>    area year  sex ageband obs   pop
#> 1 Area1 2006 Male       0  98 19501
#> 2 Area1 2006 Male       5  86 10608
#> 3 Area1 2006 Male      10 105 19243
#> 4 Area1 2006 Male      15   9 19771
#> 5 Area1 2006 Male      20 116 10870
#> 6 Area1 2006 Male      25 107 14685

Execute calculate_dsr

INPUT: The minimum input requirement for the calculate_dsr function is a single data frame with columns representing the numerators and denominators and standard populations for each standardisation category. The standard populations must be appended to the input data frame by the user prior to execution of the function. The 2013 European Standard Population is provided within the package in vector form (esp2013), which you can join to your dataset. Alternative standard populations can be used but must be provided by the user.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output. It is also possible to calculate CIs when we can’t assume events are independent - further details can be found in the DSR vignette.

Here are some example code chunks to demonstrate the calculate_dsr function and the arguments that can optionally be specified


# Append the standard populations to the data frame
# calculate separate dsrs for each area, year and sex
df_std %>%
    mutate(refpop = rep(esp2013, 40)) %>%
    group_by(area, year, sex) %>%
    calculate_dsr(obs,pop, stdpop = refpop)
#> # A tibble: 40 × 11
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        2052    285877  792.    756.    829. 95%       
#>  2 Area1  2006 Male          1614    289807  663.    629.    699. 95%       
#>  3 Area1  2007 Female        2118    291239  746.    713.    780. 95%       
#>  4 Area1  2007 Male          1963    281201  719.    686.    754. 95%       
#>  5 Area1  2008 Female        1880    273984  716.    682.    751. 95%       
#>  6 Area1  2008 Male          2127    300111  706.    674.    738. 95%       
#>  7 Area1  2009 Female        1747    283821  580.    550.    610. 95%       
#>  8 Area1  2009 Male          2159    281163  811.    774.    849. 95%       
#>  9 Area1  2010 Female        1877    278715  732.    697.    768. 95%       
#> 10 Area1  2010 Male          1810    286097  668.    635.    702. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>


# Append the standard populations to the data frame
# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
    mutate(refpop = rep(esp2013, 40)) %>%
    group_by(area, year, sex) %>%
    calculate_dsr(obs,pop, stdpop = refpop, type = "standard")
#> # A tibble: 40 × 8
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        2052    285877  792.    756.    829.
#>  2 Area1  2006 Male          1614    289807  663.    629.    699.
#>  3 Area1  2007 Female        2118    291239  746.    713.    780.
#>  4 Area1  2007 Male          1963    281201  719.    686.    754.
#>  5 Area1  2008 Female        1880    273984  716.    682.    751.
#>  6 Area1  2008 Male          2127    300111  706.    674.    738.
#>  7 Area1  2009 Female        1747    283821  580.    550.    610.
#>  8 Area1  2009 Male          2159    281163  811.    774.    849.
#>  9 Area1  2010 Female        1877    278715  732.    697.    768.
#> 10 Area1  2010 Male          1810    286097  668.    635.    702.
#> # ℹ 30 more rows

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
  filter(ageband <= 70) %>%
  mutate(refpop = rep(esp2013[1:15], 40)) %>%
  group_by(area, year, sex) %>%
  calculate_dsr(obs, pop, stdpop = refpop)
#> # A tibble: 40 × 11
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        1858    234685  843.    804.    884. 95%       
#>  2 Area1  2006 Male          1399    227444  701.    663.    740. 95%       
#>  3 Area1  2007 Female        1727    227760  758.    722.    795. 95%       
#>  4 Area1  2007 Male          1508    219439  712.    676.    749. 95%       
#>  5 Area1  2008 Female        1626    220197  749.    712.    787. 95%       
#>  6 Area1  2008 Male          1696    235777  717.    682.    752. 95%       
#>  7 Area1  2009 Female        1204    226379  543.    512.    576. 95%       
#>  8 Area1  2009 Male          1684    221656  819.    779.    860. 95%       
#>  9 Area1  2010 Female        1606    218730  752.    714.    791. 95%       
#> 10 Area1  2010 Male          1430    217660  687.    651.    724. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

Execute calculate_ISRatio and calculate_ISRate

INPUT: These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (ISRate only), the indirectly standardised rate or ratio, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

df_ref <- df_std %>%
    filter(year == 2006) %>%
    group_by(ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last")
    
head(df_ref)
#> # A tibble: 6 × 3
#>   ageband   obs    pop
#>     <dbl> <int>  <int>
#> 1       0   780 122561
#> 2       5   877 100412
#> 3      10   498 124111
#> 4      15   680 118401
#> 5      20   938 131301
#> 6      25   578 111035

Here are some example code chunks to demonstrate the calculate_ISRatio function and the arguments that can optionally be specified

# calculate separate smrs for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl confidence
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female     2052    1890. 1.09    1.04    1.13  95%       
#>  2 Area1  2006 Male       1614    1904. 0.848   0.807   0.890 95%       
#>  3 Area1  2007 Female     2118    1981. 1.07    1.02    1.12  95%       
#>  4 Area1  2007 Male       1963    1856. 1.06    1.01    1.11  95%       
#>  5 Area1  2008 Female     1880    1822. 1.03    0.986   1.08  95%       
#>  6 Area1  2008 Male       2127    2041. 1.04    0.998   1.09  95%       
#>  7 Area1  2009 Female     1747    1906. 0.917   0.874   0.961 95%       
#>  8 Area1  2009 Male       2159    1869. 1.15    1.11    1.20  95%       
#>  9 Area1  2010 Female     1877    1859. 1.01    0.964   1.06  95%       
#> 10 Area1  2010 Male       1810    1932. 0.937   0.894   0.981 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate the same smrs by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, refobs, refpop, refpoptype = "field",
                      type = "standard")
#> # A tibble: 40 × 8
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     2052    1890. 1.09    1.04    1.13 
#>  2 Area1  2006 Male       1614    1904. 0.848   0.807   0.890
#>  3 Area1  2007 Female     2118    1981. 1.07    1.02    1.12 
#>  4 Area1  2007 Male       1963    1856. 1.06    1.01    1.11 
#>  5 Area1  2008 Female     1880    1822. 1.03    0.986   1.08 
#>  6 Area1  2008 Male       2127    2041. 1.04    0.998   1.09 
#>  7 Area1  2009 Female     1747    1906. 0.917   0.874   0.961
#>  8 Area1  2009 Male       2159    1869. 1.15    1.11    1.20 
#>  9 Area1  2010 Female     1877    1859. 1.01    0.964   1.06 
#> 10 Area1  2010 Male       1810    1932. 0.937   0.894   0.981
#> # ℹ 30 more rows

The calculate_ISRate function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:

# calculate separate indirectly standardised rates for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 12
#> # Groups:   area, year, sex [40]
#>    area   year sex   observed expected ref_rate value lowercl uppercl confidence
#>    <chr> <int> <chr>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema…     2052    1890.     666.  723.    692.    755. 95%       
#>  2 Area1  2006 Male      1614    1904.     666.  565.    537.    593. 95%       
#>  3 Area1  2007 Fema…     2118    1981.     666.  712.    682.    743. 95%       
#>  4 Area1  2007 Male      1963    1856.     666.  704.    674.    736. 95%       
#>  5 Area1  2008 Fema…     1880    1822.     666.  687.    656.    719. 95%       
#>  6 Area1  2008 Male      2127    2041.     666.  694.    665.    724. 95%       
#>  7 Area1  2009 Fema…     1747    1906.     666.  611.    582.    640. 95%       
#>  8 Area1  2009 Male      2159    1869.     666.  769.    737.    802. 95%       
#>  9 Area1  2010 Fema…     1877    1859.     666.  672.    642.    703. 95%       
#> 10 Area1  2010 Male      1810    1932.     666.  624.    596.    653. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate the same indirectly standardised rates by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, refobs, refpop, refpoptype = "field",
                     type = "standard")
#> # A tibble: 40 × 9
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected ref_rate value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     2052    1890.     666.  723.    692.    755.
#>  2 Area1  2006 Male       1614    1904.     666.  565.    537.    593.
#>  3 Area1  2007 Female     2118    1981.     666.  712.    682.    743.
#>  4 Area1  2007 Male       1963    1856.     666.  704.    674.    736.
#>  5 Area1  2008 Female     1880    1822.     666.  687.    656.    719.
#>  6 Area1  2008 Male       2127    2041.     666.  694.    665.    724.
#>  7 Area1  2009 Female     1747    1906.     666.  611.    582.    640.
#>  8 Area1  2009 Male       2159    1869.     666.  769.    737.    802.
#>  9 Area1  2010 Female     1877    1859.     666.  672.    642.    703.
#> 10 Area1  2010 Male       1810    1932.     666.  624.    596.    653.
#> # ℹ 30 more rows