In some situations, you may want to use encodefrom() to
collapse values, that is, to group several unique raw values into a smaller
set of clean values and labels. For example, say you have the following data set,
which gives each state’s census division number and name:
| id | state | cendiv | cendiv_name | 
|---|---|---|---|
| 1 | AL | 6 | East South Central | 
| 2 | AK | 9 | Pacific | 
| 3 | AZ | 8 | Mountain | 
| 4 | AR | 7 | West South Central | 
| 5 | CA | 9 | Pacific | 
| 6 | CO | 8 | Mountain | 
| 7 | CT | 1 | New England | 
| 8 | DE | 5 | South Atlantic | 
| 10 | FL | 5 | South Atlantic | 
| 12 | HI | 9 | Pacific | 
| 14 | IL | 3 | East North Central | 
| 15 | IN | 3 | East North Central | 
| 16 | IA | 4 | West North Central | 
| 31 | NJ | 2 | Middle Atlantic | 
| 33 | NY | 2 | Middle Atlantic | 
Instead of using the nine census divisions, you would rather group states by census region. You have the following crosswalk:
| cendiv | cenreg | cenregnm | 
|---|---|---|
| 1 | 1 | Northeast | 
| 2 | 1 | Northeast | 
| 3 | 2 | Midwest | 
| 4 | 2 | Midwest | 
| 5 | 3 | South | 
| 6 | 3 | South | 
| 7 | 3 | South | 
| 8 | 4 | West | 
| 9 | 4 | West | 
As long as (1) raw values are unique in the crosswalk and (2) the clean and
label columns have a 1:1 match, you can use encodefrom() to collapse
categories as you move from raw to clean values.
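The two conditions above can be checked before encoding. The following is a minimal sketch in base R, not part of the package; the helper names raw_unique, pairs, and one_to_one are hypothetical and used only for illustration, and the crosswalk is rebuilt as a plain data frame so the block runs on its own.

```r
## crosswalk with the same values used in this example
cw <- data.frame(cendiv = 1:9,
                 cenreg = c(1,1,2,2,3,3,3,4,4),
                 cenregnm = c('Northeast','Northeast','Midwest','Midwest',
                              'South','South','South','West','West'))

## condition 1: each raw value appears only once in the crosswalk
raw_unique <- anyDuplicated(cw$cendiv) == 0

## condition 2: clean and label columns match 1:1, meaning that among the
## distinct (clean, label) pairs no clean value and no label repeats
pairs <- unique(cw[, c('cenreg', 'cenregnm')])
one_to_one <- anyDuplicated(pairs$cenreg) == 0 &&
    anyDuplicated(pairs$cenregnm) == 0

## stop early if either condition fails
stopifnot(raw_unique, one_to_one)
```

If either check fails, the crosswalk likely needs to be fixed before the collapse can be trusted.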
## data
df <- tibble(id = c(1:8,10,12,14:16,31,33),
             state = c('AL','AK','AZ','AR','CA','CO','CT','DE','FL','HI',
                       'IL','IN','IA','NJ','NY'),
             cendiv = c(6,9,8,7,9,8,1,5,5,9,3,3,4,2,2),
             cendiv_name = c('East South Central','Pacific','Mountain',
                             'West South Central','Pacific','Mountain','New England',
                             'South Atlantic','South Atlantic','Pacific',
                             'East North Central','East North Central',
                             'West North Central','Middle Atlantic','Middle Atlantic'))
             
## crosswalk
cw <- tibble(cendiv = 1:9,
             cenreg = c(1,1,2,2,3,3,3,4,4),
             cenregnm = c('Northeast','Northeast','Midwest','Midwest',
                          'South','South','South','West','West'))

## encode new column
df <- df %>%
    mutate(cenreg = encodefrom(., var = cendiv, cw_file = cw, raw = cendiv,
                               clean = cenreg, label = cenregnm))

## # A tibble: 15 × 5
##       id state cendiv cendiv_name        cenreg       
##    <dbl> <chr>  <dbl> <chr>              <dbl+lbl>    
##  1     1 AL         6 East South Central 3 [South]    
##  2     2 AK         9 Pacific            4 [West]     
##  3     3 AZ         8 Mountain           4 [West]     
##  4     4 AR         7 West South Central 3 [South]    
##  5     5 CA         9 Pacific            4 [West]     
##  6     6 CO         8 Mountain           4 [West]     
##  7     7 CT         1 New England        1 [Northeast]
##  8     8 DE         5 South Atlantic     3 [South]    
##  9    10 FL         5 South Atlantic     3 [South]    
## 10    12 HI         9 Pacific            4 [West]     
## 11    14 IL         3 East North Central 2 [Midwest]  
## 12    15 IN         3 East North Central 2 [Midwest]  
## 13    16 IA         4 West North Central 2 [Midwest]  
## 14    31 NJ         2 Middle Atlantic    1 [Northeast]
## 15    33 NY         2 Middle Atlantic    1 [Northeast]
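To see what the collapse amounts to, the same many-to-one lookup can be sketched in base R with match(). This is only an illustration of the idea, not a description of how encodefrom() works internally, and it returns a factor rather than the labelled column shown above. The names pairs and cenreg_f are hypothetical.

```r
## raw values from the example data
cendiv <- c(6,9,8,7,9,8,1,5,5,9,3,3,4,2,2)

## crosswalk with the same values used in this example
cw <- data.frame(cendiv = 1:9,
                 cenreg = c(1,1,2,2,3,3,3,4,4),
                 cenregnm = c('Northeast','Northeast','Midwest','Midwest',
                              'South','South','South','West','West'))

## find each raw value's row in the crosswalk and take its clean value;
## several raw values map to the same clean value, which is the collapse
cenreg <- cw$cenreg[match(cendiv, cw$cendiv)]

## attach labels using the distinct (clean, label) pairs
pairs <- unique(cw[, c('cenreg', 'cenregnm')])
cenreg_f <- factor(cenreg, levels = pairs$cenreg, labels = pairs$cenregnm)

## count states per region
table(cenreg_f)
```

The nine division codes reduce to four region codes, and every row keeps a region because each raw value appears exactly once in the crosswalk.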