In the example we will use the same dataset as in the Blocking records for record linkage vignette.
## reclin2 package

The package provides the function `pair_ann`, which integrates with the
reclin2 package. It works as follows.

```r
pair_ann(x = census[1:1000],
         y = cis[1:1000],
         on = c("pername1", "pername2", "sex", "dob_day", "dob_mon", "dob_year", "enumcap", "enumpc"),
         deduplication = FALSE) |>
  head()
```

| .x | .y | block |
|---|---|---|
| 204 | 1 | 1 |
| 204 | 176 | 1 |
| 204 | 375 | 1 |
| 204 | 391 | 1 |
| 204 | 405 | 1 |
| 204 | 424 | 1 |

The result lists the candidate pairs and so tells you the total number
of pairs to compare. It can be passed on to the rest of the reclin2
pipeline (note that we use a different ANN algorithm this time).

```r
# Candidate pairs from ANN blocking, this time with the HNSW algorithm
pair_ann(x = census[1:1000],
         y = cis[1:1000],
         on = c("pername1", "pername2", "sex", "dob_day", "dob_mon", "dob_year", "enumcap", "enumpc"),
         deduplication = FALSE,
         ann = "hnsw") |>
  # Compare the paired records on each variable with Jaro-Winkler similarity
  compare_pairs(on = c("pername1", "pername2", "sex", "dob_day", "dob_mon", "dob_year", "enumcap", "enumpc"),
                comparators = list(cmp_jarowinkler())) |>
  # Sum the per-variable similarities into a single score
  score_simple("score",
               on = c("pername1", "pername2", "sex", "dob_day", "dob_mon", "dob_year", "enumcap", "enumpc")) |>
  # Flag pairs whose score passes the threshold of 6
  select_threshold("threshold", score = "score", threshold = 6) |>
  # Build the linked data set from the flagged pairs
  link(selection = "threshold") |>
  head()
```

| .y | .x | person_id.x | pername1.x | pername2.x | sex.x | dob_day.x | dob_mon.x | dob_year.x | hse_num | enumcap.x | enumpc.x | str_nam | cap_add | census_id | x | txt.x | person_id.y | pername1.y | pername2.y | sex.y | dob_day.y | dob_mon.y | dob_year.y | enumcap.y | enumpc.y | cis_id | y | txt.y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 945 | DE256NG039003 | HARRIET | THOMSON | F | 12 | 1 | 1995 | 39 | 39 SPRINGFIELD ROAD | DE256NG | Springfield Road | 39, Springfield Road | CENSDE256NG039003 | 945 | HARRIETTHOMSONF121199539 SPRINGFIELD ROADDE256NG | DE256NG039003 | HARRIET | THOMSON | F | 12 | 1 | | 39 SPRINGFIELD ROAD | DE256NG | CISDE256NG039003 | 11 | HARRIETTHOMSONF12139 SPRINGFIELD ROADDE256NG |
| 71 | 427 | DE159QA062001 | LEWIS | GREEN | M | 23 | 3 | 1973 | 62 | 62 CHURCH ROAD | DE159QA | Church Road | 62, Church Road | CENSDE159QA062001 | 427 | LEWISGREENM233197362 CHURCH ROADDE159QA | DE159QA062001 | LEWIS | GREEN | M | 23 | 3 | | 62 CHURCH ROAD | DE159QA | CISDE159QA062001 | 71 | LEWISGREENM23362 CHURCH ROADDE159QA |
| 83 | 720 | DE237GG025002 | IMOGEN | DARIS | F | 6 | 4 | 1968 | 25 | 25 WOODLANDS ROAD | DE237GG | Woodlands Road | 25, Woodlands Road | CENSDE237GG025002 | 720 | IMOGENDARISF64196825 WOODLANDS ROADDE237GG | DE237GG025002 | IMOGEW | DAVIS | F | 6 | 4 | | 25 WOODLANDS ROAD | DE237GG | CISDE237GG025002 | 83 | IMOGEWDAVISF6425 WOODLANDS ROADDE237GG |
| 99 | 136 | DE125LU022001 | DANIEC | MICCER | M | 21 | 4 | 1947 | 22 | 22 PARK LANE | DE125LU | Park Lane | 22, Park Lane | CENSDE125LU022001 | 136 | DANIECMICCERM214194722 PARK LANEDE125LU | DE125LU022001 | DAMIEL | HILLER | M | 21 | 4 | | 22 PARK LANE | DE125LU | CISDE125LU022001 | 99 | DAMIELHILLERM21422 PARK LANEDE125LU |
| 154 | 949 | DE256NG040002 | CHLOE | WILSON | F | 5 | 7 | 1978 | 40 | 40 SPRINGFIELD ROAD | DE256NG | Springfield Road | 40, Springfield Road | CENSDE256NG040002 | 949 | CHLOEWILSONF57197840 SPRINGFIELD ROADDE256NG | DE256NG040002 | CHLOE | WILSOM | F | 5 | 7 | | 40 SPRINGFIELD ROAD | DE256NG | CISDE256NG040002 | 154 | CHLOEWILSOMF5740 SPRINGFIELD ROADDE256NG |
| 156 | 549 | DE159QY035002 | AVA | KING | F | 7 | 7 | 1969 | 35 | 35 CHURCH ROAD | DE159QY | Church Road | 35, Church Road | CENSDE159QY035002 | 549 | AVAKINGF77196935 CHURCH ROADDE159QY | DE159QY035002 | AVA | KING | F | 7 | 7 | | 35 CHURCH ROAD | DE159QY | CISDE159QY035002 | 156 | AVAKINGF7735 CHURCH ROADDE159QY |
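
Before running the comparison step it can be useful to check how much blocking has shrunk the problem. The sketch below uses a made-up table with the same `.x`, `.y` and `block` columns that `pair_ann` returns; the values and the record counts `n_x` and `n_y` are hypothetical and are not taken from the census/cis data. The reduction ratio computed here is the usual share of the full comparison space that blocking avoids, offered as an illustration rather than as part of the package's output.

```r
# Toy candidate pairs shaped like pair_ann() output (.x, .y, block);
# the values are made up for illustration.
pairs <- data.frame(
  .x    = c(1, 1, 2, 3, 3, 4),
  .y    = c(1, 2, 2, 3, 4, 5),
  block = c(1, 1, 1, 2, 2, 3)
)
n_x <- 5  # hypothetical number of records in x
n_y <- 5  # hypothetical number of records in y

# Total number of candidate pairs
n_pairs <- nrow(pairs)

# Number of candidate pairs within each block
pairs_per_block <- table(pairs$block)

# Share of the n_x * n_y possible comparisons that blocking avoids
reduction_ratio <- 1 - n_pairs / (n_x * n_y)
```

A reduction ratio close to 1 means that most of the possible comparisons were skipped.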
## fastLink package

Use the `block` column as the blocking variable in
`fastLink::blockData()`. The result is a list of blocked records for
further processing.
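
One way to prepare that input, sketched here under assumptions, is to copy the block identifier from a pair table onto both data sets so that they share a `block` column. The data frames `dfA` and `dfB`, the pair table and all of its values are hypothetical. The `blockData()` call is left commented out because it needs fastLink installed; it assumes the column name is passed through the `varnames` argument.

```r
# Toy data sets and a toy pair table (.x indexes dfA, .y indexes dfB);
# all values are hypothetical.
dfA <- data.frame(name = c("ANNA", "JOHN", "MARY"))
dfB <- data.frame(name = c("ANA", "JON", "MARY"))
pairs <- data.frame(.x = c(1, 2, 3), .y = c(1, 2, 3), block = c(1, 1, 2))

# Copy the block identifier onto the records of each data set.
# Records that appear in no pair would keep NA.
dfA$block <- NA_real_
dfB$block <- NA_real_
dfA$block[pairs$.x] <- pairs$block
dfB$block[pairs$.y] <- pairs$block

# Not run: block both data sets on the shared column with fastLink.
# blocks <- fastLink::blockData(dfA, dfB, varnames = "block")
```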
## RecordLinkage package

Pass the `block` column to the `blockfld` argument of
`compare.dedup()` or `compare.linkage()`. Note that for the
RecordLinkage package the `block` column should be stored as a
character vector, not a numeric/integer vector.
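
A minimal sketch of that conversion, assuming a toy data frame `records` whose `block` column holds hypothetical block identifiers. The column is converted with `as.character()` and then referenced by its position in `blockfld`. The call is guarded so that the snippet also runs when RecordLinkage is not installed.

```r
# Toy records with a numeric block identifier; all values are hypothetical.
records <- data.frame(
  name  = c("ANNA", "ANA", "ANNE", "JOHN", "JON"),
  block = c(1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Store the block column as character, as RecordLinkage expects.
records$block <- as.character(records$block)

# Position of the block column, used as the blocking field.
block_col <- match("block", names(records))

# Compare only records that share a block (if RecordLinkage is available).
if (requireNamespace("RecordLinkage", quietly = TRUE)) {
  rpairs <- RecordLinkage::compare.dedup(records, blockfld = block_col)
  head(rpairs$pairs)
}
```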