Anonymise data by replacing holding identifiers with prefix-integer combinations
anonymise.Rd
anonymise() anonymises a holding or movement data frame by replacing
holding identifiers with prefix-integer combinations. Both the anonymised
data frame and the anonymisation key are returned. By default, a new
anonymisation key is generated; alternatively, an existing key can be
provided.
Arguments
- data
A holding or movement data frame.
- prefix
Character string, to form the basis of anonymised holding identifiers. An integer will be appended to form this new identifier.
- key
A named character vector to be used as anonymisation key, or NULL (default) to generate a new key. A provided key should have original holding identifiers as names, and new (anonymised) identifiers as values.
Value
A named list with two elements:
- data
The anonymised data frame.
- key
The applied anonymisation key. This has the form of a named character vector, with original holding identifiers as names, and new (anonymised) identifiers as values.
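To illustrate the structure of the key, the following base R sketch builds a small key by hand and uses it as a lookup table in both directions. It does not call movenet; the identifiers and the object names (original_ids, reverse_key, and so on) are made up for the illustration.

```r
# Hypothetical key: names are original identifiers, values are anonymised ones
key <- c("11/111/1111" = "FARM_1", "22/222/2222" = "FARM_2")

# Forward lookup: original identifier -> anonymised identifier
original_ids <- c("22/222/2222", "11/111/1111", "22/222/2222")
anonymised_ids <- unname(key[original_ids])

# Reverse lookup: swap names and values to recover the original identifiers
reverse_key <- setNames(names(key), unname(key))
recovered_ids <- unname(reverse_key[anonymised_ids])
```

Because the key allows this reverse lookup, anyone holding it can recover the original identifiers, so it probably needs to be stored as carefully as the original data.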
Details
Requires that the appropriate config file is loaded, to identify the column(s) in data that contain(s) holding identifiers: origin (from) and destination (to) columns for movement data, or the id column for holding data.
If key is NULL (default), a new anonymisation key is generated, with holdings being given new identifiers consisting of prefix followed by an integer ranging between 1 and the total number of holdings. Integers are assigned to holdings in a random order.
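The key generation described above can be sketched in base R as follows. This is a minimal illustration, not movenet's actual implementation: make_key() is a hypothetical helper, and the empty default for prefix is an assumption.

```r
# Hypothetical helper (not part of movenet): build a new key for a set of ids
make_key <- function(ids, prefix = "") {
  ids <- unique(as.character(ids))
  shuffled <- sample(ids)  # put the holdings in a random order
  # Number the shuffled holdings 1..N and prepend the prefix
  setNames(paste0(prefix, seq_along(shuffled)), shuffled)
}

set.seed(1)  # only to make this sketch reproducible
key <- make_key(c("A", "B", "C", "A"), prefix = "FARM_")
```

Each distinct holding receives exactly one identifier, and together the identifiers cover FARM_1 to FARM_N with no gaps.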
If an existing key is provided, its coverage of holding identifiers in data is checked. If all holding identifiers in data are present among element names in key, the key is used for anonymisation as-is: holding identifiers in data are replaced with the values of elements of the same name in key. Otherwise, if data contains holding identifiers that are not present in key, the key is expanded with additional prefix-integer combinations for those identifiers.
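The coverage check and key expansion can be sketched along the following lines. apply_key() is a hypothetical helper, not movenet's implementation. In particular, how the package chooses integers for the added entries is not stated here; this sketch assumes it picks the lowest-numbered prefix-integer combinations not already used in the key.

```r
# Hypothetical helper (not part of movenet): apply a key, expanding it if needed
apply_key <- function(ids, key, prefix = "") {
  ids <- as.character(ids)
  missing_ids <- setdiff(unique(ids), names(key))
  if (length(missing_ids) > 0) {
    # Assumption: candidate identifiers are prefix + 1..(old + new entries);
    # any candidate already used as a value in the key is dropped
    candidates <- paste0(prefix, seq_len(length(key) + length(missing_ids)))
    candidates <- setdiff(candidates, unname(key))
    new_entries <- setNames(candidates[seq_along(missing_ids)], missing_ids)
    key <- c(key, new_entries)
  }
  list(ids = unname(key[ids]), key = key)
}

key <- c(A = "FARM_1", B = "FARM_2")
# The key covers all identifiers: it is used as-is
full <- apply_key(c("B", "A"), key, prefix = "FARM_")
# "C" is not in the key: the key is expanded before use
expanded <- apply_key(c("A", "C", "B", "C"), key, prefix = "FARM_")
```

Existing entries are never altered in this sketch, so identifiers assigned with an earlier version of the key stay stable after expansion.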
See also
Other Privacy-enhancing functions: create_anonymisation_effect_analysis_report(), jitter_dates(), jitter_weights(), round_dates(), round_weights()
Examples
# Set-up: Save movenet environment with current configurations
movenetenv <- movenet:::movenetenv
old_config <- movenetenv$options
# Load a combined config file
load_config(system.file("configurations", "fakeScotEID_combined_config.yml",
                        package = "movenet"))
#> Successfully loaded config file: C:/Users/cboga/AppData/Local/Temp/Rtmp25xhub/temp_libpath11484970381d/movenet/configurations/fakeScotEID_combined_config.yml
# Pseudonymise holding data by replacing identifiers with random consecutive
# integers from 1-N
pseudonymised_holdings <- anonymise(example_holding_data)
head(pseudonymised_holdings$data)
#> # A tibble: 6 x 4
#> cph holding_type herd_size coordinates
#> <chr> <chr> <dbl> <POINT [°]>
#> 1 270 GXFSR 2111 (3.718568 52.69096)
#> 2 195 SCHZQ 2134 (-4.959035 51.88195)
#> 3 26 HEJDE 2140 (5.709143 51.97547)
#> 4 251 IQALL 2141 (-2.983365 57.55851)
#> 5 209 YUFUC 2148 (5.586477 50.50902)
#> 6 387 IATKP 2151 (-0.3323066 58.84295)
# Pseudonymise holding data by replacing identifiers with FARM_1 to FARM_N
pseudonymised_holdings <- anonymise(example_holding_data, prefix = "FARM_")
head(pseudonymised_holdings$data)
#> # A tibble: 6 x 4
#> cph holding_type herd_size coordinates
#> <chr> <chr> <dbl> <POINT [°]>
#> 1 FARM_12 GXFSR 2111 (3.718568 52.69096)
#> 2 FARM_46 SCHZQ 2134 (-4.959035 51.88195)
#> 3 FARM_282 HEJDE 2140 (5.709143 51.97547)
#> 4 FARM_11 IQALL 2141 (-2.983365 57.55851)
#> 5 FARM_393 YUFUC 2148 (5.586477 50.50902)
#> 6 FARM_75 IATKP 2151 (-0.3323066 58.84295)
head(pseudonymised_holdings$key)
#> 64/745/9830 99/478/5120 27/243/2267 36/885/8878 81/736/4987 73/588/4268
#> "FARM_1" "FARM_2" "FARM_3" "FARM_4" "FARM_5" "FARM_6"
# Save the pseudonymisation key for later use
pseudonymisation_key <- pseudonymised_holdings$key
# Pseudonymise movement data using the previously generated key
pseudonymised_movements <-
  anonymise(example_movement_data, key = pseudonymisation_key)
head(pseudonymised_movements$data)
#> # A tibble: 6 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <dbl>
#> 1 FARM_59 FARM_284 2019-02-08 97 304781
#> 2 FARM_244 FARM_297 2019-08-15 167 229759
#> 3 FARM_347 FARM_11 2019-09-15 115 36413
#> 4 FARM_377 FARM_333 2019-10-26 125 488616
#> 5 FARM_397 FARM_371 2019-10-17 109 581785
#> 6 FARM_74 FARM_438 2019-10-06 72 564911
# Clean-up: Reinstate previous configurations
movenetenv$options <- old_config
rm("old_config", "movenetenv", "pseudonymised_holdings",
   "pseudonymisation_key", "pseudonymised_movements")