Anonymise data by replacing holding identifiers with prefix-integer combinations

anonymise() anonymises a holding or movement data frame by replacing holding identifiers with prefix-integer combinations. Both the anonymised data frame and the anonymisation key are returned. By default, a new anonymisation key is generated; alternatively, an existing key can be provided.

Usage

anonymise(data, prefix = NULL, key = NULL)

Arguments

data: A holding or movement data frame.
prefix: Character string that forms the basis of the anonymised holding identifiers. An integer is appended to the prefix to form each new identifier.
key: A named character vector to be used as anonymisation key, or NULL (default) to generate a new key. A provided key should have original holding identifiers as names, and new (anonymised) identifiers as values.

Value

A named list with two elements:

data containing the anonymised data frame
key containing the applied anonymisation key. This has the form of a named character vector, with original holding identifiers as names, and new (anonymised) identifiers as values.
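Because the key stores original identifiers as names and anonymised identifiers as values, it can be inverted with base R to map anonymised identifiers back to the originals. The following is a minimal sketch under that assumption; the identifiers are made up for illustration and `reverse_key` is a hypothetical name, not part of movenet.

```r
# Made-up key in the documented form: names = original ids, values = new ids
key <- c("11/111/1111" = "FARM_2", "22/222/2222" = "FARM_1")

# Invert the key: anonymised ids become names, original ids become values
reverse_key <- stats::setNames(names(key), key)

# Look up original identifiers by anonymised identifier
original_ids <- unname(reverse_key[c("FARM_1", "FARM_2")])
# original_ids is c("22/222/2222", "11/111/1111")
```

This only works as long as the key is kept; without it, the anonymised identifiers cannot be traced back.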

Details

The appropriate config file must be loaded before calling anonymise(), so that the column(s) in data containing holding identifiers can be identified: the origin (from) and destination (to) columns for movement data, or the id column for holding data.

If key is NULL (default), a new anonymisation key is generated: each holding receives a new identifier consisting of prefix followed by an integer between 1 and the total number of holdings. Integers are assigned to holdings in random order.
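The key generation described above can be sketched in base R as follows. This is an illustration only, not movenet's actual implementation: `make_key` is a hypothetical helper, and the identifiers are made up.

```r
# Hypothetical helper (not part of movenet): build a new key for a set of ids
make_key <- function(ids, prefix = "") {
  ids <- unique(as.character(ids))
  # sample.int(n) returns the integers 1..n in random order
  new_ids <- paste0(prefix, sample.int(length(ids)))
  # Original identifiers become names, new identifiers become values
  stats::setNames(new_ids, ids)
}

set.seed(1)  # only to make this illustration reproducible
key <- make_key(c("11/111/1111", "22/222/2222", "33/333/3333"),
                prefix = "FARM_")
key
```

Each original identifier ends up with exactly one of FARM_1 to FARM_3, and which holding gets which integer depends on the random draw.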

If an existing key is provided, its coverage of the holding identifiers in data is checked. If all holding identifiers in data are present among the element names of key, the key is used for anonymisation as-is: each holding identifier in data is replaced with the value of the element of the same name in key. Otherwise, if data contains holding identifiers that are not present in key, the key is expanded with new prefix-integer combinations for those identifiers.
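A rough base R sketch of this coverage check and expansion is given below. `expand_key` is a hypothetical helper, not a movenet function. How movenet chooses integers for new entries is not stated here, so the sketch assumes that the existing key already uses the integers 1 to length(key) and that new integers continue from there.

```r
# Hypothetical helper (not part of movenet): check coverage, expand if needed
expand_key <- function(ids, key, prefix = "") {
  ids <- unique(as.character(ids))
  missing_ids <- setdiff(ids, names(key))
  if (length(missing_ids) == 0) {
    return(key)  # full coverage: the key is used as-is
  }
  # Assumption: existing entries use 1..length(key); continue after those,
  # assigning the new integers in random order
  new_ints <- length(key) + sample.int(length(missing_ids))
  c(key, stats::setNames(paste0(prefix, new_ints), missing_ids))
}

key <- c("11/111/1111" = "FARM_1", "22/222/2222" = "FARM_2")
ids <- c("22/222/2222", "33/333/3333")  # second id is not covered by key

expanded <- expand_key(ids, key, prefix = "FARM_")
# Replace identifiers by looking up elements of the same name in the key
anonymised_ids <- unname(expanded[ids])
# anonymised_ids is c("FARM_2", "FARM_3")
```

Identifiers already in the key keep their existing replacement, so data anonymised earlier with the same key remains consistent with the newly anonymised data.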

Examples

# Set-up: Save movenet environment with current configurations
movenetenv <- movenet:::movenetenv
old_config <- movenetenv$options

# Load a combined config file
load_config(system.file("configurations", "fakeScotEID_combined_config.yml",
                        package = "movenet"))
#> Successfully loaded config file: C:/Users/cboga/AppData/Local/Temp/Rtmp25xhub/temp_libpath11484970381d/movenet/configurations/fakeScotEID_combined_config.yml

# Pseudonymise holding data by replacing identifiers with the integers
# 1 to N, assigned in random order
pseudonymised_holdings <- anonymise(example_holding_data)
head(pseudonymised_holdings$data)
#> # A tibble: 6 x 4
#>   cph   holding_type herd_size           coordinates
#>   <chr> <chr>            <dbl>           <POINT [°]>
#> 1 270   GXFSR             2111   (3.718568 52.69096)
#> 2 195   SCHZQ             2134  (-4.959035 51.88195)
#> 3 26    HEJDE             2140   (5.709143 51.97547)
#> 4 251   IQALL             2141  (-2.983365 57.55851)
#> 5 209   YUFUC             2148   (5.586477 50.50902)
#> 6 387   IATKP             2151 (-0.3323066 58.84295)

# Pseudonymise holding data by replacing identifiers with FARM_1 to FARM_N
pseudonymised_holdings <- anonymise(example_holding_data, prefix = "FARM_")
head(pseudonymised_holdings$data)
#> # A tibble: 6 x 4
#>   cph      holding_type herd_size           coordinates
#>   <chr>    <chr>            <dbl>           <POINT [°]>
#> 1 FARM_12  GXFSR             2111   (3.718568 52.69096)
#> 2 FARM_46  SCHZQ             2134  (-4.959035 51.88195)
#> 3 FARM_282 HEJDE             2140   (5.709143 51.97547)
#> 4 FARM_11  IQALL             2141  (-2.983365 57.55851)
#> 5 FARM_393 YUFUC             2148   (5.586477 50.50902)
#> 6 FARM_75  IATKP             2151 (-0.3323066 58.84295)
head(pseudonymised_holdings$key)
#> 64/745/9830 99/478/5120 27/243/2267 36/885/8878 81/736/4987 73/588/4268 
#>    "FARM_1"    "FARM_2"    "FARM_3"    "FARM_4"    "FARM_5"    "FARM_6" 

# Save the pseudonymisation key for later use
pseudonymisation_key <- pseudonymised_holdings$key

# Pseudonymise movement data using the previously generated key
pseudonymised_movements <-
  anonymise(example_movement_data, key = pseudonymisation_key)
head(pseudonymised_movements$data)
#> # A tibble: 6 x 5
#>   departure_cph dest_cph departure_date qty_pigs movement_reference
#>   <chr>         <chr>    <date>            <dbl>              <dbl>
#> 1 FARM_59       FARM_284 2019-02-08           97             304781
#> 2 FARM_244      FARM_297 2019-08-15          167             229759
#> 3 FARM_347      FARM_11  2019-09-15          115              36413
#> 4 FARM_377      FARM_333 2019-10-26          125             488616
#> 5 FARM_397      FARM_371 2019-10-17          109             581785
#> 6 FARM_74       FARM_438 2019-10-06           72             564911

# Clean-up: Reinstate previous configurations
movenetenv$options <- old_config
rm("old_config", "movenetenv", "pseudonymised_holdings",
   "pseudonymisation_key", "pseudonymised_movements")