Pseudonymising livestock movement and holding data
To help address concerns around commercial sensitivity, movenet includes a range of functions to make livestock movement data and/or holding data non-identifiable. This vignette shows you how to use these.
library(movenet)
# Load a combined movement and holding config file:
load_config(system.file("configurations", "fakeScotEID_combined_config.yml", package="movenet"))
#> Successfully loaded config file: C:/Users/cboga/OneDrive - University of Glasgow/Documents/R/win-library/4.1/movenet/configurations/fakeScotEID_combined_config.yml
# Load example movenet-format movement and holding tibbles into the global environment:
data(example_movement_data, package = "movenet")
data(example_holding_data, package = "movenet")
Pseudonymising holding identifiers
The function anonymise() pseudonymises holding identifiers in movenet-format movement or holding data tibbles by replacing them with a number and an optional prefix (e.g. “FARM”).
It returns a pseudonymised data tibble and the applied pseudonymisation key. This key can optionally be saved to recover the original identifiers at a later date, or for application to an overlapping dataset.
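To make the idea of a pseudonymisation key concrete, here is a minimal base-R sketch. It is not movenet's implementation: pseudonymise_ids() is a hypothetical helper, and shuffling identifiers before numbering them is an assumption of this sketch (the key printed below does not follow the row order of the data). In the sketch, the key is a named character vector mapping original identifiers to pseudonyms.

```r
# Hypothetical helper for illustration only; not part of movenet.
pseudonymise_ids <- function(ids, prefix = "") {
  unique_ids <- unique(ids)
  # Shuffle (an assumption) so that pseudonym numbers do not reveal input order
  shuffled <- unique_ids[sample.int(length(unique_ids))]
  # Named character vector: names are original IDs, values are pseudonyms
  key <- setNames(paste0(prefix, seq_along(shuffled)), shuffled)
  # Look up each ID by name; unname() drops the original IDs from the result
  list(data = unname(key[ids]), key = key)
}

set.seed(1)
ids <- c("95/216/1100", "19/818/9098", "95/216/1100")
result <- pseudonymise_ids(ids, prefix = "FARM")
result$data  # the repeated ID receives the same pseudonym both times
result$key
```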
# Pseudonymise movement_data by changing identifiers to FARM1-N:
pseudonymised <- anonymise(example_movement_data, prefix = "FARM")
pseudonymised_movement_data <- pseudonymised$data
pseudonymisation_key <- pseudonymised$key
head(pseudonymised_movement_data) # Inspect pseudonymised movement data
#> # A tibble: 6 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <dbl>
#> 1 FARM152 FARM216 2019-02-08 97 304781
#> 2 FARM186 FARM466 2019-08-15 167 229759
#> 3 FARM435 FARM70 2019-09-15 115 36413
#> 4 FARM438 FARM337 2019-10-26 125 488616
#> 5 FARM292 FARM164 2019-10-17 109 581785
#> 6 FARM46 FARM373 2019-10-06 72 564911
head(pseudonymisation_key) # Inspect pseudonymisation key
#> 96/999/4677 79/642/5562 86/867/7476 75/345/2020 76/613/8076 67/158/5432
#> "FARM1" "FARM2" "FARM3" "FARM4" "FARM5" "FARM6"
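The key printed above looks like a named character vector, with original identifiers as names and pseudonyms as values. Assuming that structure, a saved key can be inverted with base R's match() to recover the original identifiers. recover_ids() is a hypothetical helper, and the two-entry key is copied from the output above purely for illustration.

```r
# Two entries copied from the key printed above (assumed named character vector)
key <- c("96/999/4677" = "FARM1", "79/642/5562" = "FARM2")

# Hypothetical helper: find each pseudonym among the key's values and
# return the matching name, i.e. the original identifier (NA if absent)
recover_ids <- function(pseudonyms, key) {
  names(key)[match(pseudonyms, key)]
}

recover_ids(c("FARM2", "FARM1", "FARM2"), key)
#> [1] "79/642/5562" "96/999/4677" "79/642/5562"
```

Provided the real key has the structure assumed here, the same call could be applied to a pseudonymised column such as pseudonymised_movement_data$departure_cph.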
anonymise() also takes an optional key argument, with which you can apply an existing pseudonymisation key to the data tibble:
# Use the same key from above to substitute holding identifiers in holding_data:
pseudonymised_holding <- anonymise(example_holding_data, key = pseudonymisation_key)
pseudonymised_holding_data <- pseudonymised_holding$data
# Update saved key, in case additional identifiers were added from the holding datafile:
pseudonymisation_key <- pseudonymised_holding$key
head(pseudonymised_holding_data) # Inspect pseudonymised holding data
#> # A tibble: 6 x 4
#> cph holding_type herd_size coordinates
#> <chr> <chr> <dbl> <POINT [°]>
#> 1 FARM202 GXFSR 2111 (3.718568 52.69096)
#> 2 FARM213 SCHZQ 2134 (-4.959035 51.88195)
#> 3 FARM460 HEJDE 2140 (5.709143 51.97547)
#> 4 FARM70 IQALL 2141 (-2.983365 57.55851)
#> 5 FARM238 YUFUC 2148 (5.586477 50.50902)
#> 6 FARM330 IATKP 2151 (-0.3323066 58.84295)
This allows multiple datasets to be pseudonymised in a consistent way, so that it is possible to subsequently merge the datasets by pseudonymised identifier.
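Consistent pseudonymisation across datasets comes down to reusing one key and only numbering identifiers it has not seen before. The base-R sketch below illustrates that idea on toy tables. extend_key() is a hypothetical helper; it assumes the existing pseudonyms are numbered 1 to N without gaps and, for simplicity, numbers new identifiers in order of appearance rather than randomly.

```r
# Hypothetical helper: keep existing entries and append pseudonyms for new IDs.
# Assumes the existing key is numbered prefix1..prefixN without gaps.
extend_key <- function(ids, key = character(0), prefix = "") {
  new_ids <- setdiff(unique(ids), names(key))
  if (length(new_ids) > 0) {
    new_numbers <- length(key) + seq_along(new_ids)
    key <- c(key, setNames(paste0(prefix, new_numbers), new_ids))
  }
  key
}

movements <- data.frame(departure = c("A", "B"), destination = c("B", "C"),
                        qty = c(10, 20), stringsAsFactors = FALSE)
holdings <- data.frame(holding = c("C", "D"), herd_size = c(100, 200),
                       stringsAsFactors = FALSE)

# Build the key from the movement data, then extend it with the holding data
key <- extend_key(c(movements$departure, movements$destination), prefix = "FARM")
key <- extend_key(holdings$holding, key, prefix = "FARM")

# Substitute identifiers in both tables using the same key
movements$departure   <- unname(key[movements$departure])
movements$destination <- unname(key[movements$destination])
holdings$holding      <- unname(key[holdings$holding])

# Shared pseudonyms mean the tables can still be joined
merged <- merge(movements, holdings, by.x = "destination", by.y = "holding")
merged
```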
Modifying dates, weights, and optional numeric data columns
movenet also has functions to modify movement dates, weights, or other numeric columns by adding a small amount of random noise (jittering) or by rounding:
jitter_dates(data, range) adds random noise of up to range days to movement dates.
jitter_weights(data, range, column) adds random noise of up to range to a numeric column in the movement data, by default the “weight” column (qty_pigs in the example data).
round_dates(data, unit, week_start, sum_weight, ...) rounds movement dates down to the first day of the specified time unit. For rounding down to weeks, set the starting day of the week with week_start. By default, weights are aggregated for all movements between the same pair of holdings over the indicated time unit (sum_weight = TRUE); to keep movements separate, set sum_weight = FALSE. Alternative or additional summary functions can be applied through ..., using tidy evaluation rules.
round_weights(data, unit, column) rounds data in a numeric column, by default the “weight” column, to the nearest multiple of unit.
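The two jittering operations can be sketched in base R as follows. These helpers are hypothetical and are not movenet's code. Drawing whole-day date offsets uniformly from -range to +range, and drawing continuous uniform noise for weights, are assumptions made for illustration (the jittered weights shown below are non-integer, which is consistent with continuous noise).

```r
# Hypothetical helpers for illustration; not movenet's implementation.
jitter_dates_sketch <- function(dates, range) {
  # Whole-day offsets drawn uniformly from -range..range (range assumed a whole number)
  dates + sample(-range:range, length(dates), replace = TRUE)
}

jitter_values_sketch <- function(x, range) {
  # Continuous noise drawn uniformly from [-range, range]
  x + runif(length(x), min = -range, max = range)
}

set.seed(42)
dates <- as.Date(c("2019-02-08", "2019-08-15", "2019-09-15"))
weights <- c(97, 167, 115)
jittered_dates <- jitter_dates_sketch(dates, range = 5)
jittered_weights <- jitter_values_sketch(weights, range = 10)
```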
# Add jitter of up to ±5 days to movement dates:
movedata_datesj5 <- jitter_dates(example_movement_data, range = 5)
head(example_movement_data) # Inspect original
#> # A tibble: 6 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <dbl>
#> 1 95/216/1100 19/818/9098 2019-02-08 97 304781
#> 2 69/196/5890 71/939/3228 2019-08-15 167 229759
#> 3 52/577/5349 82/501/8178 2019-09-15 115 36413
#> 4 39/103/5541 13/282/1763 2019-10-26 125 488616
#> 5 41/788/6464 57/418/6011 2019-10-17 109 581785
#> 6 69/393/9398 39/947/2201 2019-10-06 72 564911
head(movedata_datesj5) # Inspect jittered dates
#> # A tibble: 6 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <dbl>
#> 1 95/216/1100 19/818/9098 2019-02-06 97 304781
#> 2 69/196/5890 71/939/3228 2019-08-10 167 229759
#> 3 52/577/5349 82/501/8178 2019-09-14 115 36413
#> 4 39/103/5541 13/282/1763 2019-10-31 125 488616
#> 5 41/788/6464 57/418/6011 2019-10-19 109 581785
#> 6 69/393/9398 39/947/2201 2019-10-05 72 564911
# Add jitter of up to ±10 to movement weights:
movedata_weightsj10 <- jitter_weights(example_movement_data, range = 10)
head(movedata_weightsj10) # Inspect jittered weights
#> # A tibble: 6 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <dbl>
#> 1 95/216/1100 19/818/9098 2019-02-08 106. 304781
#> 2 69/196/5890 71/939/3228 2019-08-15 174. 229759
#> 3 52/577/5349 82/501/8178 2019-09-15 122. 36413
#> 4 39/103/5541 13/282/1763 2019-10-26 116. 488616
#> 5 41/788/6464 57/418/6011 2019-10-17 103. 581785
#> 6 69/393/9398 39/947/2201 2019-10-06 74.8 564911
# Round movement dates down to the first day of the month, but do not aggregate:
movedata_months <- round_dates(example_movement_data, unit = "month", sum_weight = FALSE)
head(movedata_months) # Inspect rounded dates
#> # A tibble: 6 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <dbl>
#> 1 95/216/1100 19/818/9098 2019-02-01 97 304781
#> 2 69/196/5890 71/939/3228 2019-08-01 167 229759
#> 3 52/577/5349 82/501/8178 2019-09-01 115 36413
#> 4 39/103/5541 13/282/1763 2019-10-01 125 488616
#> 5 41/788/6464 57/418/6011 2019-10-01 109 581785
#> 6 69/393/9398 39/947/2201 2019-10-01 72 564911
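Rounding a date down to the first day of its month, as in the output above, can be reproduced in base R; a weekly version additionally needs a starting day. Both helpers are hypothetical sketches rather than movenet's code, and coding week_start as 1 = Monday through 7 = Sunday is an assumption (see ?round_dates for the convention movenet actually expects).

```r
# Hypothetical helpers for illustration; not movenet's implementation.
floor_to_month <- function(dates) {
  # Keep year and month, replace the day with 01
  as.Date(format(dates, "%Y-%m-01"))
}

floor_to_week <- function(dates, week_start = 1) {
  # as.POSIXlt()$wday counts 0 = Sunday to 6 = Saturday; with week_start coded
  # 1 = Monday to 7 = Sunday, the modulo maps 7 onto Sunday as well
  days_since_start <- (as.POSIXlt(dates)$wday - week_start) %% 7
  dates - days_since_start
}

dates <- as.Date(c("2019-02-08", "2019-10-26"))
floor_to_month(dates)
#> [1] "2019-02-01" "2019-10-01"
floor_to_week(dates, week_start = 1)
#> [1] "2019-02-04" "2019-10-21"
```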
# Round movement dates down to the first day of the month, aggregate weights, and list reference numbers:
movedata_months_aggr <- round_dates(example_movement_data, unit = "month", sum_weight = TRUE,
movement_reference = list(movement_reference))
# Inspect aggregated record for holdings which have 2 movements in the same month:
movedata_months_aggr[which(sapply(movedata_months_aggr$movement_reference, length) == 2),]
#> # A tibble: 1 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <list>
#> 1 57/427/5455 21/771/7140 2019-09-01 156 <dbl [2]>
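With sum_weight = TRUE, movements between the same pair of holdings in the same period are collapsed into one row, and the extra movement_reference = list(movement_reference) argument keeps all contributing reference numbers as a list column. The base-R sketch below mimics that grouping on toy data so the mechanics are visible; it is an illustration only, not how round_dates() is implemented.

```r
# Toy movements: two from A to C in September, one from B to C
moves <- data.frame(
  from = c("A", "A", "B"),
  to   = c("C", "C", "C"),
  date = as.Date(c("2019-09-03", "2019-09-20", "2019-09-10")),
  qty  = c(50, 70, 40),
  ref  = c(101, 102, 103),
  stringsAsFactors = FALSE
)

# Round dates down to the first day of the month
moves$month <- as.Date(format(moves$date, "%Y-%m-01"))

# One grouping key per holding pair and month
group <- paste(moves$from, moves$to, moves$month)

# Sum weights and collect reference numbers within each group
summed_qty <- tapply(moves$qty, group, sum)
ref_lists  <- split(moves$ref, group)

aggregated <- unique(moves[, c("from", "to", "month")])
agg_group <- paste(aggregated$from, aggregated$to, aggregated$month)
aggregated$qty <- as.numeric(summed_qty[agg_group])
aggregated$ref <- unname(ref_lists[agg_group])  # list column, like <dbl [2]> above
aggregated
```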
# Round movement reference numbers to the nearest multiple of 10:
movedata_ref10 <- round_weights(example_movement_data, unit = 10, column = "movement_reference")
head(movedata_ref10) # Inspect rounded movement reference numbers
#> # A tibble: 6 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <dbl>
#> 1 95/216/1100 19/818/9098 2019-02-08 97 304780
#> 2 69/196/5890 71/939/3228 2019-08-15 167 229760
#> 3 52/577/5349 82/501/8178 2019-09-15 115 36410
#> 4 39/103/5541 13/282/1763 2019-10-26 125 488620
#> 5 41/788/6464 57/418/6011 2019-10-17 109 581780
#> 6 69/393/9398 39/947/2201 2019-10-06 72 564910
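The rounded reference numbers above match what base R's round(x / unit) * unit gives, including 581785 becoming 581780 rather than 581790: for an exact half, R's round() follows the IEC 60559 convention of rounding to the even digit. Whether round_weights() uses exactly this expression is an assumption; the hypothetical helper below only illustrates the arithmetic.

```r
# Hypothetical helper: round to the nearest multiple of unit (base R round())
round_to_multiple <- function(x, unit) {
  round(x / unit) * unit
}

round_to_multiple(c(304781, 229759, 581785), unit = 10)
#> [1] 304780 229760 581780
```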
Modifying holding coordinates
A function to resample holding coordinates in a density-dependent manner is under development within the hexscape package.