Enhancing the privacy of livestock movement and holding data
To help address concerns around commercial sensitivity, movenet includes a range of functions to make livestock movement data and/or holding data non-identifiable. This vignette shows you how to use these.
Setting up
To get started, first load the movenet package with library(movenet).
The example datasets used in this vignette are example_movement_data and example_holding_data, both provided with the package. These are tibbles in movenet format, containing movement and holding data, respectively. Inspect these datasets to get a feel for them.
# Inspect the first lines of the example movenet-format movement and holding tibbles:
head(example_movement_data)
#> # A tibble: 6 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <dbl>
#> 1 95/216/1100 19/818/9098 2019-02-08 97 304781
#> 2 69/196/5890 71/939/3228 2019-08-15 167 229759
#> 3 52/577/5349 82/501/8178 2019-09-15 115 36413
#> 4 39/103/5541 13/282/1763 2019-10-26 125 488616
#> 5 41/788/6464 57/418/6011 2019-10-17 109 581785
#> 6 69/393/9398 39/947/2201 2019-10-06 72 564911
head(example_holding_data)
#> # A tibble: 6 x 4
#> cph holding_type herd_size coordinates
#> <chr> <chr> <dbl> <POINT [°]>m
#> 1 68/575/1991 GXFSR 2111 (3.718568 52.69096)
#> 2 51/469/9863 SCHZQ 2134 (-4.959035 51.88195)
#> 3 32/532/8560 HEJDE 2140 (5.709143 51.97547)
#> 4 82/501/8178 IQALL 2141 (-2.983365 57.55851)
#> 5 29/675/4499 YUFUC 2148 (5.586477 50.50902)
#> 6 59/516/9442 IATKP 2151 (-0.3323066 58.84295)
Then, load a configuration file that can be used with these example datasets. This configuration file tells movenet which columns contain which data types, so that the correct columns get modified when using the privacy-enhancing functions. For more details on configuration files, see vignette("movenet") and vignette("configurations").
# Load a combined movement and holding config file:
load_config(system.file("configurations", "fakeScotEID_combined_config.yml", package="movenet"))
#> Successfully loaded config file: C:/Users/cboga/AppData/Local/Temp/Rtmp25xhub/temp_libpath11484970381d/movenet/configurations/fakeScotEID_combined_config.yml
Pseudonymising holding identifiers
The function anonymise() pseudonymises holding identifiers in movenet-format movement or holding data tibbles by replacing these identifiers with a number and an optional prefix (e.g. “FARM”).
It returns a pseudonymised data tibble and the applied pseudonymisation key. This key can optionally be saved to recover the original identifiers at a later date, or for application to an overlapping dataset.
# Pseudonymise movement_data by changing identifiers to FARM1-N:
pseudonymised <- anonymise(example_movement_data, prefix = "FARM")
pseudonymised_movement_data <- pseudonymised$data
pseudonymisation_key <- pseudonymised$key
head(pseudonymised_movement_data) # Inspect pseudonymised movement data
#> # A tibble: 6 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <dbl>
#> 1 FARM244 FARM77 2019-02-08 97 304781
#> 2 FARM61 FARM255 2019-08-15 167 229759
#> 3 FARM261 FARM395 2019-09-15 115 36413
#> 4 FARM5 FARM220 2019-10-26 125 488616
#> 5 FARM113 FARM119 2019-10-17 109 581785
#> 6 FARM167 FARM480 2019-10-06 72 564911
head(pseudonymisation_key) # Inspect pseudonymisation key
#> 47/396/4417 54/504/3274 31/473/4857 36/885/8878 39/103/5541 36/308/8021
#> "FARM1" "FARM2" "FARM3" "FARM4" "FARM5" "FARM6"
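As the printout above shows, the key is a named character vector with the original identifiers as names and the pseudonyms as values. Recovering original identifiers therefore amounts to inverting that vector. The following is a minimal base-R sketch, using only the first three entries of the key shown above; the vector pseudonymised_ids is a made-up example input, and none of this is a movenet function.

```r
# First three entries of the key printed above:
# names are original identifiers, values are pseudonyms.
key <- c("47/396/4417" = "FARM1",
         "54/504/3274" = "FARM2",
         "31/473/4857" = "FARM3")

# Invert the key so that the pseudonyms become the names.
reverse_key <- setNames(names(key), unname(key))

# Made-up pseudonymised values to translate back.
pseudonymised_ids <- c("FARM3", "FARM1", "FARM3")
original_ids <- unname(reverse_key[pseudonymised_ids])
original_ids
```

Because the key makes the pseudonymisation reversible, it should be stored at least as securely as the original data.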
anonymise() also takes an optional key argument, with which you can apply an existing pseudonymisation key to the data tibble:
# Use the same key from above to substitute holding identifiers in holding_data:
pseudonymised_holding <- anonymise(example_holding_data, key = pseudonymisation_key)
pseudonymised_holding_data <- pseudonymised_holding$data
# Update saved key, in case additional identifiers were added from the holding datafile:
pseudonymisation_key <- pseudonymised_holding$key
head(pseudonymised_holding_data) # Inspect pseudonymised holding data
#> # A tibble: 6 x 4
#> cph holding_type herd_size coordinates
#> <chr> <chr> <dbl> <POINT [°]>m
#> 1 FARM499 GXFSR 2111 (3.718568 52.69096)
#> 2 FARM163 SCHZQ 2134 (-4.959035 51.88195)
#> 3 FARM191 HEJDE 2140 (5.709143 51.97547)
#> 4 FARM395 IQALL 2141 (-2.983365 57.55851)
#> 5 FARM64 YUFUC 2148 (5.586477 50.50902)
#> 6 FARM176 IATKP 2151 (-0.3323066 58.84295)
This allows multiple datasets to be pseudonymised in a consistent way, so that it is possible to subsequently merge the datasets by pseudonymised identifier.
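To illustrate why a shared key makes merging possible, here is a small base-R sketch of the idea. It does not use movenet: extend_key() is a hypothetical helper, the data frames and identifiers are made up, and pseudonyms are assigned sequentially for simplicity, which may differ from how anonymise() assigns them.

```r
# Hypothetical helper (not part of movenet): add any identifiers that are
# missing from an existing key, continuing the numbering.
extend_key <- function(ids, key = character(0), prefix = "FARM") {
  new_ids <- setdiff(unique(ids), names(key))
  if (length(new_ids) == 0) return(key)
  new_values <- paste0(prefix, length(key) + seq_along(new_ids))
  c(key, setNames(new_values, new_ids))
}

# Toy movement and holding data with made-up identifiers.
movements <- data.frame(from = c("A", "B"), to = c("B", "C"), qty = c(10, 20),
                        stringsAsFactors = FALSE)
holdings <- data.frame(id = c("C", "D"), herd_size = c(100, 200),
                       stringsAsFactors = FALSE)

key <- extend_key(c(movements$from, movements$to))  # covers A, B, C
key <- extend_key(holdings$id, key)                 # adds D only

# Apply the same key to both tables.
movements$from <- unname(key[movements$from])
movements$to <- unname(key[movements$to])
holdings$id <- unname(key[holdings$id])

# Holding "C" carries the same pseudonym in both tables, so a join works.
merged <- merge(movements, holdings, by.x = "to", by.y = "id")
merged
```

If each table had been pseudonymised with its own independent key, the same holding could receive different pseudonyms and the join would silently match the wrong rows or none at all.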
Modifying dates, weights, and optional numeric data columns
movenet also has functions to modify movement dates or weights by applying a small amount of noise (jittering) or by rounding:
jitter_dates(data, range) adds random noise of up to range days to movement dates.
jitter_weights(data, range, column) adds random noise of up to range to a numeric column in the movement data, by default the “weight” column.
round_dates(data, unit, week_start, sum_weight, ...) rounds movement dates down to the first day of the specified time unit. For rounding down to weeks, set the starting day of the week with week_start. By default, weights are aggregated for all movements between the same holdings over the indicated time unit (sum_weight = TRUE); to keep movements separate, set sum_weight = FALSE. Alternative or additional summary functions can be applied through ..., using tidy evaluation rules.
round_weights(data, unit, column) rounds data in a numeric column, by default the “weight” column, to multiples of unit.
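For readers unfamiliar with jittering, the sketch below shows the general idea in base R. It is an illustration under assumptions, not movenet's implementation: it assumes whole-day offsets drawn uniformly for dates and continuous uniform noise for weights, and the variable names are made up. The input values are the first three rows of the example movement data.

```r
set.seed(42)  # fix the random seed so the sketch is reproducible

dates <- as.Date(c("2019-02-08", "2019-08-15", "2019-09-15"))
weights <- c(97, 167, 115)

# Date jitter: shift each date by a whole number of days in [-5, 5].
date_range <- 5
day_offsets <- sample(seq(-date_range, date_range), length(dates), replace = TRUE)
jittered_dates <- dates + day_offsets

# Weight jitter: add continuous uniform noise in [-10, 10].
weight_range <- 10
jittered_weights <- weights + runif(length(weights), -weight_range, weight_range)

data.frame(dates, jittered_dates, weights, jittered_weights)
```

Note that naive additive noise like this can push small weights to zero or below, so a production implementation would need to decide how to handle that case.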
# Add jitter of up to ±5 days to movement dates:
movedata_datesj5 <- jitter_dates(example_movement_data, range = 5)
head(example_movement_data) # Inspect original
#> # A tibble: 6 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <dbl>
#> 1 95/216/1100 19/818/9098 2019-02-08 97 304781
#> 2 69/196/5890 71/939/3228 2019-08-15 167 229759
#> 3 52/577/5349 82/501/8178 2019-09-15 115 36413
#> 4 39/103/5541 13/282/1763 2019-10-26 125 488616
#> 5 41/788/6464 57/418/6011 2019-10-17 109 581785
#> 6 69/393/9398 39/947/2201 2019-10-06 72 564911
head(movedata_datesj5) # Inspect jittered dates
#> # A tibble: 6 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <dbl>
#> 1 95/216/1100 19/818/9098 2019-02-12 97 304781
#> 2 69/196/5890 71/939/3228 2019-08-20 167 229759
#> 3 52/577/5349 82/501/8178 2019-09-19 115 36413
#> 4 39/103/5541 13/282/1763 2019-10-30 125 488616
#> 5 41/788/6464 57/418/6011 2019-10-15 109 581785
#> 6 69/393/9398 39/947/2201 2019-10-04 72 564911
# Add jitter of up to ±10 to movement weights:
movedata_weightsj10 <- jitter_weights(example_movement_data, range = 10)
head(movedata_weightsj10) # Inspect jittered weights
#> # A tibble: 6 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <dbl>
#> 1 95/216/1100 19/818/9098 2019-02-08 97.3 304781
#> 2 69/196/5890 71/939/3228 2019-08-15 162. 229759
#> 3 52/577/5349 82/501/8178 2019-09-15 105. 36413
#> 4 39/103/5541 13/282/1763 2019-10-26 126. 488616
#> 5 41/788/6464 57/418/6011 2019-10-17 106. 581785
#> 6 69/393/9398 39/947/2201 2019-10-06 67.5 564911
# Round movement dates down to the first day of the month, but do not aggregate:
movedata_months <- round_dates(example_movement_data, unit = "month", sum_weight = FALSE)
head(movedata_months) # Inspect rounded dates
#> # A tibble: 6 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <dbl>
#> 1 95/216/1100 19/818/9098 2019-02-01 97 304781
#> 2 69/196/5890 71/939/3228 2019-08-01 167 229759
#> 3 52/577/5349 82/501/8178 2019-09-01 115 36413
#> 4 39/103/5541 13/282/1763 2019-10-01 125 488616
#> 5 41/788/6464 57/418/6011 2019-10-01 109 581785
#> 6 69/393/9398 39/947/2201 2019-10-01 72 564911
# Round movement dates down to the first day of the month, aggregate weights, and list reference numbers:
movedata_months_aggr <- round_dates(example_movement_data, unit = "month", sum_weight = TRUE,
movement_reference = list(movement_reference))
# Inspect aggregated record for holdings which have 2 movements in the same month:
movedata_months_aggr[which(sapply(movedata_months_aggr$movement_reference, length) == 2),]
#> # A tibble: 1 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <list>
#> 1 57/427/5455 21/771/7140 2019-09-01 156 <dbl [2]>
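The aggregation step can be pictured as grouping movements by origin, destination and rounded date, then summarising each group. The base-R sketch below mimics that on a made-up three-row data frame; it is only an illustration of the concept, with hypothetical column names, and not how round_dates() is implemented.

```r
# Made-up movements: two from A to B in the same month, one from B to C.
moves <- data.frame(
  from = c("A", "A", "B"),
  to = c("B", "B", "C"),
  date = as.Date(c("2019-09-03", "2019-09-21", "2019-09-10")),
  qty = c(30, 20, 40),
  ref = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Round each date down to the first day of its month.
moves$date <- as.Date(format(moves$date, "%Y-%m-01"))

# Build one grouping label per (from, to, month) combination.
group <- paste(moves$from, moves$to, moves$date, sep = "|")

# Sum quantities and collect reference numbers within each group.
totals <- vapply(split(moves$qty, group), sum, numeric(1))
refs <- split(moves$ref, group)

totals
lengths(refs)
```

Here the two A-to-B movements collapse into a single record with a total quantity of 50 and two reference numbers, analogous to the list column in the output above.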
# Round movement reference numbers to the nearest multiple of 10:
movedata_ref10 <- round_weights(example_movement_data, unit = 10, column = "movement_reference")
head(movedata_ref10) # Inspect rounded movement reference numbers
#> # A tibble: 6 x 5
#> departure_cph dest_cph departure_date qty_pigs movement_reference
#> <chr> <chr> <date> <dbl> <dbl>
#> 1 95/216/1100 19/818/9098 2019-02-08 97 304780
#> 2 69/196/5890 71/939/3228 2019-08-15 167 229760
#> 3 52/577/5349 82/501/8178 2019-09-15 115 36410
#> 4 39/103/5541 13/282/1763 2019-10-26 125 488620
#> 5 41/788/6464 57/418/6011 2019-10-17 109 581780
#> 6 69/393/9398 39/947/2201 2019-10-06 72 564910
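Rounding to a multiple of unit can be expressed as dividing by unit, rounding to a whole number, and multiplying back. The base-R sketch below applies this to the six reference numbers shown above. It is an assumption about the arithmetic involved, not a statement about movenet's internals.

```r
# The six movement reference numbers from the example data above.
refs <- c(304781, 229759, 36413, 488616, 581785, 564911)
unit <- 10

# Divide by the unit, round to a whole number, and scale back up.
rounded <- round(refs / unit) * unit
rounded
```

With this approach, 581785 becomes 581780 rather than 581790, because base R's round() rounds exact halves to the nearest even number. This is consistent with the output shown above.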
Modifying holding coordinates
A function to resample holding coordinates in a density-dependent manner is under development within the hexscape package.