
To help address concerns around commercial sensitivity, movenet includes a range of functions to make livestock movement data and/or holding data non-identifiable. This vignette shows you how to use these.

Setting up

To get started, first load the movenet package:

library(movenet)

The example datasets that we will use in this vignette are example_movement_data and example_holding_data, provided with the package. These are tibbles in movenet format, containing movement and holding data, respectively. Inspect these datasets to get a feel for them.

# Inspect the first lines of the example movenet-format movement and holding tibbles:
head(example_movement_data)
#> # A tibble: 6 x 5
#>   departure_cph dest_cph    departure_date qty_pigs movement_reference
#>   <chr>         <chr>       <date>            <dbl>              <dbl>
#> 1 95/216/1100   19/818/9098 2019-02-08           97             304781
#> 2 69/196/5890   71/939/3228 2019-08-15          167             229759
#> 3 52/577/5349   82/501/8178 2019-09-15          115              36413
#> 4 39/103/5541   13/282/1763 2019-10-26          125             488616
#> 5 41/788/6464   57/418/6011 2019-10-17          109             581785
#> 6 69/393/9398   39/947/2201 2019-10-06           72             564911
head(example_holding_data)
#> # A tibble: 6 x 4
#>   cph         holding_type herd_size           coordinates
#>   <chr>       <chr>            <dbl>           <POINT [°]>
#> 1 68/575/1991 GXFSR             2111   (3.718568 52.69096)
#> 2 51/469/9863 SCHZQ             2134  (-4.959035 51.88195)
#> 3 32/532/8560 HEJDE             2140   (5.709143 51.97547)
#> 4 82/501/8178 IQALL             2141  (-2.983365 57.55851)
#> 5 29/675/4499 YUFUC             2148   (5.586477 50.50902)
#> 6 59/516/9442 IATKP             2151 (-0.3323066 58.84295)

Then, load a configuration file that can be used with these example datasets. This configuration file tells movenet which columns contain which data types, so that the correct columns get modified when using the privacy-enhancing functions. For more details on configuration files see vignette("movenet") and vignette("configurations").

# Load a combined movement and holding config file:
load_config(system.file("configurations", "fakeScotEID_combined_config.yml", package="movenet"))
#> Successfully loaded config file: C:/Users/cboga/AppData/Local/Temp/Rtmp25xhub/temp_libpath11484970381d/movenet/configurations/fakeScotEID_combined_config.yml

Pseudonymising holding identifiers

The function anonymise() pseudonymises holding identifiers in movenet-format movement or holding data tibbles, by replacing these identifiers with a number and an optional prefix (e.g. “FARM”).

It returns a pseudonymised data tibble and the applied pseudonymisation key. This key can optionally be saved to recover the original identifiers at a later date, or for application to an overlapping dataset.

# Pseudonymise movement_data by changing identifiers to FARM1-N:
pseudonymised <- anonymise(example_movement_data, prefix = "FARM")
pseudonymised_movement_data <- pseudonymised$data
pseudonymisation_key <- pseudonymised$key

head(pseudonymised_movement_data) # Inspect pseudonymised movement data
#> # A tibble: 6 x 5
#>   departure_cph dest_cph departure_date qty_pigs movement_reference
#>   <chr>         <chr>    <date>            <dbl>              <dbl>
#> 1 FARM244       FARM77   2019-02-08           97             304781
#> 2 FARM61        FARM255  2019-08-15          167             229759
#> 3 FARM261       FARM395  2019-09-15          115              36413
#> 4 FARM5         FARM220  2019-10-26          125             488616
#> 5 FARM113       FARM119  2019-10-17          109             581785
#> 6 FARM167       FARM480  2019-10-06           72             564911
head(pseudonymisation_key) # Inspect pseudonymisation key
#> 47/396/4417 54/504/3274 31/473/4857 36/885/8878 39/103/5541 36/308/8021 
#>     "FARM1"     "FARM2"     "FARM3"     "FARM4"     "FARM5"     "FARM6"
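
As the output above shows, the key is a named character vector whose names are the original identifiers and whose values are the pseudonyms. The following is a minimal base R sketch of saving such a key and using it to recover original identifiers. The toy key, the temporary file, and the choice of saveRDS() as storage format are assumptions for illustration, not something movenet prescribes.

```r
# Toy key in the same shape as the output above:
# names are original identifiers, values are pseudonyms.
key <- c("47/396/4417" = "FARM1",
         "54/504/3274" = "FARM2",
         "31/473/4857" = "FARM3")

# Save the key and read it back (a temporary file stands in for a real path).
key_file <- tempfile(fileext = ".rds")
saveRDS(key, key_file)
key_restored <- readRDS(key_file)

# Invert the key: names become pseudonyms, values become original identifiers.
reverse_key <- setNames(names(key_restored), key_restored)

# Recover the original identifiers for a vector of pseudonyms.
pseudonyms <- c("FARM3", "FARM1")
original_ids <- unname(reverse_key[pseudonyms])
original_ids
```

Anyone holding the saved key can reverse the pseudonymisation, so it should be stored as carefully as the original data.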

anonymise() also takes an optional key argument, with which you can apply an existing pseudonymisation key to the data tibble:

# Use the same key from above to substitute holding identifiers in holding_data:
pseudonymised_holding <- anonymise(example_holding_data, key = pseudonymisation_key) 
pseudonymised_holding_data <- pseudonymised_holding$data
# Update saved key, in case additional identifiers were added from the holding datafile:
pseudonymisation_key <- pseudonymised_holding$key 

head(pseudonymised_holding_data) # Inspect pseudonymised holding data
#> # A tibble: 6 x 4
#>   cph     holding_type herd_size           coordinates
#>   <chr>   <chr>            <dbl>           <POINT [°]>
#> 1 FARM499 GXFSR             2111   (3.718568 52.69096)
#> 2 FARM163 SCHZQ             2134  (-4.959035 51.88195)
#> 3 FARM191 HEJDE             2140   (5.709143 51.97547)
#> 4 FARM395 IQALL             2141  (-2.983365 57.55851)
#> 5 FARM64  YUFUC             2148   (5.586477 50.50902)
#> 6 FARM176 IATKP             2151 (-0.3323066 58.84295)

This allows multiple datasets to be pseudonymised in a consistent way, so that it is possible to subsequently merge the datasets by pseudonymised identifier.
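
As a rough illustration of such a merge, the sketch below joins two toy pseudonymised tables with base R's merge(). The column names follow the example datasets, but the values are made up and the choice of join (holding type of the departure holding) is only one possibility.

```r
# Toy pseudonymised tables (illustrative values, not taken from the example datasets).
moves <- data.frame(
  departure_cph = c("FARM1", "FARM2", "FARM1"),
  dest_cph      = c("FARM2", "FARM3", "FARM3"),
  qty_pigs      = c(97, 167, 115),
  stringsAsFactors = FALSE
)
holdings <- data.frame(
  cph          = c("FARM1", "FARM2", "FARM3"),
  holding_type = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Attach the holding type of the departure holding to each movement.
# all.x = TRUE keeps movements whose holding is missing from the holding table.
merged <- merge(moves, holdings,
                by.x = "departure_cph", by.y = "cph",
                all.x = TRUE)
merged
```

The join only works because both tables were pseudonymised with the same key; with independent keys, "FARM1" would refer to different holdings in each table.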

Modifying dates, weights, and optional numeric data columns

movenet also has functions to modify movement dates or weights by applying a small amount of noise (jittering) or by rounding:

  • jitter_dates(data, range) adds random noise of up to ±range days to movement dates.

  • jitter_weights(data, range, column) adds random noise of up to ±range to a numeric column in the movement data, by default the “weight” column.

  • round_dates(data, unit, week_start, sum_weight, ...) rounds movement dates down to the first day of the specified time unit. For rounding down to weeks, set the starting day of the week with week_start. By default, weights are aggregated for all movements between the same holdings over the indicated time unit (sum_weight = TRUE); to keep movements separate, set sum_weight = FALSE. Alternative or additional summary functions can be applied through ..., using tidy evaluation rules.

  • round_weights(data, unit, column) rounds data in a numeric column, by default the “weight” column, to multiples of unit.
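
To make the four operations concrete, here is a base R sketch of the underlying ideas on plain vectors. It is not movenet's implementation: the uniform noise distribution and the variable names are assumptions, and the package functions operate on whole movenet-format tibbles rather than vectors.

```r
set.seed(1)  # make the random noise reproducible

dates   <- as.Date(c("2019-02-08", "2019-08-15", "2019-10-06"))
weights <- c(97, 167, 72)

# Jitter dates: add a whole number of days drawn uniformly from -5..5 (assumed distribution).
date_range <- 5
jittered_dates <- dates + sample(-date_range:date_range, length(dates), replace = TRUE)

# Jitter weights: add uniform noise between -10 and 10 (assumed distribution).
weight_range <- 10
jittered_weights <- weights + runif(length(weights), -weight_range, weight_range)

# Round dates down to the first day of the month.
month_start <- as.Date(format(dates, "%Y-%m-01"))

# Round weights to the nearest multiple of a unit.
unit <- 10
rounded_weights <- round(weights / unit) * unit

month_start
rounded_weights
```

Jittering keeps individual records but blurs their exact values, whereas rounding maps many records onto the same value, which is what makes aggregation over a time unit possible.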

# Add jitter of up to ±5 days to movement dates:
movedata_datesj5 <- jitter_dates(example_movement_data, range = 5) 
head(example_movement_data) # Inspect original
#> # A tibble: 6 x 5
#>   departure_cph dest_cph    departure_date qty_pigs movement_reference
#>   <chr>         <chr>       <date>            <dbl>              <dbl>
#> 1 95/216/1100   19/818/9098 2019-02-08           97             304781
#> 2 69/196/5890   71/939/3228 2019-08-15          167             229759
#> 3 52/577/5349   82/501/8178 2019-09-15          115              36413
#> 4 39/103/5541   13/282/1763 2019-10-26          125             488616
#> 5 41/788/6464   57/418/6011 2019-10-17          109             581785
#> 6 69/393/9398   39/947/2201 2019-10-06           72             564911
head(movedata_datesj5) # Inspect jittered dates
#> # A tibble: 6 x 5
#>   departure_cph dest_cph    departure_date qty_pigs movement_reference
#>   <chr>         <chr>       <date>            <dbl>              <dbl>
#> 1 95/216/1100   19/818/9098 2019-02-12           97             304781
#> 2 69/196/5890   71/939/3228 2019-08-20          167             229759
#> 3 52/577/5349   82/501/8178 2019-09-19          115              36413
#> 4 39/103/5541   13/282/1763 2019-10-30          125             488616
#> 5 41/788/6464   57/418/6011 2019-10-15          109             581785
#> 6 69/393/9398   39/947/2201 2019-10-04           72             564911

# Add jitter of up to ±10 to movement weights:
movedata_weightsj10 <- jitter_weights(example_movement_data, range = 10)
head(movedata_weightsj10) # Inspect jittered weights
#> # A tibble: 6 x 5
#>   departure_cph dest_cph    departure_date qty_pigs movement_reference
#>   <chr>         <chr>       <date>            <dbl>              <dbl>
#> 1 95/216/1100   19/818/9098 2019-02-08         97.3             304781
#> 2 69/196/5890   71/939/3228 2019-08-15        162.              229759
#> 3 52/577/5349   82/501/8178 2019-09-15        105.               36413
#> 4 39/103/5541   13/282/1763 2019-10-26        126.              488616
#> 5 41/788/6464   57/418/6011 2019-10-17        106.              581785
#> 6 69/393/9398   39/947/2201 2019-10-06         67.5             564911

# Round movement dates down to the first day of the month, but do not aggregate:
movedata_months <- round_dates(example_movement_data, unit = "month", sum_weight = FALSE) 
head(movedata_months) # Inspect rounded dates
#> # A tibble: 6 x 5
#>   departure_cph dest_cph    departure_date qty_pigs movement_reference
#>   <chr>         <chr>       <date>            <dbl>              <dbl>
#> 1 95/216/1100   19/818/9098 2019-02-01           97             304781
#> 2 69/196/5890   71/939/3228 2019-08-01          167             229759
#> 3 52/577/5349   82/501/8178 2019-09-01          115              36413
#> 4 39/103/5541   13/282/1763 2019-10-01          125             488616
#> 5 41/788/6464   57/418/6011 2019-10-01          109             581785
#> 6 69/393/9398   39/947/2201 2019-10-01           72             564911

# Round movement dates down to the first day of the month, aggregate weights, and list reference numbers:
movedata_months_aggr <- round_dates(example_movement_data, unit = "month", sum_weight = TRUE,
                                    movement_reference = list(movement_reference))
# Inspect aggregated record for holdings which have 2 movements in the same month:
movedata_months_aggr[lengths(movedata_months_aggr$movement_reference) == 2, ]
#> # A tibble: 1 x 5
#>   departure_cph dest_cph    departure_date qty_pigs movement_reference
#>   <chr>         <chr>       <date>            <dbl> <list>            
#> 1 57/427/5455   21/771/7140 2019-09-01          156 <dbl [2]>
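
The aggregation step can be pictured with a small base R sketch: movements that share origin, destination, and rounded date collapse into one row with summed weights. The toy data and column names (from, to, date, weight) are assumptions for illustration, and aggregate() is used here only as a stand-in for whatever round_dates() does internally.

```r
# Toy movement data: two movements from A to B in the same month, one from B to C.
moves <- data.frame(
  from   = c("A", "A", "B"),
  to     = c("B", "B", "C"),
  date   = as.Date(c("2019-09-03", "2019-09-21", "2019-09-10")),
  weight = c(100, 56, 40),
  stringsAsFactors = FALSE
)

# Round dates down to the first day of the month.
moves$date <- as.Date(format(moves$date, "%Y-%m-01"))

# Sum weights over movements sharing origin, destination, and rounded date.
aggregated <- aggregate(weight ~ from + to + date, data = moves, FUN = sum)
aggregated
```

Summing discards per-movement detail, which is why the example above passes an extra summary function to keep the individual reference numbers as a list.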

# Round movement reference numbers to the nearest multiple of 10:
movedata_ref10 <- round_weights(example_movement_data, unit = 10, column = "movement_reference")
head(movedata_ref10) # Inspect rounded movement reference numbers
#> # A tibble: 6 x 5
#>   departure_cph dest_cph    departure_date qty_pigs movement_reference
#>   <chr>         <chr>       <date>            <dbl>              <dbl>
#> 1 95/216/1100   19/818/9098 2019-02-08           97             304780
#> 2 69/196/5890   71/939/3228 2019-08-15          167             229760
#> 3 52/577/5349   82/501/8178 2019-09-15          115              36410
#> 4 39/103/5541   13/282/1763 2019-10-26          125             488620
#> 5 41/788/6464   57/418/6011 2019-10-17          109             581780
#> 6 69/393/9398   39/947/2201 2019-10-06           72             564910

Modifying holding coordinates

A function to resample holding coordinates in a density-dependent manner is under development within the hexscape package.