Getting started with movenet

The goal of movenet is to simplify the use of livestock movement data in veterinary public health. It does this by providing workflows for pseudonymisation and network analysis. A first step in any of these workflows is reading in the livestock movement data that you want to process or analyse.

This vignette describes how to get started with movenet by reading in data files and reshaping them into the common format required by all movenet workflows.

library(movenet)

Reshaping livestock movement data to movenet format

To read in a movement data file, first load a configuration (config) file that tells movenet how to read the data file, and then load the data file itself.

Loading a movement config file

Livestock movement data come in a variety of shapes and formats: not only do different countries collect different types of trade-related data, they may also follow different formatting conventions, for example for decimal marks and dates.
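
To illustrate why such conventions matter, here is a minimal base R sketch (not movenet code) that reads the same small, made-up semicolon-delimited table twice: once with the comma decimal mark it was written with, and once with the wrong decimal mark. The holding IDs A, B and C and all values are hypothetical.

```r
# Made-up movement records using ";" as separator and "," as decimal mark
csv_text <- "from;to;date;weight
A;B;20190208;97,5
B;C;20190815;167"

# Matching options: the weight column is parsed as numeric
moves_ok <- read.csv(text = csv_text, sep = ";", dec = ",")
str(moves_ok$weight)

# Wrong decimal mark: "97,5" cannot be parsed as a number,
# so the whole weight column is kept as character
moves_wrong <- read.csv(text = csv_text, sep = ";", dec = ".")
str(moves_wrong$weight)
```

Base R's read.csv2() uses these semicolon/comma defaults directly; the point is that the reader has to be told which convention a file follows.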

To ensure that the data are read correctly regardless of the source, movenet requires users to supply a config file. The config file contains values for a range of properties that commonly vary between datasets, as well as the column headers of data fields that are either required (from, to, date, weight) or that you want to extract for further analyses. For details on config file requirements and how to manage movenet configurations, see vignette("configurations"). Example config files and an empty template can be found in the package’s configurations folder.

Here is the content of an example config file, ScotEID.yml:

xfun::file_string(system.file("configurations", "ScotEID.yml", package="movenet"))
#> movedata_fileopts:
#>   separator: ","   #Separator (delimiter) character used in movement datafile
#>   encoding: "UTF-8"    #Encoding used in movement datafile
#>   decimal: "."  #Decimal mark used in movement datafile
#>   date_format: "%Y%m%d" #Date format specification used in movement datafile (empty string "" for "%AD" flexible YMD parser, or see ?readr::parse_date for guidance)
#> 
#> movedata_cols:
#>   from: "departure_cph" #Column name or number for Identifier of origin holding
#>   to: "dest_cph" #Column name or number for Identifier of destination holding
#>   date: "departure_date" #Column name or number for Date of transport
#>   weight: "qty_pigs" #Column name or number for Movement weight (e.g. nr of pigs moved)
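
The date_format value uses %-codes of the kind documented for base R's strptime(). As a quick check of what "%Y%m%d" means, this base R sketch parses dates written like the departure_date column and shows that a date written as day/month/year (as in the lot_date column of the example data) yields NA under that format. It only illustrates the format string; movenet itself may parse dates differently (the config comment points to readr::parse_date).

```r
# date_format from ScotEID.yml: 4-digit year, 2-digit month, 2-digit day, no separators
date_format <- "%Y%m%d"

parsed <- as.Date(c("20190208", "20190815"), format = date_format)
print(parsed)
#> [1] "2019-02-08" "2019-08-15"

# A day/month/year date does not match "%Y%m%d" and becomes NA
as.Date("18/12/2018", format = date_format)
#> [1] NA
```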

The ScotEID.yml configurations can be loaded into the movenet environment with the function load_config():

load_config(system.file("configurations", "ScotEID.yml", package="movenet")) # Load ScotEID.yml
#> Successfully loaded config file: C:/Users/cboga/AppData/Local/Temp/Rtmp25xhub/temp_libpath11484970381d/movenet/configurations/ScotEID.yml

get_config() # Inspect the configurations in the movenet environment
#> $movedata_fileopts.separator
#> [1] ","
#> 
#> $movedata_fileopts.encoding
#> [1] "UTF-8"
#> 
#> $movedata_fileopts.decimal
#> [1] "."
#> 
#> $movedata_fileopts.date_format
#> [1] "%Y%m%d"
#> 
#> $movedata_cols.from
#> [1] "departure_cph"
#> 
#> $movedata_cols.to
#> [1] "dest_cph"
#> 
#> $movedata_cols.date
#> [1] "departure_date"
#> 
#> $movedata_cols.weight
#> [1] "qty_pigs"
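
The dotted names returned by get_config() (for example movedata_fileopts.separator) mirror the two-level structure of the YAML file: section name, then key. The base R sketch below reproduces that naming scheme by flattening a nested list with unlist(). It only illustrates the naming; it is an assumption that this resembles how movenet stores its configurations, not a description of its internals.

```r
# Nested list mirroring the two sections of ScotEID.yml
config <- list(
  movedata_fileopts = list(separator = ",", encoding = "UTF-8",
                           decimal = ".", date_format = "%Y%m%d"),
  movedata_cols = list(from = "departure_cph", to = "dest_cph",
                       date = "departure_date", weight = "qty_pigs")
)

# unlist() joins section and key names with "."; as.list() turns the
# resulting named character vector back into a named list
flat <- as.list(unlist(config))
names(flat)
flat$movedata_cols.weight
```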

Loading a movement data file

After configurations have been loaded into the movenet environment, a delimited movement data file can be read in and processed.

Here are the first 6 rows of an example movement data file, example_movement_data.csv:

head(read.csv(system.file("extdata", "example_movement_data.csv", package="movenet"), 
              encoding = "UTF-8"))
#>   movement_reference foreign_reference        lot_no   lot_date departure_date
#> 1             304781        PQLIKVUQVS 828PRSXKRGZYB 18/12/2018       20190208
#> 2             229759        UJULUGCKOU 614OMRCNFSSXX 06/06/2018       20190815
#> 3              36413        IQRGKIIMJY 118NPCNNTKWCP 19/01/2018       20190915
#> 4             488616        EFDZUOPVUJ 581DFDZCBFVBC 03/04/2018       20191026
#> 5             581785        IQFOGHNDRO 826HOVNIMIQWS 14/12/2018       20191017
#> 6             564911        STJBWKKKIG 928VSNVNYJQNR 11/10/2018       20191006
#>   arrival_date qty_pigs qty_doa fci_declaration dep_assurance_no   dep_name
#> 1     20190209       97       4            3685           812319 TNYXECRXTQ
#> 2     20190816      167      19            3608           621990 BYPDKAUULX
#> 3     20190916      115      14            1299           996219 UYRYPXNTGU
#> 4     20191027      125       9            9603           636449 KOWQHTETUG
#> 5     20191018      109      14            2420           272511 XEHRXZSJEA
#> 6     20191007       72      11            2452           959254 OYFCXVTOYL
#>     dep_address dep_postcode departure_cph dest_assurance_no  dest_name
#> 1 50 TOFGLBEULW       ID YTA   95/216/1100             22510 MFBGMIUGWV
#> 2 31 CDVIAMOZYC       NC RJE   69/196/5890            439583 XMQBZRFOKL
#> 3 10 VBLBFDSAFH       TV XCF   52/577/5349            245690 KUBFWNVPLS
#> 4 94 XGWIBIQRKK       CX QXM   39/103/5541            458474 ZGWVOAAEFM
#> 5 42 MMYOSGVYIK       QV RNN   41/788/6464            378463 KULTVHDMXP
#> 6 94 IYOBHBTJWI       YB HWC   69/393/9398            695389 GSKYRADWKQ
#>    dest_address dest_postcode    dest_cph
#> 1 22 LFYLLNWFPI        QY GRM 19/818/9098
#> 2 54 TRQSPYVRPQ        BW NWI 71/939/3228
#> 3 32 GXDFSBEDPM        UY ZFH 82/501/8178
#> 4 61 IZGFLEGGLY        AL MXZ 13/282/1763
#> 5 54 XQZKSYLASC        MC JJT 57/418/6011
#> 6 35 UWDSSQIGMS        QV GSK 39/947/2201

The function reformat_data() uses the loaded configurations to read in the movement data and standardise them to movenet format by:

extracting the columns with the minimally required data: from (originating holding identifier), to (destination holding identifier), date, and weight (movement quantity). In the resulting data tibble, these columns appear in this order.
extracting any optional columns specified in the loaded configurations. In the resulting data tibble, these columns are placed after the required columns.
checking that the date column can be parsed as dates (using the date format indicated in the configurations), and that the weight column is numeric.
converting dates to R Date format, to improve interoperability.
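
The steps above can be approximated in a few lines of base R. The sketch below uses a hypothetical helper, reformat_sketch(), applied to the first two rows of example_movement_data.csv (reduced to a few columns); it is not movenet's reformat_data() and ignores details such as file reading, encodings and tibble output. For simplicity it treats any date that fails to parse, including a missing one, as an error.

```r
# Hypothetical helper illustrating the steps listed above; this is NOT
# movenet's reformat_data(), only a base R approximation of its behaviour
reformat_sketch <- function(df, cols, date_format, optional = character(0)) {
  # Required columns first, in the fixed order from, to, date, weight
  required <- unname(unlist(cols[c("from", "to", "date", "weight")]))
  keep <- c(required, optional)
  missing_cols <- setdiff(keep, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  out <- df[, keep, drop = FALSE]

  # Dates must parse with the configured format (any NA counts as a failure)
  dates <- as.Date(as.character(out[[cols$date]]), format = date_format)
  if (anyNA(dates)) stop("Date column does not match format ", date_format)

  # Weights must be numeric
  if (!is.numeric(out[[cols$weight]])) stop("Weight column is not numeric")

  out[[cols$date]] <- dates  # store as R Date
  out
}

# First two rows of example_movement_data.csv, reduced to a few columns
raw_moves <- data.frame(
  movement_reference = c(304781, 229759),
  departure_date = c("20190208", "20190815"),
  qty_pigs = c(97, 167),
  departure_cph = c("95/216/1100", "69/196/5890"),
  dest_cph = c("19/818/9098", "71/939/3228")
)
cols <- list(from = "departure_cph", to = "dest_cph",
             date = "departure_date", weight = "qty_pigs")

moves <- reformat_sketch(raw_moves, cols, "%Y%m%d",
                         optional = "movement_reference")
str(moves)
```

Passing a date format that does not match the data, such as "%d/%m/%Y", makes the sketch stop with an error instead of silently producing missing dates.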

To read in a movement data file, use reformat_data(data, type = "movement"):

movement_data <- 
  reformat_data(system.file("extdata", "example_movement_data.csv", package="movenet"),
                type = "movement")

head(movement_data) # Inspect the resulting movement_data tibble
#> # A tibble: 6 x 4
#>   departure_cph dest_cph    departure_date qty_pigs
#>   <chr>         <chr>       <date>            <dbl>
#> 1 95/216/1100   19/818/9098 2019-02-08           97
#> 2 69/196/5890   71/939/3228 2019-08-15          167
#> 3 52/577/5349   82/501/8178 2019-09-15          115
#> 4 39/103/5541   13/282/1763 2019-10-26          125
#> 5 41/788/6464   57/418/6011 2019-10-17          109
#> 6 69/393/9398   39/947/2201 2019-10-06           72

The movement data are now in a suitable format (movenet-format movement data tibble) to be plugged into pseudonymisation and network analysis workflows.

Reshaping holding data to movenet format

If you want to use additional holding data (e.g. coordinates or holding type) in your analyses, the same process can be followed with holding config and data files:

# Load a holding config file:
load_config(system.file("configurations", "fakeScotEID_holding.yml", package="movenet")) 
#> Successfully loaded config file: C:/Users/cboga/AppData/Local/Temp/Rtmp25xhub/temp_libpath11484970381d/movenet/configurations/fakeScotEID_holding.yml
# Read in and reformat a holding data file:
holding_data <- 
  reformat_data(system.file("extdata", "example_holding_data.csv", package="movenet"),
                type = "holding")

For holding data, the only strictly required data column is id (the holding identifier, matching the from and to identifiers in the movement data). If you wish to include geographic coordinates, coord_EPSG_code and country_code are required file options, and coord_x and coord_y are required data columns.

In addition to extraction of required and requested optional columns, reformat_data() performs the following data checks and standardisations on holding data:

If the loaded configurations include headers/indices for columns with geographical coordinates (coord_x and coord_y), these columns are checked for numeric data and then converted to a single simple feature (sf) geometry list-column using the ETRS89 coordinate reference system.
If the loaded configurations include a header/index for a herd_size column, this column is checked for numeric data.
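
The checks above can be sketched in base R with a hypothetical helper, check_holding_sketch(), that receives the column names a config file would supply. It is not movenet code, and it deliberately leaves out the conversion to an sf geometry column, which would need the sf package and the coord_EPSG_code option. The holding records are toy data: the IDs are taken from the example movement data, but the coordinates and herd sizes are made up.

```r
# Hypothetical helper mirroring the holding-data checks described above;
# NOT movenet code. Column names are passed in, as a config file would supply them.
check_holding_sketch <- function(df, id, coord_x = NULL, coord_y = NULL,
                                 herd_size = NULL) {
  if (!id %in% names(df)) stop("Required id column '", id, "' is missing")
  # Coordinates only make sense as a pair
  if (is.null(coord_x) != is.null(coord_y)) {
    stop("coord_x and coord_y must be supplied together")
  }
  # Every optional column that was named must exist and be numeric
  for (col in c(coord_x, coord_y, herd_size)) {
    if (!col %in% names(df)) stop("Column '", col, "' is missing")
    if (!is.numeric(df[[col]])) stop("Column '", col, "' is not numeric")
  }
  invisible(TRUE)
}

# Toy holding records; coordinates and herd sizes are made up
holdings <- data.frame(
  id = c("95/216/1100", "19/818/9098"),
  x = c(325000, 290500),
  y = c(673000, 701200),
  pigs = c(450, 1200)
)
check_holding_sketch(holdings, id = "id", coord_x = "x", coord_y = "y",
                     herd_size = "pigs")
```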

The holding data are now in a suitable format (movenet-format holding data tibble) to be plugged into pseudonymisation and network analysis workflows.

Further reading