Getting started with movenet
The goal of movenet is to simplify the use of livestock movement data in veterinary public health. It does this by providing workflows for pseudonymisation and network analysis. A first step in any of these workflows is reading in the livestock movement data that you want to process or analyse.
This vignette describes how to get started with movenet by reading in data files and reshaping them into the common format required by all movenet workflows.
Reshaping livestock movement data to movenet format
To read in a movement data file, first load a configuration (config) file, which tells movenet how to read the data file, and then load the data file itself.
Loading a movement config file
Livestock movement data come in a variety of shapes and formats: different countries not only collect different types of trade-related data, they may also follow different formatting conventions, for example for decimal marks and dates.
To ensure that data are read correctly regardless of their source, movenet requires users to supply a config file. The config file contains values for a range of properties that commonly vary between datasets, as well as column headers for data fields that are either required (from, to, date, weight) or that you want to carry along for analyses. For details on config file requirements and how to manage movenet configurations, see vignette("configurations"). Example config files and an empty template can be found in the package's configurations folder.
Here is the content of an example config file, ScotEID.yml:
xfun::file_string(system.file("configurations", "ScotEID.yml", package="movenet"))
#> movedata_fileopts:
#>   separator: "," #Separator (delimiter) character used in movement datafile
#>   encoding: "UTF-8" #Encoding used in movement datafile
#>   decimal: "." #Decimal mark used in movement datafile
#>   date_format: "%Y%m%d" #Date format specification used in movement datafile (empty string "" for "%AD" flexible YMD parser, or see ?readr::parse_date for guidance)
#>
#> movedata_cols:
#>   from: "departure_cph" #Column name or number for Identifier of origin holding
#>   to: "dest_cph" #Column name or number for Identifier of destination holding
#>   date: "departure_date" #Column name or number for Date of transport
#>   weight: "qty_pigs" #Column name or number for Movement weight (e.g. nr of pigs moved)
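To get a feel for what the decimal and date_format options control, the sketch below parses a date and a number directly with readr, the package that the date_format comment above points to. It only illustrates the format specifications themselves; it assumes nothing about how movenet applies them internally, and the comma-decimal value is a made-up example rather than a value from the example data.

```r
library(readr)

# A date written as in the example data, parsed with the configured date_format
d <- parse_date("20190208", format = "%Y%m%d")
print(d)

# A number from a hypothetical dataset that uses "," as its decimal mark
# (this would correspond to decimal: "," in the config file)
x <- parse_double("12,5", locale = locale(decimal_mark = ","))
print(x)
```

If a format string does not match the data, parse_date() returns NA with a warning, which is a quick way to test a date_format value before putting it in a config file.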
The ScotEID.yml configurations can be loaded into the movenet environment with the function load_config():
load_config(system.file("configurations", "ScotEID.yml", package="movenet")) # Load ScotEID.yml
#> Successfully loaded config file: C:/Users/cboga/AppData/Local/Temp/Rtmp25xhub/temp_libpath11484970381d/movenet/configurations/ScotEID.yml
get_config() # Inspect the configurations in the movenet environment
#> $movedata_fileopts.separator
#> [1] ","
#>
#> $movedata_fileopts.encoding
#> [1] "UTF-8"
#>
#> $movedata_fileopts.decimal
#> [1] "."
#>
#> $movedata_fileopts.date_format
#> [1] "%Y%m%d"
#>
#> $movedata_cols.from
#> [1] "departure_cph"
#>
#> $movedata_cols.to
#> [1] "dest_cph"
#>
#> $movedata_cols.date
#> [1] "departure_date"
#>
#> $movedata_cols.weight
#> [1] "qty_pigs"
Loading a movement data file
After configurations have been loaded into the movenet environment, a delimited movement data file can be read in and processed.
Here are the first 6 rows of an example movement data file, example_movement_data.csv:
head(read.csv(system.file("extdata", "example_movement_data.csv", package="movenet"),
              encoding = "UTF-8"))
#> movement_reference foreign_reference lot_no lot_date departure_date
#> 1 304781 PQLIKVUQVS 828PRSXKRGZYB 18/12/2018 20190208
#> 2 229759 UJULUGCKOU 614OMRCNFSSXX 06/06/2018 20190815
#> 3 36413 IQRGKIIMJY 118NPCNNTKWCP 19/01/2018 20190915
#> 4 488616 EFDZUOPVUJ 581DFDZCBFVBC 03/04/2018 20191026
#> 5 581785 IQFOGHNDRO 826HOVNIMIQWS 14/12/2018 20191017
#> 6 564911 STJBWKKKIG 928VSNVNYJQNR 11/10/2018 20191006
#> arrival_date qty_pigs qty_doa fci_declaration dep_assurance_no dep_name
#> 1 20190209 97 4 3685 812319 TNYXECRXTQ
#> 2 20190816 167 19 3608 621990 BYPDKAUULX
#> 3 20190916 115 14 1299 996219 UYRYPXNTGU
#> 4 20191027 125 9 9603 636449 KOWQHTETUG
#> 5 20191018 109 14 2420 272511 XEHRXZSJEA
#> 6 20191007 72 11 2452 959254 OYFCXVTOYL
#> dep_address dep_postcode departure_cph dest_assurance_no dest_name
#> 1 50 TOFGLBEULW ID YTA 95/216/1100 22510 MFBGMIUGWV
#> 2 31 CDVIAMOZYC NC RJE 69/196/5890 439583 XMQBZRFOKL
#> 3 10 VBLBFDSAFH TV XCF 52/577/5349 245690 KUBFWNVPLS
#> 4 94 XGWIBIQRKK CX QXM 39/103/5541 458474 ZGWVOAAEFM
#> 5 42 MMYOSGVYIK QV RNN 41/788/6464 378463 KULTVHDMXP
#> 6 94 IYOBHBTJWI YB HWC 69/393/9398 695389 GSKYRADWKQ
#> dest_address dest_postcode dest_cph
#> 1 22 LFYLLNWFPI QY GRM 19/818/9098
#> 2 54 TRQSPYVRPQ BW NWI 71/939/3228
#> 3 32 GXDFSBEDPM UY ZFH 82/501/8178
#> 4 61 IZGFLEGGLY AL MXZ 13/282/1763
#> 5 54 XQZKSYLASC MC JJT 57/418/6011
#> 6 35 UWDSSQIGMS QV GSK 39/947/2201
The function reformat_data() uses the loaded configurations to read in the movement data and standardise them to movenet format, by:
- extracting columns with minimally required data, corresponding to from (originating holding identifier), to (destination holding identifier), date, and weight (movement quantity). The resulting data tibble contains these columns in this specific order.
- extracting any optional columns, as indicated in the loaded configurations. In the resulting data tibble, these columns are located after the required data columns.
- checking that the date column can be parsed as dates (using the date format indicated in the configurations), and that the weight column is numeric.
- converting dates to R Date format, to improve interoperability.
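The steps listed above can be mimicked in a few lines of base R. This is only a minimal sketch of the idea on a toy data frame, not movenet's implementation: the object names raw, cols, date_format and out are hypothetical, and the two rows are copied from the example data shown earlier.

```r
# Toy stand-in for a movement data file (values taken from the example data)
raw <- data.frame(
  movement_reference = c(304781, 229759),
  departure_date     = c("20190208", "20190815"),
  qty_pigs           = c(97, 167),
  departure_cph      = c("95/216/1100", "69/196/5890"),
  dest_cph           = c("19/818/9098", "71/939/3228"),
  stringsAsFactors   = FALSE
)

# Column mapping and date format, mirroring the values in ScotEID.yml
cols <- c(from = "departure_cph", to = "dest_cph",
          date = "departure_date", weight = "qty_pigs")
date_format <- "%Y%m%d"

# 1. Extract the required columns, in from / to / date / weight order
out <- raw[, unname(cols)]

# 2. Check that all dates parse and that the weight column is numeric
parsed_dates <- as.Date(out[[cols[["date"]]]], format = date_format)
stopifnot(!anyNA(parsed_dates), is.numeric(out[[cols[["weight"]]]]))

# 3. Store the dates as R Date objects
out[[cols[["date"]]]] <- parsed_dates

print(out)
```

Optional columns are left out of this sketch; following the description above, they would be appended after the four required columns.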
To read in a movement data file, use reformat_data(data, type = "movement"):
movement_data <-
  reformat_data(system.file("extdata", "example_movement_data.csv", package="movenet"),
                type = "movement")
head(movement_data) # Inspect the resulting movement_data tibble
#> # A tibble: 6 x 4
#> departure_cph dest_cph departure_date qty_pigs
#> <chr> <chr> <date> <dbl>
#> 1 95/216/1100 19/818/9098 2019-02-08 97
#> 2 69/196/5890 71/939/3228 2019-08-15 167
#> 3 52/577/5349 82/501/8178 2019-09-15 115
#> 4 39/103/5541 13/282/1763 2019-10-26 125
#> 5 41/788/6464 57/418/6011 2019-10-17 109
#> 6 69/393/9398 39/947/2201 2019-10-06 72
The movement data are now in a suitable format (movenet-format movement data tibble) to be plugged into pseudonymisation and network analysis workflows.
Reshaping holding data to movenet format
If you want to use additional holding data (e.g. coordinates or holding type) in your analyses, the same process as described above can be followed for holding config and data files:
# Load a holding config file:
load_config(system.file("configurations", "fakeScotEID_holding.yml", package="movenet"))
#> Successfully loaded config file: C:/Users/cboga/AppData/Local/Temp/Rtmp25xhub/temp_libpath11484970381d/movenet/configurations/fakeScotEID_holding.yml
# Read in and reformat a holding data file:
holding_data <-
  reformat_data(system.file("extdata", "example_holding_data.csv", package="movenet"),
                type = "holding")
For holding data, the only strictly required data column is id (holding identifier, matching from and to in the movement data). If you wish to include geographic coordinates, coord_EPSG_code and country_code are required file options, and coord_x and coord_y are required data columns.
In addition to extraction of required and requested optional columns, reformat_data() performs the following data checks and standardisations on holding data:
- If the loaded configurations include headers/indices for columns with geographical coordinates (coord_x and coord_y), these columns are checked to contain numeric data, and are then converted to a single simple feature (sf) geometry list-column using the ETRS89 coordinate reference system.
- If the loaded configurations include a header/index for a column herd_size, this column is checked to contain numeric data.
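The coordinate handling described above can be illustrated with the sf package directly. The sketch below is an assumption-laden example rather than movenet's own code: the column names easting and northing, the coordinate values, the source EPSG code 27700 (British National Grid, standing in for coord_EPSG_code) and the use of EPSG:4258 for ETRS89 are all chosen for illustration.

```r
library(sf)

# Toy holding data; column names and coordinate values are made up
holdings <- data.frame(
  id       = c("95/216/1100", "69/196/5890"),
  easting  = c(325000, 260000),
  northing = c(673000, 665000)
)

# Check that both coordinate columns are numeric
stopifnot(is.numeric(holdings$easting), is.numeric(holdings$northing))

# Combine the two columns into one sf geometry column, declaring the
# CRS of the input data (assumed here to be EPSG:27700)
holdings_sf <- st_as_sf(holdings, coords = c("easting", "northing"), crs = 27700)

# Transform to ETRS89 (assumed here to mean EPSG:4258)
holdings_etrs89 <- st_transform(holdings_sf, crs = 4258)

print(holdings_etrs89)
```

After st_as_sf(), the separate coordinate columns are replaced by a single geometry column, which matches the single list-column described in the first bullet above.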
The holding data are now in a suitable format (movenet-format holding data tibble) to be plugged into pseudonymisation and network analysis workflows.