This report provides custodians of livestock movement data with information about how enhancing the privacy of these data affects epidemiologically relevant network properties. The aim is to facilitate finding an appropriate balance between privacy for the livestock industry, and utility for veterinary public health practitioners, epidemiologists and mathematical modellers.
To generate this report, a movenet-format movement dataframe was provided by the user, movement weights (defined as the number of animals moved between two holdings on a movement date) were modified using various functions and parameters, and the effects of these modifications were analysed on a selection of weighted network measures. These analyses were carried out for each 28-day period within the data (as set via the time_unit argument). The report presents the results of these analyses in the form of figures with some interpretive guidance.
Figure 1 shows how the mean movement weights of all 28-day periodic sub-networks in the data are affected by jittering or rounding movement weights to different extents.
Where movement weight modifications result in mean movement weights that are higher than in the true data, this is a consequence of the boundary conditions of the privacy-enhancing functions. When jittering, movement weights are required to be positive, and a resampling procedure is in place to avoid values of 0 or below; as a result, jittered movement weights can on average be somewhat higher than the original values, particularly where the jitter range is of a similar or larger magnitude than the observed movement weights. Similarly, when rounding, the rounding unit is set as the minimum value for the modified datasets, resulting in relatively more rounding up (rather than down) where the rounding unit is of a similar or larger magnitude than the observed movement weights.
This section describes the effects of movement weight modifications on two global network properties: (i) the mean shortest path length and (ii) the strength assortativity. These are both calculated for each 28-day period within the data.
The shortest path length or geodesic distance between a pair of nodes (holdings) in the network is the smallest number of edges (movements) needed to reach one holding from the other. The most common weighted version of this measure interprets edge weight as the distance (or cost) associated with an edge, and compares path lengths based on their total weighted distance (Dijkstra 1959). Considering that movement weight is a measure of the strength of a connection, rather than a distance or cost, here edges were weighted by the reciprocal of movement weight, normalised by the mean weight in the network (\({mean\_movement\_weight}/{movement\_weight}\)). Use of this weighting metric is suggested by Opsahl et al., as the resulting unit of distance can be easily interpreted (as “one step with the average weight in the network”) and makes distances comparable across networks with different ranges of movement weights (Opsahl, Agneessens, and Skvoretz 2010; Opsahl 2011). The smaller the mean weighted shortest path length is, the faster a disease can be expected to spread between holdings, due to shorter paths and/or greater movement weights.
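As an illustration of the weighting described above, the following is a minimal Python sketch, not the code used to generate this report (which relies on igraph). It assumes a hypothetical edge list of (origin, destination, movement weight) tuples with one edge per ordered pair of holdings, and it averages over reachable ordered pairs only, which is also an assumption.

```python
import heapq
from collections import defaultdict


def mean_weighted_shortest_path(edges):
    """Mean shortest path length over all reachable ordered pairs of holdings.

    edges: list of (origin, destination, movement_weight), with weight > 0.
    """
    mean_weight = sum(w for _, _, w in edges) / len(edges)
    adjacency = defaultdict(list)
    holdings = set()
    for origin, destination, weight in edges:
        # Edge distance = mean movement weight / movement weight.
        adjacency[origin].append((destination, mean_weight / weight))
        holdings.update((origin, destination))

    total, n_paths = 0.0, 0
    for source in holdings:
        # Dijkstra's algorithm from each source holding.
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, node = heapq.heappop(heap)
            if d > dist[node]:
                continue  # stale heap entry
            for neighbour, cost in adjacency[node]:
                candidate = d + cost
                if candidate < dist.get(neighbour, float("inf")):
                    dist[neighbour] = candidate
                    heapq.heappush(heap, (candidate, neighbour))
        total += sum(d for node, d in dist.items() if node != source)
        n_paths += len(dist) - 1
    return total / n_paths if n_paths else float("nan")


# Hypothetical example data (not taken from the report).
example_edges = [("A", "B", 10), ("B", "C", 30), ("A", "C", 5)]
mean_path = mean_weighted_shortest_path(example_edges)
```

In this example the mean weight is 15, so the direct move from A to C (5 animals) has a distance of 3, whereas the route via B has a distance of 1.5 + 0.5 = 2 and is therefore the shortest path.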
Figure 2 shows how the mean weighted shortest path lengths of all 28-day periodic sub-networks in the data are affected by jittering or rounding movement weights to different extents.
Where movement weight modifications result in mean shortest path lengths that are higher than in the true data, this indicates that using these particular modifications would lead to overestimating the real mean shortest path length and underestimating the transmission potential of the network. Conversely, where mean shortest path lengths are lower than in the true data, this indicates that using these particular modifications would lead to underestimating the real mean shortest path length and overestimating the transmission potential of the network.
The strength of a node (holding) in a network is the sum of the weights of its edges. Here, this is taken to mean the sum of all movement weights. Strength assortativity is a measure of the tendency of holdings with similar strength to be connected to each other. Here, assortativity is considered in a directed manner: the tendency of holdings with a particular total number of outgoing animals (out-strength) to be connected to holdings with a similar number of incoming animals (in-strength). From an epidemiological point of view, the strength assortativity can provide information on which type of holdings may be infected early in a potential outbreak: if the assortativity is high, the tendency will be for the infection to first reach holdings that exchange a large number of animals.
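One way to make the directed definition above concrete is sketched below in Python. This is an assumption about how such a coefficient can be computed, not the igraph routine used for this report: for every edge, the out-strength of the origin is paired with the in-strength of the destination, and the Pearson correlation of these pairs is returned. The edge list is hypothetical.

```python
from collections import defaultdict
from math import sqrt


def strength_assortativity(edges):
    """Pearson correlation, across edges, between the out-strength of the
    origin holding and the in-strength of the destination holding."""
    out_strength = defaultdict(float)
    in_strength = defaultdict(float)
    for origin, destination, weight in edges:
        out_strength[origin] += weight
        in_strength[destination] += weight

    xs = [out_strength[origin] for origin, _, _ in edges]
    ys = [in_strength[destination] for _, destination, _ in edges]
    n = len(edges)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either strength does not vary
    return cov / sqrt(var_x * var_y)


# Hypothetical example data (not taken from the report).
example_edges = [("A", "B", 10), ("A", "C", 20), ("D", "C", 5)]
assortativity = strength_assortativity(example_edges)
```

Here the holding with the largest out-strength (A) sends animals to both a low in-strength and a high in-strength holding, while the small sender D is connected to the high in-strength holding C, giving a negative coefficient of -0.5.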
Figure 3 shows how the strength assortativities of all 28-day periodic sub-networks in the data are affected by jittering or rounding movement weights to different extents.
Where movement weight modifications result in strength assortativities that are higher than in the true data, this indicates that using these particular modifications would lead to overestimating the real strength assortativities in the network. Conversely, where strength assortativities are lower than in the true data, this indicates that using these particular modifications would lead to underestimating the real strength assortativities in the network.
This section describes the effects of movement weight modifications on the ranking of holdings according to three local centrality measures, providing information about the importance of holdings in the network in terms of their connections with other holdings. Rankings are calculated and compared for each 28-day period within the data.
The strength of a node (holding) in a network is the sum of the weights of its edges. Here, this is taken to mean the sum of all movement weights. Holdings were ranked according to the geometric mean of their total number of incoming animals (in-strength) and outgoing animals (out-strength), in decreasing order. The ranking in each privacy-enhanced dataset was then compared to the ranking in the true dataset using Kendall’s tau rank correlation coefficient.
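The strength-based score described above can be sketched as follows. This is a minimal Python illustration with a hypothetical edge list, and the function name is an assumption rather than part of movenet.

```python
from collections import defaultdict
from math import sqrt


def strength_scores(edges):
    """Geometric mean of in-strength and out-strength for each holding."""
    in_strength = defaultdict(float)
    out_strength = defaultdict(float)
    for origin, destination, weight in edges:
        out_strength[origin] += weight
        in_strength[destination] += weight
    holdings = set(in_strength) | set(out_strength)
    return {
        h: sqrt(in_strength.get(h, 0.0) * out_strength.get(h, 0.0))
        for h in holdings
    }


# Hypothetical example data (not taken from the report).
example_edges = [("A", "B", 10), ("B", "C", 40), ("C", "A", 5), ("A", "C", 5)]
scores = strength_scores(example_edges)
# Holdings in decreasing order of score.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Note that, with a geometric mean, a holding that only receives or only sends animals obtains a score of 0, regardless of how many animals it moves.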
Figure 4 shows how the ranking of holdings according to strength is affected by jittering or rounding movement weights to different extents.
Where movement weight modifications result in correlation coefficients that are much lower than 1, this indicates that the ranking of holdings according to strength differs markedly from the ranking in the true data. Using these particular modifications could lead to inaccurate identification of the most (and least) important holdings in the network based on their strength.
The betweenness of a node (holding) in a network is the number of shortest paths between all pairs of holdings that pass through that holding. Here, the shortest paths were determined by weighting edges according to the reciprocal of the movement weight, normalised by the mean weight in the network (\({mean\_movement\_weight}/{movement\_weight}\)). Holdings were ranked according to their betweenness, in decreasing order. The ranking in each privacy-enhanced dataset was then compared to the ranking in the true dataset using Kendall’s tau rank correlation coefficient.
Holdings with a high betweenness centrality may be of particular epidemiological relevance as they increase the connectivity of the network, and they could make it easier for an infection to reach sections of the network that would otherwise be distant.
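For readers who wish to see how weighted betweenness can be obtained, the following Python sketch uses a Brandes-style accumulation over Dijkstra shortest paths. It is an illustration under stated assumptions (a hypothetical edge list with one edge per ordered pair of holdings and strictly positive weights), not the igraph implementation used for this report.

```python
import heapq
from collections import defaultdict


def weighted_betweenness(edges):
    """Directed betweenness, with edge distance = mean weight / weight."""
    mean_weight = sum(w for _, _, w in edges) / len(edges)
    adjacency = defaultdict(list)
    holdings = set()
    for origin, destination, weight in edges:
        adjacency[origin].append((destination, mean_weight / weight))
        holdings.update((origin, destination))

    betweenness = dict.fromkeys(holdings, 0.0)
    for source in holdings:
        dist = {source: 0.0}
        n_paths = defaultdict(float)  # number of shortest paths from source
        n_paths[source] = 1.0
        predecessors = defaultdict(list)
        finalised, order = set(), []
        heap = [(0.0, source)]
        while heap:
            d, node = heapq.heappop(heap)
            if node in finalised:
                continue  # stale heap entry
            finalised.add(node)
            order.append(node)
            for neighbour, cost in adjacency[node]:
                candidate = d + cost
                if candidate < dist.get(neighbour, float("inf")):
                    dist[neighbour] = candidate
                    n_paths[neighbour] = n_paths[node]
                    predecessors[neighbour] = [node]
                    heapq.heappush(heap, (candidate, neighbour))
                elif candidate == dist[neighbour]:
                    # Equally short alternative path (exact float comparison).
                    n_paths[neighbour] += n_paths[node]
                    predecessors[neighbour].append(node)
        # Accumulate dependencies from the farthest holding back to the source.
        dependency = defaultdict(float)
        for node in reversed(order):
            for pred in predecessors[node]:
                share = n_paths[pred] / n_paths[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                betweenness[node] += dependency[node]
    return betweenness


# Hypothetical example data (not taken from the report).
example_edges = [("A", "B", 20), ("B", "C", 20), ("A", "C", 5), ("C", "D", 15)]
betweenness = weighted_betweenness(example_edges)
```

In this example the low-weight direct move from A to C is a longer weighted path than the route via B, so B lies on the shortest paths from A to C and from A to D, and C lies on the shortest paths from A to D and from B to D.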
Figure 5 shows how the ranking of holdings according to betweenness is affected by jittering or rounding movement weights to different extents.
Where movement weight modifications result in correlation coefficients that are much lower than 1, this indicates that the ranking of holdings according to betweenness differs markedly from the ranking in the true data. Using these particular modifications could lead to inaccurate identification of the most (and least) important holdings in the network based on their betweenness.
The PageRank of a node (holding) in a network is a measure of its importance based on the number of nodes that link to it, and on the importance of those linking nodes in turn (Brin and Page 1998). Here, edges were weighted according to their movement weights. Holdings were ranked according to their PageRank, in decreasing order. The ranking in each privacy-enhanced dataset was then compared to the ranking in the true dataset using Kendall’s tau rank correlation coefficient.
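A weighted PageRank can be approximated by power iteration, as in the Python sketch below. The damping factor of 0.85, the uniform redistribution of rank from holdings without outgoing movements, and the convergence settings are assumptions made for illustration; the report itself relies on igraph, whose solver and defaults may differ.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank; a holding passes on its rank to its
    destinations in proportion to the movement weights."""
    holdings = sorted({h for origin, dest, _ in edges for h in (origin, dest)})
    out_strength = defaultdict(float)
    for origin, _, weight in edges:
        out_strength[origin] += weight

    n = len(holdings)
    rank = dict.fromkeys(holdings, 1.0 / n)
    for _ in range(max_iter):
        # Rank held by holdings without outgoing moves is spread uniformly.
        dangling = sum(rank[h] for h in holdings if out_strength.get(h, 0.0) == 0.0)
        new_rank = dict.fromkeys(holdings, (1.0 - damping) / n + damping * dangling / n)
        for origin, dest, weight in edges:
            new_rank[dest] += damping * rank[origin] * weight / out_strength[origin]
        change = sum(abs(new_rank[h] - rank[h]) for h in holdings)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical example data (not taken from the report).
example_edges = [("A", "B", 90), ("A", "C", 10), ("B", "A", 1), ("C", "A", 1)]
pagerank = weighted_pagerank(example_edges)
ranked = sorted(pagerank, key=pagerank.get, reverse=True)
```

In this example B and C each receive animals from A only, but B receives 90% of the animals leaving A and therefore obtains a higher PageRank than C.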
PageRank has been suggested as a useful measure for the importance and influence of a holding in a network and a proxy for the probability of a random spreading process occurring when the holding becomes infected. PageRank has been used as a proxy for transmission risk in various studies investigating the spread of infectious diseases of livestock such as bovine viral diarrhoea (Hirose et al. 2021) and foot-and-mouth disease (González-Gordon et al. 2023).
Figure 6 shows how the ranking of holdings according to PageRank is affected by jittering or rounding movement weights to different extents.
Where movement weight modifications result in correlation coefficients that are much lower than 1, this indicates that the ranking of holdings according to PageRank differs markedly from the ranking in the true data. Using these particular modifications could lead to inaccurate identification of the most (and least) important holdings in the network based on their PageRank.
Entries with movement weight 0 or representing moves from a holding to itself (“loops”) were removed, as they were considered irrelevant for disease transmission and could complicate the interpretation of analyses. Additionally, all repeated moves from and to the same holdings on the same day were aggregated into a single entry per day, with movement weights summed.
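The cleaning steps above can be sketched in Python as follows. The tuple layout (origin, destination, date, weight) and the function name are assumptions for illustration and do not reflect the movenet data format.

```python
from collections import defaultdict


def clean_movements(movements):
    """Drop zero-weight moves and loops, then sum repeated same-day moves."""
    totals = defaultdict(int)
    for origin, destination, date, weight in movements:
        if weight == 0 or origin == destination:
            continue  # zero-weight entry or move from a holding to itself
        totals[(origin, destination, date)] += weight
    return [(o, d, t, w) for (o, d, t), w in sorted(totals.items())]


# Hypothetical example data (not taken from the report).
raw_movements = [
    ("A", "B", "2023-01-02", 5),
    ("A", "B", "2023-01-02", 7),  # repeated move on the same day
    ("A", "A", "2023-01-02", 3),  # loop
    ("B", "C", "2023-01-03", 0),  # zero weight
    ("B", "C", "2023-01-04", 4),
]
cleaned = clean_movements(raw_movements)
```

The two same-day moves from A to B are merged into a single entry with weight 12, while the loop and the zero-weight entry are discarded.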
Movement weights were modified with the following functions:
movenet::jitter_weights(): This adds random noise between -range and range to movement weights, while ensuring that resulting weights remain greater than 0. jitter_weights() was applied with various range arguments: 5, 10, 50, 100 (up to the order of magnitude of the mean weight in the data). To take into account the effects of random sampling, 3 simulations were run with each range argument.
movenet::round_weights(): This rounds movement weights to multiples of unit, and also sets unit as the minimum possible value for the resulting weights. round_weights() was applied with various unit arguments: 5, 10, 50, 100 (up to the order of magnitude of the largest weight in the data).
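The behaviour of the two functions, as described above, can be mimicked with the following Python sketch. It is not the movenet implementation: the function names are hypothetical, and the use of integer noise and of rounding halves upwards are assumptions made here for illustration.

```python
import math
import random


def jitter_weight(weight, jitter_range, rng):
    """Add integer noise in [-jitter_range, jitter_range]; resample until > 0."""
    while True:
        candidate = weight + rng.randint(-jitter_range, jitter_range)
        if candidate > 0:
            return candidate


def round_weight(weight, unit):
    """Round to the nearest multiple of unit (halves up), with unit as minimum."""
    rounded = unit * math.floor(weight / unit + 0.5)
    return max(unit, rounded)


# Hypothetical example data (not taken from the report).
true_weights = [2, 3, 7, 12]
rng = random.Random(42)  # fixed seed, for reproducibility of the illustration only
jittered = [jitter_weight(w, 10, rng) for w in true_weights]
rounded = [round_weight(w, 10) for w in true_weights]
```

With a rounding unit of 10, all four example weights become 10, raising the mean weight from 6 to 10. This illustrates why modified datasets can have higher mean movement weights than the true data when the unit or range is large relative to the observed weights.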
Static network representations (snapshots) were created from true and privacy-enhanced datasets, for each subsequent 28-day period in the data, using the igraph package (Csardi and Nepusz 2006). During snapshot generation, all repeated moves from and to the same holdings were aggregated into a single network edge per snapshot, with movement weights summed up.
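Snapshot generation can be illustrated with the Python sketch below. It assumes that periods are counted in blocks of 28 days from the earliest movement date, which may differ from how period boundaries are set in movenet, and it uses a hypothetical tuple layout of (origin, destination, date, weight).

```python
from collections import defaultdict
from datetime import date


def periodic_snapshots(movements, period_days=28):
    """Split movements into consecutive periods and sum weights per edge."""
    start = min(day for _, _, day, _ in movements)
    snapshots = defaultdict(lambda: defaultdict(int))
    for origin, destination, day, weight in movements:
        period = (day - start).days // period_days
        snapshots[period][(origin, destination)] += weight
    return {period: dict(edges) for period, edges in sorted(snapshots.items())}


# Hypothetical example data (not taken from the report).
example_movements = [
    ("A", "B", date(2023, 1, 1), 5),
    ("A", "B", date(2023, 1, 20), 7),
    ("B", "C", date(2023, 1, 28), 4),
    ("A", "B", date(2023, 1, 29), 2),  # day 28 after the start: second period
]
snapshots = periodic_snapshots(example_movements)
```

The first two moves from A to B fall within the same period and are merged into a single edge with weight 12, whereas the move on 29 January starts a new snapshot.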
Static network properties for each 28-day period were determined using the igraph package (Csardi and Nepusz 2006). Strength assortativity, strength centrality and PageRank centrality were calculated by weighting edges according to their movement weights. Mean shortest path length and betweenness centrality were calculated by weighting edges according to the reciprocal of their movement weights, normalised by the mean movement weight in the network (\({mean\_movement\_weight}/{movement\_weight}\)). For global measures, the significance of differences between the true data and each privacy-enhanced dataset was assessed using two-tailed paired Wilcoxon signed-rank tests.
For rankings according to local network properties, holdings were ranked according to the respective centrality measure, in decreasing order, with tied holdings assigned the mean of their ranks. The ranking in each privacy-enhanced periodic subnetwork was compared to the ranking in the respective true subnetwork using Kendall’s tau rank correlation coefficient.
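The ranking comparison can be sketched in Python as follows. The tau-b variant of Kendall’s coefficient (which corrects for ties) is an assumption made here, as are the function names and the example scores.

```python
from math import sqrt


def mean_ranks(scores):
    """Rank holdings in decreasing order of score; ties get their mean rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long sequences of ranks."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both rankings: ignored
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denominator = sqrt(
        (concordant + discordant + only_x_tied)
        * (concordant + discordant + only_y_tied)
    )
    return (concordant - discordant) / denominator if denominator else float("nan")


# Hypothetical centrality scores (not taken from the report).
true_scores = {"A": 40, "B": 30, "C": 20, "D": 10}
modified_scores = {"A": 40, "B": 20, "C": 30, "D": 10}
true_ranks = mean_ranks(true_scores)
modified_ranks = mean_ranks(modified_scores)
holdings = sorted(true_scores)
tau = kendall_tau_b(
    [true_ranks[h] for h in holdings], [modified_ranks[h] for h in holdings]
)
```

In this example only holdings B and C swap places, so five of the six pairs of holdings are ordered consistently and one is not, giving a coefficient of (5 - 1) / 6, or approximately 0.67.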