```
library(tidyverse)
library(gt)
library(DHARMa)
library(patchwork)
library(broom)
source("XX_functions.R")
# Metrics
<- read_csv("Data/Datasets/vultures_final.csv") |>
v # Round non-integer values of population counts
mutate(across(c(contains("pop"), contains("raw")), round))
# Raw counts
<- read_csv("Data/Datasets/vultures_clean_2023.csv")
raw
# Predicted GAM models
<- read_csv("Data/Datasets/vultures_gams_pred.csv") pred
```

# Analysis

## Setup

Set plotting defaults

`<- seq(200, 300, by = 5) y_breaks `

## Using This Report

This report contains all the stats and analyses for the final data set.

Each section (e.g., “Timing of kettle formation”) contains the analysis for a particular type of data.

These analyses have several parts

**Descriptive statistics**- min, max, median, etc.**Model results**- results of linear or general linear models, depending on the measures**Figures**- visualizing the models on top of the data**Model Checks**- DHARMa plots assessing model fit and assumptions**Sensitivity**- Checking influential observations**Full Model Results**- the raw summary output of the models, for completeness

### Models

Most models are standard linear regressions.

Some, involving the count data (number of birds) are either Negative binomial or Poisson generalized linear models.

The Patterns of migration (skew and kurtosis) are linear models without predictors (intercept only), which effectively makes them t-tests (they produce the exact same estimates and statistics), but allow us to use DHARMA for model checking, etc.

### Model Results

Model results are presented in a tabulated form, which is also saved to CSV as `Data/Tables/table_MEASURE.csv`

.

They are also presented as the “Full Model Results”. This is the output of using `summary(model)`

in R and is just a complete version of the output in case anything needs to be inspected at a later date. In short, I don’t think you’ll need it, but I’ve included it just in case.

The tabulated results contain both estimates and P values, as well as overall model R2 values for linear models (but not for general linear models, as they don’t really apply). **I could also add confidence intervals as well if that’s useful**.

R2 values are presented as regular and adjusted. The adjusted values take into account samples sizes and the number of predictors.

We can interpret the Estimates directly, especially in linear models.

For example, if there is a significant effect of `year`

on `mig_start_doy`

, and the estimate is 0.176 we can say:

“There was a significant effect of year on the start date of kettle formations, such that the start date increased by 0.176 days per year (t = XXX; P-value = XXX). Thus kettle formations started ~4.4 days later at the end of the study compared to the start.”

Considering that there are 25 years in this study.

For Negative Binomial or Poisson models, we will transform the estimates to “Incident Rate Ratios” by exponentiating the estimate.

For example, if there is a significant effect of `year`

on `mig_pop_max`

in a generalized linear model with the negative binomial family, and the estimate is 0.03625499, we can first calculate the exponential:

`exp(0.03625499)`

`[1] 1.03692`

Then we can say:

“There was a significant effect of year on the maximum number of vultures predicted in a kettle (Est = 0.0363; t = XXX; P-value = XXX), such that the number of birds increased by 3% per year.”

If our exponentiated result was 3, we would say a “300% increase”, or a “3-fold” increase.

These exponentiated estimates are included in the outputs as well as “Estimate Exp”

### Variables

All variables are generally referred to by their name in the data set, e.g., `mig_start_doy`

.

See the glossary of variables names in the Calculate Metrics > Data Details section.

### Figures

You can click on any figure expand it out.

These figures are here to demostrate the patterns we’re seeing in the data and the stats. I imagine you would want different figures for publication, so just let me know what they should look like!

### DHARMa plots

DHARMa is a package for simulating residuals to allow model checking for all types of models (details).

Generally these plots can be interpreted similarly to interpreting QQ Normal and Residual plots for linear models (although technically they’re using simulated residuals, not model residuals).

**For context…**

For example, we would interpret the following plot as showing decent model fit and assumptions.

The QQ plot on the left is nearly straight, there are no tests that flag problems in the distribution (KS test), Dispersion or Outliers. We also see no pattern in the residuals and no problems have been highlighted.

Similarly, I wouldn’t be unduly worried about the scary red line in this plot, as I don’t really find the residuals to show much pattern and there are no other problems highlighted, except that one outlier (the red asterisk).

This one starts to look a bit problematic…

This one is awful!

### Sensitivity

The DHARMa model checks will usually catch outlier problems, but out of an abundance of caution, we’ll do a quick review of any high Cook’s D values and will rerun the models without certain data points to ensure they’re still acceptable.

These sections first calculate and show the Cook’s D for all observations in each model. In these tables, values highlighted in red are more than 4/25 (a standard Cook’s D cutoff 4/sample size). These indicate potential influential data points for a particular model.

Then I run a comparison of the model before and the after removing that point.

In every case here the pattern was either strengthened or weakened by the removal, but the overall pattern did not change (no Estimate signs changed, and significance changed from sig to non-sig or vice versa).

- Strengthened patterns showed greater Estimates and smaller P values
- Weakened patterns showed smaller Estimates and greater P values

Tables with coefficients for both original and new model are shown, with colour intensity showing strength in the pattern.

## Timing of kettle formation

**When do kettles form? Has this timing changed over the years?**

- Look for changes in the DOY of the 5%, 25%, 50%, 75%, 95%, of passage as well as the DOY with the highest predicted count.
`mig_start_doy`

,`peak_start_doy`

,`p50_doy`

,`peak_end_doy`

,`mig_end_doy`

,`max_doy`

#### Descriptive stats

## Code

```
|>
v select(contains("start_doy"), contains("50_doy"), contains("end_doy"), "max_doy") |>
desc_stats()
```

measure | mean | sd | min | median | max | n |
---|---|---|---|---|---|---|

mig_start_doy | 259.48 | 2.90 | 254 | 259 | 265 | 25 |

peak_start_doy | 266.96 | 2.21 | 263 | 267 | 272 | 25 |

p50_doy | 271.64 | 2.25 | 268 | 271 | 277 | 25 |

mig_end_doy | 284.00 | 3.11 | 277 | 284 | 289 | 25 |

peak_end_doy | 276.48 | 2.14 | 273 | 276 | 281 | 25 |

max_doy | 271.60 | 2.80 | 266 | 271 | 278 | 25 |

Here we look at a **linear regression** of year by kettle timing dates.

We look at the effect of year and type of measurement (migration start/end, peak start/end and time of 50% passage) on date (day of year).

## Code

```
<- select(v, ends_with("doy"), year) |>
v1 pivot_longer(-year, names_to = "measure", values_to = "doy")
```

First we look at the interaction term to see if the slopes of change among measurements are significantly different from one another.

The type III ANOVA shows no significant interaction.

## Code

```
<- lm(doy ~ measure * year, data = v1)
m0 ::Anova(m0, type = "III") car
```

```
Anova Table (Type III tests)
Response: doy
Sum Sq Df F value Pr(>F)
(Intercept) 3.06 1 0.5445 0.46183
measure 9.29 5 0.3309 0.89360
year 47.49 1 8.4613 0.00423 **
measure:year 7.53 5 0.2683 0.92974
Residuals 774.60 138
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

We’ll look at a post-hoc comparison of the slopes to interpret them individually.

As with the individual models, we can see that each measure shows a significant positive slope with the exception of the End of Migration (`mig_end_day`

).

Note that here we’ve used a False Discovery Rate to adjust the p-values to multiple tests. Adjusting these tests is likely more conservative than required.

## Code

```
<- emmeans::emtrends(m0, ~ measure, var = "year") |>
e ::test(adjust = "FDR") emmeans
```

All show a significant increase in doy except `mig_end_doy`

. Data saved to Data/Datasets/table_timing_combined.csv

## Code

```
<- e |>
t mutate(
model = "doy ~ measure * year",
n = nrow(v)) |>
rename(
estimate = year.trend,
std_error = SE,
statistic = t.ratio,
p_value = p.value)
write_csv(t, d)
fmt_table(t) |>
tab_header(title = "Post-hoc results") |>
tab_footnote("Number of years in the data.", locations = cells_column_labels(n)) |>
tab_footnote("T statistic.", locations = cells_column_labels("statistic"))
```

Post-hoc results | ||||||
---|---|---|---|---|---|---|

measure | estimate | std error | df | statistic^{1} |
p value | n^{2} |

doy ~ measure * year |
||||||

max_doy | 0.181 | 0.062 | 138 | 2.909 | 0.016 | 25 |

mig_end_doy | 0.093 | 0.062 | 138 | 1.497 | 0.137 | 25 |

mig_start_doy | 0.176 | 0.062 | 138 | 2.830 | 0.016 | 25 |

p50_doy | 0.147 | 0.062 | 138 | 2.370 | 0.029 | 25 |

peak_end_doy | 0.132 | 0.062 | 138 | 2.121 | 0.043 | 25 |

peak_start_doy | 0.155 | 0.062 | 138 | 2.490 | 0.028 | 25 |

^{1} T statistic. |
||||||

^{2} Number of years in the data. |

Here we look at **linear regressions** of year by kettle timing dates.

```
<- lm(mig_start_doy ~ year, data = v)
m1 <- lm(peak_start_doy ~ year, data = v)
m2 <- lm(p50_doy ~ year, data = v)
m3 <- lm(peak_end_doy ~ year, data = v)
m4 <- lm(mig_end_doy ~ year, data = v)
m5 <- lm(max_doy ~ year, data = v)
m6
<- list(m1, m2, m3, m4, m5, m6) models
```

**Tabulated Model Results**

All show a significant increase in doy except `mig_end_doy`

. Data saved to Data/Datasets/table_timing.csv

## Code

```
<- get_table(models)
t write_csv(t, d)
fmt_table(t)
```

term | estimate | std error | statistic | p value | n |
---|---|---|---|---|---|

mig_start_doy ~ year (R2 = 0.22; R2-adj = 0.19) |
|||||

(Intercept) | −94.618 | 138.088 | −0.685 | 0.500 | 25 |

year | 0.176 | 0.069 | 2.564 | 0.017 | 25 |

peak_start_doy ~ year (R2 = 0.3; R2-adj = 0.27) |
|||||

(Intercept) | −44.588 | 99.801 | −0.447 | 0.659 | 25 |

year | 0.155 | 0.050 | 3.122 | 0.005 | 25 |

p50_doy ~ year (R2 = 0.26; R2-adj = 0.23) |
|||||

(Intercept) | −24.819 | 104.600 | −0.237 | 0.815 | 25 |

year | 0.147 | 0.052 | 2.834 | 0.009 | 25 |

peak_end_doy ~ year (R2 = 0.23; R2-adj = 0.2) |
|||||

(Intercept) | 11.143 | 101.511 | 0.110 | 0.914 | 25 |

year | 0.132 | 0.050 | 2.614 | 0.016 | 25 |

mig_end_doy ~ year (R2 = 0.05; R2-adj = 0.01) |
|||||

(Intercept) | 96.771 | 163.107 | 0.593 | 0.559 | 25 |

year | 0.093 | 0.081 | 1.148 | 0.263 | 25 |

max_doy ~ year (R2 = 0.25; R2-adj = 0.22) |
|||||

(Intercept) | −92.317 | 130.519 | −0.707 | 0.486 | 25 |

year | 0.181 | 0.065 | 2.788 | 0.010 | 25 |

**Note:** Lines without CI ribbons are not significant

## Code

```
<- v |>
doy_figs select(year, contains("doy")) |>
pivot_longer(-year, names_to = "measure", values_to = "doy") |>
left_join(mutate(t, measure = str_trim(str_extract(model, "^[^~]*")), sig = p_value <= 0.05) |>
filter(term == "year") |>
select(measure, sig), by = "measure")
ggplot(doy_figs, aes(x = year, y = doy, group = measure, colour = measure, fill = sig)) +
theme_bw() +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
scale_y_continuous(breaks = y_breaks) +
scale_colour_viridis_d() +
scale_fill_manual(values = c("TRUE" = "grey60", "FALSE" = "#FFFFFF00"), guide = "none") +
labs(caption = "Lines without CI Error ribbons represent non-significant models")
```

``geom_smooth()` using formula = 'y ~ x'`

The DHARMa model checks don’t highlight any particular outlier problems, but out of an abundance of caution, we’ll do a quick review of any high Cook’s D values.

Values highlighted in red are less than 4/25, a standard Cook’s D cutoff (4/sample size). These indicate potential influential data points for a particular model.

## Code

```
get_cooks(models) |>
gt_cooks(width = "80%")
```

Cook's Distances | ||||||
---|---|---|---|---|---|---|

year | mig_start_doy | peak_start_doy | p50_doy | peak_end_doy | mig_end_doy | max_doy |

1998 | 0.05 | 0.03 | 0.04 | 0.04 | 0.00 | 0.05 |

1999 | 0.17 | 0.08 | 0.03 | 0.00 | 0.01 | 0.03 |

2000 | 0.14 | 0.11 | 0.08 | 0.09 | 0.03 | 0.01 |

2001 | 0.14 | 0.00 | 0.00 | 0.03 | 0.07 | 0.00 |

2002 | 0.01 | 0.04 | 0.08 | 0.00 | 0.02 | 0.15 |

2003 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |

2004 | 0.08 | 0.05 | 0.06 | 0.06 | 0.09 | 0.02 |

2005 | 0.03 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 |

2006 | 0.01 | 0.01 | 0.00 | 0.01 | 0.06 | 0.02 |

2008 | 0.03 | 0.02 | 0.01 | 0.01 | 0.02 | 0.00 |

2009 | 0.01 | 0.00 | 0.03 | 0.07 | 0.11 | 0.02 |

2010 | 0.02 | 0.01 | 0.00 | 0.00 | 0.01 | 0.01 |

2011 | 0.04 | 0.10 | 0.10 | 0.07 | 0.01 | 0.14 |

2012 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |

2013 | 0.02 | 0.02 | 0.02 | 0.07 | 0.02 | 0.03 |

2014 | 0.00 | 0.02 | 0.02 | 0.03 | 0.04 | 0.01 |

2015 | 0.01 | 0.00 | 0.01 | 0.01 | 0.07 | 0.03 |

2016 | 0.06 | 0.07 | 0.05 | 0.01 | 0.00 | 0.04 |

2017 | 0.00 | 0.01 | 0.02 | 0.06 | 0.09 | 0.00 |

2018 | 0.00 | 0.02 | 0.01 | 0.00 | 0.00 | 0.06 |

2019 | 0.01 | 0.02 | 0.05 | 0.04 | 0.04 | 0.04 |

2020 | 0.07 | 0.22 | 0.25 | 0.18 | 0.06 | 0.14 |

2021 | 0.12 | 0.13 | 0.09 | 0.16 | 0.12 | 0.07 |

2022 | 0.15 | 0.12 | 0.06 | 0.02 | 0.00 | 0.08 |

2023 | 0.10 | 0.02 | 0.01 | 0.00 | 0.01 | 0.00 |

#### Check influential points

Now let’s see what happens if we were to omit these years from the respective analyses.

Tables with coefficients for both original and new model are shown, with colour intensity showing strength in the pattern.

For the start of migration, if we omit 1999, the pattern is stronger, but the same.

## Code

`compare(m1, 1999)`

model | parameter | Estimate | Std. Error | t value | Pr(>|t|) |
---|---|---|---|---|---|

Original - mig_start_doy | (Intercept) | -94.618 | 138.088 | -0.685 | 0.500 |

Drop 1999 - mig_start_doy | (Intercept) | -161.289 | 141.525 | -1.140 | 0.267 |

Original - mig_start_doy | year | 0.176 | 0.069 | 2.564 | 0.017 |

Drop 1999 - mig_start_doy | year | 0.209 | 0.070 | 2.973 | 0.007 |

For the start of peak migration, the date of 50% passage and the end of peak migration, if we omit 2020, the pattern is weaker, but the same.

## Code

`compare(m2, 2020)`

model | parameter | Estimate | Std. Error | t value | Pr(>|t|) |
---|---|---|---|---|---|

Original - peak_start_doy | (Intercept) | -44.588 | 99.801 | -0.447 | 0.659 |

Drop 2020 - peak_start_doy | (Intercept) | 7.053 | 95.767 | 0.074 | 0.942 |

Original - peak_start_doy | year | 0.155 | 0.050 | 3.122 | 0.005 |

Drop 2020 - peak_start_doy | year | 0.129 | 0.048 | 2.712 | 0.013 |

## Code

`compare(m3, 2020)`

model | parameter | Estimate | Std. Error | t value | Pr(>|t|) |
---|---|---|---|---|---|

Original - p50_doy | (Intercept) | -24.819 | 104.600 | -0.237 | 0.815 |

Drop 2020 - p50_doy | (Intercept) | 32.436 | 99.102 | 0.327 | 0.747 |

Original - p50_doy | year | 0.147 | 0.052 | 2.834 | 0.009 |

Drop 2020 - p50_doy | year | 0.119 | 0.049 | 2.411 | 0.025 |

## Code

`compare(m4, 2020)`

model | parameter | Estimate | Std. Error | t value | Pr(>|t|) |
---|---|---|---|---|---|

Original - peak_end_doy | (Intercept) | 11.143 | 101.511 | 0.110 | 0.914 |

Drop 2020 - peak_end_doy | (Intercept) | 58.397 | 99.350 | 0.588 | 0.563 |

Original - peak_end_doy | year | 0.132 | 0.050 | 2.614 | 0.016 |

Drop 2020 - peak_end_doy | year | 0.108 | 0.049 | 2.193 | 0.039 |

In addition to the sensitivity check using cook’s distances, here we will conduct a more targeted exploration of some specific years.

#### Gaps around the peak

First, we consider removing 1998, 2011, 2013 as they have sampling gaps around the peak of migration

`<- update(m0, data = filter(v1, !year %in% c(1998, 2011, 2013))) m0a `

Comparing the ANOVA tables shows only superficial differences.

```
fmt_anova(m0)
fmt_anova(m0a)
```

Parameter | Sum Sq | Df | F | P |
---|---|---|---|---|

(Intercept) | 3.056 | 1 | 0.544 | 0.462 |

measure | 9.287 | 5 | 0.331 | 0.894 |

year | 47.493 | 1 | 8.461 | 0.004 |

measure:year | 7.531 | 5 | 0.268 | 0.930 |

Residuals | 774.597 | 138 | NA | NA |

Parameter | Sum Sq | Df | F | P |
---|---|---|---|---|

(Intercept) | 4.520 | 1 | 0.894 | 0.346 |

measure | 11.113 | 5 | 0.439 | 0.820 |

year | 48.304 | 1 | 9.550 | 0.002 |

measure:year | 9.279 | 5 | 0.367 | 0.870 |

Residuals | 606.955 | 120 | NA | NA |

As does the post-hoc comparison of slopes among the measures. If anything, it looks like removing these ‘gappy’ years only strengthens the existing patterns.

Those which are significant are slightly more so, and the slopes are slightly greater.

```
fmt_emmeans(m0)
fmt_emmeans(m0a)
```

Measure | Slope | SE | Df | T | P |
---|---|---|---|---|---|

max_doy | 0.181 | 0.062 | 138 | 2.909 | 0.016 |

mig_end_doy | 0.093 | 0.062 | 138 | 1.497 | 0.137 |

mig_start_doy | 0.176 | 0.062 | 138 | 2.830 | 0.016 |

p50_doy | 0.147 | 0.062 | 138 | 2.370 | 0.029 |

peak_end_doy | 0.132 | 0.062 | 138 | 2.121 | 0.043 |

peak_start_doy | 0.155 | 0.062 | 138 | 2.490 | 0.028 |

FDR P-value adjustment |

Measure | Slope | SE | Df | T | P |
---|---|---|---|---|---|

max_doy | 0.194 | 0.063 | 120 | 3.090 | 0.009 |

mig_end_doy | 0.091 | 0.063 | 120 | 1.446 | 0.151 |

mig_start_doy | 0.191 | 0.063 | 120 | 3.039 | 0.009 |

p50_doy | 0.157 | 0.063 | 120 | 2.502 | 0.021 |

peak_end_doy | 0.140 | 0.063 | 120 | 2.220 | 0.034 |

peak_start_doy | 0.163 | 0.063 | 120 | 2.592 | 0.021 |

FDR P-value adjustment |

#### Missing the end of migration

Second, we consider removing 2001, 2006, 2023 as they were three years where it looks like sampling ended earlier than the end of migration.

`<- update(m0, data = filter(v1, !year %in% c(2001, 2006, 2023))) m0b `

Comparing the ANOVA tables shows that while year is still significant, the P-value is higher.

```
fmt_anova(m0)
fmt_anova(m0b)
```

Parameter | Sum Sq | Df | F | P |
---|---|---|---|---|

(Intercept) | 3.056 | 1 | 0.544 | 0.462 |

measure | 9.287 | 5 | 0.331 | 0.894 |

year | 47.493 | 1 | 8.461 | 0.004 |

measure:year | 7.531 | 5 | 0.268 | 0.930 |

Residuals | 774.597 | 138 | NA | NA |

Parameter | Sum Sq | Df | F | P |
---|---|---|---|---|

(Intercept) | 1.738 | 1 | 0.295 | 0.588 |

measure | 8.978 | 5 | 0.305 | 0.909 |

year | 35.575 | 1 | 6.033 | 0.015 |

measure:year | 7.454 | 5 | 0.253 | 0.938 |

Residuals | 707.625 | 120 | NA | NA |

Removing these years from the analyses, reduced the strength of the patterns such that significant effects became trends. However there were few changes to the magnitude of the slopes.

```
fmt_emmeans(m0)
fmt_emmeans(m0b)
```

Measure | Slope | SE | Df | T | P |
---|---|---|---|---|---|

max_doy | 0.181 | 0.062 | 138 | 2.909 | 0.016 |

mig_end_doy | 0.093 | 0.062 | 138 | 1.497 | 0.137 |

mig_start_doy | 0.176 | 0.062 | 138 | 2.830 | 0.016 |

p50_doy | 0.147 | 0.062 | 138 | 2.370 | 0.029 |

peak_end_doy | 0.132 | 0.062 | 138 | 2.121 | 0.043 |

peak_start_doy | 0.155 | 0.062 | 138 | 2.490 | 0.028 |

FDR P-value adjustment |

Measure | Slope | SE | Df | T | P |
---|---|---|---|---|---|

max_doy | 0.173 | 0.071 | 120 | 2.456 | 0.052 |

mig_end_doy | 0.077 | 0.071 | 120 | 1.083 | 0.281 |

mig_start_doy | 0.166 | 0.071 | 120 | 2.349 | 0.052 |

p50_doy | 0.151 | 0.071 | 120 | 2.133 | 0.052 |

peak_end_doy | 0.128 | 0.071 | 120 | 1.808 | 0.088 |

peak_start_doy | 0.155 | 0.071 | 120 | 2.198 | 0.052 |

FDR P-value adjustment |

If we compare to a non-adjusted test, only the end of the peak becomes a trend. Adjusting these tests is likely overkill, anyway.

```
fmt_emmeans(m0, adjust = "none")
fmt_emmeans(m0b, adjust = "none")
```

Measure | Slope | SE | Df | T | P |
---|---|---|---|---|---|

max_doy | 0.181 | 0.062 | 138 | 2.909 | 0.004 |

mig_end_doy | 0.093 | 0.062 | 138 | 1.497 | 0.137 |

mig_start_doy | 0.176 | 0.062 | 138 | 2.830 | 0.005 |

p50_doy | 0.147 | 0.062 | 138 | 2.370 | 0.019 |

peak_end_doy | 0.132 | 0.062 | 138 | 2.121 | 0.036 |

peak_start_doy | 0.155 | 0.062 | 138 | 2.490 | 0.014 |

none P-value adjustment |

Measure | Slope | SE | Df | T | P |
---|---|---|---|---|---|

max_doy | 0.173 | 0.071 | 120 | 2.456 | 0.015 |

mig_end_doy | 0.077 | 0.071 | 120 | 1.083 | 0.281 |

mig_start_doy | 0.166 | 0.071 | 120 | 2.349 | 0.020 |

p50_doy | 0.151 | 0.071 | 120 | 2.133 | 0.035 |

peak_end_doy | 0.128 | 0.071 | 120 | 1.808 | 0.073 |

peak_start_doy | 0.155 | 0.071 | 120 | 2.198 | 0.030 |

none P-value adjustment |

#### Remove all problematic years

Out of curiosity, let’s try removing all 6 years. This reduces our data from 25 to 19 samples, so is a substantial change and we may see non-significant results merely because we have removed too many samples.

```
<- update(m0, data = filter(v1, !year %in% c(1998, 2011, 2013,
m0c 2001, 2006, 2023)))
```

Comparing the ANOVA tables shows that while year is still significant, the P-value is higher.

```
fmt_anova(m0)
fmt_anova(m0c)
```

Parameter | Sum Sq | Df | F | P |
---|---|---|---|---|

(Intercept) | 3.056 | 1 | 0.544 | 0.462 |

measure | 9.287 | 5 | 0.331 | 0.894 |

year | 47.493 | 1 | 8.461 | 0.004 |

measure:year | 7.531 | 5 | 0.268 | 0.930 |

Residuals | 774.597 | 138 | NA | NA |

Parameter | Sum Sq | Df | F | P |
---|---|---|---|---|

(Intercept) | 3.083 | 1 | 0.576 | 0.450 |

measure | 10.647 | 5 | 0.397 | 0.850 |

year | 36.490 | 1 | 6.811 | 0.010 |

measure:year | 9.121 | 5 | 0.341 | 0.887 |

Residuals | 546.467 | 102 | NA | NA |

Removing these years from the analyses, reduced the strength of the patterns in the end of the peak such that it becomes a trend, but all other measures show little changes. Further, there are few changes to the magnitude of the slopes.

```
fmt_emmeans(m0)
fmt_emmeans(m0c)
```

Measure | Slope | SE | Df | T | P |
---|---|---|---|---|---|

max_doy | 0.181 | 0.062 | 138 | 2.909 | 0.016 |

mig_end_doy | 0.093 | 0.062 | 138 | 1.497 | 0.137 |

mig_start_doy | 0.176 | 0.062 | 138 | 2.830 | 0.016 |

p50_doy | 0.147 | 0.062 | 138 | 2.370 | 0.029 |

peak_end_doy | 0.132 | 0.062 | 138 | 2.121 | 0.043 |

peak_start_doy | 0.155 | 0.062 | 138 | 2.490 | 0.028 |

FDR P-value adjustment |

Measure | Slope | SE | Df | T | P |
---|---|---|---|---|---|

max_doy | 0.190 | 0.073 | 102 | 2.610 | 0.039 |

mig_end_doy | 0.073 | 0.073 | 102 | 1.000 | 0.319 |

mig_start_doy | 0.180 | 0.073 | 102 | 2.472 | 0.039 |

p50_doy | 0.165 | 0.073 | 102 | 2.259 | 0.039 |

peak_end_doy | 0.138 | 0.073 | 102 | 1.893 | 0.073 |

peak_start_doy | 0.165 | 0.073 | 102 | 2.268 | 0.039 |

FDR P-value adjustment |

#### Conclusions

**I do not believe that these potentially problematic sampling years are driving the patterns we see in this data**.

At worst, by removing these years we increase the error around a slope. But the magnitudes of the slopes do not change and even the significance is only ever reduced to trends.

## Code

```
<- append(list(m0), models)
models map(models, summary)
```

```
[[1]]
Call:
lm(formula = doy ~ measure * year, data = v1)
Residuals:
Min 1Q Median 3Q Max
-6.8473 -1.6117 -0.1487 1.6871 6.3348
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -92.316742 125.108782 -0.738 0.46183
measuremig_end_doy 189.088235 176.930536 1.069 0.28706
measuremig_start_doy -2.300905 176.930536 -0.013 0.98964
measurep50_doy 67.497738 176.930536 0.381 0.70342
measurepeak_end_doy 103.459276 176.930536 0.585 0.55967
measurepeak_start_doy 47.728507 176.930536 0.270 0.78775
year 0.180995 0.062223 2.909 0.00423 **
measuremig_end_doy:year -0.087877 0.087996 -0.999 0.31972
measuremig_start_doy:year -0.004884 0.087996 -0.055 0.95582
measurep50_doy:year -0.033550 0.087996 -0.381 0.70359
measurepeak_end_doy:year -0.049029 0.087996 -0.557 0.57832
measurepeak_start_doy:year -0.026046 0.087996 -0.296 0.76769
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.369 on 138 degrees of freedom
Multiple R-squared: 0.9195, Adjusted R-squared: 0.9131
F-statistic: 143.3 on 11 and 138 DF, p-value: < 2.2e-16
[[2]]
Call:
lm(formula = mig_start_doy ~ year, data = v)
Residuals:
Min 1Q Median 3Q Max
-3.7823 -2.4867 -0.0717 2.1044 3.6894
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -94.61765 138.08785 -0.685 0.5001
year 0.17611 0.06868 2.564 0.0173 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.615 on 23 degrees of freedom
Multiple R-squared: 0.2223, Adjusted R-squared: 0.1885
F-statistic: 6.576 on 1 and 23 DF, p-value: 0.01734
[[3]]
Call:
lm(formula = peak_start_doy ~ year, data = v)
Residuals:
Min 1Q Median 3Q Max
-2.7905 -1.2410 -0.6356 1.5194 3.9842
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -44.58824 99.80097 -0.447 0.65922
year 0.15495 0.04964 3.122 0.00479 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.89 on 23 degrees of freedom
Multiple R-squared: 0.2976, Adjusted R-squared: 0.2671
F-statistic: 9.745 on 1 and 23 DF, p-value: 0.004794
[[4]]
Call:
lm(formula = p50_doy ~ year, data = v)
Residuals:
Min 1Q Median 3Q Max
-2.4303 -1.5778 -0.2186 1.2237 4.3069
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -24.81900 104.59963 -0.237 0.8145
year 0.14745 0.05202 2.834 0.0094 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.981 on 23 degrees of freedom
Multiple R-squared: 0.2589, Adjusted R-squared: 0.2266
F-statistic: 8.033 on 1 and 23 DF, p-value: 0.009399
[[5]]
Call:
lm(formula = peak_end_doy ~ year, data = v)
Residuals:
Min 1Q Median 3Q Max
-3.2636 -1.1873 -0.3398 1.1323 3.4725
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.14253 101.51140 0.110 0.9135
year 0.13197 0.05049 2.614 0.0155 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.922 on 23 degrees of freedom
Multiple R-squared: 0.229, Adjusted R-squared: 0.1955
F-statistic: 6.832 on 1 and 23 DF, p-value: 0.01552
[[6]]
Call:
lm(formula = mig_end_doy ~ year, data = v)
Residuals:
Min 1Q Median 3Q Max
-6.847 -2.009 0.177 1.966 4.618
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 96.77149 163.10693 0.593 0.559
year 0.09312 0.08112 1.148 0.263
Residual standard error: 3.089 on 23 degrees of freedom
Multiple R-squared: 0.05419, Adjusted R-squared: 0.01306
F-statistic: 1.318 on 1 and 23 DF, p-value: 0.2628
[[7]]
Call:
lm(formula = max_doy ~ year, data = v)
Residuals:
Min 1Q Median 3Q Max
-4.0362 -2.1131 -0.1222 1.6018 6.3348
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -92.31674 130.51911 -0.707 0.4865
year 0.18100 0.06491 2.788 0.0104 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.472 on 23 degrees of freedom
Multiple R-squared: 0.2526, Adjusted R-squared: 0.2201
F-statistic: 7.774 on 1 and 23 DF, p-value: 0.01045
```

## Code

`model_check_figs(models)`

## Duration of migration

**How long is migration? Has it changed in length? **

- Look for changes in the number of days over which migration and peak migration occur
`mig_dur_days`

,`peak_dur_days`

#### Descriptive stats

## Code

```
|>
v select(contains("dur")) |>
desc_stats()
```

measure | mean | sd | min | median | max | n |
---|---|---|---|---|---|---|

mig_dur_days | 24.52 | 3.62 | 16 | 25 | 31 | 25 |

peak_dur_days | 9.52 | 1.19 | 7 | 10 | 12 | 25 |

```
<- lm(mig_dur_days ~ year, data = v)
m1 <- lm(peak_dur_days ~ year, data = v)
m2
<- list(m1, m2) models
```

**Tabulated Results**

No significant results

## Code

```
<- get_table(models)
t write_csv(t, "Data/Datasets/table_duration.csv")
fmt_table(t)
```

term | estimate | std error | statistic | p value | n |
---|---|---|---|---|---|

mig_dur_days ~ year (R2 = 0.03; R2-adj = -0.01) |
|||||

(Intercept) | 191.389 | 192.063 | 0.996 | 0.329 | 25 |

year | −0.083 | 0.096 | −0.869 | 0.394 | 25 |

peak_dur_days ~ year (R2 = 0.02; R2-adj = -0.02) |
|||||

(Intercept) | 55.731 | 63.706 | 0.875 | 0.391 | 25 |

year | −0.023 | 0.032 | −0.725 | 0.476 | 25 |

**Note:** Lines without Std Error ribbons are not significant

## Code

```
<- v |>
dur_figs select(year, contains("dur")) |>
pivot_longer(-year, names_to = "measure", values_to = "dur")
ggplot(dur_figs, aes(x = year, y = dur, group = measure, colour = measure)) +
theme_bw() +
geom_point() +
stat_smooth(method = "lm", se = FALSE) +
#scale_y_continuous(breaks = y_breaks) +
scale_colour_viridis_d()+
labs(caption = "Lines without Std Error ribbons represent non-significant models")
```

``geom_smooth()` using formula = 'y ~ x'`

## Code

`model_check_figs(models)`

The DHARMa model checks don’t highlight any particular outlier problems, but out of an abundance of caution, we’ll do a quick review of any high Cook’s D values.

Values highlighted in red are less than 4/25, a standard Cook’s D cutoff (4/sample size). These indicate potential influential data points for a particular model.

## Code

```
get_cooks(models) |>
gt_cooks()
```

Cook's Distances | ||
---|---|---|

year | mig_dur_days | peak_dur_days |

1998 | 0.02 | 0.00 |

1999 | 0.14 | 0.20 |

2000 | 0.01 | 0.00 |

2001 | 0.00 | 0.02 |

2002 | 0.03 | 0.06 |

2003 | 0.00 | 0.00 |

2004 | 0.00 | 0.00 |

2005 | 0.01 | 0.00 |

2006 | 0.09 | 0.12 |

2008 | 0.00 | 0.00 |

2009 | 0.13 | 0.10 |

2010 | 0.03 | 0.04 |

2011 | 0.00 | 0.00 |

2012 | 0.00 | 0.00 |

2013 | 0.00 | 0.04 |

2014 | 0.03 | 0.01 |

2015 | 0.08 | 0.00 |

2016 | 0.04 | 0.06 |

2017 | 0.07 | 0.05 |

2018 | 0.00 | 0.08 |

2019 | 0.06 | 0.00 |

2020 | 0.00 | 0.00 |

2021 | 0.00 | 0.00 |

2022 | 0.08 | 0.09 |

2023 | 0.09 | 0.04 |

#### Check influential points

Now let’s see what happens if we were to omit these years from the respective analyses.

Tables with coefficients for both original and new model are shown, with colour intensity showing strength in the pattern.

For the number of days in peak migration, if we omit 1999, the pattern is stronger, but the same (i.e. not significant).

## Code

`compare(m2, 1999)`

model | parameter | Estimate | Std. Error | t value | Pr(>|t|) |
---|---|---|---|---|---|

Original - peak_dur_days | (Intercept) | 55.731 | 63.706 | 0.875 | 0.391 |

Drop 1999 - peak_dur_days | (Intercept) | 89.114 | 64.690 | 1.378 | 0.182 |

Original - peak_dur_days | year | -0.023 | 0.032 | -0.725 | 0.476 |

Drop 1999 - peak_dur_days | year | -0.040 | 0.032 | -1.229 | 0.232 |

In addition to the sensitivity check using cook’s distances, here we will conduct a more targeted exploration of some specific years.

#### Gaps around the peak

First, we consider removing 1998, 2011, 2013 as they have sampling gaps around the peak of migration

```
<- update(m1, data = filter(v, !year %in% c(1998, 2011, 2013)))
m1a <- update(m2, data = filter(v, !year %in% c(1998, 2011, 2013))) m2a
```

Comparing the summary tables shows only superficial differences.

```
fmt_summary(m1)
fmt_summary(m1a)
```

Parameter | Estimate | Std. Error | T | P |
---|---|---|---|---|

(Intercept) | 191.389 | 192.063 | 0.996 | 0.329 |

year | −0.083 | 0.096 | −0.869 | 0.394 |

Parameter | Estimate | Std. Error | T | P |
---|---|---|---|---|

(Intercept) | 226.014 | 217.172 | 1.041 | 0.310 |

year | −0.100 | 0.108 | −0.927 | 0.365 |

```
fmt_summary(m2)
fmt_summary(m2a)
```

Parameter | Estimate | Std. Error | T | P |
---|---|---|---|---|

(Intercept) | 55.731 | 63.706 | 0.875 | 0.391 |

year | −0.023 | 0.032 | −0.725 | 0.476 |

Parameter | Estimate | Std. Error | T | P |
---|---|---|---|---|

(Intercept) | 56.453 | 69.712 | 0.810 | 0.428 |

year | −0.023 | 0.035 | −0.674 | 0.508 |

#### Missing the end of migration

Second, we consider removing 2001, 2006, 2023 as they were three years where it looks like sampling ended earlier than the end of migration.

```
<- update(m1, data = filter(v, !year %in% c(2001, 2006, 2023)))
m1b <- update(m2, data = filter(v, !year %in% c(2001, 2006, 2023))) m2b
```

Comparing the summary tables shows only superficial differences.

```
fmt_summary(m1)
fmt_summary(m1b)
```

Parameter | Estimate | Std. Error | T | P |
---|---|---|---|---|

(Intercept) | 191.389 | 192.063 | 0.996 | 0.329 |

year | −0.083 | 0.096 | −0.869 | 0.394 |

Parameter | Estimate | Std. Error | T | P |
---|---|---|---|---|

(Intercept) | 203.845 | 206.539 | 0.987 | 0.335 |

year | −0.089 | 0.103 | −0.871 | 0.394 |

```
fmt_summary(m2)
fmt_summary(m2b)
```

Parameter | Estimate | Std. Error | T | P |
---|---|---|---|---|

(Intercept) | 55.731 | 63.706 | 0.875 | 0.391 |

year | −0.023 | 0.032 | −0.725 | 0.476 |

Parameter | Estimate | Std. Error | T | P |
---|---|---|---|---|

(Intercept) | 64.756 | 67.213 | 0.963 | 0.347 |

year | −0.028 | 0.033 | −0.823 | 0.420 |

#### Remove all problematic years

Out of curiosity, let’s try removing all 6 years. This reduces our data from 25 to 19 samples, so is a substantial change and we may see non-significant results merely because we have removed too many samples.

```
<- update(m1, data = filter(v, !year %in% c(1998, 2011, 2013,
m1c 2001, 2006, 2023)))
<- update(m2, data = filter(v, !year %in% c(1998, 2011, 2013,
m2c 2001, 2006, 2023)))
```

Comparing the summary tables shows only superficial differences.

```
fmt_summary(m1)
fmt_summary(m1c)
```

Parameter | Estimate | Std. Error | T | P |
---|---|---|---|---|

(Intercept) | 191.389 | 192.063 | 0.996 | 0.329 |

year | −0.083 | 0.096 | −0.869 | 0.394 |

Parameter | Estimate | Std. Error | T | P |
---|---|---|---|---|

(Intercept) | 239.710 | 240.457 | 0.997 | 0.333 |

year | −0.107 | 0.120 | −0.897 | 0.382 |

```
fmt_summary(m2)
fmt_summary(m2c)
```

Parameter | Estimate | Std. Error | T | P |
---|---|---|---|---|

(Intercept) | 55.731 | 63.706 | 0.875 | 0.391 |

year | −0.023 | 0.032 | −0.725 | 0.476 |

Parameter | Estimate | Std. Error | T | P |
---|---|---|---|---|

(Intercept) | 64.252 | 74.185 | 0.866 | 0.398 |

year | −0.027 | 0.037 | −0.741 | 0.469 |

#### Conclusions

**I do not believe that these potentially problematic sampling years are driving the patterns we see in this data**.

## Code

`map(models, summary)`

```
[[1]]
Call:
lm(formula = mig_dur_days ~ year, data = v)
Residuals:
Min 1Q Median 3Q Max
-8.6561 -1.5690 0.5929 2.0119 6.0949
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 191.38914 192.06304 0.996 0.329
year -0.08299 0.09552 -0.869 0.394
Residual standard error: 3.637 on 23 degrees of freedom
Multiple R-squared: 0.03178, Adjusted R-squared: -0.01032
F-statistic: 0.7549 on 1 and 23 DF, p-value: 0.3939
[[2]]
Call:
lm(formula = peak_dur_days ~ year, data = v)
Residuals:
Min 1Q Median 3Q Max
-2.5577 -0.5117 0.1895 0.5572 2.3734
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 55.73077 63.70604 0.875 0.391
year -0.02298 0.03168 -0.725 0.476
Residual standard error: 1.206 on 23 degrees of freedom
Multiple R-squared: 0.02237, Adjusted R-squared: -0.02014
F-statistic: 0.5262 on 1 and 23 DF, p-value: 0.4755
```

## Number of birds in kettles

**What is the maximum kettle count? Has this changed?**

- Look for changes in the maximum predicted count and in the maximum observed count
- variables
`mig_pop_max`

,`mig_raw_max`

- In both cases, these values represent the maximum population counts for the migration period (25%-75%), but also correspond the the maximum population count overall.

#### Descriptive stats

## Code

```
|>
v select("mig_pop_max", "mig_raw_max") |>
desc_stats()
```

measure | mean | sd | min | median | max | n |
---|---|---|---|---|---|---|

mig_pop_max | 428.68 | 236.76 | 146 | 342 | 1304 | 25 |

mig_raw_max | 797.76 | 468.45 | 310 | 650 | 2375 | 25 |

- I also ran a third NB model (
`m2b`

), which omitted some of the really large raw kettle counts (see the Figure), because these really large counts resulted in some problems in the model residual figure (see Model Checks) - Removing the counts resulted in better model fit, and didn’t really change the interpretations (see the results under
`mig_raw_max`

where the sample size is only 23. - Therefore, I think I wouldn’t worry about the model checks and would keep model
`m2`

(the results saved to file only include the first two models).

```
<- MASS::glm.nb(mig_pop_max ~ year, data = v)
m1 <- MASS::glm.nb(mig_raw_max ~ year, data = v)
m2 <- MASS::glm.nb(mig_raw_max ~ year, data = filter(v, mig_raw_max < 1500))
m2b
<- list(m1, m2, m2b) models
```

**Not used, but included for completeness**

```
<- lm(log10(mig_pop_max) ~ year, data = v)
x1 <- lm(log10(mig_raw_max) ~ year, data = v)
x2
<- glm(mig_pop_max ~ year, family = "poisson", data = v)
y1 <- glm(mig_raw_max ~ year, family = "poisson", data = v)
y2
<- simulateResiduals(x1, plot = TRUE)
s title("log10 - mig_pop_max ~ year")
```

```
<- simulateResiduals(x2, plot = TRUE)
s title("log10 - mig_raw_max ~ year")
```

```
<- simulateResiduals(y1, plot = TRUE)
s title("Poisson - mig_pop_max ~ year")
```

```
<- simulateResiduals(y2, plot = TRUE)
s title("Poisson - mig_raw_max ~ year")
```

**Tabulated Results**

All the measures are significant. There was an increase in the maximum number of birds in kettles in both the raw counts as well as the predicted max.

## Code

```
<- get_table(models)
t write_csv(filter(t, n != 23), "Data/Datasets/table_pop_mig.csv")
fmt_table(t)
```

term | estimate | estimate exp | std error | statistic | p value | n |
---|---|---|---|---|---|---|

mig_pop_max ~ year |
||||||

(Intercept) | −66.877 | 0.000 | 20.084 | −3.330 | 0.001 | 25 |

year | 0.036 | 1.037 | 0.010 | 3.630 | 0.000 | 25 |

mig_raw_max ~ year |
||||||

(Intercept) | −75.414 | 0.000 | 20.048 | −3.762 | 0.000 | 25 |

year | 0.041 | 1.042 | 0.010 | 4.092 | 0.000 | 25 |

(Intercept) | −45.033 | 0.000 | 19.928 | −2.260 | 0.024 | 23 |

year | 0.026 | 1.026 | 0.010 | 2.587 | 0.010 | 23 |

## Code

```
<- v |>
pop_figs select(year, mig_pop_max, mig_raw_max) |>
pivot_longer(-year, names_to = "measure", values_to = "pop")
ggplot(pop_figs, aes(x = year, y = pop, group = measure, colour = measure)) +
theme_bw() +
geom_point() +
stat_smooth(method = MASS::glm.nb) +
scale_colour_viridis_d(end = 0.8)
```

``geom_smooth()` using formula = 'y ~ x'`

## Code

`model_check_figs(models)`

`qu = 0.25, log(sigma) = -3.825783 : outer Newton did not converge fully.`

The DHARMa model checks don’t highlight any particular outlier problems, but out of an abundance of caution, we’ll do a quick review of any high Cook’s D values.

Values highlighted in red are less than 4/25, a standard Cook’s D cutoff (4/sample size). These indicate potential influential data points for a particular model.

## Code

```
get_cooks(models[-3]) |>
gt_cooks()
```

Cook's Distances | ||
---|---|---|

year | mig_pop_max | mig_raw_max |

1998 | 0.07 | 0.01 |

1999 | 0.96 | 1.10 |

2000 | 0.12 | 0.04 |

2001 | 0.04 | 0.07 |

2002 | 0.01 | 0.00 |

2003 | 0.00 | 0.00 |

2004 | 0.14 | 0.03 |

2005 | 0.01 | 0.02 |

2006 | 0.00 | 0.02 |

2008 | 0.02 | 0.00 |

2009 | 0.00 | 0.00 |

2010 | 0.04 | 0.10 |

2011 | 0.00 | 0.00 |

2012 | 0.01 | 0.03 |

2013 | 0.01 | 0.01 |

2014 | 0.01 | 0.00 |

2015 | 0.02 | 0.00 |

2016 | 0.00 | 0.02 |

2017 | 0.00 | 0.04 |

2018 | 0.05 | 0.04 |

2019 | 0.04 | 0.02 |

2020 | 0.02 | 0.00 |

2021 | 0.00 | 0.00 |

2022 | 0.72 | 0.57 |

2023 | 0.00 | 0.09 |

#### Check influential points

Now let’s see what happens if we were to omit these years from the respective analyses.

For the maximum population recorded during migration based on both predicted counts *and* raw counts, if we omit 1999, the patterns are stronger, but the same.

## Code

`compare(m1, 1999)`

model | parameter | Estimate | Std. Error | z value | Pr(>|z|) |
---|---|---|---|---|---|

Original - mig_pop_max | (Intercept) | -66.877 | 20.084 | -3.330 | 0.001 |

Drop 1999 - mig_pop_max | (Intercept) | -91.053 | 18.059 | -5.042 | 0.000 |

Original - mig_pop_max | year | 0.036 | 0.010 | 3.630 | 0.000 |

Drop 1999 - mig_pop_max | year | 0.048 | 0.009 | 5.373 | 0.000 |

## Code

`compare(m2, 1999)`

model | parameter | Estimate | Std. Error | z value | Pr(>|z|) |
---|---|---|---|---|---|

Original - mig_raw_max | (Intercept) | -75.414 | 20.048 | -3.762 | 0 |

Drop 1999 - mig_raw_max | (Intercept) | -100.459 | 17.545 | -5.726 | 0 |

Original - mig_raw_max | year | 0.041 | 0.010 | 4.092 | 0 |

Drop 1999 - mig_raw_max | year | 0.053 | 0.009 | 6.101 | 0 |

If we omit 2022, the patterns are weaker, but still the same.

## Code

`compare(m1, 2022)`

model | parameter | Estimate | Std. Error | z value | Pr(>|z|) |
---|---|---|---|---|---|

Original - mig_pop_max | (Intercept) | -66.877 | 20.084 | -3.330 | 0.001 |

Drop 2022 - mig_pop_max | (Intercept) | -46.746 | 18.695 | -2.500 | 0.012 |

Original - mig_pop_max | year | 0.036 | 0.010 | 3.630 | 0.000 |

Drop 2022 - mig_pop_max | year | 0.026 | 0.009 | 2.819 | 0.005 |

## Code

`compare(m2, 2022)`

model | parameter | Estimate | Std. Error | z value | Pr(>|z|) |
---|---|---|---|---|---|

Original - mig_raw_max | (Intercept) | -75.414 | 20.048 | -3.762 | 0.000 |

Drop 2022 - mig_raw_max | (Intercept) | -58.542 | 19.243 | -3.042 | 0.002 |

Original - mig_raw_max | year | 0.041 | 0.010 | 4.092 | 0.000 |

Drop 2022 - mig_raw_max | year | 0.032 | 0.010 | 3.383 | 0.001 |

In addition to the sensitivity check using cook’s distances, here we will conduct a more targeted exploration of some specific years.

#### Gaps around the peak

First, we consider removing 1998, 2011, 2013 as they have sampling gaps around the peak of migration

```
<- update(m1, data = filter(v, !year %in% c(1998, 2011, 2013)))
m1a <- update(m2, data = filter(v, !year %in% c(1998, 2011, 2013))) m2a
```

Comparing the summary tables shows only superficial differences.

```
fmt_summary(m1)
fmt_summary(m1a)
```

Parameter | Estimate | Std. Error | Z | P |
---|---|---|---|---|

(Intercept) | −66.877 | 20.084 | −3.330 | 0.001 |

year | 0.036 | 0.010 | 3.630 | 0.000 |

Parameter | Estimate | Std. Error | Z | P |
---|---|---|---|---|

(Intercept) | −62.057 | 21.898 | −2.834 | 0.005 |

year | 0.034 | 0.011 | 3.111 | 0.002 |

```
fmt_summary(m2)
fmt_summary(m2a)
```

Parameter | Estimate | Std. Error | Z | P |
---|---|---|---|---|

(Intercept) | −75.414 | 20.048 | −3.762 | 0.000 |

year | 0.041 | 0.010 | 4.092 | 0.000 |

Parameter | Estimate | Std. Error | Z | P |
---|---|---|---|---|

(Intercept) | −74.112 | 22.185 | −3.341 | 0.001 |

year | 0.040 | 0.011 | 3.641 | 0.000 |

#### Missing the end of migration

Second, we consider removing 2001, 2006, 2023 as they were three years where it looks like sampling ended earlier than the end of migration.

```
<- update(m1, data = filter(v, !year %in% c(2001, 2006, 2023)))
m1b <- update(m2, data = filter(v, !year %in% c(2001, 2006, 2023))) m2b
```

Comparing the summary tables shows only superficial differences.

```
fmt_summary(m1)
fmt_summary(m1b)
```

Parameter | Estimate | Std. Error | Z | P |
---|---|---|---|---|

(Intercept) | −66.877 | 20.084 | −3.330 | 0.001 |

year | 0.036 | 0.010 | 3.630 | 0.000 |

Parameter | Estimate | Std. Error | Z | P |
---|---|---|---|---|

(Intercept) | −62.326 | 23.298 | −2.675 | 0.007 |

year | 0.034 | 0.012 | 2.934 | 0.003 |

```
fmt_summary(m2)
fmt_summary(m2b)
```

Parameter | Estimate | Std. Error | Z | P |
---|---|---|---|---|

(Intercept) | −75.414 | 20.048 | −3.762 | 0.000 |

year | 0.041 | 0.010 | 4.092 | 0.000 |

Parameter | Estimate | Std. Error | Z | P |
---|---|---|---|---|

(Intercept) | −61.411 | 22.105 | −2.778 | 0.005 |

year | 0.034 | 0.011 | 3.079 | 0.002 |

#### Remove all problematic years

Out of curiosity, let’s try removing all 6 years. This reduces our data from 25 to 19 samples, so is a substantial change and we may see non-significant results merely because we have removed too many samples.

```
<- update(m1, data = filter(v, !year %in% c(1998, 2011, 2013,
m1c 2001, 2006, 2023)))
<- update(m2, data = filter(v, !year %in% c(1998, 2011, 2013,
m2c 2001, 2006, 2023)))
```

Comparing the summary tables shows only superficial differences.

```
fmt_summary(m1)
fmt_summary(m1c)
```

Parameter | Estimate | Std. Error | Z | P |
---|---|---|---|---|

(Intercept) | −66.877 | 20.084 | −3.330 | 0.001 |

year | 0.036 | 0.010 | 3.630 | 0.000 |

Parameter | Estimate | Std. Error | Z | P |
---|---|---|---|---|

(Intercept) | −55.625 | 25.831 | −2.153 | 0.031 |

year | 0.031 | 0.013 | 2.389 | 0.017 |

```
fmt_summary(m2)
fmt_summary(m2c)
```

Parameter | Estimate | Std. Error | Z | P |
---|---|---|---|---|

(Intercept) | −75.414 | 20.048 | −3.762 | 0.000 |

year | 0.041 | 0.010 | 4.092 | 0.000 |

Parameter | Estimate | Std. Error | Z | P |
---|---|---|---|---|

(Intercept) | −57.740 | 24.834 | −2.325 | 0.020 |

year | 0.032 | 0.012 | 2.595 | 0.009 |

#### Conclusions

**I do not believe that these potentially problematic sampling years are driving the patterns we see in this data**.

## Code

`map(models, summary)`

```
[[1]]
Call:
MASS::glm.nb(formula = mig_pop_max ~ year, data = v, init.theta = 7.041467497,
link = log)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -66.876552 20.083997 -3.33 0.000869 ***
year 0.036255 0.009989 3.63 0.000284 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(7.0415) family taken to be 1)
Null deviance: 39.927 on 24 degrees of freedom
Residual deviance: 25.555 on 23 degrees of freedom
AIC: 327.07
Number of Fisher Scoring iterations: 1
Theta: 7.04
Std. Err.: 1.98
2 x log-likelihood: -321.07
[[2]]
Call:
MASS::glm.nb(formula = mig_raw_max ~ year, data = v, init.theta = 7.009118042,
link = log)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -75.413580 20.048025 -3.762 0.000169 ***
year 0.040803 0.009971 4.092 4.27e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(7.0091) family taken to be 1)
Null deviance: 44.412 on 24 degrees of freedom
Residual deviance: 25.564 on 23 degrees of freedom
AIC: 357.4
Number of Fisher Scoring iterations: 1
Theta: 7.01
Std. Err.: 1.96
2 x log-likelihood: -351.398
[[3]]
Call:
MASS::glm.nb(formula = mig_raw_max ~ year, data = filter(v, mig_raw_max <
1500), init.theta = 9.016076348, link = log)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -45.033115 19.928099 -2.260 0.02383 *
year 0.025653 0.009916 2.587 0.00968 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Negative Binomial(9.0161) family taken to be 1)
Null deviance: 30.328 on 22 degrees of freedom
Residual deviance: 23.409 on 21 degrees of freedom
AIC: 319.15
Number of Fisher Scoring iterations: 1
Theta: 9.02
Std. Err.: 2.65
2 x log-likelihood: -313.148
```

## Number of residents

**How many resident vultures are there? Has this changed?**

- Look for changes in the median number of residents
`res_pop_median`

,`res_raw_median`

#### Descriptive stats

## Code

```
|>
v select("res_pop_median", "res_raw_median") |>
desc_stats()
```

measure | mean | sd | min | median | max | n |
---|---|---|---|---|---|---|

res_pop_median | 5.56 | 2.71 | 2 | 5 | 12 | 25 |

res_raw_median | 4.84 | 2.17 | 1 | 4 | 8 | 25 |

```
<- glm(res_pop_median ~ year, family = "poisson", data = v)
m1 <- glm(res_raw_median ~ year, family = "poisson", data = v)
m2
<- list(m1, m2) models
```

**Not used, but included for completeness**

```
<- lm(log10(res_pop_median) ~ year, data = v)
x1 <- lm(log10(res_raw_median) ~ year, data = v)
x2
<- simulateResiduals(x1, plot = TRUE)
s title("log10 - res_pop_median ~ year")
```

```
<- simulateResiduals(x2, plot = TRUE)
s title("log10 - res_raw_median ~ year")
```

**Tabulated Results**

All measures are significant, there is an increase in the number of resident birds over the years.

## Code

```
<- get_table(models)
t write_csv(t, "Data/Datasets/table_pop_res.csv")
fmt_table(t)
```

term | estimate | estimate exp | std error | statistic | p value | n |
---|---|---|---|---|---|---|

res_pop_median ~ year |
||||||

(Intercept) | −77.157 | 0.000 | 23.209 | −3.324 | 0.001 | 25 |

year | 0.039 | 1.040 | 0.012 | 3.400 | 0.001 | 25 |

res_raw_median ~ year |
||||||

(Intercept) | −65.838 | 0.000 | 24.667 | −2.669 | 0.008 | 25 |

year | 0.034 | 1.034 | 0.012 | 2.734 | 0.006 | 25 |

## Code

```
<- v |>
res_figs select(year, res_pop_median, res_raw_median) |>
pivot_longer(-year, names_to = "measure", values_to = "pop")
ggplot(res_figs, aes(x = year, y = pop, group = measure, colour = measure)) +
theme_bw() +
geom_point() +
geom_smooth(method = "glm", method.args = list(family = "poisson")) +
scale_colour_viridis_d(end = 0.8)
```

``geom_smooth()` using formula = 'y ~ x'`

## Code

`model_check_figs(models)`

## Code

```
get_cooks(models) |>
gt_cooks()
```

Cook's Distances | ||
---|---|---|

year | res_pop_median | res_raw_median |

1998 | 0.00 | 0.03 |

1999 | 0.05 | 0.08 |

2000 | 0.04 | 0.03 |

2001 | 0.30 | 0.37 |

2002 | 0.00 | 0.00 |

2003 | 0.01 | 0.00 |

2004 | 0.00 | 0.00 |

2005 | 0.00 | 0.00 |

2006 | 0.02 | 0.03 |

2008 | 0.04 | 0.07 |

2009 | 0.02 | 0.01 |

2010 | 0.00 | 0.00 |

2011 | 0.01 | 0.00 |

2012 | 0.01 | 0.00 |

2013 | 0.00 | 0.00 |

2014 | 0.04 | 0.02 |

2015 | 0.01 | 0.01 |

2016 | 0.03 | 0.03 |

2017 | 0.02 | 0.00 |

2018 | 0.10 | 0.03 |

2019 | 0.00 | 0.03 |

2020 | 0.16 | 0.03 |

2021 | 0.01 | 0.02 |

2022 | 0.02 | 0.01 |

2023 | 0.04 | 0.02 |

#### Check influential points

Now let’s see what happens if we were to omit these years from the respective analyses.

For the daily median number of resident birds based on both predicted counts *and* raw counts, if we omit 2001, the patterns are stronger, but the same.

## Code

`compare(m1, 2001)`

model | parameter | Estimate | Std. Error | z value | Pr(>|z|) |
---|---|---|---|---|---|

Original - res_pop_median | (Intercept) | -77.157 | 23.209 | -3.324 | 0.001 |

Drop 2001 - res_pop_median | (Intercept) | -93.177 | 24.827 | -3.753 | 0.000 |

Original - res_pop_median | year | 0.039 | 0.012 | 3.400 | 0.001 |

Drop 2001 - res_pop_median | year | 0.047 | 0.012 | 3.824 | 0.000 |

## Code

`compare(m2, 2001)`

model | parameter | Estimate | Std. Error | z value | Pr(>|z|) |
---|---|---|---|---|---|

Original - res_raw_median | (Intercept) | -65.838 | 24.667 | -2.669 | 0.008 |

Drop 2001 - res_raw_median | (Intercept) | -84.681 | 26.515 | -3.194 | 0.001 |

Original - res_raw_median | year | 0.034 | 0.012 | 2.734 | 0.006 |

Drop 2001 - res_raw_median | year | 0.043 | 0.013 | 3.254 | 0.001 |

If we omit 2020, the patterns are weaker, but still the same.

## Code

`compare(m1, 2020)`

model | parameter | Estimate | Std. Error | z value | Pr(>|z|) |
---|---|---|---|---|---|

Original - res_pop_median | (Intercept) | -77.157 | 23.209 | -3.324 | 0.001 |

Drop 2020 - res_pop_median | (Intercept) | -67.678 | 24.143 | -2.803 | 0.005 |

Original - res_pop_median | year | 0.039 | 0.012 | 3.400 | 0.001 |

Drop 2020 - res_pop_median | year | 0.034 | 0.012 | 2.874 | 0.004 |

## Code

`map(models, summary)`

```
[[1]]
Call:
glm(formula = res_pop_median ~ year, family = "poisson", data = v)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -77.15702 23.20927 -3.324 0.000886 ***
year 0.03921 0.01153 3.400 0.000673 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 30.326 on 24 degrees of freedom
Residual deviance: 18.389 on 23 degrees of freedom
AIC: 109.34
Number of Fisher Scoring iterations: 4
[[2]]
Call:
glm(formula = res_raw_median ~ year, family = "poisson", data = v)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -65.83767 24.66741 -2.669 0.00761 **
year 0.03351 0.01226 2.734 0.00625 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 24.526 on 24 degrees of freedom
Residual deviance: 16.868 on 23 degrees of freedom
AIC: 104.42
Number of Fisher Scoring iterations: 4
```

## Patterns of migration

**Is the timing of migration skewed to earlier or later in the season?** **Is this distribution of migration flattened and wide (low kurtosis) or peaked (high kurtosis)** **Are these patterns consistent over the years?**

*Kurtosis can be used to indicates if all the birds pass through in a relatively quick ‘clump’ (thick-tailed, lower kurtosis), or whether the migration stretches out over a longer period of time (long-tailed, higher kurtosis). Possible implications for conservation or future survey designs?*

- Look for significant skew and kurtosis in migration (5-95%) and peak migration (25-75%)
`mig_skew`

,`peak_skew`

,`mig_kurt`

,`peak_kurt`

#### Descriptive stats

## Code

```
|>
v select(contains("skew"), contains("kurt")) |>
desc_stats()
```

measure | mean | sd | min | median | max | n |
---|---|---|---|---|---|---|

mig_skew | 0.01 | 0.21 | -0.4117554 | -0.0278960097 | 0.5067799 | 25 |

peak_skew | 0.01 | 0.09 | -0.1378556 | -0.0006279026 | 0.2880288 | 25 |

all_skew | −0.05 | 0.21 | -0.4237418 | -0.0403155341 | 0.3816406 | 25 |

mig_kurt | −0.67 | 0.16 | -0.9329652 | -0.7095091193 | -0.1280714 | 25 |

peak_kurt | −1.12 | 0.03 | -1.1883554 | -1.1297710654 | -1.0337500 | 25 |

all_kurt | 0.26 | 0.56 | -0.3467272 | 0.1957870557 | 2.4580910 | 25 |

Here we look at skew and *excess* kurtosis only against the intercept. This allows us to test for differences from 0.

- Skew of 0 is normal, but below or above are considered left- and right-skewed
- A normal distribution has an excess kurtosis of 0.

```
<- lm(mig_skew ~ 1, data = v)
m1 <- lm(peak_skew ~ 1, data = v)
m2 <- lm(all_skew ~ 1, data = v)
m3 <- lm(mig_kurt ~ 1, data = v)
m4 <- lm(peak_kurt ~ 1, data = v)
m5 <- lm(all_kurt ~ 1, data = v)
m6
<- list(m1, m2, m3, m4, m5, m6) models
```

**Tabulated Results**

Skew is not significantly different from zero in any period.

Kurtosis is significantly different from zero in all periods.

In the migration and peak migration periods, Kurtosis is negative, corresponding to **thick-tailed** distributions, squat distributions (closer to a uniform distribution).

However, in the ‘all’ period, kurtosis is actually significantly positive, corresponding to relatively more narrow, **thin-tailed** distributions.

## Code

```
<- get_table(models)
t write_csv(t, "Data/Datasets/table_skew.csv")
fmt_table(t)
```

term | estimate | std error | statistic | p value | n |
---|---|---|---|---|---|

mig_skew ~ 1 (R2 = 0; R2-adj = 0) |
|||||

(Intercept) | 0.010 | 0.042 | 0.236 | 0.815 | 25 |

peak_skew ~ 1 (R2 = 0; R2-adj = 0) |
|||||

(Intercept) | 0.007 | 0.018 | 0.377 | 0.709 | 25 |

all_skew ~ 1 (R2 = 0; R2-adj = 0) |
|||||

(Intercept) | −0.049 | 0.042 | −1.184 | 0.248 | 25 |

mig_kurt ~ 1 (R2 = 0; R2-adj = 0) |
|||||

(Intercept) | −0.671 | 0.032 | −20.956 | 0.000 | 25 |

peak_kurt ~ 1 (R2 = 0; R2-adj = 0) |
|||||

(Intercept) | −1.125 | 0.006 | −188.017 | 0.000 | 25 |

all_kurt ~ 1 (R2 = 0; R2-adj = 0) |
|||||

(Intercept) | 0.256 | 0.112 | 2.287 | 0.031 | 25 |

**The periods**

The “all” period is defined individually as being the dates centred on the date at the 50% percentile passage is reached and includes all dates from this date to the end of the year, and that same number of dates prior to the p50 date. See the Skewness & Kurtosis section in the last report.

By definition, the migration period (5-95%) and the peak migration period (25-75%) are truncated distributions, so I don’t think we can really interpret kurtosis for these subsets. With the “all” period, I have attempted to subset the data to get a good measure of the migration period only, but **honestly, kurtosis depends so much on where I define the cutoffs I’m not really sure that we can accurately measure it. Looking at the figures, I’d say they’re narrower than normal (so positive kurtosis, which does align with the stats for the “all” period), but that’s just a visual estimate.**

This is figure showing the statistics.

## Code

```
<- v |>
dist_figs select(year, contains("skew"), contains("kurt")) |>
pivot_longer(cols = -year, names_to = "measure", values_to = "statistic")
ggplot(filter(dist_figs, year != 2001), aes(x = statistic, y = measure, fill = measure)) +
theme_bw() +
geom_vline(xintercept = 0, colour = "grey20", linetype = "dotted") +
geom_boxplot() +
scale_fill_viridis_d()
```

This figure shows the distributions in each year over top a normal distribution, for illustration.

I *think* this shows positive kurtosis. The black lines are simulated normal distributions simulated using the mean, SD and counts from the data.

## Code

```
<- v |>
normal select(year, max_doy, mig_pop_total) |>
left_join(summarize(filter(pred, doy > 240), sd = sd(rep(doy, round(count))), .by = "year"),
by = "year") |>
mutate(doy = pmap(list(max_doy, sd, mig_pop_total),
as.integer(rnorm(mean = x, sd = y, n = z)))) |>
\(x, y, z) unnest(doy) |>
count(year, doy, name = "count")
ggplot(raw, aes(x = doy, y = count)) +
theme_bw() +
geom_smooth(data = normal, method = "gam", colour = "black", linewidth = 1.5) +
geom_point(size = 1, aes(colour = year), na.rm = TRUE) +
geom_line(data = pred, linewidth = 1, aes(group = year, colour = year), na.rm = TRUE) +
scale_colour_viridis_c() +
facet_wrap(~year, scales = "free")
```

``geom_smooth()` using formula = 'y ~ s(x, bs = "cs")'`

Here is a scaled view (counts are divided by the yearly standard deviation) of the patterns of migration using our GAM models from the “Calculate Metrics” step.

## Code

```
<- raw |>
raw_scaled mutate(across(c(count), \(x) scale(x, center = FALSE)),
.by = "year")
<- pred |>
pred_scaled mutate(across(c(count, ci99_upper, ci99_lower), \(x) scale(x, center = FALSE)),
.by = "year")
ggplot(raw_scaled, aes(x = doy, y = count)) +
theme_bw() +
geom_point(aes(colour = year), na.rm = TRUE) +
geom_line(data = pred_scaled, aes(colour = year, group = year),
size = 1, na.rm = TRUE) +
scale_colour_viridis_c()
```

```
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
```

## Code

`model_check_figs(models)`

## Code

```
get_cooks(models) |>
gt_cooks()
```

Cook's Distances | ||||||
---|---|---|---|---|---|---|

year | mig_skew ~ 1 | peak_skew ~ 1 | all_skew ~ 1 | mig_kurt ~ 1 | peak_kurt ~ 1 | all_kurt ~ 1 |

1998 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 |

1999 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.04 |

2000 | 0.02 | 0.03 | 0.04 | 0.00 | 0.01 | 0.00 |

2001 | 0.18 | 0.04 | 0.14 | 0.14 | 0.00 | 0.00 |

2002 | 0.14 | 0.42 | 0.04 | 0.00 | 0.40 | 0.01 |

2003 | 0.01 | 0.04 | 0.02 | 0.03 | 0.03 | 0.01 |

2004 | 0.14 | 0.07 | 0.18 | 0.09 | 0.08 | 0.00 |

2005 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 |

2006 | 0.06 | 0.04 | 0.08 | 0.00 | 0.02 | 0.04 |

2008 | 0.06 | 0.03 | 0.09 | 0.00 | 0.00 | 0.00 |

2009 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.67 |

2010 | 0.01 | 0.04 | 0.01 | 0.01 | 0.01 | 0.01 |

2011 | 0.07 | 0.11 | 0.03 | 0.02 | 0.00 | 0.05 |

2012 | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 |

2013 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.04 |

2014 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.02 |

2015 | 0.25 | 0.08 | 0.18 | 0.50 | 0.14 | 0.00 |

2016 | 0.02 | 0.02 | 0.04 | 0.00 | 0.00 | 0.00 |

2017 | 0.03 | 0.04 | 0.05 | 0.00 | 0.00 | 0.03 |

2018 | 0.00 | 0.01 | 0.00 | 0.12 | 0.20 | 0.03 |

2019 | 0.00 | 0.00 | 0.04 | 0.03 | 0.01 | 0.01 |

2020 | 0.01 | 0.02 | 0.00 | 0.00 | 0.01 | 0.03 |

2021 | 0.02 | 0.01 | 0.02 | 0.02 | 0.07 | 0.00 |

2022 | 0.01 | 0.04 | 0.02 | 0.02 | 0.04 | 0.00 |

2023 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.02 |

#### Check influential points

Now let’s see what happens if we were to omit these years from the respective analyses.

For skew of the counts during the migration period, if we omit 2001, the patterns don’t really change (i.e., stay non-significant).

## Code

`compare(m1, 2002)`

model | parameter | Estimate | Std. Error | t value | Pr(>|t|) |
---|---|---|---|---|---|

Original - mig_skew ~ 1 | (Intercept) | 0.010 | 0.042 | 0.236 | 0.815 |

Drop 2002 - mig_skew ~ 1 | (Intercept) | -0.006 | 0.040 | -0.141 | 0.889 |

For skew of the counts during the peak migration period, if we omit 2002, the patterns don’t really change (i.e., stay non-significant).

## Code

`compare(m2, 2002)`

model | parameter | Estimate | Std. Error | t value | Pr(>|t|) |
---|---|---|---|---|---|

Original - peak_skew ~ 1 | (Intercept) | 0.007 | 0.018 | 0.377 | 0.709 |

Drop 2002 - peak_skew ~ 1 | (Intercept) | -0.005 | 0.014 | -0.337 | 0.739 |

For kurtosis during the peak migration period, if we omit 2002, the patterns don’t really change.

## Code

`compare(m5, 2002)`

model | parameter | Estimate | Std. Error | t value | Pr(>|t|) |
---|---|---|---|---|---|

Original - peak_kurt ~ 1 | (Intercept) | -1.125 | 0.006 | -188.017 | 0 |

Drop 2002 - peak_kurt ~ 1 | (Intercept) | -1.128 | 0.005 | -233.856 | 0 |

For skew of the counts during the “all” period, if we omit 2004, the patterns are stronger, and nearly show a trend, but are still non-significant.

## Code

`compare(m3, 2004)`