auk 0.6.0 is designed for EBD files downloaded after 2022-10-25.
EBD data directory: /home/steffi/Projects/Business/Matt/lb_curlew_distribution/Data/Raw
eBird taxonomy version: 2022
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(sf)
Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
There are two levels of best practices we’ll be applying here.
First, we’ll Filter the data to include only those checklists recommended by the eBird data best practices (https://cornelllabofornithology.github.io/ebird-best-practices/):
Keep only the standard protocols, “traveling” and “stationary” counts
Keep only checklists covering < 5 km
Keep only checklists lasting < 5 hrs (300 min)
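As a minimal sketch of these three filters, here they are applied with dplyr to a toy checklist table. The column names (protocol_type, effort_distance_km, duration_minutes) are assumed to match what auk produces, and the values are made up for illustration.

```r
library(dplyr)

# Toy checklists (hypothetical values, for illustration only)
checklists <- tibble(
  checklist_id       = c("S1", "S2", "S3", "S4"),
  protocol_type      = c("Traveling", "Stationary", "Incidental", "Traveling"),
  effort_distance_km = c(2.5, NA, 1, 8),
  duration_minutes   = c(60, 30, 20, 400)
)

filtered <- checklists |>
  filter(
    # Standard protocols only
    protocol_type %in% c("Traveling", "Stationary"),
    # Stationary counts have no distance recorded, so let NA through here
    is.na(effort_distance_km) | effort_distance_km < 5,
    # Less than 5 hours
    duration_minutes < 300
  )

filtered$checklist_id
```

On the full EBD text files, the same filters would normally be defined up front with auk (for example auk_protocol(), auk_distance() and auk_duration()), so that only matching rows are ever read into R.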
Next we’ll Summarize the data to address the Spatial and Temporal Biases:
Spatial bias: “…most participants in citizen science surveys sample near their homes (Luck et al. 2004), in easily accessible areas such as roadsides (Kadmon, Farber, and Danin 2004), or in areas and habitats of known high biodiversity (Prendergast et al. 1993). A simple method to reduce the spatial bias is to create an equal area grid over the region of interest, and sample a given number of checklists from within each grid cell.”
Temporal bias: “…participants preferentially sample when they are available, such as weekends (Courter et al. 2013), and at times of year when they expect to observe more birds, notably during spring migration (Sullivan et al. 2014). To address the weekend bias, we recommend using a temporal scale of a week or multiple weeks for most analyses.”
So we use 10 x 10 km or 20 x 20 km grids and summarize by year.
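To make the grid idea concrete, here is a rough sketch of building 10 x 10 km cells with sf. The study area, the CRS (EPSG:3347, used only as a placeholder projected CRS in metres) and the name grid_demo are assumptions for illustration, not the grid used in this project.

```r
library(sf)

# Hypothetical 30 km x 20 km study area in a projected CRS with metre units
study_area <- st_as_sfc(
  st_bbox(c(xmin = 0, ymin = 0, xmax = 30000, ymax = 20000), crs = st_crs(3347))
)

# Square cells, 10,000 m on each side, covering the study area
cells <- st_make_grid(study_area, cellsize = 10000)

# Give each cell an id so checklists can be assigned to it later
grid_demo <- st_sf(grid_id = seq_along(cells), geometry = cells)

nrow(grid_demo)
```

Changing cellsize to 20000 gives the 20 x 20 km version.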
Data files
NOTE: These are the names of MY files.
If you re-download the data you will have a different data version (this one is May-2023).
I restricted the dates of the files to May 2010 through Aug 2022.
Make sure to update the file names to match the names of your files.
Match all .txt files (if you followed the instructions in Scripts/01_setup, this should show you your files)
Now we’ll load in the filtered data and do some more filtering to clean it up a bit more.
Convert all NA distances to 0 and filter again (the distance filter didn’t seem to apply to the sampling data, possibly a bug). This can also take a little time, as the sampling (checklist) data is pretty large.
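A minimal sketch of this step on a toy sampling table follows. The column name effort_distance_km is assumed to match the auk output, and the values are invented.

```r
library(dplyr)

# Toy sampling (checklist) data; NA distance is typical of stationary counts
sampling <- tibble(
  checklist_id       = c("S1", "S2", "S3"),
  effort_distance_km = c(NA, 3, 12)
)

sampling_clean <- sampling |>
  # Treat a missing distance as 0 km
  mutate(effort_distance_km = coalesce(effort_distance_km, 0)) |>
  # Re-apply the distance filter now that there are no NAs
  filter(effort_distance_km < 5)

sampling_clean
```

Here S1 is kept with a distance of 0, S2 is kept as is, and S3 is dropped for exceeding 5 km.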
Zero-fill the data. This ensures that ‘complete’ checklists where the species was not reported are included as zero counts.
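The idea behind zero-filling can be sketched with a plain dplyr join. This is only an illustration of the logic on made-up data, not what auk does internally; on real files auk_zerofill() handles it.

```r
library(dplyr)

# All complete checklists (the sampling data), toy values
checklists <- tibble(checklist_id = c("S1", "S2", "S3"))

# Only the checklists on which the species was reported
observations <- tibble(checklist_id = "S2", observation_count = 4L)

zero_filled <- checklists |>
  left_join(observations, by = "checklist_id") |>
  mutate(
    # Detected only if the checklist had a matching observation row
    species_observed  = !is.na(observation_count),
    # Checklists without the species become explicit zero counts
    observation_count = coalesce(observation_count, 0L)
  )

zero_filled
```

Every complete checklist now has a row, with species_observed set to FALSE and a count of 0 where the bird was not reported.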
Final checks to make sure filters worked as expected (assert() functions)
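As a rough stand-in for these checks, here is a small base R version using stopifnot() rather than the assert() functions mentioned above. The function name check_filters and the toy data are hypothetical.

```r
# Stop with an error if any checklist violates the filters
check_filters <- function(x) {
  stopifnot(
    all(x$protocol_type %in% c("Traveling", "Stationary")),
    all(x$effort_distance_km < 5),
    all(x$duration_minutes < 300)
  )
  invisible(TRUE)
}

# Toy data that passes all checks
good <- data.frame(
  protocol_type      = c("Traveling", "Stationary"),
  effort_distance_km = c(2.5, 0),
  duration_minutes   = c(60, 30)
)
check_filters(good)

# Toy data with one checklist that is too long; this would raise an error
bad <- transform(good, duration_minutes = c(60, 400))
failed <- inherits(try(check_filters(bad), silent = TRUE), "try-error")
```

Because an NA makes all() return NA, leftover missing distances also trigger an error, which is the behaviour we want here.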
Keep only the columns which are useful to us
Note: There is a BCR (Bird Conservation Region) column which we are not using, as we will assign BCR membership to the grid cells based on overlap with the BCR shapefiles. This means the odd checklist near an edge may be assigned to a grid cell with a different BCR from the observation itself, but we won’t worry about that.
ebird_sf <- read_rds("Data/Intermediate/lobcur_complete.rds") |>
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326) |> # Lat/lon are GPS (4326)
  st_transform(st_crs(grid_10)) # Now transform to match the grid data
Then join with grid data and summarize by grid
Get counts of checklists for each year for each grid (only concerned with yearly counts)
Total number of checklists (total_checklists, the count of checklists associated with each grid cell; a grid cell with no checklists has an NA in year)
Number of checklists with the bird detected (total_obs, based on species_observed, which is a logical TRUE/FALSE column)
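The join and the two counts can be sketched on toy data as follows. The grid, the points, the CRS (EPSG:3347 as a placeholder) and the names grid_demo, pts and grid_counts are all assumptions for illustration; only total_checklists, total_obs, species_observed and year come from the description above.

```r
library(sf)
library(dplyr)

# Hypothetical row of three 10 km cells in a projected CRS (metres)
area <- st_as_sfc(
  st_bbox(c(xmin = 0, ymin = 0, xmax = 30000, ymax = 10000), crs = st_crs(3347))
)
cells <- st_make_grid(area, cellsize = 10000)
grid_demo <- st_sf(grid_id = seq_along(cells), geometry = cells)

# Toy checklists, already in the same CRS as the grid
pts <- st_as_sf(
  tibble(
    year             = c(2020, 2020, 2021),
    species_observed = c(TRUE, FALSE, TRUE),
    x                = c(2000, 4000, 15000),
    y                = c(5000, 6000, 5000)
  ),
  coords = c("x", "y"), crs = st_crs(grid_demo)
)

grid_counts <- grid_demo |>
  st_join(pts) |>       # left join: cells without checklists keep NA in year
  st_drop_geometry() |>
  group_by(grid_id, year) |>
  summarize(
    total_checklists = sum(!is.na(species_observed)),
    total_obs        = sum(species_observed, na.rm = TRUE),
    .groups = "drop"
  )

grid_counts
```

Cell 1 gets two checklists in 2020 with one detection, cell 2 gets one checklist in 2021 with one detection, and cell 3 has no checklists, so its year is NA and both counts are 0.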