Package 'AQEval' reference manual

Title:	Air Quality Evaluation
Description:	Developed for use by those tasked with the routine detection, characterisation and quantification of discrete changes in air quality time-series, such as identifying the impacts of air quality policy interventions. The main functions use signal isolation then break-point/segment (BP/S) methods based on 'strucchange' and 'segmented' methods to detect and quantify change events (Ropkins & Tate, 2021, <doi:10.1016/j.scitotenv.2020.142374>).
Authors:	Karl Ropkins [aut, cre], Anthony Walker [aut], James Tate [aut]
Maintainer:	Karl Ropkins <[email protected]>
License:	GPL (>= 3)
Version:	0.6.1
Built:	2025-02-07 16:36:31 UTC
Source:	https://github.com/karlropkins/aqeval

Air Quality Evaluation

Description

R AQEval: R code for the analysis of discrete change in Air Quality time-series.

AQEval

AQEval was developed for use by those tasked with the routine detection, characterisation and quantification of discrete changes in air quality time-series.

The main functions, quantBreakPoints and quantBreakSegments, use break-point/segment (BP/S) methods that apply the strucchange and segmented R packages consecutively, first to detect discrete changes in air-quality time-series (as break-points) and then to characterise and quantify them (as segments).

AQEval functions adopt an openair-friendly approach, using function and data structures that many in the air quality research community are already familiar with. Most notably, most functions expect the supplied data to be time-series held in a single data.frame (or similar R object), with individual time-series identified by column name. The main functions typically expect the data.frame first, then the name of the pollutant to be used, then other arguments:

function(data, "pollutant.name", ...)

output <- function(data, "pollutant.name", ...)
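
As a minimal sketch of this calling convention, the helper below takes the data.frame first and the pollutant column name second, then looks the column up by name. The function `summarisePollutant` and the `demo` data are hypothetical and are not part of AQEval; they only illustrate the argument order.

```r
# hypothetical helper following the function(data, "pollutant.name", ...)
# convention; not an AQEval function
summarisePollutant <- function(data, pollutant, ...) {
  # basic checks on the openair-style structure
  stopifnot(is.data.frame(data), "date" %in% names(data),
            pollutant %in% names(data))
  x <- data[[pollutant]]
  data.frame(pollutant = pollutant,
             from = min(data$date),
             to = max(data$date),
             mean = mean(x, na.rm = TRUE))
}

# small synthetic data set (values are illustrative only)
demo <- data.frame(
  date = as.POSIXct("2020-01-01 00:00", tz = "UTC") + 3600 * 0:5,
  no2 = c(20, 22, NA, 25, 30, 28)
)
output <- summarisePollutant(demo, "no2")
output
```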

Author(s)

Karl Ropkins

References

Ropkins, K. and Tate, J.E., 2021. Early observations on the impact of the COVID-19 lockdown on air quality trends across the UK. Science of the Total Environment, 754, p.142374. https://doi.org/10.1016/j.scitotenv.2020.142374

Ropkins, K., Tate, J.E., Walker, A. and Clark, T., 2022. Measuring the impact of air quality related interventions. Environmental Science: Atmospheres, 2(3), pp.500-516. https://doi.org/10.1039/d1ea00073j

Ropkins, K., Walker, A., Philips, I., Rushton, C., Clark, T. and Tate, J., Change Detection of Air Quality Time-Series Using the R Package AQEval. Available at SSRN 4267722. https://ssrn.com/abstract=4267722 or http://dx.doi.org/10.2139/ssrn.4267722 Also at: https://karlropkins.github.io/AQEval/articles/AQEval_Intro_Preprint.pdf

AQEval Example data

Description

Data packaged with AQEval for use with example code.

Usage

aq.data

Format

(26280x6) 'tbl_df' objects

date: Time-series of POSIX class date and time records.
no2: Time-series of nitrogen dioxide measurements from local site.
bg.no2: Time-series of nitrogen dioxide measurements from nearby background site.
ws: Time-series of local wind speed measurements.
wd: Time-series of local wind direction measurements.
air_temp: Time-series of local air temperature measurements.

Details

Most functions in AQEval adopt the openair convention of assuming supplied data is a single data.frame or similar. The data frame was initially adopted for two reasons:

Firstly, air quality data is collected and archived in numerous formats, and keeping the import requirements simple minimises the frustrations associated with data importation.
Secondly, restricting the user to work with a single data format greatly simplifies data management for those less familiar with programming environments.

As part of this work several openair coding conventions were adopted, most importantly that data sets should include a column named date of POSIX class date-and-time stamps (DateTimeClasses). This and other conventions, such as the use of ws and wd for numeric wind speed and direction data-series, and site and code for character or factor monitoring site name and identifier code, are now commonplace for many working with R in the air quality research community, and many air quality archives provide data in (or support import functions that convert their own data structures to) this openair-friendly structure.
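
The conventions above can be sketched in base R as follows. The `raw` data set and its column names are invented for illustration (an assumption, not a real archive format); the point is only the renaming to date, ws and wd and the conversion of the time stamp to POSIX class.

```r
# raw records as they might arrive from an archive (synthetic, illustrative)
raw <- data.frame(
  timestamp = c("2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 02:00"),
  NO2 = c(21.5, 19.8, 23.1),
  wind_speed = c(3.2, 2.9, 4.1),
  wind_dir = c(270, 265, 280),
  stringsAsFactors = FALSE
)

# rename to the openair-style column names and parse the time stamp
aq <- data.frame(
  date = as.POSIXct(raw$timestamp, format = "%Y-%m-%d %H:%M", tz = "UTC"),
  no2 = raw$NO2,
  ws = raw$wind_speed,
  wd = raw$wind_dir
)

# quick structural checks before passing the data on
stopifnot(inherits(aq$date, "POSIXct"), is.numeric(aq$ws), is.numeric(aq$wd))
str(aq)
```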

Source

Air quality and meteorological data packaged for use with AQEval Examples.

Time-series sources:

date Date-and-time-stamp of POSIX class (DateTimeClasses).
no2 Nitrogen dioxide downloaded from King's College London Archive using importKCL function in openair.
bg.no2 Nitrogen dioxide downloaded from the Automatic Urban and Rural Network Archive using importAURN function in openair.
ws, wd, air_temp Wind speed, wind direction and air temperature downloaded from NOAA's Integrated Surface Database using importNOAA function in worldmet.

References

Regarding openair and openair-friendly data structuring, see:

Carslaw, D. C. and K. Ropkins (2012), openair — an R package for air quality data analysis. Environmental Modelling & Software. Volume 27-28, 52-61, DOI doi:10.1016/j.envsoft.2011.09.008

Ropkins, K. and D.C. Carslaw (2012), openair-Data Analysis Tools for the Air Quality Community. R Journal, 4(1). URL https://journal.r-project.org/archive/2012/RJ-2012-003/RJ-2012-003.pdf

Regarding worldmet, see:

David Carslaw (2021), worldmet: Import Surface Meteorological Data from NOAA Integrated Surface Database (ISD). R package version 0.9.5. URL https://CRAN.R-project.org/package=worldmet

Examples

#data set used in AQEval Examples
dim(aq.data)
head(aq.data)
with(aq.data, plot(date, no2, type="l"))

Some functions to calculate statistics

Description

Calculate data set statistics for selected time intervals.

Usage

calcDateRangeStat(
  data,
  from = NULL,
  to = NULL,
  stat = NULL,
  pollutant = NULL,
  ...,
  method = 2
)

calcRollingDateRangeStat(
  data,
  range = "year",
  res = "day",
  stat = NULL,
  pollutant = NULL,
  from = NULL,
  to = NULL,
  ...,
  method = 2
)

Arguments

`data`	(data.frame, tibble, etc) Data set containing the data-series the statistic is to be calculated for, and a `date` column of date/time records.
`from`	(various) Start date(s) to subsample from when calculating statistic, by default start of supplied `data` date range.
`to`	(various) End date(s) to subsample to when calculating statistic, by default end of supplied `data` date range.
`stat`	(function) Statistic to be applied to selected data, by default `mean(pollutant, na.rm=TRUE)`. NB: This should be a function that works on vectors in the form `function(x)`.
`pollutant`	(character) The name(s) of data-series to analyse in `data`, by default all columns in supplied data except `date`.
`...`	extra arguments.
`method`	(numeric) Method to use when calculating statistic. Currently 1 (using base R), 2 (using dplyr), 3 (using data.table), and 4 (using dplyr and purrr)
`range`	(character) For `calcRollingDateRangeStat`, the range of the rolling date window, by default `'year'` for annual statistic calculations.
`res`	(character) For `calcRollingDateRangeStat`, the resolution to calculate the rolling statistic at, by default `'day'` to calculate this once per day.

Value

These functions return data.frames of function outputs.

Note

These functions are in development and likely to change significantly in future versions, please handle with care.
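
The idea behind both functions can be sketched in base R: subsample the data between two dates, apply a statistic, and (for the rolling case) repeat this for a window advanced at a fixed resolution. The helper `dateRangeStat` and the synthetic `demo` data are hypothetical; this is not the AQEval implementation and does not reproduce its defaults or output format.

```r
# synthetic hourly series covering 60 days (illustrative values only)
set.seed(1)
demo <- data.frame(
  date = seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"),
             by = "hour", length.out = 24 * 60),
  no2 = rnorm(24 * 60, mean = 30, sd = 5)
)

# hypothetical helper: apply 'stat' to 'pollutant' between 'from' and 'to'
dateRangeStat <- function(data, from, to, pollutant,
                          stat = function(x) mean(x, na.rm = TRUE)) {
  from <- as.POSIXct(from, tz = "UTC")
  to <- as.POSIXct(to, tz = "UTC")
  keep <- data$date >= from & data$date <= to
  data.frame(from = from, to = to, value = stat(data[[pollutant]][keep]))
}

# one fixed interval
dateRangeStat(demo, "2020-01-01", "2020-01-31", "no2")

# rolling version: a 30-day window advanced one day at a time
starts <- seq(as.POSIXct("2020-01-01", tz = "UTC"), by = "day",
              length.out = 10)
rolling <- do.call(rbind, lapply(starts, function(s) {
  dateRangeStat(demo, s, s + 30 * 24 * 3600, "no2")
}))
rolling
```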

find and test break-points

Description

Finding and testing break-points in conventionally formatted air quality data sets.

Usage

findBreakPoints(data, pollutant, h = 0.15, ...)

testBreakPoints(data, pollutant, breaks, ...)

Arguments

`data`	Data source, typically a `data.frame` or similar, containing data-series to apply function to and a paired time-stamped data-series, called `date`.
`pollutant`	Name of time-series, assumed to be a column in `data`.
`h`	(`findBreakPoints` only) The data/time window size to use when looking for breaks in a supplied time-series, expressed as proportion of time-series (0-1), default 0.15.
`...`	other parameters
`breaks`	(`testBreakPoints` only) `data.frame` of the break-points and confidence intervals, typically a `findBreakPoints` output.

Details

findBreakPoints uses methods from the strucchange package (see references), with modifications suggested by the main author of strucchange to handle missing cases, to find potential break-points in a supplied time-series.

testBreakPoints tests and identifies most likely break-points using methods proposed for use with quantBreakPoints and quantBreakSegments and conventionally formatted air quality data sets.
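
To give a feel for what a break-point search does, and for the role of a minimum window size such as `h`, the base R sketch below looks for a single change in mean by minimising the residual sum of squares. It is a deliberately simplified illustration on synthetic data: strucchange handles multiple breaks and confidence intervals, and this is not the code used by findBreakPoints.

```r
# synthetic series with a step change after observation 120 (illustrative)
set.seed(42)
y <- c(rnorm(120, mean = 40, sd = 3), rnorm(80, mean = 30, sd = 3))
n <- length(y)

# minimum segment length as a proportion of the series, mirroring 'h'
h <- 0.15
min.seg <- floor(h * n)

# candidate positions for the last observation of the first segment
candidates <- min.seg:(n - min.seg)

# residual sum of squares for a two-mean model split at each candidate
rss <- sapply(candidates, function(k) {
  sum((y[1:k] - mean(y[1:k]))^2) +
    sum((y[(k + 1):n] - mean(y[(k + 1):n]))^2)
})

# the most likely single break minimises the residual sum of squares
best <- candidates[which.min(rss)]
best
```

A larger `h` forces longer segments either side of any break, which makes the search less sensitive to short-lived excursions but unable to find changes close to the ends of the series.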

Value

findBreakPoints returns a data.frame of found break-points.

testBreakPoints returns a likely break-point/segment report.

References

Regarding strucchange methods see breakpoints, and:

Achim Zeileis, Friedrich Leisch, Kurt Hornik and Christian Kleiber (2002). strucchange: An R Package for Testing for Structural Change in Linear Regression Models. Journal of Statistical Software, 7(2), 1-38. URL https://www.jstatsoft.org/v07/i02/.

Achim Zeileis, Christian Kleiber, Walter Kraemer and Kurt Hornik (2003). Testing and Dating of Structural Changes in Practice. Computational Statistics & Data Analysis, 44, 109-123.

Regarding missing data handling, see:

URL: https://stackoverflow.com/questions/43243548/strucchange-not-reporting-breakdates.

Regarding testBreakPoints, see:

find nearby sites

Description

Function to find nearest locations in a reference by latitude and longitude.

Usage

findNearLatLon(lat, lon = NULL, nmax = 10, ..., ref = NULL, units = "m")

findNearSites(
  lat,
  lon,
  pollutant = "no2",
  site.type = "rural background",
  nmax = 10,
  ...,
  ref = NULL,
  units = "m"
)

Arguments

`lat`, `lon`	(numeric) The supplied latitude and longitude.
`nmax`	(numeric) The maximum number of nearest sites to report, by default 10.
`...`	Other parameters, mostly ignored.
`ref`	(`data.frame` or similar) The look-up table to use when identifying nearby locations, and expected to contain latitude, longitude and any required location identifier data-series. By default, `findNearSites` uses openair `importMeta` output if this is not supplied but this is a required input for `findNearLatLon`.
`units`	(character) The units to use when reporting distances to near locations; currently the only option is `"m"`.
`pollutant`	(character) For `findNearSites` only, the pollutant of interest, by default `"no2"`.
`site.type`	(character) For `findNearSites` only, the monitoring site type, by default `"rural background"`.

Details

If investigating air quality in a particular location, for example a UK Clean Air Zone (https://www.gov.uk/guidance/driving-in-a-clean-air-zone), you may wish to locate an appropriate rural background air quality monitoring station. findNearSites locates air quality monitoring sites with openly available data such as that available from the UK AURN network (https://uk-air.defra.gov.uk/networks/network-info?view=aurn)

Value

findNearLatLon and findNearSites return a data.frame of near-site metadata.

Note

This function uses the haversine formula to account for the Earth's surface curvature, and uses 6371 km as the radius of the Earth.
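
A minimal base R sketch of a haversine distance calculation with a 6371 km Earth radius, followed by a nearest-site look-up, is shown below. The `haversine` helper, the `ref` table and its column names are hypothetical and chosen for illustration; they are not the internals of findNearLatLon or findNearSites.

```r
# haversine great-circle distance in metres, Earth radius taken as 6371 km
haversine <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to.rad <- pi / 180
  dlat <- (lat2 - lat1) * to.rad
  dlon <- (lon2 - lon1) * to.rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to.rad) * cos(lat2 * to.rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# hypothetical look-up table of sites (coordinates are illustrative)
ref <- data.frame(
  site = c("A", "B", "C"),
  latitude = c(50.1, 51.5, 50.0),
  longitude = c(-1.0, -0.1, -1.2)
)

# distance from the point of interest, nearest first, keeping nmax sites
nmax <- 2
ref$distance <- haversine(50, -1, ref$latitude, ref$longitude)
near <- head(ref[order(ref$distance), ], nmax)
near
```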

Examples

#find rural background NO2 monitoring sites
#near latitude = 50, longitude = -1

#not run: requires internet
## Not run: 
findNearSites(lat = 50, lon = -1)

## End(Not run)

isolateContribution

Description

Environmental time-series signal processing: Contribution isolation based on background subtraction, deseasonalisation and/or deweathering.

Usage

isolateContribution(
  data,
  pollutant,
  background = NULL,
  deseason = TRUE,
  deweather = TRUE,
  method = 2,
  add.term = NULL,
  formula = NULL,
  use.bam = FALSE,
  output = "mean",
  ...
)

Arguments

`data`	Data source, typically `data.frame` (or similar), containing all time-series to be used when applying signal processing.
`pollutant`	The column name of the `data` time-series to be signal processed.
`background`	(optional) if supplied, the background time-series to use as a background correction. See below.
`deseason`	logical or character vector, if `TRUE` (default), the `pollutant` is deseasonalised using `day.hour` and `year.day` frequency terms, all calculated from the `data` time stamp, assumed to be `date` in `data`. Other options: `FALSE` to turn off deseasonalisation; or a character vector of frequency terms if user-defining. See below.
`deweather`	logical or character vector, if `TRUE` (default), the data is deweathered using wind speed and direction (assumed to be `ws` and `wd` in `data`). Other options: `FALSE` to turn off deweathering; or a character vector of `data` column names if user-defining. See below.
`method`	numeric, contribution isolation method (default 2). See Note.
`add.term`	extra terms to add to the contribution isolation model; ignore for now (in development).
`formula`	(optional) Signal isolation model formula; this allows the user to set the signal isolation model formula directly, but means the function arguments `background`, `deseason` and `deweather` will be ignored.
`use.bam`	(logical) If `TRUE`, `bam` is used instead of the standard `gam` to build the model.
`output`	output options; currently, `'mean'`, `'model'`, and `'all'`; but please note these are in development and may be subject to change.
`...`	other arguments; ignore for now (in development)

Details

isolateContribution estimates and subtracts pollutant variance associated with factors that may hinder break-point/segment analysis:

Background Correction: If applied, this fits the supplied background time-series as a spline term: s(background).
Seasonality: If applied, this fits regular frequency terms, e.g. day.hour, year.day, as spline terms; the default TRUE is equivalent to s(day.hour) and s(year.day). All terms are calculated from the date column in data.
Weather: If applied, this fits time-series of identified meteorological measurements, e.g. wind speed and direction (ws and wd in data). If both ws and wd are present these are fitted as a tensor term te(ws, wd). Other deweathering terms, if included, are fitted as spline terms s(term). The default TRUE is equivalent to te(ws, wd).

Using the supplied arguments, it builds an mgcv GAM signal model, then calculates and returns the mean-centred residuals as an estimate of the isolated local contribution.
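
The model terms described above can be sketched directly with mgcv on synthetic data. This is an illustration under assumptions, not the isolateContribution source: the `demo` data are invented, the smooths use mgcv defaults, and 'mean-centred' is read here as residuals shifted back onto the pollutant mean.

```r
library(mgcv)

# synthetic hourly data with daily and annual cycles (illustrative only)
set.seed(123)
n <- 24 * 365
demo <- data.frame(
  date = seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"),
             by = "hour", length.out = n),
  ws = runif(n, 0.5, 10),
  wd = runif(n, 0, 360)
)
demo$day.hour <- as.numeric(format(demo$date, "%H"))
demo$year.day <- as.numeric(format(demo$date, "%j"))
demo$bg.no2 <- 15 + 5 * cos(2 * pi * demo$year.day / 365) + rnorm(n)
demo$no2 <- demo$bg.no2 + 10 * sin(2 * pi * demo$day.hour / 24) +
  20 / demo$ws + rnorm(n, sd = 2)

# background, seasonal and weather terms as described above
mod <- gam(no2 ~ s(bg.no2) + s(day.hour) + s(year.day) + te(ws, wd),
           data = demo)

# residuals shifted back onto the pollutant mean (assumed reading of
# 'mean-centred'); this is the estimate of the isolated contribution
isolated <- residuals(mod) + mean(demo$no2, na.rm = TRUE)
summary(isolated)
```

Because the background, seasonal and weather signals are removed, the isolated series should vary much less than the original, which makes any remaining discrete change easier to detect.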

Value

isolateContribution returns a vector of predictions of the pollutant time-series after the requested signal isolation.

Note

The method argument was included during method development and testing, and has been retained for now; please ignore it.

Author(s)

Karl Ropkins

References

Regarding mgcv GAM fitting methods, see Wood (2017) for general introduction and package documentation regarding coding (mgcv):

Wood, S.N. (2017) Generalized Additive Models: an introduction with R (2nd edition), Chapman and Hall/CRC.

Regarding isolateContribution, see:

Examples

#fitting a simple deseasonalisation, deweathering
#and background correction (dswb) model to no2:

aq.data$dswb.no2 <- isolateContribution(aq.data,
                        "no2", background="bg.no2")

#compare at 14 day resolution:
temp <- openair::timeAverage(aq.data, "14 day")

#without dswb
quantBreakPoints(temp, "no2", test=FALSE, h=0.1)

#with dswb
quantBreakPoints(temp, "dswb.no2", test=FALSE, h=0.1)

Other Air Quality Models

Description

Other packaged Air Quality Models.

Usage

fitNearSiteModel(data, pollutant = "no2", y, x = "rest", elements = NULL, ...)

Arguments

`data`	`data.frame` (or similar) containing data-series to be modelled; this is expected to contain 'date', 'site' and pollutant of interest data-series.
`pollutant`	The name of the `pollutant` (in `data`) to model, by default 'no2'.
`y`	The name of the monitor site to be modelled, assumed to be one of several names in the `site` column of `data`.
`x`	The other sites to use when building the model, the default 'rest' uses all supplied sites except 'y'.
`elements`	The number of inputs to use in the site models, can be any number up to length of x or combination thereof; by default this is set as `length(x):1`
`...`	extra arguments.

Details

fitNearSiteModel builds an air quality model for one location using air quality data from nearby sites.

Value

The supplied data with the model output added as an additional column.

quantify break-point/segments

Description

Quantify changes in pollutant time-series using either break-point or break-segment methods.

Usage

quantBreakPoints(
  data,
  pollutant,
  breaks,
  ylab = NULL,
  xlab = NULL,
  pt.col = c("lightgrey", "darkgrey"),
  line.col = "red",
  break.col = "blue",
  event = NULL,
  show = c("plot", "report"),
  ...
)

quantBreakSegments(
  data,
  pollutant,
  breaks,
  ylab = NULL,
  xlab = NULL,
  pt.col = c("lightgrey", "darkgrey"),
  line.col = "red",
  break.col = "blue",
  event = NULL,
  seg.method = 2,
  seg.seed = 12345,
  show = c("plot", "report"),
  ...
)

Arguments

`data`	Data source, typically a data.frame or similar, containing data-series to model and a paired time-stamp data-series, named date.
`pollutant`	The name of the data-series to break-point or break-segment model.
`breaks`	(Optional) The break-points and confidence intervals to use when building either break-point or break-segment models. If not supplied these are built using `findBreakPoints` and supplied arguments.
`ylab`	Y-label term, by default pollutant.
`xlab`	X-label term, by default date.
`pt.col`	Point fill and line colours for plot, defaults lightgrey and darkgrey.
`line.col`	Line colour for plot, default red.
`break.col`	Break-point/segment colour for plot, default blue.
`event`	An optional list of plot terms for an event marker, applied to a vertical line and text label. List items include: `x` the event date (YYYY-MM-DD format), required for both line and label; `y` by default 0.9 x y-plot range; `label` the label text, required for label; `line.size` the line width, by default 0.5; `font.size` the text size, by default 5; and, `hjust` the label left/right justification, 0 left, 0.5 centre, 1 right (default). See also examples below.
`show`	What to show before returning the break-point quantification model, by default plot and report.
`...`	other parameters
`seg.method`	(`quantBreakSegments` only) the break-segment fitting method to use.
`seg.seed`	(`quantBreakSegments` only) the seed setting to use when fitting break-segments, default `12345`.

Details

quantBreakPoints and quantBreakSegments both use strucchange methods to identify potential break-points in time-series, and then quantify these as conventional break-points or break-segments, respectively:

Finding Break-points: Using the strucchange methods of Zeileis and colleagues and an independent change detection model, the functions apply a rolling-window approach: they assume the first window (or data subset) is without change, build a statistical model of it, advance the window, build a second model and compare the two, and so on, to identify the most likely points of change in a larger data-series. See also findBreakPoints.
Quantifying Break-points: Using the supplied break-points to build a break-point model.
Quantifying Break-segments: Using the confidence regions for the supplied break-points as the starting points to build a break-segment model.
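
The difference between the two quantification steps can be sketched with ordinary linear models on synthetic data. Here the break position (day 115) and the segment limits (days 100 and 130) are assumed to be known, whereas AQEval estimates them using strucchange and segmented; the code illustrates the two model shapes and is not the package's fitting procedure.

```r
# synthetic daily series with a change between day 100 and day 130
set.seed(7)
day <- 1:200
truth <- ifelse(day <= 100, 40,
                ifelse(day >= 130, 28, 40 - 12 * (day - 100) / 30))
y <- truth + rnorm(200, sd = 2)

# break-point model: a step at a supplied break (day 115, assumed known)
seg <- factor(day > 115, labels = c("before", "after"))
bp.mod <- lm(y ~ seg)
step.change <- unname(coef(bp.mod)["segafter"])

# break-segment model: flat, then a linear ramp between two supplied
# positions (days 100 and 130, assumed known), then flat again
ramp <- pmin(pmax(day - 100, 0), 30)
bs.mod <- lm(y ~ ramp)
ramp.change <- unname(coef(bs.mod)["ramp"]) * 30

# both estimates of the overall change, as absolute and percentage values
changes <- c(step = step.change, ramp = ramp.change)
changes
100 * changes / unname(coef(bs.mod)["(Intercept)"])
```

When a change is gradual, the step model tends to understate it because part of the transition falls on either side of the break, while the ramp model attributes the whole transition to the segment.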

Value

Both functions use the show argument to control which elements of the function outputs are shown, but also invisibly return a list of all outputs, which can be caught using, e.g.:

brk.mod <- quantBreakPoints(data, pollutant)

Note

AQEval function quantBreakSegments is currently running segmented v.1.3-4 while we evaluate latest version, v.1.4-0.

Author(s)

Karl Ropkins

References

Regarding strucchange methods see in-package documentation, e.g. breakpoints, and:

Achim Zeileis, Christian Kleiber, Walter Kraemer and Kurt Hornik (2003). Testing and Dating of Structural Changes in Practice. Computational Statistics & Data Analysis, 44, 109-123. DOI doi:10.1016/S0167-9473(03)00030-6.

Regarding segmented methods see in-package documentation, e.g. segmented, and:

Vito M. R. Muggeo (2003). Estimating regression models with unknown break-points. Statistics in Medicine, 22, 3055-3071. DOI 10.1002/sim.1545.

Vito M. R. Muggeo (2008). segmented: an R Package to Fit Regression Models with Broken-Line Relationships. R News, 8/1, 20-25. URL https://cran.r-project.org/doc/Rnews/.

Vito M. R. Muggeo (2016). Testing with a nuisance parameter present only under the alternative: a score-based approach with application to segmented modelling. J of Statistical Computation and Simulation, 86, 3059-3067. DOI 10.1080/00949655.2016.1149855.

Vito M. R. Muggeo (2017). Interval estimation for the breakpoint in segmented regression: a smoothed score-based approach. Australian & New Zealand Journal of Statistics, 59, 311-322. DOI 10.1111/anzs.12200.

Regarding break-points/segment methods, see:

Examples

#using openair timeAverage to convert 1-hour data to 1-day averages

temp <- openair::timeAverage(aq.data, "1 day")

#break-points

quantBreakPoints(temp, "no2", h=0.3)

#break-segments

quantBreakSegments(temp, "no2", h=0.3)

#additional examples (not run)
## Not run: 
#in-call plot modification
#removing x axis label
#recolouring break line and
#adding an event marker
quantBreakPoints(temp, "no2", h=0.3,
       xlab="", break.col = "red",
       event=list(label="Event expected here",
                 x="2002-08-01", col="grey"))

## End(Not run)

Spectral Analysis

Description

Time-series spectral frequency analysis.

Usage

spectralFrequency(data, pollutant, ...)

Arguments

`data`	`data.frame` holding data to be analysed, expected to contain a timestamp data-series called `date` and a measurement time-series to be analysed identified using the `pollutant` argument.
`pollutant`	The name of the time-series, typically pollutant measurements, to be analysed.
`...`	extra arguments.

Details

spectralFrequency produces a time frequency analysis of the requested pollutant.
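
As a rough illustration of what a frequency analysis of an hourly series shows, the sketch below computes a raw periodogram with `spec.pgram` from base R's stats package and reads off the dominant period. The synthetic series is invented, and the method used inside spectralFrequency is not stated here, so this should be read as a generic example rather than the package's approach.

```r
# synthetic hourly series with a 24-hour cycle plus noise (illustrative)
set.seed(99)
hours <- 0:(24 * 60 - 1)
x <- 30 + 8 * sin(2 * pi * hours / 24) + rnorm(length(hours), sd = 3)

# raw periodogram; frequencies are in cycles per hour
pg <- spec.pgram(x, taper = 0, detrend = TRUE, plot = FALSE)

# the dominant period is the reciprocal of the peak frequency
peak.period <- 1 / pg$freq[which.max(pg$spec)]
round(peak.period, 2)
```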

Value

spectralFrequency uses the show argument to control which elements of the function outputs are shown, but also invisibly returns a list of all outputs, which can be caught using, e.g.:

sfa.mod <- spectralFrequency(data, pollutant)

Examples

spectralFrequency(aq.data, "no2")
