Program expansion:
explanations and documentation

The program expansion estimates the expansion speed of a population based on a spatio-temporal dataset of occurrences.

Introduction
Definitions and documentation
Formatting of data files (coordinate systems / example / NB)
Parameters (dark figure / model / others)
Summary of the input
Map
Assumptions
Graph
Output
About the program (licence / citation / thanks / versions)

Introduction

The program expansion can be run as an online application at the URL https://view.nina.no/expansion/. The default language is Norwegian; to switch the application to English, please choose the language in the upper right corner.

A spatio-temporal dataset of observations of a species over at least ten years is required. The application is run by uploading the datafile, parameterising the model and waiting for the output. The screen is divided into six panels:

(1) datafile upload and model parameterisation;
(2) summary of the data read and potential error messages;
(3) map over the occurrences;
(4) summary of the assumptions and limitations of the model chosen;
(5) graphical presentation of the model fitted;
(6) table with estimates.

Alternatively, the program expansion may be run as an R script. The script may be loaded from the URL http://www.evol.no/hanno/21/expans.rtx, for instance using the R command load(url("http://www.evol.no/hanno/21/expans.rtx")). After it is loaded, you can run the script by writing expansion(...), where "..." represents the desired parameters. The parameters are unchanged from version 2.6 (and are explained elsewhere).

Definitions and documentation

Expansion is here understood as the number of new occurrences per time (where "occurrences" are colonised 2 km × 2 km grid cells). Thus, expansion encompasses any spread or movement of the species concerned (regardless of means, causes and pathways, i.e. including active and passive, natural and anthropogenic, intentional and unintentional movements).

Mathematically, expansion speed is described as the annual increase in the radius of the area of occupancy (AOO) of the species (where the radius is calculated as if the AOO was a coherent circle containing all occurrences and only occurrences). The model underlying the program has been described in detail by Sandvik (2020).
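The circle idealisation above can be sketched in a few lines. This is only an illustration of the geometric definition (the radius of a circle whose area equals the AOO), not the fitted model described by Sandvik (2020). The function names and the example counts are invented for the illustration and are not part of the program.

```python
import math

CELL_AREA_KM2 = 2 * 2  # one occurrence = one colonised 2 km x 2 km grid cell


def aoo_km2(n_cells):
    """Area of occupancy in km^2 for a given number of occupied cells."""
    return n_cells * CELL_AREA_KM2


def idealised_radius_km(area_km2):
    """Radius of a circle whose area equals the AOO."""
    return math.sqrt(area_km2 / math.pi)


# Invented counts, for illustration only: 25 cells at the start, 100 cells ten years later
r_start = idealised_radius_km(aoo_km2(25))
r_end = idealised_radius_km(aoo_km2(100))
mean_increase_km_per_year = (r_end - r_start) / 10
```

Because the radius grows with the square root of the AOO, a fourfold increase in occupied cells only doubles the idealised radius. The program itself estimates the speed by fitting a model to all years rather than by differencing two end points as done here.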

In risk assessments following the GEIAA protocol (Generic Ecological Impact Assessment of Alien Species), estimates of expansion speed are needed in order to obtain a score for criterion B on the invasion axis.

Formatting of data files

For the datafile to be read correctly, it should be a comma-delimited plain-text file with a header. More specifically, please format the datafile according to the following conventions:

The data have to be organised column-wise, i.e. the file has to consist of one column per variable (year and for instance latitude and longitude) and one row per observation.
The first row has to contain the variable names.
One column needs to contain years and have the name "t". Names of the remaining variables depend on the coordinate system used and are explained below.
At least ten years of observations are required.
Occurrences can either be reported in the first year they are observed (given that they can be assumed to remain in place) or in each year they are assumed to exist. (In the latter case, the default settings need to be changed.)
All rows have to have the same number of separators.
Semicolons (;) or commas (,) are accepted as separators between columns (i.e., between the elements of a row) – but please don't mix them. Such files can be produced by all spreadsheet applications. (Choose "save as comma delimited file" or something similar. Usual filename extensions of such formats are ".CSV" or ".SDV".)
The symbol used as separator must not occur in other contexts. Nor may apostrophes (') be used anywhere in the data file. Please make sure to remove or replace these symbols.
Periods (.) are accepted as decimal marks. Commas (,) may also be used as decimal marks, but only if semicolons (;) are used as separators.
Spaces between (outside) elements, and quotation marks enclosing elements (on both sides), are tolerated. (Thus, ;12; is equivalent to ;12 ; and to ;"12";.)
Missing values are tolerated if specified by omission (;;) or spaces (; ;).
The order of observations does not matter.
The order of columns does not matter.
Additional columns are ignored. (Nonetheless, it might be an advantage to delete superfluous columns, because columns containing commas or apostrophes may interrupt the conversion.)
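The conventions listed above can be illustrated with a small Python sketch that writes a semicolon-delimited file with a header and runs a few of the checks. The function names and the example rows are invented for this illustration. "At least ten years" is interpreted here as ten distinct years, which is an assumption. The checks cover only some of the rules and do not replace the program's own validation.

```python
import csv
import io


def write_datafile(rows, handle):
    """Write (t, lat, lon) tuples as a semicolon-delimited file with a header row."""
    writer = csv.writer(handle, delimiter=";", lineterminator="\n")
    writer.writerow(["t", "lat", "lon"])
    writer.writerows(rows)


def check_datafile(text, sep=";"):
    """Return a list of detected problems (an empty list means none was found)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ["file is empty"]
    problems = []
    header = [name.strip().strip('"') for name in lines[0].split(sep)]
    if any(line.count(sep) != lines[0].count(sep) for line in lines[1:]):
        problems.append("rows differ in their number of separators")
    if "'" in text:
        problems.append("apostrophe found")
    if "t" not in header:
        problems.append('no column named "t"')
        return problems
    col = header.index("t")
    years = set()
    for line in lines[1:]:
        parts = line.split(sep)
        if col < len(parts) and parts[col].strip().strip('"'):
            years.add(parts[col].strip().strip('"'))
    if len(years) < 10:  # assumption: ten distinct years are meant
        problems.append("fewer than ten distinct years")
    return problems


# Invented placeholder observations: one per year, 2010 to 2019
rows = [(2010 + i, round(60.0 + 0.01 * i, 2), round(10.0 + 0.01 * i, 2)) for i in range(10)]
buffer = io.StringIO()
write_datafile(rows, buffer)
text = buffer.getvalue()
problems = check_datafile(text)
```

Semicolons are used as separators here so that a decimal comma could never be mistaken for a column break.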

Coordinate systems

Positions of observed occurrences may be specified in one of five different formats, using one of three different coordinate systems:

Latitude and longitude

(1) lat and lon or
(2) lat and lam and las and lon and lom and los

MGRS coordinates (Military Grid Reference System)

(3) mgrs or
(4) zone and band and id and east and north

UTM coordinates (Universal Transverse Mercator)

(5) zone and east and north

where the variable names have the following meaning and formatting:

`lat`	Latitude (degrees) – specified as an integer or real number between –90 (= 90°S) and +90 (= 90°N).
`lam`	Latitude (arcminutes) – specified as an integer or real number between 0 and 60.
`las`	Latitude (arcseconds) – specified as an integer or real number between 0 and 60.
`lon`	Longitude (degrees) – specified as an integer or real number between –180 (= 180°W) and +180 (= 180°E).
`lom`	Longitude (arcminutes) – specified as an integer or real number between 0 and 60.
`los`	Longitude (arcseconds) – specified as an integer or real number between 0 and 60.
`mgrs`	MGRS coordinates – specified as a character string.
`zone`	UTM zone – specified as an integer between 1 and 60.
`band`	MGRS latitude band – specified as a single character between "C" and "X".
`id`	MGRS square identifier – specified as two characters between "AA" and "ZV".
`east`	Easting – specified as a number, although it can be formatted as a character string. The meaning and range of allowed values differ between UTM and MGRS.
`north`	Northing – specified as a number, although it can be formatted as a character string. The meaning and range of allowed values differ between UTM and MGRS.
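To check which of the five formats a header row corresponds to, the column names can be compared with the required sets of variable names. The sketch below is a hypothetical helper, not part of the program. It assumes that names must match exactly (case-sensitive) and checks latitude/longitude first, then MGRS, then UTM.

```python
# Required variable names for formats (1) to (5); the more specific set of
# each coordinate system is tested before the less specific one.
FORMATS = [
    (2, {"lat", "lam", "las", "lon", "lom", "los"}),
    (1, {"lat", "lon"}),
    (3, {"mgrs"}),
    (4, {"zone", "band", "id", "east", "north"}),
    (5, {"zone", "east", "north"}),
]


def coordinate_format(columns):
    """Return the number of the first format whose variables are all present, else None."""
    present = {name.strip() for name in columns}
    for number, required in FORMATS:
        if required <= present:  # all required names occur among the columns
            return number
    return None


example = coordinate_format(["t", "zone", "band", "id", "east", "north"])
```

The helper looks at column names only. It cannot tell which system an individual row uses when several systems are present and some values are missing.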

Example

The coordinates of Tromsø (69°39'5.0"N 18°57'19.0"E) can thus be specified in the following ways:

(1) {lat=69.65139; lon=18.95528}
(2) {lat=69; lam=39; las=5; lon=18; lom=57; los=19}
(3) {mgrs="34WDC2058828390"}
(4) {zone=34; band="W"; id="DC"; east="20588"; north="28390"}
(5) {zone=34; east=420588; north=7728390}
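The relation between formats (1) and (2) is plain arithmetic: decimal degrees = degrees + arcminutes/60 + arcseconds/3600. A minimal Python sketch follows. The function name is hypothetical, and the handling of negative degrees (minutes and seconds taking the sign of the degrees) is an assumption rather than documented behaviour of the program.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0):
    """Convert degrees, arcminutes and arcseconds to decimal degrees."""
    sign = -1.0 if degrees < 0 else 1.0  # assumption: the sign is carried by the degrees
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)


# Tromsø, using the values of format (2) above
lat = round(dms_to_decimal(69, 39, 5), 5)
lon = round(dms_to_decimal(18, 57, 19), 5)
```

Rounded to five decimals, this reproduces the values given under format (1).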

Please note that the MGRS system and the UTM system are often confused (the former is based on the latter). However, the two systems require different formatting. While Tromsø's UTM coordinates are 34 420588 7728390, Tromsø's MGRS coordinates are 34WDC2058828390.
If the data do not follow the standards for UTM or MGRS (as appropriate), the program may misinterpret them.
The northing of UTM coordinates uses signs in order to distinguish between the Northern (positive sign) and the Southern Hemisphere (negative sign). Positive signs may be omitted.
Leading zeros may create trouble for northings and eastings in the MGRS system. To make sure that leading zeros do not "disappear", please save east and north as character strings rather than numbers.
The variable names have to follow the conventions detailed above.
The precision of the positions does not matter (well – it may matter for the results, of course, but not for the interpretation of the coordinates).
Different observations in one dataset may use different coordinate systems.
If more than one coordinate system is used, UTM coordinates are ignored wherever MGRS coordinates are supplied; and MGRS coordinates are ignored wherever latitude and longitude are supplied.
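The problem with leading zeros can be illustrated as follows: once an MGRS easting such as "00588" has been stored as a number, it becomes 588 and the digits no longer line up. The sketch below pads easting and northing back to a fixed width before writing them as character strings. The function name and the default of five digits (1 m precision) are assumptions made for this illustration.

```python
def mgrs_string(zone, band, square_id, east, north, digits=5):
    """Compose an MGRS string, restoring leading zeros in easting and northing."""
    east_txt = str(east).zfill(digits)    # e.g. 588 -> "00588"
    north_txt = str(north).zfill(digits)
    return f"{zone}{band}{square_id}{east_txt}{north_txt}"


tromso = mgrs_string(34, "W", "DC", 20588, 28390)
padded = mgrs_string(34, "W", "DC", 588, 28390)  # easting that lost its zeros
```

Padding only works if the intended number of digits is known. Keeping east and north as character strings from the start avoids the problem altogether.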

Parameters

Before the program is run, the parameters need to be checked and, if necessary, changed. The following parameters are available:

Dark figure

The dark figure is an extremely important parameter, since it has a direct effect on the estimate of expansion speed. A dark figure is defined as the factor by which the known AOO has to be multiplied in order to obtain the estimated total AOO (i.e., total = known × dark figure).
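The definition can be written out as a short sketch. Under the circle idealisation used for expansion speed, the radius scales with the square root of the area, so a dark figure of 4 doubles the idealised radius. The function names are hypothetical, and the sketch shows only this geometric relation, not how the program applies the dark figure during estimation.

```python
import math


def estimated_total_aoo(known_aoo_km2, dark_figure):
    """Total AOO = known AOO x dark figure."""
    return known_aoo_km2 * dark_figure


def radius_factor(dark_figure):
    """Factor by which the idealised radius grows when the AOO is multiplied by the dark figure."""
    return math.sqrt(dark_figure)


total = estimated_total_aoo(100, 2.5)  # invented example: 100 km^2 known, dark figure 2.5
```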

A qualified judgement of the dark figure will result in more realistic estimates. The dark figure is provided as one, or a range of, numerical value(s), which is assumed to apply to the last year of the dataset.

If an interval is provided for the dark figure, the application searches for the optimal dark figure between these limits. If one value is provided, the application uses the specified value only.

If one wishes the dark figure to be estimated rather than specified, the interval 1–101 should be chosen. Note, however, that this will produce rather uncertain estimates.

Model

Four different models, or sets of assumptions, are implemented so far:

(1) Both expansion speed and detectability are constant through time (this is the default);
(2) Detectability changes once, but is constant before and after the break point, which is also inferred from the data;
(3) Both of the preceding models are fitted, and the better one is chosen (using AICc-based model selection);
(4) Detectability is proportional to a measure of the yearly sampling effort, which has to be provided.

The last alternative requires a dataset to be uploaded which can be interpreted as a time series containing annual values of sampling effort, starting in the year 1800. The file format needs to be plain text, where values are delimited by either semicolons, commas, spaces or line breaks. Such a file can only be uploaded if the fourth option is chosen.
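A sampling-effort file of this kind could be prepared as sketched below: one value per year, starting in 1800, written one per line. The function names are hypothetical. Filling years without data with 0 is an assumption made here; how the program treats zero effort is not stated in this documentation, so please choose fill values deliberately.

```python
import re

FIRST_YEAR = 1800  # the series has to start in the year 1800


def effort_to_text(effort_by_year, last_year, fill=0.0):
    """Write one value per line for every year from 1800 to last_year."""
    years = range(FIRST_YEAR, last_year + 1)
    # assumption: years without data receive the fill value
    return "\n".join(str(effort_by_year.get(year, fill)) for year in years) + "\n"


def text_to_effort(text):
    """Read values delimited by semicolons, commas, spaces or line breaks."""
    tokens = [tok for tok in re.split(r"[;,\s]+", text) if tok]
    return {FIRST_YEAR + i: float(tok) for i, tok in enumerate(tokens)}


# Invented effort values, for illustration only
text = effort_to_text({2000: 5, 2001: 7.5}, last_year=2001)
parsed = text_to_effort(text)
```

Since commas count as delimiters in this file, decimal values should be written with periods.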

If the dataset is too short (in terms of years), these options will not be available. In this case, estimations are based on model 1.

Other settings

If needed, some additional settings can be adjusted. To do so, please tick the "change default settings" box. The following settings are available for adjustment (although not all of them are available in all cases):

According to the default, each occurrence is expected to be reported only once in the datafile, viz. in the year of its first observation. The default applies to situations where occurrences can be assumed to persist after they have been colonised. If the latter assumption is erroneous or unrealistic, the alternative should be chosen. In that case, localities have to be reported in the datafile once for each year of their existence. In years where a locality is not reported, it is thus assumed to be absent. This alternative makes it possible to model species which have short-lived subpopulations or which are subject to control/eradication measures. (Please note that the program will only produce descriptive statistics if occurrences are very short-lived.)
The confidence level γ may be provided as a number between 0 and 0.999. The default is γ = 0.5, which estimates quartiles. γ = 0.95 would give 95% confidence intervals, etc.
The choice "fast or exhaustive estimation" determines whether the program uses heuristics to accelerate estimations. It is only relevant if dark figures are provided as an interval or if a break point in detectability is estimated (model 2/3). If the break point seems to miss the mark entirely, you may try an exhaustive estimation (rather than a fast one, which is the default).
ΔAICc may be specified if the option "test both" is chosen for detectability. ΔAICc is here understood as the AICc of the model with one detection rate minus the AICc for the model with two detection rates. The default is to prefer the simpler model if it has a lower AICc than the other one (i.e., ΔAICc = 0).
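Two of the settings above can be made concrete with a short sketch. The first function reads the confidence level γ as a central interval, so that γ = 0.5 yields the quartiles. The second applies the ΔAICc rule as defined above. Both function names are hypothetical, and the handling of exact ties (the model with two detection rates is chosen) is an assumption, not documented behaviour.

```python
def interval_quantiles(gamma):
    """Lower and upper quantile of a central interval with confidence level gamma."""
    if not 0 <= gamma <= 0.999:
        raise ValueError("gamma must lie between 0 and 0.999")
    return (1 - gamma) / 2, (1 + gamma) / 2


def prefer_simpler_model(aicc_one_rate, aicc_two_rates, delta_aicc=0.0):
    """True if the model with one detection rate is preferred.

    The difference is AICc(one rate) minus AICc(two rates), as defined above;
    the simpler model is kept while this difference stays below delta_aicc.
    """
    return (aicc_one_rate - aicc_two_rates) < delta_aicc


quartiles = interval_quantiles(0.5)
```

With the default ΔAICc = 0, the simpler model is kept exactly when its AICc is lower. A positive ΔAICc would require the model with two detection rates to be better by at least that margin before it is chosen.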

Summary of the input

After a successful conversion, the uploaded dataset is summarised in terms of the number of observations, the number of occurrences, the number of years and the geographical extremes of the occurrences. It is recommended to check that these values are as expected. If the conversion produced any errors or warnings, these are displayed in the same panel.

After the observations in the dataset have been converted to occurrences, it is possible to download the extended data file. It is recommended to download this file, especially if the conversion took a long time. If you upload the extended data file on a later occasion, conversion will not be necessary.

Map

The map shows the placement of all observations in the dataset (as points) as well as the species's extent of occurrence (EOO, as a polygon). The map is drawn using a cylindrical projection. This means that all meridians are vertical and all parallels are horizontal, making it easy to check coordinates of observations. On the other hand, the map is thus neither equal-area, equidistant nor conformal. The centre region of the map is approximately equal-area, though. When the edges of the EOO seem curved, this is not a bug but due to the fact that a straight line appears curved in a cylindrical projection.

Assumptions

The fourth panel summarises the statistical assumptions on which the model is based. These assumptions will never be strictly met by any real dataset, but minor deviations are usually acceptable. Large and systematic deviations, on the other hand, may lead to biased estimates.

Graph

The graph illustrates the change in the number of known occurrences (black points). The graph has year on its x-axis, the (idealised) radius of AOO on the left y-axis (linear scale) and the AOO itself on the right y-axis (square-root scale). The model that has been fitted to the known occurrences is shown as a solid blue line. The estimated expansion, which includes unknown occurrences, is shown as a broken red line. Confidence intervals are shown as dotted lines. The break point (i.e. the year in which detectability changes, if estimated) is shown as a vertical dotted pink line.

Output

The output consists of estimates (median plus lower and upper confidence limits) for:

expansion speed in metres per year (m/a),
known area of occupancy (AOO) in km²,
estimated AOO in km² (known AOO times dark figure),
dark figure,
extent of occurrence (EOO) in km² (not corrected for coastlines or borders),
first year of the expansion,
detectability rate(s),
variance explained (R²),
Akaike's Information Criterion corrected for small sample size (AICc).

About the program

The program expansion has been written by Hanno Sandvik at the Norwegian Institute for Nature Research (NINA). In case of questions or comments, please get in touch.

The present version number of the program is 3.2 (as of January 2022).

Licence

Expansion is licensed under Attribution-ShareAlike 4.0 International.

Citation

The program may be cited as:

Sandvik, H. (2022) Expansion, version 3.2. https://view.nina.no/expansion/

It would be nice if you cite the documentation, too:

Sandvik, H. (2020) Expansion speed as a generic measure of spread for alien species. Acta Biotheoretica, 68, 227–252. https://doi.org/10.1007/s10441-019-09366-8

Acknowledgements

Without the detailed feedback by Hanne Hegre, the program would never have reached its current functionality.

List of versions:

Version 3.2 (January 2022)

more robust estimates of expansion speed if occurrences are very short-lived
faster estimation when dark figures are provided as intervals
improved graphics and debugged map

Version 3.1 (December 2021)

implementation of the demo version
display of maps also outside of northern Europe
correction for curved edges of the EOO due to cylindrical projection

Version 3.0 (November 2021)

Starting with version 3.0, the program is a web application. It is used to assess criterion B of GEIAA and constitutes a part of the impact assessments underlying the Alien Species List 2023 of Norway.

Version 2.6 (September 2017)

possibility to specify dark figures as intervals even when p=2 or p=3
modification in the optimisation avoiding too early first years of expansion
graphical illustration of confidence intervals

Version 2.5 (August 2017)

estimation of dark figures that constrains their estimates to the interval chosen
calculation of parameter estimates using AICc-based model averaging
implementation of p=3 (AICc-based model selection between p=1 and p=2)

Version 2.4 (April 2017)

implementation of new.obs=FALSE
possibility to switch off the fast estimation under p=2
removal of a bug in the calculation of EOO

Version 2.3 (March 2017)

possibility to specify sampling effort as a covariate of observability rate
improved estimation of confidence intervals when dark figures are specified
implementation of a faster estimation under p=2

Version 2.2 (February 2017)

calculation and output of EOO
possibility to return the dataset underlying the expansion graph
calibration of the convergence tolerance levels

Version 2.1 (January 2017)

estimation of confidence intervals
more intuitive defaults
removal of some bugs that created unnecessary error messages

Version 2.0 (December 2016)

Starting with version 2.0, the current definition of expansion speed was implemented, which was used to assess criterion B of GEIAA and constituted a part of the impact assessments underlying the Alien Species List 2018 of Norway.

Version 1.4 (July 2012)
Version 1.3 (January 2012)
Version 1.2 (November 2011)
Version 1.1 (September 2011)
Version 1.0 (August 2011)

Versions 1.x of the program were based on Sæther et al. (2010, pp. 59–61) and were used for Alien species in Norway – with the Norwegian Black List 2012.
