Program expansion:
explanations and documentation

The program expansion estimates the expansion speed of a population based on a spatio-temporal dataset of occurrences.

 

 

Introduction

The program expansion can be run as an online application at the URL https://view.nina.no/expansion/. To switch the application to English, choose the language in the upper right corner (the default is Norwegian).

A spatio-temporal dataset of observations of a species over at least ten years is required. The application is run by uploading the datafile, parameterising the model and waiting for the output. The screen is divided into six panels, each described in its own section below: parameters, summary of the input, map, assumptions, graph and output.

 

Alternatively, the program expansion may be run as an R script. The script may be loaded from the URL http://www.evol.no/hanno/21/expans.rtx, for instance using the R command load(url("http://www.evol.no/hanno/21/expans.rtx")). Once it is loaded, you can run the script by calling expansion(...), where "..." stands for the desired parameters. The parameters are unchanged from version 2.6 (and are explained elsewhere).

 

Definitions and documentation

Expansion is here understood as the number of new occurrences per unit of time (where "occurrences" are colonised 2 km × 2 km grid cells). Thus, expansion encompasses any spread or movement of the species concerned (regardless of means, causes and pathways, i.e. including active and passive, natural and anthropogenic, intentional and unintentional movements).

Mathematically, expansion speed is described as the annual increase in the radius of the area of occupancy (AOO) of the species (where the radius is calculated as if the AOO were a coherent circle containing all occurrences and only occurrences). The model underlying the program has been described in detail by Sandvik (2020).
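
The circle definition can be made concrete with a short calculation. The following Python sketch is not the program's own code (which is written in R); it only illustrates the geometry, assuming that the AOO equals the number of occupied 2 km × 2 km cells times 4 km². The function names and the example counts are hypothetical.

```python
import math

CELL_AREA_KM2 = 2.0 * 2.0  # one occurrence = one colonised 2 km x 2 km grid cell


def aoo_km2(n_occurrences):
    """Area of occupancy implied by a number of occupied grid cells."""
    return n_occurrences * CELL_AREA_KM2


def idealised_radius_km(area_km2):
    """Radius of a circle with the given area: r = sqrt(A / pi)."""
    return math.sqrt(area_km2 / math.pi)


# Hypothetical example: 25 occupied cells in one year, 36 in the next.
r_first = idealised_radius_km(aoo_km2(25))
r_second = idealised_radius_km(aoo_km2(36))

# Annual increase in radius, converted from kilometres to metres per year.
speed_m_per_year = (r_second - r_first) * 1000.0
print(f"radius: {r_first:.3f} km -> {r_second:.3f} km")
print(f"increase: {speed_m_per_year:.0f} m/a")
```

Differencing two years like this only shows what "annual increase in radius" means; the program itself fits a model to the whole time series.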

In risk assessments following the GEIAA protocol (Generic Ecological Impact Assessment of Alien Species), estimates of expansion speed are needed in order to obtain a score for criterion B on the invasion axis.

 

Formatting of data files

For the datafile to be read correctly, it should be a comma-delimited plain-text file with a header. More specifically, please format the datafile according to the following conventions:

  • The data have to be organised column-wise, i.e. the file has to consist of one column per variable (year and for instance latitude and longitude) and one row per observation.
  • The first row has to contain the variable names.
  • One column needs to contain years and have the name "t". Names of the remaining variables depend on the coordinate system used and are explained below.
  • At least ten years of observations are required.
  • Occurrences can either be reported in the first year they are observed (given that they can be assumed to remain in place) or in each year they are assumed to exist. (In the latter case, the default settings need to be changed.)
  • All rows have to have the same number of separators.
  • Semicolons (;) or commas (,) are accepted as separators between columns (i.e., between the elements of a row) – but please don't mix them. Such files can be produced by all spreadsheet applications. (Choose "save as comma delimited file" or something similar. Usual filename extensions of such formats are ".CSV" or ".SDV".)
  • The symbol used as separator must not occur in other contexts. Nor may apostrophes (') be used anywhere in the data file. Please make sure to remove or replace these symbols.
  • Periods (.) are accepted as decimal marks. Commas (,) may also be used as decimal marks, but only if semicolons (;) are used as separators.
  • Spaces between (outside) elements, and quotation marks enclosing elements (on both sides), are tolerated. (Thus, ;12; is equivalent to ;12 ; and to ;"12";.)
  • Missing values are tolerated if specified by omission (;;) or spaces (; ;).
  • The order of observations does not matter.
  • The order of columns does not matter.
  • Additional columns are ignored. (Nonetheless, it might be an advantage to delete superfluous columns, because columns containing commas or apostrophes may interrupt the conversion.)
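
Some of these conventions can be checked before uploading a file. The Python sketch below is a hypothetical pre-upload check, not part of the program: the function name, the example data and the error messages are all illustrative. It also assumes that "at least ten years" means ten distinct values in the column "t", which is one possible reading of the requirement.

```python
import csv
import io

# Hypothetical file content: semicolon-separated, header row, years in column "t".
EXAMPLE = """t;lat;lon
2001;69.65;18.96
2002;69.70;19.01
2003;69.71;
"""


def check_datafile(text, separator=";"):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ["file is empty"]
    if "'" in text:
        problems.append("apostrophes are not allowed")
    # All rows must contain the same number of separators as the header.
    n_sep = lines[0].count(separator)
    for number, line in enumerate(lines, start=1):
        if line.count(separator) != n_sep:
            problems.append(f"row {number} has a different number of separators")
    rows = list(csv.reader(io.StringIO("\n".join(lines)), delimiter=separator))
    header = [name.strip() for name in rows[0]]
    if "t" not in header:
        problems.append('no column named "t"')
        return problems
    t_index = header.index("t")
    # Collect the distinct, non-missing years (assumption: ten distinct years needed).
    years = set()
    for row in rows[1:]:
        if t_index < len(row) and row[t_index].strip():
            years.add(row[t_index].strip())
    if len(years) < 10:
        problems.append(f"only {len(years)} distinct years (at least ten required)")
    return problems


print(check_datafile(EXAMPLE))
```

The example file is well-formed apart from its length, so the only problem reported is the insufficient number of years. The missing longitude in the last row is tolerated because it is specified by omission.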

 

Coordinate systems

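Positions of observed occurrences may be specified in one of five different formats, using one of three different coordinate systems:

  • (1)   latitude and longitude in decimal degrees: lat, lon
  • (2)   latitude and longitude in degrees, arcminutes and arcseconds: lat, lam, las, lon, lom, los
  • (3)   MGRS as a single character string: mgrs
  • (4)   MGRS as separate elements: zone, band, id, east, north
  • (5)   UTM: zone, east, north

where the variable names have the following meaning and formatting: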

lat Latitude (degrees) – specified as an integer or real number between –90 (= 90°S) and +90 (= 90°N).
lam Latitude (arcminutes) – specified as an integer or real number between 0 and 60.
las Latitude (arcseconds) – specified as an integer or real number between 0 and 60.
lon Longitude (degrees) – specified as an integer or real number between –180 (= 180°W) and +180 (= 180°E).
lom Longitude (arcminutes) – specified as an integer or real number between 0 and 60.
los Longitude (arcseconds) – specified as an integer or real number between 0 and 60.
mgrs MGRS coordinates – specified as a character string.
zone UTM zone – specified as an integer between 1 and 60.
band MGRS latitude band – specified as a single character between "C" and "X".
id MGRS square identifier – specified as two characters between "AA" and "ZV".
east Easting – specified as a number, although it can be formatted as a character string. The meaning and range of allowed values differ between UTM and MGRS.
north Northing – specified as a number, although it can be formatted as a character string. The meaning and range of allowed values differ between UTM and MGRS.

 

Example

The coordinates of Tromsø (69°39'5.0"N 18°57'19.0"E) can thus be specified in the following ways:

  • (1)   {lat=69.65139; lon=18.95528}
  • (2)   {lat=69; lam=39; las=5; lon=18; lom=57; los=19}
  • (3)   {mgrs="34WDC2058828390"}
  • (4)   {zone=34; band="W"; id="DC"; east="20588"; north="28390"}
  • (5)   {zone=34; east=420588; north=7728390}
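
The relationship between some of these formats can be verified with a few lines of arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the sign handling in dms_to_decimal is an assumption of this sketch rather than documented behaviour of the program, and mgrs_digits covers only the numerical part of an MGRS position for non-negative UTM values (it does not derive the zone, band or square identifier).

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0):
    """Convert degrees, arcminutes and arcseconds to decimal degrees.

    Assumption of this sketch: the sign of `degrees` applies to the whole value.
    """
    sign = -1.0 if degrees < 0 else 1.0
    return sign * (abs(degrees) + minutes / 60.0 + seconds / 3600.0)


def mgrs_digits(utm_value):
    """Position within the 100 km square, as a zero-padded 5-character string.

    Returning a string keeps leading zeros, which a number would lose.
    """
    return f"{int(utm_value) % 100000:05d}"


# Format (2) -> format (1) for Tromsø.
lat = dms_to_decimal(69, 39, 5.0)
lon = dms_to_decimal(18, 57, 19.0)
print(round(lat, 5), round(lon, 5))  # 69.65139 18.95528

# Format (5) -> the east and north elements of format (4).
print(mgrs_digits(420588), mgrs_digits(7728390))  # 20588 28390
```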

 

NB

  • Please note that the MGRS system and the UTM system are often confused (the former is based on the latter). However, the two require different formatting. While Tromsø's UTM coordinates are 34 420588 7728390, Tromsø's MGRS coordinates are 34WDC2058828390.
  • If the data do not follow the standards for UTM or MGRS (as appropriate), the program may misinterpret them.
  • The northing of UTM coordinates uses signs in order to distinguish between the Northern (positive sign) and the Southern Hemisphere (negative sign). Positive signs may be omitted.
  • Leading zeros may create trouble for northings and eastings in the MGRS system. To make sure that leading zeros do not "disappear", please save east and north as character strings rather than numbers.
  • The variable names have to follow the conventions detailed above.
  • The precision of the positions does not matter (well – it may matter for the results, of course, but not for the interpretation of the coordinates).
  • Different observations in one dataset may use different coordinate systems.
  • If more than one coordinate system is used, UTM coordinates are ignored wherever MGRS coordinates are supplied; and MGRS coordinates are ignored wherever latitude and longitude are supplied.
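
The precedence rule in the last point can be sketched as follows. This Python function is a hypothetical illustration of the rule as stated, not the program's own logic; in particular, the way missing values are detected and the treatment of incomplete coordinate sets are assumptions.

```python
def present(value):
    """A value counts as supplied unless it is None, empty or only spaces."""
    return value is not None and str(value).strip() != ""


def coordinate_system(row):
    """Name the coordinate system that takes precedence for one observation.

    `row` maps variable names (lat, lon, mgrs, zone, ...) to values.
    """
    def has(*names):
        return all(present(row.get(name)) for name in names)

    # Latitude and longitude override everything else.
    if has("lat", "lon"):
        return "latitude/longitude"
    # MGRS (as one string or as separate elements) overrides UTM.
    if has("mgrs") or has("zone", "band", "id", "east", "north"):
        return "MGRS"
    if has("zone", "east", "north"):
        return "UTM"
    return None


rows = [
    {"lat": 69.65139, "lon": 18.95528, "mgrs": "34WDC2058828390"},
    {"mgrs": "34WDC2058828390", "zone": 34, "east": 420588, "north": 7728390},
    {"zone": 34, "east": 420588, "north": 7728390, "lat": " "},
]
for row in rows:
    print(coordinate_system(row))
```

The three example rows resolve to latitude/longitude, MGRS and UTM, respectively.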

 

Parameters

Before the program is run, the parameters need to be checked and, if necessary, changed. The following parameters are available:

 

Dark figure

The dark figure is an extremely important parameter, since it has a direct effect on the estimate of expansion speed. A dark figure is defined as the factor by which the known AOO has to be multiplied in order to obtain the estimated total AOO (i.e., total = known × dark figure).
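
The effect of the dark figure can be illustrated with a small calculation. The Python sketch below uses hypothetical function names and a hypothetical known AOO of 100 km²; it combines the definition above with the idealised circular AOO and is not the program's own estimation procedure.

```python
import math


def estimated_total_aoo(known_aoo_km2, dark_figure):
    """Estimated total AOO = known AOO x dark figure."""
    return known_aoo_km2 * dark_figure


def idealised_radius_km(area_km2):
    """Radius of a circle with the given area."""
    return math.sqrt(area_km2 / math.pi)


known = 100.0  # km2, hypothetical known AOO in the last year of the dataset
for dark_figure in (1, 4, 25):
    total = estimated_total_aoo(known, dark_figure)
    radius = idealised_radius_km(total)
    print(f"dark figure {dark_figure:>2}: total AOO {total:6.0f} km2, radius {radius:5.2f} km")
```

Because the radius grows with the square root of the area, a dark figure of 4 doubles the idealised radius in the last year, and a dark figure of 25 increases it fivefold.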

A qualified judgement of the dark figure will result in more realistic estimates. The dark figure is provided either as a single numerical value or as an interval, and is assumed to apply to the last year of the dataset.

If an interval is provided for the dark figure, the application searches for the optimal dark figure between these limits. If one value is provided, the application uses the specified value only.

If one wishes the dark figure to be estimated rather than specified, the interval 1–101 should be chosen. Note, however, that this will produce rather uncertain estimates.

 

Model

Four different models, or sets of assumptions, are implemented so far:

  1. Both expansion speed and detectability are constant through time (this is the default);
  2. Detectability changes once, but is constant before and after the break point, which is also inferred from the data;
  3. Both of the former options are estimated, and the better one is chosen (using AICc-based model selection);
  4. Detectability is proportional to a measure of the yearly sampling effort, which has to be provided.

The last alternative (model 4) requires a dataset to be uploaded which can be interpreted as a time series containing annual values of sampling effort, starting in the year 1800. The file format needs to be plain text, where values are delimited by either semicolons, commas, spaces or line breaks. Such a file can only be uploaded if the fourth option is chosen.
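
How such a file maps onto years can be sketched in a few lines of Python. The function below is a hypothetical reader, not the program's own parser; it assumes that periods are used as decimal marks (since commas act as delimiters here) and that the first value belongs to the year 1800.

```python
import re


def parse_sampling_effort(text, first_year=1800):
    """Map consecutive years, starting in `first_year`, to sampling-effort values.

    Values may be delimited by semicolons, commas, spaces or line breaks.
    """
    tokens = [tok for tok in re.split(r"[;,\s]+", text.strip()) if tok]
    return {first_year + i: float(tok) for i, tok in enumerate(tokens)}


# Hypothetical file content mixing several delimiters.
effort = parse_sampling_effort("0; 0,1.5\n2 4")
print(effort)
```

In this example the five values are assigned to the years 1800 to 1804.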

If the dataset is too short (in terms of years), these options will not be available. In this case, estimations are based on model 1.

 

Other settings

If needed, some additional settings can be adjusted. To do so, please tick the "change default settings" box. The following settings are available for adjustment (although not all of them are available in all cases):

  • By default, each occurrence is expected to be reported only once in the datafile, viz. in the year of its first observation. This default applies to situations where occurrences can be assumed to persist once a grid cell has been colonised. If that assumption is erroneous or unrealistic, the alternative should be chosen. In that case, localities have to be reported in the datafile once for each year of their existence, and a locality is assumed to be absent in any year in which it is not reported. This alternative makes it possible to model species which have short-lived subpopulations or which are subject to control/eradication measures. (Please note that the program will only produce descriptive statistics if occurrences are very short-lived.)
  • The confidence level γ may be provided as a number between 0 and 0.999. The default is γ = 0.5, which estimates quartiles. γ = 0.95 would give 95% confidence intervals, etc.
  • The choice "fast or exhaustive estimation" determines whether the program uses heuristics to accelerate estimations. It is only relevant if dark figures are provided as an interval or if a break point in detectability is estimated (model 2/3). If the break point seems to miss the mark entirely, you may try an exhaustive estimation (rather than a fast one, which is the default).
  • ΔAICc may be specified if the option "test both" is chosen for detectability. ΔAICc is here understood as the AICc of the model with one detection rate minus the AICc for the model with two detection rates. The default is to prefer the simpler model if it has a lower AICc than the other one (i.e., ΔAICc = 0).
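
Two of these settings can be illustrated numerically. The Python sketch below uses hypothetical function names and is not taken from the program. It assumes that the confidence interval is central, so that γ leaves equal probability in both tails, and it generalises the default rule for ΔAICc to other thresholds in the most direct way; how the program treats ties or non-zero thresholds internally is not specified here.

```python
def central_quantiles(gamma):
    """Lower and upper quantiles of a central interval with confidence level gamma."""
    if not 0 <= gamma <= 0.999:
        raise ValueError("gamma must lie between 0 and 0.999")
    return (1 - gamma) / 2, (1 + gamma) / 2


def prefer_simpler_model(aicc_one_rate, aicc_two_rates, delta_aicc=0.0):
    """True if the model with one detection rate is preferred.

    The difference AICc(one rate) - AICc(two rates) is compared with the
    chosen threshold (assumption: strict comparison, as in the default rule).
    """
    return (aicc_one_rate - aicc_two_rates) < delta_aicc


print(central_quantiles(0.5))    # (0.25, 0.75), i.e. the quartiles
print(central_quantiles(0.95))   # approximately (0.025, 0.975)

# Hypothetical AICc values: the two-rate model is better by 1.2 units.
print(prefer_simpler_model(101.2, 100.0))                  # False with the default threshold
print(prefer_simpler_model(101.2, 100.0, delta_aicc=2.0))  # True with a threshold of 2
```

A positive threshold thus makes the choice more conservative: the model with two detection rates is selected only if it improves the AICc by at least that amount.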

 

Summary of the input

After a successful conversion, the uploaded dataset is summarised in terms of the number of observations, the number of occurrences, the number of years and the geographical extremes of the occurrences. It is recommended to check that these values are as expected. If the conversion produced any errors or warnings, these are displayed in the same panel.

After the observations in the dataset have been converted to occurrences, it is possible to download the extended data file. It is recommended to download this file, especially if the conversion took a long time. If you upload the extended data file on later occasions, conversion will not be necessary.

 

Map

The map shows the placement of all observations in the dataset (as points) as well as the species' extent of occurrence (EOO, as a polygon). The map is drawn using a cylindrical projection. This means that all meridians are vertical and all parallels are horizontal, making it easy to check coordinates of observations. On the other hand, the map is neither equivalent (equal-area), equidistant nor conformal, although its centre region is approximately equal-area. When the edges of the EOO seem curved, this is not a bug: the shortest path between two points on the globe generally appears curved in a cylindrical projection.

 

Assumptions

The fourth panel summarises the statistical assumptions on which the model is based. These assumptions will never be strictly met by any real dataset, but minor deviations are usually acceptable. Large and systematic deviations, on the other hand, may lead to biased estimates.

 

Graph

The graph illustrates the change in the number of known occurrences (black points). The graph has year on its x-axis, the (idealised) radius of AOO on the left y-axis (linear scale) and the AOO itself on the right y-axis (square-root scale). The model that has been fitted to the known occurrences is shown as a solid blue line. The estimated expansion, which includes unknown occurrences, is shown as a broken red line. Confidence intervals are shown as dotted lines. The break point (i.e. the year in which detectability changes, if estimated) is shown as a vertical dotted pink line.

 

Output

The output consists of estimates (median plus lower and upper confidence limits) for:

  • expansion speed in metres per year (m/a),
  • known area of occupancy (AOO) in km²,
  • estimated AOO in km² (known AOO times dark figure),
  • dark figure,
  • extent of occurrence (EOO) in km² (not corrected for coastlines or borders),
  • first year of the expansion,
  • detectability rate(s),
  • variance explained (R²),
  • Akaike's Information Criterion, corrected for small sample sizes (AICc).

 

About the program

The program expansion has been written by Hanno Sandvik at the Norwegian Institute for Nature Research (NINA). In case of questions or comments, please get in touch.

The present version number of the program is 3.2 (as of January 2022).

 

Licence

Expansion is licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

 

Citation

The program may be cited as:

It would be nice if you cite the documentation, too:

 

Acknowledgements

Without the detailed feedback by Hanne Hegre, the program would never have reached its current functionality.

 

List of versions:

  • Version 3.2 (January 2022)
    • more robust estimates of expansion speed if occurrences are very short-lived
    • faster estimation when dark figures are provided as intervals
    • improved graphics and debugged map
  • Version 3.1 (December 2021)
    • implementation of the demo version
    • display of maps also outside of northern Europe
    • correction for curved edges of the EOO due to cylindrical projection
  • Version 3.0 (November 2021)
  • Version 2.6 (September 2017)
    • possibility to specify dark figures as intervals even when p=2 or p=3
    • modification in the optimisation avoiding too early first years of expansion
    • graphical illustration of confidence intervals
  • Version 2.5 (August 2017)
    • estimation of dark figures that constrains their estimates to the interval chosen
    • calculation of parameter estimates using AICc-based model averaging
    • implementation of p=3 (AICc-based model selection between p=1 and p=2)
  • Version 2.4 (April 2017)
    • implementation of new.obs=FALSE
    • possibility to switch off the fast estimation under p=2
    • removal of a bug in the calculation of EOO
  • Version 2.3 (March 2017)
    • possibility to specify sampling effort as a covariate of observability rate
    • improved estimation of confidence intervals when dark figures are specified
    • implementation of a faster estimation under p=2
  • Version 2.2 (February 2017)
    • calculation and output of EOO
    • possibility to return the dataset underlying the expansion graph
    • calibration of the convergence tolerance levels
  • Version 2.1 (January 2017)
    • estimation of confidence intervals
    • more intuitive defaults
    • removal of some bugs that created unnecessary error messages
  • Version 2.0 (December 2016)
  • Version 1.4 (July 2012)
  • Version 1.3 (January 2012)
  • Version 1.2 (November 2011)
  • Version 1.1 (September 2011)
  • Version 1.0 (August 2011)