Program expansion: explanations and documentation
The program expansion estimates the expansion speed of a population
based on a spatio-temporal dataset of occurrences.
The program expansion can be run as an online application at the URL
https://view.nina.no/expansion/.
The default language is Norwegian. To switch the application to English,
choose the language in the upper right corner.
A spatio-temporal dataset of observations of a species over at least ten years is required.
The application is run by uploading the datafile, parameterising the model and waiting for the output.
The screen is divided into six panels.
Alternatively, the program expansion may be run as an
R script. The script may be loaded from the URL
http://www.evol.no/hanno/21/expans.rtx,
for instance using the R command
load(url("http://www.evol.no/hanno/21/expans.rtx")).
After it is loaded, you can run the script by writing expansion(...),
where "..." represents the desired parameters. The parameters are unchanged from version 2.6
(and are explained elsewhere).
Expansion is here understood as the number of new occurrences
per unit of time (where "occurrences" are colonised 2 km × 2 km
grid cells). Thus, expansion encompasses any spread or movement of the
species concerned (regardless of means, causes and pathways, i.e. including
active and passive, natural and anthropogenic, intentional and unintentional
movements).
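The definition above can be sketched in a few lines of Python (used here for illustration only; the program itself is an R script). The sketch assumes projected coordinates in metres and a grid aligned with the origin of the coordinate system. The function names and the sample observations are hypothetical and not taken from the program.

```python
from collections import defaultdict

CELL_SIZE_M = 2000  # 2 km x 2 km grid cells, as in the definition above


def grid_cell(easting_m, northing_m, cell_size_m=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing a point."""
    # Assumption: the grid is aligned with the origin of the projected system.
    return (int(easting_m // cell_size_m), int(northing_m // cell_size_m))


def new_occurrences_per_year(observations):
    """Count the grid cells that are recorded for the first time in each year.

    `observations` is an iterable of (year, easting_m, northing_m) tuples.
    """
    first_year = {}
    for year, east, north in observations:
        cell = grid_cell(east, north)
        if cell not in first_year or year < first_year[cell]:
            first_year[cell] = year
    counts = defaultdict(int)
    for year in first_year.values():
        counts[year] += 1
    return dict(sorted(counts.items()))


# Made-up observations, for illustration only.
observations = [
    (2001, 420588, 7728390),
    (2001, 420900, 7729100),  # same cell as the first observation
    (2003, 424100, 7728390),  # a new cell
    (2005, 420588, 7728390),  # cell already colonised in 2001
]
print(new_occurrences_per_year(observations))
```

Several observations within the same cell thus count as a single occurrence, and only the first year of each cell contributes to the expansion.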
Mathematically, expansion speed is described as the
annual increase in the radius of the area of occupancy (AOO) of the species
(where the radius is calculated as if the AOO were a coherent circle containing
all occurrences and only occurrences). The model underlying the program has
been described in detail by Sandvik (2020).
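As a rough illustration of the idealised radius, the following Python sketch converts a number of colonised grid cells into an AOO and then into the radius of a circle of equal area. It is a simplification under stated assumptions: the mean annual increase computed here is a plain difference between two years, not the model fitted by the program (see Sandvik 2020 for that). All function names are hypothetical.

```python
import math

CELL_AREA_KM2 = 4.0  # one 2 km x 2 km grid cell


def aoo_km2(n_cells):
    """Area of occupancy (km^2) for a given number of colonised cells."""
    return n_cells * CELL_AREA_KM2


def idealised_radius_m(area_km2):
    """Radius (m) of a circle whose area equals the given AOO (km^2)."""
    return math.sqrt(area_km2 / math.pi) * 1000.0


def mean_annual_radius_increase(n_cells_start, n_cells_end, years):
    """Crude mean increase in radius (m per year) between two points in time."""
    r_start = idealised_radius_m(aoo_km2(n_cells_start))
    r_end = idealised_radius_m(aoo_km2(n_cells_end))
    return (r_end - r_start) / years


# Hypothetical example: 25 cells grow to 100 cells over 10 years.
print(round(mean_annual_radius_increase(25, 100, 10), 1))
```

Because the radius grows with the square root of the area, a fourfold increase in AOO only doubles the radius.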
In risk assessments following the GEIAA protocol
(Generic Ecological Impact Assessment of Alien Species),
estimates of expansion speed are needed in order to obtain a score for
criterion B on the invasion axis.
For the datafile to be read correctly, it should be a comma-delimited plain-text file
with a header. More specifically, please format the datafile according to the following
conventions:
- The data have to be organised column-wise, i.e. the file has to consist of
one column per variable (year and for instance latitude and longitude) and
one row per observation.
- The first row has to contain the variable names.
- One column needs to contain years and have the name "t". Names of the
remaining variables depend on the coordinate system used and are explained
below.
- At least ten years of observations are required.
- Occurrences can either be reported in the first year they are observed (given that they
can be assumed to remain in place) or in each year they are assumed to exist. (In the latter
case, the default settings need to be changed.)
- All rows have to have the same number of separators.
- Semicolons (;) or commas (,) are accepted as separators
between columns (i.e., between the elements of a row) – but please don't mix them.
Such files can be produced by all spreadsheet applications. (Choose "save as comma
delimited file" or something similar. Usual filename extensions of such formats are
".CSV" or ".SDV".)
- The symbol used as separator must not occur in other contexts.
Nor may apostrophes (') be used anywhere in the data file.
Please make sure to remove or replace these symbols.
- Periods (.) are accepted as decimal marks. Commas (,) may also be used
as decimal marks, but only if semicolons (;) are used as separators.
- Spaces between (outside) elements, and quotation marks
enclosing elements (on both sides), are tolerated. (Thus, ;12; is
equivalent to ;12 ; and to ;"12"; .)
- Missing values are tolerated if specified by omission (;;) or spaces (; ;).
- The order of observations does not matter.
- The order of columns does not matter.
- Additional columns are ignored. (Nonetheless, it might be an advantage
to delete superfluous columns, because columns containing commas or apostrophes
may interrupt the conversion.)
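The conventions above can be checked before uploading a file. The following Python sketch is one possible pre-check, not the parser used by the program: it guesses the separator from the header row, strips outer spaces and enclosing quotation marks, treats empty elements as missing values, and verifies that every row has the same number of separators. The function names and the sample text are hypothetical, and skipping blank lines is an assumption.

```python
def clean(field):
    """Strip outer spaces and one pair of enclosing double quotes."""
    field = field.strip()
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        field = field[1:-1].strip()
    return field


def read_datafile_text(text):
    """Return (separator, rows); rows are dicts of strings, None if missing."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty data file")
    # Assumption: a semicolon in the header means semicolon-separated.
    sep = ";" if ";" in lines[0] else ","
    names = [clean(name) for name in lines[0].split(sep)]
    if "t" not in names:
        raise ValueError('a column named "t" (year) is required')
    rows = []
    for number, line in enumerate(lines[1:], start=1):
        fields = line.split(sep)
        if len(fields) != len(names):
            raise ValueError(f"data row {number}: wrong number of separators")
        rows.append({n: (clean(f) or None) for n, f in zip(names, fields)})
    return sep, rows


def to_number(value, sep):
    """Convert an element to float; decimal commas only with semicolons."""
    if value is None:
        return None
    if sep == ";":
        value = value.replace(",", ".")
    return float(value)


# Made-up sample file, for illustration only.
sample = 't;lat;lon;note\n2001;69,65139;18,95528;"a"\n2003; 69.7 ;"19.1";\n'
sep, rows = read_datafile_text(sample)
print(sep, to_number(rows[0]["lat"], sep), rows[1]["note"])
```

Elements are kept as character strings until they are explicitly converted, so that leading zeros in MGRS eastings and northings are not lost.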
Coordinate systems
Positions of observed occurrences may be specified in one of five different
formats, using one of three different coordinate systems (latitude/longitude,
MGRS or UTM). The variable names have the following meaning and formatting:
- lat: Latitude (degrees), specified as an integer or real number between –90
(= 90°S) and +90 (= 90°N).
- lam: Latitude (arcminutes), specified as an integer or real number between 0 and 60.
- las: Latitude (arcseconds), specified as an integer or real number between 0 and 60.
- lon: Longitude (degrees), specified as an integer or real number between –180
(= 180°W) and +180 (= 180°E).
- lom: Longitude (arcminutes), specified as an integer or real number between 0 and 60.
- los: Longitude (arcseconds), specified as an integer or real number between 0 and 60.
- mgrs: MGRS coordinates, specified as a character string.
- zone: UTM zone, specified as an integer between 1 and 60.
- band: MGRS latitude band, specified as a single character between "C" and "X".
- id: MGRS square identifier, specified as two characters between "AA" and "ZV".
- east: Easting, specified as a number, although it can be formatted as a character string.
The meaning and range of allowed values differ between UTM and MGRS.
- north: Northing, specified as a number, although it can be formatted as a character string.
The meaning and range of allowed values differ between UTM and MGRS.
Example
The coordinates of Tromsø (69°39'5.0"N 18°57'19.0"E) can thus be specified
in the following ways:
- (1) {lat=69.65139; lon=18.95528}
- (2) {lat=69; lam=39; las=5; lon=18; lom=57; los=19}
- (3) {mgrs="34WDC2058828390"}
- (4) {zone=34; band="W"; id="DC"; east="20588"; north="28390"}
- (5) {zone=34; east=420588; north=7728390}
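Formats (1) and (2) in the example are related by simple arithmetic, which the following Python sketch reproduces. The function name is hypothetical. How the program treats negative degrees combined with arcminutes and arcseconds is not documented here, so the sign handling below is an assumption.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0):
    """Convert degrees, arcminutes and arcseconds to decimal degrees."""
    # Assumption: arcminutes and arcseconds are non-negative, and the sign
    # of `degrees` applies to the whole coordinate.
    sign = -1.0 if degrees < 0 else 1.0
    return sign * (abs(degrees) + minutes / 60.0 + seconds / 3600.0)


# Tromsø, format (2) converted to format (1)
print(round(dms_to_decimal(69, 39, 5), 5))   # lat
print(round(dms_to_decimal(18, 57, 19), 5))  # lon
```

A coordinate between 0° and –1° cannot be expressed this way with an integer degree value, because the sign of zero is lost. Decimal degrees (format 1) avoid that problem.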
NB
- Please note that the MGRS system and the UTM system are often confused
(the former is based on the latter). However, the two require different formatting.
While Tromsø's UTM coordinates are 34 420588 7728390,
Tromsø's MGRS coordinates are 34WDC2058828390.
- If the data do not follow the standards for UTM or MGRS (as appropriate),
the program may misinterpret them.
- The northing of UTM coordinates uses signs in order to distinguish between
the Northern (positive sign) and the Southern Hemisphere (negative sign).
Positive signs may be omitted.
- Leading zeros may create trouble for northings and eastings in the MGRS system.
To make sure that leading zeros do not "disappear", please save east
and north as character strings rather than numbers.
- The variable names have to follow the conventions detailed above.
- The precision of the positions does not matter (well – it may matter for
the results, of course, but not for the interpretation of the coordinates).
- Different observations in one dataset may use different coordinate systems.
- If more than one coordinate system is used, UTM coordinates are ignored
wherever MGRS coordinates are supplied; and MGRS coordinates are ignored
wherever latitude and longitude are supplied.
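The notes above can be illustrated with a short Python sketch that splits an MGRS string into its parts (formats 3 and 4 of the example) and applies the stated precedence between coordinate systems. This is not the program's own code. The function names are hypothetical, and the pattern assumes standard MGRS letters (which exclude I and O).

```python
import re

# zone (1-2 digits), latitude band, two-letter square identifier, digits
MGRS_PATTERN = re.compile(
    r"^(\d{1,2})([C-HJ-NP-X])([A-HJ-NP-Z][A-HJ-NP-V])(\d*)$"
)


def split_mgrs(mgrs):
    """Split an MGRS string into zone, band, id, east and north."""
    match = MGRS_PATTERN.match(mgrs.replace(" ", "").upper())
    if match is None:
        raise ValueError(f"not a valid MGRS string: {mgrs!r}")
    zone, band, square, digits = match.groups()
    if len(digits) % 2 != 0:
        raise ValueError("easting and northing need the same number of digits")
    half = len(digits) // 2
    # east and north stay strings so that leading zeros are preserved
    return {"zone": int(zone), "band": band, "id": square,
            "east": digits[:half], "north": digits[half:]}


def coordinate_system_used(row):
    """Apply the precedence: latitude/longitude, then MGRS, then UTM."""
    def present(*names):
        return all(row.get(name) not in (None, "") for name in names)

    if present("lat", "lon"):
        return "latlon"
    if present("mgrs") or present("zone", "band", "id", "east", "north"):
        return "mgrs"
    if present("zone", "east", "north"):
        return "utm"
    return None


print(split_mgrs("34WDC2058828390"))
```

Applied to Tromsø's MGRS coordinates, the split yields zone 34, band "W", identifier "DC", easting "20588" and northing "28390", matching format (4) of the example.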
Before the program is run, the parameters need to be checked and, if necessary,
changed. The following parameters are available:
Dark figure
The dark figure is an extremely important parameter, since it has a direct
effect on the estimate of expansion speed. A dark figure is defined as
the factor by which the known AOO has to be multiplied in order
to obtain the estimated total AOO
(i.e., total = known × dark figure).
A qualified judgement of the dark figure will result in more realistic estimates.
The dark figure is provided as one, or a range of, numerical value(s), which is
assumed to apply to the last year of the dataset.
If an interval is provided for the dark figure, the application searches
for the optimal dark figure between these limits. If one value is provided,
the application uses the specified value only.
If one wishes the dark figure to be estimated rather than
specified, the interval 1–101 should be chosen. Note, however,
that this will produce rather uncertain estimates.
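The definition total = known × dark figure, and its consequence for the idealised radius, can be sketched as follows in Python. The function names and numbers are hypothetical. Rejecting dark figures below 1 is an assumption, based on the fact that the total AOO cannot be smaller than the known AOO.

```python
import math


def estimated_total_aoo(known_aoo_km2, dark_figure):
    """Estimated total AOO (km^2) = known AOO x dark figure."""
    if dark_figure < 1:
        # Assumption: the total AOO cannot be smaller than the known AOO.
        raise ValueError("the dark figure is expected to be at least 1")
    return known_aoo_km2 * dark_figure


def idealised_radius_km(aoo_km2):
    """Radius (km) of a circle whose area equals the given AOO."""
    return math.sqrt(aoo_km2 / math.pi)


known = 120.0  # km^2, hypothetical
total = estimated_total_aoo(known, 2.5)
print(total, idealised_radius_km(total) / idealised_radius_km(known))
```

Under the circle idealisation, the radius scales with the square root of the dark figure: a dark figure of 4 doubles the radius of the last year. This illustrates why the parameter has a direct effect on the estimate.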
Model
Four different models, or sets of assumptions, are implemented so far:
- Both expansion speed and detectability are constant through time
(this is the default);
- Detectability changes once, but is constant before and after the break point,
which is also inferred from the data;
- Both of the former options are estimated, and the better one is chosen (using
AICc-based model selection);
- Detectability is proportional to a measure of the yearly sampling effort,
which has to be provided.
The last alternative requires a dataset to be uploaded which can be interpreted
as a time series containing annual values of sampling effort, starting in the year
1800. The file format needs to be plain text, where values are delimited by either
semicolons, commas, spaces or line breaks. Such a file can only be uploaded if the
fourth option is chosen.
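A sampling-effort file of this kind can be read with a few lines of Python. The sketch below is an illustration, not the program's reader. It assumes that decimal marks are periods (since commas act as delimiters) and that the first value belongs to the year 1800. The function name and the sample values are hypothetical.

```python
import re

FIRST_YEAR = 1800  # the time series starts in this year


def read_sampling_effort(text, first_year=FIRST_YEAR):
    """Map each year to its sampling effort, in the order given in the file."""
    # Values may be delimited by semicolons, commas, spaces or line breaks.
    tokens = [tok for tok in re.split(r"[;,\s]+", text.strip()) if tok]
    return {first_year + i: float(tok) for i, tok in enumerate(tokens)}


# Made-up values, for illustration only.
effort = read_sampling_effort("0; 0; 1.5\n2,3")
print(effort)
```

The position of a value in the file determines its year, so years without sampling have to be included explicitly (for instance as zeros).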
If the dataset is too short (in terms of years), these options will not be
available. In this case, estimations are based on model 1.
Other settings
If needed, some additional settings can be adjusted. To do so, please tick the
"change default settings" box. The following settings are available for adjustment
(although not all of them are available in all cases):
- According to the default, each occurrence is expected to be reported only once
in the datafile, viz. in the year of its first observation. The default applies to
situations where occurrences can be assumed to persist after they have been
colonised. If the latter assumption is erroneous or unrealistic, the alternative
should be chosen. In that case, localities have to be reported in the datafile once
for each year of their existence. In years where a locality is not reported, it is
thus assumed to be absent. This alternative makes it possible to model species which have
short-lived subpopulations or which are subject to control/eradication measures.
(Please note that the program will only produce descriptive statistics if
occurrences are very short-lived.)
- The confidence level γ may be provided as a number between 0 and 0.999.
The default is γ = 0.5, which estimates quartiles.
γ = 0.95 would give 95% confidence intervals, etc.
- The choice "fast or exhaustive estimation" determines whether the program uses
heuristics to accelerate estimations. It is only relevant if dark figures are provided
as an interval or if a break point in detectability is estimated (model 2/3).
If the break point seems to miss the mark entirely, you may try an
exhaustive estimation (rather than a fast one, which is the default).
- ΔAICc may be specified if the option "test both" is chosen for detectability.
ΔAICc is here understood as the AICc of the model with one detection rate
minus the AICc for the model with two detection rates. The default is to prefer
the simpler model if it has a lower AICc than the other one (i.e.,
ΔAICc = 0).
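The ΔAICc setting can be read as a threshold that the more complex model has to exceed. The Python sketch below shows one plausible reading, using the standard small-sample correction of AIC. It is not taken from the program: the function names are hypothetical, and preferring the simpler model on ties is an assumption.

```python
def aicc(log_likelihood, k, n):
    """AIC with small-sample correction, for k parameters and n data points."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def choose_model(aicc_one_rate, aicc_two_rates, delta_threshold=0.0):
    """Pick a model from the AICc values of the two candidates."""
    # Delta AICc as defined above: one detection rate minus two rates.
    delta = aicc_one_rate - aicc_two_rates
    # Assumption: the simpler model is kept unless delta exceeds the threshold.
    if delta > delta_threshold:
        return "two detection rates"
    return "one detection rate"


print(choose_model(100.0, 98.0))                       # default threshold 0
print(choose_model(100.0, 98.0, delta_threshold=4.0))  # stricter threshold
```

With the default of 0, the model with the lower AICc wins. A larger threshold demands a clearer improvement before a break point in detectability is accepted.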
After a successful conversion, the uploaded dataset is summarised in terms of the
number of observations, the number of occurrences, the number of years and the
geographical extremes of the occurrences. It is recommended to check that these
values are as expected. If the conversion produced any errors or warnings, these
are displayed in the same panel.
After the observations in the dataset have been converted to occurrences,
it is possible to download the extended data file. It is recommended to download
this file, especially if the conversion took a long time. If you upload the extended
data file on later occasions, conversion will not be necessary.
The map shows the placement of all observations in the dataset (as points)
as well as the species' extent of occurrence (EOO, drawn as a polygon).
The map is drawn using a cylindrical projection. This means that all meridians are vertical
and all parallels are horizontal, making it easy to check coordinates of observations.
On the other hand, the map is thus neither equivalent, equidistant nor conformal.
The centre region of the map is approximately equivalent (equal-area), though.
When the edges of the EOO seem curved, this is not a bug but due to the fact that a
straight line appears curved in a cylindrical projection.
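The curvature can be reproduced numerically. The Python sketch below assumes a spherical Earth and takes "straight line" to mean the shortest path on the globe (a great circle); whether the program draws EOO edges in exactly this way is not stated here. The function name is hypothetical. It computes the midpoint of the shortest path between two points on the same parallel.

```python
import math


def great_circle_midpoint(lat1, lon1, lat2, lon2):
    """Midpoint (lat, lon in degrees) of the shortest path on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    bx = math.cos(phi2) * math.cos(dlam)
    by = math.cos(phi2) * math.sin(dlam)
    phi_m = math.atan2(math.sin(phi1) + math.sin(phi2),
                       math.hypot(math.cos(phi1) + bx, by))
    lam_m = math.radians(lon1) + math.atan2(by, math.cos(phi1) + bx)
    return math.degrees(phi_m), math.degrees(lam_m)


# Two points on the parallel 60°N, 40 degrees of longitude apart
mid_lat, mid_lon = great_circle_midpoint(60.0, 0.0, 60.0, 40.0)
print(round(mid_lat, 2), round(mid_lon, 2))
```

The midpoint lies north of 60°N, although both end points are on that parallel. On a map with horizontal parallels, the edge between the two points therefore bulges towards the pole.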
The fourth panel summarises the statistical assumptions on which the model is based.
These assumptions will never be strictly met by any real dataset, but minor deviations
are usually acceptable. Large and systematic deviations, on the other hand, may lead to
biased estimates.
The graph illustrates the change in the number of known occurrences (black points).
The graph has year on its x-axis, the (idealised) radius
of AOO on the left y-axis (linear scale) and the AOO itself on the right
y-axis (square-root scale).
The model that has been fitted to the known occurrences is shown as a solid blue line.
The estimated expansion, which includes unknown occurrences, is shown as a broken red line.
Confidence intervals are shown as dotted lines.
The break point (i.e. the year in which detectability changes, if estimated) is shown
as a vertical dotted pink line.
The output consists of estimates (median plus lower and upper confidence limits) for:
- expansion speed in metres per year (m/a),
- known area of occupancy (AOO) in km²,
- estimated AOO in km² (known AOO times dark figure),
- dark figure,
- extent of occurrence (EOO) in km² (not corrected for coastlines or borders),
- first year of the expansion,
- detectability rate(s),
- variance explained (R²),
- Akaike's Information Criterion, corrected for small sample sizes (AICc).
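The lower and upper confidence limits depend on the confidence level γ. Assuming central (equal-tailed) intervals, which is consistent with γ = 0.5 yielding quartiles, the corresponding probabilities can be sketched in Python as follows. The function name is hypothetical.

```python
def interval_probabilities(gamma):
    """Lower and upper cumulative probabilities of a central interval."""
    if not 0 <= gamma <= 0.999:
        raise ValueError("the confidence level must lie between 0 and 0.999")
    return (1 - gamma) / 2, (1 + gamma) / 2


print(interval_probabilities(0.5))   # quartiles
print(interval_probabilities(0.95))  # limits of a 95% interval
```

The median reported alongside the limits corresponds to a cumulative probability of 0.5, i.e. to γ = 0.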
The program expansion was written by
Hanno Sandvik at the
Norwegian Institute for Nature Research (NINA).
In case of questions or comments, please get in touch.
The present version number of the program is 3.2 (as of January 2022).
Licence
Expansion is licensed under Creative Commons Attribution-ShareAlike
4.0 International (CC BY-SA 4.0).
Citation
The program may be cited as:
It would be nice if you cite the documentation, too:
Acknowledgements
Without the detailed feedback by Hanne Hegre, the program would never have
reached its current functionality.
List of versions:
- Version 3.2 (January 2022)
- more robust estimates of expansion speed if occurrences are very short-lived
- faster estimation when dark figures are provided as intervals
- improved graphics and debugged map
- Version 3.1 (December 2021)
- implementation of the demo version
- display of maps also outside of northern Europe
- correction for curved edges of the EOO due to cylindrical projection
- Version 3.0 (November 2021)
- Version 2.6 (September 2017)
- possibility to specify dark figures as intervals even when p=2 or p=3
- modification in the optimisation avoiding too early first years of expansion
- graphical illustration of confidence intervals
- Version 2.5 (August 2017)
- estimation of dark figures that constrains their estimates to the interval chosen
- calculation of parameter estimates using AICc-based model averaging
- implementation of p=3 (AICc-based model selection between p=1 and p=2)
- Version 2.4 (April 2017)
- implementation of new.obs=FALSE
- possibility to switch off the fast estimation under p=2
- removal of a bug in the calculation of EOO
- Version 2.3 (March 2017)
- possibility to specify sampling effort as a covariate of observability rate
- improved estimation of confidence intervals when dark figures are specified
- implementation of a faster estimation under p=2
- Version 2.2 (February 2017)
- calculation and output of EOO
- possibility to return the dataset underlying the expansion graph
- calibration of the convergence tolerance levels
- Version 2.1 (January 2017)
- estimation of confidence intervals
- more intuitive defaults
- removal of some bugs that created unnecessary error messages
- Version 2.0 (December 2016)
- Version 1.4 (July 2012)
- Version 1.3 (January 2012)
- Version 1.2 (November 2011)
- Version 1.1 (September 2011)
- Version 1.0 (August 2011)