Program expansion:
explanations and documentation

The R program expansion estimates the expansion speed of a population based on a spatio-temporal dataset of occurrences.

Please note that this page describes version 2.6 of the program, which has been superseded by versions 3.x.

 

 

Introduction and installation

The R script does not require any previous knowledge of R, but it presupposes that R is installed on the computer. Step-by-step instructions:

  • R is free, open-source software that can be downloaded at http://www.r-project.org. Please follow the instructions on that site to install it.
  • After you have installed and started R, the expansion script can be loaded in one of two ways:
    • write load(url("http://www.evol.no/hanno/17/expand.rtx")) directly in your R pane (this requires your computer to be online); or
    • use your browser to navigate to http://www.evol.no/hanno/17/expand.rtx and save this file to your hard disk; later, write load("...") in your R pane, where "..." specifies the file location [for example, load("c:/aliens/expand.rtx"); this requires your computer to be online only when downloading the file for the first time, after which it can be loaded locally from your computer].
  • Now you can run the script by writing expansion(...), where "..." represents the parameters, which are explained in detail below.
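
The two ways of loading the script can be combined in a small helper. This is only a sketch: load_expansion is a hypothetical name that is not part of the script, and the local path is simply the example location used above.

```r
# Hypothetical helper (not part of the expansion script): load the script
# from a local copy if one exists, otherwise from the web address given above.
load_expansion <- function(local  = "c:/aliens/expand.rtx",
                           remote = "http://www.evol.no/hanno/17/expand.rtx") {
  if (file.exists(local)) {
    load(local, envir = globalenv())        # offline: use the saved copy
  } else {
    load(url(remote), envir = globalenv())  # online: read directly from the web
  }
  # Report (invisibly) whether a function called expansion is now available.
  invisible(exists("expansion", envir = globalenv(), mode = "function"))
}

# load_expansion()   # uncomment to load the script, then call expansion(...)
```

Loading into the global environment mirrors what happens when load(...) is typed directly in the R pane.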

 

Please note that this R-script is not part of any R package. Therefore, no R help will be available for this function. Please refer to this site instead.

 

The program requires a dataset containing the spatio-temporal information about the observed occurrences of the population. The dataset is specified using the data parameter. A program call thus has the form expansion(data=...), where "..." may be any of the following three objects:

  • (1)   a character string specifying the location of a data file;
  • (2)   a data frame;
  • (3)   a matrix.

If you are an R beginner, you should choose the first option (the latter two options presuppose that your data have already been read into R). The formatting required for the data file according to option 1 is explained below. All three options require that the data are organised into columns that are named precisely as specified in the following paragraphs.

  • One column has to contain years and have the name t. Years have to be integers.
  • The geographic positions of observations can be specified using one to six columns, depending on the coordinate system used:

 

Coordinate systems

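Positions of observed occurrences may be specified in one of five different formats, using one of three different coordinate systems (latitude and longitude, MGRS, or UTM):

  • (1)   latitude and longitude in decimal degrees: lat, lon
  • (2)   latitude and longitude in degrees, arcminutes and arcseconds: lat, lam, las, lon, lom, los
  • (3)   MGRS as a single character string: mgrs
  • (4)   MGRS split into its components: zone, band, id, east, north
  • (5)   UTM: zone, east, north

The variable names have the following meaning and formatting: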

lat Latitude (degrees) – specified as an integer or real number between –90 (= 90°S) and +90 (= 90°N).
lam Latitude (arcminutes) – specified as an integer or real number between 0 and 60.
las Latitude (arcseconds) – specified as an integer or real number between 0 and 60.
lon Longitude (degrees) – specified as an integer or real number between –180 (= 180°W) and +180 (= 180°E).
lom Longitude (arcminutes) – specified as an integer or real number between 0 and 60.
los Longitude (arcseconds) – specified as an integer or real number between 0 and 60.
mgrs MGRS coordinates – specified as a character string.
zone UTM zone – specified as an integer between 1 and 60.
band MGRS latitude band – specified as a single character between "C" and "X".
id MGRS square identifier – specified as two characters between "AA" and "ZV".
east Easting – specified as a number, although it can be formatted as a character string. The meaning and range of allowed values differ between UTM and MGRS.
north Northing – specified as a number, although it can be formatted as a character string. The meaning and range of allowed values differ between UTM and MGRS.

 

Example

The coordinates of Tromsø (69°39'5.0"N 18°57'19.0"E) can thus be specified in the following ways:

  • (1)   {lat=69.65139; lon=18.95528}
  • (2)   {lat=69; lam=39; las=5; lon=18; lom=57; los=19}
  • (3)   {mgrs="34WDC2058828390"}
  • (4)   {zone=34; band="W"; id="DC"; east="20588"; north="28390"}
  • (5)   {zone=34; east=420588; north=7728390}
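
As a check on formats (1) and (2), the conversion from degrees, arcminutes and arcseconds to decimal degrees can be sketched in R. The helper dms_to_decimal is illustrative and not a function of the expansion script; it assumes non-negative components, so southern or western positions would need the sign applied afterwards. The year in the data frames is an arbitrary placeholder, not a real observation.

```r
# Convert degrees, arcminutes and arcseconds to decimal degrees.
# Illustrative helper for non-negative components only.
dms_to_decimal <- function(deg, arcmin = 0, arcsec = 0) {
  deg + arcmin / 60 + arcsec / 3600
}

lat <- dms_to_decimal(69, 39, 5)    # latitude of Tromsø
lon <- dms_to_decimal(18, 57, 19)   # longitude of Tromsø
round(c(lat = lat, lon = lon), 5)   # compare with format (1) above

# The same position as one-row data frames in format (1) and format (5).
# The year (column t) is a placeholder.
obs_latlon <- data.frame(t = 2017L, lat = lat, lon = lon)
obs_utm    <- data.frame(t = 2017L, zone = 34L, east = 420588, north = 7728390)
```

Either data frame could then be passed on as expansion(data=obs_latlon) or expansion(data=obs_utm).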

 

NB

  • Please note that the MGRS system and the UTM system are often confused (the former is based on the latter). However, the two require different formatting. While Tromsø's UTM coordinates are 34 420588 7728390, Tromsø's MGRS coordinates are 34WDC2058828390.
  • If the data do not follow the standards for UTM or MGRS (as appropriate), the program may misinterpret them.
  • The northing of UTM coordinates uses signs in order to distinguish between the Northern (positive sign) and the Southern Hemisphere (negative sign). Positive signs may be omitted.
  • Leading zeros may create trouble for northings and eastings in the MGRS system. To make sure that leading zeros do not "disappear", please save east and north as character strings rather than numbers.
  • The variable names have to follow the conventions detailed above.
  • The precision of the positions does not matter (well – it may matter for the results, of course, but not for the interpretation of the coordinates).

  • Different observations in one dataset may use different coordinate systems.
  • If more than one coordinate system is used, UTM coordinates are ignored wherever MGRS coordinates are supplied; and MGRS coordinates are ignored wherever latitude and longitude are supplied.
  • The order of observations does not matter.
  • The order of columns does not matter.
  • Additional columns are ignored. (Nonetheless, it might be an advantage to delete superfluous columns, because columns containing commas or apostrophes may interrupt the conversion.)
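
To illustrate how formats (3) and (4) relate, and why eastings and northings are best kept as character strings, an MGRS string can be split into its components as follows. This is a sketch under simple assumptions (the string holds a zone, a band, a two-letter square identifier and an even number of digits); split_mgrs is a hypothetical helper and not the parser used by the program.

```r
# Split an MGRS string into the columns of format (4).
# Hypothetical helper; east and north stay character strings so that
# leading zeros are preserved.
split_mgrs <- function(x) {
  x <- gsub(" ", "", toupper(x), fixed = TRUE)       # tolerate spaces and lower case
  pattern <- "^([0-9]{1,2})([C-HJ-NP-X])([A-HJ-NP-Z]{2})([0-9]*)$"
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) == 0) stop("not a recognisable MGRS string: ", x)
  digits <- m[5]
  if (nchar(digits) %% 2 != 0) stop("easting and northing must have equal length")
  half <- nchar(digits) / 2
  data.frame(
    zone  = as.integer(m[2]),
    band  = m[3],
    id    = m[4],
    east  = substr(digits, 1, half),
    north = substr(digits, half + 1, 2 * half),
    stringsAsFactors = FALSE
  )
}

split_mgrs("34WDC2058828390")   # the Tromsø example from above
```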

 

Formatting of data files

If the data are read from an external file, please follow these formatting rules:

  • The data have to be organised column-wise, i.e. the file has to consist of one column per variable (year and for instance latitude and longitude) and one row per observation.

  • The first row has to contain the variable names (see above for the variable names that have to be used).
  • All rows have to have the same number of separators.
  • Missing values are tolerated if specified by omission ("") or spaces (" "). (Other symbols, such as "?" or "NA", will generate error messages.)
  • Semicolons (;) or commas (,) are accepted as separators between columns (i.e., between the elements of a row) – but please don't use both. Such files can be produced by all spreadsheet applications. (Choose "save as comma delimited file" or something similar. Usual filename extensions of such formats are ".CSV" or ".SDV".)
  • The symbol used as separator must not occur in other places. Nor may apostrophes (') be used anywhere in the data file. Please make sure to remove or replace these symbols.
  • The data parameter is used to specify the location of the data file. The location should be specified as a character string containing the file name and complete location within quotation marks, e.g. expansion(data="c:/aliens/data/art12.sdv"). Please note the use of slash (/) instead of backslash (\).
  • Periods (.) are accepted as decimal marks. Commas (,) may also be used as decimal marks, but only if semicolons (;) are used as separators.
  • Spaces between (outside) elements, and quotation marks (") enclosing elements (on both sides), are tolerated.

 

Example:

  • t;lat;lon
  • 2006;60.00;12,00
  • 2006;60.24;12.76
  • 2007;   61.53;   11.36
  • 2008;62.84;10.92
  • 2010;64.15;12,1
  • 2010;64.84;"12.15"
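
Some of the formatting rules above can be checked before a file is passed to the program. The following sketch writes the example to a temporary file and tests three of the rules (a column named t, the same number of separators in every row, no apostrophes). The function check_data_file is a hypothetical helper, not part of the expansion script, and it does not cover all rules.

```r
# The example file from above, written to a temporary location.
example_lines <- c(
  "t;lat;lon",
  "2006;60.00;12,00",
  "2006;60.24;12.76",
  "2007;   61.53;   11.36",
  "2008;62.84;10.92",
  "2010;64.15;12,1",
  "2010;64.84;\"12.15\""
)
data_file <- file.path(tempdir(), "example.csv")
writeLines(example_lines, data_file)

# Hypothetical pre-flight check covering three of the formatting rules.
check_data_file <- function(path, sep = ";") {
  lines  <- readLines(path, warn = FALSE)
  lines  <- lines[nzchar(trimws(lines))]               # skip empty lines
  header <- trimws(strsplit(lines[1], sep, fixed = TRUE)[[1]])
  header <- gsub("\"", "", header, fixed = TRUE)       # tolerate quoted names
  n_sep  <- lengths(regmatches(lines, gregexpr(sep, lines, fixed = TRUE)))
  c(
    has_year_column  = "t" %in% header,
    equal_separators = length(unique(n_sep)) == 1,
    no_apostrophes   = !any(grepl("'", lines, fixed = TRUE))
  )
}

checks <- check_data_file(data_file)
checks
```

If all three checks are TRUE, the file can be passed on as expansion(data=data_file).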

 

Parameters

Use of the data parameter is explained above. The remaining parameters are optional, although dark.fig should be provided, and save and p can be useful. Parameters are separated by commas, as in expansion(data="folder/datafil", dark.fig=10, save=TRUE). The following parameters are available:

data Spatio-temporal dataset of observations. This is the only mandatory parameter. It is explained in detail above.
dark.fig Dark figure assumed to apply to the last year of the dataset, provided as one or more numerical value(s). A dark figure should be provided, because this will result in more realistic estimates. If one value is provided, the script searches for the optimal dark figure in the vicinity of this value. If two values are provided [e.g., as dark.fig=c(5,50)], the script searches for the optimal dark figure between these two values. If more than two values are provided [e.g., as dark.fig=c(2,3,4,5) or dark.fig=2:5], the script tries out exactly the dark figures specified. The default is dark.fig=c(1,Inf), i.e. all values ≥ 1.
exact Logical variable specifying whether the dark figure provided should be treated as fixed. If exact=FALSE, the script searches in the vicinity of the value specified. If exact=TRUE, the script only uses the dark figure specified (although this is ignored if dark.fig consists of two or more numbers). The default is FALSE.
p Number or numerical vector specifying how observability is modelled. The default is p=1, which entails that observability is assumed to be constant. If p=2, two observability rates are estimated for two periods of time, where the break point is also inferred from the data. If p=3, both of the former options are estimated, and the better one is chosen (using AICc-based model selection). If p is provided as a vector of length > 100, it is interpreted as a time series containing annual values of sampling effort, starting in the year 1800.
fast Logical variable specifying whether estimation of the break point under p=2 (or 3) should be fast (default) or exhaustive. If the estimated break point seems to miss the mark entirely, an exhaustive search should be tried, using fast=FALSE.
new.obs Logical or numerical variable specifying whether the dataset contains new observations only. The default, new.obs=TRUE, implies that occurrences are reported only in the year of their first observation, and are assumed to remain in place in subsequent years. Write new.obs=FALSE if the dataset reports each occurrence for each year of its existence – this enables models of species that have short-lived subpopulations, or of species that are subject to eradication measures. If a species is very short-lived (e.g., its occurrences usually disappear within a year), it is better to use new.obs=-1 (in this case, only descriptive statistics are provided; no modelling of the process is implemented yet).
mech [not yet implemented]
form [not yet implemented]
map Logical variable indicating whether the observations should be shown on a map. Currently this only works for Northern Europe. The default is map=TRUE. To switch off map view, write map=FALSE.
quiet Logical variable that suppresses messages and warnings if TRUE. (When quiet=-1, more details are reported.)
save Logical value or text string indicating whether the data should be saved after transformation. By saving the transformed dataset, coordinates do not have to be transformed each time the script is run. If specified using a text string, the latter is interpreted as file name. The default is save=FALSE.
data.out Logical value or letter that can change the value returned by the function. If data.out=FALSE, which is the default, the script returns expansion speed as the function value. If data.out=TRUE or data.out="A", the value returned by the script is changed to a matrix containing the annual estimates of area of occupancy (AOO) from the expansion graph. The matrix has four columns: year (containing years), point (containing the observed AOO in a given year, i.e. the points of the graph), blue (containing the fitted values of the known AOO in a given year, i.e. the blue line of the graph), and red (containing the estimated total AOO including dark figures in a given year, i.e. the red line of the graph). If data.out="r", the value returned is a matrix with the columns just described, which, however, contain radii rather than areas. Areas are provided in km², radii in km. Note that the value of the function has to be assigned to a new variable using the arrow symbol "<-", e.g. datapoints <- expansion(...).
gamma Numerical variable between 0 and 1, specifying the confidence level (γ). The default is gamma=0.5, which estimates quartiles. gamma=0.95 gives 95% confidence intervals.
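
A call that combines several of these parameters might look as follows. This is a sketch only: the file location is the example path used earlier on this page, the parameter values are arbitrary illustrations, and the call is skipped unless the script has been loaded and the file exists.

```r
# Arguments for a hypothetical run (values chosen for illustration only).
run_args <- list(
  data     = "c:/aliens/data/art12.sdv",  # example location from above
  dark.fig = c(5, 50),   # search for the optimal dark figure between 5 and 50
  p        = 3,          # choose between constant and two-period observability
  gamma    = 0.95,       # report 95% confidence intervals
  data.out = "r"         # return a matrix of radii (km) instead of the speed
)

# Runs only if the expansion script is loaded and the data file is present.
if (exists("expansion", mode = "function") && file.exists(run_args$data)) {
  datapoints <- do.call(expansion, run_args)
  head(datapoints)       # columns: year, point, blue, red
}
```

Collecting the arguments in a list makes it easy to rerun the same analysis with a single parameter changed.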

Further parameters are available, although they may rarely be needed. The ones that are implemented thus far are:

  • R (the radius of the Earth in kilometres; used when estimating the extent of occurrence; the average radius of the WGS 84 reference ellipsoid is used as default, i.e. R=6371),
  • A0 (the area of one AOO grid cell in square kilometres; defaults to A0=4),
  • language (text string which can shift from English to Norwegian output; correspondingly, the following parameters exist: mtall = dark.fig, eksakt = exact, kjapp = fast, ny.obs = new.obs, mek = mech, kart = map, hold.munn = quiet, lagre = save, and spraak = language),
  • dist (text string specifying whether optimisation uses a normal or a binomial distribution),
  • DeltaAICc (difference in AICc-units between models at which p=2 is preferred over p=1; defaults to 0),
  • kontr1 and kontr2 (lists containing parameters that control the optimisation).

 

Output

Before starting the estimation itself, the script gives a summary of the input data and the model assumptions (this can be turned off using the quiet parameter). The function's value (returned invisibly) is the number representing the expansion speed of the population in metres per year. The output provided on the screen consists of estimates (median plus lower and upper confidence limits) for:

  • expansion speed in metres per year (m/a),
  • known area of occupancy (AOO) in km²,
  • estimated AOO in km² (known AOO times dark figure),
  • dark figure,
  • extent of occurrence (EOO) in km² (not corrected for coastlines or borders),
  • first year of the expansion,
  • observability rate(s),
  • coefficient of determination (R²),
  • Akaike's Information Criterion corrected for small sample sizes (AICc).

 

Documentation and definitions

Expansion is here understood as the number of new occurrences per unit of time (where "occurrences" are colonised 2 km × 2 km grid cells). Thus, expansion encompasses any spread or movement of the species concerned (regardless of means, causes and pathways, i.e. including active and passive, natural and anthropogenic, intentional and unintentional movements).

Mathematically, expansion speed is described as the annual increase in the radius of the area of occupancy of the species (where the radius is calculated as if the AOO were a coherent circle containing all occurrences and only occurrences). The model underlying the program has been described in detail by Sandvik (Acta Biotheor, 2019).
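
The geometric part of this definition can be sketched in a few lines of R: the radius of a circle with area AOO is sqrt(AOO/π), and the speed is the change in that radius per year. The cell counts below are made-up placeholders rather than real data, and the simple year-to-year difference only illustrates the definition; the program itself estimates the speed by fitting the model described in the paper cited above.

```r
# Radius (km) of a circle with the same area as the AOO (km²).
aoo_radius <- function(aoo_km2) sqrt(aoo_km2 / pi)

# Placeholder numbers of occupied grid cells in two consecutive years,
# using the default cell size of A0 = 4 km² (2 km × 2 km).
cells <- c(25, 36)
aoo   <- cells * 4              # AOO in km²
radii <- aoo_radius(aoo)        # radii in km
speed <- diff(radii) * 1000     # increase in radius, in metres per year
speed
```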

In risk assessments according to the Generic Ecological Impact Assessment of Alien Species (GEIAA), estimates of expansion speed are needed in order to obtain a score for criterion B on the invasion axis. For more detailed explanations, please consult the Guidelines published by the Norwegian Biodiversity Information Centre.

 

About the program

The program expansion was written by Hanno Sandvik, initially at the Centre for Biodiversity Dynamics (Norwegian University of Science and Technology) and, from version 2.5 onwards, at the Norwegian Institute for Nature Research.


 

Acknowledgements

Without the detailed feedback from Hanne Hegre, the program would never have reached its current functionality.

 

Overview of past and more recent versions:

  • Version 2.0 (December 2016)
    • During the upgrade from version 1.4 to 2.0, the new definition of expansion as the annual increase in AOO was implemented.
  • Version 2.1 (January 2017)
    • estimation of confidence intervals
    • more intuitive defaults
    • removal of some bugs that created unnecessary error messages
  • Version 2.2 (February 2017)
    • calculation and output of EOO
    • possibility to return the dataset underlying the expansion graph
    • calibration of the convergence tolerance levels
  • Version 2.3 (March 2017)
    • possibility to specify sampling effort as a covariate of observability rate
    • improved estimation of confidence intervals when dark figures are specified
    • implementation of a faster estimation under p=2
  • Version 2.4 (April 2017)
    • implementation of new.obs=FALSE
    • possibility to switch off the fast estimation under p=2
    • removal of a bug in the calculation of EOO
  • Version 2.5 (August 2017)
    • estimation of dark figures that constrains their estimates to the interval chosen
    • calculation of parameter estimates using AICc-based model averaging
    • implementation of p=3 (AICc-based model selection between p=1 and p=2)
  • Version 2.6 (September 2017)
    • possibility to specify dark figures as intervals even when p=2 or p=3
    • modification of the optimisation that avoids implausibly early first years of expansion
    • graphical illustration of confidence intervals
  • Version 3.0 (October 2021)