Program EXPANSION: explanations and documentation

The R-script EXPANSION estimates a population's expansion rate from spatio-temporal data on the occurrences of the population. Please note that this page describes version 1.4 of the program, which has been superseded by version 2.0.

 

Introduction and installation

The R-script does not require any previous knowledge of R, but presupposes that R has been installed on your computer. Step-by-step instructions follow:

·    R is an open and free programming language and environment. It can be downloaded from http://www.r-project.org. Follow the installation instructions at that site to install the package.

·    After you have installed and started R, the EXPANSION script can be loaded in one of two ways:

·     write load(url("http://www.evol.no/hanno/12/expans.rtx")) directly in your R pane (this requires your computer to be online); or

·     use your browser to navigate to http://www.evol.no/hanno/12/expans.rtx and save this file to your hard disk; later, write load("...") in your R pane, where "..." specifies the file location [for example, load("c:/aliens/expans.rtx"); this requires your computer to be online only when downloading the file for the first time, whereupon it can be loaded locally from your computer].

·    Now you can run the script by writing expansion(...), where "..." represents the parameters, which are explained in detail below.

Please note that this R-script is not part of any R package. Therefore, no R help will be available for this function. Please refer to this site instead.
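The two ways of loading the script can be combined in a small helper. This is only a sketch: the function name load_expansion and its arguments are hypothetical and not part of EXPANSION; it relies on nothing but the standard R functions load(), url() and file.exists().

```r
# Hypothetical helper (not part of EXPANSION): load the script from a local
# copy if one exists, otherwise directly from the URL given on this page.
load_expansion <- function(local_path,
                           remote = "http://www.evol.no/hanno/12/expans.rtx",
                           envir = globalenv()) {
  if (file.exists(local_path)) {
    loaded <- load(local_path, envir = envir)    # offline: use the saved copy
  } else {
    loaded <- load(url(remote), envir = envir)   # online: read from the web
  }
  invisible(loaded)  # names of the objects that were restored
}

# Example call (path taken from the instructions above):
# load_expansion("c:/aliens/expans.rtx")
```

The helper returns the names of the restored objects invisibly, which makes it easy to check that expansion is among them.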

 

The program requires a dataset containing the spatio-temporal information about the observed occurrences of the population. The dataset is specified using the parameter data. A program call thus has the form expansion(data=...), where "..." may be any of the following three objects:

(1) a character string specifying the location of a data file;

(2) a data frame;

(3) a matrix.

If you are an R beginner, you should choose the first option (as the two latter methods presuppose that your data have already been read into R). The formatting required for the data file according to option 1 is explained below. All three options require that the data are organised into columns that are named precisely as specified in the following paragraphs.
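For options 2 and 3, the data might be prepared as in the following sketch. The object names obs and obs_matrix are arbitrary choices made for this illustration, and the call to expansion() is only attempted if the script has already been loaded.

```r
# Observations as a data frame; the column names t, lat and lon are the ones
# the script expects. The values are arbitrary illustration data.
obs <- data.frame(
  t   = c(2006L, 2006L, 2007L, 2008L),   # years have to be integers
  lat = c(60.54, 60.24, 61.94, 62.75),
  lon = c(12.26, 12.76, 11.74, 10.18)
)

# A matrix is accepted as well, as long as its columns carry the same names.
obs_matrix <- as.matrix(obs)

# Run the analysis only if the script has been loaded in this session.
if (exists("expansion", mode = "function")) {
  expansion(data = obs)
}
```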

·    One column has to contain years and have the name t. Years have to be integers.

·    The geographic positions of observations can be specified using one to six columns, depending on the coordinate system used:

 

Coordinate systems

Positions of observed occurrences may be specified in one of five different formats, using one of three different coordinate systems:

·    Latitude and longitude

                    (1)      lat   AND lon   OR

                    (2)      lat   AND lam   AND las   AND lon   AND lom   AND los

·    MGRS coordinates (Military Grid Reference System)

                    (3)      mgrs   OR

                    (4)      zone   AND band   AND id   AND east   AND north   OR

·    UTM coordinates (Universal Transverse Mercator)

                    (5)      zone   AND east   AND north  

where the variable names have the following meaning and formatting:

lat Latitude (degrees). Specified as an integer or real number between –90 (= 90°S) and +90 (= 90°N).
lam Latitude (arcminutes). Specified as an integer or real number between 0 and 60.
las Latitude (arcseconds). Specified as an integer or real number between 0 and 60.
lon Longitude (degrees). Specified as an integer or real number between –180 (= 180°W) and +180 (= 180°E).
lom Longitude (arcminutes). Specified as an integer or real number between 0 and 60.
los Longitude (arcseconds). Specified as an integer or real number between 0 and 60.
mgrs MGRS coordinates. Specified as a character string.
zone UTM zone. Specified as an integer between 1 and 60.
band MGRS latitude band. Specified as a single character between "C" and "X".
id MGRS square identifier. Specified as two characters between "AA" and "ZV".
east Easting. Specified as a number, although it can be formatted as a character string. The meaning and range of allowed values differ between UTM and MGRS.
north Northing. Specified as a number, although it can be formatted as a character string. The meaning and range of allowed values differ between UTM and MGRS.

For use in Norway, positions can also be provided in other formats, e.g. in terms of midpoints of municipalities. The explanation of these options is given in Norwegian only.

 

Example

The coordinates of Tromsø (69°39'5.0"N 18°57'19.0"E) can thus be specified in the following ways:

        (1)      {lat=69.65139; lon=18.95528}

        (2)      {lat=69; lam=39; las=5.0; lon=18; lom=57; los=19.0}

        (3)      {mgrs="34WDC2058828390"}

        (4)      {zone=34; band="W"; id="DC"; east=20588; north=28390}

        (5)      {zone=34; east=420588; north=7728390}

[The precision of the coordinates in this example is (1) roughly 1.1 m (N–S) / 0.4 m (E–W), (2) roughly 3.1 m (N–S) / 1.1 m (E–W), (3–5) exactly 1 m.]
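The relation between formats (1) and (2), and the precision figures quoted above, can be checked with a few lines of R. This is a sketch rather than the script's own conversion: the function name dms_to_decimal is hypothetical, it handles non-negative degrees only, and the distances assume a spherical Earth with a radius of 6371 km.

```r
# Convert degrees, arcminutes and arcseconds to decimal degrees.
# Sketch for non-negative degrees only (Northern/Eastern Hemisphere).
dms_to_decimal <- function(deg, min = 0, sec = 0) {
  deg + min / 60 + sec / 3600
}

lat <- dms_to_decimal(69, 39, 5.0)
lon <- dms_to_decimal(18, 57, 19.0)
round(c(lat = lat, lon = lon), 5)   # 69.65139 and 18.95528, as in format (1)

# Approximate ground distance covered by the last stated digit,
# assuming a spherical Earth of radius 6371 km.
m_per_deg_lat <- 6371000 * pi / 180
m_per_deg_lon <- m_per_deg_lat * cos(lat * pi / 180)
precision_m <- rbind(
  decimal_5_digits = c(NS = 1e-5 * m_per_deg_lat, EW = 1e-5 * m_per_deg_lon),
  tenth_arcsecond  = c(NS = m_per_deg_lat / 36000, EW = m_per_deg_lon / 36000)
)
round(precision_m, 1)
```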

 

NB

·    Please note that the MGRS system and the UTM system are often confused (the former is based on the latter). However, the two systems require different formatting. While Tromsø's UTM coordinates are 34 420588 7728390, Tromsø's MGRS coordinates are 34WDC2058828390. The northing of UTM coordinates uses signs in order to distinguish between the Northern (positive sign) and the Southern Hemisphere (negative sign). Positive signs may be omitted.

·    If the data do not follow the standards for UTM or MGRS (as appropriate), the program may misinterpret them.

·    The variable names have to follow the conventions detailed above.

·    Leading zeros may create trouble for northings and eastings in the MGRS system. To make sure that leading zeros do not "disappear", please save east and north as character strings rather than numbers.

·    The precision of the positions does not matter (well – it may matter for the results, of course, but not for the interpretation of the coordinates).

·    Different observations in one dataset may use different coordinate systems.

·    If more than one coordinate system is used, UTM coordinates are ignored wherever MGRS coordinates are supplied; and MGRS coordinates are ignored wherever latitude and longitude are supplied.

·    The order of observations does not matter.

·    The order of columns does not matter.

·    Additional columns are ignored.
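The point about leading zeros can be illustrated by splitting an MGRS string (format 3) into the columns of format 4. The function split_mgrs below is a hypothetical helper written for this page, not the parser used by the script; it handles one string at a time and keeps east and north as character strings.

```r
# Split a single MGRS string into zone, band, id, east and north.
# east and north stay character strings so that leading zeros survive.
split_mgrs <- function(mgrs) {
  s <- toupper(gsub(" ", "", mgrs))
  pattern <- "^([0-9]{1,2})([C-HJ-NP-X])([A-HJ-NP-Z][A-HJ-NP-V])([0-9]*)$"
  parts <- regmatches(s, regexec(pattern, s))[[1]]
  if (length(parts) == 0) stop("not a valid MGRS string: ", mgrs)
  digits <- parts[5]
  if (nchar(digits) %% 2 != 0) stop("easting and northing differ in length")
  half <- nchar(digits) / 2
  data.frame(
    zone  = as.integer(parts[2]),
    band  = parts[3],
    id    = parts[4],
    east  = substr(digits, 1, half),
    north = substr(digits, half + 1, 2 * half),
    stringsAsFactors = FALSE
  )
}

split_mgrs("34WDC2058828390")   # the Tromsø example
split_mgrs("34WDC0058800390")   # east "00588" and north "00390" keep their zeros
```

Had east and north been stored as numbers, the second call would yield 588 and 390, which no longer shows that five digits (1 m precision) were intended.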

 

Formatting of data files

If the data are read from an external file, please follow these formatting rules:

·    The data have to be organised column-wise, i.e. the file has to consist of one column per variable (year and for instance latitude and longitude) and one row per observation.

·    The first row has to contain the variable names (see above for the variable names that have to be used).

·    All rows have to have the same number of separators.

·    Missing values are tolerated if specified by omission ("") or spaces (" "). (Other symbols, such as "?" or "NA", will generate error messages.)

·    Semicolons (;) or commas (,) are accepted as separators between columns (i.e., between the elements of a row) – but please don't use both. Such files can be produced by all spreadsheet applications. (Choose "save as comma delimited file" or something similar. Usual filename extensions of such formats are ".CSV" or ".SDV".)

·    The symbol used as separator must not occur in other places. Nor may apostrophes (') be used anywhere in the data file. Please make sure to remove or replace these symbols.

·    The parameter data is used to specify the location of the data file. The location should be specified as a character string containing the file name and complete location within quotation marks, e.g. expansion(data="c:/aliens/data/species6.sdv"). Please note the use of slash (/) instead of backslash (\).

·    Periods (.) are accepted as decimal marks. Commas (,) may also be used as decimal marks, but only if semicolons (;) are used as separators.

·    Spaces between (outside) elements, and quotation marks (") enclosing elements (on both sides), are tolerated.

 

Example:

t;lat;lon
2006;60.00;12,00
2006;60.54;12.26
2006;60.24;12.76
2007;61.94;11.74
2007;   61.53;   11.36
2008;62.75;10.18
2008;62.84;10.92
2009;63.81;11.43
2010;64.97;12.64
2010;64.15;12,1
2010;64.84;"12.15"
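To check how such a file would be interpreted before running the analysis, the tolerated variations (decimal commas, extra spaces, quotation marks) can be normalised by hand. The following is a sketch using base R only; it is not the reader built into the script, and the names raw, to_num and dat are arbitrary.

```r
# A few rows in the style of the example, including a decimal comma,
# padded fields and a quoted value.
lines <- c(
  "t;lat;lon",
  "2006;60.00;12,00",
  "2007;   61.53;   11.36",
  "2010;64.15;12,1",
  "2010;64.84;\"12.15\""
)

# Read everything as text first, so that mixed decimal marks do not
# turn a whole column into character data by accident.
raw <- read.table(text = lines, sep = ";", header = TRUE,
                  colClasses = "character", quote = "\"",
                  strip.white = TRUE)

# Replace a decimal comma by a period, then convert to numbers.
to_num <- function(x) as.numeric(sub(",", ".", trimws(x), fixed = TRUE))
dat <- data.frame(lapply(raw, to_num))
dat$t <- as.integer(dat$t)   # years have to be integers
dat
```

For a real file, text = lines would be replaced by the file name as the first argument of read.table().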

 

Parameters

Use of the parameter data is explained above. The remaining parameters are optional and usually not required. A brief overview follows.

·    map (logical variable indicating whether the observations should be shown on a map; the default is map=TRUE; to switch off map view, write map=FALSE),

·    rmax (initial estimate of the maximum possible distance of expansion; the default is infinity; other values can be specified in kilometres),

·    quiet (turns off messages and warnings if TRUE; the default is quiet=FALSE),

·    front (integer specifying how the expansion front is defined; options available so far are front=0, which estimates expansion from the entire population's average distance from the first observation at any time; front=1, which defines the expansion front as the average of the distances that are at least as large as the previous year's expansion front; front=2, which defines the expansion front as the average of the distances that are at least as large as the single largest distance in the previous year; front=3, which defines the expansion front as the largest distance that is at least as large as the maximum distance observed in the previous year; the default is front=2, which is a rather robust definition under many conditions),

·    new.obs (logical variable indicating whether occurrences are only reported in the year of their first observation, and should be assumed to be present also in later years; if so, new.obs=TRUE, which is the default; the parameter is ignored if front > 0),

·    type (integer or vector of integers between 0 and 3, indicating the functional form that is fitted to the data; type=0 fits a linear model, type=1 a truncated model, type=2 an asymptotic model, and type=3 a sigmoid model; if a vector is provided, the respective models are tested in turn, and the results for the model with the lowest AIC are presented; the default is type=0),

·    output (logical variable indicating whether the function should produce an output consisting of a list of model estimates; defaults to output=FALSE),

·    outdata (logical variable indicating whether the function should produce an output consisting of a data frame with the locations transformed to x and y distances in kilometres after applying an azimuthal projection; defaults to outdata=FALSE),

·    save (logical variable or character string indicating whether the function should save the data to a file after transforming them to latitudes and longitudes; using this option, it is sufficient to transform MGRS or UTM coordinates once, while later calls can directly load the transformed data; this is done if save is either TRUE or a file name; the default is save=FALSE),

·    det (logical value indicating whether details from all models should be displayed when more than one model is tested; the default is det=FALSE),

·    the remaining parameters allow overriding some default settings (xy: logical value indicating whether the data are already transformed to an azimuthal projection and expressed in kilometres in x and y direction, using the columns x and y; phi0: latitude of the centre of the azimuthal projection; lambda0: longitude of the centre of the azimuthal projection; language: switches between English and Norwegian output), affect the graphical representation of the course of expansion (alpha: alpha level of the confidence intervals displayed, defaults to 0.05; header: header for the graph; xlab: label for the x axis; ylab: label for the y axis; ylim: factor by which the y axis is stretched, defaults to 1.5; hmax: number of years shown in addition to the ones with data, defaults to 20; ...: further graphical parameters if desired), or allow parameterisation in Norwegian (where kart = map, hold.munn = quiet, tittel = header, ny.obs = new.obs, typ = type, utmat = output, utdata = outdata, lagre = save, and spraak = language).
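A call that combines several of these parameters might look as follows. The parameter names and values are those described in the list above; the file path is the example path used earlier on this page, and the list name args is an arbitrary choice. The call itself is only made if the script has been loaded.

```r
# Arguments for a call that switches off the map, tests all four model types
# and displays details for each of them. Replace the path with the location
# of your own data file.
args <- list(
  data  = "c:/aliens/data/species6.sdv",
  map   = FALSE,   # no map view
  front = 2,       # the default front definition, stated explicitly
  type  = 0:3,     # linear, truncated, asymptotic and sigmoid models
  det   = TRUE     # show details for every model, not only the best one
)

# Same as expansion(data = "...", map = FALSE, front = 2, type = 0:3, det = TRUE).
if (exists("expansion", mode = "function")) {
  do.call(expansion, args)
}
```

Keeping the arguments in a list makes it easy to rerun the same analysis with a single setting changed, e.g. args$front <- 3.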

 

Output

By default, the function does not return a value; its output is displayed directly on the screen instead. The output starts with a short summary of the input data (which can be suppressed by setting quiet=TRUE). The remainder consists of:

·    estimates of the expansion rate (v ± 95% confidence intervals) and of the standard deviation of the spread distance s, based on the assumption of no observation error (i.e., all variation is assumed to be due to process noise);

·    estimates (± 95% confidence intervals) based on the assumption of no process noise (i.e., all variation is assumed to be due to observation error). Depending on the model chosen, the following parameters may be estimated:

·     expansion rate v in kilometres per year,

·     the time t0 of first introduction as a year,

·     the maximum expansion distance K in kilometres,

·     the time (year) tx specifying the inflection point of a sigmoid curve,

·     the standard deviation s of the spread distance,

·     the parameter b, which describes the increase of the variance observed with time,

·     Akaike's Information Criterion AIC.

NB: The output does not use thousands separators. Periods (or commas) in numbers thus signify decimal marks.

 

About the program

The R-script EXPANSION has been written by Hanno Sandvik with contributions by Jarle Tufto at the Centre for Biodiversity Dynamics (CBD), Norwegian University of Science and Technology (NTNU).

The description on this page refers to version 1.4 (June 2012) and is retained for documentation purposes only. It has been superseded by version 2.0 as of December 2016.

In case of questions or comments, please contact Hanno Sandvik.