Package version: 0.6.1
Phylogenetic Eigenvector Maps (PEM) is a method to perform phylogenetic modelling. Phylogenetic modelling consists in modelling trait evolution and predicting trait values using phylogeny as an explanatory factor (Guénard et al. 2013). Phylogenetic modelling allows one to predict trait values when it is difficult or impractical to obtain them, for instance when species are rare, extinct, or when information is needed for several species and trait values are only available for a relatively small number of them (Guénard et al. 2011, 2014).
To apply phylogenetic modelling, one needs to have a set of species with known phylogeny and trait values (hereafter referred to as the model species) as well as to know the locations, with respect to the phylogeny of the models species, of the species for which trait values are being predicted (hereafter referred to as the target species). Phylogenetic modelling can be performed jointly with trait correlation modelling: it is possible to use other traits with known (or estimable) values for the target species to help predict a trait of interest. Phylogenetic trees being acyclic graphs, I will hereby prefer terms belonging to the graph theory over terms phylogeneticists may be more familiar with. Therefore I will use edge over branches and vertex over root, node or tip; safe in cases where I want to be specific about what a vertex represents.
The Phylogenetic eigenvector maps (PEM) expression
Statistical maps are a type of geographic map representing the values or states of a variable across space <https://encyclopedia2.thefreedictionary.com/statistical+map).
In a paper entitled “The interpretation of statistical maps”, (Moran 1948) described tests of significance of the spatial relationships among values of qualitative variables on statistical maps.
“Moran’s eigenvector maps” (MEM), an expression coined by Dray et al. (2006), describes the variation of spatial eigenvectors whose eigenvalues are proportional to Moran’s I spatial autocorrelation statistics (Moran 1950) of the corresponding eigenvectors. Spatial eigenvectors are mathematical constructs that describe the variation of quantities across space (or time) at different spatial scales. They were originally called Principal Coordinates of Neighbour Matrices by Borcard and Legendre (2002).
Phylogenetic eigenvector maps (PEM) (Diniz-Filho et al. 2012; Guénard et al. 2013) are sets of eigenfunctions describing the structure of a phylogenetic graph, which represents either a Darwinian phylogenetic tree or a reticulated tree, i.e., a phylogenetic tree with reticulations. The various eigenvectors describe the variation across a phylogeny at different phylogenetic scales.
Contrary to MEM, the eigenvalues in PEM are not proportional to Moran’s I autocorrelation coefficients of the corresponding eigenvectors.
The PEM work flow consists in
calculating the influence matrix of the graph,
specifying a model of trait evolution along the edges of the phylogenetic tree,
calculating the left eigenvectors of the weighted and centred influence matrix and
use these eigenvectors as descriptors (Guénard et al. 2013).
An R language implementation of that approach is found in package MPSEM. MPSEM was developed to make the aforementioned process as seamless as possible. It is a work in progress; I welcome anyone to provide relevant suggestions and constructive remarks aimed at making MPSEM a better, more efficient and user-friendly, interface to phylogenetic modelling.
Assuming package MPSEM is installed, the first step to calculate a PEM is to load package MPSEM, which depends on packages ape and MASS:
## Loading required package: ape
## Loading required package: magrittr
## Loading required package: sf
## Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE
For the present tutorial, we will use the data set perissodactyla from R package caper. These data from Purvis and Rambaut (1995) are loaded into your R workspace as follows:
Fig. 1. The phylogenetic tree used for this example.
The perissodactyla data set contains
perissodactyla.tree, a phylogenetic tree encompassing 18
odd-toed ungulate species (Fig. 1) and perissodactyla.data,
a data frame containing information four trait for these species,
namely, the (\(\log_{10}\)) of the
adult female weight, gestation weight, and neonatal weight, as well a
fourth categorical trait telling whether of not the species has
territorial behaviour.
| Binomial | log.female.wt | log.gestation.length | log.neonatal.wt | Territoriality |
|---|---|---|---|---|
| Ceratotherium simum | 6.262 | 2.690 | 4.903 | Yes |
| Dicerorhinus sumatrensis | 5.910 | 2.602 | 4.544 | NA |
| Diceros bicornis | 6.183 | 2.653 | 4.699 | No |
| Equus africanus | 5.439 | 2.562 | 4.398 | Yes |
| Equus burchelli | 5.477 | 2.562 | 4.477 | No |
| Equus grevyi | 5.653 | 2.591 | 4.602 | Yes |
| Equus hemionus | 5.462 | 2.562 | 4.398 | Yes |
| Equus zebra | 5.462 | 2.562 | 4.398 | No |
| Rhinoceros sondaicus | 6.153 | 2.681 | 4.699 | NA |
| Rhinoceros unicornis | 6.237 | 2.681 | 4.845 | Yes |
| Tapirus indicus | 5.455 | 2.599 | 3.919 | Yes |
| Tapirus pinchaque | 5.377 | NA | 3.699 | NA |
| Tapirus terrestris | 5.332 | 2.601 | 3.760 | NA |
There are four species for which territoriality is missing: the Sumatran rhinoceros (Dicerorhinus sumatrensis), the Javan rhinoceros (Rhinoceros sondaicus), the mountain tapir (Tapirus pinchaque), and the South American tapir (Tapirus terrestris). For the benefit of this example, we performed a quick search over the Internet to attempt at completing this data. We came up with the following completion data:
These data are not the result of a professional enquiery and thus must not be used outside of the present example. Poeple interested in the behaviour of these species should seek advice from a competent scientist.
Before going any further, it is important to make sure that the
species in the tree object are the same and presented in the same order
as the ones in the data table. Glancing at the data table, species
clearly cannot match since the latter feature information for only 13 of
the 18 species in the tree. We will therefore match the tip labels of
the original tree in the data table using the binary (Latin) species
names in a character vector spmatch. When no matching
element from the data table is found, the value NA appears
at the corresponding position in spmatch. We can therefore
use these missing values to reference the species that can be dropped
from the tree using function drop.tip() from package
ape as follows:
match(
perissodactyla.tree$tip.label,
perissodactyla.data[,1L]
) -> spmatch
drop.tip(
perissodactyla.tree,
perissodactyla.tree$tip.label[is.na(spmatch)]
) -> perissodactyla.treeNow that the data match the tree in terms of species content, we then need to make sure that species ordering also matches as follows:
## [,1] [,2]
## [1,] "Dicerorhinus sumatrensis" "Ceratotherium simum"
## [2,] "Rhinoceros sondaicus" "Dicerorhinus sumatrensis"
## [3,] "Rhinoceros unicornis" "Diceros bicornis"
## [4,] "Diceros bicornis" "Equus africanus"
## [5,] "Ceratotherium simum" "Equus burchelli"
## [6,] "Tapirus terrestris" "Equus grevyi"
## [7,] "Tapirus pinchaque" "Equus hemionus"
## [8,] "Tapirus indicus" "Equus zebra"
## [9,] "Equus grevyi" "Rhinoceros sondaicus"
## [10,] "Equus burchelli" "Rhinoceros unicornis"
## [11,] "Equus zebra" "Tapirus indicus"
## [12,] "Equus africanus" "Tapirus pinchaque"
## [13,] "Equus hemionus" "Tapirus terrestris"
Since they do not, we need to recalculate spmatch with
the new, reduced, tree and re-order the data accordingly:
match(
perissodactyla.tree$tip.label,
perissodactyla.data[,1L]
) -> spmatch
perissodactyla.data[spmatch,] -> perissodactyla.data
all(perissodactyla.tree$tip.label == perissodactyla.data[,1L])## [1] TRUE
The last code line is just a last check to guarantee that all species names are matching. As a last step before we are done with data manipulation, I will put the binary names in place of the row names and delete the table’s first row:
perissodactyla.data[,1L] -> rownames(perissodactyla.data)
perissodactyla.data[,-1L] -> perissodactyla.dataThe data now appear as follows:
| log.female.wt | log.gestation.length | log.neonatal.wt | Territoriality | |
|---|---|---|---|---|
| Dicerorhinus sumatrensis | 5.910 | 2.602 | 4.544 | No |
| Rhinoceros sondaicus | 6.153 | 2.681 | 4.699 | No |
| Rhinoceros unicornis | 6.237 | 2.681 | 4.845 | Yes |
| Diceros bicornis | 6.183 | 2.653 | 4.699 | No |
| Ceratotherium simum | 6.262 | 2.690 | 4.903 | Yes |
| Tapirus terrestris | 5.332 | 2.601 | 3.760 | Yes |
| Tapirus pinchaque | 5.377 | NA | 3.699 | Yes |
| Tapirus indicus | 5.455 | 2.599 | 3.919 | Yes |
| Equus grevyi | 5.653 | 2.591 | 4.602 | Yes |
| Equus burchelli | 5.477 | 2.562 | 4.477 | No |
| Equus zebra | 5.462 | 2.562 | 4.398 | No |
| Equus africanus | 5.439 | 2.562 | 4.398 | Yes |
| Equus hemionus | 5.462 | 2.562 | 4.398 | Yes |
As previously announced, I use the vocabulary of the graph theory when describing PEM: a tree is a (directed) graph, a branch is an edge, and the root, nodes, and tips are vertices. PEM allows one to specify a model of trait evolution along the edges of the tree. The trait evolution model is a power function of the edge lengths, with parameters \(a\) and \(\psi\) describing the shape of the relationship between the edge lengths and trait evolution rate:
\[ w_{a,\psi}(\phi_{j})= \begin{cases} \psi\phi^{\frac{1-a}{2}} & \phi_{j}>0\\ 0 & \phi_{j}=0, \end{cases} \] where \(a\) is the steepness parameter describing how abrupt the changes in trait values occur with time following branching, \(\psi\) is the evolution rate of the trait, and \(\phi_{j}\) is the length of edge \(j\) (Guénard et al. 2013).
Fig. 2. Output of the edge weighting function for different sets of
parameters \(a\) and \(\psi\).
As the steepness parameter increases, the weight assigned to a given edge increases more sharply with respect to the phylogenetic distance (or evolutionary time; Fig. 2). In the context of PEM, the edge weight represent the relative rate of evolution of the trait; the greater the edge weight, the greater the trait change along that edge. When \(a=0\), trait evolution is neutral and therefore proceeds by random walk along edges. When \(a=1\), edge weights no longer increase as a function of edge lengths. That situation corresponds to the scenario in which trait evolution is driven by the strongest possible natural selection: following a speciation event, trait either change abruptly (directional selection) at the vertex or do not change at all (stabilizing selection).
Also, the shape parameters may differ for different parts of the phylogenetic tree or network. For instance, the trait may have evolved neutrally at a steady rate from the base of a tree up to a certain node, and then, may have been subjected to different levels of selection pressure and various evolution rate from some of the nodes towards their tips.
The first step to build a PEM is to convert the
phylogenetic tree. The is done by giving the tree to function
as.graph() as follows:
Here’s a snipet showing how the graph container used by MPSEM stores graph information:
## Classes 'graph' and 'data.frame': 24 obs. of 1 variable:
## $ species: logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## - attr(*, "edge")='data.frame': 23 obs. of 3 variables:
## ..$ from : int [1:23] 14 15 16 17 17 18 18 16 19 19 ...
## ..$ to : int [1:23] 15 16 17 1 18 2 3 19 4 5 ...
## ..$ distance: int [1:23] 9 4 2 2 1 1 1 3 1 1 ...
The object consists in a data frame containing the vertex
information, together with another data frame containing the edge
information and stored as an attribute called edge. The
vertex data frame may have \(0\) or
more columns containing vertex information and its rows reference the
vertices (minimally, their name labels, which are stored as the row
names of the data frame). In the present case, a logical vector created
by as.graph() and called $species stores
whether a given vertex represents a species (i.e., it is a tip)
or not (i.e., it is a node). The edge attribute is
a data frame containing information about the edges of the graph, namely
the indices of their origin and destination vertices (the first two
columns), and an arbitrary number of supplementary columns storing any
other edge properties. In the present case, a numeric vector created by
as.graph() and called $distance stores the
phylogenetic distances (\(\phi_{j}\)),
which, in this example, correspond to the branch lengths of
perissodactyla.tree.
Beginning on MPSEM version 0.6, PEM are generated
using function PEM (before version 0.6, a defunct function
called PEM.build() was used for that purpose). That
function minimally takes a graph-class object and outputs a PEM-class
object. By default, the whole graph is assumed to have a single value of
the steepness parameter (i.e., assuming neutral evolution;
\(a=0\) for all the graph’s edges) and
homogeneous evolution rate (i.e., \(\psi = 1\) for all the graph’s edges).
Function PEM() is called as follows:
## Calculate the PEM object using the default:
perissodactyla.PEM <- PEM(gr1)
## Show the PEM object:
perissodactyla.PEM## A phylogenetic eigenvector map (PEM) for 13 species:
## ...,
## obtained from the following phylogenetic graph:
##
## A graph with 24 vertices and 23 edges
## ----------------------------------
## Vertex labels: Dicerorhinus sumatrensis, Rhinoceros sondaicus, Rhinoceros unicornis, Diceros bicornis, Ceratotherium simum, ... + 14 more ..., N9, N10, N11
## Edge labels: E1, E2, E3, E4, E5, ... + 13 more ..., E21, E22, E23
## Vertex information: species
## Available edge information: distance, a, psi
Heterogeneous evolutionary processes are defined by two linear sub
models on the graph’s edges; one for parameter \(a\) and one for parameter \(\psi\). Each of these models is itself
defined using a model matrix and a parameter vector. These model
matrices have as many rows as the number of edges, and as many columns
as the numbers of parameters. The model matrix for \(a\) (\(\mathbf{M}_a\), argument mm_a)
needs at least a single all-ones column working together with a length
one parameter vector (\(\mathbf{b}_a\))
to represent a constant value to apply to all edges (this is the default
for mm_a). The value of \(a\) for any edge \(i\) is obtained from the linear combination
\(\eta_i = \mathbf{M}_a \mathbf{b}_a\)
as follows:
\[ a_i = \frac{1}{1 + e^{-\eta_i}}, \]
which constrains the value of \(a_i\) between \(0\) and \(1\). The default value for
mm_a is a single-column all-ones matrix, whereas the single
value for \(\mathbf{b}_a\) is \(-\infty\), which makes \(a_i = 0\) for all edges. The linear sub
model for \(\psi\) is defined in a
similar way, with the exception that it needs no all-ones column. The
value of \(\psi\) for any edge \(i\) is obtained from the linear combination
\(\eta_i = \mathbf{M}_\psi
\mathbf{b}_\psi\) as follows:
\[ \psi_i = \frac{2}{1 + e^{-\eta_i}}, \]
which constrains the values of \(\psi\) between \(0\) and \(2\). The model matrix for \(\psi\) (\(\mathbf{M}_{\psi}\), argument
mm_psi) may be a zero-column matrix working together with a
length zero parameter vector (\(\mathbf{b}_{\psi}\)) implying a constant
value \(\eta=0\) to apply to all
edges.
The default behavior of function PEM() for
mm_psi to be a zero-column matrix and b_psi to
be a zero-length vector thereby applying a constant value of \(\psi=1\) for all the edges of the
graph.
The elements of the model matrices are binary values (i.e.,
taking either values \(0\) or \(1\)) which can be defined manually or using
functions isUnderVertex() or isUnderEdge().
Function isUnderVertex() generates sets of vectors with the
value \(1\) for all edges under a given
vertex and \(0\) elsewhere. Likewise
function isUnderEdge() generates sets of vectors with the
value \(1\) for all edges under a given
edge and \(0\) elsewhere.
The following figure will help us figure out the indices of the edges involved:
perissodactyla.tree -> tree
sprintf("N%s",1L:tree$Nnode) -> tree$node.label
par(mar=c(2,2,2,2))
plot(tree, show.node.label=TRUE)
edgelabels(
sprintf("E%d",1L:nrow(tree$edge)),
edge=1L:nrow(tree$edge),
bg="white",
cex=0.75
)
Fig. 3. The labelled training species tree for this example.
For instance, let us define a model matrix with a contrast for all species of genus Equus, which are located under vertex N8, and a second contrast vector for the complete lineage leading to species of genus Tapirus from the rest of the tree, which are all found under edge E9 as follows:
## N9 E11
## E1 0 0
## E2 0 0
## E3 0 0
## E4 0 0
## E5 0 0
## E6 0 0
## E7 0 0
## E8 0 0
## E9 0 0
## E10 0 0
## E11 0 1
## E12 0 1
## E13 0 1
## E14 0 1
## E15 0 1
## E16 0 0
## E17 1 0
## E18 1 0
## E19 1 0
## E20 1 0
## E21 1 0
## E22 1 0
## E23 1 0
A PEM is generated on the basis of this model matrix
using function PEM() as follows:
## Calculate the PEM:
PEM(
x = gr1,
a = c(-6,5,8), ## steepness sub model parameters
psi = c(-1,1), ## evolution rate sub model parameters
mm_a = cbind(1, mm), ## model matrix: steepness sub model
mm_psi = mm ## model matrix: evolution rate sub model
) -> perissodactyla.PEM
## Show the PEM object:
perissodactyla.PEM## A phylogenetic eigenvector map (PEM) for 13 species:
## ...,
## obtained from the following phylogenetic graph:
##
## A graph with 24 vertices and 23 edges
## ----------------------------------
## Vertex labels: Dicerorhinus sumatrensis, Rhinoceros sondaicus, Rhinoceros unicornis, Diceros bicornis, Ceratotherium simum, ... + 14 more ..., N9, N10, N11
## Edge labels: E1, E2, E3, E4, E5, ... + 13 more ..., E21, E22, E23
## Vertex information: species
## Available edge information: distance, a, psi
## Extract the graph from within the PEM-class object:
gr <- perissodactyla.PEM$graph()
## Show the steepness and evolution rate at the edges:
round(edge(gr)[,c("a","psi")],3)## a psi
## E1 0.002 1.000
## E2 0.002 1.000
## E3 0.002 1.000
## E4 0.002 1.000
## E5 0.002 1.000
## E6 0.002 1.000
## E7 0.002 1.000
## E8 0.002 1.000
## E9 0.002 1.000
## E10 0.002 1.000
## E11 0.881 1.462
## E12 0.881 1.462
## E13 0.881 1.462
## E14 0.881 1.462
## E15 0.881 1.462
## E16 0.002 1.000
## E17 0.269 0.538
## E18 0.269 0.538
## E19 0.269 0.538
## E20 0.269 0.538
## E21 0.269 0.538
## E22 0.269 0.538
## E23 0.269 0.538
In this example, it is assumed that the steepness and evolution rate are \(a \approx 0.269\) and \(\psi \approx 0.538\) among genus Equus (E15–E21), \(a \approx 0.881\) and \(\psi \approx 1.462\) among genus Tapirus (E9–E13), and \(a \approx 0.002\) and \(\psi = 1\) from the root of the tree up to the vertex where the two latter genera begin as well as among the other genera (E1–E8 and E14). The models may involve more complex designs these two additive terms (eg., nested factors, interaction terms) provided that the parameter values are known or that sufficient data is available to estimate the parameters.
In addition to the phylogenetic graph, to evolution model matrices,
and two parameter vectors, function PEM() has an argument
d, the name of the edge property where the phylogenetic
distances are stored and sp, the name of the vertex
property specifying what vertex is a species. The default values for
d and sp are those produced by
as.graph(), and can therefore be omitted in most cases. The
pem-class object that is used to store PEM information
is rather complex and we will hereby refrain from browsing through it.
Methods as.matrix and as.data.frame enables
one to extract the eigenvectors from the PEM-class object.
For a set of \(n\) species, these
methods return a matrix encompassing at most \(n - 1\) column vectors that can be used in
models to represent phylogenetic structure in traits. Here the
phylogenetic patterns of variation described by two eigenvectors of the
PEM we calculated above:
tmp <- par(no.readonly = TRUE)
par(mfrow=c(1,2), mar=c(1.1,1.1,2.6,0.1))
## Singular vectors are extracted using the as.matrix method:
perissodactyla.U <- as.matrix(perissodactyla.PEM)
plot(perissodactyla.tree, x.lim=60, cex=1.0)
par(mar=c(1.1,0.1,2.6,1.1))
plot(NA, xlim=c(1,ncol(perissodactyla.U)), ylim=c(1,nrow(perissodactyla.U)),
ylab="", xlab="", axes=FALSE, cex=1.5)
axis(3, at=1:ncol(perissodactyla.U), tick=FALSE, cex.axis=1.1,
label=parse(text=sprintf("bold(u)[%d]",1:ncol(perissodactyla.U))))
absrng <- max(abs(perissodactyla.U))
for(i in 1:ncol(perissodactyla.U))
points(
x = rep(i,nrow(perissodactyla.U)),
y = 1:nrow(perissodactyla.U),
cex = 4*abs(perissodactyla.U[,i])/absrng,
bg = grey(c(0,1))[1.5 + 0.5*sign(perissodactyla.U[,i])],
pch=22
)Fig. 4. Example of a set of 12 eigenvectors obtained from the exemplary data set. The size of the markers scales linearly with the absolute value of the loadings, whereas the background colors are showing their signs (white: negative loadings, black: positive loadings).
For instance, the pattern shown by the first eigenvector essentially contrasts Equids and the other odd-toed ungulate species, whereas the pattern shown by the second eigenvector essentially contrasts tapirs and Rhinocerotids. The other \(10\) eigenvectors show various other smaller-scale phylogenetic patterns.
Class PEM has a method called evolution.model to
estimate the best set of weighting function parameters to use for
modelling, which are often unknown to the user. From the initial values
provided when creating the PEM-class object, method
evolution.model estimates the parameter empirically using
restricted maximum likelihood from one or more response variables
provided to it through its argument y and, optionally, from
a set of auxiliary traits (argument x).
Method
evolution.modelreplaces the defunct functionPEM.fitSimple(), enabled one to estimate the value of a single \(a\) value, with \(\psi=1\) for a tree. The framework developed since MPSEM version 0.6 enables more flexibility in specifying models with heterogeneous \(a\) and \(\psi\) for various parts of the phylogenetic network.
Method evolution.model requires a response variable
(here the neonate weight) that will be used to optimize the parameter
vectors \(\mathbf{b}_a\) and (as
required) \(\mathbf{b}_\psi\) and is
called as follows:
perissodactyla.PEM <- PEM(gr1, a = 0) ## A simpler, single-steepness, model
evolution.model(
object = perissodactyla.PEM,
y = perissodactyla.data[,"log.neonatal.wt"]
) -> opt
opt## $par
## $par$a
## a
## -3.092961
##
## $par$psi
## NULL
##
##
## $value
## [1] -6.206356
##
## $counts
## function gradient
## 11 10
##
## $convergence
## [1] 0
##
## $message
## NULL
Method evolution.model returns a list with information
about the optimization process. Alternatively, the response trait
(argument y) and any other auxiliary trait (argument
x; here the female weight and territoriality) may be
prepared using function model.data and then used as
follows:
perissodactyla.PEM_aux <- PEM(gr1, a = 0)
## Data preparation:
model.data(
formula = log.neonatal.wt~log.female.wt+Territoriality,
data = perissodactyla.data
) -> mdat
evolution.model(
object = perissodactyla.PEM_aux,
y = mdat$y,
x = mdat$x
) -> opt_aux
opt_aux## $par
## $par$a
## a
## -7.854558
##
## $par$psi
## NULL
##
##
## $value
## [1] -26.78732
##
## $counts
## function gradient
## 17 16
##
## $convergence
## [1] 0
##
## $message
## NULL
Function model.data produces a list with the response
traits (element $y) and auxiliary traits (element
$x) specified by a formula, with the data provided using
data frame, not unlike what functions like lm or
glm perform internally prior to model estimation. It is
noteworthy that any call to evolution.model automatically
updates the PEM contained in the PEM object provided to the method and
makes the generalized least squares estimate of the response trait
variance (or residual variance with respect to the auxiliary trait)
available from within the PEM object. Response trait variance estimates
can later be accessed as follows:
## y
## 0.002198552
It is noteworthy that estimates of the steepness parameter (stored as
element $optim\$par of the PEM objects)
and, consequently, the resulting phylogenetic eigenvectors, will be
different depending on the use of auxiliary traits. In the example
above, for instance, \(\eta_a\) was
estimated to -3.09 in the first call, when no auxiliary trait is
involved and to -7.85 in the second call, when the female weight is used
as an auxiliary trait.
The location of any target species is some distance from its latest
common ancestor (LCA) with respect to the species of the graph, which
itself is found either at a vertex, or on an edge at a specific distance
from its origin. Class PEM has a predict method that
returns prediction scores for any graph location, be it at a vertex or
at any point along an edge. This method internally calls one of two
embedded functions for any target. These embedded functions are called
$predictVertex() and $predictEdge() and return
the PEM prediction scores for any vertex or edge,
respectively. PEM method predict is called as follows:
## Model without an auxiliary trait.
predict(
object = perissodactyla.PEM,
newdata = data.frame(
row.names = sprintf("target_%d",1:6),
ref = c(1,3,5,2,3,4),
dist = c(NA,0.1,NA,NA,0.4,1.1),
lca = c(0,2,2.3,0,2,1.1)
)
)## U_1 U_2 U_3 U_4 U_5
## target_1 -0.2334072 -0.2644587 0.011414391 -0.30411875 5.551115e-16
## target_2 -0.2113182 -0.1830799 0.004433838 -0.01371500 2.081668e-17
## target_3 -0.2291860 -0.2414216 0.007327834 0.55782540 1.249001e-16
## target_4 -0.2372192 -0.2797923 0.012915017 -0.37705407 2.157996e-15
## target_5 -0.2158627 -0.1999588 0.005901288 -0.07587468 4.163336e-17
## target_6 -0.2317470 -0.2584575 0.010916147 -0.28432807 4.371503e-16
## U_6 U_7 U_8 U_9 U_10
## target_1 0.0005361094 0.003396832 -0.84025547 -6.054886e-16 -7.672726e-17
## target_2 -0.0004238857 -0.008886551 -0.07313661 -1.336439e-16 -1.738718e-16
## target_3 0.0014502567 0.013188572 0.02945158 1.294640e-01 6.196581e-01
## target_4 0.0008842668 0.010612812 0.38219140 6.782104e-01 -4.242322e-02
## target_5 -0.0002070819 -0.005820489 -0.08768594 -1.267050e-16 -1.599940e-16
## target_6 0.0004802361 0.002928223 -0.66174791 -4.805885e-16 -9.060505e-17
## U_11 U_12
## target_1 -3.101458e-17 -5.315848e-17
## target_2 -1.212202e-16 -2.960198e-16
## target_3 -1.160240e-01 -2.929185e-01
## target_4 8.155275e-02 1.777077e-01
## target_5 -7.958684e-17 -2.682642e-16
## target_6 -1.713679e-17 -8.785295e-17
## attr(,"vf")
## target_1 target_2 target_3 target_4 target_5 target_6
## 0.00000000 0.02149769 0.02457284 0.00000000 0.02149769 0.01213451
## Model with two auxiliary traits.
predict(
object = perissodactyla.PEM_aux,
newdata = data.frame(
row.names = sprintf("target_%d",1:6),
ref = c(1,3,5,2,3,4),
dist = c(NA,0.1,NA,NA,0.4,1.1),
lca = c(0,2,2.3,0,2,1.1)
)
)## U_1 U_2 U_3 U_4 U_5
## target_1 -0.2325394 -0.2650544 0.010594892 2.706169e-16 -0.30985302
## target_2 -0.2114436 -0.1847421 0.004332260 3.469447e-17 -0.01374785
## target_3 -0.2288632 -0.2444297 0.007426229 -6.106227e-16 0.55679988
## target_4 -0.2358233 -0.2786599 0.011792224 2.532696e-15 -0.37635283
## target_5 -0.2157070 -0.2010899 0.005621500 9.020562e-17 -0.07575331
## target_6 -0.2309132 -0.2589683 0.010133280 2.567391e-16 -0.28896758
## U_6 U_7 U_8 U_9 U_10
## target_1 0.0005747580 -0.003816806 -0.83821982 -1.186209e-15 1.432476e-18
## target_2 -0.0004027899 0.008331341 -0.06772762 1.252424e-16 1.471492e-16
## target_3 0.0014242852 -0.012122976 0.02687793 1.714914e-02 -5.829325e-01
## target_4 0.0008801433 -0.009850387 0.38462275 7.034840e-01 5.119741e-02
## target_5 -0.0001889849 0.005436984 -0.08105394 3.503681e-17 2.026604e-16
## target_6 0.0005139699 -0.003274218 -0.65115514 -9.294694e-16 8.469920e-17
## U_11 U_12
## target_1 5.112876e-17 -2.798141e-17
## target_2 -4.382395e-18 3.675355e-16
## target_3 -3.923367e-01 7.724943e-02
## target_4 -4.799870e-02 -1.360734e-02
## target_5 1.066399e-16 2.426355e-16
## target_6 1.343955e-16 -3.492030e-17
## attr(,"vf")
## target_1 target_2 target_3 target_4 target_5 target_6
## 0.000000000 0.004395922 0.005055036 0.000000000 0.004395922 0.002418318
Argument newdata is a three-column data frame with the
following contents:
The method returns a matrix of PEM scores to be used
as predictors with an attribute called "vf" giving the
prediction variance increment associated with the distance between the
LCA and the target.
To model trait values, PEM are used as descriptors
in other modelling method. Any suitable method can be used. The
eigenvectors from the PEM are obtained from method
as.matrix or as.data.frame as follows:
## U_1 U_2 U_3 U_4
## Dicerorhinus sumatrensis -0.2334072 -0.264458733 0.011414391 -3.041187e-01
## Rhinoceros sondaicus -0.2372192 -0.279792251 0.012915017 -3.770541e-01
## Rhinoceros unicornis -0.2372192 -0.279792251 0.012915017 -3.770541e-01
## Diceros bicornis -0.2291860 -0.241421639 0.007327834 5.578254e-01
## Ceratotherium simum -0.2291860 -0.241421639 0.007327834 5.578254e-01
## Tapirus terrestris -0.1957794 0.484237873 0.024570194 -1.833539e-02
## Tapirus pinchaque -0.1957794 0.484237873 0.024570194 -1.833539e-02
## Tapirus indicus -0.1896915 0.431072788 0.018511825 -1.020231e-02
## Equus grevyi 0.3794008 -0.027351475 0.389935401 7.601483e-05
## Equus burchelli 0.3794008 -0.027351475 0.389935401 7.601483e-05
## Equus zebra 0.3619129 -0.022734984 0.231934991 1.449725e-05
## Equus africanus 0.3133767 -0.007612044 -0.565679048 -5.358669e-03
## Equus hemionus 0.3133767 -0.007612044 -0.565679048 -5.358669e-03
## U_5 U_6 U_7
## Dicerorhinus sumatrensis 1.755698e-15 0.0005361094 0.0033968320
## Rhinoceros sondaicus 1.631083e-15 0.0008842668 0.0106128116
## Rhinoceros unicornis -3.043460e-15 0.0008842668 0.0106128116
## Diceros bicornis -3.338364e-16 0.0014502567 0.0131885715
## Ceratotherium simum -2.396862e-18 0.0014502567 0.0131885715
## Tapirus terrestris -6.231239e-16 0.0025385830 0.3864071508
## Tapirus pinchaque 6.094650e-16 0.0025385830 0.3864071508
## Tapirus indicus -2.560450e-17 -0.0003589066 -0.8371022320
## Equus grevyi -1.763404e-17 0.3554350121 0.0003057891
## Equus burchelli -3.160065e-16 0.3554350121 0.0003057891
## Equus zebra -3.174918e-17 -0.8589360503 0.0036789739
## Equus africanus -7.071068e-01 0.0690713051 0.0044988900
## Equus hemionus 7.071068e-01 0.0690713051 0.0044988900
## U_8 U_9 U_10
## Dicerorhinus sumatrensis -0.8402554692 -2.880504e-16 -2.747266e-16
## Rhinoceros sondaicus 0.3821913996 6.782104e-01 -4.242322e-02
## Rhinoceros unicornis 0.3821913996 -6.782104e-01 4.242322e-02
## Diceros bicornis 0.0294515823 -1.294640e-01 -6.196581e-01
## Ceratotherium simum 0.0294515823 1.294640e-01 6.196581e-01
## Tapirus terrestris 0.0020576693 -1.508116e-01 3.366809e-01
## Tapirus pinchaque 0.0020576693 1.508116e-01 -3.366809e-01
## Tapirus indicus 0.0091255490 4.793189e-17 -1.244279e-16
## Equus grevyi 0.0002628681 2.292487e-02 -2.949654e-02
## Equus burchelli 0.0002628681 -2.292487e-02 2.949654e-02
## Equus zebra 0.0007824705 -1.766083e-16 -3.452116e-16
## Equus africanus 0.0012102056 2.535610e-15 -4.108709e-16
## Equus hemionus 0.0012102056 -2.404882e-15 -1.333152e-16
## U_11 U_12
## Dicerorhinus sumatrensis 8.489236e-17 5.409366e-17
## Rhinoceros sondaicus 8.155275e-02 1.777077e-01
## Rhinoceros unicornis -8.155275e-02 -1.777077e-01
## Diceros bicornis 1.160240e-01 2.929185e-01
## Ceratotherium simum -1.160240e-01 -2.929185e-01
## Tapirus terrestris 1.648704e-01 5.802755e-01
## Tapirus pinchaque -1.648704e-01 -5.802755e-01
## Tapirus indicus 4.363163e-17 -1.791379e-16
## Equus grevyi -6.728338e-01 2.142407e-01
## Equus burchelli 6.728338e-01 -2.142407e-01
## Equus zebra 2.927527e-17 -1.229610e-17
## Equus africanus 1.805253e-16 -1.791994e-17
## Equus hemionus 1.140290e-19 3.759121e-17
## U_1 U_2 U_3 U_4
## Dicerorhinus sumatrensis -0.2334072 -0.264458733 0.011414391 -3.041187e-01
## Rhinoceros sondaicus -0.2372192 -0.279792251 0.012915017 -3.770541e-01
## Rhinoceros unicornis -0.2372192 -0.279792251 0.012915017 -3.770541e-01
## Diceros bicornis -0.2291860 -0.241421639 0.007327834 5.578254e-01
## Ceratotherium simum -0.2291860 -0.241421639 0.007327834 5.578254e-01
## Tapirus terrestris -0.1957794 0.484237873 0.024570194 -1.833539e-02
## Tapirus pinchaque -0.1957794 0.484237873 0.024570194 -1.833539e-02
## Tapirus indicus -0.1896915 0.431072788 0.018511825 -1.020231e-02
## Equus grevyi 0.3794008 -0.027351475 0.389935401 7.601483e-05
## Equus burchelli 0.3794008 -0.027351475 0.389935401 7.601483e-05
## Equus zebra 0.3619129 -0.022734984 0.231934991 1.449725e-05
## Equus africanus 0.3133767 -0.007612044 -0.565679048 -5.358669e-03
## Equus hemionus 0.3133767 -0.007612044 -0.565679048 -5.358669e-03
## U_5 U_6 U_7
## Dicerorhinus sumatrensis 1.755698e-15 0.0005361094 0.0033968320
## Rhinoceros sondaicus 1.631083e-15 0.0008842668 0.0106128116
## Rhinoceros unicornis -3.043460e-15 0.0008842668 0.0106128116
## Diceros bicornis -3.338364e-16 0.0014502567 0.0131885715
## Ceratotherium simum -2.396862e-18 0.0014502567 0.0131885715
## Tapirus terrestris -6.231239e-16 0.0025385830 0.3864071508
## Tapirus pinchaque 6.094650e-16 0.0025385830 0.3864071508
## Tapirus indicus -2.560450e-17 -0.0003589066 -0.8371022320
## Equus grevyi -1.763404e-17 0.3554350121 0.0003057891
## Equus burchelli -3.160065e-16 0.3554350121 0.0003057891
## Equus zebra -3.174918e-17 -0.8589360503 0.0036789739
## Equus africanus -7.071068e-01 0.0690713051 0.0044988900
## Equus hemionus 7.071068e-01 0.0690713051 0.0044988900
## U_8 U_9 U_10
## Dicerorhinus sumatrensis -0.8402554692 -2.880504e-16 -2.747266e-16
## Rhinoceros sondaicus 0.3821913996 6.782104e-01 -4.242322e-02
## Rhinoceros unicornis 0.3821913996 -6.782104e-01 4.242322e-02
## Diceros bicornis 0.0294515823 -1.294640e-01 -6.196581e-01
## Ceratotherium simum 0.0294515823 1.294640e-01 6.196581e-01
## Tapirus terrestris 0.0020576693 -1.508116e-01 3.366809e-01
## Tapirus pinchaque 0.0020576693 1.508116e-01 -3.366809e-01
## Tapirus indicus 0.0091255490 4.793189e-17 -1.244279e-16
## Equus grevyi 0.0002628681 2.292487e-02 -2.949654e-02
## Equus burchelli 0.0002628681 -2.292487e-02 2.949654e-02
## Equus zebra 0.0007824705 -1.766083e-16 -3.452116e-16
## Equus africanus 0.0012102056 2.535610e-15 -4.108709e-16
## Equus hemionus 0.0012102056 -2.404882e-15 -1.333152e-16
## U_11 U_12
## Dicerorhinus sumatrensis 8.489236e-17 5.409366e-17
## Rhinoceros sondaicus 8.155275e-02 1.777077e-01
## Rhinoceros unicornis -8.155275e-02 -1.777077e-01
## Diceros bicornis 1.160240e-01 2.929185e-01
## Ceratotherium simum -1.160240e-01 -2.929185e-01
## Tapirus terrestris 1.648704e-01 5.802755e-01
## Tapirus pinchaque -1.648704e-01 -5.802755e-01
## Tapirus indicus 4.363163e-17 -1.791379e-16
## Equus grevyi -6.728338e-01 2.142407e-01
## Equus burchelli 6.728338e-01 -2.142407e-01
## Equus zebra 2.927527e-17 -1.229610e-17
## Equus africanus 1.805253e-16 -1.791994e-17
## Equus hemionus 1.140290e-19 3.759121e-17
Package MPSEM includes a framework for ordinary
least squares (OLS) regression based on R’s
lm function and methods. This framwork is implemented by
function pemlm, which called as follows:
## A first pemlm model without an auxiliary trait:
pemlm(
log.neonatal.wt~1,
perissodactyla.data,
perissodactyla.PEM
) -> lm1
## Summary of the first model:
summary(lm1)##
## Call:
## pemlm(formula = log.neonatal.wt ~ 1, data = perissodactyla.data,
## pem = perissodactyla.PEM)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.71185 -0.01285 0.06615 0.28815 0.49215
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.4108 0.1083 40.74 3.09e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3903 on 12 degrees of freedom
## Analysis of Variance Table
##
## Response: log.neonatal.wt
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 12 1.8281 0.15235
## A second pemlm model with two auxiliary traits:
pemlm(
log.neonatal.wt~log.female.wt+Territoriality,
perissodactyla.data,
perissodactyla.PEM_aux
) -> lm2
## Summary of the second model:
summary(lm2)##
## Call:
## pemlm(formula = log.neonatal.wt ~ log.female.wt + Territoriality,
## data = perissodactyla.data, pem = perissodactyla.PEM_aux)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.38986 -0.14943 0.04769 0.21016 0.28578
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.2450 1.1849 -0.207 0.84034
## log.female.wt 0.8238 0.2021 4.076 0.00223 **
## TerritorialityYes -0.0956 0.1463 -0.653 0.52827
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2482 on 10 degrees of freedom
## Multiple R-squared: 0.6631, Adjusted R-squared: 0.5957
## F-statistic: 9.842 on 2 and 10 DF, p-value: 0.004339
## Analysis of Variance Table
##
## Response: log.neonatal.wt
## Df Sum Sq Mean Sq F value Pr(>F)
## log.female.wt 1 1.18599 1.18599 19.2569 0.00136 **
## Territoriality 1 0.02629 0.02629 0.4269 0.52827
## Residuals 10 0.61587 0.06159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Models produced by pemlm do not immediately include
phylogenetic eigenvectors. The process for adding phylogenetic
eigenvectors to the model is mediated using a set of embedded functions,
a complete description of which would be beyond the scope of the present
document. Embedded function $forward performs stepwise
variable addition of the PEM eigenfunctions on the
basis of the corrected Akaike Information Criterion (AICc) (Hurvich and Tsai 1993) as follows:
## [1] 2 8
## A PEM-based linear model
## -------------------
##
## Auxiliary linear model:
##
## Call:
## pemlm(formula = log.neonatal.wt ~ 1, data = perissodactyla.data,
## pem = perissodactyla.PEM)
##
## Coefficients:
## (Intercept)
## 4.411
##
## Includes 2 eigenfunction(s)
## PEM-based linear model:
##
## Call:
## pemlm(formula = log.neonatal.wt ~ 1, data = perissodactyla.data,
## pem = perissodactyla.PEM)
##
## Coefficients:
## (Intercept) U_2 U_8
## 4.4108 -1.3041 0.1799
##
## Call:
## pemlm(formula = log.neonatal.wt ~ 1, data = perissodactyla.data,
## pem = perissodactyla.PEM)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.14549 -0.04264 -0.02299 0.03044 0.17201
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.41085 0.02704 163.142 < 2e-16 ***
## U_2 -1.30413 0.09748 -13.378 1.04e-07 ***
## U_8 0.17989 0.09748 1.845 0.0948 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09748 on 10 degrees of freedom
## Multiple R-squared: 0.948, Adjusted R-squared: 0.9376
## F-statistic: 91.19 on 2 and 10 DF, p-value: 3.795e-07
## Analysis of Variance Table
##
## Response: log.neonatal.wt
## Df Sum Sq Mean Sq F value Pr(>F)
## U 2 1.73312 0.86656 91.189 3.795e-07 ***
## Residuals 10 0.09503 0.00950
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 2 8 1 12
## A PEM-based linear model
## -------------------
##
## Auxiliary linear model:
##
## Call:
## pemlm(formula = log.neonatal.wt ~ log.female.wt + Territoriality,
## data = perissodactyla.data, pem = perissodactyla.PEM_aux)
##
## Coefficients:
## (Intercept) log.female.wt TerritorialityYes
## -0.2450 0.8238 -0.0956
##
## Includes 4 eigenfunction(s)
## PEM-based linear model:
##
## Call:
## pemlm(formula = log.neonatal.wt ~ log.female.wt + Territoriality,
## data = perissodactyla.data, pem = perissodactyla.PEM_aux)
##
## Coefficients:
## (Intercept) log.female.wt TerritorialityYes U_2
## -1.30244 0.99142 0.06366 -0.35480
## U_8 U_1 U_12
## -0.07332 0.74822 0.11152
##
## Call:
## pemlm(formula = log.neonatal.wt ~ log.female.wt + Territoriality,
## data = perissodactyla.data, pem = perissodactyla.PEM_aux)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.033391 -0.010608 0.005651 0.008548 0.047264
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.30244 0.71649 -1.818 0.118981
## log.female.wt 0.99142 0.12594 7.872 0.000222 ***
## TerritorialityYes 0.06366 0.02185 2.913 0.026864 *
## U_2 -0.35480 0.13771 -2.577 0.041966 *
## U_8 -0.07332 0.04038 -1.816 0.119317
## U_1 0.74822 0.09085 8.236 0.000173 ***
## U_12 0.11152 0.03191 3.495 0.012908 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02985 on 6 degrees of freedom
## Multiple R-squared: 0.9971, Adjusted R-squared: 0.9942
## F-statistic: 341 on 6 and 6 DF, p-value: 2.49e-07
## Analysis of Variance Table
##
## Response: log.neonatal.wt
## Df Sum Sq Mean Sq F value Pr(>F)
## log.female.wt 1 1.18599 1.18599 1331.015 2.829e-08 ***
## Territoriality 1 0.02629 0.02629 29.504 0.001615 **
## U 4 0.61053 0.15263 171.297 2.600e-06 ***
## Residuals 6 0.00535 0.00089
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The perisoadctyla data set has one missing gestation weight value for
the mountain tapir (Tapirus pinchaque). Here, we will exemplify
how to make predictions by estimating its most likely value. First, we
need a graph (here a tree) for all the species with the exception of
Tapirus pinchaque, and the coordinates of that species in that
graph. This is accomplished by method locate as
follows:
sp <- "Tapirus pinchaque" ## A selected species.
train <- locate(gr1, sp) ## The locate method.
train## $x
##
## A graph with 22 vertices and 21 edges
## ----------------------------------
## Vertex labels: Dicerorhinus sumatrensis, Rhinoceros sondaicus, Rhinoceros unicornis, Diceros bicornis, Ceratotherium simum, ... + 12 more ..., N9, N10, N11
## Edge labels: E1, E2, E3, E4, E5, ... + 11 more ..., E21, E22, E23
## Vertex information: species
## Available edge information: distance
##
##
## $location
## ref dist ladist
## Tapirus pinchaque 12 2 1
Here, we can se that Tapirus pinchaque is located at
distance 1 from and LCA located at distance 2 along an edge with index
12. The locate method returns a two-element list whose
first member ($x) is the graph where the target species
have been removed (ie. the residual graph) and whose second member
($location) a three-column date frame of the graph location
(columns ref, dist, and lca described
previously) for each of the targets. We then need to prepare the model
data for all the traits available as follows:
## Prepare the model data:
model.data(
formula = log.gestation.length~log.female.wt+log.neonatal.wt+Territoriality,
data = perissodactyla.data
) -> mdatFor making predictions, we need to build a PEM using
residual graph obtain from method locate as follows:
## Initial PEM (single a = 0.5, single psi = 1) built from the residual graph:
pem.train <- PEM(train$x, a = 0)
## Estimate the evolution model:
evolution.model(
object = pem.train,
y = mdat$y[names(mdat$y) != sp],
x = mdat$x[rownames(mdat$x) != sp,]
) -> opt
## Build a linear model:
pemlm(
formula = log.gestation.length~log.female.wt+log.neonatal.wt+Territoriality,
data = perissodactyla.data %>% .[rownames(.) != sp,],
pem = pem.train
) -> lm3
## Summary and anova for the auxiliary trait model:
summary(lm3)##
## Call:
## pemlm(formula = log.gestation.length ~ log.female.wt + log.neonatal.wt +
## Territoriality, data = perissodactyla.data %>% .[rownames(.) !=
## sp, ], pem = pem.train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0247101 -0.0055813 -0.0001364 0.0086916 0.0207697
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.865932 0.070124 26.609 4.28e-09 ***
## log.female.wt 0.182814 0.019924 9.175 1.61e-05 ***
## log.neonatal.wt -0.070346 0.021641 -3.251 0.0117 *
## TerritorialityYes 0.015655 0.008579 1.825 0.1055
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01422 on 8 degrees of freedom
## Multiple R-squared: 0.9425, Adjusted R-squared: 0.9209
## F-statistic: 43.7 on 3 and 8 DF, p-value: 2.629e-05
## Analysis of Variance Table
##
## Response: log.gestation.length
## Df Sum Sq Mean Sq F value Pr(>F)
## log.female.wt 1 0.0233187 0.0233187 115.3626 4.967e-06 ***
## log.neonatal.wt 1 0.0025089 0.0025089 12.4119 0.007811 **
## Territoriality 1 0.0006730 0.0006730 3.3296 0.105489
## Residuals 8 0.0016171 0.0002021
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 5
##
## Call:
## pemlm(formula = log.gestation.length ~ log.female.wt + log.neonatal.wt +
## Territoriality, data = perissodactyla.data %>% .[rownames(.) !=
## sp, ], pem = pem.train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.017202 -0.004369 0.001068 0.005830 0.010879
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.902382 0.051566 36.892 2.79e-09 ***
## log.female.wt 0.178403 0.014303 12.473 4.90e-06 ***
## log.neonatal.wt -0.072024 0.015460 -4.659 0.00232 **
## TerritorialityYes 0.009518 0.006469 1.471 0.18466
## U_5 -0.031971 0.010841 -2.949 0.02144 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01015 on 7 degrees of freedom
## Multiple R-squared: 0.9744, Adjusted R-squared: 0.9597
## F-statistic: 66.48 on 4 and 7 DF, p-value: 1.192e-05
## Analysis of Variance Table
##
## Response: log.gestation.length
## Df Sum Sq Mean Sq F value Pr(>F)
## log.female.wt 1 0.0233187 0.0233187 226.3459 1.377e-06 ***
## log.neonatal.wt 1 0.0025089 0.0025089 24.3527 0.001685 **
## Territoriality 1 0.0006730 0.0006730 6.5327 0.037776 *
## U 1 0.0008959 0.0008959 8.6963 0.021439 *
## Residuals 7 0.0007212 0.0001030
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Finally, we use the predict() method (for
pemlm-class objects) to obtain the predicted value as
follows:
## Make the prediction:
predict(
object = lm3,
newdata = perissodactyla.data[sp,],
newloc = train$location
) -> prd
prd## Tapirus pinchaque
## 2.605867
Here is and example of how to perform a leave-one-out
cross-validation of a data set using the R code
featured in the previous sections. Predictions will be added to table to
a table called perissodactyla.pred. The cross-validation is
carried out as follows:
## Table storing the results:
data.frame(
observed = perissodactyla.data$log.neonatal.wt,
auxiliary = NA,
predictions = NA,
lower = NA,
upper = NA,
row.names = rownames(perissodactyla.data)
) -> perissodactyla.pred
## Obtaining the updated model data:
model.data(
formula = log.neonatal.wt~log.female.wt+log.gestation.length+Territoriality,
data = perissodactyla.data
) -> mdat
## For each species i:
for(i in 1L:nrow(perissodactyla.pred)) {
## Calculate the residual graph and location:
train <- locate(gr1, rownames(perissodactyla.pred)[i])
## Calculate a PEM:
pem.train <- PEM(train$x, a = 0)
## Estimate the evolution model:
evolution.model(
object = pem.train,
y = mdat$y[-i],
x = mdat$x[-i,]
) -> opt
## Build an empty (auxiliary trait only) pemlm model:
pemlm(
formula = log.neonatal.wt~log.female.wt+log.gestation.length+Territoriality,
data = perissodactyla.data[-i,],
pem = pem.train
) -> lm_cv
## Make prediction using the empty model:
predict(
lm_cv$auxModel,
perissodactyla.data[i,]
) -> perissodactyla.pred[i,2L]
## Add the PEM eigenfunction(s) on the basis of the AICc:
lm_cv$forward()
## Make the prediction using the PEM-based model, including the limits of the
## 95% prediction interval:
predict(
object = lm_cv,
newdata = perissodactyla.data[i,],
newloc = train$location,
interval = "prediction"
) -> perissodactyla.pred[i,3L:5L]
}
## Prediction coefficient using the auxiliary traits alone:
Psquare(perissodactyla.pred$observed, perissodactyla.pred$auxiliary)## [1] 0.7574069
## Prediction coefficient using the PEM eigenfunctions:
Psquare(perissodactyla.pred$observed, perissodactyla.pred$predictions)## [1] 0.9440948
Fig. 5. Leave-one-out crossvalidated prediction of the neonatal weight
for 13 odd-toed ungulate species.
From the present cross-validation, we found that the (\(\log_{10}\)) neonatal body mass can be predicted with a cross-validated \(R^{2}\) of 0.94 (Fig. 5).
The influence matrix is used internally to calculate PEM. It is a matrix having as many rows as the number of vertices (species + nodes) and as many columns as the number of edges. Any given element of the influence matrix is coding whether a vertex, which is represented a row of the matrix is influenced an edge, which is represented by a column of the matrix. In the context of PEM, a vertex is influenced by an edge when the former has ancestors on the latter or, in other words, when an edge is on the path leading from a tip to the root of the tree. The influence matrix is obtained as follows:
##
## An Influence matrix involving 24 vertices and 23 edges
## --------------------------------------------------------------
##
## E1 E2 E3 E4 E5 E6 E7 +14 ... E21 E22 E23
## Dicerorhinus sumatrensis 1 1 1 1 0 0 0 +14 ... 0 0 0
## Rhinoceros sondaicus 1 1 1 0 1 1 0 +14 ... 0 0 0
## Rhinoceros unicornis 1 1 1 0 1 0 1 +14 ... 0 0 0
## Diceros bicornis 1 1 0 0 0 0 0 +14 ... 0 0 0
## Ceratotherium simum 1 1 0 0 0 0 0 +14 ... 0 0 0
## Tapirus terrestris 1 0 0 0 0 0 0 +14 ... 0 0 0
## Tapirus pinchaque 1 0 0 0 0 0 0 +14 ... 0 0 0
## +14 ...
## N9 0 0 0 0 0 0 0 +14 ... 0 0 0
## N10 0 0 0 0 0 0 0 +14 ... 0 0 0
## N11 0 0 0 0 0 0 0 +14 ... 0 0 0
##
## Available edge information: distance
The calculation of the influence matrix performed by
PEM() for a given phylogenetic graph need not be done every
time new weighting function parameters are to be tried. For that reason,
MPSEM provides an update method for the PEM class that
takes a previously calculated PEM object, applies new
edge weighting, and recalculates the phylogenetic eigenvectors:
## Update the PEM object, setting a to 0.9:
update(perissodactyla.PEM, a = log(0.9/(1 - 0.9)), psi=NULL)
## Access the parameter values
perissodactyla.PEM$par()## $a
## [1] 2.197225
##
## $psi
## NULL
## The parameter values are also available from the edge properties:
edge(perissodactyla.PEM$graph())## from to distance a psi
## E1 14 15 9 0.9 1
## E2 15 16 4 0.9 1
## E3 16 17 2 0.9 1
## E4 17 1 2 0.9 1
## E5 17 18 1 0.9 1
## E6 18 2 1 0.9 1
## E7 18 3 1 0.9 1
## E8 16 19 3 0.9 1
## E9 19 4 1 0.9 1
## E10 19 5 1 0.9 1
## E11 15 20 5 0.9 1
## E12 20 21 2 0.9 1
## E13 21 6 1 0.9 1
## E14 21 7 1 0.9 1
## E15 20 8 3 0.9 1
## E16 14 22 9 0.9 1
## E17 22 23 4 0.9 1
## E18 23 24 3 0.9 1
## E19 24 9 1 0.9 1
## E20 24 10 1 0.9 1
## E21 23 11 4 0.9 1
## E22 22 12 8 0.9 1
## E23 22 13 8 0.9 1
The response trait variance estimates will not readily be updated by the update method
## y
## 0.0110771
## [1] 0.02318053
## [1] 0.02318053