This tutorial explains how to build phylogenetic trees using parsimony, maximum likelihood, and Bayesian approaches.

0. Prerequisites

  1. RAxML must be installed and available on your PATH
  2. The R packages ape, phangorn, ips, ggplot2 and ggtree must be installed
  3. ClustalW must be installed and available on your PATH

library("ape")
library("phangorn")
library("ips")
library("ggplot2")
library("ggtree")
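
Before going further, it can help to confirm that the prerequisites are actually in place. The following is a minimal sketch in base R, not part of the original workflow. The executable names `raxmlHPC` and `clustalw2` are assumptions: they differ between builds and operating systems, so adjust them to match your installation.

```r
# Packages loaded by this tutorial
pkgs <- c("ape", "phangorn", "ips", "ggplot2", "ggtree")

# requireNamespace() returns FALSE (instead of failing) when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missingPkgs <- pkgs[!installed]
if (length(missingPkgs) > 0) {
  message("Missing packages: ", paste(missingPkgs, collapse = ", "))
}

# Assumed executable names; change them to whatever your installation provides
exes <- c("raxmlHPC", "clustalw2")

# Sys.which() returns the full path, or "" when the program is not on the PATH
exePaths <- Sys.which(exes)
notFound <- names(exePaths)[exePaths == ""]
if (length(notFound) > 0) {
  message("Not found on PATH: ", paste(notFound, collapse = ", "))
}
```

If ClustalW is installed under a different name or outside the PATH, `clustal()` in ape also accepts an `exec` argument giving the program's location.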

1. Loading data and aligning the sequences

The input data are in FASTA format.
After aligning the sequences, check the alignment: are there any gaps?

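# Read the unaligned sequences (FASTA) as a DNAbin object
data <- read.dna("/Users/martafarrebelmonte/Documents/WORK/Langurs/christian_OnlyFrancois.fa", format = "fasta")
# Align with ClustalW (called through ape), then summarise gaps and variable sites
dataAli <- clustal(data)
checkAlignment(dataAli)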
## 
## Number of sequences: 31 
## Number of sites: 387 
## 
## Some gap lengths are not multiple of 3: 1
## 
## Frequencies of gap lengths:
##  1 
## 93 
##    => no gap on the left border of the alignment
##    => no gap on the right border of the alignment
## 
## Number of unique contiguous base segments defined by gaps: 4 
## Number of segment lengths not multiple of 3: 2 
##     => on the left border of the alignement: 0 
##     => on the right border                 : 1 
##     => positions of these segments inside the alignment: 17..194 
## 
## Number of segregating sites (including gaps): 52
## Number of sites with at least one substitution: 52
## Number of sites with 1, 2, 3 or 4 observed bases:
##   1   2   3   4 
## 335  51   1   0
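
The summary above reports how many gaps there are, but not which sequences or columns carry them. A small base R sketch can answer that. It uses a hypothetical toy alignment stored as a character matrix, and all object names here are illustrative. For the tutorial data, `as.character(dataAli)` would give a matrix with the same layout.

```r
# Hypothetical toy alignment: rows are sequences, columns are alignment sites
toy <- rbind(
  s1 = c("a", "c", "g", "t", "-", "a"),
  s2 = c("a", "c", "g", "t", "t", "a"),
  s3 = c("a", "c", "-", "t", "t", "a")
)

# TRUE wherever the alignment holds a gap character
isGap <- toy == "-"

# Number of gap characters in each sequence
gapsPerSeq <- rowSums(isGap)

# Alignment columns that contain at least one gap
gapCols <- which(colSums(isGap) > 0)

# Alignment with every gapped column removed (logical indexing is safe
# even when there are no gaps at all)
noGaps <- toy[, colSums(isGap) == 0, drop = FALSE]
```

Whether gapped columns should be dropped or kept depends on the analysis. This sketch only shows how to locate them.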

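# Convert the alignment to phangorn's phyDat format
datPhy <- phyDat(dataAli, type = "DNA", levels = NULL)
# Write a text view of the alignment to msa.fa (dots mark bases identical to the first sequence)
alview(dataAli, file = "msa.fa", uppercase = TRUE, showpos = FALSE)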

2. Distance methods

Here we use distance methods to build the tree, only as an example.
dist.ml offers two DNA substitution models for this: JC69, which assumes equal base frequencies, and F81, which uses empirical base frequencies.
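To make the difference between the two models concrete, the sketch below computes both distances by hand for two hypothetical toy sequences. It uses the closed-form corrections d = -(3/4) ln(1 - 4p/3) for JC69 and d = -E ln(1 - p/E) with E = 1 - sum(pi^2) for F81, where p is the proportion of differing sites and pi are the base frequencies. The sequences and function names are assumptions made for illustration. dist.ml estimates its distances by maximum likelihood, so its values need not match these formulas exactly.

```r
# Hypothetical toy sequences (not taken from the tutorial data set)
seqA <- strsplit("ACGTACGTACGTACGTACGT", "")[[1]]
seqB <- strsplit("ACGTACGAACGTACCTACGT", "")[[1]]

# Proportion of sites at which the two sequences differ (p-distance)
pDist <- mean(seqA != seqB)

# JC69: all four bases are assumed to be equally frequent
jc69 <- function(p) -3 / 4 * log(1 - 4 * p / 3)

# F81: same form, but E is derived from the supplied base frequencies
f81 <- function(p, freqs) {
  E <- 1 - sum(freqs^2)
  -E * log(1 - p / E)
}

# Empirical base frequencies pooled over both sequences
bases <- factor(c(seqA, seqB), levels = c("A", "C", "G", "T"))
baseFreqs <- as.numeric(table(bases)) / length(bases)

dJC <- jc69(pDist)
dF81 <- f81(pDist, baseFreqs)
```

With equal base frequencies (0.25 each) E equals 3/4 and the F81 formula reduces to JC69, so the two models only diverge when the base composition is uneven.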

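# Pairwise distances under F81 (uses empirical base frequencies)
dm <- dist.ml(datPhy, "F81")
treeUPGMA <- upgma(dm)  # rooted, ultrametric tree
treeNJ <- NJ(dm)        # unrooted neighbor-joining tree
# Stack the two plots, giving the NJ panel twice the height of the UPGMA panel
layout(matrix(c(1, 2), 2, 1), heights = c(1, 2))
par(mar = c(0, 0, 2, 0) + 0.1)
plot(treeUPGMA, main = "UPGMA")
plot(treeNJ, "unrooted", main = "NJ")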