Computerintensive Methoden - Coalescent Theory - Project B

From StatWiki


The following data are the segregating sites in a nucleotide sequence from the Y chromosome of 131 Northern Africans. Thirteen segregating sites and 7 distinct alleles were found. At each site, 0 denotes the ancestral variant (the one observed in the majority of a reasonably large sample of chimpanzees) and 1 the derived variant. The alleles observed and their frequencies are given below.



Allele   1  2  3  4  5  6  7  8  9 10 11 12 13
A        0  0  0  0  0  0  0  0  0  0  0  0  0
E        1  0  0  1  0  1  0  0  0  0  0  0  0
F        1  0  0  1  1  0  0  0  0  0  0  0  0
G        1  0  0  1  0  0  1  1  0  0  0  1  1
J        1  0  0  1  0  0  1  1  1  1  1  0  0
K        1  1  1  0  0  0  0  0  0  0  0  0  0
R        1  0  0  1  0  0  1  0  0  0  0  0  0

Task 1

Calculate the matrix of Hamming distances between each pair of alleles.

# for 0/1 data the Manhattan distance equals the Hamming distance
d=dist(alleles,method="manhattan")
d
 
#
  A E F G J K
E 3
F 3 2
G 6 5 5
J 7 6 6 5
K 3 4 4 7 8
R 3 2 2 3 4 4
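
The snippet for this task uses a matrix `alleles` that is not defined on this page. A minimal sketch of how it could be built from the table above, together with a direct count of differing sites as a cross-check, might look as follows. The helper `hamming` and the matrix `H` are names introduced here for illustration only.

```r
# Site patterns copied from the table above (rows = alleles, columns = sites 1..13)
alleles <- matrix(c(
  0,0,0,0,0,0,0,0,0,0,0,0,0,   # A
  1,0,0,1,0,1,0,0,0,0,0,0,0,   # E
  1,0,0,1,1,0,0,0,0,0,0,0,0,   # F
  1,0,0,1,0,0,1,1,0,0,0,1,1,   # G
  1,0,0,1,0,0,1,1,1,1,1,0,0,   # J
  1,1,1,0,0,0,0,0,0,0,0,0,0,   # K
  1,0,0,1,0,0,1,0,0,0,0,0,0    # R
), nrow = 7, byrow = TRUE,
dimnames = list(c("A","E","F","G","J","K","R"), as.character(1:13)))

# Hamming distance counted directly: number of sites at which two alleles differ
# (hamming and H are illustrative names, not part of the original snippet)
hamming <- function(x, y) sum(x != y)
k <- nrow(alleles)
H <- matrix(0, k, k, dimnames = list(rownames(alleles), rownames(alleles)))
for (i in 1:k) for (j in 1:k) H[i, j] <- hamming(alleles[i, ], alleles[j, ])

# For 0/1 data the Manhattan distance gives the same counts
d <- dist(alleles, method = "manhattan")
stopifnot(all(as.matrix(d) == H))

as.dist(H)   # lower triangle, same layout as the output of dist()
```

Because every entry is 0 or 1, the absolute difference at a site is 1 exactly when the two alleles differ there, which is why `method="manhattan"` can stand in for a Hamming distance.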

Task 2

Calculate the nucleotide diversity.

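# mean number of pairwise differences: sum over allele pairs i<j of d_ij * f_i * f_j, divided by choose(n, 2)
theta.pi=sum(as.dist(as.matrix(d) * (freq %o% freq)))*(2/(sum(freq)*(sum(freq)-1)))
theta.pi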
 
#
[1] 3.116618
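
To see why the weighted sum gives the mean number of pairwise differences, the formula can be compared with a brute-force average over all pairs of sampled individuals. The sketch below does this on HYPOTHETICAL toy data (3 alleles, 4 sites, made-up counts); none of the names or numbers in it come from the study.

```r
# HYPOTHETICAL toy data, for illustration only (not the Y-chromosome sample)
toy <- matrix(c(0,0,0,0,
                1,0,1,0,
                1,1,0,0), nrow = 3, byrow = TRUE,
              dimnames = list(c("a","b","c"), NULL))
toy.freq <- c(2, 1, 3)          # made-up allele counts
n.toy <- sum(toy.freq)

# Formula as used in the task: sum over allele pairs i<j of d_ij * f_i * f_j,
# divided by the number of pairs of individuals
d.toy <- dist(toy, method = "manhattan")
pi.formula <- sum(as.dist(as.matrix(d.toy) * (toy.freq %o% toy.freq))) / choose(n.toy, 2)

# Brute force: one row per sampled individual, then average over all pairs
expanded <- toy[rep(1:3, toy.freq), ]
pi.direct <- mean(as.vector(dist(expanded, method = "manhattan")))

stopifnot(isTRUE(all.equal(pi.formula, pi.direct)))
c(formula = pi.formula, direct = pi.direct)
```

Pairs of individuals carrying the same allele contribute a distance of 0, so only the cross-allele terms `d_ij * f_i * f_j` remain in the sum.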

Task 3

Carry out Tajima's test to check whether the data are consistent with the neutral Wright-Fisher model.

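S=dim(alleles)[2]   # number of segregating sites
n=sum(freq)         # sample size

# an = sum of 1/i and bn = sum of 1/i^2, for i = 1, ..., n-1
an=sum(1/(1:(n-1)))
bn=sum(1/((1:(n-1))^2))

# Watterson's estimator of theta
theta.l=S/an

e1 = (n+1)/(3*an * (n-1)) - 1/an^2
e2 = 1/(an^2+bn) * ( (2*(n^2+n+3))/(9*n*(n-1)) - (n+2)/(n*an) + bn/an^2 )

# estimated variance of theta.pi - theta.l
var.theta=e1*S + e2*S*(S-1)

D=(theta.pi - theta.l)/sqrt(var.theta)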
 
#

	Tajima Test

data:  
D = 0.7971, p-value = 0.4254
sample estimates:
 Theta L Theta Pi 
2.385938 3.116618 
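
The page does not show how the p-value was obtained. As a rough check, the statistic can be recomputed from the reported quantities alone (n = 131, S = 13, Theta Pi = 3.116618). The two-sided p-value from a standard normal approximation used below is an ASSUMPTION, not something stated in the source.

```r
# Values reported in the text and output above
n <- 131              # sample size
S <- 13               # number of segregating sites
theta.pi <- 3.116618  # reported mean number of pairwise differences

i <- 1:(n - 1)
an <- sum(1 / i)
bn <- sum(1 / i^2)
theta.l <- S / an     # Watterson's estimator

e1 <- (n + 1) / (3 * an * (n - 1)) - 1 / an^2
e2 <- ((2 * (n^2 + n + 3)) / (9 * n * (n - 1)) - (n + 2) / (n * an) + bn / an^2) / (an^2 + bn)
D <- (theta.pi - theta.l) / sqrt(e1 * S + e2 * S * (S - 1))

# ASSUMPTION: two-sided p-value from a standard normal approximation for D
p.value <- 2 * pnorm(-abs(D))

round(c(theta.l = theta.l, D = D, p.value = p.value), 4)
```

Under this assumption the recomputed values should agree, up to rounding, with the reported Theta L, D and p-value. With a p-value of this size the test gives no evidence against the neutral model.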
