Computerintensive Methoden - Coalescent Theory - Project A

From StatWiki

The following data were taken from the segregating sites in a nucleotide sequence from the Y chromosome of 355 Europeans. Sixteen segregating sites and 11 distinct alleles were found. At each site, 0 represents the ancestral variant (as observed in the majority of a reasonably large sample of chimpanzees) and 1 the derived variant. The alleles observed and their frequencies are given below.



Alleles (rows) by segregating site (columns)
Allele  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
C       0  1  0  0  1  0  0  1  1  0  0  0  0  1  0  0
E       0  1  0  0  1  0  1  0  0  0  0  0  0  0  0  0
F       0  1  0  0  1  1  0  0  0  0  0  0  0  0  0  0
G       0  1  0  0  1  0  0  1  1  0  0  0  0  1  0  1
I       0  1  0  0  1  0  0  1  1  0  0  0  1  0  0  0
J       0  1  0  0  1  0  0  1  1  1  1  1  0  0  0  0
K       0  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0
L       0  1  0  0  1  0  0  1  1  0  0  0  0  1  1  0
N       1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Q       0  1  0  0  1  0  0  0  0  0  0  0  0  0  0  0
R       0  1  0  0  1  0  0  1  0  0  0  0  0  0  0  0
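The R code in the tasks below uses a matrix called `alleles` that the page never defines. A minimal sketch of how it could be built from the table is given here, assuming one row per allele and one column per site; the list name `derived` and the variable `n_sites` are illustrative. The allele frequencies (`freq`) are not reproduced in the table, so they are deliberately left out.

```r
# Sites carrying the derived variant (1) for each allele, read off the table.
derived <- list(
  C = c(2, 5, 8, 9, 14),
  E = c(2, 5, 7),
  F = c(2, 5, 6),
  G = c(2, 5, 8, 9, 14, 16),
  I = c(2, 5, 8, 9, 13),
  J = c(2, 5, 8, 9, 10, 11, 12),
  K = c(2, 3, 4),
  L = c(2, 5, 8, 9, 14, 15),
  N = c(1),
  Q = c(2, 5),
  R = c(2, 5, 8)
)

n_sites <- 16

# Turn each index vector into a 0/1 row; sapply() returns sites x alleles,
# so transpose to get alleles x sites.
alleles <- t(sapply(derived, function(idx) {
  row <- integer(n_sites)
  row[idx] <- 1L
  row
}))
colnames(alleles) <- seq_len(n_sites)

dim(alleles)               # alleles x sites
all(colSums(alleles) > 0)  # TRUE if every site varies among the listed alleles
```

Storing only the positions of the derived variants keeps the transcription short and makes typing errors easier to spot than in a full 0/1 grid.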

Task 1

Calculate the matrix giving the Hamming distance between each pair of alleles.

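# alleles: 11 x 16 matrix of 0/1 site indicators, one row per allele.
# For 0/1 data the Manhattan distance equals the Hamming distance.
d <- dist(alleles, method = "manhattan")
d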

#
  C E F G I J K L N Q
E 4
F 4 2
G 1 5 5
I 2 4 4 3
J 4 6 6 5 4
K 6 4 4 7 6 8
L 1 5 5 2 3 5 7
N 6 4 4 7 6 8 4 7
Q 3 1 1 4 3 5 3 4 3
R 2 2 2 3 2 4 4 3 4 1
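Task 1 asks for Hamming distances, while the code requests the Manhattan metric. The following sketch checks on three alleles from the table (C, E and Q) that the two coincide for 0/1 data. The helper `hamming` and the variables `x`, `h` and `m` are illustrative names, not part of the original solution.

```r
# Three alleles from the table, all 16 sites.
x <- rbind(
  C = c(0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0),
  E = c(0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  Q = c(0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Hamming distance: number of sites at which two alleles differ.
hamming <- function(a, b) sum(a != b)

# Explicit pairwise computation.
k <- nrow(x)
h <- matrix(0, k, k, dimnames = list(rownames(x), rownames(x)))
for (i in seq_len(k)) {
  for (j in seq_len(k)) {
    h[i, j] <- hamming(x[i, ], x[j, ])
  }
}

# For 0/1 data |a - b| is 1 exactly where a != b, so both metrics agree.
m <- as.matrix(dist(x, method = "manhattan"))
all(h == m)
h
```

The entries of `h` for the pairs C-E, C-Q and E-Q can be compared with the corresponding entries of the distance matrix above.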

Task 2

Calculate the nucleotide diversity.

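# Mean number of pairwise differences over all choose(n, 2) pairs of sampled
# chromosomes; freq holds the allele counts in the row order of alleles.
theta.pi <- sum(as.dist(as.matrix(d) * (freq %o% freq))) * 2 / (sum(freq) * (sum(freq) - 1))
theta.pi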

#
[1] 2.584801
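To make the frequency-weighted formula concrete, here is a small sketch on three alleles from the table. The counts in `freq` are HYPOTHETICAL (the real frequencies are not reproduced on this page) and serve only to show that the weighted sum equals the plain average of pairwise differences over the expanded sample.

```r
# Three alleles from the table, all 16 sites.
x <- rbind(
  C = c(0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0),
  E = c(0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  Q = c(0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)

# HYPOTHETICAL allele counts, for illustration only.
freq <- c(C = 2, E = 1, Q = 1)
n <- sum(freq)

# Frequency-weighted form, as used in the solution above.
d <- dist(x, method = "manhattan")
theta.pi <- sum(as.dist(as.matrix(d) * (freq %o% freq))) * 2 / (n * (n - 1))

# Cross-check: repeat each allele freq times and average all pairwise distances.
expanded <- x[rep(seq_len(nrow(x)), times = freq), ]
theta.pi.check <- mean(as.vector(dist(expanded, method = "manhattan")))

c(theta.pi, theta.pi.check)
```

Pairs of identical alleles contribute distance 0, which is why only the off-diagonal products of the counts appear in the weighted sum.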

Task 3

Carry out Tajima's test of the Wright-Fisher model.
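For reference, the quantities computed by the R code in this task can be written out as follows. This is a transcription of that code into formulas, with n the sample size and S the number of segregating sites; the symbols follow the variable names `an`, `bn`, `e1`, `e2` and `D`.

```latex
\begin{align*}
a_n &= \sum_{i=1}^{n-1} \frac{1}{i}, \qquad
b_n = \sum_{i=1}^{n-1} \frac{1}{i^2}, \qquad
\hat{\theta}_L = \frac{S}{a_n}, \\
e_1 &= \frac{n+1}{3\,a_n\,(n-1)} - \frac{1}{a_n^2}, \\
e_2 &= \frac{1}{a_n^2 + b_n}
       \left( \frac{2\,(n^2+n+3)}{9\,n\,(n-1)} - \frac{n+2}{n\,a_n} + \frac{b_n}{a_n^2} \right), \\
D &= \frac{\hat{\theta}_\pi - \hat{\theta}_L}{\sqrt{e_1 S + e_2 S (S-1)}}.
\end{align*}
```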

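S <- ncol(alleles)   # number of segregating sites
n <- sum(freq)       # sample size

# Harmonic sums used by both the estimator and its variance
an <- sum(1 / (1:(n - 1)))
bn <- sum(1 / (1:(n - 1))^2)

# Estimator of theta based on the number of segregating sites
theta.l <- S / an

# Constants of the variance approximation
e1 <- (n + 1) / (3 * an * (n - 1)) - 1 / an^2
e2 <- 1 / (an^2 + bn) * ((2 * (n^2 + n + 3)) / (9 * n * (n - 1)) - (n + 2) / (n * an) + bn / an^2)

# Estimated variance of theta.pi - theta.l
var.theta <- e1 * S + e2 * S * (S - 1)

# Tajima's test statistic
D <- (theta.pi - theta.l) / sqrt(var.theta)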

#

	Tajima Test

data:  
D = 0.1013, p-value = 0.9193
sample estimates:
 Theta L Theta Pi 
2.481419 2.584801 
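The code above stops at `D`, while the printed output also reports a p-value. A sketch of one way to obtain it is shown below, ASSUMING a two-sided test based on the standard normal approximation; the page does not state which reference distribution was used, and the function name `tajima_d` is illustrative. The inputs are the values reported on this page (S = 16, n = 355, theta.pi = 2.584801).

```r
# Tajima's D from summary quantities.
# ASSUMPTION: the p-value uses a two-sided standard normal approximation.
tajima_d <- function(theta.pi, S, n) {
  i <- 1:(n - 1)
  an <- sum(1 / i)
  bn <- sum(1 / i^2)

  theta.l <- S / an

  e1 <- (n + 1) / (3 * an * (n - 1)) - 1 / an^2
  e2 <- ((2 * (n^2 + n + 3)) / (9 * n * (n - 1)) -
           (n + 2) / (n * an) + bn / an^2) / (an^2 + bn)

  D <- (theta.pi - theta.l) / sqrt(e1 * S + e2 * S * (S - 1))
  p.value <- 2 * pnorm(-abs(D))

  list(D = D, p.value = p.value, theta.l = theta.l, theta.pi = theta.pi)
}

res <- tajima_d(theta.pi = 2.584801, S = 16, n = 355)
res$D
res$p.value
res$theta.l
```

Under this assumption the results should land close to the figures in the printed output, which suggests, but does not prove, that a normal approximation was used there.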

Task 4

Consider the following effects i) population growth, ii) directional selection, iii) divisions within
the population. Which of these factors might be the dominant factor in the evolution of the population
given the sign of the realisation of Tajima’s test statistic?

(i) Population growth is unlikely to be the dominant factor: \hat{\theta}_L < \hat{\theta}_\pi, so D > 0, whereas growth produces an excess of rare variants and hence D < 0. A positive D instead suggests a falling population. (Chapter 1; Page 32)

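(ii) Directional selection is also unlikely to be the dominant factor: a selective sweep leaves an excess of rare variants and hence D < 0, whereas the observed D is positive.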

(iii) Because \hat{\theta}_L < \hat{\theta}_\pi, a division of the population into subpopulations is possible, but the migration rate between them must be very high, since H_0 (the Wright-Fisher model) is not rejected. (Chapter 3; Page 11)
