Computerintensive Methoden - Coalescent Theory - Project C
From StatWiki
The following data were taken from the segregating sites in a nucleotide sequence from the Y
chromosome of 133 Asians. Fifteen segregating sites were found, defining 9 different alleles.
At each site 0 represents the ancestral variant (as observed in the majority of a reasonably large
sample of chimpanzees). The alleles observed and their frequencies are given below.
| Allele | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| F | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| G | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |
| H | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| J | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |
| N | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| P | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Q | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| R | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
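The R snippets in the tasks refer to an object `alleles` that is never defined in the text. A minimal sketch of how it could be built from the table is given here; the object layout (one row per allele, one column per site) is an assumption inferred from how `dist` and `dim(alleles)[2]` are used later. The allele frequencies are not reproduced in this text, so `freq` is deliberately left undefined rather than guessed.

```r
# Allele labels, in the same order as the rows of the table.
allele_names <- c("C", "F", "G", "H", "J", "N", "P", "Q", "R")

# 0 = ancestral variant, 1 = derived variant; one row per allele, one column per site.
alleles <- matrix(c(
  0,1,0,0,0,1,0,1,0,1,0,0,0,1,0,  # C
  0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,  # F
  0,1,0,0,0,1,0,1,0,1,0,0,0,1,1,  # G
  0,1,0,0,0,1,0,1,1,0,0,0,0,0,0,  # H
  0,1,0,0,0,1,0,1,0,1,1,1,1,0,0,  # J
  1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,  # N
  0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,  # P
  0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,  # Q
  0,1,0,0,0,1,0,1,0,0,0,0,0,0,0   # R
), nrow = 9, byrow = TRUE,
dimnames = list(allele_names, as.character(1:15)))

# freq must hold the 9 observed allele counts (summing to the sample size);
# they are not given in this text, so no values are assigned here.

dim(alleles)      # number of alleles, number of sites
rowSums(alleles)  # number of derived variants carried by each allele
```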
Task 1
Calculate the matrix giving the Hamming distance between each pair of alleles.
    d = dist(alleles, method = "manhattan")
    d

      C F G H J N P Q
    F 4
    G 1 5
    H 3 3 4
    J 4 6 5 5
    N 6 4 7 5 8
    P 7 5 8 6 9 5
    Q 3 1 4 2 5 3 4
    R 2 2 3 1 4 4 5 1
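Using `method="manhattan"` works here because, for 0/1 vectors, the sum of absolute differences is the number of sites at which two alleles differ, which is the Hamming distance. A small check on alleles C and F, copied from the table (the variable names are illustrative only):

```r
# Alleles C and F, copied from the table of segregating sites.
allele_C <- c(0,1,0,0,0,1,0,1,0,1,0,0,0,1,0)
allele_F <- c(0,1,0,0,0,1,1,0,0,0,0,0,0,0,0)

# Hamming distance: count the sites where the two alleles differ.
hamming <- sum(allele_C != allele_F)

# Manhattan distance: sum of absolute coordinate differences.
manhattan <- sum(abs(allele_C - allele_F))

# The same value as computed by dist() on a two-row matrix.
d_CF <- as.numeric(dist(rbind(allele_C, allele_F), method = "manhattan"))

c(hamming = hamming, manhattan = manhattan, dist = d_CF)  # all three are 4
```

The value 4 agrees with the entry for the pair C, F in the distance matrix.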
Task 2
Calculate the nucleotide diversity.
    theta.pi = sum(as.dist(as.matrix(d) * (freq %o% freq))) * (2 / (sum(freq) * (sum(freq) - 1)))
    theta.pi

    [1] 2.910686
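Written out, the R expression above is the average number of pairwise differences over all pairs of sampled chromosomes. The notation here is introduced for illustration and is not part of the original page: $n_i$ is the frequency of allele $i$ (the entries of `freq`), $n = \sum_i n_i$ is the sample size, $k$ is the number of alleles, and $d_{ij}$ is the Hamming distance between alleles $i$ and $j$:

$$
\hat{\theta}_\pi \;=\; \frac{2}{n(n-1)} \sum_{1 \le i < j \le k} n_i \, n_j \, d_{ij}
$$

The factor $n_i n_j$ counts the pairs of chromosomes that carry alleles $i$ and $j$. Pairs carrying the same allele contribute a distance of 0, so they appear only in the denominator $n(n-1)/2$. Taking `as.dist` of the weighted matrix keeps only the entries below the diagonal, so each pair of alleles is counted once.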
Task 3
Carry out Tajima's test to check whether the data are consistent with the Wright-Fisher model.
    S = dim(alleles)[2]                  # number of segregating sites
    n = sum(freq)                        # sample size
    an = sum(1/(1:(n-1)))
    bn = sum(1/((1:(n-1))^2))
    theta.l = S/an                       # estimate of theta based on S
    # coefficients of the variance of theta.pi - theta.l
    e1 = (n+1)/(3*an*(n-1)) - 1/an^2
    e2 = 1/(an^2+bn) * ( (2*(n^2+n+3))/(9*n*(n-1)) - (n+2)/(n*an) + bn/an^2 )
    var.theta = e1*S + e2*S*(S-1)
    D = (theta.pi - theta.l)/sqrt(var.theta)

        Tajima Test

    data:
    D = 0.1607, p-value = 0.8723
    sample estimates:
     Theta L Theta Pi
    2.745342 2.910686
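The calculation can be repeated from the reported summary values alone (n = 133, S = 15 as in the table, and Theta Pi = 2.910686). The sketch below wraps the same formulas in a function; the name `tajima_d` is hypothetical. The text does not show how the p-value was obtained. The two-sided normal approximation used here is an assumption, and it appears to be consistent with the reported value.

```r
# Tajima's D from the sample size n, the number of segregating sites S
# and the nucleotide diversity theta_pi (hypothetical helper function).
tajima_d <- function(n, S, theta_pi) {
  i  <- 1:(n - 1)
  a1 <- sum(1 / i)        # called an in the text
  a2 <- sum(1 / i^2)      # called bn in the text
  theta_l <- S / a1       # estimate of theta based on S

  # Coefficients of the estimated variance of theta_pi - theta_l.
  e1 <- (n + 1) / (3 * a1 * (n - 1)) - 1 / a1^2
  e2 <- ((2 * (n^2 + n + 3)) / (9 * n * (n - 1)) -
           (n + 2) / (n * a1) + a2 / a1^2) / (a1^2 + a2)

  D <- (theta_pi - theta_l) / sqrt(e1 * S + e2 * S * (S - 1))

  # Assumption: two-sided p-value from the standard normal distribution.
  p <- 2 * pnorm(-abs(D))

  list(D = D, p.value = p, theta.l = theta_l, theta.pi = theta_pi)
}

res <- tajima_d(n = 133, S = 15, theta_pi = 2.910686)
round(unlist(res), 4)
```

D is close to zero and the p-value is large, so on this reading the test gives no evidence against the Wright-Fisher model.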
