Computerintensive Methoden - Coalescent Theory - Project A
The following data were taken from the segregating sites in a sequence of nucleotides from the Y
chromosome of 355 Europeans. Sixteen segregating sites and 11 distinct alleles were found.
At each site 0 represents the ancestral variant (as observed in the majority of a reasonably large
sample of chimpanzees). The alleles observed and their frequencies are given below.
| Allele | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| E | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| F | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| G | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| I | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| J | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| K | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| L | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| N | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Q | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| R | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
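The R snippets in the tasks below rely on two objects, `alleles` and `freq`, that are not defined on this page. A minimal sketch of how `alleles` could be set up from the table above follows. The construction is an assumption; only the 0/1 entries are taken from the table. The frequency vector `freq` is left out because the allele counts do not appear in the table.

```r
# Assumed construction of `alleles`: one row per allele, one column per
# segregating site, entries copied from the table (0 = ancestral variant).
alleles <- rbind(
  C = c(0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0),
  E = c(0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  F = c(0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  G = c(0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1),
  I = c(0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0),
  J = c(0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0),
  K = c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  L = c(0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0),
  N = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  Q = c(0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  R = c(0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
)
colnames(alleles) <- 1:16

# Sanity check: every column should be a segregating site,
# i.e. carry the derived variant in at least one allele.
all(colSums(alleles) > 0)
```

With this layout the rows are labelled by allele name, so the distance matrix in Task 1 inherits the labels C to R.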
Task 1
Calculate the matrix of Hamming distances between each pair of alleles.
    # on 0/1 data the Manhattan distance equals the Hamming distance
    d <- dist(alleles, method = "manhattan")
    d
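| | C | E | F | G | I | J | K | L | N | Q |
|---|---|---|---|---|---|---|---|---|---|---|
| E | 4 | | | | | | | | | |
| F | 4 | 2 | | | | | | | | |
| G | 1 | 5 | 5 | | | | | | | |
| I | 2 | 4 | 4 | 3 | | | | | | |
| J | 4 | 6 | 6 | 5 | 4 | | | | | |
| K | 6 | 4 | 4 | 7 | 6 | 8 | | | | |
| L | 1 | 5 | 5 | 2 | 3 | 5 | 7 | | | |
| N | 6 | 4 | 4 | 7 | 6 | 8 | 4 | 7 | | |
| Q | 3 | 1 | 1 | 4 | 3 | 5 | 3 | 4 | 3 | |
| R | 2 | 2 | 2 | 3 | 2 | 4 | 4 | 3 | 4 | 1 |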
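The reason `method="manhattan"` yields Hamming distances is that for 0/1 vectors the absolute difference at a site is 1 exactly when the two alleles differ there. A small check on alleles C and E, with the vectors copied from the table of segregating sites, might look as follows; the variable names are chosen here for illustration.

```r
# Alleles C and E, copied from the table of segregating sites
C <- c(0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0)
E <- c(0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)

# Hamming distance: number of sites at which the two alleles differ
hamming <- sum(C != E)

# Manhattan distance as computed by dist() on the same two rows
manhattan <- as.numeric(dist(rbind(C, E), method = "manhattan"))

c(hamming = hamming, manhattan = manhattan)  # both are 4, matching the C/E entry
```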
Task 2
Calculate the nucleotide diversity.
    # mean number of pairwise differences, weighting each pair of alleles by its frequencies
    theta.pi <- sum(as.dist(as.matrix(d) * (freq %o% freq))) * (2 / (sum(freq) * (sum(freq) - 1)))
    theta.pi

    [1] 2.584801
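Written out, the quantity computed above is the mean number of pairwise differences over all pairs of sampled sequences. In the notation below, which is chosen here for illustration and not taken from the course notes, $n_i$ is the frequency of allele $i$, $n$ is the total sample size and $d_{ij}$ is the Hamming distance between alleles $i$ and $j$:

```latex
\hat{\theta}_{\pi} \;=\; \frac{2}{n(n-1)} \sum_{i<j} n_i \, n_j \, d_{ij},
\qquad n = \sum_i n_i
```

Pairs of sequences that carry the same allele have distance 0, so only pairs of distinct alleles contribute to the sum. This corresponds to summing the lower triangle of `as.matrix(d) * (freq %o% freq)` in the code.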
Task 3
Carry out Tajima's test to check whether the data are consistent with the Wright-Fisher model.
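For reference, the test statistic used in this task can be written as follows. The symbols mirror the variable names in the R code for this task (`an`, `bn`, `e1`, `e2`, `theta.l`, `theta.pi`); $S$ is the number of segregating sites and $n$ the sample size.

```latex
\begin{aligned}
a_n &= \sum_{k=1}^{n-1} \frac{1}{k}, \qquad
b_n = \sum_{k=1}^{n-1} \frac{1}{k^2}, \qquad
\hat{\theta}_L = \frac{S}{a_n}, \\
e_1 &= \frac{n+1}{3\,a_n\,(n-1)} - \frac{1}{a_n^2}, \\
e_2 &= \frac{1}{a_n^2 + b_n}
       \left( \frac{2\,(n^2+n+3)}{9\,n\,(n-1)} - \frac{n+2}{n\,a_n} + \frac{b_n}{a_n^2} \right), \\
D &= \frac{\hat{\theta}_{\pi} - \hat{\theta}_L}{\sqrt{e_1 S + e_2 S (S-1)}}
\end{aligned}
```

Under the neutral Wright-Fisher model both estimators target the same parameter, so values of $D$ close to 0 are expected.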
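    S <- dim(alleles)[2]                  # number of segregating sites
    n <- sum(freq)                        # sample size
    theta.l <- S / sum(1 / (1:(n - 1)))   # estimator based on the number of segregating sites
    an <- sum(1 / (1:(n - 1)))
    bn <- sum(1 / ((1:(n - 1))^2))
    e1 <- (n + 1) / (3 * an * (n - 1)) - 1 / an^2
    e2 <- 1 / (an^2 + bn) * ((2 * (n^2 + n + 3)) / (9 * n * (n - 1)) - (n + 2) / (n * an) + bn / an^2)
    var.theta <- e1 * S + e2 * S * (S - 1)
    D <- (theta.pi - theta.l) / sqrt(var.theta)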
        Tajima Test

    data:
    D = 0.1013, p-value = 0.9193
    sample estimates:
     Theta L Theta Pi
    2.481419 2.584801
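The page does not show how the p-value was obtained. The sketch below recomputes the statistic from the summary values reported on this page (n = 355, S = 16, Theta Pi = 2.584801) and then assumes a two-sided p-value from a standard normal approximation for D. That choice of reference distribution is an assumption; it appears to be consistent with the reported numbers but is not confirmed by the source.

```r
# Summary values reported on this page
n        <- 355        # sample size
S        <- 16         # number of segregating sites
theta.pi <- 2.584801   # nucleotide diversity from Task 2

# Same constants and statistic as in the Task 3 code
an <- sum(1 / (1:(n - 1)))
bn <- sum(1 / (1:(n - 1))^2)
theta.l <- S / an
e1 <- (n + 1) / (3 * an * (n - 1)) - 1 / an^2
e2 <- 1 / (an^2 + bn) *
  ((2 * (n^2 + n + 3)) / (9 * n * (n - 1)) - (n + 2) / (n * an) + bn / an^2)
D <- (theta.pi - theta.l) / sqrt(e1 * S + e2 * S * (S - 1))

# ASSUMPTION: two-sided p-value from a standard normal approximation
p.value <- 2 * pnorm(-abs(D))

round(c(theta.l = theta.l, D = D, p.value = p.value), 4)
```

Because the snippet needs only the three summary values, it can be run without the `alleles` and `freq` objects.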
Task 4
Consider the following effects i) population growth, ii) directional selection, iii) divisions within the population. Which of these factors might be the dominant factor in the evolution of the population given the sign of the realisation of Tajima’s test statistic?
(i) Population growth is unlikely to be the dominant factor, because D = 0.1013 > 0, and a positive value suggests a shrinking population rather than a growing one. (Chapter 1; Page 32)
(ii)
(iii) Because D > 0, a division of the population would be possible, but the migration rate would have to be very high, since we accept the null hypothesis (that is, the data are consistent with a Wright-Fisher model). (Chapter 3; Page 11)
