Computerintensive Methoden - Coalescent Theory - Project A

The following data were taken from the segregating sites in a nucleotide sequence from the Y chromosome of 355 Europeans. Sixteen segregating sites and 11 distinct alleles were found. At each site, 0 represents the ancestral variant (the one observed in the majority of a reasonably large sample of chimpanzees) and 1 the derived variant. The observed alleles and their frequencies are given below.



Allele   1   2   3   4   5   6   7   8   9   10   11   12   13   14   15   16
C   0   1   0   0   1   0   0   1   1   0   0   0   0   1   0   0
E   0   1   0   0   1   0   1   0   0   0   0   0   0   0   0   0
F   0   1   0   0   1   1   0   0   0   0   0   0   0   0   0   0
G   0   1   0   0   1   0   0   1   1   0   0   0   0   1   0   1
I   0   1   0   0   1   0   0   1   1   0   0   0   1   0   0   0
J   0   1   0   0   1   0   0   1   1   1   1   1   0   0   0   0
K   0   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0
L   0   1   0   0   1   0   0   1   1   0   0   0   0   1   1   0
N   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Q   0   1   0   0   1   0   0   0   0   0   0   0   0   0   0   0
R   0   1   0   0   1   0   0   1   0   0   0   0   0   0   0   0


Calculate the matrix of Hamming distances between each pair of alleles.

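# Hamming distance between alleles; on 0/1 data it equals the Manhattan distance.
# `alleles` must be the numeric 11 x 16 matrix of 0/1 site patterns from the table above.
d=dist(alleles,method="manhattan")
d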
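
The `dist` call only works once `alleles` exists as a numeric matrix. The following is a minimal sketch of one way to build it, assuming the 0/1 patterns are transcribed row by row from the table above; the helper name `rows` is illustrative and not part of the original page.

```r
# Site patterns per allele, transcribed from the table (16 sites each).
rows <- c(
  C = "0100100110000100",
  E = "0100101000000000",
  F = "0100110000000000",
  G = "0100100110000101",
  I = "0100100110001000",
  J = "0100100111110000",
  K = "0111000000000000",
  L = "0100100110000110",
  N = "1000000000000000",
  Q = "0100100000000000",
  R = "0100100100000000"
)

# Split each string into single digits; the result is an 11 x 16 integer matrix
# whose row names are the allele labels.
alleles <- t(sapply(strsplit(rows, ""), as.integer))
colnames(alleles) <- 1:16

# For binary data the Manhattan distance counts differing sites,
# which is the Hamming distance.
d <- dist(alleles, method = "manhattan")
```

With this matrix, `as.matrix(d)["C", "G"]` gives the number of sites at which alleles C and G differ.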



Calculate the nucleotide diversity.

# Mean number of pairwise differences; `freq` holds the 11 allele counts (they sum to n = 355).
n=sum(freq)
theta.pi=sum(as.dist(as.matrix(d) * (freq %o% freq))) * 2/(n*(n-1))
theta.pi
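
The allele frequencies are not reproduced on this page, so the expression above cannot be checked against the real data here. As a sanity check of the weighting, the sketch below applies the same formula to a small, purely hypothetical data set (`toy`, `toy_freq`) and compares it with a brute-force average over all pairs of sampled sequences. The function name `nucleotide_diversity` is an illustrative choice, not something from the original page.

```r
# Frequency-weighted mean number of pairwise differences.
# alleles: 0/1 matrix, one row per distinct allele
# freq:    number of copies of each allele in the sample
nucleotide_diversity <- function(alleles, freq) {
  d <- as.matrix(dist(alleles, method = "manhattan"))
  w <- freq %o% freq                 # number of sequence pairs per allele pair
  n <- sum(freq)
  lower <- lower.tri(d)              # count each allele pair once
  sum(d[lower] * w[lower]) / choose(n, 2)
}

# Hypothetical toy data: three alleles over three sites, four sequences in total.
toy <- rbind(a = c(0, 0, 0), b = c(1, 0, 0), c = c(1, 1, 1))
toy_freq <- c(2, 1, 1)
pi_weighted <- nucleotide_diversity(toy, toy_freq)

# Brute force: expand to one row per sampled sequence and average all pairs.
expanded <- toy[rep(seq_len(nrow(toy)), toy_freq), , drop = FALSE]
pi_brute <- mean(as.vector(dist(expanded, method = "manhattan")))
```

Both routes should agree, because pairs of identical sequences contribute zero differences and every pair of distinct alleles i, j occurs `freq[i] * freq[j]` times in the sample.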



Carry out Tajima's test to check whether the data are consistent with the Wright-Fisher model.

S=ncol(alleles)                # number of segregating sites
n=sum(freq)                    # sample size
an=sum(1/(1:(n-1)))
bn=sum(1/((1:(n-1))^2))
theta.l=S/an                   # Watterson's estimator
e1 = (n+1)/(3*an * (n-1)) - 1/an^2
e2 = 1/(an^2+bn) * ( (2*(n^2+n+3))/(9*n*(n-1)) - (n+2)/(n*an) + bn/an^2 )
var.theta=e1*S + e2*S*(S-1)    # estimated variance of theta.pi - theta.l
D=(theta.pi - theta.l)/sqrt(var.theta)


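# D must have been computed above; otherwise `D` resolves to the function stats::D
# and names() fails with "applied to a non-vector".
names(D)="D"
est=c(theta.l, theta.pi)
names(est)=c("Theta L", "Theta Pi")
# Two-sided p-value from a normal approximation to the distribution of D.
ret=list(statistic=D, method="Tajima Test", estimate=est, p.value=2*(1-pnorm(abs(D))))
class(ret)="htest"
ret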
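
For reference, the quantities computed in the R code can be written out as follows. This is read off the R expressions themselves; the symbols $a_n$, $b_n$, $e_1$ and $e_2$ simply mirror the variable names `an`, `bn`, `e1` and `e2`, with $S$ the number of segregating sites and $n$ the sample size.

```latex
\begin{align*}
a_n &= \sum_{i=1}^{n-1} \frac{1}{i}, \qquad
b_n = \sum_{i=1}^{n-1} \frac{1}{i^2}, \qquad
\hat{\theta}_L = \frac{S}{a_n}, \\
e_1 &= \frac{n+1}{3\,a_n\,(n-1)} - \frac{1}{a_n^2}, \\
e_2 &= \frac{1}{a_n^2 + b_n}
       \left( \frac{2\,(n^2+n+3)}{9\,n\,(n-1)} - \frac{n+2}{n\,a_n} + \frac{b_n}{a_n^2} \right), \\
D &= \frac{\hat{\theta}_\pi - \hat{\theta}_L}{\sqrt{e_1 S + e_2 S (S-1)}}.
\end{align*}
```

A positive $D$ corresponds to $\hat{\theta}_L < \hat{\theta}_\pi$ and a negative $D$ to $\hat{\theta}_L > \hat{\theta}_\pi$.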



Consider the following effects i) population growth, ii) directional selection, iii) divisions within

(i) Population growth is probably not the dominant factor: $\hat{\theta}_L < \hat{\theta}_\pi\,$, which points to a shrinking rather than a growing population. (Chapter 1; Page 32)
(iii) Because $\hat{\theta}_L < \hat{\theta}_\pi\,$, subdivision of the population is possible, but the migration rate would have to be very high, since we do not reject $H_0\,$ (the Wright-Fisher model). (Chapter 3; Page 11)