Computer-Intensive Methods - Coalescent Theory - Project A

The following data are the segregating sites in a nucleotide sequence from the Y chromosome, sampled from 355 Europeans. Sixteen segregating sites and 11 distinct alleles were found. At each site, 0 denotes the ancestral variant (the one observed in the majority of a reasonably large sample of chimpanzees). The observed alleles and their frequencies are given below.



Alleles (rows) at the 16 segregating sites (columns)

Allele   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
C        0  1  0  0  1  0  0  1  1  0  0  0  0  1  0  0
E        0  1  0  0  1  0  1  0  0  0  0  0  0  0  0  0
F        0  1  0  0  1  1  0  0  0  0  0  0  0  0  0  0
G        0  1  0  0  1  0  0  1  1  0  0  0  0  1  0  1
I        0  1  0  0  1  0  0  1  1  0  0  0  1  0  0  0
J        0  1  0  0  1  0  0  1  1  1  1  1  0  0  0  0
K        0  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0
L        0  1  0  0  1  0  0  1  1  0  0  0  0  1  1  0
N        1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Q        0  1  0  0  1  0  0  0  0  0  0  0  0  0  0  0
R        0  1  0  0  1  0  0  1  0  0  0  0  0  0  0  0

Task 1

Calculate the matrix of Hamming distances between each pair of alleles.

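# alleles: the 11 x 16 matrix of 0/1 entries from the table above (rows = alleles).
# For 0/1 data the Manhattan distance equals the Hamming distance.
d <- dist(alleles, method = "manhattan")
d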
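The call above needs an `alleles` object to exist. A minimal self-contained sketch, assuming the table is transcribed by hand into a matrix with one row per allele; the helper `hamming()` is a hypothetical name introduced here only to cross-check that the Manhattan distance on 0/1 data counts the differing sites:

```r
# Allele matrix transcribed from the table (rows = alleles, columns = sites 1..16).
alleles <- matrix(c(
  0,1,0,0,1,0,0,1,1,0,0,0,0,1,0,0,  # C
  0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,  # E
  0,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,  # F
  0,1,0,0,1,0,0,1,1,0,0,0,0,1,0,1,  # G
  0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,0,  # I
  0,1,0,0,1,0,0,1,1,1,1,1,0,0,0,0,  # J
  0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,  # K
  0,1,0,0,1,0,0,1,1,0,0,0,0,1,1,0,  # L
  1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,  # N
  0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,  # Q
  0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0   # R
), nrow = 11, byrow = TRUE,
dimnames = list(c("C", "E", "F", "G", "I", "J", "K", "L", "N", "Q", "R"),
                as.character(1:16)))

# Manhattan distance on 0/1 rows = number of sites at which two alleles differ.
d <- dist(alleles, method = "manhattan")

# Hypothetical helper: direct Hamming count, used only as a sanity check.
hamming <- function(x, y) sum(x != y)
stopifnot(as.matrix(d)["C", "G"] == hamming(alleles["C", ], alleles["G", ]))

print(d)
```

`dist()` returns the lower triangle only; `as.matrix(d)` gives the full symmetric 11 x 11 matrix if that form is preferred.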

Task 2

Calculate the nucleotide diversity.

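# freq: vector of allele counts, in the same order as the rows of alleles.
# Average number of pairwise differences over all pairs of sampled sequences.
theta.pi <- sum(as.dist(as.matrix(d) * (freq %o% freq))) * (2 / (sum(freq) * (sum(freq) - 1)))
theta.pi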
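The weighting by `freq %o% freq` can be checked on a small case. The sketch below wraps the expression in a function and compares it with a brute-force average over an expanded sample. The function name `nucleotide_diversity`, the three toy alleles, and their counts are hypothetical and chosen only for illustration; they are not the frequencies of the data set.

```r
# Nucleotide diversity from a distance object between distinct alleles (d)
# and the count of each allele in the sample (freq, same order as the rows).
nucleotide_diversity <- function(d, freq) {
  n <- sum(freq)
  weighted <- as.matrix(d) * (freq %o% freq)  # d_ij * f_i * f_j
  # as.dist() keeps each unordered pair i < j once; divide by choose(n, 2).
  sum(as.dist(weighted)) * 2 / (n * (n - 1))
}

# Hypothetical toy data: 3 alleles at 2 sites, with counts 2, 1, 1 (n = 4).
toy <- matrix(c(0, 0,
                1, 0,
                1, 1), nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), NULL))
toy_freq <- c(A = 2, B = 1, C = 1)

pi_toy <- nucleotide_diversity(dist(toy, method = "manhattan"), toy_freq)

# Brute force: repeat each allele according to its count, then average the
# Hamming distance over all pairs of sampled sequences.
expanded <- toy[rep(seq_len(nrow(toy)), times = toy_freq), , drop = FALSE]
pi_check <- mean(as.vector(dist(expanded, method = "manhattan")))

stopifnot(isTRUE(all.equal(pi_toy, pi_check)))
print(pi_toy)
```

Pairs of identical sequences contribute a distance of 0, which is why the sum over distinct alleles with weights f_i * f_j agrees with the average over all pairs in the sample.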

Task 3

Carry out Tajima's test to check whether the data are consistent with the Wright-Fisher model.
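For reference, the quantities involved can be written out as follows. This is a sketch in the notation of the R variables used on this page (an, bn, e1, e2); the symbols f_i (count of allele i) and d_ij (Hamming distance between alleles i and j) are introduced here for illustration.

```latex
\begin{align*}
a_n &= \sum_{i=1}^{n-1} \frac{1}{i}, \qquad
b_n = \sum_{i=1}^{n-1} \frac{1}{i^2}, \\
\hat{\theta}_L &= \frac{S}{a_n}, \qquad
\hat{\theta}_\pi = \frac{2}{n(n-1)} \sum_{i<j} f_i f_j d_{ij}, \\
e_1 &= \frac{n+1}{3 a_n (n-1)} - \frac{1}{a_n^2}, \\
e_2 &= \frac{1}{a_n^2 + b_n}
       \left( \frac{2(n^2+n+3)}{9n(n-1)} - \frac{n+2}{n a_n} + \frac{b_n}{a_n^2} \right), \\
D &= \frac{\hat{\theta}_\pi - \hat{\theta}_L}{\sqrt{e_1 S + e_2 S (S-1)}}.
\end{align*}
```

Here S is the number of segregating sites and n the sample size, so D is positive exactly when the pairwise estimate exceeds Watterson's estimate.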

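# S: number of segregating sites; n: sample size (sum of the allele counts)
S <- ncol(alleles)
n <- sum(freq)

# Harmonic sums used by Watterson's estimator and by the variance of D
an <- sum(1 / (1:(n - 1)))
bn <- sum(1 / (1:(n - 1))^2)

# Watterson's estimator of theta
theta.l <- S / an

# Coefficients of the estimated variance of theta.pi - theta.l
e1 <- (n + 1) / (3 * an * (n - 1)) - 1 / an^2
e2 <- 1 / (an^2 + bn) * ((2 * (n^2 + n + 3)) / (9 * n * (n - 1)) - (n + 2) / (n * an) + bn / an^2)
var.theta <- e1 * S + e2 * S * (S - 1)

# Tajima's D (uses theta.pi from Task 2)
D <- (theta.pi - theta.l) / sqrt(var.theta)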

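names(D) <- "D"
est <- c(theta.l, theta.pi)
names(est) <- c("Theta L", "Theta Pi")

# Two-sided p-value from a normal approximation to the distribution of D
ret <- list(statistic = D, method = "Tajima Test", estimate = est, p.value = 2 * (1 - pnorm(abs(D))))
class(ret) <- "htest"
ret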
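The steps of Tasks 1 to 3 depend on each other (`d`, `theta.pi`, `freq`), so one failed step breaks all later ones. A minimal sketch that bundles them into a single function is given below. The function name `tajima_d` and the toy alleles and counts are hypothetical and serve only to show the call; the real allele counts have to be supplied as `freq`.

```r
# Tajima's D from an allele matrix (rows = alleles, columns = segregating
# sites, entries 0/1) and the count of each allele (same order as the rows).
tajima_d <- function(alleles, freq) {
  stopifnot(nrow(alleles) == length(freq))
  S <- ncol(alleles)  # number of segregating sites
  n <- sum(freq)      # sample size

  # Pairwise estimate: mean Hamming distance over all pairs in the sample.
  d <- as.matrix(dist(alleles, method = "manhattan"))
  theta_pi <- sum(as.dist(d * (freq %o% freq))) * 2 / (n * (n - 1))

  # Watterson's estimate and the variance coefficients.
  i <- seq_len(n - 1)
  an <- sum(1 / i)
  bn <- sum(1 / i^2)
  theta_l <- S / an
  e1 <- (n + 1) / (3 * an * (n - 1)) - 1 / an^2
  e2 <- ((2 * (n^2 + n + 3)) / (9 * n * (n - 1)) -
           (n + 2) / (n * an) + bn / an^2) / (an^2 + bn)

  D <- (theta_pi - theta_l) / sqrt(e1 * S + e2 * S * (S - 1))
  c(D = D, theta_l = theta_l, theta_pi = theta_pi)
}

# Hypothetical toy data: 3 alleles at 2 segregating sites, counts 2, 1, 1.
toy <- matrix(c(0, 0,
                1, 0,
                1, 1), nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), NULL))
toy_freq <- c(A = 2, B = 1, C = 1)

res <- tajima_d(toy, toy_freq)
print(res)
```

With the matrix from the table and the observed allele counts, the call would be `tajima_d(alleles, freq)`; the sign of the `D` component is what Task 4 asks about.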

Task 4

Consider the following effects: (i) population growth, (ii) directional selection, (iii) subdivision of the population. Which of these might be the dominant factor in the evolution of the population, given the sign of the realisation of Tajima's test statistic?

(i) Population growth is unlikely to be the dominant factor: $\hat{\theta}_L < \hat{\theta}_\pi$, so the realisation of Tajima's statistic is positive, which suggests a declining rather than a growing population. (Chapter 1; Page 32)

(ii)

(iii) Because $\hat{\theta}_L < \hat{\theta}_\pi$, subdivision of the population is possible, but the migration rate would have to be very high, since we do not reject $H_0$ (that the population follows the Wright-Fisher model). (Chapter 3; Page 11)