# Computer-Intensive Methods - Coalescent Theory - Project A

The following data were taken from the segregating sites in a sequence of nucleotides from the Y chromosome of 355 Europeans. Sixteen segregating sites and 11 different alleles were found. At each site, 0 represents the ancestral variant (as observed in the majority of a reasonably large sample of chimpanzees). The alleles observed and their frequencies are given below.

Allele | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
C | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
E | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
F | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
G | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
I | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
J | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
K | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
L | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
N | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Q | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
R | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

# Task 1

Calculate the matrix of pairwise Hamming distances between the alleles.


    # For 0/1 data the Manhattan distance equals the Hamming distance.
    d = dist(alleles, method = "manhattan")
    d
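The call above needs an `alleles` matrix in the workspace. The following is a minimal sketch, assuming the table is transcribed row by row with alleles as rows and sites as columns. The helper `hamming` is an illustrative name and not part of the original solution; it counts mismatching sites directly so that the result of `dist` can be cross-checked.

```r
# Allele-by-site matrix transcribed from the table above
# (rows = alleles, columns = segregating sites 1..16, 0 = ancestral variant).
alleles <- matrix(c(
  0,1,0,0,1,0,0,1,1,0,0,0,0,1,0,0,  # C
  0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,  # E
  0,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,  # F
  0,1,0,0,1,0,0,1,1,0,0,0,0,1,0,1,  # G
  0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,0,  # I
  0,1,0,0,1,0,0,1,1,1,1,1,0,0,0,0,  # J
  0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,  # K
  0,1,0,0,1,0,0,1,1,0,0,0,0,1,1,0,  # L
  1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,  # N
  0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,  # Q
  0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0   # R
), nrow = 11, byrow = TRUE,
dimnames = list(c("C","E","F","G","I","J","K","L","N","Q","R"),
                as.character(1:16)))

# Hypothetical helper: Hamming distance as the number of sites
# at which two alleles differ.
hamming <- function(x) {
  k <- nrow(x)
  h <- matrix(0, k, k, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      h[i, j] <- sum(x[i, ] != x[j, ])
    }
  }
  h
}

d_check <- hamming(alleles)
d <- dist(alleles, method = "manhattan")

# Both routes should agree for binary data.
stopifnot(all(as.matrix(d) == d_check))
```

Using `dist` keeps the result in the compact lower-triangle form that the later tasks expect, while the explicit loop makes the definition of the distance visible.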

# Task 2

Calculate the nucleotide diversity.


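    # Mean number of pairwise differences: each pair of alleles i < j
    # contributes freq[i] * freq[j] * d[i, j], divided by the number of pairs n(n-1)/2.
    theta.pi = sum(as.dist(as.matrix(d) * (freq %o% freq))) * (2 / (sum(freq) * (sum(freq) - 1)))
    theta.pi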
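To make the weighting in this formula easier to follow, here is a small sketch that wraps it in a function and checks it against a brute-force calculation. The function name `nucleotide_diversity` and the toy data are assumptions made purely for illustration; the toy frequencies are not the project's allele frequencies.

```r
# Hypothetical helper: mean number of pairwise differences in a sample,
# given a distance object between alleles and the allele counts.
nucleotide_diversity <- function(d, freq) {
  dm <- as.matrix(d)
  stopifnot(nrow(dm) == length(freq))
  n <- sum(freq)
  # Sum f_i * f_j * d_ij over unordered pairs of distinct alleles (i < j).
  pair_sum <- sum((dm * outer(freq, freq))[lower.tri(dm)])
  pair_sum / choose(n, 2)
}

# Toy example (NOT the project data): three made-up alleles and counts.
toy <- rbind(a = c(0, 0, 0), b = c(1, 0, 0), c = c(1, 1, 0))
toy_freq <- c(2, 1, 1)
toy_pi <- nucleotide_diversity(dist(toy, method = "manhattan"), toy_freq)

# Brute-force check: expand the sample to one row per individual and
# average the distance over all pairs of individuals.
expanded <- toy[rep(seq_len(nrow(toy)), toy_freq), ]
brute_pi <- mean(as.vector(dist(expanded, method = "manhattan")))

stopifnot(isTRUE(all.equal(toy_pi, brute_pi)))
```

Pairs of identical alleles contribute zero to the sum but still count in the denominator, which is why the division is by the total number of pairs of individuals.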

# Task 3

Carry out Tajima's test to check whether the data are consistent with the Wright-Fisher model.


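    S = dim(alleles)[2]   # number of segregating sites
    n = sum(freq)         # sample size

    an = sum(1 / (1:(n - 1)))
    bn = sum(1 / ((1:(n - 1))^2))

    # Estimate of theta based on the number of segregating sites.
    theta.l = S / an

    e1 = (n + 1) / (3 * an * (n - 1)) - 1 / an^2
    e2 = 1 / (an^2 + bn) * ((2 * (n^2 + n + 3)) / (9 * n * (n - 1)) - (n + 2) / (n * an) + bn / an^2)

    # Estimated variance of theta.pi - theta.l.
    var.theta = e1 * S + e2 * S * (S - 1)

    D = (theta.pi - theta.l) / sqrt(var.theta)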


    names(D) = "D"

    est = c(theta.l, theta.pi)
    names(est) = c("Theta L", "Theta Pi")

    # Two-sided p-value from a normal approximation to the distribution of D.
    ret = list(statistic = D, method = "Tajima Test", estimate = est, p.value = 2 * (1 - pnorm(abs(D))))
    class(ret) = "htest"
    ret
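The steps of the test can also be collected in one function, so that they do not depend on objects left over from earlier chunks. This is a sketch under stated assumptions: the name `tajima_d` and its argument names are illustrative, the p-value uses the same normal approximation as the code above, and the inputs in the example are toy values rather than the project data.

```r
# Hypothetical helper: Tajima's D from the number of segregating sites S,
# the sample size n and the mean number of pairwise differences theta_pi.
tajima_d <- function(S, n, theta_pi) {
  i <- 1:(n - 1)
  a1 <- sum(1 / i)
  a2 <- sum(1 / i^2)
  theta_s <- S / a1  # estimate based on segregating sites
  e1 <- (n + 1) / (3 * a1 * (n - 1)) - 1 / a1^2
  e2 <- ((2 * (n^2 + n + 3)) / (9 * n * (n - 1)) -
           (n + 2) / (n * a1) + a2 / a1^2) / (a1^2 + a2)
  D <- (theta_pi - theta_s) / sqrt(e1 * S + e2 * S * (S - 1))
  list(D = D, theta_s = theta_s,
       p_normal = 2 * (1 - pnorm(abs(D))))  # normal approximation
}

# Toy inputs (NOT the project data): 4 sequences, 2 segregating sites.
res_pos  <- tajima_d(S = 2, n = 4, theta_pi = 7 / 6)    # theta_pi above theta_s
res_zero <- tajima_d(S = 2, n = 4, theta_pi = 12 / 11)  # theta_pi equal to theta_s
res_neg  <- tajima_d(S = 2, n = 4, theta_pi = 1)        # theta_pi below theta_s
```

The sign of `D` is the sign of `theta_pi - theta_s`, which is the quantity interpreted in Task 4.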

# Task 4

Consider the following effects i) population growth, ii) directional selection, iii) divisions within the population. Which of these factors might be the dominant factor in the evolution of the population given the sign of the realisation of Tajima’s test statistic?

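(i) Population growth is unlikely to be the dominant factor: growth would give a negative value of Tajima's D, whereas the realisation of the statistic is positive, which suggests a declining population. (Chapter 1; Page 32)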

(ii)

(iii) Division within the population would be possible, but the migration rate would have to be very high, since we accept the null hypothesis (i.e. the data are consistent with a Wright-Fisher model). (Chapter 3; Page 11)