# Seminar in Statistics for the Magister Programme - Exercise Sheet 1

## Contents

- 1 Example 3.1
- 2 Example 3.2
- 3 Exercise 3.3
- 4 Exercise 3.4
- 5 Exercise 3.5
- 6 Exercise 3.7
- 7 Exercise 3.9
- 8 Exercise 3.10
- 9 Exercise 7.4
- 10 Exercise 7.5
- 11 Exercise 7.6
- 12 Example 4.1
- 13 Exercise 4.2
- 14 Exercise 4.3
- 15 Exercise 4.4
- 16 Exercise 4.5
- 17 Exercise 5.2
- 18 Exercise 5.3
- 19 Example 5.4
- 20 Exercise 5.6
- 21 Example 3.6
- 22 Example 7.1
- 23 Example 7.2
- 24 Exercise 3.10
- 25 Exercise 3.11
- 26 Example 9.1
- 27 Example 9.2
- 28 Exercise 9.3
- 29 Exercise 9.4
- 30 Exercise 9.5
- 31 Example 9.6
- 32 Example 9.7
- 33 Example 10.1
- 34 Exercise 10.2
- 35 Example 10.3
- 36 Example 10.4

Seminar in Applied Statistics, winter semester 2007/08 (Prof. M. Schemper): exercises on sample size planning

## Example 3.1

A trial of transcutaneous electrical stimulation (TNS) for the relief of pain in patients with osteoarthritis of the knee is planned on the basis of the preliminary results published by Smith, Lewith & Machin (1983), who obtained a 25% placebo response and a 65% TNS response after four weeks of treatment. A response was defined as a pre-specified reduction in pain experienced by the patient following treatment. How many patients need to be recruited to such a trial with α = 0.05 (one-sided) and power 1 - β = 0.90? What is the sample size required if a two-sided test is used?
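
A minimal sketch, assuming the usual normal-approximation formula for two proportions with equal group sizes: n per group = [z(1-α)·√(2π̄(1-π̄)) + z(1-β)·√(π1(1-π1) + π2(1-π2))]² / (π2 - π1)², with π̄ = (π1 + π2)/2 and z(1-α/2) in place of z(1-α) for a two-sided test. The helper name `n_two_proportions` is illustrative, and published tables that use other variance terms may differ by a patient or two.

```python
import math
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha, power, two_sided=False):
    """Patients per group for comparing two proportions (normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    p_bar = (p1 + p2) / 2  # average proportion, used for the null variance
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(numerator ** 2 / (p2 - p1) ** 2)

# Example 3.1: 25% placebo vs 65% TNS response, alpha = 0.05, power 0.90
n_one = n_two_proportions(0.25, 0.65, alpha=0.05, power=0.90)
n_two = n_two_proportions(0.25, 0.65, alpha=0.05, power=0.90, two_sided=True)
print(n_one, n_two)  # 25 31 (patients per group)
```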

## Example 3.2

Familiari et al. (1981) compared two drugs in the treatment of peptic ulcer. Forty days after treatment they found that 23/30 (77%) of subjects were healed using pirenzepine and 18/31 (58%) using trithiozine. They stated that there was no significant difference between the two drugs at the 5% level on a two-sided test. If, in fact, π1 = 0.58 and π2 = 0.77, what is the probability of detecting this difference at the 5% level?
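
Inverting the same normal-approximation formula gives the power for a fixed number of patients per group. A sketch, assuming about 30 patients in each group (the actual groups were 30 and 31) and a two-sided 5% test; `power_two_proportions` is an illustrative name.

```python
import math
from statistics import NormalDist

def power_two_proportions(p1, p2, n, alpha, two_sided=True):
    """Approximate power for comparing two proportions with n patients per group."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2) if two_sided else nd.inv_cdf(1 - alpha)
    p_bar = (p1 + p2) / 2
    z_beta = ((abs(p2 - p1) * math.sqrt(n) - z_alpha * math.sqrt(2 * p_bar * (1 - p_bar)))
              / math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return nd.cdf(z_beta)

# Example 3.2 (assumption: 30 patients per group), pi1 = 0.58, pi2 = 0.77
power = power_two_proportions(0.58, 0.77, n=30, alpha=0.05)
print(round(power, 2))  # 0.35
```

A power of roughly one in three suggests the "no significant difference" finding says little about equivalence of the drugs.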

## Exercise 3.3

Elwood & Sweetnam (1979) conducted a clinical trial of aspirin for the treatment of patients who had had a myocardial infarction. They were hoping to obtain a 25% reduction in mortality, based on a mortality of about 15% in the year following infarction. How many patients should they include in their study for 80% power and a 5% significance level?

## Exercise 3.4

In fact, Elwood & Sweetnam (1979) recruited about 850 patients to each treatment group. What is the probability of their detecting a significant result at the 5% level, given that the underlying proportions are π1 = 0.10 and π2 = 0.15?
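
Exercises 3.3 and 3.4 can both be approached with the two-proportion formulas sketched for Examples 3.1 and 3.2. This sketch assumes that the "25% reduction" in Exercise 3.3 is relative (15% down to 11.25%) and that both tests are two-sided; under those assumptions it gives about 1,270 patients per group and a power of about 0.88 for 850 per group. Function names are illustrative.

```python
import math
from statistics import NormalDist

nd = NormalDist()

def n_two_proportions(p1, p2, alpha, power):
    """Patients per group, two-sided test, normal approximation."""
    z_a, z_b = nd.inv_cdf(1 - alpha / 2), nd.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(num ** 2 / (p1 - p2) ** 2)

def power_two_proportions(p1, p2, n, alpha):
    """Approximate power, two-sided test, n patients per group."""
    z_a = nd.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    z_b = ((abs(p1 - p2) * math.sqrt(n) - z_a * math.sqrt(2 * p_bar * (1 - p_bar)))
           / math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return nd.cdf(z_b)

# Exercise 3.3 (assumption: relative reduction, so 15% -> 11.25% mortality)
n_33 = n_two_proportions(0.15, 0.15 * 0.75, alpha=0.05, power=0.80)
# Exercise 3.4: about 850 patients per group, pi1 = 0.10, pi2 = 0.15
power_34 = power_two_proportions(0.10, 0.15, n=850, alpha=0.05)
print(n_33, round(power_34, 2))
```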

## Exercise 3.5

The rate of wound infection over one year in an operating theatre was 10%. This figure has been confirmed in several other operating theatres using the same scrub-up preparation. An investigator wishes to test the efficacy of a new scrub-up preparation. How many operations does he need to examine in order to be 90% confident that the new procedure produces only a 5% infection rate?

## Exercise 3.7

In the trial referred to in Example 3.1, Smith, Lewith & Machin (1983) obtained a 40% difference in the percentage success rate between placebo and TNS. If another investigator believes that with his patients the placebo response is likely to be about 65%, he will be unable to observe a 40% difference in response due to TNS, since this would require a 105% response in his treatment group! However, the odds ratio in favour of TNS has been found to be {0.65/(1 - 0.65)}/{0.25/(1 - 0.25)} = 5.57. How many subjects would the investigator require to detect an odds ratio of 5.57, with π1 = 0.65, significance level α = 0.05 (one-sided) and power 1 - β = 0.90?
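
One way to handle an odds-ratio specification is to convert it into the implied second proportion, π2 = OR·π1/(1 - π1 + OR·π1), and then reuse the two-proportion formula. This is a sketch of that route only; the course text may use a log-odds-ratio formula instead, which would give a somewhat different number. Helper names are illustrative.

```python
import math
from statistics import NormalDist

def pi2_from_odds_ratio(pi1, odds_ratio):
    """Proportion in the second group implied by pi1 and an odds ratio."""
    odds2 = odds_ratio * pi1 / (1 - pi1)
    return odds2 / (1 + odds2)

def n_two_proportions(p1, p2, alpha, power):
    """Patients per group, one-sided test, normal approximation."""
    nd = NormalDist()
    z_a, z_b = nd.inv_cdf(1 - alpha), nd.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(num ** 2 / (p2 - p1) ** 2)

pi2 = pi2_from_odds_ratio(0.65, 5.57)   # implied TNS response, about 0.912
n = n_two_proportions(0.65, pi2, alpha=0.05, power=0.90)
print(round(pi2, 3), n)
```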

## Exercise 3.9

In Example 3.1, what is the effect on the power if the investigators plan to use the continuity correction?
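
A rough way to see the effect, assuming the common approximation that a continuity-corrected test behaves as if the difference in proportions were reduced by (1/n1 + 1/n2)/2, i.e. by 1/n with n patients per group. The sketch keeps n = 25 per group and the one-sided α = 0.05 of Example 3.1; the `continuity` flag is an illustrative parameter, not a library option.

```python
import math
from statistics import NormalDist

def power_two_proportions(p1, p2, n, alpha, continuity=False):
    """Approximate one-sided power with n per group; if continuity is True,
    shrink the difference by 1/n (an approximation to the corrected test)."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha)
    delta = abs(p2 - p1) - (1 / n if continuity else 0.0)
    p_bar = (p1 + p2) / 2
    z_b = ((delta * math.sqrt(n) - z_a * math.sqrt(2 * p_bar * (1 - p_bar)))
           / math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return nd.cdf(z_b)

plain = power_two_proportions(0.25, 0.65, n=25, alpha=0.05)
corrected = power_two_proportions(0.25, 0.65, n=25, alpha=0.05, continuity=True)
print(round(plain, 2), round(corrected, 2))  # 0.9 0.84
```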

## Exercise 3.10

Given the sample size obtained in Example 3.1, what is the effect on the power if the total sample size is kept fixed but the ratio of TNS to placebo patients is 2:1? Or 3:1?
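
A sketch for unequal allocation, assuming the one-sided answer from Example 3.1 (25 per group, so 50 in total) and the normal-approximation test with a size-weighted pooled proportion under the null hypothesis. `power_unequal` is an illustrative name.

```python
import math
from statistics import NormalDist

def power_unequal(p_placebo, p_tns, n_total, ratio, alpha=0.05):
    """Approximate one-sided power when TNS:placebo = ratio:1 and the total is fixed."""
    nd = NormalDist()
    n_pl = n_total / (ratio + 1)
    n_tns = n_total - n_pl
    # pooled proportion under the null hypothesis, weighted by group size
    p_bar = (n_pl * p_placebo + n_tns * p_tns) / n_total
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / n_pl + 1 / n_tns))
    se_alt = math.sqrt(p_placebo * (1 - p_placebo) / n_pl + p_tns * (1 - p_tns) / n_tns)
    z_b = (abs(p_tns - p_placebo) - nd.inv_cdf(1 - alpha) * se_null) / se_alt
    return nd.cdf(z_b)

for r in (1, 2, 3):
    print(f"{r}:1 -> power {power_unequal(0.25, 0.65, n_total=50, ratio=r):.2f}")
```

Under these assumptions the power falls from about 0.90 (1:1) to about 0.87 (2:1) and 0.82 (3:1).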

## Exercise 7.4

Consider a clinical trial of the use of a drug in twin pregnancies. An obstetrician wishes to show a significant prolongation of pregnancy by use of the drug when compared to placebo. In the absence of any data on the standard deviation of pregnancy duration, she argues that a normal pregnancy ranges from 33 to 40 weeks, and so a rough guess for the standard deviation is (40 - 33)/4 = 1.75 weeks. How many pregnancies must the obstetrician observe if she decides that one week is a clinically significant increase in the length of a pregnancy? (Two-sided α = 0.05, power 80%.)
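
A sketch using the normal-approximation formula for two means, n per group = 2(z(1-α/2) + z(1-β))²σ²/δ². Some texts add a small-sample correction of about z(1-α/2)²/4 (roughly one extra patient here); that is left out. `n_two_means` is an illustrative name.

```python
import math
from statistics import NormalDist

def n_two_means(delta, sd, alpha=0.05, power=0.80):
    """Patients per group for a two-sided comparison of two means (normal approximation)."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return math.ceil(2 * (z_sum * sd / delta) ** 2)

sd_guess = (40 - 33) / 4                      # range / 4 = 1.75 weeks
print(n_two_means(delta=1.0, sd=sd_guess))    # 49 pregnancies per group
```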

## Exercise 7.5

Woollard & Couper (1983) describe a clinical trial comparing Moducren with Propranolol as initial therapies in essential hypertension. They proposed to compare the change in blood pressure due to the two drugs. They can recruit only about 50 patients for each drug, and they are looking for a 'medium' effect size of about Δ = 0.5 (0.2 and 0.8 correspond to Cohen's small and large effects, respectively). What is the power of the test, given a two-sided significance level of α = 0.05?

## Exercise 7.6

What size of effect could Woollard & Couper (loc. cit.) of Example 7.5 reasonably expect to detect with their trial (with 80% power)?
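
For a standardized difference Δ with n patients per group, the normal approximation gives power ≈ Φ(Δ·√(n/2) - z(1-α/2)), and solving for Δ gives the detectable effect (z(1-α/2) + z(1-β))·√(2/n). A sketch covering Exercises 7.5 and 7.6; function names are illustrative, and an exact t-test calculation would give slightly lower power.

```python
import math
from statistics import NormalDist

nd = NormalDist()

def power_standardized(effect, n_per_group, alpha=0.05):
    """Approximate two-sided power for a standardized difference (Cohen's d)."""
    z_a = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(effect * math.sqrt(n_per_group / 2) - z_a)

def detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized difference detectable with the given power."""
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return z_sum * math.sqrt(2 / n_per_group)

print(round(power_standardized(0.5, 50), 2))   # 0.71 (Exercise 7.5)
print(round(detectable_effect(50), 2))         # 0.56 (Exercise 7.6)
```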

## Example 4.1

Consider a clinical trial to compare two treatments for stage 1 breast cancer. The treatments are mastectomy or simple removal of the lump, leaving the remainder of the breast. We would like to show that, at worst, lump removal is only 10% inferior to mastectomy. Assuming the 5-year survival rate of stage 1 breast cancer after mastectomy is 60%, how large a trial would be needed to show that the 5-year survival rate for lump removal was at least 50%? (Assume a one-sided significance level of 10% and a power of 80%.)
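
A sketch of a non-inferiority calculation, assuming both treatments truly have the same 60% survival rate and using n per group = 2π(1-π)(z(1-α) + z(1-β))²/d² with margin d = 0.10. The formula choice is an assumption (the exercise does not state one), and `n_noninferiority` is an illustrative name.

```python
import math
from statistics import NormalDist

def n_noninferiority(p, margin, alpha, power):
    """Patients per group to show the new treatment is at most `margin` worse,
    assuming both treatments truly have success rate p (one-sided alpha)."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)
    return math.ceil(2 * p * (1 - p) * z_sum ** 2 / margin ** 2)

print(n_noninferiority(p=0.60, margin=0.10, alpha=0.10, power=0.80))  # 217 per group
```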

## Exercise 4.2

If we knew, before the trial planned in Example 4.1, that 150 patients for each treatment were the maximum that we would be able to recruit, then, given that the expected survival rate is 60%, what sort of difference in proportions can we rule out? (Assume a significance level of 5%.)

## Exercise 4.3

Bennett, Dismukes, Duma et al. (1979) designed a clinical trial to test whether a combination chemotherapy given for a shorter period would be at least as good as conventional therapy for patients with cryptococcal meningitis. They recruited 39 patients to each treatment arm and wished to conclude that a difference of less than 20% in response rate between the treatments would indicate equivalence. Assuming a one-sided test size of 10% and an overall response rate of 50%, what is the power of the trial?
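
Inverting the non-inferiority formula gives the power for fixed n: z(1-β) = d·√(n/(2π(1-π))) - z(1-α). A sketch, assuming both arms truly share the 50% response rate; `power_equivalence` is an illustrative name.

```python
import math
from statistics import NormalDist

def power_equivalence(p, margin, n_per_group, alpha):
    """Approximate power of a one-sided equivalence (non-inferiority) test,
    assuming both arms truly share response rate p."""
    nd = NormalDist()
    z_b = margin * math.sqrt(n_per_group / (2 * p * (1 - p))) - nd.inv_cdf(1 - alpha)
    return nd.cdf(z_b)

power = power_equivalence(p=0.5, margin=0.20, n_per_group=39, alpha=0.10)
print(round(power, 2))  # 0.69
```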

## Exercise 4.4

The following is an extract from the protocol of the EORTC (European Organisation for Research and Treatment of Cancer) kidney-sparing trial; how many patients did they decide to recruit? With radical nephrectomy the expected 5-year survival rate is approximately 90%. There are several arguments in favour of conservative therapy, but only if this does not decrease the 5-year survival rate by more than 10%. It is desired to use error rates α = 0.05 and β = 0.10. On the basis of previous experience, the expected entry rate is 70 patients per year. How long will the study take to complete?
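
A sketch reusing the non-inferiority formula from Example 4.1, assuming α = 0.05 is one-sided and that both arms truly have 90% survival. The accrual time is simply total patients divided by 70 per year; any follow-up needed to observe 5-year survival would come on top of it.

```python
import math
from statistics import NormalDist

nd = NormalDist()
p, margin = 0.90, 0.10                                # expected 5-year survival, margin
z_sum = nd.inv_cdf(1 - 0.05) + nd.inv_cdf(1 - 0.10)   # assumption: one-sided alpha
n_per_group = math.ceil(2 * p * (1 - p) * z_sum ** 2 / margin ** 2)
n_total = 2 * n_per_group
accrual_years = n_total / 70                          # 70 patients enter per year
print(n_per_group, n_total, round(accrual_years, 1))  # 155 310 4.4
```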

## Exercise 4.5

A study showed that patients on drug A had a diastolic BP of 96 mm Hg, as against 94 mm Hg on drug B. The standard deviation was 8 mm Hg. There were 15 patients per group. Was this difference significant? What was the power of the test to detect a difference of 5 mm Hg? Other studies have suggested that drug B does have an effect. To detect a clinically relevant difference of 5 mm Hg, how many patients should be entered? If, on the other hand, one wishes to confirm that the difference is likely to be less than 5 mm Hg (i.e. to show equivalence), how many patients are required (assuming 80% power and a 5% significance level)?
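
A sketch of the first three questions using normal (z) approximations in place of the t distribution; the equivalence question is not covered. With 28 degrees of freedom the t critical value is a little larger than 1.96, which does not change the conclusion here.

```python
import math
from statistics import NormalDist

nd = NormalDist()
sd, n, observed_diff, relevant_diff = 8.0, 15, 96 - 94, 5.0
se = sd * math.sqrt(2 / n)                 # standard error of the difference
z_obs = observed_diff / se                 # test statistic for the observed difference
power_5 = nd.cdf(relevant_diff / se - nd.inv_cdf(0.975))
n_needed = math.ceil(2 * ((nd.inv_cdf(0.975) + nd.inv_cdf(0.80)) * sd / relevant_diff) ** 2)
print(round(z_obs, 2), round(power_5, 2), n_needed)  # 0.68 0.4 41
```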

## Exercise 5.2

## Exercise 5.3

Campbell, Elwood, Abbas & Waters (1984) estimated the prevalence of angina in a group of 1400 women to be about 15%. What is the 95% confidence interval for this estimate?
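
A sketch of the simple (Wald) normal-approximation interval p ± z(0.975)·√(p(1-p)/n); a Wilson interval would be an alternative but gives almost the same answer with n = 1400.

```python
import math
from statistics import NormalDist

p, n = 0.15, 1400
z = NormalDist().inv_cdf(0.975)
half_width = z * math.sqrt(p * (1 - p) / n)
lower, upper = p - half_width, p + half_width
print(f"{lower:.3f} to {upper:.3f}")  # 0.131 to 0.169
```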

## Example 5.4

There are 1000 diabetics on a register. If the prevalence of impotence amongst diabetics is assumed to be 20%, how many subjects do we require if we are willing to allow a 95% confidence interval of 4 percentage points either side of the true prevalence (that is, if the true prevalence were p, we would wish the confidence interval to run from p - 4 to p + 4 percentage points)?
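
A sketch assuming the usual n0 = z²p(1-p)/d² followed by the finite population correction n = n0/(1 + n0/N), since the register holds only N = 1000 patients. Other texts use n0/(1 + (n0 - 1)/N), which gives the same rounded answer here.

```python
import math
from statistics import NormalDist

p, d, population = 0.20, 0.04, 1000
z = NormalDist().inv_cdf(0.975)
n0 = z ** 2 * p * (1 - p) / d ** 2          # infinite-population size, about 384
n = math.ceil(n0 / (1 + n0 / population))   # finite population correction
print(math.ceil(n0), n)                     # 385 278
```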

## Exercise 5.6

A clinical trial compares the diastolic blood pressures in two groups of patients. It was expected that the mean BP in the two groups might be 100 and 90 mm Hg, respectively. Past experience suggests that σ is likely to be 10 mm Hg. If the observed difference turned out to be 1 mm Hg (not significant!), the 95% confidence interval would be -5 to +7 mm Hg. The 7 mm Hg difference might be medically worthwhile. Suppose a difference of 5 mm Hg is taken to be the smallest medically relevant difference: what sample size is required to obtain a C.I. of ±5 mm Hg (i.e. a total width of d = 10 mm Hg)?
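
The half-width of a 95% interval for a difference of two means with n per group is about z(0.975)·σ·√(2/n); setting it equal to 5 mm Hg and solving for n gives the sketch below (normal approximation, equal groups assumed).

```python
import math
from statistics import NormalDist

sigma, half_width = 10.0, 5.0                # mm Hg
z = NormalDist().inv_cdf(0.975)
# half-width of the CI for a difference of two means: z * sigma * sqrt(2 / n)
n_per_group = math.ceil(2 * (z * sigma / half_width) ** 2)
print(n_per_group)  # 31 per group
```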

## Example 3.6

In a clinical trial for the treatment of pre-menstrual syndrome, suitable women are allocated at random to one of the two treatments for one menstrual cycle and then, after a further cycle to reduce any carry-over effects, to the other treatment for the third cycle. The outcome is recorded as either 'obtained relief' or 'did not obtain relief'. The expected proportions of women obtaining relief with treatments 1 and 2 are 0.2 and 0.3, respectively. How many women should be included in the trial, for a two-sided significance level of 0.05 and power 0.90?

## Example 7.1

A psychologist wishes to test the IQ of a certain population. His null hypothesis is that the mean IQ is 100, and he has no preconceived notion of whether the group is likely to be above or below it. He wishes to be able to detect a fairly small difference from 100, e.g. 0.2 standard deviations, so that if he gets a non-significant result from his analysis, he can be sure that the average IQ of his population lies very close to 100. How many subjects should he recruit for a power of 95%?
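
A sketch of the one-sample normal-approximation formula n = (z(1-α/2) + z(1-β))²/Δ², with a two-sided 5% test assumed (the psychologist has no preferred direction) and Δ = 0.2 in standard-deviation units. Some texts add about z(1-α/2)²/2 (two subjects here) as a t-test correction.

```python
import math
from statistics import NormalDist

nd = NormalDist()
effect = 0.2                                 # difference in SD units
z_sum = nd.inv_cdf(1 - 0.05 / 2) + nd.inv_cdf(0.95)   # assumption: two-sided 5%
n = math.ceil((z_sum / effect) ** 2)
print(n)  # 325 subjects
```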

## Example 7.2

An investigator wishes to test whether a particular drug reduces blood pressure. He proposes to measure blood pressure in a group of patients, administer the drug, and then measure the blood pressure again one hour later. Previous studies have shown that, in the absence of any drug effect, the standard deviation of the difference between hourly within-subject blood pressure measurements is about 10 mm Hg. The investigator decides that he is looking for a fall in blood pressure of more than 10 mm Hg. How many patients should the investigator recruit to obtain a significant result at the one-sided and at the two-sided 5% level, with a power of 80%? What if the power were 90%?
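
For paired measurements only the standard deviation of the differences matters: n = (z(1-α) + z(1-β))²σd²/δ² (with z(1-α/2) for two-sided tests). The sketch uses normal quantiles; with samples this small a t-based calculation would add a few patients. `n_paired` is an illustrative name.

```python
import math
from statistics import NormalDist

nd = NormalDist()
sd_diff, fall = 10.0, 10.0                   # mm Hg

def n_paired(alpha, power, two_sided):
    """Patients needed for a paired comparison (normal approximation)."""
    z_a = nd.inv_cdf(1 - alpha / 2) if two_sided else nd.inv_cdf(1 - alpha)
    return math.ceil(((z_a + nd.inv_cdf(power)) * sd_diff / fall) ** 2)

for power in (0.80, 0.90):
    print(power, n_paired(0.05, power, two_sided=False), n_paired(0.05, power, two_sided=True))
```

Under these assumptions this gives 7 (one-sided) and 8 (two-sided) patients for 80% power, and 9 and 11 for 90% power.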

## Exercise 3.10

A certain drug (octreotide) is to be compared with placebo in the treatment of the vomiting that occurs in cases of intestinal obstruction due to terminal malignancy. The response is the presence or absence of vomiting during the 24 hours following treatment. A previous study suggested that the response rates (absence of vomiting) are π2 = 0.8 for the treated group and π1 = 0.4 for the control group. How many patients are required in the clinical trial if McNemar's test is to be used in the analysis (without continuity correction), with a two-sided significance level of 0.05 and power 0.80?

## Exercise 3.11

What is the power of the study in Exercise 3.10 if, in fact, we have 49 patients available, at the two-sided 5% level?

Cell density before and after eradication of H. pylori

| | Before | After | Difference |
|---|---|---|---|
| Mean | 15.2 | 24.4 | -9.2 |
| S.D. | 13.9 | 23.5 | 15.9 |

## Exercise 7.7

An investigator wished to compare the effect of placebo on blood pressure against the effect of a drug. The difference being sought is 5 mm Hg, with a standard deviation of blood pressure readings of 10 mm Hg. How many patients are required if a Wilcoxon test is used with α = 0.05 and 1 - β = 0.90?

(Assume that departures from a normal distribution are relatively unimportant; further, remember that the ARE of the Wilcoxon test relative to the t-test is 3/π, and therefore t-test sample sizes could be multiplied by π/3 ≈ 1.05; also consider specifying the effect by P(X < Y).)
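
A sketch of both routes suggested in the hint, assuming a two-sided α = 0.05 and normally distributed readings. Route 1 inflates the two-sample t-test size by π/3; route 2 uses Noether's approximation N = (z(1-α/2) + z(1-β))²/(12c(1-c)(p - 1/2)²) for the total sample size, with c = 1/2 for equal groups and p = P(X < Y) = Φ(δ/(σ√2)) under normality.

```python
import math
from statistics import NormalDist

nd = NormalDist()
delta, sigma = 5.0, 10.0
z_sum = nd.inv_cdf(1 - 0.05 / 2) + nd.inv_cdf(0.90)   # assumption: two-sided alpha

# Route 1: t-test size per group, inflated by 1/ARE = pi/3
n_t = 2 * (z_sum * sigma / delta) ** 2
n_wilcoxon = math.ceil(n_t * math.pi / 3)

# Route 2: Noether's formula with p = P(X < Y) under normality, equal groups
p = nd.cdf(delta / (sigma * math.sqrt(2)))
n_total_noether = z_sum ** 2 / (12 * 0.5 * 0.5 * (p - 0.5) ** 2)
n_noether = math.ceil(n_total_noether / 2)            # per group
print(math.ceil(n_t), n_wilcoxon, round(p, 3), n_noether)
```

Both routes land at roughly 90 patients per group; route 2 is the natural choice if the effect is elicited directly as P(X < Y).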

## Exercise X

What is an appropriate sample size for a follow-up study designed to test whether the incidence of coronary heart disease (CHD) in white males aged 39-59 is related to their serum cholesterol level (Hsieh, 1989)?

The probability of a CHD event during an 18-month follow-up at the mean serum cholesterol level is 0.07. We wish to detect an odds ratio of 1.5 for an individual with a cholesterol level one standard deviation above the mean, using a (one-tailed) significance level of 5% and a power of 80%.

Furthermore, assume we wish to detect the same effect while controlling for the effect of triglyceride, and that the correlation coefficient between cholesterol level and log-triglyceride is 0.4. How many patients are then required?
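
A sketch using a simplified large-sample approximation for logistic regression with a standardized continuous covariate, n = (z(1-α) + z(1-β))²/(p(1-p)β*²), where β* = ln 1.5 is the log odds ratio per standard deviation and p = 0.07. This is a swapped-in approximation, not necessarily the method of Hsieh (1989), so the numbers may differ from the published answer. Adjustment for triglyceride uses the variance inflation factor 1/(1 - ρ²).

```python
import math
from statistics import NormalDist

nd = NormalDist()
p_event, odds_ratio, rho = 0.07, 1.5, 0.4
z_sum = nd.inv_cdf(1 - 0.05) + nd.inv_cdf(0.80)   # one-tailed 5%, power 80%
beta_star = math.log(odds_ratio)                  # log OR per 1 SD of cholesterol
# assumption: simplified large-sample formula, not Hsieh's (1989) exact method
n_single = z_sum ** 2 / (p_event * (1 - p_event) * beta_star ** 2)
n_adjusted = n_single / (1 - rho ** 2)            # variance inflation for triglyceride
print(math.ceil(n_single), math.ceil(n_adjusted))  # 578 688
```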

## Worked example

Hypertensive patients were to receive one of three randomised treatments. A researcher judged that the mean outcomes would be about 100 mm Hg, 95 mm Hg and 85 mm Hg, respectively. From previous studies, he/she knew that the standard deviation within each group would be about 15 mm Hg. How many patients are required to obtain a significant result at the 5% (1%) significance level, given a power of 90%?

## Example 9.1

An adjuvant study of the drug Levamisole is proposed for patients with resectable cancer of the colon (Dukes' C), in which the primary objective of the study is to compare the efficacy of Levamisole against a placebo control with respect to relapse-free survival. How many relapses need to be observed in the trial if a decrease in relapse rates at one year from 50% to 40% is anticipated, and a power of 80% is required? (Assume a one-sided test at the 5% significance level.)

## Example 9.2

How many subjects need to be recruited to the study described in Example 9.1 if it is assumed there will be a 10% withdrawal of patients beyond the control of the investigator?
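
A sketch for Examples 9.1 and 9.2, assuming proportional hazards so that the hazard ratio is ln S_treated / ln S_control from the one-year relapse-free proportions (0.60 and 0.50). The number of relapses uses Freedman's formula e = (z(1-α) + z(1-β))²(1 + HR)²/(1 - HR)², and patients use the common approximation N = 2e/(2 - S1 - S2). The 10% withdrawal inflation n/(1 - 0.10) matches the rule quoted in Exercise 9.5.

```python
import math
from statistics import NormalDist

nd = NormalDist()
s_control, s_treated = 0.50, 0.60                          # relapse-free at one year
hazard_ratio = math.log(s_treated) / math.log(s_control)   # about 0.74
z_sum = nd.inv_cdf(1 - 0.05) + nd.inv_cdf(0.80)            # one-sided 5%, power 80%

# Freedman's formula for the number of events (relapses)
events = math.ceil(z_sum ** 2 * ((1 + hazard_ratio) / (1 - hazard_ratio)) ** 2)
# approximate patients needed to observe that many events
patients = 2 * events / (2 - s_control - s_treated)        # about 600
# Example 9.2: inflate for 10% withdrawal
recruited = math.ceil(patients / (1 - 0.10))
print(round(hazard_ratio, 3), events, round(patients), recruited)
```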

## Exercise 9.3

The Multicenter Study Group (1980) describe a double blind controlled study of long-term oral acetylcysteine against placebo in chronic bronchitis. Their results gave the percentage of exacerbation-free subjects with placebo at 6 months as 25%. They also observed a doubling of median exacerbation-free times with the active treatment. A repeat trial is planned. How many subjects should be recruited if the power is to be set at 90% and the one-sided test size at 5%?

## Exercise 9.4

A trial is planned involving patients with bed sores. It is postulated that ultrasound treatment will halve the healing time of such sores, and a double-blind trial is proposed using ultrasound and a placebo, the placebo being a non-functioning ultrasound machine. The investigators would like a power of 90% and a two-sided test at the 5% level. The proportion healed without treatment at 21 days is approximately 70%. How many patients should be recruited to the study? Suppose the investigators could only recruit 100 patients to their proposed study. What would be the corresponding power?

## Exercise 9.5

The Medical Research Council CHART trial of hyperfractionated radiotherapy for head and neck cancer is designed with a 60:40 allocation ratio, with 60% of patients receiving CHART. It is assumed that the 2-year tumour control rate for conventional therapy is 45%, and that a worthwhile and realistic difference would be an improvement of 15% or 20% (90% power and a 5% significance level). How many patients are required? What effect does the allocation ratio have on the number of patients required? The protocol assumed there would be a loss of 10% of patients during follow-up; what effect does this have? (n' = n/(1 - P(loss))) The protocol aims to recruit 500 patients; is this reasonable?

## Example 9.6

A trial reported in the Lancet (Valerius, Koch, Hoiby 1991) recruited 26 patients. The authors claimed that there was a significant (p < 0.05) effect after one year, and observed that 50% of the 12 untreated patients were alive at one year, compared to 85% of the 14 treated patients. What was the power of the test? If the study were repeated with equal numbers of patients per group, how many patients would you aim to recruit for 80% power? How many for 90% power?

## Example 9.7

The Medical Research Council trial of surgery for gastric cancer, comparing R1 (standard surgery as in most Western countries) against R2 (total gastrectomy, including removal of nodes, as practised in Japan), was designed to compare 5-year survival and to detect a change from 20% survival to 34% survival. How many patients were required (90% power, 5% significance level)? This trial was designed in 1985 and is about to close. It now appears likely that the baseline (R1) survival rate may be 27%; how many patients would be required in order to detect a difference between 27% survival and 40% survival? Why do you think the estimate of survival appears to be 27% as opposed to 20%? Should patient recruitment be extended? Over the last 7 years the surgeons have modified their views about R2 surgery, and now believe that a 13% difference is too optimistic (they have not, however, been shown any survival comparisons from this trial; those data remain confidential). A difference of at most 10% seems more likely, and would still be clinically important. How many patients are required to detect a 10% difference? Or an 8% difference? Alternatively, what is the power of detecting a 10% difference given 400 patients in the trial? Comment on the changing views of the surgeons!

## Example 10.1

An adjuvant study of the drug Levamisole is proposed for patients with resectable cancer of the colon (Dukes' C), in which the primary objective of the study is to compare the efficacy of Levamisole against a placebo control with respect to relapse-free survival. Assuming relapse times have an exponential distribution, how many relapses need to be observed in the trial if it is anticipated that the median relapse time is likely to be increased from 1 year to 1.35 years, and a power of 80% is required?

## Exercise 10.2

Two drugs, ampicillin and metronidazole, are to be compared for their differing effects on the postoperative recovery of patients having an appendicectomy, as measured by the duration of postoperative pyrexia. Previous studies (Chant, Turner & Machin, 1983) suggest the median duration of pyrexia with metronidazole to be approximately 80 hours. The duration of postoperative pyrexia is assumed to follow an exponential distribution. How many patients should be recruited to a trial to demonstrate a clinically worthwhile reduction in postoperative pyrexia of 10 hours, at two-sided test size α = 0.05 and power 1 - β = 0.90? It should be noted that such a trial does not require prolonged follow-up of patients, and the event of interest, i.e. return to normal temperature, will be observed in almost all patients. There will be very few patient losses, as the follow-up time is only a few days.
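
Under the exponential assumption the hazard is ln 2/median, so the hazard ratio is simply the ratio of medians. This sketch assumes the 10-hour reduction means a fall in median from 80 to 70 hours and uses Freedman's formula for the number of events; because almost every patient returns to normal temperature, the number of events is close to the number of patients.

```python
import math
from statistics import NormalDist

nd = NormalDist()
median_metro, median_new = 80.0, 70.0        # hours of pyrexia
# exponential times: hazard = ln 2 / median, so the hazard ratio is 80/70
hazard_ratio = median_metro / median_new
z_sum = nd.inv_cdf(1 - 0.05 / 2) + nd.inv_cdf(0.90)
events = math.ceil(z_sum ** 2 * ((1 + hazard_ratio) / (1 - hazard_ratio)) ** 2)
print(events)  # about 2,365 events, so roughly that many patients
```

The large number reflects how close the hazard ratio (about 1.14) is to 1.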

## Example 10.3

In the trial described in Example 10.1 it is anticipated that the recruitment rate will be approximately 80 patients per year. For what period should the trial be conducted and how many patients should be recruited?

## Example 10.4

The Multicenter Study Group (1980) describe a double-blind controlled study of long-term oral acetylcysteine against placebo in chronic bronchitis. How many patients should be recruited to such a trial if it were possible to assume that the exacerbation-free times induced by treatment have exponential distributions? Approximately 200 patients could be recruited in each 13-week period, which corresponds to the median exacerbation-free time; a doubling of this time with the active treatment is anticipated, and the power is set at 90% and the one-sided test size at 5%.