The BCPS is heavily weighted on Biostats, study design, and regulatory issues. I recommend you buy the little biostatistics book ACCP offers (it’s cheap and it has some good practice problems in it). It was worth more than any other book I purchased for the BCPS. This is also a really good study guide and here’s a really, really simple sheet. This study guide has more topics in biostatistics (including the “which statistical test to pick” questions).

### Biostats Definitions:

Here are some basic things in case you have forgotten them:

- **Nominal data** – data with no order (yes/no, male/female)
- **Ordinal data** – data with order, but no consistent difference in magnitude between categories (NYHA heart failure classes, pain scales)
- **Interval data** – continuous data with a consistent interval between values, but no true zero (temperature in °C or °F)
- **Ratio data** – continuous data with a consistent interval between values and a true zero as the starting point (HR, BP)
- **Mean** – the "average." Used only with continuous data that is normally distributed (parametric)
- **Median** – the 50th percentile: the data point exactly in the middle when the values are put in order. Usually used with ordinal data or with continuous data that is not normally distributed
- **Mode** – the most frequently occurring value
- **Range** – the spread from the smallest to the largest value (largest minus smallest)
- **Interquartile range (IQR)** – reported with the median; it spans the 25th to the 75th percentile, i.e., the middle 50% of the data
- **Standard deviation (SD)** – only applicable to parametric data. Measures how data points scatter around the mean. Not used for nominal or ordinal data. In a normal distribution, about 68% of the data fall within ±1 SD, 95% within ±2 SD, and 99.7% within ±3 SD.
- **Standard error of the mean (SEM)** – the SD divided by the square root of the sample size, so it is always smaller than the SD. It describes how precisely the sample mean estimates the population mean, not the spread of the data.
- **Parametric data** – continuous data that is normally distributed (a symmetric, bell-shaped curve)
- **Nonparametric data** – continuous data that does not follow a normal distribution (skewed)
  - If mean > median, the data are skewed to the right
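To see how these measures relate on actual numbers, here is a minimal Python sketch using the standard-library `statistics` module. The length-of-stay values are hypothetical, and `describe` is a helper name made up for this example.

```python
import math
import statistics


def describe(data: list[float]) -> dict[str, float]:
    """Summarize a sample with the measures defined above (hypothetical helper)."""
    n = len(data)
    sd = statistics.stdev(data)                    # sample SD (n - 1 denominator)
    q1, _, q3 = statistics.quantiles(data, n=4)    # 25th, 50th, 75th percentile cut points
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "sd": sd,
        "sem": sd / math.sqrt(n),                  # SEM = SD / sqrt(n), smaller than SD when n > 1
    }


# Hypothetical lengths of stay (days); one long stay pulls the mean above the median.
los = [1, 2, 2, 3, 4, 10]
summary = describe(los)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
print("skewed right" if summary["mean"] > summary["median"] else "not skewed right")
```

Because of the single 10-day stay, the mean lands above the median, which is the right-skew pattern described in the last bullet.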

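- **Correlation coefficient (r)** – how strongly two variables are related. It ranges from −1 to +1; the closer it is to ±1, the stronger the relationship (the sign shows the direction).
- **R²** (coefficient of determination) – how much of the variation in Y is explained by X (e.g., R² = 0.70 means 70% of the variation in weight gain is explained by calories; the other 30% is unexplained).
- **Naranjo Scale** – estimates the probability that an adverse event is caused by the drug.
- **Cox Proportional Hazards** – regression model used to analyze survival (time-to-event) data.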
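A small sketch of how r and R² come out of paired data may help. This assumes the ordinary Pearson correlation and a simple linear model with a single predictor (where R² is just r squared); `pearson_r` and the calorie/weight numbers are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient for paired samples x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: daily calorie intake vs. weight gain (kg/month)
calories = [1800, 2200, 2500, 2800, 3200]
weight_gain = [0.1, 0.4, 0.3, 0.8, 0.9]

r = pearson_r(calories, weight_gain)
r_squared = r ** 2  # with one predictor, R² equals r squared
print(f"r = {r:.2f}, R² = {r_squared:.2f}")
```

A perfectly straight upward line gives r = 1 and a perfectly straight downward line gives r = −1; everything else falls in between.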

### Therapeutic Index

- Therapeutic Index = Median Toxic Dose / Median Effective Dose (TD50/ED50; in animal studies, Lethal Dose 50 / Effective Dose 50, LD50/ED50)
- The higher the TI, the safer the drug
- If the LD50 is much larger than the ED50, the TI is large and the drug has a wide safety margin
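The ratio is simple arithmetic; here is a minimal Python sketch with two made-up drugs. The function name and dose values are hypothetical, chosen only to contrast a wide and a narrow margin.

```python
def therapeutic_index(td50: float, ed50: float) -> float:
    """TI = TD50 / ED50 (or LD50 / ED50 in animal studies)."""
    if ed50 <= 0:
        raise ValueError("ED50 must be positive")
    return td50 / ed50


# Hypothetical doses (mg/kg) for two made-up drugs
wide = therapeutic_index(td50=500, ed50=5)   # 100.0: toxic dose is far above the effective dose
narrow = therapeutic_index(td50=4, ed50=2)   # 2.0: toxic dose is only twice the effective dose
print(wide, narrow)
```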

### Specificity, Sensitivity, Predictive Values and Accuracy

- Sensitivity = True Positives / (True Positives + False Negatives). Sensitivity is the percentage of people with the disease who test positive. If a highly sensitive test is negative, you can be fairly sure the patient doesn't have the disease (**SnOut**: a **S**e**n**sitive test rules things **Out**). Bigger numbers are better.
- Specificity = True Negatives / (True Negatives + False Positives). Specificity is the percentage of people without the disease who test negative. If a highly specific test is positive, you can be fairly sure the patient has the disease (**SpIn**: a **Sp**ecific test rules things **In**). Bigger numbers are better.
- Positive Predictive Value = True Positives / (True Positives + False Positives). This is the percentage of people who test positive who actually have the disease. *Bigger numbers are better*
- Negative Predictive Value = True Negatives / (True Negatives + False Negatives). This is the percentage of people who test negative who truly don't have the disease. *Bigger numbers are better*
- Accuracy = (True Positives + True Negatives) / Total
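- When you increase sensitivity (e.g., by lowering the test's cutoff), you usually decrease specificity: you diagnose more true cases but also get more false positives.
- Prevalence = number of existing cases / total population at a given point in time; incidence = number of new cases over a period of time / population at risk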
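The sensitivity/specificity trade-off is easiest to see by moving a cutoff. This is a minimal sketch assuming a hypothetical biomarker where higher values suggest disease; the values and the `sens_spec_at` helper are invented for illustration.

```python
# Hypothetical biomarker values; a result >= cutoff counts as a positive test.
diseased = [4.1, 5.0, 5.6, 6.2, 7.3, 8.0]
healthy = [2.0, 2.8, 3.5, 4.4, 4.9, 5.2]


def sens_spec_at(cutoff: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) when results >= cutoff are called positive."""
    tp = sum(v >= cutoff for v in diseased)  # diseased patients who test positive
    fn = len(diseased) - tp
    tn = sum(v < cutoff for v in healthy)    # healthy patients who test negative
    fp = len(healthy) - tn
    return tp / (tp + fn), tn / (tn + fp)


# Lowering the cutoff catches more disease but flags more healthy patients.
for cutoff in (6.0, 5.0, 4.0):
    sens, spec = sens_spec_at(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

At a cutoff of 6.0 this prints a sensitivity of 0.50 and specificity of 1.00; at 4.0 the two numbers flip.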

Some people like to set up a table (this is called a “confusion matrix” in the statistics world):

|  | Disease + | Disease – | Total |
|---|---|---|---|
| **Test +** | TP | FP | TP+FP |
| **Test –** | FN | TN | FN+TN |
| **Total** | TP+FN | TN+FP | TP+FP+FN+TN |

- Sensitivity= TP/(TP+FN) (column 1)
- Specificity = TN/ (TN+FP) (column 2)
- PPV = TP/ (TP+FP) (row 1)
- NPV = TN/(TN+FN) (row 2)
- Accuracy = (TP+TN) / (TP+FP+FN+TN) (the diagonal running down the table)
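The same table can be turned into a short Python function so you can check your hand calculations. The counts below are a hypothetical screening of 1,000 patients, and `diagnostic_metrics` is a made-up helper name.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Test-performance measures from the four cells of the 2x2 table."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),    # column 1: disease present
        "specificity": tn / (tn + fp),    # column 2: disease absent
        "ppv": tp / (tp + fp),            # row 1: test positive
        "npv": tn / (tn + fn),            # row 2: test negative
        "accuracy": (tp + tn) / total,    # main diagonal
        "prevalence": (tp + fn) / total,  # everyone with the disease / everyone tested
    }


# Hypothetical screening of 1,000 patients
m = diagnostic_metrics(tp=90, fp=40, fn=10, tn=860)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In this made-up example the test is 90% sensitive, yet the PPV is only about 0.69, because with a 10% prevalence the false positives make up a sizable share of the positive results.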

### Hypothesis Testing:

The null hypothesis is that there is **no difference** between groups in a study. In order to find significance, you need to REJECT the null. This can be a little confusing.

|  | Null is true | Null is false |
|---|---|---|
| **Accept (fail to reject) null** | Correct decision | Type II error (β) |
| **Reject null** | Type I error (α) | Correct decision |

- Type I error: rejecting the null hypothesis when it is true. A difference is found where none exists. The maximum acceptable alpha is usually 0.05 (think of alpha as the p-value cutoff you set when designing the study).
- Type II error: accepting (failing to reject) the null hypothesis when it is false. No difference is found when one exists. The maximum acceptable probability of a Type II error is usually 20% (β = 0.2), which corresponds to a power of 1 − β = 80%.
- Type II (beta) errors are usually due to a sample size that is too small, i.e., an underpowered study. The easiest way to decrease beta is to increase the sample size. Study power depends on alpha, sample size, the size of the difference being detected, and the variability of the data; alpha and sample size are the ones investigators set directly.
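To make the link between sample size, alpha, and beta concrete, here is a rough sketch. It assumes the normal approximation for a two-sided comparison of two means with equal group sizes and a standardized effect size (difference in means / SD); `approx_power` and the effect size of 0.5 are assumptions for illustration, not how any particular study calculates power.

```python
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test (normal approximation).

    effect_size is the standardized difference (difference in means / SD).
    The tiny chance of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    shift = abs(effect_size) * (n_per_group / 2) ** 0.5
    return z.cdf(shift - z_crit)


# Hypothetical effect size of 0.5: more patients means more power and a smaller beta.
for n in (20, 64, 100):
    power = approx_power(0.5, n)
    print(f"n = {n:>3} per group: power = {power:.2f}, beta = {1 - power:.2f}")
```

Under these assumptions, about 64 patients per group gives roughly 80% power (β ≈ 0.2), and tightening alpha to 0.01 with the same sample lowers the power.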

- You always have the risk of making either a Type I or a Type II error, but never both at once. If the p-value is significant, you risk a Type I error; if it is not, you risk a Type II error. For example, with a p-value of 0.01 you reject the null and conclude there is a difference between the groups, so the error you risk is a Type I error: in real life, there might be no difference between the two groups.
- **"If the p-value is low, the null must go."** If the p-value is less than alpha, the null is rejected.
- **P VALUES DO NOT SUGGEST CLINICAL SIGNIFICANCE**, just statistical significance. Clinical significance can only be assessed by reading the study: the size of the effect, the methods, the inclusion criteria, etc.
- Confidence intervals:
  - A 95% confidence interval is the range of values that, with 95% confidence, contains the true population value. Narrower intervals mean more precise estimates.
  - For a ratio confidence interval (e.g., RR, OR, hazard ratio), if it includes 1, it's not significant (think 1/1 = 1, no difference)
  - For a continuous (difference) confidence interval, if it includes 0, it's not significant (think 1 − 1 = 0, no difference)
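Both rules of thumb boil down to one comparison each. A minimal sketch, with hypothetical helper names and made-up study results:

```python
def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """'If the p-value is low, the null must go.'"""
    return p_value < alpha


def ci_significant(lower: float, upper: float, no_difference: float) -> bool:
    """A CI is significant only if it excludes the no-difference value.

    Use no_difference=1 for ratios (RR, OR, HR) and 0 for differences.
    """
    return not (lower <= no_difference <= upper)


# Hypothetical results
print(reject_null(0.01))              # True: p < 0.05
print(reject_null(0.20))              # False
print(ci_significant(0.65, 0.92, 1))  # True: RR interval excludes 1
print(ci_significant(0.85, 1.10, 1))  # False: RR interval includes 1
print(ci_significant(-3.2, 1.4, 0))   # False: mean-difference interval includes 0
```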

### Relative Risk

- Relative Risk = incidence in exposed patients / incidence in non-exposed patients
  - RR > 1: incidence in the exposed group is higher
  - RR < 1: incidence in the exposed group is lower

- Absolute risk reduction = event rate (risk) in the control group – event rate (risk) in the treatment group
- For this one, use a table for sure (these will likely be on the test):

| Treatment/Exposure | Disease | No Disease |
|---|---|---|
| **Exposed** | A | B |
| **Unexposed** | C | D |

- Relative Risk = [A/(A+B)] / [C/(C+D)]
- Absolute Risk Reduction (ARR) = C/(C+D) – A/(A+B)
- Odds Ratio = AD/BC
  - Mainly used in retrospective (case-control) studies
  - If used in a prospective study, the odds ratio overestimates the risk. They may try to trip you up on this.
  - The further the OR is from 1, the more the OR overestimates the RR; the two are close only when the outcome is rare
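Here is a quick way to see the OR drift away from the RR: the sketch below uses two hypothetical cohorts with the same RR of 2, one with a rare outcome and one with a common outcome. The helper name and counts are made up; A, B, C, D follow the table above.

```python
def rr_and_or(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Relative risk and odds ratio from the 2x2 table (A, B exposed; C, D unexposed)."""
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return rr, odds_ratio


# Hypothetical cohorts of 1,000 exposed and 1,000 unexposed patients
rare = rr_and_or(a=10, b=990, c=5, d=995)       # outcome in 1% vs 0.5% of patients
common = rr_and_or(a=400, b=600, c=200, d=800)  # outcome in 40% vs 20% of patients
print(f"rare outcome:   RR = {rare[0]:.2f}, OR = {rare[1]:.2f}")
print(f"common outcome: RR = {common[0]:.2f}, OR = {common[1]:.2f}")
```

Both cohorts have RR = 2.00, but the OR is about 2.01 when the outcome is rare and about 2.67 when it is common.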

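- Number Needed to Treat (NNT) = 1/ARR; Number Needed to Harm (NNH) = 1/absolute risk increase. Use the ARR as a decimal, not a percentage (an ARR of 5% is 0.05, so NNT = 20), and round up to the next whole patient.
  - Include the duration of the study (e.g., must treat 10 patients for 5 years to prevent one event)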

- AR (**absolute risk**) = the number of events (good or bad) in the treated or control group, divided by the number of people in that group.
  - ARC = the AR of events in the control group.
  - ART = the AR of events in the treatment group.
- **ARR** (**absolute risk reduction**) = ARC – ART.
- RR (**relative risk**) = ART / ARC.
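Putting ARC, ART, ARR, RR, and NNT together, here is a minimal Python sketch for a hypothetical 5-year trial (20/200 events on treatment vs. 40/200 on control). It uses `fractions.Fraction` so the rates stay exact and the NNT rounds up cleanly; `trial_summary` is a made-up helper name.

```python
import math
from fractions import Fraction


def trial_summary(events_tx: int, n_tx: int, events_ctrl: int, n_ctrl: int) -> dict:
    """ART, ARC, ARR, RR, and NNT for a treatment that lowers the event rate."""
    art = Fraction(events_tx, n_tx)        # absolute risk, treatment group
    arc = Fraction(events_ctrl, n_ctrl)    # absolute risk, control group
    arr = arc - art                        # absolute risk reduction
    if arr <= 0:
        raise ValueError("no risk reduction, so NNT is undefined")
    return {
        "ART": art,
        "ARC": arc,
        "ARR": arr,
        "RR": art / arc,
        "NNT": math.ceil(1 / arr),         # round up to a whole patient
    }


# Hypothetical 5-year trial: 20/200 events on treatment vs. 40/200 on control
s = trial_summary(events_tx=20, n_tx=200, events_ctrl=40, n_ctrl=200)
for name, value in s.items():
    print(f"{name}: {float(value):.2f}")
print(f"Treat {s['NNT']} patients for 5 years to prevent one event")
```

Here ARC = 0.20 and ART = 0.10, so ARR = 0.10, RR = 0.50, and NNT = 10 over the 5-year study period.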