The BCPS is heavily weighted toward biostatistics, study design, and regulatory issues. I recommend you buy the little biostatistics book ACCP offers (it’s cheap and it has some good practice problems in it). It was worth more than any other book I purchased for the BCPS. This is also a really good study guide and here’s a really, really simple sheet. This study guide has more topics in biostatistics (including the “which statistical test to pick” questions).
Biostats Definitions:
Here are some basic things in case you have forgotten them:
- Nominal data – data with no inherent order (yes/no, male/female)
- Ordinal data – data with order, but no consistent difference in magnitude change (classes of heart failure, pain scales)
- Interval data – continuous data with consistent interval difference (temperature)
- Ratio data – continuous data with consistent interval difference, but zero is the starting point (HR, BP)
- Mean– “Average.” Only with continuous data (parametric, normally distributed)
- Median – 50th percentile. The data point exactly in the middle of the data points. Usually only used with ordinal data or continuous data that is not normally distributed
- Mode – the most frequently occurring value
- Range – the difference between the largest and smallest data points
- Interquartile range (IQR) – related to the median; the range between the 25th and 75th percentiles, which contains the middle 50% of the data
- Standard deviation (SD) – only applicable to parametric data. Measures how data points scatter around the mean. Not available for nominal data or ordinal data. 99% of the data should be found in +/-3 SDs, 95% of the data in +/- 2 SDs, 68% in +/- 1 SD.
- Standard error of the mean (SEM) – the SD divided by the square root of n, so it is always smaller than the standard deviation. It describes how precisely the sample mean estimates the population mean, not the variability of individual data points.
- Parametric data – continuous data that is normally distributed (a symmetric bell-shaped curve)
- Nonparametric data – continuous data that does not follow a normal distribution (skewed)
- If mean > median, skewed to the right
- Correlation coefficient (R) – how variables relate. The closer the number is to 1, the stronger the relationship.
- R2 (coefficient of determination) – the proportion of the variability in Y that is explained by X (ie: 70% of weight gain is explained by calories, 30% is unexplained)
- Naranjo Scale – estimates the probability that an adverse event is related to drug toxicity
- Cox proportional hazards model – used to analyze survival data
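The descriptive statistics above (mean, median, mode, range, SD, SEM) can be sketched with Python's standard library. The data values here are made-up example numbers, not from any study:

```python
import statistics

# Hypothetical example data (e.g., eight patients' pain scores)
data = [4, 8, 6, 5, 3, 7, 8, 9]

mean = statistics.mean(data)        # "average" -- parametric data only
median = statistics.median(data)    # 50th percentile, middle of the data
mode = statistics.mode(data)        # most frequently occurring value
data_range = max(data) - min(data)  # largest minus smallest data point
sd = statistics.stdev(data)         # sample standard deviation
sem = sd / len(data) ** 0.5         # SEM = SD / sqrt(n), always smaller than SD

print(mean, median, mode, data_range)  # 6.25 6.5 8 6
```

Note that for this right-skewed-looking sample the mean (6.25) and median (6.5) are close; in heavily skewed data they diverge, which is why the median is preferred for nonparametric data.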
Therapeutic Index
- Therapeutic Index = Median Toxic Dose / Median Effective Dose (TD50/ED50; also Lethal Dose 50 / Effective Dose 50)
- The higher the TI, the safer the drug
- If LD50 is much greater than ED50, the TI is large and the drug has a wide safety margin
Specificity, Sensitivity, Predictive Values and Accuracy
- Sensitivity = True Positives / (True Positives + False Negatives) Sensitivity measures true positives. If a highly sensitive test is negative, you can be sure they don’t have the disease (SNOUT Rules Things Out)
- Specificity = True Negatives / (True Negatives + False Positives) Specificity measures true negative. If a highly specific test is positive, you can be sure they have the disease (SPin Rules Things In)
- For both sensitivity and specificity, bigger numbers mean a better test.
- Positive Predictive Value = True Positives / (True Positives + False Positives) This is the percentage of people who test positive who actually have the disease. *Bigger numbers are more significant*
- Negative Predictive Value = True Negatives / (True Negatives + False Negatives ) This is the percentage of people who test negative who don’t have the disease. *Bigger numbers are more significant*
- Accuracy = (True Positives + True Negatives) / Total
- When you increase sensitivity, you decrease specificity. You get more diagnoses but also more false positives.
- Prevalence = existing cases / total population at risk (a snapshot in time); incidence = new cases / total population at risk (over a period of time)
Some people like to set up a table (this is called a “confusion matrix” in the statistics world):
| Test result | Disease + | Disease – | Total |
| --- | --- | --- | --- |
| Test + | TP | FP | TP+FP |
| Test – | FN | TN | FN+TN |
| Total | TP+FN | TN+FP | TP+FP+FN+TN |
- Sensitivity= TP/(TP+FN) (column 1)
- Specificity = TN/ (TN+FP) (column 2)
- PPV = TP/ (TP+FP) (row 1)
- NPV = TN/(TN+FN) (row 2)
- Accuracy = the diagonal of the table: (TP+TN) / (TP+FP+FN+TN)
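The table formulas above can be bundled into one helper. This is a sketch with hypothetical counts (90 true positives, 5 false positives, 10 false negatives, 95 true negatives), not data from any real test:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute test-performance measures from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate (column 1)
        "specificity": tn / (tn + fp),   # true negative rate (column 2)
        "ppv": tp / (tp + fp),           # positive predictive value (row 1)
        "npv": tn / (tn + fn),           # negative predictive value (row 2)
        "accuracy": (tp + tn) / total,   # the diagonal of the table
    }

m = diagnostic_metrics(tp=90, fp=5, fn=10, tn=95)
print(m["sensitivity"], m["specificity"])  # 0.9 0.95
```

Note that sensitivity and specificity use the columns (disease status) while PPV and NPV use the rows (test result); mixing these up is a classic exam trap.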
Hypothesis Testing:
The null hypothesis is that there is no difference between groups in a study. In order to find significance, you need to REJECT the null. This can be a little confusing.
|  | Null is True | Null is False |
| --- | --- | --- |
| Accept Null | Correct decision | Type II error (β) |
| Reject Null | Type I error (α) | Correct decision |
- Type I error (α): rejecting the null hypothesis when it is true. A difference is found where none exists. The maximum acceptable alpha is usually 0.05 (think of alpha as the p-value threshold you design the study around).
- Type II error (β): accepting the null hypothesis when it is false. No difference is found when one exists. The maximum acceptable probability of a Type II error is usually 20% (β = 0.2).
- Type II errors are usually due to a small sample size or an underpowered study. The easiest way to decrease beta is to increase the sample size. Alpha and sample size have the greatest impact on study power.
- You always risk making either a Type I or a Type II error, but never both at once. If the p-value is significant, you risk a Type I error; if it is not, you risk a Type II error. For example, a p-value of 0.01 would mean there is a chance of committing a Type I error (ie: you found the p significant, rejected the null, and stated there was a difference between the groups when, in reality, there was no difference between the two groups).
- “If the p-value is low, the null must go.” If the p-value is less than alpha, the null is rejected.
- P VALUES DO NOT SUGGEST CLINICAL SIGNIFICANCE, just statistical significance. Clinical significance can only be assessed by reading the study and finding the methods, inclusion criteria, etc.
- Confidence intervals:
- The 95% confidence interval is the range expected to contain the true population value 95% of the time; the closer a value lies to the center of the interval (the point estimate), the more likely it represents the population.
- For a ratio confidence interval, if it includes 1, it’s not significant (think 1/1 = 1, no difference)
- For a continuous confidence interval, if it includes 0, it’s not significant (think 1-1 = 0, no difference)
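The two confidence-interval rules above reduce to one check: does the interval contain the "no difference" value? Here is a minimal sketch (the example intervals are invented):

```python
def ci_significant(lower, upper, ratio=True):
    """A CI is significant when it excludes the 'no difference' value:
    1 for ratio measures (RR, OR, HR), 0 for continuous differences."""
    null_value = 1 if ratio else 0
    return not (lower <= null_value <= upper)

print(ci_significant(0.65, 0.91, ratio=True))   # True: RR CI excludes 1
print(ci_significant(0.80, 1.20, ratio=True))   # False: CI crosses 1
print(ci_significant(-0.5, 2.3, ratio=False))   # False: CI crosses 0
```

This mirrors how you read a forest plot: any whisker crossing the vertical "line of no effect" is not statistically significant.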
Relative Risk
- Relative Risk = incidence in exposed patients / incidence in non-exposed patients
- >1 incidence in the exposed group is higher
- <1 incidence in the exposed group is lower
- Absolute risk reduction = risk (event rate) in the control group – risk (event rate) in the treatment group
- For this one, use a table for sure (these will likely be on the test):
| Treatment | Disease | No Disease |
| --- | --- | --- |
| Exposed | A | B |
| Unexposed | C | D |
- Risk in the exposed group = A/(A+B); risk in the unexposed group = C/(C+D)
- Relative Risk (RR) = [A/(A+B)] / [C/(C+D)]
- Absolute Risk Reduction (ARR) = C/(C+D) – A/(A+B)
- Odds Ratio (OR) = AD/BC
- Only used in retrospective studies
- If used in a prospective study, odds ratio overestimates the risks. They may try to trip you up on this.
- The further the OR is from 1, the more the OR overestimates the RR
- Number needed to treat (NNT) or number needed to harm (NNH) = 1/ARR (use the ARR as a decimal, not a percentage)
- Always include the study duration when stating NNT (eg: must treat 10 patients for 5 years to prevent one event)
- AR (absolute risk) = the number of events (good or bad) in treated or control groups, divided by the number of people in that group.
- ARC = the AR of events in the control group.
- ART = the AR of events in the treatment group.
- ARR (absolute risk reduction) = ARC – ART.
- RR (relative risk) = ART / ARC.
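All of the risk measures above come from the same 2x2 table. Here is a sketch using hypothetical counts (10 of 100 exposed and 20 of 100 unexposed patients developed the disease):

```python
def risk_measures(a, b, c, d):
    """a/b = exposed with/without disease, c/d = unexposed with/without."""
    art = a / (a + b)               # absolute risk, treated/exposed group (ART)
    arc = c / (c + d)               # absolute risk, control/unexposed group (ARC)
    arr = arc - art                 # absolute risk reduction = ARC - ART
    rr = art / arc                  # relative risk = ART / ARC
    odds_ratio = (a * d) / (b * c)  # AD/BC -- retrospective studies only
    nnt = 1 / arr                   # number needed to treat (round up in practice)
    return art, arc, arr, rr, odds_ratio, nnt

art, arc, arr, rr, odds, nnt = risk_measures(a=10, b=90, c=20, d=80)
print(arr, rr, nnt)  # 0.1 0.5 10.0
```

Note how the OR (about 0.44) is further from 1 than the RR (0.5) even at these modest event rates, illustrating why the OR overestimates the RR outside retrospective designs.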