I wrote this page inspired by a WhatsApp discussion with my med school classmates. We realized that our medical education in statistics was limited, and we struggled with understanding medical research. The information here is quite basic. I don’t remember being taught this stuff—or I may have been asleep in those classes!
Levels of evidence
- Animal Studies, Expert Opinion, and Anecdotal Evidence: These are considered the lowest levels of evidence. Be cautious with expert opinions, especially from “senior” doctors who may not be up-to-date with current research. Anecdotal evidence, such as “in my 20+ years of experience, I have seen…” should also be scrutinized through a scientific lens. I am not looking down on clinical experience, but our own biases and preconceptions can easily lead us astray. As Dr. Richard Feynman, the renowned Nobel Prize-winning physicist, said, “The first principle is that you must not fool yourself — and you are the easiest person to fool.”
- Randomized Controlled Trials (RCTs): These are the gold standard for evaluating interventions. Participants are randomly assigned to treatment groups, which helps minimize bias and confounding factors.
- Systematic Reviews and Meta-Analyses: These provide higher-quality evidence by summarizing multiple studies. However, the quality of a review depends on the quality of the included studies (Garbage In, Garbage Out, or GIGO).
- Nutrition Epidemiology Studies: These are considered low-quality evidence because they rely on food recall surveys, which are notoriously inaccurate.
What to look for?
Observational vs. Interventional Studies
Observational studies examine outcomes without manipulating variables. They include:
- Cohort studies: Follow groups over time to assess outcomes. For example, longitudinal studies over many years.
- Case-control studies: Compare groups with and without a condition.
- Cross-sectional studies: Analyze data at a specific point in time.
Interventional studies, like RCTs, actively manipulate variables to study their effects on outcomes.
Observational studies can generate good hypotheses and often include large sample sizes, but on their own they cannot establish causation.
Association is not causation
Association means two things are linked or connected in some way, but it doesn’t necessarily mean one causes the other.
- Association: When two things happen together or are related but don’t necessarily cause each other.
- Causation: When one thing directly makes another thing happen.
Imagine you notice that on days when more ice cream is sold, there are also more drownings at the beach. These two things are associated – they happen together. But does ice cream cause drownings? Of course not! What’s really happening is that on hot summer days:
- People buy more ice cream because it’s hot
- More people go to the beach and swim because it’s hot
- With more people at the beach, there’s a higher chance of drowning.
So while ice cream sales and drownings are associated (they both increase on hot days), one doesn’t cause the other. The hot weather is actually influencing both. This is extremely important to remember because many people don’t differentiate between observational studies and interventional studies.
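The ice cream example can be sketched as a small simulation. This is only an illustration: the numbers, the function names (`pearson`, `simulate_days`), and the made-up model in which temperature drives both variables are my assumptions, not real data.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_days(n_days=2000, seed=1):
    """Hypothetical model: temperature is the only cause of both outcomes."""
    rng = random.Random(seed)
    temps, ice_cream, drownings = [], [], []
    for _ in range(n_days):
        t = rng.uniform(10, 40)                        # daily temperature in Celsius
        temps.append(t)
        ice_cream.append(20 * t + rng.gauss(0, 100))   # sales depend only on temperature
        drownings.append(0.1 * t + rng.gauss(0, 1))    # drownings depend only on temperature
    return temps, ice_cream, drownings

temps, ice_cream, drownings = simulate_days()

# Across all days, ice cream sales and drownings move together.
r_all = pearson(ice_cream, drownings)

# Hold the confounder roughly constant: only days between 24 and 26 degrees.
similar = [(i, d) for t, i, d in zip(temps, ice_cream, drownings) if 24 <= t <= 26]
r_similar = pearson([i for i, _ in similar], [d for _, d in similar])

print("Correlation on all days:", round(r_all, 2))
print("Correlation on similar-temperature days:", round(r_similar, 2))
```

In this toy model, the link between ice cream and drownings is clear across all days but largely disappears once temperature is held roughly constant, which is what you would expect when a third factor drives both.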
Relative Risk vs. Absolute Risk
Understanding this is one of the most important things that I learnt. Relative risk (RR) compares the risk of an outcome between two groups, expressed as a ratio or percentage. Absolute risk (AR) is the actual risk of an outcome in a population. For example, if a drug reduces heart attack risk from 4% to 2%:
- Relative risk reduction: 50% (2% is half of 4%)
- Absolute risk reduction: 2 percentage points (4% – 2%)
Relative risk can sometimes overstate the effect, so considering absolute risk is important for clinical decision-making. Many people who have vested interests in the study outcome emphasize the relative risk reduction.
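The arithmetic above can be written as a few lines of Python. This is a minimal sketch; the function name `risk_summary` and the second, rarer-outcome example (0.4% to 0.2%) are hypothetical and added only to show how the same relative risk reduction can hide very different absolute benefits.

```python
def risk_summary(control_risk, treated_risk):
    """Return relative risk, relative risk reduction and absolute risk reduction.

    Risks are given as proportions, e.g. 0.04 for 4%.
    """
    relative_risk = treated_risk / control_risk
    return {
        "relative_risk": relative_risk,
        "relative_risk_reduction": 1 - relative_risk,
        "absolute_risk_reduction": control_risk - treated_risk,
    }

# Example from the text: heart attack risk falls from 4% to 2%.
common = risk_summary(0.04, 0.02)

# Hypothetical rarer outcome: risk falls from 0.4% to 0.2%.
rare = risk_summary(0.004, 0.002)

for label, result in (("4% -> 2%", common), ("0.4% -> 0.2%", rare)):
    print(label,
          "RRR:", f"{result['relative_risk_reduction']:.0%}",
          "ARR:", f"{result['absolute_risk_reduction']:.2%}")
```

Both scenarios have the same 50% relative risk reduction, but the absolute risk reduction is ten times smaller in the second one.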
Number Needed To Treat
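The Number Needed to Treat (NNT) measures the effectiveness of a medication or treatment by estimating how many people must receive the treatment for one person to benefit. It is the reciprocal of the absolute risk reduction: in the heart attack example above, 1 / 0.02 = 50, so 50 people need to take the drug to prevent one heart attack. To know more, check here.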
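As a rough sketch, the NNT can be computed as one divided by the absolute risk reduction, rounded up to a whole person. The function name `number_needed_to_treat` is my own, and the 0.4% to 0.2% case is a hypothetical example.

```python
import math

def number_needed_to_treat(control_risk, treated_risk):
    """NNT = 1 / absolute risk reduction, rounded up to a whole person.

    Risks are proportions, e.g. 0.04 for 4%.
    """
    arr = control_risk - treated_risk
    if arr <= 0:
        raise ValueError("No absolute risk reduction, so the NNT is undefined.")
    # round() guards against tiny floating-point errors before rounding up.
    return math.ceil(round(1 / arr, 9))

# Heart attack risk reduced from 4% to 2%.
print(number_needed_to_treat(0.04, 0.02))

# Hypothetical rarer outcome with the same 50% relative risk reduction.
print(number_needed_to_treat(0.004, 0.002))
```

The same relative risk reduction gives a much larger NNT when the outcome is rare, which is one reason the NNT is a useful companion to the relative risk.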
Sample size
The sample size is a critical aspect of research design that significantly impacts the reliability, validity, and generalizability of study findings. Here are the key reasons why sample size is important:
1. Accuracy and Precision
- Larger Sample Size: A larger sample size enhances the precision of estimates, leading to a narrower margin of error. This means the results are more likely to be close to the true population parameter.
- Smaller Sample Size: A smaller sample size can lead to greater variability and less reliable results, increasing the margin of error and the risk of drawing incorrect conclusions.
2. Statistical Power
- Definition: Statistical power is the probability that a study will detect an effect when there is an effect to be detected.
- Impact: A larger sample size increases the power of a statistical test, reducing the likelihood of Type II errors (false negatives). This means the study is more likely to detect true effects.
3. Generalizability
- Representativeness: A larger sample size is more likely to be representative of the population, allowing researchers to generalize the findings to a broader group.
- Variability: In populations with high variability, a larger sample size helps ensure that the sample accurately reflects the diversity within the population.
- Limited generalizability or lack of external validity: External validity is the extent to which research findings can be applied to settings or groups other than those in which the original study was conducted. A study’s results may not be applicable or relevant to populations with different characteristics from those of the study participants. For example, the Framingham Risk Score for CVD is based on studies of people of predominantly European ancestry, so it does not apply well to South Asians. Yet it is routinely used to assess their CVD risk, which unfortunately results in underestimating that risk.
- Sex and gender-based analysis of trial data: Sex refers to the biological and physiological characteristics that define humans as male or female based on chromosomes, reproductive organs, hormones and secondary sexual characteristics. Gender is a social construct that encompasses the roles, behaviours, activities, and attributes that a given society considers appropriate for men and women.
- The phrase “Women are not small men” highlights the importance of recognizing and studying sex- and gender-based differences in physiology, disease manifestation, and treatment responses. In the United States, the landmark decision to ensure the inclusion of women in clinical trials came in 1993 with the NIH Revitalization Act. The act mandated that women and minorities be included in NIH-funded clinical research and that clinical trials be designed to allow for the analysis of differences between sexes. However, although women are now included in clinical trials, the sex- and gender-based data are often not analyzed.
Studies on nutritional supplements are often hampered by small sample sizes because large studies require more funding.
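To make the points about precision and power concrete, here is a rough sketch using the normal approximation. The event rates (4% versus 2%, borrowed from the heart attack example), the sample sizes, and the function names are assumptions for illustration; real trials use more careful sample size calculations.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()
Z95 = STANDARD_NORMAL.inv_cdf(0.975)  # about 1.96

def margin_of_error(p, n):
    """Approximate 95% margin of error for an observed proportion p in n people."""
    return Z95 * (p * (1 - p) / n) ** 0.5

def approx_power(p_control, p_treated, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Uses the normal approximation and ignores the negligible chance of a
    significant result in the wrong direction.
    """
    z_alpha = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    se = (p_control * (1 - p_control) / n_per_group
          + p_treated * (1 - p_treated) / n_per_group) ** 0.5
    return STANDARD_NORMAL.cdf(abs(p_control - p_treated) / se - z_alpha)

for n in (100, 1000, 10000):
    print(f"n per group = {n:>5}: "
          f"margin of error = {margin_of_error(0.04, n):.4f}, "
          f"power = {approx_power(0.04, 0.02, n):.2f}")
```

Under these assumptions, a trial with 100 people per group would usually miss a true halving of risk, while one with 10,000 per group would almost always detect it.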
Surrogate endpoints and hard endpoints of a clinical trial
Surrogate Endpoints. Biomarkers or intermediate outcomes used as substitutes for clinically meaningful endpoints. For example, lowering LDL-C is a surrogate endpoint, whereas preventing death from CVD is a hard endpoint.
- Advantages: They can be measured more quickly than clinical outcomes, often require smaller sample sizes, and may be less expensive to study.
- Limitations: They may not always accurately predict the clinical outcome, which can sometimes lead to misleading conclusions about treatment efficacy.
Hard Endpoints. Examples include death, heart attack, stroke, or quality of life measures.
- Definition: Direct measures of how a patient feels, functions, or survives. These are clinically meaningful outcomes.
- Advantages: Directly measure the clinical benefit to patients. Provide stronger evidence for treatment efficacy.
- Limitations: They may take longer to observe, often require larger sample sizes, and are more expensive to study.
Changes in surrogate endpoints may not always translate to meaningful clinical benefits, while changes in hard endpoints are inherently meaningful. For example, many people whose LDL-C is on target still have a myocardial infarction (MI).
Primary and Secondary Endpoints
Primary Endpoints:
- Definition: The main outcome that a study is designed to evaluate. It’s the most important measure used to determine if the treatment being studied is effective.
- Statistical power: The study is specifically designed and powered (in terms of sample size) to detect differences in the primary endpoint.
- Importance: It’s the key focus of the study and usually the basis for regulatory decisions about drug approval.
- Pre-specification: This must be clearly defined before the study begins.
- Number: Typically, there is only one primary endpoint per study to maintain statistical integrity.
- Examples: Overall survival in cancer trials and reduction in major cardiovascular events in heart disease studies.
Secondary Endpoints:
- Definition: Additional outcomes of interest that are not the main focus of the study.
- Statistical power: The study may not be specifically powered to detect differences in secondary endpoints.
- Importance: They provide supporting information and help interpret the primary results, but they are not usually the basis for regulatory decisions alone.
- Pre-specification: These should also be defined before the study begins, but they carry less statistical weight than the primary endpoint.
- Number: There can be multiple secondary endpoints in a study.
- Purpose: Often used to generate hypotheses for future research or provide additional context for the primary results.
- Examples: Quality of life measures, biomarker changes, or other clinical outcomes related to the disease being studied.
Key Differences:
- Priority: Primary endpoints are the main focus, while secondary endpoints are supplementary.
- Statistical significance: Results from secondary endpoints should be interpreted with caution, as positive findings may be due to chance given multiple comparisons.
- Study design impact: The primary endpoint largely determines the study design, including sample size and duration.
- Regulatory weight: Primary endpoints carry more weight in regulatory decisions about drug approval.
It’s important to note that while secondary endpoints provide valuable information, they should not be over-interpreted, especially if they contradict the primary endpoint results.
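The multiple comparisons problem mentioned above can be illustrated with a short calculation. This sketch assumes that every endpoint is tested at the same threshold, that the tests are independent, and that the treatment truly has no effect on any of them; the function name is hypothetical.

```python
def chance_of_false_positive(n_endpoints, alpha=0.05):
    """Probability that at least one of n independent tests is 'significant'
    by chance alone, when the treatment has no real effect on any endpoint."""
    return 1 - (1 - alpha) ** n_endpoints

for k in (1, 5, 10, 20):
    print(f"{k:>2} endpoints: {chance_of_false_positive(k):.0%} chance of at least one false positive")
```

With a single endpoint the chance of a false positive is 5%, but with 10 endpoints it is about 40% and with 20 it is about 64%, which is why an isolated positive secondary endpoint deserves caution.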
Statistical Significance vs. Clinical Significance
Statistical significance (typically p < 0.05) indicates that a result this large would be unlikely to arise by chance alone if there were no real effect. However, it doesn’t necessarily imply clinical importance. Clinical significance refers to the practical importance of an effect. A statistically significant result may not be clinically meaningful if the effect size is small or irrelevant to patient outcomes. For example, a drug might show a statistically significant reduction in blood pressure of 2 mmHg, but this small change may not be clinically meaningful for most patients.
Another example is that in assisted reproductive technology (ART), the take-home baby rate is more important than the pregnancy rate. If an intervention increases the pregnancy rate but not the take-home baby rate, the intervention is not as helpful because many pregnancies may end in miscarriage.
(Please don’t ask me to explain p-value; I am still struggling with that one 🙂)
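For the blood pressure example above, here is a rough sketch of how the same 2 mmHg difference can be "not significant" in a small trial and highly significant in a large one. The standard deviation of 15 mmHg, the group sizes, and the function name are assumptions for illustration, and the calculation uses a simple z-test rather than whatever a real trial would use.

```python
from statistics import NormalDist

def two_sided_p_value(mean_difference, sd, n_per_group):
    """Two-sided p-value for a difference in means between two equal-sized groups,
    using a z-test with a known, common standard deviation."""
    standard_error = sd * (2 / n_per_group) ** 0.5
    z = mean_difference / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same 2 mmHg reduction, assumed SD of 15 mmHg, different trial sizes.
p_small = two_sided_p_value(2, 15, 50)
p_large = two_sided_p_value(2, 15, 5000)

print(f"50 per group:   p = {p_small:.2f}")
print(f"5000 per group: p = {p_large:.2g}")
```

The effect size is identical in both cases. Only the sample size changed, so the p-value says little about whether 2 mmHg matters to a patient.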
How is AI helping with clinical trials?
To know more, please read this paper in Nature.
Translating research to clinical practice
The most important question we must address is how to effectively translate research findings into actionable health advice and recommendations for patients.