Statistical Test Selector
Answer the following questions to determine the appropriate statistical test for your study.
What is the main purpose of your analysis?
Explanation:
The purpose of your analysis determines the broad category of statistical tests that would be appropriate for your study.
How many groups are you comparing?
Explanation:
The number of groups in your study affects which statistical test is appropriate. Some tests are designed specifically for two-group comparisons, while others can handle multiple groups.
What type of data are you analyzing?
Explanation:
The type of data you're analyzing is crucial for selecting the appropriate test. Continuous data are measured on a scale, while categorical data fall into distinct categories.
Are the data normally distributed?
Explanation:
Normal distribution is a key assumption for many statistical tests. You can check normality using histograms, Q-Q plots, or statistical tests like Shapiro-Wilk.
What type of categorical data?
Explanation:
The specific type of categorical data affects which test is most appropriate. Binary data have only two possible values, while ordinal data have a natural order.
Are the samples paired or independent?
Explanation:
Paired samples involve the same subjects measured twice (e.g., before and after treatment). Independent samples involve different subjects in each group.
Are the samples paired or independent?
Explanation:
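For non-normally distributed data, we use non-parametric tests that don't assume normality: the Mann-Whitney U test for independent samples and the Wilcoxon signed-rank test for paired samples.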
How to check for normality
Methods to check normality:
1. Visual methods: Histogram, Q-Q plot
2. Statistical tests: Shapiro-Wilk test, Kolmogorov-Smirnov test
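If the p-value is > 0.05 in these tests, there is no significant evidence against normality. This does not prove the data are normally distributed, especially in small samples, so check the visual methods as well.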
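As a rough illustration of the visual method, the sketch below builds the points of a normal Q-Q plot using only the Python standard library. The function names (`qq_points`, `qq_correlation`) and the plotting positions `(i - 0.5) / n` are assumptions chosen for this example, not part of the tool. The correlation it returns is only an informal summary of how straight the Q-Q line is; it is not a substitute for a Shapiro-Wilk test.

```python
from statistics import NormalDist, mean


def qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    n = len(sample)
    observed = sorted(sample)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n are one common convention (assumed here).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


def qq_correlation(sample):
    """Pearson correlation between theoretical and observed quantiles.

    Values close to 1 mean the Q-Q points lie close to a straight line.
    """
    pairs = qq_points(sample)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical samples: one roughly symmetric, one strongly right-skewed.
symmetric = [15, 12, 17, 14, 18, 20, 13, 16, 19, 14]
skewed = [1, 1, 1, 1, 2, 2, 3, 5, 20, 100]
print(f"symmetric: {qq_correlation(symmetric):.3f}")
print(f"skewed:    {qq_correlation(skewed):.3f}")
```

The skewed sample yields a noticeably lower correlation, which corresponds to the curved pattern such data would show on a Q-Q plot.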
Normal Distribution
Non-Normal Distribution
What type of data are you analyzing?
Explanation:
For multiple group comparisons, the data type determines whether to use ANOVA (for continuous data) or chi-square (for categorical data).
Are the data normally distributed?
Explanation:
For multiple group comparisons with continuous data, we use ANOVA if data are normally distributed, or Kruskal-Wallis if not.
Are the groups related or independent?
Explanation:
For normally distributed data with multiple groups, we use repeated measures ANOVA for related samples and one-way ANOVA for independent samples.
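The group-comparison questions above can be condensed into a small lookup. The sketch below is one way to express that logic for continuous outcomes; the function name `suggest_group_test` and its parameters are hypothetical, and the test names are the ones this page lists for each combination.

```python
def suggest_group_test(n_groups, normal, paired):
    """Suggest a test for comparing a continuous outcome across groups.

    n_groups: number of groups being compared (2 or more)
    normal:   True if the data are approximately normally distributed
    paired:   True if the groups are related (e.g., same subjects measured
              repeatedly), False if they are independent
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")

    two_groups = n_groups == 2
    # Keys are (two_groups, normal, paired).
    table = {
        (True, True, False): "Independent t-test",
        (True, True, True): "Paired t-test",
        (True, False, False): "Mann-Whitney U test",
        (True, False, True): "Wilcoxon signed-rank test",
        (False, True, False): "One-way ANOVA",
        (False, True, True): "Repeated measures ANOVA",
        (False, False, False): "Kruskal-Wallis test",
        (False, False, True): "Friedman test",
    }
    return table[(two_groups, normal, paired)]


print(suggest_group_test(2, normal=True, paired=False))
print(suggest_group_test(3, normal=False, paired=True))
```

A lookup table keeps each combination of answers visible on one line, which makes it easy to compare against the questions.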
What types of variables are you analyzing?
Explanation:
The types of variables you're analyzing determine which correlation or association test is appropriate.
Are both variables normally distributed?
Explanation:
For two continuous variables, we use Pearson correlation if both are normally distributed, or Spearman correlation if not.
How many categories in the categorical variable?
Explanation:
When analyzing a categorical and a continuous variable, the number of categories determines whether to use a t-test (for two categories) or ANOVA (for three or more categories).
What type of outcome are you predicting?
Explanation:
The type of outcome you're predicting determines whether to use linear regression (for continuous outcomes) or logistic regression (for categorical outcomes).
How many categories in the outcome variable?
Explanation:
For categorical outcomes, we use binary logistic regression for two categories or multinomial logistic regression for multiple categories.
Sample Size Calculator
Calculate the required sample size for your study based on statistical parameters.
Select Study Type
Parameters for Comparing Means
Effect size (Cohen's d), the standardized difference between means:
Small effect: 0.2
Medium effect: 0.5
Large effect: 0.8
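To show how the effect size drives the required sample size, here is a minimal sketch using the standard normal-approximation formula for comparing two means, n per group = 2(z₁₋α/₂ + z₁₋β)² / d². This formula is an assumption for illustration; the calculator on this page may use a different method (for example, a t-distribution correction, which gives slightly larger numbers). The function name `n_per_group` and its default values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means.

    Uses the normal approximation: n = 2 * (z_alpha + z_beta)^2 / d^2,
    where d is the standardized difference between means.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)           # desired power (1 - beta)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label} effect (d = {d}): {n_per_group(d)} per group")
```

Because the effect size is squared in the denominator, halving it roughly quadruples the required sample size.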
Quick Reference Guide
Common statistical tests used in medical research and when to use them.
Comparing Groups
Test | When to Use | Example in Medical Research |
---|---|---|
Independent t-test | Compare means between two independent groups with normally distributed data | Comparing mean blood pressure between treatment and control groups |
Paired t-test | Compare means between two related groups with normally distributed data | Comparing blood pressure before and after treatment in the same patients |
Mann-Whitney U test | Compare two independent groups with non-normally distributed data | Comparing pain scores between two different treatment groups |
Wilcoxon signed-rank test | Compare two related groups with non-normally distributed data | Comparing pain scores before and after treatment in the same patients |
One-way ANOVA | Compare means among three or more independent groups with normally distributed data | Comparing mean blood glucose levels among three different drug treatments |
Repeated measures ANOVA | Compare means among three or more related groups with normally distributed data | Comparing blood glucose levels at multiple time points after treatment |
Kruskal-Wallis test | Compare three or more independent groups with non-normally distributed data | Comparing pain scores among three different treatment groups |
Friedman test | Compare three or more related groups with non-normally distributed data | Comparing pain scores at multiple time points after treatment |
Categorical Data Analysis
Test | When to Use | Example in Medical Research |
---|---|---|
Chi-square test | Compare proportions between independent groups | Comparing the proportion of patients with side effects between two treatments |
Fisher's exact test | Compare proportions between independent groups when sample sizes are small (commonly, when any expected cell count is below 5) | Comparing rare adverse events between two treatments |
McNemar's test | Compare proportions between related groups | Comparing the presence of a symptom before and after treatment |
Correlation and Regression
Test | When to Use | Example in Medical Research |
---|---|---|
Pearson correlation | Assess linear relationship between two normally distributed continuous variables | Assessing the relationship between BMI and blood pressure |
Spearman correlation | Assess monotonic relationship between two variables when at least one is not normally distributed | Assessing the relationship between disease severity score and quality of life score |
Linear regression | Predict a continuous outcome based on one or more predictor variables | Predicting blood pressure based on age, BMI, and sodium intake |
Logistic regression | Predict a binary outcome based on one or more predictor variables | Predicting the likelihood of heart attack based on risk factors |
Cox proportional hazards | Analyze time-to-event data with censoring | Analyzing survival time after cancer diagnosis based on treatment type |
Common Statistical Terms
Term | Definition |
---|---|
p-value | The probability of obtaining results at least as extreme as the observed results, assuming the null hypothesis is true |
Confidence interval | A range of values, computed from the sample, produced by a procedure that captures the true population parameter at a stated rate (e.g., 95% of such intervals over repeated sampling) |
Effect size | A quantitative measure of the magnitude of a phenomenon, such as the difference between groups or the strength of a relationship |
Power | The probability of correctly rejecting the null hypothesis when it is false |
Type I error | Rejecting the null hypothesis when it is true (false positive) |
Type II error | Failing to reject the null hypothesis when it is false (false negative) |
Interactive Examples
Explore these examples to better understand statistical concepts and tests.
Example 1: Comparing Two Treatment Groups
A randomized controlled trial compared a new antihypertensive medication (Treatment A) with a standard medication (Treatment B) in 50 patients with hypertension. The primary outcome was reduction in systolic blood pressure after 8 weeks of treatment.
Sample Data:
Treatment A (mmHg reduction) | Treatment B (mmHg reduction) |
---|---|
15, 12, 17, 14, 18, 20, 13, 16, 19, 14 | 10, 8, 12, 9, 11, 14, 7, 10, 13, 9 |
Statistical Analysis:
Since we are comparing two independent groups with continuous data, we need to first check if the data are normally distributed. Assuming normality, an independent t-test would be appropriate.
Results: Mean reduction in Treatment A = 15.8 mmHg, Treatment B = 10.3 mmHg
t-statistic: 5.03 (df = 18, computed from the 10 values per group listed above), p-value: < 0.001
Interpretation: There is a statistically significant difference in blood pressure reduction between the two treatments, with Treatment A showing a greater reduction.
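The calculation behind an independent t-test can be sketched with the standard library alone. The example below uses only the ten values per group listed in the table, so its statistic reflects those values rather than a full trial data set. It assumes equal variances (Student's pooled-variance form); the function name `independent_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Values listed in the Sample Data table (mmHg reduction)
treatment_a = [15, 12, 17, 14, 18, 20, 13, 16, 19, 14]
treatment_b = [10, 8, 12, 9, 11, 14, 7, 10, 13, 9]


def independent_t(x, y):
    """Student's t statistic (pooled variance) and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, df


t_stat, df = independent_t(treatment_a, treatment_b)
print(f"mean A = {mean(treatment_a):.1f}, mean B = {mean(treatment_b):.1f}")
print(f"t = {t_stat:.2f}, df = {df}")
```

Obtaining the p-value requires the t-distribution, which is not in the standard library; a statistics package would normally handle that step.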
Example 2: Before-After Intervention Study
A study evaluated the effect of a 12-week exercise program on HbA1c levels in 15 patients with type 2 diabetes.
Sample Data:
Patient | HbA1c Before (%) | HbA1c After (%) |
---|---|---|
1-5 | 8.2, 7.9, 8.5, 7.6, 8.0 | 7.5, 7.3, 7.8, 7.1, 7.4 |
6-10 | 8.3, 7.8, 8.1, 7.7, 8.4 | 7.6, 7.2, 7.5, 7.0, 7.7 |
11-15 | 7.5, 8.2, 7.9, 8.3, 8.0 | 7.0, 7.6, 7.3, 7.7, 7.4 |
Statistical Analysis:
Since we are comparing measurements from the same patients before and after an intervention, a paired t-test would be appropriate (assuming the differences are normally distributed).
Results: Mean HbA1c before = 8.03%, after = 7.41%
Mean difference: 0.62% (95% CI: 0.58 to 0.66)
t-statistic: 35.5 (df = 14), p-value: < 0.001
Interpretation: There is a statistically significant reduction in HbA1c levels after the exercise program.
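A paired t-test works on the within-patient differences, which the following sketch makes explicit using the 15 before/after pairs from the table. The function name `paired_t` is hypothetical, and the critical value 2.145 (two-sided 95%, 14 degrees of freedom) is hard-coded from a t-table because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

# HbA1c (%) for patients 1-15, in table order
before = [8.2, 7.9, 8.5, 7.6, 8.0, 8.3, 7.8, 8.1, 7.7, 8.4, 7.5, 8.2, 7.9, 8.3, 8.0]
after = [7.5, 7.3, 7.8, 7.1, 7.4, 7.6, 7.2, 7.5, 7.0, 7.7, 7.0, 7.6, 7.3, 7.7, 7.4]


def paired_t(x, y):
    """Return mean difference, its standard error, the t statistic, and df."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs), se, mean(diffs) / se, n - 1


mean_diff, se, t_stat, df = paired_t(before, after)

# Two-sided 95% critical value of t with 14 degrees of freedom (from a t-table)
T_CRIT_14 = 2.145
ci = (mean_diff - T_CRIT_14 * se, mean_diff + T_CRIT_14 * se)

print(f"mean difference = {mean_diff:.2f}")
print(f"95% CI = {ci[0]:.2f} to {ci[1]:.2f}")
print(f"t = {t_stat:.1f}, df = {df}")
```

The differences in this data set are very consistent across patients (each between 0.5 and 0.7), so the standard error is small relative to the mean difference.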
Example 3: Association Between Risk Factor and Disease
A case-control study examined the association between smoking status and lung cancer in 200 participants (100 cases with lung cancer, 100 controls without lung cancer).
Sample Data:
Outcome | Smokers | Non-smokers | Total |
---|---|---|---|
Lung Cancer | 75 | 25 | 100 |
No Lung Cancer | 40 | 60 | 100 |
Total | 115 | 85 | 200 |
Statistical Analysis:
Since we are examining the association between two categorical variables, a chi-square test would be appropriate.
Results: Chi-square statistic = 25.06 (df = 1, without continuity correction), p-value: < 0.001
Odds Ratio: 4.5 (95% CI: 2.5 to 8.2)
Interpretation: There is a statistically significant association between smoking and lung cancer. The odds of lung cancer among smokers are 4.5 times the odds among non-smokers.
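For a 2x2 table, both the chi-square statistic and the odds ratio can be recomputed directly from the four cell counts. The sketch below uses the shortcut formula for a 2x2 chi-square, with optional Yates' continuity correction, and a Wald interval on the log scale for the odds ratio. The function names are hypothetical, and the choice of a Wald interval is an assumption; other interval methods exist.

```python
from math import exp, log, sqrt

# Cell counts from the table: rows = outcome, columns = exposure
a, b = 75, 25  # lung cancer: smokers, non-smokers
c, d = 40, 60  # no lung cancer: smokers, non-smokers


def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square statistic for a 2x2 table (1 degree of freedom)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction shrinks the difference by n / 2.
        diff = max(diff - n / 2, 0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval computed on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


chi2 = chi_square_2x2(a, b, c, d)
chi2_yates = chi_square_2x2(a, b, c, d, yates=True)
odds_ratio, lower, upper = odds_ratio_ci(a, b, c, d)

print(f"chi-square = {chi2:.2f} (with Yates' correction: {chi2_yates:.2f})")
print(f"odds ratio = {odds_ratio:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

Statistical packages differ on whether the continuity correction is applied by default, so it is worth stating which version a reported chi-square value uses.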
Example 4: Correlation Between Clinical Variables
A study examined the correlation between body mass index (BMI) and systolic blood pressure (SBP) in 50 adults.
Sample Data (excerpt):
Patient | BMI (kg/m²) | SBP (mmHg) |
---|---|---|
1-5 | 22.5, 27.8, 31.2, 24.6, 29.3 | 118, 132, 145, 125, 138 |
6-10 | 26.1, 33.5, 25.2, 30.7, 28.4 | 128, 150, 122, 142, 135 |
Statistical Analysis:
Since we are examining the relationship between two continuous variables, a Pearson correlation would be appropriate (assuming both variables are normally distributed).
Results: Correlation coefficient (r) = 0.72, p-value: < 0.001
Interpretation: There is a strong positive correlation between BMI and systolic blood pressure, indicating that as BMI increases, systolic blood pressure tends to increase as well.
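The Pearson coefficient itself is straightforward to compute by hand. The sketch below applies the definition to the ten-patient excerpt shown in the table. Because this is only an excerpt of the 50 adults, the value it prints should not be expected to match the coefficient reported for the full study. The function name `pearson_r` is hypothetical.

```python
from math import sqrt
from statistics import mean

# Ten-patient excerpt from the Sample Data table
bmi = [22.5, 27.8, 31.2, 24.6, 29.3, 26.1, 33.5, 25.2, 30.7, 28.4]
sbp = [118, 132, 145, 125, 138, 128, 150, 122, 142, 135]


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(bmi, sbp)
print(f"r (10-patient excerpt only) = {r:.2f}")
```

A Spearman correlation follows the same formula applied to the ranks of each variable instead of the raw values.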