Medical Statistics Assistant

Interactive tool for selecting appropriate statistical tests and calculating sample sizes

Statistical Test Selector

Answer the following questions to determine the appropriate statistical test for your study.

What is the main purpose of your analysis?

Explanation:

The purpose of your analysis determines the broad category of statistical tests that would be appropriate for your study.

How many groups are you comparing?

Explanation:

The number of groups in your study affects which statistical test is appropriate. Some tests are designed specifically for two-group comparisons, while others can handle multiple groups.

What type of data are you analyzing?

Explanation:

The type of data you're analyzing is crucial for selecting the appropriate test. Continuous data are measured on a scale, while categorical data fall into distinct categories.

Are the data normally distributed?

Explanation:

Normal distribution is a key assumption for many statistical tests. You can check normality using histograms, Q-Q plots, or statistical tests like Shapiro-Wilk.

What type of categorical data?

Explanation:

The specific type of categorical data affects which test is most appropriate. Binary data have only two possible values, while ordinal data have a natural order.

Are the samples paired or independent?

Explanation:

Paired samples involve the same subjects measured twice (e.g., before and after treatment). Independent samples involve different subjects in each group.

Are the samples paired or independent?

Explanation:

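For non-normally distributed data, we use non-parametric tests that don't assume normality: the Wilcoxon signed-rank test for paired samples and the Mann-Whitney U test for independent samples.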

How to check for normality

Methods to check normality:

1. Visual methods: Histogram, Q-Q plot

2. Statistical tests: Shapiro-Wilk test, Kolmogorov-Smirnov test

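If the p-value is greater than 0.05 in these tests, there is no significant evidence against normality, and the data are commonly treated as normally distributed. These tests have low power in small samples, so check the visual methods as well.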
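As a rough illustration of the visual method, the points of a normal Q-Q plot can be computed with the Python standard library alone. The sketch below is not the tool's own code: the function names, the Blom plotting positions, and the sample values are assumptions made for the example. It returns the theoretical and observed quantiles and the correlation between them; values close to 1 mean the points lie near a straight line.

```python
from math import sqrt
from statistics import NormalDist, mean


def qq_points(values):
    """Return (theoretical, observed) quantiles for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Blom plotting positions: one common convention among several.
    theoretical = [std_normal.inv_cdf((i - 0.375) / (n + 0.25))
                   for i in range(1, n + 1)]
    return theoretical, ordered


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical measurements, for illustration only.
sample = [4.1, 5.0, 5.2, 5.5, 5.9, 6.0, 6.3, 6.8, 7.1, 8.0]
theo, obs = qq_points(sample)
qq_r = pearson_r(theo, obs)
print(f"Q-Q correlation: {qq_r:.3f}")
```

Plotting `obs` against `theo` gives the Q-Q plot itself. A formal test such as Shapiro-Wilk needs a statistics package (for example `scipy.stats.shapiro`), which this sketch deliberately avoids.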

Normal Distribution
Non-Normal Distribution

What type of data are you analyzing?

Explanation:

For multiple group comparisons, the data type determines whether to use ANOVA (for continuous data) or chi-square (for categorical data).

Are the data normally distributed?

Explanation:

For multiple group comparisons with continuous data, we use ANOVA if data are normally distributed, or Kruskal-Wallis if not.

Are the groups related or independent?

Explanation:

For normally distributed data with multiple groups, we use repeated measures ANOVA for related samples and one-way ANOVA for independent samples.
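The group-comparison questions above amount to a small decision table. The following sketch encodes it in Python, using the test pairings listed in the Quick Reference Guide. The function name, argument names, and error handling are hypothetical and are not taken from the tool itself.

```python
# Tests for continuous outcomes, keyed by
# (three_or_more_groups, normally_distributed, paired_or_related).
CONTINUOUS_TESTS = {
    (False, True, False): "Independent t-test",
    (False, True, True): "Paired t-test",
    (False, False, False): "Mann-Whitney U test",
    (False, False, True): "Wilcoxon signed-rank test",
    (True, True, False): "One-way ANOVA",
    (True, True, True): "Repeated measures ANOVA",
    (True, False, False): "Kruskal-Wallis test",
    (True, False, True): "Friedman test",
}


def select_comparison_test(n_groups, data_type, normal=True, paired=False):
    """Suggest a group-comparison test from the answers to the questions."""
    if n_groups < 2:
        raise ValueError("a comparison needs at least two groups")
    if data_type == "continuous":
        return CONTINUOUS_TESTS[(n_groups > 2, bool(normal), bool(paired))]
    if data_type == "categorical":
        if not paired:
            # Fisher's exact test is the usual alternative for small samples.
            return "Chi-square test"
        if n_groups == 2:
            return "McNemar's test"
        raise ValueError("related categorical data with 3+ groups "
                         "is not covered by this sketch")
    raise ValueError(f"unknown data type: {data_type!r}")


print(select_comparison_test(2, "continuous", normal=True, paired=True))
print(select_comparison_test(3, "continuous", normal=False, paired=False))
```

A lookup table keeps the eight continuous-data cases visible at a glance, which makes it easy to compare against the reference tables.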

What types of variables are you analyzing?

Explanation:

The types of variables you're analyzing determine which correlation or association test is appropriate.

Are both variables normally distributed?

Explanation:

For two continuous variables, we use Pearson correlation if both are normally distributed, or Spearman correlation if not.

How many categories in the categorical variable?

Explanation:

When analyzing a categorical and a continuous variable, the number of categories determines whether to use a t-test (for two categories) or ANOVA (for three or more categories).

What type of outcome are you predicting?

Explanation:

The type of outcome you're predicting determines whether to use linear regression (for continuous outcomes) or logistic regression (for categorical outcomes).

How many categories in the outcome variable?

Explanation:

For categorical outcomes, we use binary logistic regression for two categories or multinomial logistic regression for three or more categories.
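The correlation and prediction branches can be sketched the same way. The two functions below are hypothetical helpers, not part of the tool, and only restate the rules given in the explanations above.

```python
def select_correlation(both_normal):
    """Two continuous variables: Pearson if both are normal, else Spearman."""
    return "Pearson correlation" if both_normal else "Spearman correlation"


def select_regression(outcome_type, n_categories=None):
    """Pick a regression model from the type of outcome being predicted."""
    if outcome_type == "continuous":
        return "Linear regression"
    if outcome_type == "categorical":
        if n_categories is None or n_categories < 2:
            raise ValueError("a categorical outcome needs at least 2 categories")
        if n_categories == 2:
            return "Binary logistic regression"
        return "Multinomial logistic regression"
    raise ValueError(f"unknown outcome type: {outcome_type!r}")


print(select_correlation(both_normal=False))
print(select_regression("categorical", n_categories=3))
```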

Recommended Statistical Test

Sample Size Calculator

Calculate the required sample size for your study based on statistical parameters.

Select Study Type

Parameters for Comparing Means

Significance level (α): The probability of rejecting the null hypothesis when it is true (the Type I error rate).
Power (1 - β): The probability of rejecting the null hypothesis when it is false (1 minus the Type II error rate, β).

Effect size (Cohen's d): The standardized difference between means:

Small effect: 0.2

Medium effect: 0.5

Large effect: 0.8
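To show how these three parameters combine, here is a minimal sketch of the textbook normal-approximation formula for comparing two means with equal group sizes and a two-sided test: n per group = 2(z_{1-α/2} + z_{1-β})² / d². Whether the calculator uses this exact formula is an assumption; the function name and defaults are chosen for the example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_means(effect_size, alpha=0.05, power=0.80):
    """Participants needed per group (two-sided test, equal group sizes)."""
    if effect_size <= 0:
        raise ValueError("effect size must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = std_normal.inv_cdf(power)           # quantile for the target power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label} effect (d = {d}): {sample_size_two_means(d)} per group")
```

With α = 0.05 and 80% power this gives 393, 63, and 25 participants per group for the small, medium, and large effects. Calculations based on the t distribution give slightly larger numbers, so treat these as lower bounds.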

Results

Required sample size will appear here after calculation.

Quick Reference Guide

Common statistical tests used in medical research and when to use them.

Comparing Groups

Test | When to Use | Example in Medical Research
Independent t-test | Compare means between two independent groups with normally distributed data | Comparing mean blood pressure between treatment and control groups
Paired t-test | Compare means between two related groups with normally distributed data | Comparing blood pressure before and after treatment in the same patients
Mann-Whitney U test | Compare two independent groups with non-normally distributed data | Comparing pain scores between two different treatment groups
Wilcoxon signed-rank test | Compare two related groups with non-normally distributed data | Comparing pain scores before and after treatment in the same patients
One-way ANOVA | Compare means among three or more independent groups with normally distributed data | Comparing mean blood glucose levels among three different drug treatments
Repeated measures ANOVA | Compare means among three or more related groups with normally distributed data | Comparing blood glucose levels at multiple time points after treatment
Kruskal-Wallis test | Compare three or more independent groups with non-normally distributed data | Comparing pain scores among three different treatment groups
Friedman test | Compare three or more related groups with non-normally distributed data | Comparing pain scores at multiple time points after treatment

Categorical Data Analysis

Test | When to Use | Example in Medical Research
Chi-square test | Compare proportions between independent groups | Comparing the proportion of patients with side effects between two treatments
Fisher's exact test | Compare proportions between independent groups with small sample sizes | Comparing rare adverse events between two treatments
McNemar's test | Compare proportions between related groups | Comparing the presence of a symptom before and after treatment

Correlation and Regression

Test | When to Use | Example in Medical Research
Pearson correlation | Assess linear relationship between two normally distributed continuous variables | Assessing the relationship between BMI and blood pressure
Spearman correlation | Assess monotonic relationship between two variables when at least one is not normally distributed | Assessing the relationship between disease severity score and quality of life score
Linear regression | Predict a continuous outcome based on one or more predictor variables | Predicting blood pressure based on age, BMI, and sodium intake
Logistic regression | Predict a binary outcome based on one or more predictor variables | Predicting the likelihood of heart attack based on risk factors
Cox proportional hazards | Analyze time-to-event data with censoring | Analyzing survival time after cancer diagnosis based on treatment type

Common Statistical Terms

Term | Definition
p-value | The probability of obtaining results at least as extreme as the observed results, assuming the null hypothesis is true
Confidence interval | A range of values that is likely to contain the true population parameter with a certain level of confidence
Effect size | A quantitative measure of the magnitude of a phenomenon, such as the difference between groups or the strength of a relationship
Power | The probability of correctly rejecting the null hypothesis when it is false
Type I error | Rejecting the null hypothesis when it is true (false positive)
Type II error | Failing to reject the null hypothesis when it is false (false negative)

Interactive Examples

Explore these examples to better understand statistical concepts and tests.

Example 1: Comparing Two Treatment Groups

A randomized controlled trial compared a new antihypertensive medication (Treatment A) with a standard medication (Treatment B) in 50 patients with hypertension. The primary outcome was reduction in systolic blood pressure after 8 weeks of treatment.

Sample Data:

Treatment A (mmHg reduction) | Treatment B (mmHg reduction)
15, 12, 17, 14, 18, 20, 13, 16, 19, 14 | 10, 8, 12, 9, 11, 14, 7, 10, 13, 9

Statistical Analysis:

Since we are comparing two independent groups with continuous data, we need to first check if the data are normally distributed. Assuming normality, an independent t-test would be appropriate.

Results: Mean reduction in Treatment A = 15.8 mmHg, Treatment B = 10.3 mmHg

t-statistic: 5.03 (df = 18, computed from the 20 values shown), p-value: < 0.001

Interpretation: There is a statistically significant difference in blood pressure reduction between the two treatments, with Treatment A showing a greater reduction.
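The calculation behind an independent (pooled-variance) t-test can be reproduced by hand. This sketch uses only the Python standard library and the 20 values listed in the sample data. The function name is made up for the example, and equal variances in the two groups are assumed.

```python
from math import sqrt
from statistics import mean, variance

treatment_a = [15, 12, 17, 14, 18, 20, 13, 16, 19, 14]
treatment_b = [10, 8, 12, 9, 11, 14, 7, 10, 13, 9]


def pooled_t_test(x, y):
    """Return (t statistic, degrees of freedom) for Student's two-sample t-test."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    std_err = sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / std_err, df


t_stat, df = pooled_t_test(treatment_a, treatment_b)
print(f"mean A = {mean(treatment_a):.1f}, mean B = {mean(treatment_b):.1f}")
print(f"t = {t_stat:.2f}, df = {df}")
```

On the listed values this gives means of 15.8 and 10.3 and t ≈ 5.03 with 18 degrees of freedom. The p-value requires the t distribution, which the standard library does not provide; a package function such as `scipy.stats.ttest_ind` returns both the statistic and the p-value.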

Example 2: Before-After Intervention Study

A study evaluated the effect of a 12-week exercise program on HbA1c levels in 15 patients with type 2 diabetes.

Sample Data:

Patient | HbA1c Before (%) | HbA1c After (%)
1-5 | 8.2, 7.9, 8.5, 7.6, 8.0 | 7.5, 7.3, 7.8, 7.1, 7.4
6-10 | 8.3, 7.8, 8.1, 7.7, 8.4 | 7.6, 7.2, 7.5, 7.0, 7.7
11-15 | 7.5, 8.2, 7.9, 8.3, 8.0 | 7.0, 7.6, 7.3, 7.7, 7.4

Statistical Analysis:

Since we are comparing measurements from the same patients before and after an intervention, a paired t-test would be appropriate (assuming the differences are normally distributed).

Results: Mean HbA1c before = 8.03%, after = 7.41%

Mean difference: 0.62% (95% CI: 0.58 to 0.66)

t-statistic: 35.5 (df = 14), p-value: < 0.001

Interpretation: There is a statistically significant reduction in HbA1c levels after the exercise program.
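A paired t-test is a one-sample t-test on the within-patient differences. The sketch below works through it for the 15 patients in the table, using the Python standard library. The variable names are illustrative, and the critical value 2.145 is the standard table value of t for a two-sided 95% interval with 14 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

before = [8.2, 7.9, 8.5, 7.6, 8.0, 8.3, 7.8, 8.1, 7.7, 8.4,
          7.5, 8.2, 7.9, 8.3, 8.0]
after = [7.5, 7.3, 7.8, 7.1, 7.4, 7.6, 7.2, 7.5, 7.0, 7.7,
         7.0, 7.6, 7.3, 7.7, 7.4]

# Within-patient reduction in HbA1c (before minus after).
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
mean_diff = mean(diffs)
std_err = stdev(diffs) / sqrt(n)
t_stat = mean_diff / std_err

T_CRIT_95_DF14 = 2.145  # two-sided 95% critical value, t distribution, 14 df
ci_low = mean_diff - T_CRIT_95_DF14 * std_err
ci_high = mean_diff + T_CRIT_95_DF14 * std_err

print(f"mean difference = {mean_diff:.2f}")
print(f"95% CI = {ci_low:.2f} to {ci_high:.2f}")
print(f"t = {t_stat:.1f}, df = {n - 1}")
```

On the tabulated values the differences range only from 0.5 to 0.7, so the standard error is very small: the mean difference is 0.62, the interval is roughly 0.58 to 0.66, and t is about 35.5.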

Example 3: Association Between Risk Factor and Disease

A case-control study examined the association between smoking status and lung cancer in 200 participants (100 cases with lung cancer, 100 controls without lung cancer).

Sample Data:

Group | Smokers | Non-smokers | Total
Lung Cancer | 75 | 25 | 100
No Lung Cancer | 40 | 60 | 100
Total | 115 | 85 | 200

Statistical Analysis:

Since we are examining the association between two categorical variables, a chi-square test would be appropriate.

Results: Chi-square statistic = 25.06 (23.65 with Yates' continuity correction), p-value: < 0.001

Odds Ratio: 4.5 (95% CI: 2.5 to 8.2, Wald interval on the log-odds scale)

Interpretation: There is a statistically significant association between smoking and lung cancer. The odds of having lung cancer are 4.5 times as high among smokers as among non-smokers.
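For a 2×2 table, both the chi-square statistic and the odds ratio have closed-form expressions, so they are easy to check by hand. The sketch below applies them to the counts in the table. The function name and the choice of a Wald interval on the log-odds scale are assumptions made for the example; other interval methods give slightly different limits.

```python
from math import exp, log, sqrt

# Cell counts from the table: rows are cases/controls, columns are exposure.
a, b = 75, 25  # lung cancer: smokers, non-smokers
c, d = 40, 60  # no lung cancer: smokers, non-smokers


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic for a 2x2 table, optionally Yates-corrected."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


chi2 = chi_square_2x2(a, b, c, d)
chi2_yates = chi_square_2x2(a, b, c, d, yates=True)

odds_ratio = (a * d) / (b * c)
se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = exp(log(odds_ratio) - 1.96 * se_log_or)
ci_high = exp(log(odds_ratio) + 1.96 * se_log_or)

print(f"chi-square = {chi2:.2f} (Yates: {chi2_yates:.2f})")
print(f"OR = {odds_ratio:.1f}, 95% CI = {ci_low:.2f} to {ci_high:.2f}")
```

On these counts the uncorrected statistic is about 25.06, the Yates-corrected one about 23.65, and the odds ratio 4.5 with a Wald interval of roughly 2.46 to 8.23. Either chi-square value, on 1 degree of freedom, corresponds to p < 0.001.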

Example 4: Correlation Between Clinical Variables

A study examined the correlation between body mass index (BMI) and systolic blood pressure (SBP) in 50 adults.

Sample Data (excerpt):

Patient | BMI (kg/m²) | SBP (mmHg)
1-5 | 22.5, 27.8, 31.2, 24.6, 29.3 | 118, 132, 145, 125, 138
6-10 | 26.1, 33.5, 25.2, 30.7, 28.4 | 128, 150, 122, 142, 135

Statistical Analysis:

Since we are examining the relationship between two continuous variables, a Pearson correlation would be appropriate (assuming both variables are normally distributed).

Results: Correlation coefficient (r) = 0.72, p-value: < 0.001

Interpretation: There is a strong positive correlation between BMI and systolic blood pressure, indicating that as BMI increases, systolic blood pressure tends to increase as well.
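The Pearson coefficient is the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it for the ten patients in the excerpt, using the Python standard library. The function name is made up for the example. Because only 10 of the 50 patients are listed, the result is not expected to reproduce the reported r = 0.72.

```python
from math import sqrt
from statistics import mean

# The ten (BMI, SBP) pairs listed in the excerpt, in patient order.
bmi = [22.5, 27.8, 31.2, 24.6, 29.3, 26.1, 33.5, 25.2, 30.7, 28.4]
sbp = [118, 132, 145, 125, 138, 128, 150, 122, 142, 135]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


r_excerpt = pearson_r(bmi, sbp)
print(f"r on the 10 excerpted patients = {r_excerpt:.2f}")
```

The excerpted rows are almost perfectly linear, so the coefficient for this subset is close to 1. The full 50-patient dataset would be needed to check the reported value.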