π¦ Testing the Proportional Hazards Assumption in Cox Models (with ggplot2 Diagnostics)
Brief:
This workflow explains how, why, and when to check the proportional hazards (PH) assumption in Cox regression, how to interpret violations (with plots), and when/why to switch to a more flexible model like Piecewise Exponential Additive Mixed Models (PAMMs). Includes a full-featured R function to automate the process, create publication-quality diagnostic plots, and guide model selection.
π Summary
Before trusting hazard ratios from a Cox proportional hazards model, the PH assumption must be checked. If hazards arenβt proportional (i.e., the effect of a variable changes over time), inferences and survival estimates may be misleading.
The principal diagnostic is the Schoenfeld residualβa statistic that captures, for each event time and covariate, how far the observed covariate value was from what was expected; trends in these residuals over time imply changing effects, thus violation of the PH assumption.
π Key Points
1οΈβ£ The PH Assumption
- Definition: The hazard ratio (effect of a covariate) is constant over time.
- If violated: Hazard ratios are not meaningful; model fit and predictions suffer.
- Common cause: Biological phenomena, treatment effects, or external events that unfold differently across time.
2οΈβ£ The Role of Schoenfeld Residuals
- What they are: At each event, they measure the difference between:
- The observed covariate for the subject who failed
- The risk set average for that covariate at that time
- If PH holds: Residuals should show no trend (random scatter around zero) when plotted against event time.
- If PH is violated: Residuals will show a systematic trend (up or down, or curvature) for some covariate(s).
3οΈβ£ The Automated Testing & Visualization Flow
- Frequentist Cox fitted via
coxph() - Check with
cox.zph(): - P-values for each covariate and global test
- Print interpretation and recommendation
- Diagnostic plots:
- Points = Schoenfeld residuals
- Smooth curve = loess fit (trend should be flat if PH holds)
- Red dashed = zero reference line
- Faceted by covariate with significance displayed
β "How-To" Steps
1οΈβ£ Fit the Cox Model and Test PH
# Standard workflow, usually with data already prepared
result <- test_proportional_hazards(
data = my_data,
time_var = "time_to_event",
status_var = "status", # 1=event, 0=censored (by convention)
covariates = c("age", "sex", "group")
)
2οΈβ£ Output: Printed Results
- Table of tests (p-values for each covariate and global statistic)
- β List of violated covariates
- Overall ("GLOBAL") test for model-wide violation
- Automated recommendation: Stay with Cox or switch to PAMM
3οΈβ£ Output: Diagnostic Plots
- Facets: One per covariate (ignores "GLOBAL")
- X = Event time | Y = Schoenfeld residuals
- Red = violated | Blue = OK
- Annotated with each covariate's p-value
- Curve (black, smoothed): deviation from zero means time-varying effect
Interpretation:
- Flat trend & scattered points: PH ok
- Curved, sloped, or patterned trend: PH violated
π§ Deep Dive: What Are Schoenfeld Residuals?
Mechanics
- At every event, for each covariate: \(r_i = x_{i,\text{event}} - \mathbb{E}[x | \text{risk set at } t_i]\)
- If model fits well under PH: No systematic pattern as a function of time
- Trend = time-varying effect: Residuals cluster above or below zero with time
Statistical Test
- Correlation/Test for Trend:
cox.zph()regresses residuals on (a function of) time, checks slope β 0- P-value < 0.05: significant evidence against PH
Visual Intuition
- Plotting residuals versus time:
- Random noise = good
- Upward/downward drift, waves, or other structure = violation
- Smoothed line (loess): Visualizes systematic drift
π» R Implementation: Key Features
- Flexible function: pass any dataset, variables
- Prints test tables + recommendations
- Publication-quality plots: Faceted, color-coded for violations, significance in subtitle, tweakable
- Returns models, test objects, tidy data for further plotting
Example Diagnostic Plot
Horizontal line at zero, smoothed curve, color-annotated: drift/trend = violation
(Your plot will render directly in RStudio/jupyter/quarto/knitr with the scriptβs output.)
π¦ What to Do When PH is Violated?
- Switch to a flexible model:
- PAMM: Models hazard (or log hazard) as a smooth function of time and covariates, allows interactions (
s(time, by = covariate)) - Other: stratified Cox (if only a categorical variable is problematic), time-varying covariates, or fully nonparametric models
π‘ Key Learnings & Pitfalls
- Always check PH before interpreting hazard ratios or survival curves
- PH can be violated for one covariate, or overall (GLOBAL)
- Diagnostic plot essential for publication/review
- Schoenfeld residuals: Best tool for detecting non-constant hazard ratios
- Switch modeling strategy (e.g., PAMM) if violation detectedβespecially for main exposures
R Function Reference
See:
test_proportional_hazards()for all-in-one analysis, text output, and customizable plotting.