Skip to content

🟦 Testing the Proportional Hazards Assumption in Cox Models (with ggplot2 Diagnostics)

Brief:
This workflow explains how, why, and when to check the proportional hazards (PH) assumption in Cox regression, how to interpret violations (with plots), and when/why to switch to a more flexible model like Piecewise Exponential Additive Mixed Models (PAMMs). Includes a full-featured R function to automate the process, create publication-quality diagnostic plots, and guide model selection.


πŸ“ Summary

Before trusting hazard ratios from a Cox proportional hazards model, the PH assumption must be checked. If hazards aren’t proportional (i.e., the effect of a variable changes over time), inferences and survival estimates may be misleading.

The principal diagnostic is the Schoenfeld residualβ€”a statistic that captures, for each event time and covariate, how far the observed covariate value was from what was expected; trends in these residuals over time imply changing effects, thus violation of the PH assumption.


πŸ”‘ Key Points

1️⃣ The PH Assumption

  • Definition: The hazard ratio (effect of a covariate) is constant over time.
  • If violated: Hazard ratios are not meaningful; model fit and predictions suffer.
  • Common cause: Biological phenomena, treatment effects, or external events that unfold differently across time.

2️⃣ The Role of Schoenfeld Residuals

  • What they are: At each event, they measure the difference between:
  • The observed covariate for the subject who failed
  • The risk set average for that covariate at that time
  • If PH holds: Residuals should show no trend (random scatter around zero) when plotted against event time.
  • If PH is violated: Residuals will show a systematic trend (up or down, or curvature) for some covariate(s).

3️⃣ The Automated Testing & Visualization Flow

  • Frequentist Cox fitted via coxph()
  • Check with cox.zph():
  • P-values for each covariate and global test
  • Print interpretation and recommendation
  • Diagnostic plots:
  • Points = Schoenfeld residuals
  • Smooth curve = loess fit (trend should be flat if PH holds)
  • Red dashed = zero reference line
  • Faceted by covariate with significance displayed

βœ… "How-To" Steps

1️⃣ Fit the Cox Model and Test PH

# Standard workflow, usually with data already prepared
result <- test_proportional_hazards(
  data = my_data,
  time_var = "time_to_event", 
  status_var = "status",               # 1=event, 0=censored (by convention)
  covariates = c("age", "sex", "group")
)

2️⃣ Output: Printed Results

  • Table of tests (p-values for each covariate and global statistic)
  • ❌ List of violated covariates
  • Overall ("GLOBAL") test for model-wide violation
  • Automated recommendation: Stay with Cox or switch to PAMM

3️⃣ Output: Diagnostic Plots

  • Facets: One per covariate (ignores "GLOBAL")
  • X = Event time | Y = Schoenfeld residuals
  • Red = violated | Blue = OK
  • Annotated with each covariate's p-value
  • Curve (black, smoothed): deviation from zero means time-varying effect

Interpretation:
- Flat trend & scattered points: PH ok
- Curved, sloped, or patterned trend: PH violated


🧠 Deep Dive: What Are Schoenfeld Residuals?

Mechanics

  • At every event, for each covariate: \(r_i = x_{i,\text{event}} - \mathbb{E}[x | \text{risk set at } t_i]\)
  • If model fits well under PH: No systematic pattern as a function of time
  • Trend = time-varying effect: Residuals cluster above or below zero with time

Statistical Test

  • Correlation/Test for Trend:
  • cox.zph() regresses residuals on (a function of) time, checks slope β‰  0
  • P-value < 0.05: significant evidence against PH

Visual Intuition

  • Plotting residuals versus time:
  • Random noise = good
  • Upward/downward drift, waves, or other structure = violation
  • Smoothed line (loess): Visualizes systematic drift

πŸ’» R Implementation: Key Features

  • Flexible function: pass any dataset, variables
  • Prints test tables + recommendations
  • Publication-quality plots: Faceted, color-coded for violations, significance in subtitle, tweakable
  • Returns models, test objects, tidy data for further plotting

Example Diagnostic Plot

Schoenfeld Example Plot Horizontal line at zero, smoothed curve, color-annotated: drift/trend = violation

(Your plot will render directly in RStudio/jupyter/quarto/knitr with the script’s output.)


🚦 What to Do When PH is Violated?

  • Switch to a flexible model:
  • PAMM: Models hazard (or log hazard) as a smooth function of time and covariates, allows interactions (s(time, by = covariate))
  • Other: stratified Cox (if only a categorical variable is problematic), time-varying covariates, or fully nonparametric models

πŸ’‘ Key Learnings & Pitfalls

  • Always check PH before interpreting hazard ratios or survival curves
  • PH can be violated for one covariate, or overall (GLOBAL)
  • Diagnostic plot essential for publication/review
  • Schoenfeld residuals: Best tool for detecting non-constant hazard ratios
  • Switch modeling strategy (e.g., PAMM) if violation detectedβ€”especially for main exposures

R Function Reference

See: test_proportional_hazards() for all-in-one analysis, text output, and customizable plotting.



πŸ“Ž Example Usage

# Test PH on veteran lung data
result <- test_proportional_hazards(
  data = veteran,
  time_var = "time",
  status_var = "status",
  covariates = c("age", "karno", "celltype")
)

# Further customize diagnostic plot
result$plot + theme_bw()