Bayesian Logistic Regression Application in Diabetes Probability Prediction

IDC6940 Fall 2025 Capstone Project Presentation

Namita Mishra, Autumn Wilcox (Advisor: Dr. Cohen)

2025-09-28

Our Question

  • Can Bayesian logistic regression provide more stable and transparent inference than classical MLE for diabetes-related outcomes in NHANES 2013–2014?

  • Focus predictors: BMI category (BMDBMIC), Age (RIDAGEYR), Sex (RIAGENDR), Race/Ethnicity (RIDRETH1)

  • Outcome: DIQ240 (has one doctor usually seen for diabetes), which is diabetes-related but not a diagnosis; in the next iteration we may switch to DIQ010 (doctor-diagnosed diabetes) and keep DIQ240 as a covariate.

Data Pipeline (Reproducible)

  • Source: NHANES 2013–2014
  • Files: BMX_H, DEMO_H, DIQ_H
  • Preprocessing is in R/data_prep.R
  • Analysis slides load the prebuilt data set: 9,813 rows, 9 columns
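The merge itself lives in R/data_prep.R. As an illustration only, the Python sketch below assumes the three files are combined on the respondent ID SEQN with a left join onto a base table. The join type, the helper name left_join_on_seqn, and the toy rows are hypothetical and are not the project's code or data.

```python
# Hypothetical sketch: left-join NHANES-style tables on SEQN.
# The join type and the toy rows below are assumptions for illustration.

def left_join_on_seqn(base, *others):
    """Keep every row of `base`; add columns from `others`, using None when unmatched."""
    result = [dict(row) for row in base]
    for table in others:
        index = {row["SEQN"]: row for row in table}
        columns = sorted({key for row in table for key in row if key != "SEQN"})
        for row in result:
            match = index.get(row["SEQN"], {})
            for col in columns:
                row[col] = match.get(col)
    return result

# Toy stand-ins for BMX_H, DEMO_H, and DIQ_H (not real NHANES values).
bmx = [{"SEQN": 1, "BMDBMIC": 2}, {"SEQN": 2, "BMDBMIC": None}]
demo = [{"SEQN": 1, "RIDAGEYR": 9}, {"SEQN": 2, "RIDAGEYR": 54}]
diq = [{"SEQN": 2, "DIQ240": 1}]

data = left_join_on_seqn(bmx, demo, diq)
print(len(data), len(data[0]))  # rows, columns
```

The final print reports rows and columns, the same shape check quoted on this slide; respondents absent from a joined file show up as None in that file's columns.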

Variables

  • Predictors/Covariates: BMDBMIC (BMI category), RIDAGEYR (Age), RIAGENDR (Sex), RIDRETH1 (Race/Ethnicity)

  • Survey Design: WTMEC2YR (weights), SDMVPSU (PSU), SDMVSTRA (strata)

  • Outcome (current placeholder): DIQ240, which is not a diagnosis but a proxy for diabetes care.

  • Plan: consider DIQ010 as outcome; keep DIQ240 as access/behavior covariate.

Quick EDA: Age Table

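First six rows of the prebuilt data (SEQN = respondent ID; RIDAGEYR = age in years). BMDBMIC is the NHANES children/youth BMI category (ages 2–19), so it is NA for adults.

SEQN   BMDBMIC        RIDAGEYR
73557  NA             69
73558  NA             54
73559  NA             72
73560  Normal weight  9
73561  NA             73
73562  NA             56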
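A quick check of BMDBMIC availability by age band, using only the six rows shown in the table (share_observed is a hypothetical helper):

```python
# The six rows from the table: (SEQN, BMDBMIC label or None, age in years).
rows = [
    (73557, None, 69),
    (73558, None, 54),
    (73559, None, 72),
    (73560, "Normal weight", 9),
    (73561, None, 73),
    (73562, None, 56),
]

def share_observed(rows, lo, hi):
    """Fraction of rows aged lo..hi (inclusive) with a non-missing BMDBMIC."""
    in_band = [bmi for _, bmi, age in rows if lo <= age <= hi]
    if not in_band:
        return None  # no one in this age band
    return sum(b is not None for b in in_band) / len(in_band)

print(share_observed(rows, 2, 19))
print(share_observed(rows, 20, 150))
```

On the full data, the same check per age band shows whether missingness in BMDBMIC follows an age range (structural) rather than nonresponse, which matters for the imputation plan.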

Quick EDA: Age Distribution

Quick EDA: BMI Category Codes
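A sketch of decoding BMDBMIC, assuming the BMX_H codebook labels 1 = Underweight, 2 = Normal weight, 3 = Overweight, 4 = Obese (verify against the official documentation before use). BMDBMIC_LABELS and tabulate_bmi are hypothetical names, and the input codes are toy values.

```python
# Assumed codebook labels for BMDBMIC; check against the BMX_H documentation.
BMDBMIC_LABELS = {
    1: "Underweight",
    2: "Normal weight",
    3: "Overweight",
    4: "Obese",
}

def tabulate_bmi(codes):
    """Count BMI categories by label; unknown or missing codes are counted as 'NA'."""
    counts = {}
    for code in codes:
        label = BMDBMIC_LABELS.get(code, "NA")
        counts[label] = counts.get(label, 0) + 1
    return counts

print(tabulate_bmi([2, None, None, 4, 2]))
```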

Survey Design

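Design-based mean age using WTMEC2YR (weights), SDMVSTRA (strata), and SDMVPSU (PSUs):

Variable   Mean (years)   SE
RIDAGEYR   37.504         0.4412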
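What a design-based mean and SE compute can be sketched in a few lines. This assumes the standard Taylor-linearization variance with PSUs treated as sampled with replacement within strata, which we understand to be the default approach in R's survey package; svy_mean is a hypothetical name and the inputs are toy values, not NHANES records.

```python
import math
from collections import defaultdict

def svy_mean(y, w, strata, psu):
    """Weighted mean with a Taylor-linearization SE (PSUs treated as with-replacement)."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized score of each record, summed within its (stratum, PSU) cluster.
    psu_totals = defaultdict(float)
    for yi, wi, h, i in zip(y, w, strata, psu):
        psu_totals[(h, i)] += wi * (yi - mean) / total_w
    # Collect PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)
    # Between-PSU variance within each stratum, scaled by n_h / (n_h - 1).
    var = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            continue  # single-PSU strata need special handling; skipped in this sketch
        z_bar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return mean, math.sqrt(var)

# Toy check: one stratum, two PSUs, equal weights.
mean, se = svy_mean([10, 20, 30, 40], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 2, 2])
print(mean, se)

# Toy check with unequal weights.
mean_w, se_w = svy_mean([10, 30], [3, 1], [1, 1], [1, 2])
print(mean_w, se_w)
```

The SE is driven by variation between PSU totals, not between individuals, which is why design-based SEs are usually larger than naive ones.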

Methods

  • Baseline: Logistic regression (MLE)
  • Main: Bayesian logistic regression with weakly-informative priors
  • Missingness: Prefer multiple imputation (or Bayesian missing-data modeling) over listwise deletion, which discards cases and, in small or sparse subgroups, can produce unstable estimates or separation
  • Compare: effect estimates (odds ratios vs. posterior distributions), uncertainty (confidence vs. credible intervals), calibration, and discrimination
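For the calibration and discrimination comparison, a sketch of three common summaries, assuming each model yields predicted probabilities p for 0/1 outcomes y: the c-statistic (AUC) for discrimination, calibration-in-the-large for overall calibration, and the Brier score as a combined accuracy measure. These are unweighted; survey-weighted versions would need the weights. All function names are hypothetical.

```python
def auc(y, p):
    """c-statistic: share of (case, non-case) pairs ranked correctly; ties count 1/2."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                score += 1.0
            elif pc == pn:
                score += 0.5
    return score / (len(cases) * len(controls))

def calibration_in_the_large(y, p):
    """Observed event rate minus mean predicted probability (0 means no overall bias)."""
    return sum(y) / len(y) - sum(p) / len(p)

def brier(y, p):
    """Mean squared error of predicted probabilities (lower is better)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

# Toy outcomes and predicted probabilities, for illustration only.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auc(y, p), calibration_in_the_large(y, p), brier(y, p))
```

The pairwise AUC loop is quadratic in sample size; a rank-based formula gives the same value faster on the full data.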

Modeling
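To make the Bayesian model concrete, a self-contained Python sketch: logistic regression with independent Normal(0, 2.5²) priors on standardized coefficients (an assumed weakly informative choice, not the project's final prior), sampled by random-walk Metropolis on toy data. In practice a Stan-based R package such as rstanarm or brms is the more typical tool; this only shows what the posterior and a credible interval are.

```python
import math
import random

def log_posterior(beta, X, y, prior_sd):
    """Bernoulli-logit log-likelihood plus independent Normal(0, prior_sd^2) log-priors."""
    lp = -sum(b * b for b in beta) / (2.0 * prior_sd ** 2)
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        # log(1 + exp(eta)), computed stably for large |eta|
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        lp += yi * eta - softplus
    return lp

def metropolis(X, y, n_iter=6000, burn_in=1000, step=0.3, prior_sd=2.5, seed=2025):
    """Random-walk Metropolis; returns post-burn-in draws and the acceptance rate."""
    rng = random.Random(seed)
    beta = [0.0] * len(X[0])
    current = log_posterior(beta, X, y, prior_sd)
    draws, accepted = [], 0
    for t in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, X, y, prior_sd)
        # Accept with probability min(1, exp(candidate - current)).
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
            accepted += 1
        if t >= burn_in:
            draws.append(beta)
    return draws, accepted / n_iter

def summarize(draws, j):
    """Posterior mean and central 95% credible interval for coefficient j."""
    values = sorted(d[j] for d in draws)
    n = len(values)
    return sum(values) / n, values[int(0.025 * n)], values[int(0.975 * n) - 1]

# Toy data (illustrative only): intercept column plus one standardized predictor,
# with 8 records per predictor value and k of them having y = 1.
successes = {-2: 1, -1: 2, 0: 4, 1: 6, 2: 7}
X, y = [], []
for x, k in successes.items():
    for i in range(8):
        X.append([1.0, float(x)])
        y.append(1 if i < k else 0)

draws, acc = metropolis(X, y)
mean, lo, hi = summarize(draws, 1)
print(f"slope: mean {mean:.2f}, 95% CrI ({lo:.2f}, {hi:.2f}), acceptance {acc:.2f}")
```

Exponentiating the slope draws gives posterior draws of the odds ratio, so its credible interval comes from the same sorted values.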

What to Expect in Results

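  • Stability: Under separation, MLE coefficients diverge; priors shrink them toward zero and keep them finite
  • Uncertainty: Full posterior distributions, from which credible intervals follow directly
  • Practicality: Survey-aware modeling (weights, strata, PSUs) supports population-level inference rather than sample-only inference
  • Sensitivity: Vary the priors; compare MI with listwise deletion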
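The stability point can be seen on a four-point toy data set in which x perfectly separates y. A minimal sketch, assuming a one-parameter, no-intercept logistic model to keep the algebra scalar: Newton-Raphson for the MLE keeps increasing the slope (by about one unit per step here) because the likelihood has no finite maximum, while the same iteration with a Normal(0, 2.5²) prior, i.e. the posterior mode (MAP), settles at a finite value. newton_slope is a hypothetical helper.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def newton_slope(x, y, prior_sd=None, n_iter=20):
    """Newton-Raphson for a no-intercept logistic slope.

    prior_sd=None gives the MLE; otherwise the MAP under a Normal(0, prior_sd^2) prior.
    """
    b = 0.0
    penalty = 0.0 if prior_sd is None else 1.0 / prior_sd ** 2
    for _ in range(n_iter):
        p = [sigmoid(b * xi) for xi in x]
        grad = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p)) - penalty * b
        hess = -sum(pi * (1 - pi) * xi * xi for xi, pi in zip(x, p)) - penalty
        b -= grad / hess
    return b

# Perfectly separated toy data: every negative x has y = 0, every positive x has y = 1.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

mle_10, mle_20 = newton_slope(x, y, n_iter=10), newton_slope(x, y, n_iter=20)
map_10, map_20 = newton_slope(x, y, 2.5, 10), newton_slope(x, y, 2.5, 20)
print("MLE after 10 / 20 steps:", mle_10, mle_20)
print("MAP after 10 / 20 steps:", map_10, map_20)
```

Software usually stops the MLE iteration at a convergence tolerance and reports a huge coefficient with an enormous SE; the prior removes that failure mode rather than hiding it.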

Next Steps

  • Finalize outcome (DIQ010 likely) and keep DIQ240 as covariate
  • Implement MI (document MAR/MNAR assumptions)
  • Fit MLE and Bayesian models; add survey-weighted variants
  • Report effects, uncertainty, and predictive performance
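For the MI step, estimates from the m completed data sets are typically combined with Rubin's rules. A minimal sketch for a single coefficient (for example a log-odds ratio), assuming each imputed-data fit returns an estimate and its squared standard error; rubin_pool is a hypothetical helper, and in R the mice package's pool() performs this combination.

```python
import math
import statistics

def rubin_pool(estimates, variances):
    """Pool one coefficient across m >= 2 imputations with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w_bar = statistics.fmean(variances)   # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    t = w_bar + (1 + 1 / m) * b           # total variance of q_bar
    if b == 0:
        df = math.inf                     # no between-imputation spread
    else:
        r = (1 + 1 / m) * b / w_bar       # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2   # Rubin's large-sample degrees of freedom
    return q_bar, math.sqrt(t), df

# Toy inputs: three imputed-data estimates of a log-odds ratio and their squared SEs.
q, se, df = rubin_pool([0.50, 0.55, 0.45], [0.04, 0.04, 0.04])
print(round(q, 3), round(se, 4), round(df, 1))
```

The between-imputation term b is what listwise deletion ignores: it carries the extra uncertainty due to the missing values into the pooled SE.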

References

See references.bib (key topics: survey methods, Bayesian GLMs, imputation).