Bayesian Logistic Regression Application in Diabetes Probability Prediction
IDC6940 Fall 2025 Capstone Project Presentation
Namita Mishra, Autumn Wilcox (Advisor: Dr. Cohen)
2025-09-28
Our Question
Can Bayesian logistic regression provide more stable and transparent inference than classical MLE for diabetes-related outcomes in NHANES 2013–2014?
Focus predictors: BMI category (BMDBMIC), Age (RIDAGEYR), Sex (RIAGENDR), Race/Ethnicity (RIDRETH1)
Outcome: DIQ240 (usual diabetes doctor), a diabetes-related measure rather than a diagnosis; we may switch to DIQ010 (diagnosis) in the next iteration and keep DIQ240 as a covariate.
Data Pipeline (Reproducible)
- Source: NHANES 2013–2014
- Files: BMX_H, DEMO_H, DIQ_H
- Preprocessing is in R/data_prep.R
- Analysis slides load prebuilt data.
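NHANES files share the respondent sequence number SEQN, so combining BMX_H, DEMO_H, and DIQ_H most likely comes down to a join on that key. The following is a minimal sketch of such a join in plain Python, not a reproduction of R/data_prep.R. The helper name `merge_on_seqn` is hypothetical. The ages and the BMI category come from the age table in these slides, while the DIQ010 codes are placeholders, not real responses.

```python
def merge_on_seqn(base, *others):
    """Left-join row dictionaries keyed by SEQN onto the base table."""
    merged = {}
    for seqn, row in base.items():
        combined = dict(row)
        for other in others:
            # Every column seen in the other table; unmatched rows get None.
            columns = {col for r in other.values() for col in r}
            match = other.get(seqn, {})
            for col in columns:
                combined[col] = match.get(col)
        merged[seqn] = combined
    return merged


# Toy stand-ins for DEMO_H, BMX_H, and DIQ_H (illustrative only).
demo = {73557: {"RIDAGEYR": 69}, 73558: {"RIDAGEYR": 54}, 73560: {"RIDAGEYR": 9}}
bmx = {73560: {"BMDBMIC": "Normal weight"}}
diq = {73557: {"DIQ010": 1}, 73558: {"DIQ010": 2}}  # placeholder codes

merged = merge_on_seqn(demo, bmx, diq)
```

Starting from the demographics table keeps every sampled person, so missing BMI or questionnaire values stay visible as `None` instead of silently dropping rows.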
Variables
Predictors/Covariates: BMDBMIC (BMI category), RIDAGEYR (Age), RIAGENDR (Sex), RIDRETH1 (Race/Ethnicity)
Survey Design: WTMEC2YR (weights), SDMVPSU (PSU), SDMVSTRA (strata)
Outcome (current placeholder): DIQ240 — not a diagnosis; proxy for diabetes care.
Plan: consider DIQ010 as outcome; keep DIQ240 as access/behavior covariate.
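If DIQ010 becomes the outcome, it needs a binary recode before logistic regression. In the NHANES codebook DIQ010 is coded 1 = Yes, 2 = No, 3 = Borderline, 7 = Refused, 9 = Don't know. The sketch below treats everything other than Yes/No as missing. That handling of "Borderline" is an assumption made here for illustration, not a decision stated in the slides, and `recode_diq010` is a hypothetical helper name.

```python
# Assumed mapping: only clear Yes/No answers become outcome values.
DIQ010_TO_BINARY = {1: 1, 2: 0}


def recode_diq010(code):
    """Return 1 (diagnosed), 0 (not diagnosed), or None (treated as missing)."""
    return DIQ010_TO_BINARY.get(code)


codes = [1, 2, 3, 7, 9, None]
outcome = [recode_diq010(c) for c in codes]
```

Whether borderline cases are set to missing, grouped with "No", or modeled separately is worth recording alongside the missing-data assumptions, since it changes the estimand.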
Quick EDA: Age Table
SEQN | BMDBMIC | RIDAGEYR
73557 | NA | 69
73558 | NA | 54
73559 | NA | 72
73560 | Normal weight | 9
73561 | NA | 73
73562 | NA | 56
Quick EDA: Age Distribution
Quick EDA: BMI Category Codes
Survey Design
           mean     SE
RIDAGEYR 37.504 0.4412
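To show how a design-based mean and standard error of this kind can be computed, here is a small sketch of a weighted mean with a Taylor-linearized SE for a stratified cluster sample. It uses the with-replacement approximation and no finite-population correction. The function name `survey_mean` is hypothetical. The ages are the six from the age table, but the weights, strata, and PSU labels are invented, so the result does not reproduce the figures above.

```python
from collections import defaultdict
from math import sqrt


def survey_mean(y, w, strata, psu):
    """Weighted mean and linearized SE for a stratified cluster sample."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Linearized score total for each PSU within each stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, c in zip(y, w, strata, psu):
        psu_totals[h][c] += wi * (yi - mean) / total_w

    variance = 0.0
    for clusters in psu_totals.values():
        n_h = len(clusters)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(clusters.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in clusters.values())
    return mean, sqrt(variance)


ages = [69, 54, 72, 9, 73, 56]            # from the age table
weights = [1.0, 2.0, 1.0, 3.0, 1.0, 2.0]  # hypothetical, stands in for WTMEC2YR
strata = [1, 1, 1, 2, 2, 2]               # hypothetical, stands in for SDMVSTRA
psus = [1, 1, 2, 1, 2, 2]                 # hypothetical, stands in for SDMVPSU

mean_age, se_age = survey_mean(ages, weights, strata, psus)
```

The variance is built from between-PSU variation within strata, which is why ignoring SDMVPSU and SDMVSTRA typically understates uncertainty.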
Methods
- Baseline: Logistic regression (MLE)
- Main: Bayesian logistic regression with weakly-informative priors
- Missingness: Prefer multiple imputation (or Bayesian missing-data modeling) over listwise deletion, which shrinks the sample and can lead to sparse cells, unstable estimates, or separation
- Compare: effect estimates (ORs / posteriors), uncertainty (CIs vs credible intervals), calibration and discrimination
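One way to write down the main model, as a sketch: the outcome is Bernoulli, the linear predictor is on the logit scale, and each coefficient gets an independent zero-centered normal prior. The prior scale σ_β is left symbolic because the slides do not state one; a value around 2.5 on standardized predictors is a commonly used weakly-informative choice and is mentioned here only as an assumption.

```latex
\begin{aligned}
y_i \mid p_i &\sim \mathrm{Bernoulli}(p_i), \qquad i = 1, \dots, n, \\
\operatorname{logit}(p_i) &= \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik}, \\
\beta_k &\sim \mathcal{N}\!\left(0,\ \sigma_\beta^{2}\right), \qquad k = 0, 1, \dots, K, \\
\mathrm{OR}_k &= \exp(\beta_k).
\end{aligned}
```

The MLE baseline corresponds to the same likelihood without the prior line, so differences between the two fits can be attributed to the prior.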
What to Expect in Results
- Stability: Priors shrink unstable MLE estimates under separation
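- Uncertainty: Posteriors provide a full distribution for each effect, summarized by credible intervals
- Practicality: Survey-aware modeling (weights, PSUs, strata) supports population-level inference rather than sample-only conclusions
- Sensitivity: Vary the priors; compare multiple imputation against listwise deletion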
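The stability point can be illustrated with a toy example of complete separation. In the sketch below a slope-only logistic model is fit by gradient ascent to four made-up points where the sign of x perfectly predicts y. Without a prior the estimate keeps growing the longer the optimizer runs. With a Normal(0, 2.5²) prior (an assumed scale) the maximum a posteriori estimate settles at a finite value. This finds a posterior mode only, not a full posterior, and `fit_slope` is a hypothetical helper.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def fit_slope(x, y, prior_sd=None, steps=2000, lr=0.1):
    """Gradient ascent on the log-likelihood, or log-posterior if prior_sd is set."""
    beta = 0.0
    for _ in range(steps):
        grad = sum(xi * (yi - sigmoid(beta * xi)) for xi, yi in zip(x, y))
        if prior_sd is not None:
            grad -= beta / prior_sd ** 2  # gradient of the Normal(0, prior_sd^2) log-density
        beta += lr * grad
    return beta


# Made-up, perfectly separated data: y = 1 exactly when x > 0.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

mle_2k = fit_slope(x, y, steps=2000)
mle_20k = fit_slope(x, y, steps=20000)   # larger: the MLE does not exist
map_2k = fit_slope(x, y, prior_sd=2.5, steps=2000)
map_20k = fit_slope(x, y, prior_sd=2.5, steps=20000)  # essentially unchanged
```

Comparing the short and long runs is a simple diagnostic: an estimate that depends on the iteration budget signals that the likelihood alone does not pin down the coefficient.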
Next Steps
- Finalize outcome (DIQ010 likely) and keep DIQ240 as covariate
- Implement MI (document MAR/MNAR assumptions)
- Fit MLE and Bayesian models; add survey-weighted variants
- Report effects, uncertainty, and predictive performance
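For the multiple-imputation step, per-dataset estimates are usually combined with Rubin's rules. The sketch below pools one coefficient across m imputed datasets. The function name `pool_rubin` is hypothetical, and the three log-odds estimates and variances are invented for illustration, not project results.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across imputed datasets using Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)         # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, sqrt(total)


# Hypothetical log-odds estimates and squared SEs from m = 3 imputations.
estimates = [0.50, 0.60, 0.70]
variances = [0.04, 0.04, 0.04]

pooled_est, pooled_se = pool_rubin(estimates, variances)
```

The pooled SE exceeds the per-dataset SE whenever the imputations disagree, which is how the extra uncertainty from missing data enters the reported intervals.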
References
See references.bib. (Key: survey methods, Bayesian GLMs, imputation.)