A design of experiments approach to validation sampling for logistic regression modeling with error-prone medical records.
Electronic medical record (EMR) databases offer significant potential for developing clinical hypotheses and identifying disease risk associations by fitting statistical models that capture the relationship between a binary response variable and a set of predictor variables that represent clinical, phenotypical, and demographic data for the patient. However, EMR response data may be error prone for a variety of reasons. Performing a manual chart review to validate data accuracy is time [...]
Author(s): Ouyang, Liwen; Apley, Daniel W; Mehrotra, Sanjay
DOI: 10.1093/jamia/ocv132