With growing legal and scientific evidence for the importance of reducing model bias, both model developers and deployers need tools to quantify that bias. Unfortunately, algorithmic bias can take as many forms as there are implementations. In this talk, Paul M. Heider covers a range of clinical NLP use cases, such as de-identification and diagnosis prediction, highlighting the utility of behavioral testing and comparative evaluation methods for identifying the scope of a model’s bias.