A benchmark comparison of deterministic and probabilistic methods for defining manual review datasets in duplicate records reconciliation.
Clinical databases require accurate entity resolution (ER). One approach is to use algorithms that assign questionable cases to manual review. Few studies have compared the performance of common algorithms for such a task. Furthermore, previous work has been limited by a lack of objective methods for setting algorithm parameters. We compared the performance of common ER algorithms: using algorithmic optimization, rather than manual parameter tuning, and on two-threshold classification (match/manual [...]
Author(s): Joffe, Erel; Byrne, Michael J; Reeder, Phillip; Herskovic, Jorge R; Johnson, Craig W; McCoy, Allison B; Sittig, Dean F; Bernstam, Elmer V
DOI: 10.1136/amiajnl-2013-001744
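
Illustrative note: the two-threshold classification mentioned in the abstract (match / manual review / non-match) can be sketched as below. This is a minimal sketch and not the authors' implementation. The function name `classify_pair`, the parameter names `lower` and `upper`, and the assumption that each record pair already has a numeric similarity score are all hypothetical.

```python
from typing import Literal

Label = Literal["match", "manual_review", "non_match"]


def classify_pair(score: float, lower: float, upper: float) -> Label:
    """Assign a candidate record pair to one of three classes.

    `score` is an assumed similarity score for the pair (higher means
    more likely to be the same entity). `lower` and `upper` are the two
    hypothetical thresholds; pairs scoring between them are the
    questionable cases routed to manual review.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score >= upper:
        return "match"
    if score < lower:
        return "non_match"
    return "manual_review"
```

Setting `lower` equal to `upper` reduces this sketch to single-threshold (match / non-match) classification, since no score can then fall between the two thresholds.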