Efficient sequential and parallel algorithms for record linkage.
Integrating data from multiple sources is a crucial and challenging problem. Although numerous algorithms exist for record linkage or deduplication, they suffer from either long running times or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage that handle any number of datasets and outperform previous algorithms.
Author(s): Mamun, Abdullah-Al; Mi, Tian; Aseltine, Robert; Rajasekaran, Sanguthevar
DOI: 10.1136/amiajnl-2013-002034