- The Digitising Scotland Team
University of St Andrews
Tom is a third-year Computer Science PhD student at the University of St Andrews, under the supervision of Prof. Alan Dearle and Dr. Graham Kirby. His thesis focuses on the evaluation of data linkage methodology using statistically verifiable synthetic populations. Tom holds a BSc in Computer Science (also from St Andrews) with a dissertation exploring approaches to representing and querying probabilistically linked genealogies while considering the associated uncertainties and provenance. His wider research interests include data linkage methodology, synthetic data, data science, and software engineering.
PhD project details:
Evaluating Data Linkage: Creating longitudinal synthetic data to provide gold-standard linked data sets for robust methodological evaluation
When performing probabilistic data linkage on real-world data, the true linkage is unknown: if it were known, there would be no need to link the data. The success of a linkage approach is therefore difficult to evaluate. Often, small hand-linked data sets are used as a ‘gold standard’ against which a linkage approach is evaluated. However, errors in the hand linkage, together with the limited size, number, and availability of these data sets, do not allow for robust evaluation.
This work aims to demonstrate the benefits of using longitudinal synthetic data to provide gold-standard linked data sets for linkage evaluation in the domain of Population Reconstruction. This research outlines:
– a model for generating population data sets
– an approach for statistical verification of the generated data sets
– an evaluation approach to verify the model across the range of possible inputs
– an approach for using the data to robustly evaluate linkage approaches
In application, the synthesised gold-standard data sets will be used to evaluate a range of preexisting linkage approaches. The evaluation will consider the performance of each linkage approach over variations in the population size, characteristics, and levels of data corruption.
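As a rough illustration of what evaluating a linkage approach against a gold-standard data set can look like, the sketch below compares the links an approach proposes with the known true links and reports precision, recall, and F-measure. These particular metrics, the record-identifier format, and the names `normalise` and `evaluate_linkage` are assumptions made for the example; the project description does not specify them.

```python
from typing import Dict, Iterable, Set, Tuple

# A link is a pair of record identifiers (hypothetical format, e.g. a birth
# record id and a death record id believed to refer to the same person).
Link = Tuple[str, str]


def normalise(links: Iterable[Link]) -> Set[Link]:
    """Treat links as unordered pairs so (a, b) and (b, a) count once."""
    return {tuple(sorted(pair)) for pair in links}


def evaluate_linkage(predicted: Iterable[Link], gold: Iterable[Link]) -> Dict[str, float]:
    """Score predicted links against the gold-standard (true) links."""
    predicted_set = normalise(predicted)
    gold_set = normalise(gold)

    # Links that the approach proposed and that are genuinely correct.
    true_positives = len(predicted_set & gold_set)

    # Precision: fraction of proposed links that are correct.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    # Recall: fraction of true links that were found.
    recall = true_positives / len(gold_set) if gold_set else 0.0

    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0

    return {"precision": precision, "recall": recall, "f1": f1}
```

With synthetic data, the `gold` links are known by construction, so a function of this kind could be run repeatedly across populations of different sizes, characteristics, and corruption levels to compare linkage approaches.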