Exploiting historical registers: Automatic methods for coding c19th and c20th cause of death descriptions to standard classifications

Carson, J., Kirby, G., Dearle, A., Williamson, L., Garrett, E., Reid, A. & Dibben, C. (2013) NTTS (New Techniques and Technologies for Statistics) 15-17 March 2013

Other information:

The increasing availability of digitised registration records presents a significant opportunity for research. Returning to the original records allows researchers to classify descriptions, such as cause of death, to modern medical understandings of illness and disease, rather than relying on contemporary registrars’ classifications.

Linking an individual’s records also allows the production of sparse life-course micro-datasets. Linking these further into family units then makes it possible to reconstruct family structures and to produce multi-generational studies. We describe work to develop a method for automatically coding the causes of death from 8.3 million Scottish death certificates to standard classifications. We have evaluated a range of approaches using text processing and supervised machine learning, obtaining accuracies between 72% and 96% on several test sets. We present results and speculate on further development that may be needed to classify the full data set.
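As a rough illustration of what supervised coding of free-text causes of death can look like, the sketch below trains a small bag-of-words naive Bayes classifier and measures its accuracy on held-out strings. It is not the method evaluated in the paper, which the abstract does not specify. The class name `NaiveBayesCoder`, the category labels, and every description string are invented for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a description and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCoder:
    """Hypothetical multinomial naive Bayes coder with add-one smoothing."""

    def fit(self, descriptions, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(descriptions, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(self.label_counts[label] / total)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, descriptions, labels):
    """Fraction of descriptions whose predicted label matches the given one."""
    correct = sum(model.predict(d) == y for d, y in zip(descriptions, labels))
    return correct / len(labels)


# Invented toy data: (description, category). Not taken from any register.
TRAIN = [
    ("chronic bronchitis", "respiratory"),
    ("pneumonia 5 days", "respiratory"),
    ("acute bronchitis and pneumonia", "respiratory"),
    ("heart disease", "circulatory"),
    ("cardiac failure", "circulatory"),
    ("valvular disease of heart", "circulatory"),
]
TEST = [
    ("Bronchitis, 3 weeks", "respiratory"),
    ("Disease of the heart", "circulatory"),
]

model = NaiveBayesCoder().fit(*zip(*TRAIN))
test_accuracy = accuracy(model, *zip(*TEST))
print(f"accuracy on toy test set: {test_accuracy:.0%}")
```

A real system would presumably need spelling normalisation, handling of multiple causes on one certificate, and a far larger hand-coded training set. The accuracy on this toy set says nothing about performance on actual records.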

