Automatic Methods for Coding Historical Occupation Descriptions to Standard Classifications
Digitising Scotland
Kirby, G., Carson, J., Dunlop, F., Dibben, C., Dearle, A., Williamson, L., Garrett, E., Reid, A. (2015). 3, pp. 43-60. Springer International Publishing. ISBN: 978-3-319-19883-5
Abstract
The increasing availability of digitised registration records presents a significant opportunity for research in many fields, including human geography, genealogy and medicine. Re-examining original records allows researchers to study factors such as occupation, cause of death, illness and geographic region. This can be facilitated by coding these factors to standard classifications. This chapter describes work to develop a method for automatically coding the occupations recorded in 29 million Scottish birth, death and marriage records, containing around 50 million occupation descriptions, to standard classifications. A range of approaches using text processing and supervised machine learning is evaluated, achieving classification performance of 75% micro-precision/recall, 61% macro-precision and 66% macro-recall on a smaller test set. Further development that may be needed to classify the full data set is discussed.
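The abstract reports both micro- and macro-averaged scores. The sketch below shows how the two averages differ, assuming each record receives exactly one code (the chapter's own evaluation protocol may differ, for example if a description can receive several codes). The function name `precision_recall`, the 0/0 → 0.0 convention and the toy occupation labels are all illustrative assumptions, not taken from the chapter.

```python
from collections import Counter


def _safe_div(num, den):
    # Assumed convention: a 0/0 ratio (e.g. a class that is never
    # predicted) scores 0.0.
    return num / den if den else 0.0


def precision_recall(gold, predicted):
    """Micro- and macro-averaged precision and recall for single-label coding.

    gold, predicted: equal-length, non-empty sequences of class codes,
    one per record. Returns (micro_p, micro_r, macro_p, macro_r).
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1  # correct code assigned
        else:
            fp[p] += 1  # class p predicted wrongly
            fn[g] += 1  # true class g missed

    # Micro-averaging: pool counts over all classes, then take one ratio.
    total_tp = sum(tp.values())
    micro_p = _safe_div(total_tp, total_tp + sum(fp.values()))
    micro_r = _safe_div(total_tp, total_tp + sum(fn.values()))

    # Macro-averaging: score each class separately, then take the unweighted
    # mean, so rare occupations weigh as much as common ones.
    classes = sorted(set(gold) | set(predicted))
    per_p = [_safe_div(tp[c], tp[c] + fp[c]) for c in classes]
    per_r = [_safe_div(tp[c], tp[c] + fn[c]) for c in classes]
    macro_p = sum(per_p) / len(classes)
    macro_r = sum(per_r) / len(classes)

    return micro_p, micro_r, macro_p, macro_r


# Hypothetical toy data: four records with made-up occupation labels.
gold = ["farmer", "farmer", "miner", "weaver"]
pred = ["farmer", "miner", "miner", "farmer"]
print(precision_recall(gold, pred))
# → (0.5, 0.5, 0.3333333333333333, 0.5)
```

Under the one-code-per-record assumption, every error is simultaneously a false positive for one class and a false negative for another, so micro-precision and micro-recall coincide, which would explain why the abstract quotes a single 75% figure. Macro scores that fall below the micro score are consistent with weaker performance on less frequent codes.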
Book Chapter
Available online: https://www.springer.com/gp/book/9783319198835