|
Abstract |
This paper presents a system to assist in the transcription of historical handwritten census records in a crowdsourcing platform. Census records have a tabular structured layout. They consist in a sequence of rows with information of homes ordered by street address. For each household snippet in the page, the list of family members is reported. The censuses are recorded in intervals of a few years and the information of individuals in each household is quite stable from a point in time to the next one. This redundancy is used to assist the transcriber, so the redundant information is transferred from the census already transcribed to the next one. Household records are aligned from one year to the next one using the knowledge of the ordering by street address. Given an already transcribed census, a query by string word spotting is applied. Thus, names from the census in time t are used as queries in the corresponding home record in time t+1. Since the search is constrained, the obtained precision-recall values are very high, with an important reduction in the transcription time. The proposed system has been tested in a real citizen-science experience where non expert users transcribe the census data of their home town. |
|