2019, Historical Methods: A Journal of Quantitative and Interdisciplinary History
43 pages
In this paper we describe the record linkage procedure to create a panel from Cape Colony census returns, or opgaafrolle, for 1787-1828, a dataset of 42 354 household-level observations. Based on a subset of manually linked records, we first evaluate statistical models and deterministic algorithms to best identify and match households over time. By using household-level characteristics in the linking process and near-annual data, we are able to create high-quality links for 84 percent of the dataset. We compare basic analyses on the linked panel dataset to the original cross-sectional data, evaluate the feasibility of the strategy when linking to supplementary sources, and discuss the scalability of our approach to the full Cape panel.
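The idea of matching households across adjacent years on household-level characteristics can be illustrated with a minimal sketch. It is not the authors' procedure: the field names (`surname`, `first_name`, `spouse_name`), the 0.85 threshold, the use of `difflib` string similarity, and the example records are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def household_score(h1: dict, h2: dict) -> float:
    """Average similarity over the (hypothetical) household name fields."""
    fields = ("surname", "first_name", "spouse_name")
    return sum(similarity(h1[f], h2[f]) for f in fields) / len(fields)


def link_years(year_a: list, year_b: list, threshold: float = 0.85) -> list:
    """Link each household in year_a to its best-scoring candidate in year_b.

    Returns (index_in_a, index_in_b, score) tuples. Candidates scoring
    below the threshold are left unlinked. One-to-one conflicts are not
    resolved in this sketch.
    """
    links = []
    for i, h1 in enumerate(year_a):
        scores = [household_score(h1, h2) for h2 in year_b]
        if not scores:
            continue
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            links.append((i, best, round(scores[best], 3)))
    return links


# Invented example records, not taken from the opgaafrolle.
returns_1800 = [
    {"surname": "van der Merwe", "first_name": "Jacobus", "spouse_name": "Maria Botha"},
    {"surname": "du Plessis", "first_name": "Pieter", "spouse_name": "Anna Nel"},
]
returns_1801 = [
    {"surname": "du Plessis", "first_name": "Pieter", "spouse_name": "Anna Nel"},
    {"surname": "van der Merwen", "first_name": "Jacobus", "spouse_name": "Maria Botha"},
    {"surname": "Smit", "first_name": "Hendrik", "spouse_name": "Elsje Visser"},
]

links = link_years(returns_1800, returns_1801)
for a, b, score in links:
    print(a, b, score)
```

Including the spouse's name in the score is what lets a household survive a spelling variant in the surname; a statistical model, as evaluated in the paper, would replace the fixed average and threshold with learned weights.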
Australian & New Zealand Journal of Statistics
The Australian Bureau of Statistics (ABS) is creating a longitudinal sample, called the Australian Census Longitudinal Dataset (ACLD), by linking person records across its 5-yearly Census of Population and Housing. This paper proposes a Multi-Panel framework for selecting and weighting records in the ACLD. This framework can be applied more generally to selecting longitudinal samples from a series of cross-sectional administrative files. The proposed framework avoids some significant limitations of the popular "Top-up" sampling approach to maintaining the cross-sectional and longitudinal representativeness of a sample over time.
2019
The recent digitization of complete count census data is an extraordinary opportunity for social scientists to create large longitudinal datasets by linking individuals from one census to another or from other sources to the census. We evaluate different automated methods for record linkage, performing a series of comparisons across methods and against hand linking. We have three main findings that lead us to conclude that automated methods perform well. First, a number of automated methods generate very low (less than 5%) false positive rates. The automated methods trace out a frontier illustrating the tradeoff between the false positive rate and the (true) match rate. Relative to more conservative automated algorithms, humans tend to link more observations but at a cost of higher rates of false positives. Second, when human linkers and algorithms use the same linking variables, there is relatively little disagreement between them. Third, across a number of plausible analyses, coefficient estimates and parameters of interest are very similar when using linked samples based on each of the different automated methods. We provide code and Stata commands to implement the various automated methods.
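The tradeoff between the false positive rate and the (true) match rate can be made concrete with a small sketch. The definitions used here are assumptions, not necessarily those of the paper: the false positive rate is taken as the share of links made that disagree with a reference ("truth") set, and the true match rate as the share of all records that were linked correctly. The record identifiers are invented.

```python
def linkage_rates(auto_links: dict, truth_links: dict, n_records: int) -> tuple:
    """Return (false_positive_rate, true_match_rate) for one linking method.

    auto_links and truth_links map a source record id to a target record id.
    A link counts as correct only if the truth set contains the same pair.
    """
    made = len(auto_links)
    correct = sum(
        1 for src, dst in auto_links.items() if truth_links.get(src) == dst
    )
    false_positive_rate = (made - correct) / made if made else 0.0
    true_match_rate = correct / n_records if n_records else 0.0
    return false_positive_rate, true_match_rate


# Invented example: five source records, four of which have a true match.
truth = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}
conservative = {"a1": "b1", "a2": "b2"}
liberal = {"a1": "b1", "a2": "b2", "a3": "b9", "a4": "b4"}

print(linkage_rates(conservative, truth, n_records=5))
print(linkage_rates(liberal, truth, n_records=5))
```

In this toy case the conservative method makes no wrong links but matches fewer records, while the liberal method matches more records at the cost of one wrong link, which is the shape of the frontier the abstract describes.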
1987
This paper describes a methodology for computer matching the Post Enumeration Survey with the Census. Computer matching is the first stage of a process for producing adjusted Census counts. All crucial matching parameters are computed solely using characteristics of the files being matched. No a priori knowledge of truth of matches is assumed. No previously created lookup tables are needed. The methods are illustrated with numerical results using files from the 1988 Dress Rehearsal Census for which the truth of matches is known.
International Journal of Social Research Methodology, 2008
Linkage of household survey responses with administrative data is increasingly on the agenda. Unique individual identifiers have clear benefits for making linkages but are also subject to problems of survey item non-response and measurement error. Our experimental study, which linked survey responses to UK government agency records on benefits and tax credits, elucidates this trade-off. We compare five linkage criteria: one based on a respondent-supplied National Insurance Number (NINO) and the other four using different combinations of sex, name, address and date of birth. As many linkages were made using non-NINO-based matches as were made using matches on NINO, and the former were also relatively accurate when assessed in terms of false-positive and false-negative linkage rates. The potential returns from hierarchical and pooled matching are also examined.
2004
ESRC Research Centre on Micro-social Change. Established in 1989 to identify, explain, model and forecast social change in Britain at the individual and household level, the Centre specialises in research using longitudinal data.
Advances in Artificial Intelligence, Proceedings of 30th Canadian Conference on Artificial Intelligence, 2017
In this paper, we present a method that uses domain knowledge to automatically discover and assign household identifiers to individual historical records. We apply this algorithm to a real full-count census (the 1891 Canadian census) to assign household identifiers to all the records.
The History of the Family, 2018
To study the intergenerational dynamics of productivity, social mobility and demographic change of any contemporary society is a challenge. To do this for a pre-industrial society at the southern tip of Africa seems almost impossible. Yet this is the purpose of the Cape of Good Hope Panel, an annual panel data set, still under construction, of Cape Colony settler tax records over almost two centuries. The transcription of this ambitious project is now in its fourth year. Here we describe the history of the project, the transcription process, and present some preliminary results.
2016
The use of administrative datasets as a data source in official statistics has become much more common as there is a drive for more outputs to be produced more efficiently. Many outputs rely on linkage between two or more datasets, and this is often undertaken in a number of phases with different methods and rules. In these situations we would like to be able to assess the quality of the linkage, and this involves some re-assessment of both links and non-links. In this paper we discuss sampling approaches to obtain estimates of false negatives and false positives with reasonable control of both accuracy of estimates and cost. Approaches to stratification of links (non-links) to sample are evaluated using information from the 2011 England and Wales population census.
Journal of Business & Economic Statistics, 2023
Historical Life Course Studies, 2020
During the 19th and early 20th centuries, about 220,000 Dutch-born persons migrated to the USA. The Historical Sample of the Netherlands (HSN) contains about 85,500 persons born in the Netherlands between 1812 and 1922. In this article we report how we matched persons from the HSN with the American censuses from the period 1850 to 1940. For this purpose, a linking process was designed, comprising three stages: harmonization, matching and validation. The different nature of the two datasets (HSN and the USA Censuses) called for some harmonization prior to the matching. Once the data had been properly prepared, two strategies were applied in order to link the data sets. The first one, called the Similarity Approach, matched individuals from both datasets on the basis of the resemblance of first and last names. The second approach, called the Transformation Approach, made use of dictionaries with Anglicized versions of Dutch first and last names and their most common or most...
Research Papers in Economics, 2021
BMC Medical Research Methodology, 2014
Historical Methods: A Journal of Quantitative and Interdisciplinary History, 2015
2014 IEEE International Conference on Data Mining Workshop, 2014
BMC Health Services Research, 2010
Population Health Metrics, 2014
Research Data Journal for the Humanities and Social Sciences
Economic History of Developing Regions, 2019
Research Data Journal for the Humanities and Social Sciences
Population Reconstruction, 2015