Debates over the future of the United Kingdom’s traditional decadal census have led to the exploration of supplementary data sources that could support the provision of timely and enhanced statistics on population and housing in small areas. Much of the attention has focused on the (re)use of governmental administrative datasets. However, following Dugmore (2009)[3] and several more recent papers (Claxton, Reades and Anderson, 2012[1]; Daas and Puts, 2014[2]; Struijs, Braaksma and Daas, 2014[6]), we argue that data held by commercial organisations may offer considerable additional value as a supplement to familiar census, survey and administrative data sources.

Over the last 18 months we have been conducting exploratory research via an ESRC-funded ‘Transformative Social Science’ programme grant (Census 2022: Transforming Small Area Socio-Economic Indicators through 'Big Data') and a parallel feasibility study funded by the Office for National Statistics (ONS) through its Big Data project. Both projects have explored the potential for inferring traditional census-like or novel household attributes from large-scale (‘big’) consumption data, whilst the ONS study also developed indicators of half-hourly ‘occupancy’ as a potential aid to survey and enumeration fieldwork.
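The post does not describe how the ONS study's occupancy indicators were built, so the following is only a minimal sketch of one simple heuristic a reader might start from. It assumes half-hourly mean demand in kW for a single household and flags a half-hour as ‘likely occupied’ when demand exceeds the household's lowest half-hourly value (a rough proxy for always-on load such as fridges and standby devices) by a margin. The function names and the 0.15 kW margin are hypothetical choices for illustration, not the method used in that work.

```python
def occupancy_flags(half_hourly_kw: list[float], margin_kw: float = 0.15) -> list[bool]:
    """Flag each half-hour as likely occupied (True) or not (False).

    Hypothetical heuristic: the lowest half-hourly value is treated as the
    household's baseline 'always on' load; any half-hour exceeding that
    baseline by more than margin_kw is assumed to reflect active use.
    """
    if not half_hourly_kw:
        return []
    baseline = min(half_hourly_kw)
    return [kw > baseline + margin_kw for kw in half_hourly_kw]


def occupancy_share(flags: list[bool]) -> float:
    """Proportion of half-hours flagged as likely occupied."""
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    # Illustrative values only, not real smart meter data.
    profile = [0.12, 0.11, 0.10, 0.45, 0.60, 0.13]
    flags = occupancy_flags(profile)
    print(flags)  # [False, False, False, True, True, False]
    print(occupancy_share(flags))
```

A fixed margin is crude: households with high standby load or electric heating would need household-specific or time-of-day-specific thresholds.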

An initial paper based on this work, ‘The Role of Digital Trace Data in Supporting the Collection of Population Statistics - the Case for Smart Metered Electricity Consumption Data’ (Newing et al., 2015[4]), published in Population, Space and Place in July 2015, reviewed the potential value of a number of commercial datasets before focusing on high temporal resolution household electricity load data collected via smart metering. Building on previous work that has used smart meter data to cluster customers according to their consumption profiles, we suggest that such data could provide indicators of household attributes that could then be aggregated to the census output area level to generate more frequent official small area statistics. These estimates could directly supplement existing census indicators or even enable the development of novel small area indicators. Yet, with few exceptions (e.g. Caroll et al., 2013), we are unaware of any studies that have explicitly considered smart meter-like household energy consumption data as a tool for generating small area population statistics.
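The clustering methods in that prior work are not reproduced here. As a rough illustration of the general idea only, here is a minimal k-means sketch over household load profiles, assuming each profile is an equal-length list of half-hourly mean demand values (for example, 48 values for one day). The deterministic farthest-point initialisation and the function names are our own choices for the example; real analyses would more likely use a library implementation such as scikit-learn's `KMeans`, typically on normalised profiles.

```python
def _dist2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length load profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(profiles: list[list[float]]) -> list[float]:
    """Element-wise mean of a non-empty list of equal-length profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def kmeans_profiles(
    profiles: list[list[float]], k: int, max_iter: int = 100
) -> tuple[list[int], list[list[float]]]:
    """Cluster load profiles with k-means; returns (labels, centroids)."""
    if not 1 <= k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    # Farthest-point initialisation: start from the first profile, then keep
    # adding the profile farthest from every centroid chosen so far.
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(profiles, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each household joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in profiles]
        if new_labels == labels:
            break  # converged: no household changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member profiles.
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:
                centroids[c] = _mean(members)
    return labels, centroids
```

Each resulting cluster is a ‘typical’ consumption shape; the hope expressed above is that cluster membership correlates with household attributes such as the presence of children.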

Clearly, generating small area estimates requires a two-stage process. First, household attributes of interest need to be inferred from the consumption data; second, the estimated household attributes need to be aggregated to relevant output levels for publication as area-based statistics, subject to the normal disclosure controls. To date we have been concerned solely with the first stage and have not attempted to generate area-based indicators, largely because full-population smart meter datasets do not currently exist even for a sample of small areas.
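The second stage can be sketched as follows, assuming the first stage has already produced one inferred yes/no attribute per household (for example, whether it contains children) tagged with an area code. The suppression rule below, which withholds any area with fewer than `min_households` households, is a hypothetical stand-in for ‘the normal disclosure controls’; real statistical disclosure control is considerably more involved, and the threshold of 10 is arbitrary.

```python
from collections import defaultdict
from typing import Optional


def aggregate_to_areas(
    households: list[tuple[str, bool]], min_households: int = 10
) -> dict[str, Optional[float]]:
    """Aggregate inferred household attributes to area-level proportions.

    households: (area_code, has_attribute) pairs, one per household.
    Returns area_code -> proportion of households with the attribute,
    or None where the area is suppressed for having too few households.
    """
    totals: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for area, has_attribute in households:
        totals[area] += 1
        positives[area] += int(has_attribute)

    estimates: dict[str, Optional[float]] = {}
    for area, n in totals.items():
        # Simple threshold rule standing in for fuller disclosure control.
        estimates[area] = positives[area] / n if n >= min_households else None
    return estimates


if __name__ == "__main__":
    # Hypothetical area codes and inferred flags, for illustration only.
    sample = [("AREA_A", i < 3) for i in range(12)] + [("AREA_B", True)] * 5
    print(aggregate_to_areas(sample))  # {'AREA_A': 0.25, 'AREA_B': None}
```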

Our preliminary analysis (Newing et al., 2015[4]) of a ‘smart meter-like’ dataset, which combines power demand observed at 1-second resolution with a linked household survey, has confirmed some of the data cleansing, error-checking and processing difficulties associated with ‘big data’ (Puts, Daas and de Waal, 2015[5]). Considerable human and computational resources were needed to produce even a relatively small (one-month) aggregated half-hourly dataset from the several terabytes of data available to us. However, the analysis has also indicated that aggregated household load profiles may reveal key household and householder attributes of interest to census users and national statistical organisations.
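The aggregation from 1-second observations to half-hourly means can be illustrated with a minimal in-memory sketch, assuming readings arrive as (timestamp, watts) pairs. The cleansing rule (dropping missing or negative values) and the 90% completeness threshold are hypothetical choices for illustration, not the rules applied in Newing et al. (2015); at terabyte scale this logic would normally run in a database or distributed framework.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, Optional

SECONDS_PER_SLOT = 30 * 60  # one half-hour


def half_hourly_means(
    readings: Iterable[tuple[datetime, Optional[float]]],
    min_coverage: float = 0.9,
) -> dict[datetime, float]:
    """Aggregate 1-second power readings (watts) to half-hourly mean demand.

    Slots with fewer than min_coverage * 1800 valid readings are dropped
    rather than reported from incomplete data.
    """
    sums: dict[datetime, float] = defaultdict(float)
    counts: dict[datetime, int] = defaultdict(int)
    for ts, watts in readings:
        # Hypothetical cleansing rule: skip missing or negative readings.
        if watts is None or watts < 0:
            continue
        # Truncate the timestamp to the start of its half-hour slot.
        slot = ts.replace(minute=0 if ts.minute < 30 else 30, second=0, microsecond=0)
        sums[slot] += watts
        counts[slot] += 1
    return {
        slot: sums[slot] / counts[slot]
        for slot in sorted(sums)
        if counts[slot] >= min_coverage * SECONDS_PER_SLOT
    }


if __name__ == "__main__":
    # Synthetic data: one hour at 100 W then 300 W, plus an incomplete slot.
    start = datetime(2011, 10, 4, 10, 0, 0)
    readings = [(start + timedelta(seconds=s), 100.0 if s < 1800 else 300.0) for s in range(3600)]
    readings += [(datetime(2011, 10, 4, 11, 0, s), 50.0) for s in range(60)]
    for slot, mean_w in half_hourly_means(readings).items():
        print(slot.isoformat(), round(mean_w, 1))
```

Run as a script, this prints the 10:00 and 10:30 slots at 100.0 and 300.0 W; the 11:00 slot, with only 60 of 1,800 possible readings, is dropped.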

As an example, Figure 1 shows the mean power demand per half-hour for a small sample of smart metered households, grouped by the number of children in the household. Whilst the numbers in each group are relatively small, there is a clear indication of differences in electricity demand profiles between the groups, especially between households without children and those with at least one child. Similar differences were found by the employment status of the main survey respondent, by grouped household income and by the number of residents.


Figure 1: Mean power demand per half-hour for households with different numbers of children (Tuesdays to Thursdays, October 2011)
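Group profiles of the kind shown in Figure 1 can be computed with a short sketch such as the following, assuming half-hourly records of (household id, slot start time, mean watts) and a hypothetical lookup from household id to a survey-derived group label. Only Tuesday to Thursday slots are kept, matching the figure caption. Note that this averages over all household-day observations, so households with more days of data carry more weight; averaging household means first would be an equally reasonable choice.

```python
from collections import defaultdict
from datetime import datetime

# datetime.weekday(): Monday is 0, so Tuesday to Thursday are 1, 2 and 3.
MIDWEEK_DAYS = {1, 2, 3}


def slot_index(ts: datetime) -> int:
    """Time-of-day index of a half-hour slot: 0 for 00:00, 47 for 23:30."""
    return ts.hour * 2 + ts.minute // 30


def mean_profile_by_group(
    records: list[tuple[str, datetime, float]],
    group_of: dict[str, str],
) -> dict[str, dict[int, float]]:
    """Mean half-hourly demand per group, Tuesday to Thursday only.

    records: (household_id, slot_start, mean_watts) observations.
    group_of: household_id -> group label, e.g. from a linked survey.
    """
    sums: dict[tuple[str, int], float] = defaultdict(float)
    counts: dict[tuple[str, int], int] = defaultdict(int)
    for household, slot_start, watts in records:
        if slot_start.weekday() not in MIDWEEK_DAYS:
            continue  # skip Mondays, Fridays and weekends
        key = (group_of[household], slot_index(slot_start))
        sums[key] += watts
        counts[key] += 1

    profiles: dict[str, dict[int, float]] = defaultdict(dict)
    for (group, idx), total in sums.items():
        profiles[group][idx] = total / counts[(group, idx)]
    return dict(profiles)
```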

Our review of the evidence to date, combined with this preliminary descriptive analysis, has given us confidence that there may well be value in exploring the use of smart meter electricity consumption data in particular to derive official statistical indicators at small area levels of geography. As Struijs et al. (2014)[6] note, this approach may change the role of national statistical organisations from data collector to data aggregator, with a focus on transparent and robust methodologies to combine, quality-check and validate data from different commercial sources.

The next step for this work is therefore to develop a range of appropriate predictive statistical models to derive household attributes of interest from electricity load profiles. This requires ‘labelled’ consumption data from relatively large-scale representative samples in which household characteristics are known. Such datasets are currently rare, and those that do exist tend to be small-scale or have restricted access. Models developed using such data could then be applied to a larger representative random sample, or a complete ‘census’, of anonymised smart meter data extracted from known geographical areas to produce estimates of inferred household characteristics at small area levels. Finally, these estimates would need to be validated against other sources of small area population statistics (e.g. Census 2011) to assess their efficacy. Unfortunately, at the time of writing, anonymised large-scale smart meter data extracts from known small area geographies simply do not exist. However, we hope that our work can prompt the release of such datasets for research and statistical use, although we acknowledge there may be substantial legal and ethical constraints.
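As one deliberately simple illustration of a predictive model trained on labelled data, the sketch below implements a nearest-centroid classifier: each label's ‘typical’ load profile is the mean of its labelled training profiles, and a new household receives the label of the closest typical profile. This is a swapped-in example technique, not the models the authors propose to develop; the labels and profile values are hypothetical, and in practice several models would be compared using proper cross-validation on held-out households.

```python
def _dist2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length load profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_nearest_centroid(
    labelled: list[tuple[list[float], str]],
) -> dict[str, list[float]]:
    """Return each label's centroid: the mean of its training profiles."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for profile, label in labelled:
        if label not in sums:
            sums[label] = [0.0] * len(profile)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], profile)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: dict[str, list[float]], profile: list[float]) -> str:
    """Assign the label whose centroid is closest to the given profile."""
    return min(centroids, key=lambda label: _dist2(profile, centroids[label]))


def accuracy(
    centroids: dict[str, list[float]], held_out: list[tuple[list[float], str]]
) -> float:
    """Share of held-out (profile, label) pairs that are predicted correctly."""
    if not held_out:
        return 0.0
    correct = sum(predict(centroids, p) == label for p, label in held_out)
    return correct / len(held_out)
```

Applied to unlabelled households in known areas, the predicted labels would feed the area-level aggregation stage, and the resulting estimates could then be compared against Census 2011 figures for the same areas.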

References:

1. Claxton, R., J. Reades, and B. Anderson. 2012. ‘On the Value of Digital Traces for Commercial Strategy and Public Policy: Telecommunications Data as a Case Study.’ In The Global Information Technology Report 2012, edited by S. Dutta and B. Bilbao-Osorio. Geneva: World Economic Forum.

2. Daas, P. J. H., and M. J. H. Puts. 2014. ‘Big Data as a Source of Statistical Information.’ The Survey Statistician 69: 22–31.

3. Dugmore. 2009. Information collected by commercial companies: What might be of value to ONS? London: Demographic Decisions Ltd.

4. Newing, Andy, Ben Anderson, AbuBakr Bahaj, and Patrick James. 2015. ‘The Role of Digital Trace Data in Supporting the Collection of Population Statistics - the Case for Smart Metered Electricity Consumption Data.’ Population, Space and Place, July. doi:10.1002/psp.1972.

5. Puts, Marco, Piet Daas, and Ton de Waal. 2015. ‘Finding Errors in Big Data.’ Significance 12 (3): 26–29. doi:10.1111/j.1740-9713.2015.00826.x.

6. Struijs, P., B. Braaksma, and P. J. Daas. 2014. ‘Official Statistics and Big Data.’ Big Data & Society 1 (1): 2053951714538417. doi:10.1177/2053951714538417.

Ben Anderson is a member of the Sustainable Energy Research Group, Faculty of Engineering & Environment (Energy & Climate Change), University of Southampton. Find him on Twitter at @dataknut.

Andy Newing is a Lecturer in Retail Geography based in the Centre for Spatial Analysis and Policy (CSAP), School of Geography, University of Leeds.

Any views or opinions presented are solely those of the author and do not necessarily represent those of the MRS Census and Geodemographic Group unless otherwise specifically stated.