Aggregated administrative data and estimation from nonprobability samples

Authors

DOI:

https://doi.org/10.3989/ris.2021.79.1.19.350

Keywords:

Survey methodology, nonprobability samples, machine learning, selection bias, administrative data

Abstract

In recent years, survey research has been marked by the increasingly frequent use of nonprobability samples, a result of the expansion of the internet and the sustained decline in response rates. Supporting valid inference from such samples requires increasingly complex adjustments, which in turn depend on auxiliary variables, that is, information about the whole population. This paper tests the potential of administrative data aggregated at the municipality level to adjust two surveys drawn from a panel of internet users, the AIMC-Q panel, sponsored by the Asociación Española para la Investigación de los Medios de Comunicación (AIMC). The results show that the aggregated administrative variables have minimal capacity to reduce the bias of the estimates.

Published

2021-04-06

How to cite

Cabrera-Álvarez, P. (2021). Datos administrativos agregados y estimación a partir de muestras no probabilísticas. Revista Internacional de Sociología, 79(1), e180. https://doi.org/10.3989/ris.2021.79.1.19.350

Issue

Section

Articles

Funding data

Fundación Bancaria Caixa d'Estalvis i Pensions de Barcelona
Grant number LCF/BQ/ES16/11570005