Aggregated administrative data and estimation from non-probability samples

Authors

P. Cabrera-Álvarez

DOI:

https://doi.org/10.3989/ris.2021.79.1.19.350

Keywords:

Survey methodology, non-probability samples, machine learning, selection bias, administrative data

Abstract

In recent years, survey research has been marked by the increasingly frequent use of non-probability samples, driven by the expansion of the internet and the sustained decline in response rates. Supporting inference from these samples requires ever more complex adjustments, which in turn need auxiliary variables, that is, information about the whole population. This paper tests the potential of administrative data aggregated at the municipality level to adjust two surveys drawn from a panel of internet users, the AIMC-Q panel, promoted by the Asociación Española para la Investigación de los Medios de Comunicación (AIMC). The results show that aggregated administrative variables have minimal capacity to reduce the bias of the estimates.
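To make the idea of "adjusting with auxiliary variables" concrete, the sketch below uses raking (iterative proportional fitting), a standard weighting technique that rescales respondent weights until the weighted sample margins match known population shares. It is swapped in here as a simple illustration and is not necessarily the procedure used in the article, whose keywords point to machine-learning methods. The function names `rake` and `weighted_mean`, the variables `sex` and `muni`, and all categories and population shares are hypothetical; the `muni` class stands in for a characteristic derived from aggregated administrative data and attached to each respondent through their municipality of residence.

```python
"""Raking (iterative proportional fitting) sketch.

Adjusts the weights of a non-probability sample so that the weighted
margins of each auxiliary variable match known population shares.
All variable names, categories and shares are hypothetical.
"""
from collections import defaultdict


def rake(sample, targets, max_iter=1000, tol=1e-10):
    """Return one weight per respondent, normalised to sum to len(sample).

    sample:  list of dicts mapping variable name -> category
    targets: dict mapping variable name -> {category: population share};
             shares for each variable should sum to 1, and every category
             present in the sample needs a target share.
    """
    n = len(sample)
    weights = [1.0] * n  # start from equal weights
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            # Current weighted total for each category of this variable.
            totals = defaultdict(float)
            for w, unit in zip(weights, sample):
                totals[unit[var]] += w
            grand = sum(weights)
            # Rescale so this variable's weighted margin equals its target.
            # The factor depends only on the unit's category and uses the
            # totals computed above, so updating weights in place is safe.
            for i, unit in enumerate(sample):
                factor = shares[unit[var]] * grand / totals[unit[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # every margin already matched: converged
            break
    total = sum(weights)
    return [w * n / total for w in weights]


def weighted_mean(values, weights):
    """Weighted mean of a survey outcome after adjustment."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical respondents: sex plus a municipality class assumed to come
    # from aggregated administrative data linked by municipality of residence.
    sample = [
        {"sex": "f", "muni": "urban"},
        {"sex": "f", "muni": "urban"},
        {"sex": "m", "muni": "urban"},
        {"sex": "m", "muni": "rural"},
    ]
    # Hypothetical population shares (e.g. from a population register).
    targets = {
        "sex": {"f": 0.5, "m": 0.5},
        "muni": {"urban": 0.6, "rural": 0.4},
    }
    w = rake(sample, targets)
    print([round(x, 3) for x in w])  # → [1.0, 1.0, 0.4, 1.6]
    # A hypothetical binary outcome: unweighted mean 0.5, weighted 0.35.
    print(round(weighted_mean([1, 0, 1, 0], w), 3))  # → 0.35
```

Raking only needs population margins, which is why aggregated sources can feed it. Whether such margins actually reduce bias depends on how strongly the auxiliary variables relate to both selection into the sample and the survey outcomes, which is the question the article examines empirically.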

Published

2021-04-06

How to cite

Cabrera-Álvarez, P. (2021). Datos administrativos agregados y estimación a partir de muestras no probabilísticas. Revista Internacional de Sociología, 79(1), e180. https://doi.org/10.3989/ris.2021.79.1.19.350

Issue

Vol. 79 No. 1 (2021)

Section

Articles

Funding

Fundación Bancaria Caixa d'Estalvis i Pensions de Barcelona
Grant number: LCF/BQ/ES16/11570005