ARcenso: a Package Born From Chaos, Powered by Community

Andrea Gómez Vargas & Emanuel Ciardullo

Government Advances in Statistical Programming - GASP 2025

2025-06-25

ARcenso


Supported by the rOpenSci Champions Program (2023-2024 cohort), this project is led by developer Andrea Gómez Vargas, with co-developer Emanuel Ciardullo and mentor Luis D. Verde.

It is a citizen-driven initiative that grew out of our professional experience working with census data at the National Institute of Statistics and Censuses of Argentina (INDEC).


Project Overview


The goal is to develop an R data package that makes available the official national population census data of Argentina, produced by the National Institute of Statistics and Censuses (INDEC), for the census rounds from 1970 to 2022. The data are homogenized, organized, and ready to use, and the package provides open access to them for the public, researchers, and decision-makers.

Problem Statement


Historical census results for 1970, 1980, 1991, 2001, 2010 and 2022 in Argentina are scattered across physical books, PDFs, Excel files, and REDATAM outputs. There is no unified system or format that allows the data from these six censuses to be used together as a database.

This fragmentation limits data accessibility, interoperability, and reuse, especially for users working within the R environment.

Proposed Solution

From Excel tables to tidy tables in R

Original Excel download

Tidy table in R

Conceptual framework

FAIR Principles

Findable:

Centralized census data covering six national census periods (1970–2022), openly published in a single R data package with clear versioning.

Accessible:

Publicly available, homogenized census datasets provided in open formats and accompanied by comprehensive documentation and metadata.

Interoperable:

Tidy, well-structured tables that enable easy integration with other demographic, geographic, and socioeconomic datasets within the R ecosystem.

Reusable:

Includes detailed variable descriptions, standardized coding across census years, open licensing, and reproducible data structures that facilitate long-term use and cross-study comparisons.

Census Data: Themes and Structure (UN Framework)


Census Topics

  • Core: Essential variables
    (e.g., age, sex, population)
  • Derived-core: Calculated variables
    (e.g., fertility rates)
  • Additional: Country-specific topics
    (e.g., religion)

Conceptual Units

  • Population: Individuals
  • Housing: Physical dwellings
  • Household: People sharing a dwelling

Geographic Coverage

  • National level
  • Jurisdictional level

From Chaos to Package 📦

Problems: a flood of Excel files and non-standardized formats

The Process




  • Download: Automated web scraping to collect census tables from official sources.

  • Select: Listed, classified, and extracted relevant files and metadata (census year, geography, topics).

  • Transform: Converted Excel tables into tidy, standardized datasets using base R.

  • Function development: Built R functions to access, manipulate, and visualize the data efficiently.

  • Package creation: Integrated datasets and functions into the ARcenso package for easy use and reproducibility.

  • Version control: Used Git and GitHub for tracking changes, collaboration, and release management.
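The Transform step above can be sketched in base R. This is a minimal illustration, not the package's actual code: the raw spreadsheet layout, the helper name `clean_names`, and the long-format column names `variable` and `valor` are assumptions. The figures are the first two rows of the 1970 tenure table shown later in these slides.

```r
# Hypothetical raw layout: one row per tenure category, one column per measure,
# with capitalized, space-separated headers as they might appear in a spreadsheet.
raw <- data.frame(
  "Regimen de tenencia" = c("Propietario", "Inquilino o arrendatario"),
  "Hogares"  = c(3553250, 1380950),
  "Personas" = c(13778700, 4692800),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Illustrative helper: lower-case names, underscores instead of other characters.
clean_names <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_|_$", "", x)
}
names(raw) <- clean_names(names(raw))

# Stack the measure columns into one long table using base R only.
measures <- c("hogares", "personas")
tidy <- do.call(rbind, lapply(measures, function(m) {
  data.frame(
    regimen_de_tenencia = raw$regimen_de_tenencia,
    variable = m,
    valor = raw[[m]],
    stringsAsFactors = FALSE
  )
}))

print(tidy)
```

The long format keeps one observation per row, which is one common reading of "tidy"; the tables shipped in the package may instead keep one column per measure, as in the printed output further on.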

Data Availability Roadmap


Stage  Census years   Geographic level
1      1970           National and 24 jurisdictions
1      1980           National level
2      1991 and 2001  National level
3      2010           National level
4      2022           National level
5      1980 and 1991  24 jurisdictions
6      2001 and 2010  24 jurisdictions
7      2022           24 jurisdictions

{ARcenso} 📦


Installation

# install.packages("remotes")
remotes::install_github("SoyAndrea/arcenso")



Package activation

library(arcenso)

get_census()

get tables

get_census(year = 1970,
           topic = "CONDICIONES HABITACIONALES",
           geolvl = "Total del país")
#> $c70_total_del_pais_poblacion_c18
#>                   regimen_de_tenencia hogares personas  cuartos
#> 1                         Propietario 3553250 13778700 11197900
#> 2            Inquilino o arrendatario 1380950  4692800  3305350
#> 3 Ocupante en relación de dependencia  353300  1402500   880050
#> 4                   Ocupante gratuito  575650  2271150  1196500
#> 5                    En otro carácter  192950   816350   419800
#> 
#> $c70_total_del_pais_poblacion_c20
#>     tamaño_hogar                     regimen_tenencia hogares
#> 1   De 1 persona                                Total  615900
#> 2   De 1 persona                          Propietario  255900
#> 3   De 1 persona             Inquilino o arrendatario  199350
#> 4   De 1 persona Ocupante con relación de dependencia   52600
#> 5   De 1 persona                    Ocupante gratuito   82100
#> 6   De 1 persona                                 Otro   25950
#> 7  De 2 personas                                Total 1125250
#> 8  De 2 personas                          Propietario  652950
#> 9  De 2 personas             Inquilino o arrendatario  302400
#> 10 De 2 personas Ocupante con relación de dependencia   49250
#> 11 De 2 personas                    Ocupante gratuito   91300
#> 12 De 2 personas                                 Otro   29350
#> 13 De 3 personas                                Total 1230600
#> 14 De 3 personas                          Propietario  744800
#> 15 De 3 personas             Inquilino o arrendatario  290650
#> 16 De 3 personas Ocupante con relación de dependencia   62150
#> 17 De 3 personas                    Ocupante gratuito  103200
#> 18 De 3 personas                                 Otro   29800
#> 19 De 4 personas                                Total 1255000
#> 20 De 4 personas                          Propietario  787900
#> 21 De 4 personas             Inquilino o arrendatario  266000
#> 22 De 4 personas Ocupante con relación de dependencia   65650
#> 23 De 4 personas                    Ocupante gratuito  102850
#> 24 De 4 personas                                 Otro   32600
#> 25 De 5 personas                                Total  818550
#> 26 De 5 personas                          Propietario  516100
#> 27 De 5 personas             Inquilino o arrendatario  157500
#> 28 De 5 personas Ocupante con relación de dependencia   48200
#> 29 De 5 personas                    Ocupante gratuito   71550
#> 30 De 5 personas                                 Otro   25200
#> 31 De 6 personas                                Total  443250
#> 32 De 6 personas                          Propietario  272000
#> 33 De 6 personas             Inquilino o arrendatario   80000
#> 34 De 6 personas Ocupante con relación de dependencia   29000
#> 35 De 6 personas                    Ocupante gratuito   45750
#> 36 De 6 personas                                 Otro   16500
#> 37 De 7 personas                                Total  276750
#> 38 De 7 personas                          Propietario  163400
#> 39 De 7 personas             Inquilino o arrendatario   44950
#> 40 De 7 personas Ocupante con relación de dependencia   19950
#> 41 De 7 personas                    Ocupante gratuito   35200
#> 42 De 7 personas                                 Otro   13250
#> 43 De 8 personas                                Total  121450
#> 44 De 8 personas                          Propietario   70600
#> 45 De 8 personas             Inquilino o arrendatario   18250
#> 46 De 8 personas Ocupante con relación de dependencia   10050
#> 47 De 8 personas                    Ocupante gratuito   16250
#> 48 De 8 personas                                 Otro    6300
#> 49 De 9 personas                                Total   76000
#> 50 De 9 personas                          Propietario   40950
#> 51 De 9 personas             Inquilino o arrendatario    9400
#> 52 De 9 personas Ocupante con relación de dependencia    7150
#> 53 De 9 personas                    Ocupante gratuito   12900
#> 54 De 9 personas                                 Otro    5600
#> 55   De 10 y más                                Total   93350
#> 56   De 10 y más                          Propietario   48650
#> 57   De 10 y más             Inquilino o arrendatario   12450
#> 58   De 10 y más Ocupante con relación de dependencia    9300
#> 59   De 10 y más                    Ocupante gratuito   14550
#> 60   De 10 y más                                 Otro    8400
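
The printed result is a named list of data frames, so each table can be used directly for analysis. The sketch below rebuilds the c18 table by hand from the output above so that it runs without the package installed. With the package, the same object would presumably be reached as an element of the list returned by `get_census()`. The derived column names `pct_hogares` and `personas_por_hogar` are illustrative.

```r
# Rows copied from the c18 table printed above, so the sketch is self-contained.
c18 <- data.frame(
  regimen_de_tenencia = c("Propietario",
                          "Inquilino o arrendatario",
                          "Ocupante en relación de dependencia",
                          "Ocupante gratuito",
                          "En otro carácter"),
  hogares  = c(3553250, 1380950, 353300, 575650, 192950),
  personas = c(13778700, 4692800, 1402500, 2271150, 816350),
  cuartos  = c(11197900, 3305350, 880050, 1196500, 419800),
  stringsAsFactors = FALSE
)

# Share of households in each tenure category, in percent.
c18$pct_hogares <- round(100 * c18$hogares / sum(c18$hogares), 1)

# Average number of persons per household in each tenure category.
c18$personas_por_hogar <- round(c18$personas / c18$hogares, 2)

print(c18[, c("regimen_de_tenencia", "pct_hogares", "personas_por_hogar")])
```

The household total of this table (6,056,100) equals the sum of the "Total" rows of the c20 table above, which is a quick consistency check between the two tables.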

check_repository()

report of available tables


check_repository(year = 1970,
                 topic = "CONDICIONES HABITACIONALES",
                 geolvl = "Total del país")
#>                            Archivo
#> 1 c70_total_del_pais_poblacion_c18
#> 2 c70_total_del_pais_poblacion_c20
#>                                                                                                      Titulo
#> 1    Cuadro 18. Total del país. Hogares particulares, personas y cuartos, por régimen de tenencia. Año 1970
#> 2 Cuadro 20. Total del país. Hogares particulares, por tamaño del hogar según régimen de tenencia. Año 1970
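
The report pairs each table name (`Archivo`) with its title (`Titulo`), which suggests a simple lookup workflow: search the titles, then use the matching name to pick a table out of the list returned by `get_census()`. The sketch below is hedged accordingly. `index` and `tables` are hand-built stand-ins for the two function results, so it runs without the package, and the one-column data frames are placeholders rather than the real tables.

```r
# Stand-in for the check_repository() report printed above.
index <- data.frame(
  Archivo = c("c70_total_del_pais_poblacion_c18",
              "c70_total_del_pais_poblacion_c20"),
  Titulo = c(
    "Cuadro 18. Total del país. Hogares particulares, personas y cuartos, por régimen de tenencia. Año 1970",
    "Cuadro 20. Total del país. Hogares particulares, por tamaño del hogar según régimen de tenencia. Año 1970"
  ),
  stringsAsFactors = FALSE
)

# Stand-in for the named list returned by get_census(); placeholder contents.
tables <- list(
  c70_total_del_pais_poblacion_c18 = data.frame(hogares = 3553250),
  c70_total_del_pais_poblacion_c20 = data.frame(hogares = 615900)
)

# Find the table whose title mentions rooms ("cuartos").
hits <- index$Archivo[grepl("cuartos", index$Titulo, fixed = TRUE)]

# Use the matching name to select the table from the list.
selected <- tables[[hits]]
print(hits)
```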

arcenso()

Shiny app for browsing the available tables

arcenso()

Documentation

Powered by Community 🤝

Thanks to Our Community

ARcenso was built thanks to the feedback, collaboration, and support of R communities such as rOpenSci, LatinR, and R en Buenos Aires, together with the joint work of fellow demographers, statisticians, and sociologists.

Community involvement provided:

  • Technical assistance: Solving coding and data challenges.

  • Validation: Ensuring data quality and usability.

  • Dissemination: Promoting the use and awareness of the package.

  • Inspiration: Driving continuous improvements and new features.

Thank you 😁

Andrea

Sociologist

rOpenSci Champion

Emanuel

Statistician