Census data consists of the geo-level crosswalk and base harmonized census data tables.
The append_census() function would take the geographic level as its argument and then operate as follows:
Aggregate the census table up to the requested level. For example, if the requested unit is MSA then population would be aggregated up from tracts:
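library( dplyr )

# sum tract-level counts within each MSA
# (the grouping field is named msaID here to match the merge key used later)
d.msa <-
  d.tract %>%
  group_by( msaID ) %>%
  summarize( pop=sum(pop), unemployed=sum(unemployed) )  # repeat for each count variable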
Data tables are in nccsdata/geo/data.
The aggregated data is then merged with the nonprofit data. The tractID field should be in every dataset, so add the MSA field and then merge. Something like:
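# tract-to-MSA lookup (tractx is presumably the geo-level crosswalk)
ids <- tractx %>% select( tractID, msaID )
# attach the MSA id to each nonprofit record, then the MSA-level census fields
core <- merge( core, ids, by="tractID", all.x=TRUE )
core <- merge( core, d.msa, by="msaID", all.x=TRUE )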
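The two steps above could be wrapped into a single function along the following lines. This is only a sketch under stated assumptions: the signature of append_census(), the geo.id argument, and the toy tables are hypothetical and are not taken from the package; only the field names tractID, msaID, pop, and unemployed come from the snippets above.

```r
library( dplyr )

# Hypothetical sketch: signature, argument names, and toy data are assumptions.
append_census <- function( core, d.tract, tractx, geo.id = "msaID" ) {

  # tract-to-target lookup from the crosswalk, one row per tract
  ids <- tractx %>%
    select( all_of( c( "tractID", geo.id ) ) ) %>%
    distinct()

  # aggregate tract-level counts up to the requested level,
  # dropping tracts that have no id at that level
  d.geo <-
    d.tract %>%
    left_join( ids, by = "tractID" ) %>%
    filter( !is.na( .data[[ geo.id ]] ) ) %>%
    group_by( across( all_of( geo.id ) ) ) %>%
    summarize( pop = sum( pop ), unemployed = sum( unemployed ), .groups = "drop" )

  # add the geography id to the nonprofit records, then the aggregated fields
  core %>%
    left_join( ids, by = "tractID" ) %>%
    left_join( d.geo, by = geo.id )
}

# toy inputs (made-up values)
tractx  <- data.frame( tractID = c( "t1", "t2", "t3" ), msaID = c( "m1", "m1", "m2" ) )
d.tract <- data.frame( tractID = c( "t1", "t2", "t3" ),
                       pop = c( 100, 300, 50 ), unemployed = c( 10, 15, 5 ) )
core    <- data.frame( ein = c( "A", "B", "C" ), tractID = c( "t1", "t3", "t2" ) )

out <- append_census( core, d.tract, tractx, geo.id = "msaID" )
```

The sketch uses left_join() rather than merge() because merge() sorts the result by the key column by default, while left_join() keeps the nonprofit records in their original order. Either approach keeps all rows of core.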
The big caveat is that census variables have to be counts for the aggregation process to work exactly. Means and rates also work if they are aggregated as weighted averages, using the variable's denominator (for example, population) as the weight. Fields like median income, however, are more challenging because the weighted average of tract medians is not mathematically equivalent to the median income of the full sample at the higher level of aggregation. It is probably good enough for most cases, but we need to document the limitation.
Income inequality metrics (Gini coefficients) are another example where aggregation is imperfect.
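A small worked example may help make the median caveat concrete. The household incomes below are made up for illustration and the two "tracts" are hypothetical. The example compares the household-weighted average of tract medians with the median of the pooled records, and shows that the same weighting does recover the pooled mean exactly.

```r
# made-up household incomes for two tracts in one hypothetical MSA
income.t1 <- c( 20, 30, 40 )
income.t2 <- c( 50, 60, 100, 200, 300 )

# households per tract, used as weights
n <- c( length( income.t1 ), length( income.t2 ) )

# approximation: household-weighted average of the tract medians
tract.medians <- c( median( income.t1 ), median( income.t2 ) )
approx.median <- weighted.mean( tract.medians, w = n )

# benchmark: median of the pooled household records
true.median <- median( c( income.t1, income.t2 ) )

# means, by contrast, aggregate exactly when weighted by household counts
tract.means <- c( mean( income.t1 ), mean( income.t2 ) )
agg.mean    <- weighted.mean( tract.means, w = n )
true.mean   <- mean( c( income.t1, income.t2 ) )

print( c( approx.median = approx.median, true.median = true.median ) )
print( c( agg.mean = agg.mean, true.mean = true.mean ) )
```

With these numbers the weighted average of medians is 73.75 while the pooled median is 55, and both versions of the mean equal 100. The size of the gap depends on how skewed and how different the tract distributions are, so the example should not be read as a typical error.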