---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# microData <img src="man/figures/logo.png" align="right" height="138" alt="" />
<!-- badges: start -->
[R-CMD-check](https://github.com/GutUrago/microData/actions/workflows/R-CMD-check.yaml)
[Codecov test coverage](https://app.codecov.io/gh/GutUrago/microData?branch=master)
[Lifecycle: experimental](https://lifecycle.r-lib.org/articles/stages.html#experimental)
<!-- badges: end -->
```{r}
#| echo: false
library(gt)

# Turn a data frame into a gt table with the styling used throughout this README.
my_gt_theme <- function(data) {
  gt_table <- gt(data)
  rows <- nrow(gt_table[["_data"]])
  my_table <- gt_table |>
    # Bold black title
    tab_style(
      style = cell_text(color = "black", weight = "bold"),
      locations = cells_title(groups = "title")
    ) |>
    # Grey background on every other body row
    tab_style(
      style = cell_fill(color = "grey90"),
      locations = cells_body(rows = seq(1, rows, 2))
    ) |>
    tab_options(
      table.width = pct(100),
      column_labels.background.color = "steelblue",
      heading.align = "left"
    ) |>
    # Bold column labels and spanners
    tab_style(
      style = cell_text(weight = "bold"),
      locations = cells_column_labels()
    ) |>
    tab_style(
      style = cell_text(weight = "bold", size = "medium"),
      locations = cells_column_spanners()
    )
  return(my_table)
}
```
## What?
I am developing the `microData` package to search, browse, and extract metadata from microdata published by the World Bank (WB), the Food and Agriculture Organization (FAO), the International Household Survey Network (IHSN), the United Nations High Commissioner for Refugees (UNHCR), and the International Labour Organization (ILO) via the NADA REST API. Any researcher who has used microdata from these organizations knows how difficult and time-consuming it is to understand these data and import the variables into R. If you use microdata, or plan to, this package is meant to save you that effort. It also includes simple, handy wrapper functions that ease the data analysis process.
## Abstract
The purpose of `microData` is to simplify the extraction of complex metadata from data published by various organizations, and so make data preparation more efficient. At the moment, it supports five international organizations: the World Bank, FAO, UNHCR, IHSN, and ILO. It can search, filter, extract metadata, and perform the other tasks that you can do on the web, but it cannot download the data files themselves. To my knowledge, the API has no documented way to download data files; I suspect this is a data licensing matter, because only a few datasets are accessible through the API. The package can also retrieve the names of the variables in a specific survey together with their labels. You can select only the variables you are interested in, rename them, and attach the variable descriptions as label attributes, or set custom names and labels of your own. Labels play a crucial role when exporting tables and graphs, because they save you from typing long names into manuscripts by hand. There are also simple functions to dummify variables and create a description table. Together, these features are meant to alleviate the difficulties described above.
<span style="color:orange">**Warning:** This package is still under development, so I don't recommend relying on it in reproducible code yet, because functions and their arguments may change in future versions.</span>
## Installation
You can install the development version of microData from [GitHub](https://github.com/GutUrago/microData) with:
``` r
# install.packages("devtools")
devtools::install_github("GutUrago/microData")
```
## Collection
All organizations supported by this package publish microdata through the NADA API, so they share similar terminology. A collection is simply a group of related studies or datasets. To see all available collections, use the `collections()` function.
Note: The tables in this README use a customized gt table theme that I created in [this blog](https://guturago.github.io/blog/gt-table-theme/).
```{r}
library(microData)
collections(org = "wb") |>
  head() |>
  my_gt_theme()
```
## Searching
This package offers the same search flexibility as the web catalog. For more details, see the documentation for `search_catalog()`.
```{r}
#| eval: false
search_catalog(
  keyword = "food",
  org = "unhcr",
  from = 2015,
  to = 2024,
  country = "Ethiopia",
  sort_by = "year",
  sort_order = "desc",
  results = 10
)
```
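A search like this presumably ends up as a single HTTP GET request against the organization's NADA catalog. The sketch below only builds such a URL and sends nothing. The helper `build_search_url()`, the base URL, the `catalog/search` path, and the query parameter names (`sk`, `ps`, and so on) are assumptions made for illustration; they are not taken from the `microData` source and may differ from what the package actually does.

``` r
# Illustrative helper (not part of microData): turn named search options into
# a query string, dropping options that were left as NULL.
build_search_url <- function(base_url, ...) {
  params <- list(...)
  params <- params[!vapply(params, is.null, logical(1))]
  if (length(params) == 0) {
    return(base_url)
  }
  # Percent-encode each value so spaces and reserved characters are safe.
  values <- vapply(
    params,
    function(x) utils::URLencode(as.character(x), reserved = TRUE),
    character(1)
  )
  paste0(base_url, "?", paste0(names(params), "=", values, collapse = "&"))
}

# Assumed endpoint and parameter names; check the NADA documentation of the
# catalog you are querying before relying on them.
url <- build_search_url(
  "https://microdata.unhcr.org/index.php/api/catalog/search",
  sk = "food",
  from = 2015,
  to = 2024,
  country = "Ethiopia",
  sort_by = NULL, # dropped from the query because it is NULL
  ps = 10
)
url
```

Dropping `NULL` options keeps the URL limited to the filters that were actually set, which is roughly how optional arguments in a wrapper function tend to map onto optional query parameters.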
There is also a handy function to check the most recently published datasets.
```{r}
#| eval: false
latest_entries(org = "wb", limit = 15)
```
You can use `data_files()` to list the data files included in a study. Let's look at one of the popular surveys in the WB catalog. A study can also be identified by its numeric ID, which is 359 for this survey, instead of its name (see the next code chunk).
```{r}
data_files(id = "TZA_1991_KHDS_v01_M", org = "wb") |>
  head() |>
  my_gt_theme()
```
How about the variables included in a data file? You can check them as well.
```{r}
variables(id = 359, file_id = "F3") |>
  head() |>
  my_gt_theme()
```
## Setting Attributes
Variables in microdata are often named after nothing more than the question order, which says little about what they contain, like this:
```{r echo=FALSE}
head(mdt) |> my_gt_theme()
```
You can then prepare a second data frame that holds the metadata, like this. It will be explained in detail in the vignettes later.
```{r echo=FALSE}
head(mtdt) |> my_gt_theme()
```
You can use the `set_attributes()` function to rename these variables and set their labels.
```{r}
my_data <- set_attributes(
  mdt,
  mtdt,
  old_name = var_id,
  new_name = var_name,
  label = label
)
head(my_data) |> my_gt_theme()
```
Labels are also assigned, as the structure of the data shows:
```{r}
str(my_data)
```
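To make the idea of renaming and labelling concrete, the sketch below reproduces it in base R. It is not the implementation of `set_attributes()`: the helper `label_from_metadata()` and the toy data frames `raw` and `meta` are hypothetical, and only the column names `var_id`, `var_name`, and `label` mirror the call above.

``` r
# Hypothetical toy data: columns named after question order.
raw <- data.frame(
  q01 = c(34, 51),
  q02 = c("F", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical metadata: one row per variable to keep.
meta <- data.frame(
  var_id   = c("q01", "q02"),
  var_name = c("age", "sex"),
  label    = c("Age in years", "Sex of respondent"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of microData): keep the variables listed in
# `meta`, rename them, and store each description in a "label" attribute.
label_from_metadata <- function(data, meta) {
  keep <- meta[meta$var_id %in% names(data), , drop = FALSE]
  out <- data[keep$var_id]
  names(out) <- keep$var_name
  for (i in seq_len(nrow(keep))) {
    attr(out[[i]], "label") <- keep$label[i]
  }
  out
}

labelled <- label_from_metadata(raw, meta)
names(labelled)
attr(labelled$age, "label")
```

Because the description is stored as an attribute on the column itself, it stays attached to the variable when the data frame is passed around, and it shows up in `str()` output.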
More coming soon!