http://www.strengejacke.de/sjPlot/labelleddata/
Working with labelled Data {sjmisc}
This document shows basic usage of the sjmisc package and how to work with labelled data.
Resources:
Download package from CRAN
Developer snapshot at GitHub
Submission of bug reports and issues at GitHub
The sjmisc-Package
Basically, this package covers three domains of functionality:
reading and writing data between other statistical packages (like SPSS) and R, based on the haven and foreign packages
hence, sjmisc also includes functions for working with labelled data
frequently applied recoding and variable conversion tasks
Labelled Data
In software like SPSS, it is common to have value and variable labels as variable attributes. Variable values, even if categorical, are mostly numeric. In R, however, you may use labels as values directly:
factor(c("low", "high", "mid", "high", "low"))
## [1] low high mid high low
## Levels: high low mid
Reading SPSS data (with haven, foreign or sjmisc) keeps the numeric values of the variables and adds the value and variable labels as attributes. See the following example from the sample dataset efc, which is part of the sjmisc package:
library(sjmisc)
data(efc)
str(efc$e42dep)
## atomic [1:908] 3 3 3 4 4 4 4 4 4 4 ...
## - attr(*, "label")= chr "elder's dependency"
## - attr(*, "labels")= Named num [1:4] 1 2 3 4
## ..- attr(*, "names")= chr [1:4] "independent" "slightly dependent" "moderately dependent" "severely dependent"
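To make the attribute mechanism concrete, here is a minimal base-R sketch that builds such a labelled vector by hand. The values and labels mirror the e42dep output above, but the object name `dep` is made up for illustration, and data imported from SPSS may carry further attributes.

```r
# Hypothetical hand-made labelled vector (not read from an SPSS file)
dep <- c(3, 3, 4, 1, 2)

# variable label: a single character string
attr(dep, "label") <- "elder's dependency"

# value labels: a named numeric vector whose names are the labels
attr(dep, "labels") <- c(
  "independent"          = 1,
  "slightly dependent"   = 2,
  "moderately dependent" = 3,
  "severely dependent"   = 4
)

# the values stay numeric; the labels only travel along as attributes
is.numeric(dep)
names(attr(dep, "labels"))
```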
While all plotting and table functions of the sjPlot package make use of these attributes, many other packages and functions ignore them, e.g. R base graphics:
library(sjmisc)
data(efc)
barplot(table(efc$e42dep, efc$e16sex), beside = TRUE, legend.text = TRUE)
As you can see in the above figure, the plot has neither axis nor legend labels.
Adding value labels as factor values
to_label is a sjmisc function that converts a numeric variable into a factor and uses the value labels as factor levels. When the factor levels are the value labels, the bar plot is labelled.
barplot(table(to_label(efc$e42dep), to_label(efc$e16sex)), beside = TRUE, legend.text = TRUE)
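The idea behind this conversion can be sketched in base R. The helper `labels_as_levels()` and the example vector are hypothetical and only illustrate the principle; this is not sjmisc's actual implementation, and it assumes every value has a matching entry in the "labels" attribute.

```r
# Hypothetical helper: turn a labelled numeric vector into a factor
# whose levels are the value labels (illustration only)
labels_as_levels <- function(x) {
  lab <- attr(x, "labels")
  factor(as.vector(x), levels = unname(lab), labels = names(lab))
}

# made-up labelled vector
sex <- c(2, 1, 2, 2, 1)
attr(sex, "labels") <- c(male = 1, female = 2)

# the table is now indexed by label instead of by number
table(labels_as_levels(sex))
```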
to_factor is a convenient replacement for as.factor: it converts a numeric vector into a factor, but keeps the value and variable label attributes.
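As a rough base-R sketch of that behaviour, the hypothetical helper `keep_labels_factor()` below converts to a factor with numeric levels and then re-attaches the two label attributes. The variable and its labels are made up, and sjmisc's own function may work differently.

```r
# Hypothetical helper: factor with the numeric values as levels,
# with "label" and "labels" copied over (illustration only)
keep_labels_factor <- function(x) {
  f <- factor(as.vector(x))
  attr(f, "label")  <- attr(x, "label")
  attr(f, "labels") <- attr(x, "labels")
  f
}

# made-up labelled vector
edu <- c(1, 3, 2, 2, 1)
attr(edu, "label")  <- "level of education"
attr(edu, "labels") <- c(low = 1, intermediate = 2, high = 3)

f <- keep_labels_factor(edu)
levels(f)          # numeric values as levels
attr(f, "label")   # variable label is still attached
```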
Getting and setting value and variable labels
There are four functions that let you easily set or get value and variable labels of either a single vector or a complete data frame:
get_label() to get variable labels
get_labels() to get value labels
set_label() to set variable labels (add them as vector attribute)
set_labels() to set value labels (add them as vector attribute)
With these functions, you can easily add titles to plots dynamically, i.e. depending on the variable that is plotted.
barplot(table(to_label(efc$e42dep), to_label(efc$e16sex)), beside = TRUE, legend.text = TRUE, main = get_label(efc$e42dep))
get_label(efc) would return the variable labels of all variables in the data frame, and get_labels(efc) would return a list with the value labels of all variables in the data frame.
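Under the hood, collecting the labels of a whole data frame amounts to reading one attribute per column. The following base-R sketch shows this for a small made-up data frame; the names `d` and `var_labels` and the labels themselves are illustrative assumptions, and the sketch assumes every column has a "label" attribute.

```r
# Hypothetical two-column data frame with hand-set variable labels
d <- data.frame(dep = c(3, 4, 1), sex = c(2, 1, 2))
attr(d$dep, "label") <- "elder's dependency"
attr(d$sex, "label") <- "elder's gender"

# collect the "label" attribute of every column into a named character vector
var_labels <- vapply(d, function(col) attr(col, "label"), character(1))
var_labels

# a single label can then serve as a dynamic plot title, e.g.
# barplot(table(d$dep), main = var_labels[["dep"]])
plot_title <- var_labels[["dep"]]
```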
Another example
Converting labelled vectors into factors usually drops label attributes (e.g. using as_factor) or replaces values with the associated labels (like to_label does). If you want to convert a labelled vector into a numeric factor, but keep the label attributes (including variable labels), use to_factor.
Functions like lm simply copy these attributes and store this information in the returned object; see the following example from the sjPlot package:
library(sjPlot)
## #refugeeswelcome
data(efc)
# make education categorical
efc$c172code <- to_factor(efc$c172code)
fit <- lm(barthtot ~ c160age + c12hour + c172code + c161sex, data = efc)
sjt.lm(fit, group.pred = TRUE)
| | Total score BARTHEL INDEX | | |
| | B | CI | p |
| (Intercept) | 87.54 | 76.34 – 98.75 | <.001 |
| carer’ age | -0.21 | -0.35 – -0.07 | .004 |
| average number of hours of care per week | -0.28 | -0.32 – -0.24 | <.001 |
| carer’s level of education | | | |
| intermediate level of education | 1.37 | -3.12 – 5.85 | .550 |
| high level of education | -1.64 | -7.22 – 3.93 | .564 |
| carer’s gender | -0.39 | -4.49 – 3.71 | .850 |
| Observations | 821 | | |
| R² / adj. R² | .271 / .266 | | |
Looking at str(fit$model), the model frame stored in the lm object, shows us that both variable and value label attributes are still there. Packages like sjPlot make use of this feature and automatically label the table output (as seen above).
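This can be checked without any add-on package. The sketch below uses made-up data and a made-up label, and it relies on the behaviour described above, namely that the model frame stored in an lm object keeps the attributes of the input variables.

```r
# Made-up data; variable names and the label are illustrative only
d <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
)
attr(d$x, "label") <- "hours of care per week"

fit <- lm(y ~ x, data = d)

# the model frame stored in the fit still carries the variable label
attr(fit$model$x, "label")
```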
Restore labels from subsetted data
The base subset
function drops label attributes (or vector attributes in general) when subsetting data. Since version 1.0.3 of the sjmisc-package, there are handy functions to deal with this problem: copy_labels
and remove_labels
.
copy_labels
adds back labels to a subsetted data frame based on the original data frame. And remove_labels
removes all label attributes.
Losing labels during subset
efc.sub <- subset(efc, subset = e16sex == 1, select = c(4:8))
str(efc.sub)
## 'data.frame': 296 obs. of 5 variables:
## $ e17age : num 74 68 80 72 94 79 67 80 76 88 ...
## $ e42dep : num 4 4 1 3 3 4 3 4 2 4 ...
## $ c82cop1: num 4 3 3 4 3 3 4 2 2 3 ...
## $ c83cop2: num 2 4 2 2 2 2 1 3 2 2 ...
## $ c84cop3: num 4 4 1 1 1 4 2 4 2 4 ...
Add back labels
efc.sub <- copy_labels(efc.sub, efc)
str(efc.sub)
## 'data.frame': 296 obs. of 5 variables:
## $ e17age : atomic 74 68 80 72 94 79 67 80 76 88 ...
## ..- attr(*, "label")= Named chr "elder' age"
## .. ..- attr(*, "names")= chr "e17age"
## $ e42dep : atomic 4 4 1 3 3 4 3 4 2 4 ...
## ..- attr(*, "label")= Named chr "elder's dependency"
## .. ..- attr(*, "names")= chr "e42dep"
## ..- attr(*, "labels")= Named num 1 2 3 4
## .. ..- attr(*, "names")= chr "independent" "slightly dependent" "moderately dependent" "severely dependent"
## $ c82cop1: atomic 4 3 3 4 3 3 4 2 2 3 ...
## ..- attr(*, "label")= Named chr "do you feel you cope well as caregiver?"
## .. ..- attr(*, "names")= chr "c82cop1"
## ..- attr(*, "labels")= Named num 1 2 3 4
## .. ..- attr(*, "names")= chr "never" "sometimes" "often" "always"
## $ c83cop2: atomic 2 4 2 2 2 2 1 3 2 2 ...
## ..- attr(*, "label")= Named chr "do you find caregiving too demanding?"
## .. ..- attr(*, "names")= chr "c83cop2"
## ..- attr(*, "labels")= Named num 1 2 3 4
## .. ..- attr(*, "names")= chr "Never" "Sometimes" "Often" "Always"
## $ c84cop3: atomic 4 4 1 1 1 4 2 4 2 4 ...
## ..- attr(*, "label")= Named chr "does caregiving cause difficulties in your relationship with your friends?"
## .. ..- attr(*, "names")= chr "c84cop3"
## ..- attr(*, "labels")= Named num 1 2 3 4
## .. ..- attr(*, "names")= chr "Never" "Sometimes" "Often" "Always"
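The same lose-and-restore cycle can be reproduced in base R on a small made-up data frame. The helper `restore_labels()` is hypothetical and only sketches the general idea of copying attributes back by column name; it is not the sjmisc implementation.

```r
# Made-up data frame with one labelled column
d <- data.frame(sex = c(1, 2, 1, 2), dep = c(4, 3, 1, 2))
attr(d$dep, "label") <- "elder's dependency"

# subsetting drops the attribute
d.sub <- subset(d, subset = sex == 1, select = "dep")
is.null(attr(d.sub$dep, "label"))

# Hypothetical helper: copy "label" and "labels" back from the original
# data frame for every column name the two data frames share
restore_labels <- function(sub, orig) {
  for (nm in intersect(names(sub), names(orig))) {
    attr(sub[[nm]], "label")  <- attr(orig[[nm]], "label")
    attr(sub[[nm]], "labels") <- attr(orig[[nm]], "labels")
  }
  sub
}

d.sub <- restore_labels(d.sub, d)
attr(d.sub$dep, "label")
```

Matching by column name keeps the sketch independent of column order, so it also works when the subset selects only some of the variables.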
Conclusion
When working with labelled data, especially with data sets imported from other software packages, it is very handy to make use of the label attributes. The sjmisc package supports this and offers useful functions for these tasks.