`factor` or `as.factor`?

Publish date: Sep 22, 2021
Tags: Programming

The story

When plotting a categorical (character) variable with ggplot2, controlling the order of its categories is always troublesome.

gapminder %>% 
  ggplot(aes(x=continent, y=lifeExp)) + 
  geom_boxplot()

There are several ways to solve this problem.

# 1. Use the `fct_relevel` function to change the level order
# (levels are passed as separate, unnamed strings; naming them `levels=` triggers an "Outer names" warning)
gapminder %>% 
  ggplot(aes(x=fct_relevel(continent, "Africa","Asia","Americas","Europe","Oceania"), y=lifeExp)) + 
  geom_boxplot()

- This method avoids changing the original data, which helps when we want to make other plots with the original order.
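
To illustrate that point, here is a minimal sketch on a toy factor (the `continent` vector below is made up for illustration and is not the gapminder column). It assumes the forcats package is installed.

```r
library(forcats)

# Toy data, made up for illustration
continent <- factor(c("Asia", "Europe", "Africa", "Asia"))

# fct_relevel() returns a new factor with the requested level order
reordered <- fct_relevel(continent, "Europe", "Asia", "Africa")

levels(reordered)  # new order: Europe, Asia, Africa
levels(continent)  # original keeps its default sorted order: Africa, Asia, Europe
```

Because the reordered factor is a separate object, the original variable can still be plotted in its default order elsewhere.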

# 2. Use the `factor` or `reorder` function to convert the chr variable to a factor before plotting.
gapminder <- gapminder %>% 
  mutate(continent = as.factor(continent, levels = c("Africa","Asia","Americas","Europe","Oceania")))
## Error in `mutate()`:
## ! Problem while computing `continent = as.factor(...)`.
## Caused by error in `as.factor()`:
## ! unused argument (levels = c("Africa", "Asia", "Americas", "Europe", "Oceania"))


What’s wrong?

factor and as.factor

# 2. Use the `factor` or `reorder` function to convert the chr variable to a factor before plotting.
gapminder <- gapminder %>% 
  mutate(continent = factor(continent, levels = c("Africa","Asia","Americas","Europe","Oceania"))) # or replace `factor` with `ordered` (not `as.ordered`)

The problem is that I used as.factor instead of factor. What’s the difference between them? From the source code, I found that as.factor does not have levels as a parameter; it takes only x. The function is essentially a thin wrapper around factor. If the column is already a factor or an integer vector, as.factor is more efficient because it skips the general factor call. For any other input, such as a character vector, it simply calls factor(x) with the default levels, so there is no way to specify the levels manually.
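
A minimal base-R sketch of how the two functions treat the `levels` argument, using a toy character vector (`sizes` is made up for illustration). Note that as.factor does convert a character vector, through its final factor(x) branch; it just always uses the default sorted levels.

```r
# Toy character vector, made up for illustration
sizes <- c("small", "large", "medium", "small")

# as.factor() converts it, but only with the default (sorted) levels
f_default <- as.factor(sizes)
levels(f_default)

# factor() accepts an explicit level order
f_custom <- factor(sizes, levels = c("small", "medium", "large"))
levels(f_custom)

# as.factor() takes a single argument, so passing `levels` is an error;
# tryCatch() captures the message instead of stopping the script
res <- tryCatch(
  as.factor(sizes, levels = c("small", "medium", "large")),
  error = function(e) conditionMessage(e)
)
res
```

The values are the same in both factors; only the level order, and therefore the plotting order, differs.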

Example:

all_years <- as.factor(gapminder$year)[1:5]  # subsetting a factor keeps all of its levels
factor(all_years)
## [1] 1952 1957 1962 1967 1972
## Levels: 1952 1957 1962 1967 1972
as.factor(all_years)
## [1] 1952 1957 1962 1967 1972
## Levels: 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 2002 2007

See the difference? :D

I would appreciate it if you could explain why in the discussion below~

Source Code:

as.factor
## function (x) 
## {
##     if (is.factor(x)) 
##         x
##     else if (!is.object(x) && is.integer(x)) {
##         levels <- sort.int(unique.default(x))
##         f <- match(x, levels)
##         levels(f) <- as.character(levels)
##         if (!is.null(nx <- names(x))) 
##             names(f) <- nx
##         class(f) <- "factor"
##         f
##     }
##     else factor(x)
## }
## <bytecode: 0x12b3260e8>
## <environment: namespace:base>
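
Reading the source above suggests an answer to the question posed earlier: the first branch returns an existing factor as is, so unused levels survive, whereas factor() rebuilds the levels from the values actually present. A small base-R sketch with made-up years (the `years` vector is hypothetical, not the gapminder column):

```r
# Integer input goes through the second branch of as.factor()
years <- as.factor(c(1952L, 1957L, 1962L, 1967L))

# Subsetting a factor keeps every level, even the ones no longer used
first_two <- years[1:2]

# is.factor(x) is TRUE, so as.factor() returns x untouched
same <- identical(as.factor(first_two), first_two)

n_as_factor <- nlevels(as.factor(first_two))  # all original levels remain
n_factor    <- nlevels(factor(first_two))     # levels recomputed from the values present
n_dropped   <- nlevels(droplevels(first_two)) # explicit way to drop unused levels

c(n_as_factor, n_factor, n_dropped)
```

If the goal is only to discard unused levels, droplevels() states that intent more directly than calling factor() again.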