How can I tell R to use a certain level as the reference when I use a categorical (dummy-coded) explanatory variable in a regression?

It's just using some level by default.

lm(x ~ y + as.factor(b)) 

Here b takes values in {0, 1, 2, 3, 4}. Say I want to use 3 as the reference instead of the 0 that R picks by default.

To specify a factor level as the reference in a regression, you can use the relevel() function.

According to R Documentation:


Reorder Levels of Factor


The levels of a factor are reordered so that the level specified by ref is first and the others are moved down. This is useful for contr.treatment contrasts which take the first level as the reference.


relevel(x, ref, ...)

Arguments:

x: an unordered factor.

ref: the reference level, typically a string.

...: additional arguments for future methods.
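Because ref is "typically a string", it is worth seeing what happens with the 0 to 4 coding from the question. The sketch below uses a small made-up vector b (hypothetical data, not from the question) and assumes the usual behaviour of relevel(): a character ref is matched against the level labels, while a numeric ref is read as a position in levels().

```r
# Hypothetical data coded 0..4, as in the question
b <- c(0, 1, 2, 3, 4, 3, 0, 2)
f <- factor(b)
levels(f)                        # "0" "1" "2" "3" "4"

# A character ref is matched against the level labels
f_chr <- relevel(f, ref = "3")
levels(f_chr)                    # "3" "0" "1" "2" "4"

# A numeric ref is a position in levels(f),
# so 3 picks the third level, which is "2" here
f_num <- relevel(f, ref = 3)
levels(f_num)                    # "2" "0" "1" "3" "4"

# Only the level order changes; the observations keep their labels
identical(as.character(f_chr), as.character(f))
```

With labels that do not start at 1, passing the label as a string is therefore the safer habit.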

For example:


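# Simulated data: numeric predictor x, response y, and a 5-level factor b
x <- rnorm(100)

DF <- data.frame(x = x,
                 y = 2 + 1.5 * x + rnorm(100, sd = 2),
                 b = gl(5, 20))  # levels "1" to "5", 20 observations each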


head(DF)

           x          y b

1  0.2352207  3.5520706 1

2 -0.3307359 -0.8167629 1

3 -0.3116238  2.4107511 1

4 -2.3023457 -1.0438110 1

5 -0.1708760  0.3453233 1

6  0.1402782  0.3571660 1


str(DF)

'data.frame': 100 obs. of  3 variables:

 $ x: num  0.235 -0.331 -0.312 -2.302 -0.171 ...

 $ y: num  3.552 -0.817 2.411 -1.044 0.345 ...

 $ b: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...

m1 <- lm(y ~ x + b, data = DF)

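To change the reference level (passing ref as a string selects the level by its label; a number would be read as a position in levels(DF$b)):

DF$b <- relevel(DF$b, ref = "3")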

m2 <- lm(y ~ x + b, data = DF)


The two models now use different reference levels: m1 has no b1 term (level 1 is the reference) and m2 has no b3 term (level 3 is the reference).

coef(m1)

(Intercept)           x          b2          b3          b4          b5 

 1.86380751  1.34015281  0.36891046  0.03624094  0.75197019 -0.65507558 


coef(m2)

(Intercept)           x          b1          b2          b4          b5 

 1.84948031  1.41392197  0.07761524  0.24765394  0.22572331 -0.10877612 
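Changing the reference level only reparameterizes the model, so on identical data the slope for x and the fitted values should stay the same. The two printouts above differ in the x slope, so they were presumably produced from separate random draws; compare which b terms appear rather than the values. The following is a minimal sketch of that check, assuming a fixed seed (42 is arbitrary) and a copy DF2 of the data, both introduced here purely for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

n  <- 100
x  <- rnorm(n)
DF <- data.frame(x = x,
                 y = 2 + 1.5 * x + rnorm(n, sd = 2),
                 b = gl(5, 20))

# Default fit: level "1" is the reference
m1 <- lm(y ~ x + b, data = DF)

# Same data, with level "3" moved to the front as the reference
DF2   <- DF
DF2$b <- relevel(DF2$b, ref = "3")
m2 <- lm(y ~ x + b, data = DF2)

# The slope for x does not depend on the reference level
all.equal(unname(coef(m1)["x"]), unname(coef(m2)["x"]))

# The fitted values are the same in both parameterizations
all.equal(fitted(m1), fitted(m2))

# b1 in m2 (level 1 vs. 3) is the negative of b3 in m1 (level 3 vs. 1)
all.equal(unname(coef(m2)["b1"]), -unname(coef(m1)["b3"]))

# The new intercept is the old intercept plus the old b3 effect
all.equal(unname(coef(m2)["(Intercept)"]),
          unname(coef(m1)["(Intercept)"] + coef(m1)["b3"]))
```

Each all.equal() call should return TRUE, which confirms that only the interpretation of the b coefficients changes, not the fit itself.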

