Make your work reproducible

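Reproducible work means that anyone who reruns your code gets the same results. In R, set.seed() makes this possible for code that relies on random numbers: it fixes the starting point of the random number generator, so the same sequence of values is produced on every run. The easiest way to understand it is to generate some random numbers with and without a seed. Here are some tips for making your work reproducible: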

Using set.seed()

Example of reproducibility in fitting ML models using set.seed()

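# Packages assumed by this example (the original snippet does not show them)
library(parsnip)   # rand_forest(), set_engine(), fit()
library(dplyr)     # %>% and select()

#First Line
# Fix the seed so the random steps in the forest give the same fit on every run
set.seed(1234)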

#Second Line
model_05_rand_forest_ranger <- rand_forest(
    mode = "regression", mtry = 4, trees = 1000, min_n = 10
    ) %>%
    set_engine("ranger", splitrule = "extratrees", importance = "impurity") %>%
    fit(price ~ ., data = train_tbl %>% select(-id, -model, -model_tier))

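#Third Line
# calc_metrics() is not defined in this section; it is assumed to be a helper
# defined elsewhere that scores the fitted model on test_tbl
model_05_rand_forest_ranger %>% calc_metrics(test_tbl)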
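
The model above depends on train_tbl, test_tbl and calc_metrics(), which are not shown here. To check the same idea without them, here is a minimal sketch using only base R and the built-in mtcars data. The function name fit_on_bootstrap() is made up for this illustration, and a linear model on a bootstrap sample stands in for the random forest.

```r
# Sketch: setting the same seed before a random step gives the same fit.
# fit_on_bootstrap() is a hypothetical helper, not part of any package.
fit_on_bootstrap <- function(seed) {
  set.seed(seed)
  # the random step: draw a bootstrap sample of row indices
  idx <- sample(nrow(mtcars), replace = TRUE)
  lm(mpg ~ wt + hp, data = mtcars[idx, ])
}

fit_a <- fit_on_bootstrap(1234)
fit_b <- fit_on_bootstrap(1234)   # same seed as fit_a
fit_c <- fit_on_bootstrap(4321)   # different seed

# same seed: the coefficients match exactly
identical(coef(fit_a), coef(fit_b))

# different seed: compare the coefficients side by side
rbind(seed_1234 = coef(fit_a), seed_4321 = coef(fit_c))
```

The seed has to be set immediately before the random step. Any other call that draws random numbers in between will change what the model sees.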

Random Numbers

Here are several ways to generate random numbers. These examples are informed by the R Cookbook.

# get one random number using runif() from base-R, stats package
# default 0 to 1
runif(1)

# get two random numbers
runif(2)

# get a vector of three random numbers
# increase range beyond the default, -10 to 110
runif(3, min = -10, max = 110)

# ensure three random numbers do *not* have decimals
# use floor() function to round down
floor(runif(3, min = -10, max = 110))

# sample() gives a similar result using just one function
# (unlike floor(runif()), it can also return the upper bound, 110)
# replace parameter: should sampling be with or without replacement?
sample(-10:110, 3, replace = TRUE)

# Reproducibility
# use set.seed() before any of the aforementioned random number generators

set.seed(123)
sample(-10:110, 3, replace = FALSE)
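
One detail that is easy to miss: set.seed() fixes the starting point of the random number stream, not every later call. The short sketch below (variable names are just for illustration) shows that resetting the seed reproduces a draw, while drawing again without resetting continues the stream.

```r
set.seed(123)
first_draw <- sample(-10:110, 3, replace = FALSE)

# reset to the same seed: the draw is reproduced
set.seed(123)
second_draw <- sample(-10:110, 3, replace = FALSE)

# no reset here: this draw continues from where the stream left off
third_draw <- sample(-10:110, 3, replace = FALSE)

identical(first_draw, second_draw)   # TRUE
first_draw
third_draw
```

So if a script calls several random functions, set the seed once at the top and keep the order of the calls fixed, or set it again right before each step you want to reproduce on its own.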

Random Numbers from a Normal Distribution

# Get five random numbers from a normal distribution
# Here the default is mean = 0, standard deviation = 1.
rnorm(5)

# Change mean and standard deviation away from default
rnorm(5, mean = 66, sd = 12)

# Ensure reproducibility with set.seed()
set.seed(123)
rnorm(5, mean = 66, sd = 12)

# Draw a larger sample so the histogram shows the bell shape of the normal distribution
# Ensure reproducibility
# Plot a histogram

set.seed(123)
x <- rnorm(500, mean = 66, sd = 12)
hist(x)
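
To see why the sample size matters, here is a small sketch comparing a sample of 5 with a sample of 500 drawn from the same distribution. The variable names are only for illustration. With more draws, the sample mean and standard deviation should land closer to the values passed to rnorm().

```r
set.seed(123)
small_sample <- rnorm(5, mean = 66, sd = 12)
large_sample <- rnorm(500, mean = 66, sd = 12)

# compare each sample's mean and standard deviation with the targets (66 and 12)
rbind(
  small = c(n = length(small_sample), mean = mean(small_sample), sd = sd(small_sample)),
  large = c(n = length(large_sample), mean = mean(large_sample), sd = sd(large_sample))
)
```

Because the seed is set first, the table comes out the same on every run.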