Package 'jrrosell' reference manual

Title:	Personal R package for Jordi Rosell
Description:	Useful functions for personal usage.
Authors:	Jordi Rosell [aut, cre]
Maintainer:	Jordi Rosell <[email protected]>
License:	CC0
Version:	0.0.0.9012
Built:	2025-03-16 06:12:07 UTC
Source:	https://github.com/jrosell/jrrosell

Add hash for each row

Description

Sorts the column names, hashes every row, and adds the hash as a new column.

Usage

add_row_hash(df, primary_keys)

Arguments

`df`	a data.frame
`primary_keys`	the column names of the primary key

Examples

df <- data.frame(
  id = c(1, 2, 3),
  name = c("AAAAA", "BBBB", "CCC")
)
add_row_hash(df, id)

Data type utilities

Description

Get the bit representation of a double number

Usage

as.bitstring(x)

Arguments

`x`	A numeric vector.

Details

Get the bit representation of a double number. Using rev() ensures that the bit order is correct, so the binary representation follows the usual convention of MSB first and LSB last. This is needed because numToBits() returns the bits in reverse order; without rev(), the result would have the LSB first and the MSB last.
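
The bit-order point can be illustrated with a minimal base R sketch. The helper name bitstring_sketch is hypothetical and is not the package's implementation; it only assumes base R's numToBits() (available since R 4.1), which yields the 64 bits of a double with the least significant bit first.

```r
# Hypothetical helper, not the package's as.bitstring().
bitstring_sketch <- function(x) {
  vapply(
    as.double(x),
    function(value) {
      bits <- as.integer(numToBits(value)) # 64 values, LSB first
      paste(rev(bits), collapse = "")      # reverse so the sign bit comes first
    },
    character(1)
  )
}

# 1.0 is sign 0, biased exponent 01111111111, and a 52-bit zero mantissa.
bitstring_sketch(1)

# The two strings differ in their lowest bits, which is why 0.1 + 0.2 == 0.3 is FALSE.
bitstring_sketch(0.1 + 0.2) == bitstring_sketch(0.3)
```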

Source

https://youtu.be/J4DnzjIFj8w

Examples

0.1 + 0.2 == 0.3
as.bitstring(0.1 + 0.2)
as.bitstring(0.3)

Multiple aside functions with base R pipe

Description

Multiple aside functions with base R pipe

Usage

aside(x, ...)

Arguments

`x`	An object
`...`	functions to run aside

Examples

n_try <- 1
rnorm(200) |>
  matrix(ncol = 2) |>
  aside(
    print("Matrix prepared"),
    print(n_try)
  ) |>
  colSums()

Calculate split proportion

Description

From a data frame, it returns the minimal split proportion for validation.

Usage

calc_split_prop(df)

Arguments

`df`	A data frame

Details

The calc_split_prop function returns the optimal split proportion for your validation set, according to the number of rows.

Source

https://stats.stackexchange.com/a/305063/7387

Examples

calc_split_prop(data.frame(row = 1:891))

Calculate split size

Description

For binary classification problems, it returns the minimal assessment/validation set size for the desired standard error (std_err).

Usage

calc_split_size(std_err = 0.001)

Arguments

`std_err`	The desired standard error, numeric (default 0.001)

Details

The calc_split_size function returns the minimal validation size for expected probabilities and the desired error.
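
As a rough worked example of the idea, the sketch below assumes that the error in question is the binomial standard error of an estimated proportion, se = sqrt(p * (1 - p) / n), and that the worst case p = 0.5 is used. Both are assumptions, and split_size_sketch is a hypothetical name, so the numbers may not match what calc_split_size returns.

```r
# Hypothetical helper: solve se = sqrt(p * (1 - p) / n) for n.
split_size_sketch <- function(std_err = 0.001, p = 0.5) {
  p * (1 - p) / std_err^2
}

# Round the results up to whole rows before using them as set sizes.
split_size_sketch()               # about 250000 rows for std_err = 0.001
split_size_sketch(std_err = 0.02) # about 625 rows for std_err = 0.02
```

Halving the desired standard error multiplies the required size by four, which is why small values of std_err ask for very large validation sets.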

Source

https://stats.stackexchange.com/a/304996/7387

Examples

calc_split_size()
calc_split_size(std_err = 0.02)

Create a vector of characters from a string

Description

Create a vector of characters from a string

Usage

chars(x, ...)

Arguments

`x`	a character vector of length 1.
`...`	unused

Details

chars expects a single string as input. To create a list of these, consider lapply(strings, chars).
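
A minimal base R sketch of the same behaviour, assuming the split is done with strsplit(); chars_sketch is a hypothetical stand-in and not the package's implementation.

```r
# Hypothetical helper: split one string into its individual characters.
chars_sketch <- function(x) {
  stopifnot(is.character(x), length(x) == 1)
  strsplit(x, "")[[1]]
}

chars_sketch("hola")

# For several strings, apply the helper to each element to get a list.
strings <- c("hola", "mon")
lapply(strings, chars_sketch)
```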

Value

a vector of characters

Examples

chars("hola")

Check if the latest GitHub version is installed

Description

Check if the latest GitHub version from the main branch is installed.

Usage

check_installed_gihub(repo)

Arguments

`repo`	a GitHub repo/package. Ex: check_installed_gihub("tidyverse/dplyr")

Examples

if (FALSE) {
  check_installed_gihub("jrosell/jrrosell")
}

Count the number of duplicated rows

Description

Count the number of duplicated rows

Usage

count_duplicated_rows(df)

Arguments

`df`	a data.frame

Examples

count_duplicated_rows(data.frame(a = c(1, 2, 3), b = c(3, 4, 5)))
count_duplicated_rows(data.frame(a = c(1, 2, 3), b = c(1, 4, 5)))

Count a variable or variables sorted

Description

It returns the ordered counts of the variable in the data.frame.

Usage

count_sorted(df, ...)

Arguments

`df`	a data.frame
`...`	the variables to use and other arguments to count

Examples

data.frame(a = c("x", "y", "x"), b = c("z", "z", "n")) |>
  count_sorted(a)

Detect cores that could be used

Description

Selects a number of cores between min and max, limited by the cores available.

Usage

detect_cores(max = 10, min = 2)

Arguments

`max`	An integer with the max desired cores (default 10)
`min`	An integer with the min desired cores (default 2)

Details

The detect_cores function uses the parallelly package. It returns the desired max cores if they are available, and it fails if fewer than min cores are available.
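
The selection rule described here could look roughly like the sketch below. The name detect_cores_sketch, its argument names, and the exact clamping logic are assumptions; the available count defaults to parallelly::availableCores() but is passed explicitly in the calls so that the outcome does not depend on the machine.

```r
# Hypothetical helper, not the package's detect_cores().
detect_cores_sketch <- function(max_cores = 10, min_cores = 2,
                                available = parallelly::availableCores()) {
  available <- as.integer(available)
  if (available < min_cores) {
    stop("Only ", available, " cores available, but at least ", min_cores, " are required.")
  }
  # Never ask for more than the machine offers.
  min(as.integer(max_cores), available)
}

detect_cores_sketch(max_cores = 5, min_cores = 1, available = 8L)  # capped by max_cores
detect_cores_sketch(max_cores = 10, min_cores = 2, available = 4L) # capped by available
```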

Examples

cores <- detect_cores(max = 5, min = 1)
print(cores)
if (FALSE) {
  library(jrrosell)
  library(future)
  plan(multisession, workers = detect_cores(max = 10, min = 2))
  plan(sequential)
}

Fit a workflow with specific parameters

Description

Fit a workflow with specific parameters

Usage

fit_results(wf, resamples, param_info = NULL, grid = 10, fn = "tune_grid", ...)

Arguments

`wf`	workflow
`resamples`	rset
`param_info`	for tune_* functions
`grid`	for tune_* functions
`fn`	the name of the function to run when tuning
`...`	Optional engine arguments

Examples

library(tidymodels)
library(xgboost)
library(modeldata)
data(cells)
split <- cells |>
  mutate(across(where(is.character), as.factor)) |>
  sample_n(500) |>
  initial_split(strata = case)
train <- training(split)
resamples <- vfold_cv(train, v = 2, strata = case)
wf_spec <- train |>
  recipe(case ~ .) |>
  step_integer(all_nominal_predictors()) |>
  workflow(boost_tree(mode = "classification"))
res_spec <- wf_spec |> fit_results(resamples)
res_spec |> collect_metrics()

Fuzzy Token Set Ratio

Description

This function computes a fuzzy similarity score between two strings based on the token set ratio methodology. It considers the intersection and differences between tokenized word sets from the input strings, and calculates a similarity score normalized by string lengths.

Usage

fuzzy_token_set_ratio(s1, s2, score_cutoff = 0)

Arguments

`s1`	A character string. The first string to compare.
`s2`	A character string. The second string to compare.
`score_cutoff`	A numeric value (default is `0`) specifying the minimum similarity score threshold. Scores below this threshold may trigger early exits in the computation.

Details

This function performs the following steps:

Tokenizes the input strings.
Identifies intersecting and differing tokens between the two tokenized sets.
Computes the longest common subsequence (LCS) distance for differing tokens and normalizes it.
Calculates similarity ratios for intersecting tokens combined with differing token sets.
Returns the maximum of the normalized LCS distance and the two intersecting token ratios.

The function short-circuits to return 100 if one token set is a subset of the other. If either input string is empty, the function returns 0.
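
The first two steps and the subset short-circuit can be sketched in base R as follows. The helper names are hypothetical, whitespace tokenization is an assumption, and the LCS-based scoring is deliberately left out, so this is not the package's implementation.

```r
# Hypothetical helper: split on whitespace and keep each token once.
tokenize_sketch <- function(s) {
  unique(strsplit(trimws(s), "\\s+")[[1]])
}

# Hypothetical helper: intersecting and differing tokens of two strings.
token_sets_sketch <- function(s1, s2) {
  t1 <- tokenize_sketch(s1)
  t2 <- tokenize_sketch(s2)
  list(
    common = intersect(t1, t2),
    only_1 = setdiff(t1, t2),
    only_2 = setdiff(t2, t1)
  )
}

# One token set is a subset of the other when one side has nothing left over.
is_subset_sketch <- function(sets) {
  length(sets$common) > 0 &&
    (length(sets$only_1) == 0 || length(sets$only_2) == 0)
}

sets <- token_sets_sketch("fuzzy was a bear", "fuzzy was a dog")
sets$common # tokens shared by both strings
sets$only_1 # tokens found only in the first string
is_subset_sketch(sets)
is_subset_sketch(token_sets_sketch("hello world", "world hello"))
```

In the second call both difference sets are empty, which is the case where the documented function returns 100 without computing any distance.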

Value

A numeric similarity score between 0 and 100, representing the degree of similarity between the two input strings.

Examples

# Example usage:
fuzzy_token_set_ratio("fuzzy was a bear", "fuzzy was a dog", score_cutoff = 80)
fuzzy_token_set_ratio("hello world", "world hello")


Glimpse multiple datasets

Description

Glimpse multiple datasets

Usage

glimpses(...)

Arguments

`...`	Multiple data.frame objects

Examples

df1 <- data.frame(a = c(1, 2))
df2 <- data.frame(b = c(3, 4))
glimpses(df1, df2)

Do the last fit and get the metrics

Description

Do the last fit and get the metrics

Usage

last_fit_metrics(res, split, metric)

Arguments

`res`	Tune results
`split`	The initial split object
`metric`	What metric to use to select the best workflow

Examples

library(tidymodels)
library(xgboost)
library(modeldata)
data(cells)
split <- cells |>
  mutate(across(where(is.character), as.factor)) |>
  sample_n(500) |>
  initial_split(strata = class)
train <- training(split)
folds <- vfold_cv(train, v = 2, strata = class)
wf <- train |>
  recipe(case ~ .) |>
  step_integer(all_nominal_predictors()) |>
  workflow_boost_tree()
res <- wf |>
  tune::tune_grid(
    resamples = folds,
    grid = 2,
    metrics = metric_set(roc_auc),
    control = tune::control_grid(save_workflow = TRUE, verbose = FALSE)
  )
res |> collect_metrics()
res |> last_fit_metrics(split, "roc_auc")
best <- res |> fit_best()
best |>
  augment(testing(split)) |>
  roc_auc(case, .pred_Test) |>
  pull(.estimate)

Name unnamed chunks in .Rmd or .qmd files. Use with caution.

Description

Name unnamed chunks in .Rmd or .qmd files. Use with caution.

Usage

name_unnamed_chunks(file_path)

Arguments

`file_path`	the file name

Center and scale double vectors

Description

Center and scale double vectors

Usage

normalize_vec(...)

Arguments

`...`	a double vector or multiple double vectors

Examples

normalize_vec(1, 2, 3, )

Make a sound and send an email when a process finished

Description

The notify_finished function plays a sound using beepr::beep, composes an email, sends it, and returns the result of the blastula::smtp_send call.

Usage

notify_finished(name, body = "", ..., sound = 1, tictoc_result = NULL)

Arguments

`name`	The process name (Required)
`body`	The contents of the email (Default "")
`...`	Additional arguments to pass to the template function. If you're using the default template, you can use font_family to control the base font, and content_width to control the width of the main content; see blastula_template(). By default, the content_width is set to 1000px. Using widths less than 600px is generally not advised but, if necessary, be sure to test such HTML emails with a wide range of email clients before sending to the intended recipients. The Outlook mail client (Windows, Desktop) does not respect content_width.
`sound`	The sound for beepr::beep call (Default 1)
`tictoc_result`	the result from tictoc::toc (Default NULL)

Details

The following environment variables should be set:

MY_SMTP_USER: the sender (from)
MY_SMTP_RECIPIENT: the recipient (to)
MY_SMTP_PASSWORD: the service password (for Gmail you can use an app password from https://myaccount.google.com/apppasswords)
MY_SMTP_PROVIDER: the blastula provider (gmail if not set)
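
Before calling notify_finished it may help to confirm that these variables are visible to the R session. The check below is only a suggested sketch in base R and is not part of the package; the variable names come from the list above, and the gmail fallback mirrors the documented default.

```r
# Variables that have no documented default.
required <- c("MY_SMTP_USER", "MY_SMTP_RECIPIENT", "MY_SMTP_PASSWORD")

# Sys.getenv() returns "" for unset variables, so keep the empty ones.
missing_vars <- required[!nzchar(Sys.getenv(required))]
if (length(missing_vars) > 0) {
  message("Set these before calling notify_finished(): ",
          paste(missing_vars, collapse = ", "))
}

# The provider is optional and falls back to gmail when it is not set.
provider <- Sys.getenv("MY_SMTP_PROVIDER", unset = "gmail")
provider
```

Keeping the values in a user-level .Renviron file rather than in scripts avoids committing the password by accident.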

Examples

if (exists("not_run")) {
  tictoc::tic()
  Sys.sleep(1)
  jrrosell::notify_finished("job", "Well done", sound = "fanfare", tictoc_result = tictoc::toc())
}

Github name of the package

Description

Get the name of the package from the DESCRIPTION file of the master branch in the github repo

Usage

package_github_name(x, file_lines = NULL)

Arguments

`x`	a single repo/package to check Ex: package_github_name("tidyverse/dplyr")
`file_lines`	(default = NULL, internal)

Examples

if (FALSE) {
  package_github_name("jrosell/jrrosell")
}

Github version of the package

Description

Get the version from the DESCRIPTION file of the master branch in the github repo

Usage

package_github_version(x, file_lines = NULL)

Arguments

`x`	a single repo/package to check Ex: package_github_version("tidyverse/dplyr")
`file_lines`	(default = NULL, internal)

Examples

if (FALSE) {
  package_github_version("jrosell/jrrosell")
}

Plot bars for non double columns

Description

Plot bars for non double columns

Usage

plot_bars(df, ..., top_values = 50)

Arguments

`df`	a data.frame
`...`	optional parameters to geom_histogram
`top_values`	the number of most common values to plot, taken from the top (default 50)

Examples

plot_bars(data.frame(a = c("x", "y"), b = c("z", "z")))

Plot histograms for double columns

Description

Plot histograms for double columns

Usage

plot_histograms(df, ...)

Arguments

`df`	a data.frame
`...`	optional parameters to geom_histogram

Examples

plot_histograms(data.frame(a = c(1, 2), b = c(1, 3)))

Plot missing values

Description

Plot missing values

Usage

plot_missing(df)

Arguments

`df`	a data.frame

Examples

plot_missing(data.frame(a = c(1, NA), b = c(NA, 4)))

Plot a variable

Description

It returns a bar or a histogram of the variable

Usage

plot_variable(df, variable, ..., type = "numeric")

Arguments

`df`	a data.frame
`variable`	the variable to use.
`...`	params passed to geom_*
`type`	numeric (default) or nominal.

Examples

data.frame(a = c("x", "y", "y"), b = c("z", "z", "x")) |> plot_variable(a)

Prep, juice and glimpse a recipe or workflow

Description

Prep, juice and glimpse a recipe or workflow

Usage

prep_juice(object)

Arguments

`object`	A recipe or a workflow object with a recipe

Source

https://recipes.tidymodels.org/reference/update.step.html

Examples

recipes::recipe(spray ~ ., data = InsectSprays) |>
  prep_juice()
recipes::recipe(spray ~ ., data = InsectSprays) |>
  workflows::workflow(parsnip::linear_reg()) |>
  prep_juice()

Prep, juice and get cols from a recipe or workflow

Description

Prep, juice and get cols from a recipe or workflow

Usage

prep_juice_ncol(object)

Arguments

`object`	A recipe or a workflow object with a recipe

Examples

recipes::recipe(spray ~ ., data = InsectSprays) |>
  prep_juice_ncol()

Prepare Text for Analysis

Description

This function processes a given text string by converting it to lowercase, removing numbers, non-alphanumeric characters, extra whitespace, and stopwords based on a specified language. It also transliterates text to ASCII, splits words, and reconstructs a clean text string suitable for analysis.

Usage

prepare_text(text, stopwords = NULL)

Arguments

`text`	A character vector or object that can be coerced to a character string. Represents the input text to be cleaned.
`stopwords`	A character vector specifying the stopwords to remove. Defaults to the `"spanish"` stopwords from tm::stopwords.

Value

A cleaned character string, with stopwords removed and text formatted for analysis.
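
The cleaning steps listed in the Description can be approximated in base R as in the sketch below. The function name prepare_text_sketch and the three-word stopword list are hypothetical, and the transliteration step is left out because its result depends on the platform, so non-ASCII letters are simply dropped here. It is an illustration of the pipeline, not the package's implementation.

```r
# Hypothetical helper, not the package's prepare_text().
prepare_text_sketch <- function(text, stopwords = character()) {
  x <- tolower(as.character(text))
  # Replace digits, punctuation and any other non-letter with a space.
  x <- gsub("[^a-z ]+", " ", x)
  # Split on runs of whitespace, which also removes the extra spaces.
  words <- strsplit(trimws(x), "\\s+")[[1]]
  # Drop stopwords and rebuild a single clean string.
  paste(words[!words %in% stopwords], collapse = " ")
}

prepare_text_sketch(
  "Hola! Esto es una prueba 123.",
  stopwords = c("esto", "es", "una") # tiny illustrative list
)
```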

Examples

# Example usage:
prepare_text("¡Hola! Esto es una prueba 123.")

Read character columns with clean names

Description

It's useful for reading the most common types of flat file data, comma separated values and tab separated values.

Usage

read_chr(file, delim = ",", locale = NULL, ...)

Arguments

`file`	Either a path to a file, a connection, or literal data (either a single string or a raw vector).
`delim`	Single character used to separate fields within a record.
`locale`	The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.
`...`	Other parameters to readr::read_delim.

Details

The read_chr function works like readr::read_delim, except that the columns returned are all character columns with clean names. It requires the readr and janitor packages to be installed.
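
To show what "character columns with clean names" means without needing readr or janitor, the sketch below reproduces the idea with base R's read.csv() and a simple regular expression. This is a swapped-in approximation with made-up inline data, not how read_chr is implemented, and janitor's name cleaning handles many more cases.

```r
# Made-up inline data with untidy column names.
csv_text <- "Car Name,Miles Per Gallon\nMazda RX4,21.0\nDatsun 710,22.8"

# Read every column as character so no value is reinterpreted.
df <- read.csv(text = csv_text, colClasses = "character", check.names = FALSE)

# Rough name cleaning: lower case, non-alphanumerics collapsed to "_".
names(df) <- tolower(gsub("[^A-Za-z0-9]+", "_", names(df)))

str(df)
```

Reading everything as character keeps values such as "21.0" exactly as written, leaving type conversion as an explicit later step.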

Examples

es <- readr::locale("es", tz = "Europe/Madrid", decimal_mark = ",", grouping_mark = ".")
read_chr(readr::readr_example("mtcars.csv"), delim = ",", locale = es)

Read the html text of an url

Description

It's useful for getting the text of web pages as a single character vector.

Usage

read_url(url, sleep = 0)

Arguments

`url`	Full url including http or https protocol and the page path.
`sleep`	Seconds to sleep after the request is done and before returning the result.

Details

The read_url function uses rvest::read_html and purrr::possibly, so it is fault tolerant.
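
Fault tolerance of this kind means that a failing request yields a fallback value instead of an error. The sketch below imitates that pattern with base R's tryCatch() as a stand-in for purrr::possibly; the helper names, the NA fallback, and the deliberately failing reader are all assumptions for illustration, and no network request is made.

```r
# Hypothetical stand-in for purrr::possibly(): wrap f so errors become a fallback.
possibly_sketch <- function(f, otherwise = NA_character_) {
  function(...) {
    tryCatch(f(...), error = function(e) otherwise)
  }
}

# A reader that always fails, standing in for an unreachable page.
failing_reader <- function(url) {
  stop("could not reach ", url)
}

safe_reader <- possibly_sketch(failing_reader)
safe_reader("https://example.invalid/") # returns the fallback instead of stopping
```

This is convenient when looping over many URLs, since one broken page does not abort the whole run.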

Examples

if (FALSE) read_url("https://www.google.cat/", sleep = 1)

Read a sheet from an xlsx file into a tibble

Description

It's useful for reading a single sheet from an Excel/OpenOffice file.

Usage

read_xlsx(xlsxFile, ..., sheet = 1, startRow = 1)

Arguments

`xlsxFile`	The name of the file.
`...`	Other parameters to the openxlsx::read.xlsx function
`sheet`	The name or index of the sheet (default 1)
`startRow`	The number of the starting reading row (default 1)

Details

The read_xlsx function is a wrapper for openxlsx::read.xlsx.

Examples

l <- list("IRIS" = iris, "MTCATS" = mtcars, matrix(runif(1000), ncol = 5))
tmp_file <- tempfile(fileext = ".xlsx")
write_xlsx(l, tmp_file, colWidths = c(NA, "auto", "auto"))
read_xlsx(tmp_file)
file.remove(tmp_file)

Sanitize title with dashes

Description

It generates URL slugs the way WordPress does.

Usage

sanitize_title_with_dashes(title)

Arguments

`title`	the title

Examples

sanitize_title_with_dashes("Hello world")

Tune a recipe using glmnet and lightgbm and stacks

Description

Tune a recipe using glmnet and lightgbm and stacks

Usage

score_recipe(rec, resamples, grids = list(10, 10), metric = "accuracy")

Arguments

`rec`	recipe
`resamples`	rset
`grids`	for glmnet and lightgbm tuning
`metric`	to be compared

spain_ccaas

Description

spain_ccaas

Usage

spain_ccaas

Format

`spain_ccaas`

An sf object with 19 rows and 4 columns:

OBJECTID
codigo
nombre
geometry

Source

https://github.com/koldLight/curso-r-dataviz/blob/master/dat/spain_ccaas.geojson

Examples

library(sf)
data(spain_ccaas)
head(spain_ccaas)

spain_provinces

Description

spain_provinces

Usage

spain_provinces

Format

`spain_provinces`

An sf object with 60 rows and 4 columns:

OBJECTID
codigo
nombre
geometry

Source

https://github.com/koldLight/curso-r-dataviz/blob/master/dat/spain_provinces.geojson

Examples

library(sf)
data(spain_provinces)
head(spain_provinces)

Sum the missing values from a data.frame

Description

Sum the missing values from a data.frame

Usage

sum_missing(...)

Arguments

`...`	one or multiple data.frame

Examples

sum_missing(data.frame(a = c(1, 2), b = c(3, 4)))
sum_missing(data.frame(a = c(1, NA), b = c(3, 4)))
sum_missing(data.frame(a = c(1, NA), b = c(NA, 4)))
sum_missing(data.frame(a = c(NA, NA), b = c(NA, NA)))

Select constant columns from a data.frame

Description

Select constant columns from a data.frame

Usage

summarize_n_distinct(df)

Arguments

`df`	a data.frame

Examples

summarize_n_distinct(data.frame(a = c(1, 2), b = c(2, 3)))
summarize_n_distinct(data.frame(a = c(1, 1), b = c(2, 3)))

Tee pipe that return the original value instead of the result

Description

Pipe a value forward into a function or call expression and return the original value instead of the result. This is useful when an expression is used for its side effect, such as plotting or printing.

Usage

tee(x, expr)

Arguments

`x`	An object
`expr`	An expression

Details

The tee pipe works like |>, except that the return value is x itself, and not the result of the expr call.
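
The behaviour can be pictured with a minimal sketch, assuming expr is given as a function. The name tee_sketch is hypothetical and this is not the package's code, which may also accept call expressions.

```r
# Hypothetical helper: run f on x for its side effect, then hand x back.
tee_sketch <- function(x, f) {
  f(x) # the result of f is discarded on purpose
  x
}

total <- c(1, 2, 3) |>
  tee_sketch(\(v) message("Values so far: ", paste(v, collapse = ", "))) |>
  sum()

total
```

Because the original vector is returned, the final sum() still receives the three numbers and not the value returned by message().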

Thanks

I want to give credit to Michael Milton and Matthew Kay for the idea and the code.

Source

https://mastodon.social/@[email protected]/109555362766969210

Examples

rnorm(200) |>
  matrix(ncol = 2) |>
  as.data.frame() |>
  tee(\(x) {
    ggplot(x, aes(V1, V2)) +
      geom_point()
  }) |>
  colSums()

Sets a minimal theme using the Roboto font family

Description

It requires roboto fonts installed in your O.S. and run z

Usage

theme_roboto(
  base_size = 11,
  strip_text_size = 12,
  strip_text_margin = 5,
  subtitle_size = 13,
  subtitle_margin = 10,
  plot_title_size = 16,
  plot_title_margin = 10,
  ...
)

Arguments

`base_size`	= 11
`strip_text_size`	= 12
`strip_text_margin`	= 5
`subtitle_size`	= 13
`subtitle_margin`	= 10
`plot_title_size`	= 16
`plot_title_margin`	= 10
`...`	Other parameters passed to theme_set

Examples

library(jrrosell)
library(ggplot2)
theme_set(theme_roboto())
ggplot(iris, aes(Species)) +
  geom_bar()

Sets a dark blue colored dark minimal theme using the Roboto font family

Description

Sets a dark blue colored dark minimal theme using the Roboto font family

Usage

theme_set_roboto_darkblue(...)

Arguments

`...`	Other parameters passed to theme_set

Examples

library(jrrosell)
library(ggplot2)
theme_set_roboto_darkblue()
ggplot(iris, aes(Species)) +
  geom_bar()

Update recipe step values by id

Description

Update the values of a specific recipe step located by id

Usage

update_step(object, target_id, ...)

Arguments

`object`	A recipe or a workflow object with a recipe
`target_id`	The id name of the step
`...`	The arguments to update the step.

Examples

recipes::recipe(spray ~ ., data = InsectSprays) |>
  recipes::step_ns(count, deg_free = hardhat::tune(), id = "ns") |>
  update_step("ns", deg_free = 1)

Create an xgboost tunable workflow for regression and classification

Description

Create an xgboost tunable workflow for regression and classification

Usage

workflow_boost_tree(rec, engine = "xgboost", counts = TRUE, ...)

Arguments

`rec`	preprocessing recipe to build the workflow
`engine`	xgboost, lightgbm (xgboost by default)
`counts`	Optional logical argument: whether mtry uses counts or not
`...`	optional engine arguments

Examples

library(tidymodels)
library(xgboost)
library(modeldata)
library(future)
data(cells)
split <- cells |>
  mutate(across(where(is.character), as.factor)) |>
  sample_n(500) |>
  initial_split(strata = class)
train <- training(split)
folds <- vfold_cv(train, v = 2, strata = class)
wf <- train |>
  recipe(case ~ .) |>
  step_integer(all_nominal_predictors()) |>
  workflow_boost_tree()
doFuture::registerDoFuture()
plan(sequential)
res <- wf |>
  tune::tune_grid(
    folds,
    grid = 2,
    metrics = metric_set(roc_auc),
    control = tune::control_grid(save_workflow = TRUE, verbose = FALSE)
  )
res |> collect_metrics()
res |> last_fit_metrics(split, "roc_auc")
best <- res |> fit_best()
best |>
  augment(testing(split)) |>
  roc_auc(case, .pred_Test) |>
  pull(.estimate)

Create a tunable glmnet workflow for regression and classification

Description

Create a tunable glmnet workflow for regression and classification

Usage

workflow_elasticnet(rec, engine = "glmnet", ...)

Arguments

`rec`	preprocessing recipe to build the workflow
`engine`	glmnet, spark, brulee (glmnet by default)
`...`	Optional engine arguments

Examples

library(tidymodels)
library(glmnet)
library(modeldata)
library(future)
data(cells)
split <- cells |>
  mutate(across(where(is.character), as.factor)) |>
  sample_n(500) |>
  initial_split(strata = class)
train <- training(split)
folds <- vfold_cv(train, v = 2, strata = class)
wf <- train |>
  recipe(case ~ .) |>
  step_integer(all_nominal_predictors()) |>
  workflow_elasticnet()
doFuture::registerDoFuture()
plan(sequential)
res <- wf |>
  tune::tune_grid(
    folds,
    grid = 2,
    metrics = metric_set(roc_auc),
    control = tune::control_grid(save_workflow = TRUE, verbose = FALSE)
  )
res |> collect_metrics()
res |> last_fit_metrics(split, "roc_auc")
best <- res |> fit_best()
best |>
  augment(testing(split)) |>
  roc_auc(case, .pred_Test) |>
  pull(.estimate)

Write a list of tibbles to a xlsx file

Description

It's useful for saving multiple data frames to multiple sheets of a single Excel/OpenOffice/LibreOffice file.

Usage

write_xlsx(data, distfile, ...)

Arguments

`data`	A named list of tibbles
`distfile`	The name of the destination file.
`...`	Other parameters to the openxlsx::write.xlsx function

Details

The write_xlsx function is a wrapper for openxlsx::write.xlsx.

Examples

l <- list("IRIS" = iris, "MTCATS" = mtcars, matrix(runif(1000), ncol = 5))
tmp_file <- tempfile(fileext = ".xlsx")
write_xlsx(l, tmp_file, colWidths = c(NA, "auto", "auto"))
file.remove(tmp_file)
