| Title: | Personal R package for Jordi Rosell |
|---|---|
| Description: | Useful functions for personal usage. |
| Authors: | Jordi Rosell [aut, cre] (ORCID: <https://orcid.org/0000-0002-4349-1458>) |
| Maintainer: | Jordi Rosell <[email protected]> |
| License: | CC0 |
| Version: | 0.0.0.9014 |
| Built: | 2026-05-28 08:32:37 UTC |
| Source: | https://github.com/jrosell/jrrosell |
It sorts the column names, hashes every row, and adds the hash as a new column.
add_row_hash(df, primary_keys)
df |
a data.frame |
primary_keys |
the column names of the primary key |
df <- data.frame(
  id = c(1, 2, 3),
  name = c("AAAAA", "BBBB", "CCC")
)
add_row_hash(df, id)
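A minimal sketch of the row-hashing idea, assuming the rlang package is installed and using rlang::hash() as the hash function. The helper add_row_hash_sketch() and the output column row_hash are hypothetical, and because the role of primary_keys is not described above, the sketch simply hashes every column. It is not the package's implementation.

```r
# Hypothetical sketch (not jrrosell's code); assumes rlang is installed.
add_row_hash_sketch <- function(df) {
  stopifnot(is.data.frame(df))
  # Sort the column names so the hash does not depend on column order
  sorted <- df[sort(names(df))]
  # One hash per row, computed from the row's values converted to character
  df$row_hash <- vapply(seq_len(nrow(sorted)), function(i) {
    values <- vapply(sorted[i, , drop = FALSE], as.character, character(1))
    rlang::hash(values)
  }, character(1))
  df
}

df <- data.frame(id = c(1, 2, 3), name = c("AAAAA", "BBBB", "CCC"))
add_row_hash_sketch(df)
```

Sorting the column names first is what makes two data frames with the same data but a different column order produce the same row hashes.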
It replaces the selected code with the generated documentation, using the configured Ollama model.
addin_generate_documentation(model = "qwen2.5-coder:3b", context = rstudioapi::getActiveDocumentContext())
model |
A single string with the ollama model to use. |
context |
the IDE context. Defaults to rstudioapi::getActiveDocumentContext() |
Nothing.
Get the bit representation of a double number
as.bitstring(x)
x |
A numeric vector. |
Get the bit representation of a double number. Using rev() puts the bits in the usual order, with the most significant bit (MSB) first and the least significant bit (LSB) last. This is needed because numToBits() returns the bits in reverse order: without rev(), the LSB would come first and the MSB last.
https://youtu.be/J4DnzjIFj8w
0.1 + 0.2 == 0.3
as.bitstring(0.1 + 0.2)
as.bitstring(0.3)
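The rev(numToBits()) idea described above can be sketched in base R (numToBits() requires R >= 4.1). The name as_bitstring_sketch() is hypothetical; the package's as.bitstring() may differ in details such as vectorization.

```r
# Hypothetical re-implementation for illustration only.
as_bitstring_sketch <- function(x) {
  vapply(x, function(value) {
    bits <- numToBits(value)                     # 64 raw 00/01 values, LSB first
    paste(rev(as.integer(bits)), collapse = "")  # reversed: MSB first
  }, character(1))
}

# Layout: 1 sign bit, 11 exponent bits, 52 mantissa bits
as_bitstring_sketch(1)
as_bitstring_sketch(0.1 + 0.2) == as_bitstring_sketch(0.3)
```

For 1, the first 12 characters are 001111111111 (sign 0, biased exponent 1023) and the remaining 52 are zeros; the last comparison is FALSE because 0.1 + 0.2 and 0.3 differ in their final mantissa bits.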
Run multiple functions aside (for their side effects) within a base R pipe
aside(x, ...)
x |
An object |
... |
functions to run aside |
n_try <- 1
rnorm(200) |>
  matrix(ncol = 2) |>
  aside(
    print("Matrix prepared"),
    print(n_try)
  ) |>
  colSums()
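A minimal sketch of how such a helper can work: forcing the dots evaluates each expression once for its side effects, and the piped value is returned unchanged. aside_sketch() is hypothetical, and unlike the package's aside() it makes no attempt to let the expressions refer to x.

```r
# Hypothetical sketch, not the package's code.
aside_sketch <- function(x, ...) {
  # list(...) forces every promise, evaluating each expression left to right
  list(...)
  # Return the piped value so the pipe can continue
  x
}

n_try <- 1
rnorm(200) |>
  matrix(ncol = 2) |>
  aside_sketch(print("Matrix prepared"), print(n_try)) |>
  colSums()
```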
From a data frame, it returns the minimal split proportion for validation.
calc_split_prop(df, k = 10)
df |
A data frame |
k |
number of desired folds (default 10) |
This function returns the optimal validation split proportion given the number of rows in the data frame.
https://stats.stackexchange.com/a/305063/7387
calc_split_prop(data.frame(row = 1:891))
For binary classification problems, it returns the minimal assessment/validation set size for the desired std_err.
calc_split_size(std_err = NULL, confidence_interval = 0.95, margin_error = 0.02)
std_err |
The desired std_err numeric (default NULL) |
confidence_interval |
the confidence level (default 0.95) |
margin_error |
the margin of error (default 0.02) |
This function returns the minimal validation size for the expected probabilities and the desired error.
https://stats.stackexchange.com/a/304996/7387
calc_split_size()
calc_split_size(confidence_interval = 0.95, margin_error = 0.02)
calc_split_size(std_err = 0.02)
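The linked answer is not reproduced here. As an assumption, the sketch below uses the textbook sample-size formula for a binomial proportion with the worst case p = 0.5, which may differ from what calc_split_size() actually computes. split_size_sketch() and its p argument are hypothetical.

```r
# Hypothetical sketch using the standard worst-case (p = 0.5) proportion formula.
split_size_sketch <- function(std_err = NULL, confidence_interval = 0.95,
                              margin_error = 0.02, p = 0.5) {
  if (!is.null(std_err)) {
    # se = sqrt(p * (1 - p) / n)  =>  n = p * (1 - p) / se^2
    n <- p * (1 - p) / std_err^2
  } else {
    # two-sided critical value, e.g. about 1.96 for 95%
    z <- qnorm(1 - (1 - confidence_interval) / 2)
    # margin = z * se  =>  n = z^2 * p * (1 - p) / margin^2
    n <- z^2 * p * (1 - p) / margin_error^2
  }
  # Round first to absorb floating-point noise, then round up to whole rows
  ceiling(round(n, 6))
}

split_size_sketch()                # 2401
split_size_sketch(std_err = 0.02)  # 625
```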
Create a vector of characters from a string
chars(x, ...)
x |
a character vector of length 1. |
... |
unused |
chars expects a single string as input. To create a list of these, consider lapply(strings, chars).
a vector of characters
https://github.com/jonocarroll/charcuterie
chars("hola")
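In base R the same result can be obtained by splitting on the empty string. chars_sketch() is a hypothetical stand-in, not necessarily how chars() or the charcuterie package implement it.

```r
# Hypothetical sketch: split a single string into its characters.
chars_sketch <- function(x, ...) {
  stopifnot(is.character(x), length(x) == 1)
  strsplit(x, "")[[1]]
}

chars_sketch("hola")
# For several strings, as suggested above:
lapply(c("hola", "adeu"), chars_sketch)
```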
Check whether the latest version from the main GitHub branch is installed.
check_installed_github(repo)
repo |
a GitHub repo/package, e.g. check_installed_github("tidyverse/dplyr") |
if (FALSE) {
  check_installed_github("jrosell/jrrosell")
}
Count the number of duplicated rows
count_duplicated_rows(df)
df |
a data.frame |
count_duplicated_rows(data.frame(a = c(1, 2, 3), b = c(3, 4, 5)))
count_duplicated_rows(data.frame(a = c(1, 2, 3), b = c(1, 4, 5)))
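A base R sketch, assuming "duplicated rows" means rows that repeat an earlier row exactly. count_duplicated_rows_sketch() is hypothetical and may not match the package's definition.

```r
# Hypothetical sketch: count rows identical to an earlier row.
count_duplicated_rows_sketch <- function(df) {
  # duplicated() on a data frame flags each row that repeats a previous one
  sum(duplicated(df))
}

count_duplicated_rows_sketch(data.frame(a = c(1, 1, 2), b = c(3, 3, 4)))
```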
It returns the ordered counts of the variable in the data.frame.
count_sorted(df, ...)
df |
a data.frame |
... |
the variables to use and other arguments to count |
data.frame(a = c("x", "y", "x"), b = c("z", "z", "n")) |>
  count_sorted(a)
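A base R sketch of sorted counts for a single column. count_sorted_sketch() is hypothetical and takes the column name as a string, whereas count_sorted() takes unquoted variables and forwards other arguments to count.

```r
# Hypothetical sketch: counts of one column, most frequent first.
count_sorted_sketch <- function(df, col) {
  counts <- sort(table(df[[col]]), decreasing = TRUE)
  data.frame(value = names(counts), n = as.integer(counts))
}

data.frame(a = c("x", "y", "x"), b = c("z", "z", "n")) |>
  count_sorted_sketch("a")
```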
Select a number of cores between min and max, based on the available cores.
detect_cores(max = 10, min = 2)
max |
An integer with the max desired cores (default 10) |
min |
An integer with the min desired cores (default 2) |
The detect_cores function uses the parallelly package. It returns the desired max cores if they are available, or fails if fewer than min cores are available (excluding the cores reserved by parallelly.availableCores.omit, or 1 if that option is not defined).
cores <- detect_cores(max = 5, min = 1)
print(cores)
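A sketch of the clamping logic, swapping in base R's parallel::detectCores() for parallelly and using a hypothetical reserved argument in place of the parallelly.availableCores.omit option. detect_cores_sketch() is not the package's function.

```r
# Hypothetical sketch: clamp the usable cores to [min, max].
detect_cores_sketch <- function(max = 10, min = 2, reserved = 1) {
  available <- parallel::detectCores()
  if (is.na(available)) available <- 1L  # detectCores() can return NA
  usable <- available - reserved
  if (usable < min) {
    stop("Only ", usable, " cores available; at least ", min, " required.")
  }
  as.integer(base::min(max, usable))
}

detect_cores_sketch(max = 5, min = 1)
```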
Print an expression and return invisible NULL at the end of a pipe.
end_pipe(x, expr)
x |
An object |
expr |
An expression |
Fit a workflow with specific parameters
fit_results(wf, resamples, param_info = NULL, grid = 10, fn = "tune_grid", ...)
wf |
workflow |
resamples |
rset |
param_info |
for tune_* functions |
grid |
for tune_* functions |
fn |
the name of the function to run when tuning |
... |
Optional engine arguments |
This function computes a fuzzy similarity score between two strings based on the token set ratio methodology. It considers the intersection and differences between tokenized word sets from the input strings, and calculates a similarity score normalized by string lengths.
fuzzy_token_set_ratio(s1, s2, score_cutoff = 0)
s1 |
A character string. The first string to compare. |
s2 |
A character string. The second string to compare. |
score_cutoff |
A numeric value (default is |
This function performs the following steps:
Tokenizes the input strings.
Identifies intersecting and differing tokens between the two tokenized sets.
Computes the longest common subsequence (LCS) distance for differing tokens and normalizes it.
Calculates similarity ratios for intersecting tokens combined with differing token sets.
Returns the maximum of the normalized LCS distance and the two intersecting token ratios.
The function short-circuits to return 100 if one token set is a subset of the other. If either input string is empty, the function returns 0.
A numeric similarity score between 0 and 100, representing the degree of similarity between the two input strings.
# Example usage:
fuzzy_token_set_ratio("fuzzy was a bear", "fuzzy was a dog", score_cutoff = 80)
fuzzy_token_set_ratio("hello world", "world hello")
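The steps above can be sketched in base R. As an assumption, the LCS-based distance is computed with adist() using substitution cost 2 (a deletion plus an insertion), which yields the insertion/deletion distance; token_set_ratio_sketch() and indel_ratio() are hypothetical helpers, and the package's exact normalization may differ.

```r
# Similarity 0-100 from the insertion/deletion (LCS-based) distance.
indel_ratio <- function(a, b) {
  total <- nchar(a) + nchar(b)
  if (total == 0) return(100)
  d <- adist(a, b, costs = list(insertions = 1, deletions = 1, substitutions = 2))[1, 1]
  100 * (1 - d / total)
}

# Hypothetical sketch of a token set ratio.
token_set_ratio_sketch <- function(s1, s2, score_cutoff = 0) {
  tokenize <- function(s) {
    tokens <- strsplit(trimws(s), "\\s+")[[1]]
    unique(tokens[nzchar(tokens)])
  }
  t1 <- tokenize(s1)
  t2 <- tokenize(s2)
  if (length(t1) == 0 || length(t2) == 0) return(0)  # empty input

  inter <- sort(intersect(t1, t2))
  diff12 <- sort(setdiff(t1, t2))
  diff21 <- sort(setdiff(t2, t1))

  # Short-circuit: one token set is a subset of the other
  if (length(inter) > 0 && (length(diff12) == 0 || length(diff21) == 0)) return(100)

  join <- function(x) paste(x, collapse = " ")
  scores <- indel_ratio(join(diff12), join(diff21))
  if (length(inter) > 0) {
    # Intersection compared with intersection + each set of differing tokens
    scores <- c(scores,
                indel_ratio(join(inter), join(c(inter, diff12))),
                indel_ratio(join(inter), join(c(inter, diff21))))
  }
  score <- max(scores)
  if (score < score_cutoff) 0 else score
}

token_set_ratio_sketch("fuzzy was a bear", "fuzzy was a dog")
token_set_ratio_sketch("hello world", "world hello")
```

With these definitions the first call scores about 84.6 (the intersection "a fuzzy was" against "a fuzzy was dog") and the second returns 100 through the subset short-circuit.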
It returns the generated documentation from the selected model.
generate_documentation(x, model = "qwen2.5-coder:3b")
x |
A single string. |
model |
A single string with the ollama model to use. |
Generated documentation as a character string.
The multilingual sentiment lexicon was obtained on 2024-12-18 from https://aclanthology.org/P14-2063/
get_sentiments_by_language(language = "en", lexicon = "chen_skiena")
language |
a two-letter language code. |
lexicon |
the lexicon to use; "chen_skiena" is the default and only valid value |
The files were generated this way:
chen_skiena_lexicon <- bind_rows(
  here::here("P14-2063.Datasets", "readable_neg_words_list.txt") |>
    read_delim(delim = " ", col_names = c("word", "lang")) |>
    mutate(sentiment = factor("negative", levels = c("negative", "positive"))),
  here::here("P14-2063.Datasets", "readable_pos_words_list.txt") |>
    read_delim(delim = " ", col_names = c("word", "lang")) |>
    mutate(sentiment = factor("positive", levels = c("negative", "positive")))
)
chen_skiena_lexicon |>
  write_fst(
    here::here("inst", "extdata", "chen_skiena_lexicon.fst"),
    compress = 100
  )
top_languages <- rlang::chr(
  ca = 'catalan', zh = 'chinese_simplified', da = 'danish', nl = 'dutch',
  en = 'english', eo = 'esperanto', fi = 'finnish', fr = 'french',
  de = 'german', el = 'greek', hu = 'hungarian', it = 'italian',
  la = 'latin', pt = 'portuguese', es = 'spanish', sv = 'swedish'
)
nrc_lexicon <- read_delim("NRC-Emotion-Lexicon-ForVariousLanguages.txt", delim = "\t") |>
  janitor::clean_names() |>
  pivot_longer(cols = anger:trust, names_to = "sentiment") |>
  rename(english = english_word) |>
  pivot_longer(cols = -c(sentiment, value), names_to = "language", values_to = "word") |>
  filter(sentiment %in% c("positive", "negative")) |>
  filter(language %in% top_languages) |>
  transmute(
    sentiment = factor(sentiment, c("negative", "positive")),
    language = factor(language, unique(language)),
    word = factor(word, unique(word)),
  ) |>
  glimpse()
nrc_lexicon |>
  write_fst(
    here::here("nrc_lexicon.fst"),
    compress = 100
  )
A tibble with word and sentiment columns
https://juliasilge.github.io/tidytext/reference/get_sentiments.html
get_sentiments_by_language("ca")
Glimpse multiple datasets
glimpses(...)
... |
one or more data.frames |
df1 <- data.frame(a = c(1, 2))
df2 <- data.frame(b = c(3, 4))
glimpses(df1, df2)
Do the last fit and get the metrics
last_fit_metrics(res, split, metric)
res |
Tune results |
split |
The initial split object |
metric |
What metric to use to select the best workflow |
Use with caution. It will overwrite your files.
name_unnamed_chunks(file_path)
file_path |
the file name |
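A sketch of the idea on lines already in memory, assuming "unnamed" means an R chunk header with no label (either bare r or r followed directly by a comma and options). name_chunks_sketch() and the chunk-N labels are hypothetical; this is not the package's implementation.

```r
# Hypothetical sketch: label unlabeled R chunk headers as chunk-1, chunk-2, ...
name_chunks_sketch <- function(lines, prefix = "chunk") {
  fence <- strrep("`", 3)
  # Unlabeled header: {r} or {r, option = value, ...}; group 1 keeps the options
  pattern <- paste0("^", fence, "\\{r(,[^}]*)?\\}\\s*$")
  idx <- grep(pattern, lines)
  for (k in seq_along(idx)) {
    label <- paste0(fence, "{r ", prefix, "-", k, "\\1}")
    lines[idx[k]] <- sub(pattern, label, lines[idx[k]])
  }
  lines
}

f <- strrep("`", 3)
name_chunks_sketch(c(paste0(f, "{r}"), "x <- 1", f,
                     paste0(f, "{r, echo = FALSE}"), "y <- 2", f))
```

Applying it to a file with writeLines(name_chunks_sketch(readLines(path)), path) overwrites the file in place, so the same caution as above applies.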
This function processes a given text string by converting it to lowercase and removing numbers, non-alphanumeric characters, and extra whitespace. It also transliterates text to ASCII, splits words, and reconstructs a clean text string suitable for analysis.
normalize_text(text, remove_digits = TRUE, remove_accents = TRUE)
text |
A character vector or object that can be coerced to a character string. Represents the input text to be cleaned. |
remove_digits |
= TRUE |
remove_accents |
= TRUE |
A normalized character vector
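A base R sketch of these normalization steps. normalize_text_sketch() is hypothetical; it uses iconv() with ASCII//TRANSLIT for transliteration, whose exact output for accented characters depends on the platform.

```r
# Hypothetical sketch of the normalization steps.
normalize_text_sketch <- function(text, remove_digits = TRUE, remove_accents = TRUE) {
  x <- tolower(as.character(text))
  if (remove_accents) {
    # Transliterate to ASCII; results for accented letters vary by platform
    x <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT", sub = "")
  }
  if (remove_digits) x <- gsub("[0-9]+", " ", x)
  x <- gsub("[^[:alnum:] ]", " ", x)  # punctuation and symbols become spaces
  x <- gsub("\\s+", " ", x)           # collapse repeated whitespace
  trimws(x)
}

normalize_text_sketch("Hello, World 123!")
```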
Center and scale double vectors
normalize_vec(...)
... |
a double vector or multiple double vectors |
normalize_vec(1, 2, 3)
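Centering and scaling means subtracting the mean and dividing by the standard deviation. The sketch assumes that multiple vectors are concatenated before normalizing; normalize_vec_sketch() is hypothetical.

```r
# Hypothetical sketch: z-score of all values passed in.
normalize_vec_sketch <- function(...) {
  x <- c(...)
  (x - mean(x)) / sd(x)
}

normalize_vec_sketch(1, 2, 3)  # -1 0 1
```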
The notify_finished function plays a sound using beepr::beep, composes an email, and sends it, returning the result of the blastula::smtp_send call.
notify_finished(name, body = "", ..., sound = 1, tictoc_result = NULL)
name |
The process name (Required) |
body |
The contents of the email (Default "") |
... |
Additional arguments to pass to the template function. If you're using the default template, you can use font_family to control the base font, and content_width to control the width of the main content; see blastula_template(). By default, the content_width is set to 1000px. Using widths less than 600px is generally not advised but, if necessary, be sure to test such HTML emails with a wide range of email clients before sending to the intended recipients. The Outlook mail client (Windows, Desktop) does not respect content_width. |
sound |
The sound for beepr::beep call (Default 1) |
tictoc_result |
the result from tictoc::toc (Default NULL) |
The following environment variables should be set:
MY_SMTP_USER: the sender (from) address
MY_SMTP_RECIPIENT: the recipient (to) address
MY_SMTP_PASSWORD: the service password (for Gmail you can use an app password from https://myaccount.google.com/apppasswords)
MY_SMTP_PROVIDER: the blastula provider (gmail if not set)
if (exists("not_run")) {
  tictoc::tic()
  Sys.sleep(1)
  jrrosell::notify_finished("job", "Well done", sound = "fanfare", tictoc_result = tictoc::toc())
}
Get the name of the package from the DESCRIPTION file of the master branch in the github repo
package_github_name(x, file_lines = NULL)
x |
a single repo/package to check Ex: package_github_name("tidyverse/dplyr") |
file_lines |
(default = NULL, internal) |
if (FALSE) {
  package_github_name("jrosell/jrrosell")
}
Get the version from the DESCRIPTION file of the master branch in the github repo
package_github_version(x, file_lines = NULL)
x |
a single repo/package to check Ex: package_github_version("tidyverse/dplyr") |
file_lines |
(default = NULL, internal) |
if (FALSE) {
  package_github_version("jrosell/jrrosell")
}
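Once the DESCRIPTION lines are available (the internal file_lines argument suggests they can be passed in directly), the Package and Version fields can be parsed offline with read.dcf(). parse_description_sketch() is hypothetical and skips the download step.

```r
# Hypothetical sketch: parse name and version from DESCRIPTION lines in memory.
parse_description_sketch <- function(file_lines) {
  con <- textConnection(file_lines)
  on.exit(close(con))
  fields <- read.dcf(con, fields = c("Package", "Version"))
  list(name = unname(fields[1, "Package"]),
       version = package_version(fields[1, "Version"]))
}

desc <- c("Package: jrrosell", "Title: Personal R package", "Version: 0.0.0.9014")
info <- parse_description_sketch(desc)
info$version > "0.0.0.9000"
```

Comparing the parsed version with utils::packageVersion() of the installed package is one way to implement a check like check_installed_github().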
Plot bars for non double columns
plot_bars(df, ..., top_values = 50)
df |
a data.frame |
... |
optional parameters to geom_histogram |
top_values |
the number of most common values to show (default 50) |
plot_bars(data.frame(a = c("x", "y"), b = c("z", "z")))
Plot histograms for double columns
plot_histograms(df, ...)
df |
a data.frame |
... |
optional parameters to geom_histogram |
plot_histograms(data.frame(a = c(1, 2), b = c(1, 3)))
Plot missing values
plot_missing(df)
df |
a data.frame |
plot_missing(data.frame(a = c(1, NA), b = c(NA, 4)))
It returns a bar chart or a histogram of the variable.
plot_variable(df, variable, ..., type = "numeric")
df |
a data.frame |
variable |
the variable to use. |
... |
params passed to geom_* |
type |
numeric (default) or nominal. |
data.frame(a = c("x", "y", "y"), b = c("z", "z", "x")) |>
  plot_variable(a)
Prep, juice and glimpse a recipe or workflow
prep_juice(object)
object |
A recipe or a workflow object with a recipe |
https://recipes.tidymodels.org/reference/update.step.html
Prep, juice and get cols from a recipe or workflow
prep_juice_ncol(object)
object |
A recipe or a workflow object with a recipe |
Prepare documents for analysis
prepare_docs(df, ...)
df |
a data frame with id and text columns. |
... |
parameters passed to |
A data frame with a list column of tokens and a character column prepared_text, for the documents identified by column id with their text in column "text".
# Example usage:
prepare_docs(data.frame(id = 1, text = "¡Hola! Esto es una prueba 123."))
This function processes a given text string by converting it to lowercase, removing numbers, non-alphanumeric characters, extra whitespace, and stopwords based on a specified language. It also transliterates text to ASCII, splits words, and reconstructs a clean text string suitable for analysis.
prepare_text(...)
... |
parameters passed to |
A cleaned character string, with stopwords removed and text formatted for analysis.
# Example usage:
prepare_text("¡Hola! Esto es una prueba 123.")
This function processes a given text string by converting it to lowercase, removing numbers, non-alphanumeric characters, extra whitespace, and stopwords based on a specified language. It also transliterates text to ASCII, splits words, and reconstructs a clean text string suitable for analysis.
prepare_tokens(
  text,
  stopwords = NULL,
  lang = "spanish",
  sep = "\\s+",
  remove_digits = TRUE,
  remove_accents = TRUE,
  lemmatize = c("none", "udpipe", "spacyr"),
  model_dir = getwd()
)
text |
A character vector or object that can be coerced to a character string. Represents the input text to be cleaned. |
stopwords |
A character vector of stopwords to remove. Defaults to the stopwords from the tm package (tm::stopwords). |
lang |
defaults to |
sep |
separator for splitting; defaults to |
remove_digits |
= TRUE |
remove_accents |
= TRUE |
lemmatize |
= c("none", "udpipe", "spacyr") defaults to |
model_dir |
defaults to getwd() |
A cleaned character vector, with stopwords removed and text formatted for analysis; if lemmatization is requested, a character vector of lemmas.
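A simplified base R sketch of the tokenization pipeline (lowercase, drop digits and symbols, split on sep, remove stopwords). prepare_tokens_sketch() is hypothetical: it takes an explicit stopword vector instead of defaulting to tm::stopwords(), and it skips transliteration and lemmatization.

```r
# Hypothetical sketch of the token pipeline.
prepare_tokens_sketch <- function(text, stopwords = character(0), sep = "\\s+") {
  x <- tolower(paste(as.character(text), collapse = " "))
  x <- gsub("[0-9]+", " ", x)          # remove digits
  x <- gsub("[^[:alpha:] ]", " ", x)   # remove punctuation and symbols
  tokens <- strsplit(trimws(x), sep)[[1]]
  tokens <- tokens[nzchar(tokens)]
  tokens[!tokens %in% stopwords]       # drop stopwords
}

prepare_tokens_sketch("Esto es una prueba 123.", stopwords = c("es", "una"))
```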
It's useful for reading the most common types of flat file data, comma separated values and tab separated values.
read_chr(
  file,
  delim = ",",
  locale = NULL,
  ...,
  date_names = "en",
  date_format = "%AD",
  time_format = "%AT",
  decimal_mark = ".",
  grouping_mark = "",
  tz = "CET",
  encoding = "UTF-8",
  asciify = FALSE
)
file |
Either a path to a file, a connection, or literal data (either a single string or a raw vector). |
delim |
Single character used to separate fields within a record. |
locale |
The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names. |
... |
Other parameters to readr::read_delim. |
date_names |
"en" from readr::locale |
date_format |
"%AD" from readr::locale |
time_format |
"%AT" from readr::locale |
decimal_mark |
"." from readr::locale |
grouping_mark |
"" from readr::locale |
tz |
"CET" |
encoding |
"UTF-8" |
asciify |
FALSE |
The read_chr function works like readr::read_delim, except that all returned columns are characters and the column names are cleaned. It requires the readr and janitor packages to be installed.
read_chr(readr::readr_example("mtcars.csv"), delim = ",")
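A base R approximation of the same behavior, swapping in utils::read.delim() for readr and a simple snake_case rule for janitor::clean_names(). read_chr_sketch() is hypothetical and its name cleaning is much cruder than janitor's.

```r
# Hypothetical sketch: read every column as character and clean the names.
read_chr_sketch <- function(file, delim = ",", encoding = "UTF-8") {
  df <- utils::read.delim(file, sep = delim, colClasses = "character",
                          fileEncoding = encoding, check.names = FALSE)
  # Rough snake_case: lower case, runs of non-alphanumerics become "_"
  clean <- gsub("[^a-z0-9]+", "_", tolower(names(df)))
  names(df) <- gsub("^_|_$", "", clean)
  df
}

tmp <- tempfile(fileext = ".csv")
writeLines(c("Car Name,MPG", "Mazda RX4,21.0"), tmp)
str(read_chr_sketch(tmp))
unlink(tmp)
```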
Read the HTML text of a URL with rate-limiting
read_url(url, sleep = 1, capacity = 1, realm = NULL)
url |
Full URL to request |
sleep |
Time (in seconds) to refill the bucket. Default: 1 |
capacity |
Max requests per refill period. Default: 1 (i.e., one request every |
realm |
Optional unique throttling scope. Defaults to domain of URL. |
It's useful for getting the text of webpages in a single character vector.
HTML content as string or NULL on failure
if (FALSE) read_url("https://www.google.cat/", sleep = 1)
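The sleep and capacity arguments describe a token bucket: up to capacity requests per refill period of sleep seconds. A base R sketch of such a bucket, assuming a continuous refill rate; make_token_bucket() and its injectable clock argument are hypothetical and not part of the package. Keeping one bucket per domain in a named list would mimic the per-realm scope.

```r
# Hypothetical token bucket: returns a function that yields 0 when a request
# may proceed, or the number of seconds to wait otherwise.
make_token_bucket <- function(capacity = 1, sleep = 1,
                              clock = function() as.numeric(Sys.time())) {
  tokens <- capacity
  last <- clock()
  function() {
    now <- clock()
    # Refill in proportion to elapsed time, never above capacity
    tokens <<- min(capacity, tokens + (now - last) * capacity / sleep)
    last <<- now
    if (tokens >= 1) {
      tokens <<- tokens - 1
      return(0)
    }
    # Seconds until one whole token is available
    (1 - tokens) * sleep / capacity
  }
}

take <- make_token_bucket(capacity = 1, sleep = 1)
for (i in 1:3) {
  # Wait until the bucket grants a token, then issue the request
  while ((wait <- take()) > 0) Sys.sleep(wait)
  message("request ", i, " at ", format(Sys.time(), "%H:%M:%OS3"))
}
```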
It's useful for reading a single sheet from an Excel/OpenOffice file.
read_xlsx(xlsxFile, ..., sheet = 1, startRow = 1)
xlsxFile |
The name of the file. |
... |
Other parameters to the openxlsx::read.xlsx function |
sheet |
The name or index of the sheet (default 1) |
startRow |
The number of the starting reading row (default 1) |
The write_xlsx function is a wrapper for openxlsx::write.xlsx.
l <- list("IRIS" = iris, "MTCARS" = mtcars)
tmp_file <- tempfile(fileext = ".xlsx")
write_xlsx(l, tmp_file)
df <- read_xlsx(tmp_file)
file.remove(tmp_file)
This function processes character vectors and removes the specified stopwords or, by default, the stopwords of the language from the tm package.
remove_stopwords(text, stopwords = NULL, lang = "spanish")
text |
A character vector or object that can be coerced to a character string. Represents the input text to be cleaned. |
stopwords |
A character vector of stopwords to remove. Defaults to the stopwords from the tm package (tm::stopwords). |
lang |
defaults to |
A character vector without stopwords
When parallelizing within resamples, the required memory can crash the system.
request_max_safe_cores_from_rss(estimated_max_rss, memory_usage = 0.5, verbose = TRUE)
estimated_max_rss |
bytes of maximum RSS (resident set size) it will use (you can get it from the syrup package) |
memory_usage |
the proportion of the system memory that will be used (default 0.5) |
verbose |
whether to print debugging information (default TRUE) |
The detect_cores function uses the parallelly package. It returns the desired max cores if they are available, or fails if fewer than min cores are available (excluding system reserved cores).
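One plausible way to turn an RSS estimate into a safe number of workers is to divide the memory budget by the per-worker peak. This is an assumption about the approach, not the package's formula. safe_cores_sketch() is hypothetical and takes total_memory and available_cores as arguments because base R has no portable way to query total RAM.

```r
# Hypothetical sketch: workers that fit in memory_usage * total_memory.
safe_cores_sketch <- function(estimated_max_rss, total_memory,
                              memory_usage = 0.5, available_cores = 1) {
  budget <- total_memory * memory_usage     # bytes we allow ourselves to use
  n <- floor(budget / estimated_max_rss)    # workers that fit in the budget
  as.integer(max(1, min(n, available_cores)))
}

# 16 GB machine, half of it usable, 2 GB peak RSS per worker, 8 cores
safe_cores_sketch(2e9, total_memory = 16e9, memory_usage = 0.5, available_cores = 8)
```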
Extract body from httr2 response using yyjsonr
resp_body_yyjson(resp, check_type = TRUE, simplifyVector = FALSE, ...)
resp |
An httr2::response object, created by httr2::req_perform(). |
check_type |
Should the type actually be checked? Provided as a convenience for when using this function inside |
simplifyVector |
Should JSON arrays containing only primitives (i.e. booleans, numbers, and strings) be coerced to atomic vectors? |
... |
Other parameters |
It generates slug URLs as WordPress does
sanitize_title_with_dashes(title)
title |
the title |
sanitize_title_with_dashes("Hello world")
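A simplified sketch of dash-separated slugs for ASCII input. slug_sketch() is hypothetical; WordPress's sanitize_title_with_dashes handles more cases (for example percent-encoded octets and other special characters), and non-ASCII letters would need transliteration first, as slugify does.

```r
# Hypothetical sketch: lower case, runs of other characters become one dash.
slug_sketch <- function(title) {
  x <- tolower(title)
  x <- gsub("[^a-z0-9]+", "-", x)  # replace each run of non-alphanumerics
  gsub("^-+|-+$", "", x)           # trim leading and trailing dashes
}

slug_sketch("Hello world")  # "hello-world"
```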
Tune a recipe using glmnet and lightgbm and stacks
score_recipe(rec, resamples, grids = list(10, 10), metric = "accuracy")
rec |
recipe |
resamples |
rset |
grids |
for glmnet and lightgbm tuning |
metric |
to be compared |
It generates slug URLs handling ASCII normalization
slugify(x)
x |
a character vector |
slugify("Hello world")
spain_ccaas
spain_ccaas
An sf object with 19 rows and 4 columns:
https://github.com/koldLight/curso-r-dataviz/blob/master/dat/spain_ccaas.geojson
spain_provinces
spain_provinces
An sf object with 60 rows and 4 columns:
https://github.com/koldLight/curso-r-dataviz/blob/master/dat/spain_provinces.geojson
Sum the missing values from a data.frame
sum_missing(...)
... |
one or multiple data.frame |
Select constant columns from a data.frame
summarize_n_distinct(df)
df |
a data.frame |
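Constant columns are the ones with exactly one distinct value, so counting distinct values per column identifies them. n_distinct_sketch() is a hypothetical base R stand-in; note that unique() counts NA as a value.

```r
# Hypothetical sketch: number of distinct values in each column.
n_distinct_sketch <- function(df) {
  vapply(df, function(col) length(unique(col)), integer(1))
}

df <- data.frame(a = c(1, 1, 1), b = c(1, 2, 3), c = c("x", "x", "x"))
counts <- n_distinct_sketch(df)
names(counts)[counts == 1]  # constant columns: "a" "c"
```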
Pipe a value forward into a function or call expression and return the original value instead of the result. This is useful when an expression is used for its side effect, such as plotting or printing.
tee(x, expr)
x |
An object |
expr |
An expression |
The tee pipe works like |>, except that the return value is x itself, not the result of the expr call.
I want to give credit to Michael Milton and Matthew Kay for the idea and the code.
https://mastodon.social/@[email protected]/109555362766969210
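A minimal sketch of a tee-style helper, assuming the expression refers to the piped value through a "." placeholder; both tee_sketch() and the placeholder convention are hypothetical and not necessarily how the package's tee() works.

```r
# Hypothetical sketch: evaluate expr for its side effect, return x unchanged.
tee_sketch <- function(x, expr) {
  # Bind `.` to x in a child of the caller's environment and evaluate expr there
  env <- list2env(list(. = x), parent = parent.frame())
  eval(substitute(expr), env)
  # Return the input, not the result of expr
  x
}

c(3, 1, 2) |>
  tee_sketch(print(sort(.))) |>
  sum()
```

Here print(sort(.)) only produces output, and sum() receives the original vector c(3, 1, 2).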
It requires roboto fonts installed in your O.S. and run z
theme_roboto(
  base_size = 13,
  strip_text_size = 14,
  strip_text_margin = 6,
  subtitle_size = 14,
  subtitle_margin = 10,
  plot_title_size = 18,
  plot_title_margin = 12,
  ...
)
base_size |
= 13 |
strip_text_size |
= 14 |
strip_text_margin |
= 6 |
subtitle_size |
= 14 |
subtitle_margin |
= 10 |
plot_title_size |
= 18 |
plot_title_margin |
= 12 |
... |
Other parameters passed to theme_set |
Sets a minimal dark theme with dark blue colors, using the Roboto font family.
theme_set_roboto_darkblue(...)
... |
Other parameters passed to theme_set |
This function splits a text string into a character vector of tokens.
tokenize_text(text, sep = "\\s+")
text |
A character vector or object that can be coerced to a character string. Represents the input text to be cleaned. |
sep |
= "\\s+" |
A character vector
Update the values of a specific recipe step located by id
update_step(object, target_id, ...)
object |
A recipe or a workflow object with a recipe |
target_id |
The id name of the step |
... |
The arguments to update the step. |
Create an xgboost tunable workflow for regression and classification
workflow_boost_tree(rec, engine = "xgboost", counts = TRUE, ...)
rec |
preprocessing recipe to build the workflow |
engine |
xgboost, lightgbm (xgboost by default) |
counts |
Optional logical argument: whether mtry is given as a count (TRUE) or not |
... |
optional engine arguments |
Create a tunable glmnet workflow for regression and classification
workflow_elasticnet(rec, engine = "glmnet", ...)
rec |
preprocessing recipe to build the workflow |
engine |
glmnet, spark, brulee (glmnet by default) |
... |
Optional engine arguments |
It's useful for saving multiple data frames to multiple sheets of a single Excel/OpenOffice/LibreOffice file.
write_xlsx(data, distfile, ...)
data |
A named list of tibbles |
distfile |
The name of the destination file. |
... |
Other parameters to the openxlsx::write.xlsx function |
The write_xlsx function is a wrapper for openxlsx::write.xlsx.