| Title: | Preprocessing and Feature Engineering Steps before Modeling |
|---|---|
| Description: | The package provides pipeable functions to simplify preprocessing of tabular data prior to machine learning modeling. Users can combine multiple datasets, define feature engineering steps (such as creating new predictors from nominal or numeric columns), and then split the data back into preprocessed datasets ready to be used in machine learning workflows. |
| Authors: | Jordi Rosell [aut, cre] (ORCID: <https://orcid.org/0000-0002-4349-1458>) |
| Maintainer: | Jordi Rosell <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.0.9000 |
| Built: | 2026-06-07 08:32:49 UTC |
| Source: | https://github.com/jrosell/barn |
Combine multiple data frames based on their common columns.
That's the first step for preprocessing with the barn package.
Printing a barn object shows the characteristics of the combined datasets.
barn(..., nominal_sufix = "_cat", numeric_sufix = "_num")
## S3 method for class 'barn'
print(x, form_width = 30, ...)
| Argument | Description |
|---|---|
| ... | Extra arguments. |
| nominal_sufix | An optional string for dealing with nominal variables. Defaults to "_cat". |
| numeric_sufix | An optional string for dealing with numeric variables. Defaults to "_num". |
| x | An object of class "barn". |
| form_width | An integer specifying the minimum column width (in characters). Default is 30. |
A barn object containing the combined data frame, row counts,
full <- data.frame(id = 1:3, p1 = c("A", "B", "C"), p2 = 10:12, y = 1:3)
holdout <- data.frame(id = 4:5, p1 = c("D", "E"), p2 = 1:2)
original <- data.frame(id = 1:2, p1 = c("F", "G"), p2 = 3:4, y = 4:5)
print(barn(full, holdout, original))
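To make "combine based on common columns" concrete, here is a minimal base R sketch of the idea. It is not the barn package's implementation. How the package treats non-shared columns such as `y` is not stated here, so the sketch simply drops them, and the names `datasets`, `common_cols`, and `row_counts` are illustrative assumptions.

```r
# Base R sketch only; NOT the barn package's implementation.
full <- data.frame(id = 1:3, p1 = c("A", "B", "C"), p2 = 10:12, y = 1:3)
holdout <- data.frame(id = 4:5, p1 = c("D", "E"), p2 = 1:2)

datasets <- list(full = full, holdout = holdout)

# Columns present in every data frame (assumption: non-shared columns are dropped).
common_cols <- Reduce(intersect, lapply(datasets, names))

# Stack the shared columns and record how many rows each dataset contributed.
combined <- do.call(
  rbind,
  lapply(datasets, function(d) d[, common_cols, drop = FALSE])
)
row_counts <- vapply(datasets, nrow, integer(1))

common_cols
row_counts
```

Keeping the per-dataset row counts alongside the stacked data is what makes it possible to split the rows apart again later.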
Splits the combined data frame from a barn object back into a named list
containing the preprocessed predictors.
harvest(barn_obj)
| Argument | Description |
|---|---|
| barn_obj | An object of class "barn". |
A named list of data frames, one for each dataset originally passed to
barn().
full <- data.frame(id = 1:3, p1 = c("A", "B", "C"), p2 = 10:12, y = 1:3)
holdout <- data.frame(id = 4:5, p1 = c("D", "E"), p2 = 1:2)
original <- data.frame(id = 1:2, p1 = c("F", "G"), p2 = 3:4, y = 4:5)
harvested <- barn(full, holdout) |> harvest()
names(harvested)
harvested[["full"]]
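The splitting step can be pictured with a small base R sketch. The bookkeeping shown (a named vector of row counts) is an assumption made for illustration and may differ from what a barn object stores internally.

```r
# Base R sketch only; the bookkeeping here is an assumption,
# not the barn package's internals.
combined <- data.frame(
  id = 1:5,
  p1 = c("A", "B", "C", "D", "E"),
  p2 = c(10:12, 1:2)
)
row_counts <- c(full = 3L, holdout = 2L)

# Label each row with the dataset it came from, then split on that label.
origin <- factor(
  rep(names(row_counts), times = row_counts),
  levels = names(row_counts)
)
pieces <- split(combined, origin)

names(pieces)
pieces[["holdout"]]
```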
Frequency encoding of nominal variables.
plant_count_encode(barn_obj, nominal_suffix = "_cat")
| Argument | Description |
|---|---|
| barn_obj | A barn object, created by barn(). |
| nominal_suffix | The suffix applied to column names. Defaults to "_cat". |
The modified barn_obj with the transformed combined data frame.
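For readers unfamiliar with frequency (count) encoding, the following base R sketch shows the general technique: each category is replaced by the number of times it occurs. It does not call the package, and the output column name `p1_count` is a hypothetical choice, not necessarily what plant_count_encode() produces.

```r
# Base R sketch of frequency encoding; NOT the package's implementation.
df <- data.frame(p1_cat = c("A", "B", "A", "C", "A", "B"))

# Count how often each category occurs across all rows.
counts <- table(df$p1_cat)

# Look up each row's category count (hypothetical output column name).
df$p1_count <- as.integer(counts[as.character(df$p1_cat)])

df
```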
Creates new integer columns by extracting specific digits from numeric columns. This function emulates a feature engineering technique often used in machine learning.
plant_decimals_extract(barn_obj, numeric_sufix = "_num", from = 1, to = 10)
| Argument | Description |
|---|---|
| barn_obj | A barn object. |
| numeric_sufix | The suffix used to identify numeric columns to process. Defaults to "_num". |
| from | The starting digit position to extract (e.g., 1 for the first decimal place). Defaults to 1. |
| to | The ending digit position to extract (e.g., 9 for the ninth decimal place). Defaults to 10. |
The modified barn_obj with the transformed combined data frame.
df <- tibble::tibble(x_num = c(1.234, 5.678, NA))
b <- barn(df) |> plant_decimals_extract(from = 1, to = 3)
harvest(b)[[1]]
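As an illustration of what extracting decimal digits can mean, the base R sketch below pulls the first to third decimal digits out of the same values. It is one possible approach under stated assumptions, not the package's code: it works on formatted text to sidestep floating-point surprises, and the output column names (`x_d1_num` and so on) are hypothetical.

```r
# Base R sketch of decimal digit extraction; NOT the package's implementation.
x_num <- c(1.234, 5.678, NA)
from <- 1
to <- 3

# Fixed-point text with exactly `to` decimals, e.g. "1.234".
# Note: formatC() rounds at the last kept decimal rather than truncating.
txt <- formatC(x_num, format = "f", digits = to)
decimals <- sub("^.*\\.", "", txt)  # keep only the part after the decimal point

ok <- is.finite(x_num)  # NA and infinite values stay NA in the output
digits <- lapply(from:to, function(k) {
  out <- rep(NA_integer_, length(x_num))
  out[ok] <- as.integer(substr(decimals[ok], k, k))
  out
})
names(digits) <- paste0("x_d", from:to, "_num")  # hypothetical column names

result <- data.frame(x_num = x_num, digits)
result
```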
Creates new numeric columns by rounding existing numeric columns at specified decimal precisions. This is useful for feature engineering, where different rounding granularities may capture meaningful patterns.
plant_decimals_round(barn_obj, numeric_sufix = "_num", precisions = c(9, 8))
| Argument | Description |
|---|---|
| barn_obj | A barn object. |
| numeric_sufix | The suffix used to identify numeric columns to process. Defaults to "_num". |
| precisions | A numeric vector specifying the number of decimal places to round to. Defaults to c(9, 8). |
The modified barn_obj with the transformed combined data frame.
df <- tibble::tibble(x_num = c(1.23456789))
b <- barn(df) |> plant_decimals_round(precisions = c(2, 3))
harvest(b)[[1]]
harvest(b)[[1]]$x_r2_num
harvest(b)[[1]]$x_r3_num
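The same idea can be sketched in base R without the package. The column names `x_r2_num` and `x_r3_num` follow the pattern used in the example above; everything else is an illustrative assumption about how such columns might be built.

```r
# Base R sketch of rounding at several precisions; NOT the package's implementation.
x_num <- 1.23456789
precisions <- c(2, 3)

# One rounded copy of the column per requested precision.
rounded <- lapply(precisions, function(p) round(x_num, p))
names(rounded) <- paste0("x_r", precisions, "_num")

result <- data.frame(x_num = x_num, rounded)
result
```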
Transform nominal columns from factors to integers.
plant_label_encode(barn_obj)
| Argument | Description |
|---|---|
| barn_obj | An instance of class "barn". |
The modified barn_obj with the transformed combined data frame.
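Label encoding of a factor can be illustrated in base R as follows. This is a sketch of the general technique only. The package may order levels differently, so the specific integer codes shown are an assumption tied to R's default alphabetical factor levels.

```r
# Base R sketch of label encoding; NOT the package's implementation.
df <- data.frame(p1_cat = factor(c("B", "A", "C", "A")))

# Each factor level maps to its integer position in levels(df$p1_cat).
level_order <- levels(df$p1_cat)
df$p1_cat <- as.integer(df$p1_cat)

level_order
df
```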
A function to create new features based on combinations of categorical columns in a barn object.
plant_new_nominal_pairs(barn_obj, nominal_suffix = "_cat")
| Argument | Description |
|---|---|
| barn_obj | An object inheriting from the "barn" class. |
| nominal_suffix | A character string that specifies the suffix for the newly created columns. Optional, default is "_cat". |
The modified barn_obj with the transformed combined data frame.
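A rough base R sketch of pairwise category combinations follows. It assumes, based only on the function name, that combinations are formed pairwise. The way new columns are named (`a_b_cat`) and the `_` separator between values are hypothetical choices, not the package's documented behavior.

```r
# Base R sketch of pairwise nominal combinations; NOT the package's implementation.
df <- data.frame(
  a_cat = c("x", "y", "x"),
  b_cat = c("u", "u", "v"),
  c_cat = c("m", "n", "m"),
  n_num = 1:3
)

# Select nominal columns by suffix before adding any new ones.
nominal_cols <- grep("_cat$", names(df), value = TRUE)

# For every pair of nominal columns, paste their values into a new column.
for (pair in combn(nominal_cols, 2, simplify = FALSE)) {
  stems <- sub("_cat$", "", pair)
  new_name <- paste0(stems[1], "_", stems[2], "_cat")  # hypothetical naming
  df[[new_name]] <- paste(df[[pair[1]]], df[[pair[2]]], sep = "_")
}

names(df)
```

With k nominal columns this adds k * (k - 1) / 2 new columns, so the width of the data grows quickly as k increases.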
Transforms numeric and character columns in a barn object into new factor columns.
It appends "_num" to the names of numeric columns and "_cat" to the names of character columns, and converts both to factors.
The original columns are removed from the combined data frame within the barn object.
plant_new_numeric_factors(
  barn_obj,
  numeric_suffx = "_num",
  nominal_suffix = "_cat"
)
| Argument | Description |
|---|---|
| barn_obj | A barn object. |
| numeric_suffx | The suffix for new numeric factor columns. Default is "_num". |
| nominal_suffix | The suffix for new nominal factor columns. Default is "_cat". |
The modified barn_obj with the transformed combined data frame.
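The renaming and conversion described above can be sketched in base R like this. It is an illustration of the described behavior under simple assumptions (only numeric and character columns are present), not the package's code.

```r
# Base R sketch of suffixing and converting columns to factors;
# NOT the package's implementation.
df <- data.frame(p1 = c("A", "B", "A"), p2 = c(10, 12, 10))

out <- list()
for (col in names(df)) {
  # Numeric columns get "_num"; everything else is treated as nominal here.
  suffix <- if (is.numeric(df[[col]])) "_num" else "_cat"
  out[[paste0(col, suffix)]] <- factor(df[[col]])
}

# Only the new, suffixed factor columns are kept.
out <- as.data.frame(out)
str(out)
```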
Groups and summarizes the combined data to add aggregations to a barn_obj, using the specified variables and expressions.
WARNING: Risk of overfitting and poor generalization if this is not done within resampling.
plant_summarize(barn_obj, .by = NULL, ...)
| Argument | Description |
|---|---|
| barn_obj | An object of class 'barn'. |
| .by | Variable(s) to group by. Currently unused; must be empty. |
| ... | Expressions to compute summarizing values. |
A modified barn_obj with summarized data in the combined slot.
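The overfitting warning can be made concrete with a base R sketch. One common way to reduce leakage is to compute group aggregates on the analysis (training) rows of a resample only, then look them up for the held-out rows. The data, the column name `y_mean_by_g`, and this workflow are illustrative assumptions, not the package's API.

```r
# Base R sketch of group aggregation computed on training rows only;
# NOT the package's implementation.
train <- data.frame(g_cat = c("a", "a", "b", "b"), y = c(1, 3, 10, 20))
assess <- data.frame(g_cat = c("a", "b", "a"))

# Group means estimated from the training rows of this resample.
group_means <- tapply(train$y, train$g_cat, mean)

# Attach the aggregate to both sets by looking up each row's group.
train$y_mean_by_g <- as.numeric(group_means[as.character(train$g_cat)])
assess$y_mean_by_g <- as.numeric(group_means[as.character(assess$g_cat)])

assess
```

Computing the means on all rows at once, including held-out ones, would let information from the assessment set leak into the features.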