LogisticRegression / Classifier Layer

Logistic Regression Classifier - a fundamental classification algorithm that models the probability of discrete outcomes. Similar to sklearn.linear_model.LogisticRegression.

Mathematical form: P(y=1 | x) = σ(wᵀx + b) = 1 / (1 + exp(−(wᵀx + b))), where w are the model weights, b is the intercept, and x is the feature vector.
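
A minimal sketch of this form using NumPy; the weight and feature values are purely illustrative, not defaults of this component:

    import numpy as np

    def predict_proba(x, w, b):
        """Probability of the positive class under logistic regression."""
        z = np.dot(w, x) + b             # linear score w^T x + b
        return 1.0 / (1.0 + np.exp(-z))  # sigmoid squashes the score into (0, 1)

    w = np.array([0.8, -1.2, 0.3])  # illustrative weights
    x = np.array([1.0, 0.5, 2.0])   # illustrative feature vector
    print(predict_proba(x, w, b=0.1))  # ~0.71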

Key characteristics:

  • Linear decision boundary
  • Probabilistic predictions
  • Binary and multiclass support
  • Multiple regularization options
  • Efficient for sparse data

Common applications:

  • Risk assessment (credit scoring, medical diagnosis)
  • Customer behavior prediction
  • Email spam detection
  • Binary text classification
  • Marketing response prediction
  • Quality control pass/fail

Outputs:

  1. Predicted Table: Input data with added prediction columns
  2. Validation Results: Cross-validation metrics on training data
  3. Test Metric: Performance metrics on test set
  4. ROC Curve Data: True/False positive rates for threshold tuning
  5. Confusion Matrix: Detailed classification performance breakdown
  6. Feature Importances: Coefficient magnitudes showing feature relevance

Note: Works best with preprocessed, standardized numerical features.
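
As a sketch of the preprocessing this note refers to, standardization can be chained before the classifier with sklearn (the variable names X_train and y_train are assumptions):

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression

    # Standardize numeric features so regularization treats them on a comparable scale
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=100),
    )
    # model.fit(X_train, y_train)  # X_train: numeric feature matrix, y_train: labels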

Inputs:

  0. Table

Outputs:

  0. Predicted Table
  1. Validation Results
  2. Test Metric
  3. ROC Curve Data
  4. Confusion Matrix
  5. Feature Importances

SelectFeatures

[column, ...]

Feature columns for model training. Selection tips:

  • Include relevant predictive variables
  • Avoid highly correlated features
  • Consider feature scaling requirements
  • Balance between information and model complexity

Best practices:

  • Remove redundant features
  • Include domain-important variables
  • Consider feature interactions
  • Check for multicollinearity

If empty, uses all numeric columns except target.
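
A sketch of this fallback behavior, assuming a pandas DataFrame named df and a target column named "label" (both names are hypothetical):

    import pandas as pd

    df = pd.DataFrame({
        "age": [25, 32, 47],
        "income": [40_000, 55_000, 82_000],
        "city": ["A", "B", "A"],   # non-numeric: excluded
        "label": [0, 1, 1],        # target: excluded
    })

    target = "label"
    feature_cols = [c for c in df.select_dtypes("number").columns if c != target]
    print(feature_cols)  # ['age', 'income']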

Target column for classification. Requirements:

  • Categorical or numeric labels
  • At least two unique classes
  • No missing values

Preprocessing tips:

  • Encode categorical labels
  • Check class balance
  • Consider label noise
  • Verify label consistency

Params

oneof
DefaultParams

Standard parameter configuration optimized for general-purpose classification tasks.

Default configuration:

  • L2 regularization (prevents overfitting)
  • LBFGS solver (memory-efficient quasi-Newton method)
  • Balanced class weights (handles imbalanced datasets)
  • C=0.1 (moderate regularization strength)
  • 100 max iterations

Best suited for:

  • Medium-sized datasets
  • Relatively balanced classes
  • When feature scaling is applied
  • Initial model exploration
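
A rough sklearn equivalent of this default configuration, assuming the listed values map directly onto sklearn's parameters:

    from sklearn.linear_model import LogisticRegression

    clf = LogisticRegression(
        penalty="l2",             # L2 regularization
        solver="lbfgs",           # memory-efficient quasi-Newton solver
        class_weight="balanced",  # reweight classes inversely to frequency
        C=0.1,                    # moderate regularization strength
        max_iter=100,
    )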

Configurable parameters for fine-tuning the logistic regression model. Allows detailed control over regularization, optimization, and convergence behavior. Essential for adapting the model to specific data characteristics and performance requirements.

LbfgsL2

The combination of solver and penalty to use. Each combination offers different trade-offs:

Performance characteristics:

  • Small datasets (n < 10k): Choose liblinear
  • Large datasets: Prefer sag or saga
  • Many features: newton-cg or lbfgs
  • Sparse data: saga with L1

Memory usage:

  • Low memory: liblinear, saga
  • Medium: lbfgs, newton-cg
  • High: newton-cholesky (quadratic in features)

Multiclass support:

  • Full support: newton-cg, sag, saga, lbfgs
  • Binary only: newton-cholesky
  • One-vs-rest only: liblinear
LbfgsNone ~

L-BFGS with no penalty

LbfgsL2 ~

L-BFGS with L2 penalty

LibLinearL1 ~

Liblinear with L1 penalty

LibLinearL2 ~

Liblinear with L2 penalty

NewtonCgNone ~

Newton-CG with no penalty

NewtonCgL2 ~

Newton-CG with L2 penalty

NewtonCholeskyNone ~

Newton-Cholesky with no penalty

NewtonCholeskyL2 ~

Newton-Cholesky with L2 penalty

SagNone ~

SAG with no penalty

SagL2 ~

SAG with L2 penalty

SagaNone ~

SAGA with no penalty

SagaL2 ~

SAGA with L2 penalty

SagaL1 ~

SAGA with L1 penalty

SagaElasticNet ~

SAGA with ElasticNet penalty
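
Each variant above bundles a solver with a penalty. As a sketch, a few of them could be mapped onto sklearn keyword arguments roughly like this (the left-hand names are this component's variants, not sklearn identifiers; penalty=None assumes a recent sklearn version, and l1_ratio=0.5 is an illustrative value):

    from sklearn.linear_model import LogisticRegression

    # Illustrative mapping from variant names to sklearn solver/penalty kwargs
    SOLVER_PENALTY = {
        "LbfgsL2":        {"solver": "lbfgs",     "penalty": "l2"},
        "LibLinearL1":    {"solver": "liblinear", "penalty": "l1"},
        "NewtonCgNone":   {"solver": "newton-cg", "penalty": None},
        "SagaElasticNet": {"solver": "saga",      "penalty": "elasticnet", "l1_ratio": 0.5},
    }

    clf = LogisticRegression(**SOLVER_PENALTY["SagaElasticNet"], C=0.1, max_iter=1000)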

Dual

bool
false

Optimization formulation selection. Can be used only for liblinear solver with L2 penalty. Dual formulation is advantageous when n_samples < n_features. Primal (false) is preferred when n_samples > n_features. Automatically ignored if incompatible with solver.

0.0001

Convergence criterion threshold. Controls optimization precision:

  • Smaller values: More precise but slower convergence
  • Larger values: Faster but potentially less optimal solution

Typical range: 1e-6 to 1e-3. Adjust based on precision needs.

0.1

Inverse regularization strength (C parameter). Controls model complexity:

  • Small values (<1): Stronger regularization, simpler model
  • Large values (>1): Weaker regularization, more complex model

Common ranges:

  • 0.001 to 0.1: High regularization (noisy data)
  • 0.1 to 1: Moderate regularization (typical)
  • 1 to 10: Low regularization (clean data)

Must be positive. Scale often requires adjustment with dataset size.

true

Whether to include bias term (intercept). Effects:

  • true: Model can learn offset from origin (recommended)
  • false: Decision boundary through origin

Set false only when data is pre-centered or for theoretical analysis.

Scaling factor for synthetic intercept feature. Only used with liblinear solver and fit_intercept=true. Effects:

  • Larger values: More emphasis on intercept fitting
  • Smaller values: Less emphasis on intercept

Useful for handling numerical stability in specific cases.

Balanced

Class weight adjustment strategy for handling imbalanced datasets:

Mathematical form:

  • Balanced: w_i = n_samples / (n_classes × n_i)
  • Uniform: w_i = 1 for all classes

Impact on model:

  • Affects class importance during training
  • Influences decision boundary placement
  • Controls misclassification penalties
  • Balances precision vs recall trade-off

Selection criteria:

  • Class distribution in data
  • Cost of different error types
  • Business/domain requirements
  • Performance metrics priorities
Uniform ~

Equal weights for all classes:

Characteristics:

  • No adjustment for class frequencies
  • Natural class proportions preserved
  • Faster training process
  • Original data distribution maintained

Best for:

  • Balanced datasets (similar class frequencies)
  • When natural proportions matter
  • Representative sampling
  • When all errors equally costly

Warning: May underperform on imbalanced data

Balanced ~

Weights inversely proportional to class frequencies:

Formula: w_i = n_samples / (n_classes × n_i)

where:

  • w_i is the weight for class i
  • n_samples is the total number of samples
  • n_classes is the number of classes
  • n_i is the number of samples in class i

Properties:

  • Automatic weight adjustment
  • Compensates class imbalance
  • Balanced error contribution
  • Frequency-based weights

Best for:

  • Imbalanced datasets
  • Minority class importance
  • Skewed distributions
  • Fair classification needs

Note: May increase sensitivity to noise in rare classes
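
A short check of the balanced formula above using sklearn's helper; the label values are illustrative:

    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    y = np.array([0, 0, 0, 0, 0, 0, 1, 1])  # 6 samples of class 0, 2 of class 1
    weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
    # n_samples / (n_classes * n_i): 8 / (2 * 6) = 0.667 for class 0, 8 / (2 * 2) = 2.0 for class 1
    print(weights)  # [0.6667, 2.0]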

Random number generator seed. Used for:

  • Reproducible results with stochastic solvers (sag, saga, liblinear)
  • Consistent data shuffling
  • Benchmark comparisons

Same seed guarantees identical results across runs.

100

Maximum number of solver iterations. Consider:

  • Increase if model warns about non-convergence
  • Typical ranges:
    • 100-500: Simple problems
    • 500-1000: Complex or large datasets
    • 1000+: Difficult convergence cases

Balance between convergence quality and computation time.

Auto

Strategy for handling multiple classes. Affects both model architecture and training:

  • Auto: Automatically choose based on data and solver
  • Ovr (One-vs-Rest):
    • Trains N binary classifiers
    • Memory efficient
    • Works with all solvers
    • Good for imbalanced classes
  • Multinomial:
    • Single model for all classes
    • More accurate probability estimates
    • Only with newton-cg, sag, saga, lbfgs
    • Better when classes are balanced
Auto ~
Ovr ~
MultiNomial ~
-1

ElasticNet mixing parameter [0, 1]. Controls L1 vs L2 penalty mix:

  • 0.0: Pure L2 regularization
  • 1.0: Pure L1 regularization
  • 0.0-1.0: Mix of both

Only used with the elasticnet penalty. Any value outside the [0, 1] range disables elasticnet.
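
A sketch of using this parameter, assuming the saga solver with the elasticnet penalty as required above (l1_ratio=0.5 is an illustrative value):

    from sklearn.linear_model import LogisticRegression

    clf = LogisticRegression(
        solver="saga",
        penalty="elasticnet",
        l1_ratio=0.5,   # 0.0 = pure L2, 1.0 = pure L1
        C=0.1,
        max_iter=1000,  # saga often needs more iterations to converge
    )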

Exhaustive search over the specified parameter grid to find the optimal model configuration. Similar to sklearn.model_selection.GridSearchCV.

Search process:

  • Tests all parameter combinations
  • Uses cross-validation for each combination
  • Selects best parameters based on scoring metric

Performance considerations:

  • Computation time grows exponentially with parameters
  • Memory usage depends on data size and cv folds
  • Consider RandomizedSearchCV for large parameter spaces

Best practices:

  • Start with broad parameter ranges
  • Refine ranges based on initial results
  • Monitor for overfitting across folds
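
A minimal sketch of this search process with sklearn; the synthetic data and parameter grid values are illustrative:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=300, random_state=0)

    param_grid = {
        "C": [0.001, 0.01, 0.1, 1, 10],  # broad logarithmic range first
        "penalty": ["l1", "l2"],
        "solver": ["liblinear"],          # liblinear supports both L1 and L2
    }
    search = GridSearchCV(LogisticRegression(max_iter=500), param_grid, cv=3, scoring="accuracy")
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))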

Penalty

[enum, ...]
L2

Regularization penalty type controlling model complexity and feature selection:

Selection impact:

  • Controls overfitting prevention
  • Influences feature selection
  • Affects model sparsity
  • Determines solution stability

Common use cases:

  • L2: General purpose, dense features
  • L1: Feature selection, sparse solutions
  • ElasticNet: Combined benefits of L1 and L2
  • None: When regularization not needed

Mathematical form (penalty term added to the loss, with weights w and mixing ratio ρ):

  • L1: ‖w‖₁
  • L2: ½ ‖w‖₂²
  • ElasticNet: ρ ‖w‖₁ + (1 − ρ)/2 ‖w‖₂²

None ~

No regularization penalty applied

Characteristics:

  • Uncontrolled model complexity
  • Maximum flexibility
  • Risk of overfitting
  • Full parameter range

Best for:

  • Very small datasets
  • Theoretical analysis
  • When bias undesirable
  • Testing/debugging purposes

Warning: Use with caution as it may lead to overfitting

L1 ~

L1 penalty (Lasso):

Characteristics:

  • Absolute magnitude penalty
  • Produces sparse solutions
  • Feature selection capability
  • Path algorithms possible

Best for:

  • Feature selection
  • High-dimensional data
  • When sparse solutions desired
  • Eliminating irrelevant features
L2 ~

L2 penalty (Ridge):

Characteristics:

  • Squared magnitude penalty
  • Shrinks all weights toward zero
  • Handles correlated features well
  • Produces dense solutions

Best for:

  • Most classification tasks
  • When all features potentially relevant
  • Dealing with multicollinearity
  • Stable solutions needed
ElasticNet ~

ElasticNet penalty:

Characteristics:

  • Combines L1 and L2 penalties
  • Controls sparsity via ratio
  • Group selection capability
  • More stable than pure L1

Best for:

  • Correlated features
  • When both sparsity and stability needed
  • Group feature selection
  • Balanced regularization
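
As a sketch of the sparsity difference between the penalties above (synthetic data, so the exact coefficient counts will vary):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=30, n_informative=5, random_state=0)

    l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    l2 = LogisticRegression(penalty="l2", solver="liblinear", C=0.1).fit(X, y)

    print("non-zero L1 coefs:", np.sum(l1.coef_ != 0))  # typically far fewer than 30
    print("non-zero L2 coefs:", np.sum(l2.coef_ != 0))  # typically all 30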

Solver

[enum, ...]
Lbfgs

Optimization algorithm for minimizing the logistic regression cost function:

Mathematical objective (binary case with L2 penalty): min over w, b of ½‖w‖₂² + C Σᵢ log(1 + exp(−yᵢ(xᵢᵀw + b)))

Selection criteria:

  1. Dataset characteristics:

    • Sample size (n_samples)
    • Feature count (n_features)
    • Sparsity level
  2. Memory constraints:

    • Low: liblinear, saga
    • Medium: lbfgs, newton-cg
    • High: newton-cholesky
  3. Problem type:

    • Binary classification
    • Multiclass problems
    • Different regularizations

Performance considerations:

  • Convergence speed
  • Memory usage
  • Numerical stability
  • Scalability
Lbfgs ~

Limited-memory BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm:

Characteristics:

  • Quasi-Newton method
  • Memory efficient
  • Good convergence
  • Handles many features

Best for:

  • Medium to large datasets
  • High-dimensional data
  • When memory limited

Supports: L2 regularization, multinomial

LibLinear ~

Coordinate Descent algorithm from LIBLINEAR library:

Characteristics:

  • Highly optimized
  • Memory efficient
  • Fast for small datasets
  • Supports L1 regularization

Best for:

  • Small to medium datasets
  • Binary classification
  • Sparse features

Supports: L1/L2 regularization, one-vs-rest

NewtonCg ~

Newton-Conjugate Gradient algorithm:

Characteristics:

  • Second-order method
  • Accurate convergence
  • Handles large features
  • More iterations needed

Best for:

  • When accuracy critical
  • High-dimensional data
  • Multinomial problems

Supports: L2 regularization, multinomial

NewtonCholesky ~

Newton method using Cholesky decomposition:

Characteristics:

  • Direct second-order method
  • Very fast convergence
  • High memory usage
  • Numerically stable

Best for:

  • Small to medium features
  • Dense data
  • When memory available

Supports: L2 regularization, binary only

Sag ~

Stochastic Average Gradient descent:

Characteristics:

  • Fast convergence
  • Linear memory usage
  • Efficient for large samples
  • Smooth optimization

Best for:

  • Large datasets
  • Online learning
  • L2 regularization

Supports: L2 regularization, multinomial

Saga ~

SAGA (Stochastic Average Gradient descent variant):

Characteristics:

  • Unbiased SAG variant
  • Supports all penalties
  • Good convergence
  • Memory efficient

Best for:

  • Large datasets
  • Sparse data
  • L1 regularization

Supports: All regularizations, multinomial

CFactor

[f64, ...]
1

Regularization strength values to test. Recommended ranges:

  • Logarithmic scale: [0.001, 0.01, 0.1, 1, 10, 100]
  • Fine-tuning: [0.1, 0.2, 0.5, 1.0]

Smaller values = stronger regularization. Tips:

  • Start with broad range
  • Refine around best values
  • Consider dataset size in selection

Dual

[bool, ...]
false

Dual formulation options to test. Considerations:

  • [false]: Standard for n_samples > n_features
  • [true, false]: When optimal formulation unclear

Only relevant for liblinear solver with L2 penalty.

Tolerance

[f64, ...]
0.0001

Convergence tolerance values to test. Typical ranges:

  • Coarse: [1e-2, 1e-3, 1e-4]
  • Fine: [1e-4, 1e-5, 1e-6]

Trade-off:

  • Lower values: Better precision, slower convergence
  • Higher values: Faster training, less precise

Synthetic feature scaling for intercept. Relevant when:

  • Using liblinear solver
  • fit_intercept is true

Typical values: 1.0 (default) to 100.0. Increase if the model has trouble learning the intercept.

ClassWeights

[enum, ...]
Balanced

Class weight adjustment strategy for handling imbalanced datasets:

Mathematical form:

  • Balanced: w_i = n_samples / (n_classes × n_i)
  • Uniform: w_i = 1 for all classes

Impact on model:

  • Affects class importance during training
  • Influences decision boundary placement
  • Controls misclassification penalties
  • Balances precision vs recall trade-off

Selection criteria:

  • Class distribution in data
  • Cost of different error types
  • Business/domain requirements
  • Performance metrics priorities
Uniform ~

Equal weights for all classes:

Characteristics:

  • No adjustment for class frequencies
  • Natural class proportions preserved
  • Faster training process
  • Original data distribution maintained

Best for:

  • Balanced datasets (similar class frequencies)
  • When natural proportions matter
  • Representative sampling
  • When all errors equally costly

Warning: May underperform on imbalanced data

Balanced ~

Weights inversely proportional to class frequencies:

Formula: w_i = n_samples / (n_classes × n_i)

where:

  • w_i is the weight for class i
  • n_samples is the total number of samples
  • n_classes is the number of classes
  • n_i is the number of samples in class i

Properties:

  • Automatic weight adjustment
  • Compensates class imbalance
  • Balanced error contribution
  • Frequency-based weights

Best for:

  • Imbalanced datasets
  • Minority class importance
  • Skewed distributions
  • Fair classification needs

Note: May increase sensitivity to noise in rare classes

false

When set to true, reuse the solution of the previous call to fit as initialization for the coefficients; otherwise, erase the previous solution and fit from scratch. Effects:

  • true: Faster for multiple similar fits
  • false: Fresh start each time

Useful for:

  • Path algorithms
  • Incremental learning

Random seed for reproducibility. Ensures:

  • Consistent cross-validation splits
  • Reproducible solver behavior
  • Comparable grid search results

Essential for research and debugging. Used when the solver is sag, saga, or liblinear to shuffle the data.

MaxIter

[u64, ...]
100

Maximum iterations for each fit. Common ranges:

  • Standard: [100, 200, 500]
  • Extended: [100, 500, 1000, 2000]

Tips:

  • Include larger values if seeing non-convergence
  • Consider computation budget
  • Monitor convergence warnings

MultiClass

[enum, ...]
Auto

Strategy for handling multiple classes. Affects both model architecture and training:

  • Auto: Automatically choose based on data and solver
  • Ovr (One-vs-Rest):
    • Trains N binary classifiers
    • Memory efficient
    • Works with all solvers
    • Good for imbalanced classes
  • Multinomial:
    • Single model for all classes
    • More accurate probability estimates
    • Only with newton-cg, sag, saga, lbfgs
    • Better when classes are balanced
Auto ~
Ovr ~
MultiNomial ~

L1Ratio

[f64, ...]
-1

ElasticNet mixing parameter values. Common patterns:

  • Broad search: [0.0, 0.25, 0.5, 0.75, 1.0]
  • Fine-tuning: [0.1, 0.2, 0.3, 0.4]

Only relevant when using the elasticnet penalty. Any value outside the [0, 1] range disables elasticnet.

Accuracy

Metric for evaluating model performance during training and validation:

  • Default: Uses estimator's built-in scoring
  • Accuracy: Proportion of correct predictions
  • BalancedAccuracy: Arithmetic mean of recall for each class
  • LogLoss: Negative log-likelihood of true labels
  • RocAuc: Area under ROC curve (threshold-independent)

Choose based on:

  • Class balance (balanced_accuracy for imbalanced)
  • Need for probability calibration (log_loss)
  • Binary vs multiclass (roc_auc for binary)
Default ~
Accuracy ~
BalancedAccuracy ~
LogLoss ~
RocAuc ~
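
A sketch of how these choices map onto sklearn scoring strings in cross-validation; the mapping of names is an assumption based on the descriptions above, and the imbalanced synthetic data is illustrative:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)

    # Accuracy -> "accuracy", BalancedAccuracy -> "balanced_accuracy",
    # LogLoss -> "neg_log_loss", RocAuc -> "roc_auc"
    scores = cross_val_score(LogisticRegression(max_iter=500), X, y,
                             cv=5, scoring="balanced_accuracy")
    print(scores.mean())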

Split

oneof
DefaultSplit

Standard train-test split configuration optimized for general classification tasks.

Configuration:

  • Test size: 20% (0.2)
  • Random seed: 98
  • Shuffling: Enabled
  • Stratification: Based on target distribution

Advantages:

  • Preserves class distribution
  • Provides reliable validation
  • Suitable for most datasets

Best for:

  • Medium to large datasets
  • Independent observations
  • Initial model evaluation

Splitting uses the ShuffleSplit or StratifiedShuffleSplit strategy, depending on the stratified field. Note: if shuffle is false, stratified must also be false.

Configurable train-test split parameters for specialized requirements. Allows fine-tuning of data division strategy for specific use cases or constraints.

Use cases:

  • Time series data
  • Grouped observations
  • Specific train/test ratios
  • Custom validation schemes

Random seed for reproducible splits. Ensures:

  • Consistent train/test sets
  • Reproducible experiments
  • Comparable model evaluations

Same seed guarantees identical splits across runs.

true

Data shuffling before splitting. Effects:

  • true: Randomizes order, better for i.i.d. data
  • false: Maintains order, important for time series

When to disable:

  • Time dependent data
  • Sequential patterns
  • Grouped observations
0.8

Proportion of data for training. Considerations:

  • Larger (e.g., 0.8-0.9): Better model learning
  • Smaller (e.g., 0.5-0.7): Better validation

Common splits:

  • 0.8: Standard (80/20 split)
  • 0.7: More validation emphasis
  • 0.9: More training emphasis
false

Maintain class distribution in splits. Important when:

  • Classes are imbalanced
  • Small classes present
  • Representative splits needed

Requirements:

  • Classification tasks only
  • Cannot use with shuffle=false
  • Sufficient samples per class
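
A sketch of these split parameters using sklearn's train_test_split; the defaults above are an 80/20 shuffled split, with stratification shown here as an option (synthetic data for illustration):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=200, random_state=0)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y,
        train_size=0.8,   # 80/20 split
        shuffle=True,     # randomize order before splitting
        stratify=y,       # preserve class distribution (requires shuffle=True)
        random_state=98,  # reproducible split
    )
    print(len(X_train), len(X_test))  # 160 40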

Cv

oneof
DefaultCv

Standard cross-validation configuration using stratified 3-fold splitting.

Configuration:

  • Folds: 3
  • Method: StratifiedKFold
  • Stratification: Preserves class proportions

Advantages:

  • Balanced evaluation
  • Reasonable computation time
  • Good for medium-sized datasets

Limitations:

  • May be insufficient for small datasets
  • Higher variance than larger fold counts
  • May miss some data patterns

Configurable stratified k-fold cross-validation for specific validation requirements.

Features:

  • Adjustable fold count with NFolds determining the number of splits.
  • Stratified sampling
  • Preserved class distributions

Use cases:

  • Small datasets (more folds)
  • Large datasets (fewer folds)
  • Detailed model evaluation
  • Robust performance estimation
3

Number of cross-validation folds. Guidelines:

  • 3-5: Large datasets, faster training
  • 5-10: Standard choice, good balance
  • 10+: Small datasets, thorough evaluation

Trade-offs:

  • More folds: Better evaluation, slower training
  • Fewer folds: Faster training, higher variance

Must be at least 2.
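
A sketch of this stratified k-fold setup with sklearn, using the 3-fold default described above (synthetic data for illustration):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=300, random_state=0)

    cv = StratifiedKFold(n_splits=3)  # preserves class proportions in each fold
    scores = cross_val_score(LogisticRegression(max_iter=500), X, y, cv=cv)
    print(scores)  # one score per fold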

K-fold cross-validation without stratification. Divides data into k consecutive folds for iterative validation.

Process:

  • Splits data into k equal parts
  • Each fold serves as validation once
  • Remaining k-1 folds form training set

Use cases:

  • Regression problems
  • Large, balanced datasets
  • When stratification unnecessary
  • Continuous target variables

Limitations:

  • May not preserve class distributions
  • Less suitable for imbalanced data
  • Can create biased splits with ordered data

Number of folds for cross-validation. Recommended values:

  • 5: Standard choice (default)
  • 3: Large datasets/quick evaluation
  • 10: Thorough evaluation/smaller datasets

Trade-offs:

  • Higher values: More thorough, computationally expensive
  • Lower values: Faster, potentially higher variance

Must be at least 2 for valid cross-validation.

Random seed for fold generation when shuffling. Important for:

  • Reproducible results
  • Consistent fold assignments
  • Benchmark comparisons
  • Debugging and validation

Set specific value for reproducibility across runs.

true

Whether to shuffle data before splitting into folds. Effects:

  • true: Randomized fold composition (recommended)
  • false: Sequential splitting

Enable when:

  • Data may have ordering
  • Better fold independence needed

Disable for:

  • Time series data
  • Ordered observations

Stratified K-fold cross-validation maintaining class proportions across folds.

Key features:

  • Preserves class distribution in each fold
  • Handles imbalanced datasets
  • Ensures representative splits

Best for:

  • Classification problems
  • Imbalanced class distributions
  • When class proportions matter

Requirements:

  • Classification tasks only
  • Sufficient samples per class
  • Categorical target variable

Number of stratified folds. Typical values:

  • 5: Standard for most cases
  • 3: Quick evaluation/large datasets
  • 10: Detailed evaluation/smaller datasets

Considerations:

  • Must allow sufficient samples per class per fold
  • Balance between stability and computation time
  • Consider smallest class size when choosing

Seed for reproducible stratified splits. Ensures:

  • Consistent fold assignments
  • Reproducible results
  • Comparable experiments
  • Systematic validation

Fixed seed guarantees identical stratified splits.

false

Data shuffling before stratified splitting. Impact:

  • true: Randomizes while maintaining stratification
  • false: Maintains data order within strata

Use cases:

  • true: Independent observations
  • false: Grouped or sequential data

Class proportions maintained regardless of setting.

Random permutation cross-validator with independent sampling.

Characteristics:

  • Random sampling for each split
  • Independent train/test sets
  • More flexible than K-fold
  • Can have overlapping test sets

Advantages:

  • Control over test size
  • Fresh splits each iteration
  • Good for large datasets

Limitations:

  • Some samples might never be tested
  • Others might be tested multiple times
  • No guarantee of complete coverage

Number of random splits to perform. Common values:

  • 5: Standard evaluation
  • 10: More thorough assessment
  • 3: Quick estimates

Trade-offs:

  • More splits: Better estimation, longer runtime
  • Fewer splits: Faster, less stable estimates

Balance between computation and stability.

Random seed for reproducible shuffling. Controls:

  • Split randomization
  • Sample selection
  • Result reproducibility

Important for:

  • Debugging
  • Comparative studies
  • Result verification
0.2

Proportion of samples for the test set. Common ratios:

  • 0.2: Standard (80/20 split)
  • 0.25: More validation emphasis
  • 0.1: More training data

Considerations:

  • Dataset size
  • Model complexity
  • Validation requirements

It must be between 0.0 and 1.0.

Stratified random permutation cross-validator combining shuffle-split with stratification.

Features:

  • Maintains class proportions
  • Random sampling within strata
  • Independent splits
  • Flexible test size

Ideal for:

  • Imbalanced datasets
  • Large-scale problems
  • When class distributions matter
  • Flexible validation schemes

Number of stratified random splits. Recommended values:

  • 5: Standard evaluation
  • 10: Detailed analysis
  • 3: Quick assessment

Consider:

  • Sample size per class
  • Computational resources
  • Stability requirements

Seed for reproducible stratified sampling. Ensures:

  • Consistent class proportions
  • Reproducible splits
  • Comparable experiments

Critical for:

  • Benchmarking
  • Research studies
  • Quality assurance
0.2

Fraction of samples for the stratified test set. Common splits:

  • 0.2: Balanced evaluation
  • 0.3: More thorough testing
  • 0.15: Preserve training size

Consider:

  • Minority class size
  • Overall dataset size
  • Validation objectives

It must be between 0.0 and 1.0.

Time series cross-validator. Provides train/test indices to split time-series samples that are observed at fixed time intervals. It is a variation of k-fold: the first k folds form the training set and the (k+1)-th fold is the test set. Unlike standard cross-validation methods, successive training sets are supersets of those that come before them; all surplus data is added to the first training partition, which is always used to train the model. Key features:

  • Maintains temporal dependence
  • Expanding window approach
  • Forward-chaining splits
  • No future data leakage

Use cases:

  • Sequential data
  • Financial forecasting
  • Temporal predictions
  • Time-dependent patterns

Note: Training sets are supersets of previous iterations.
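
A sketch of this expanding-window behavior with sklearn's TimeSeriesSplit; the sample count is illustrative:

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    X = np.arange(10).reshape(-1, 1)  # 10 samples in time order

    for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
        print("train:", train_idx, "test:", test_idx)
    # train: [0 1 2 3]          test: [4 5]
    # train: [0 1 2 3 4 5]      test: [6 7]
    # train: [0 1 2 3 4 5 6 7]  test: [8 9]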

Number of temporal splits. Typical values:

  • 5: Standard forward chaining
  • 3: Limited historical data
  • 10: Long time series

Impact:

  • Affects training window growth
  • Determines validation points
  • Influences computational load

Maximum size of training set. Should be strictly less than the number of samples. Applications:

  • 0: Use all available past data
  • >0: Rolling window of fixed size

Use cases:

  • Limit historical relevance
  • Control computational cost
  • Handle concept drift
  • Memory constraints

Number of samples in each test set. When 0:

  • Auto-calculated as n_samples/(n_splits+1)
  • Ensures equal-sized test sets

Considerations:

  • Forecast horizon
  • Validation requirements
  • Available future data

Gap

u64
0

Number of samples to exclude from the end of each training set before the test set (the gap between train and test sets). Uses:

  • Avoid data leakage
  • Model forecast lag
  • Buffer periods

Common scenarios:

  • 0: Continuous prediction
  • >0: Forward gap for realistic evaluation
  • Match business forecasting needs
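
A small sketch of the gap parameter with sklearn's TimeSeriesSplit, excluding the sample just before each test window (sample count and gap value are illustrative):

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    X = np.arange(10).reshape(-1, 1)

    for train_idx, test_idx in TimeSeriesSplit(n_splits=3, gap=1).split(X):
        print("train:", train_idx, "test:", test_idx)
    # train: [0 1 2]          test: [4 5]
    # train: [0 1 2 3 4]      test: [6 7]
    # train: [0 1 2 3 4 5 6]  test: [8 9]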