LogisticRegression / Classifier Layer
Logistic Regression Classifier - a fundamental classification algorithm that models probability of discrete outcomes. Similar to sklearn.linear_model.LogisticRegression.
Mathematical form: $P(y=1 \mid x) = \frac{1}{1 + e^{-(w^\top x + b)}}$, where $w$ are the model weights and $x$ is the feature vector.
Key characteristics:
- Linear decision boundary
- Probabilistic predictions
- Binary and multiclass support
- Multiple regularization options
- Efficient for sparse data
Common applications:
- Risk assessment (credit scoring, medical diagnosis)
- Customer behavior prediction
- Email spam detection
- Binary text classification
- Marketing response prediction
- Quality control pass/fail
Outputs:
- Predicted Table: Input data with added prediction columns
- Validation Results: Cross-validation metrics on training data
- Test Metric: Performance metrics on test set
- ROC Curve Data: True/False positive rates for threshold tuning
- Confusion Matrix: Detailed classification performance breakdown
- Feature Importances: Coefficient magnitudes showing feature relevance
Note: Works best with preprocessed, standardized numerical features.
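As a rough sketch of the equivalent scikit-learn workflow (the synthetic dataset, the scaling step, and the variable names are illustrative, not part of this component):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the node's input table
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X = StandardScaler().fit_transform(X)  # standardized features work best

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X[:5])   # probabilistic predictions
importance = np.abs(clf.coef_[0])  # coefficient magnitudes as feature relevance
```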
SelectFeatures
[column, ...]Feature columns for model training. Selection tips:
- Include relevant predictive variables
- Avoid highly correlated features
- Consider feature scaling requirements
- Balance between information and model complexity
Best practices:
- Remove redundant features
- Include domain-important variables
- Consider feature interactions
- Check for multicollinearity
If empty, uses all numeric columns except target.
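A hypothetical illustration of that fallback rule plus a quick multicollinearity check, assuming a pandas DataFrame `df` with a target column named `label` (both placeholders):

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 32, 47, 51], "income": [40e3, 52e3, 71e3, 66e3],
                   "city": ["a", "b", "a", "b"], "label": [0, 1, 1, 0]})

# All numeric columns except the target become features
feature_cols = df.select_dtypes(include="number").columns.drop("label")

# Inspect pairwise correlations to spot highly correlated features
print(df[feature_cols].corr())
```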
SelectTarget
columnTarget column for classification. Requirements:
- Categorical or numeric labels
- At least two unique classes
- No missing values
Preprocessing tips:
- Encode categorical labels
- Check class balance
- Consider label noise
- Verify label consistency
Params
oneofStandard parameter configuration optimized for general-purpose classification tasks.
Default configuration:
- L2 regularization (prevents overfitting)
- LBFGS solver (memory-efficient quasi-Newton method)
- Balanced class weights (handles imbalanced datasets)
- C=0.1 (moderate regularization strength)
- 100 max iterations
Best suited for:
- Medium-sized datasets
- Relatively balanced classes
- When feature scaling is applied
- Initial model exploration
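The default configuration expressed as scikit-learn arguments, as a sketch (the node's internal wiring may differ):

```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(
    penalty="l2",             # L2 regularization (prevents overfitting)
    solver="lbfgs",           # memory-efficient quasi-Newton method
    class_weight="balanced",  # handles imbalanced datasets
    C=0.1,                    # moderate regularization strength
    max_iter=100,
)
```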
Configurable parameters for fine-tuning the logistic regression model. Allows detailed control over regularization, optimization, and convergence behavior. Essential for adapting the model to specific data characteristics and performance requirements.
SolverPenalty
enumThe combination of solver and penalty to use. Each combination offers different trade-offs:
Performance characteristics:
- Small datasets (n < 10k): Choose liblinear
- Large datasets: Prefer sag or saga
- Many features: newton-cg or lbfgs
- Sparse data: saga with L1
Memory usage:
- Low memory: liblinear, saga
- Medium: lbfgs, newton-cg
- High: newton-cholesky (quadratic in features)
Multiclass support:
- Full support: newton-cg, sag, saga, lbfgs
- Binary only: newton-cholesky
- One-vs-rest only: liblinear
L-BFGS with no penalty
L-BFGS with L2 penalty
Liblinear with L1 penalty
Liblinear with L2 penalty
Newton-CG with no penalty
Newton-CG with L2 penalty
Newton-Cholesky with no penalty
Newton-Cholesky with L2 penalty
SAG with no penalty
SAG with L2 penalty
SAGA with no penalty
SAGA with L2 penalty
SAGA with L1 penalty
SAGA with ElasticNet penalty
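A small sketch of how these compatibility rules surface in scikit-learn: a supported pair fits normally, while an unsupported pair raises a ValueError at fit time (the data here is a toy placeholder):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.rand(20, 3)
y = np.array([0, 1] * 10)

LogisticRegression(solver="saga", penalty="l1", max_iter=500).fit(X, y)  # supported
try:
    LogisticRegression(solver="lbfgs", penalty="l1").fit(X, y)  # unsupported pair
except ValueError as err:
    print(err)
```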
Dual
boolOptimization formulation selection. Can be used only for liblinear solver with L2 penalty. Dual formulation is advantageous when n_samples < n_features. Primal (false) is preferred when n_samples > n_features. Automatically ignored if incompatible with solver.
Tolerance
f64Convergence criterion threshold. Controls optimization precision:
- Smaller values: More precise but slower convergence
- Larger values: Faster but potentially less optimal solution
Typical range: 1e-6 to 1e-3. Adjust based on precision needs.
CFactor
f64Inverse regularization strength (C parameter). Controls model complexity:
- Small values (<1): Stronger regularization, simpler model
- Large values (>1): Weaker regularization, more complex model
Common ranges:
- 0.001 to 0.1: High regularization (noisy data)
- 0.1 to 1: Moderate regularization (typical)
- 1 to 10: Low regularization (clean data)
Must be positive. Scale often requires adjustment with dataset size.
FitIntercept
boolWhether to include bias term (intercept). Effects:
- true: Model can learn offset from origin (recommended)
- false: Decision boundary through origin
Set false only when data is pre-centered or for theoretical analysis.
InterceptScaling
f64Scaling factor for the synthetic intercept feature. Only used with the liblinear solver and fit_intercept=true. Effects:
- Larger values: More emphasis on intercept fitting
- Smaller values: Less emphasis on intercept
Useful for handling numerical stability in specific cases.
ClassWeights
enumClass weight adjustment strategy for handling imbalanced datasets:
Mathematical form:
- Balanced: $w_i = \frac{n}{k \, n_i}$
- Uniform: $w_i = 1$ for all classes
Impact on model:
- Affects class importance during training
- Influences decision boundary placement
- Controls misclassification penalties
- Balances precision vs recall trade-off
Selection criteria:
- Class distribution in data
- Cost of different error types
- Business/domain requirements
- Performance metrics priorities
Equal weights for all classes:
Characteristics:
- No adjustment for class frequencies
- Natural class proportions preserved
- Faster training process
- Original data distribution maintained
Best for:
- Balanced datasets (similar class frequencies)
- When natural proportions matter
- Representative sampling
- When all errors equally costly
Warning: May underperform on imbalanced data
Weights inversely proportional to class frequencies:
Formula: $w_i = \frac{n}{k \, n_i}$
where:
- $w_i$ is the weight for class $i$
- $n$ is the total number of samples
- $k$ is the number of classes
- $n_i$ is the number of samples in class $i$
Properties:
- Automatic weight adjustment
- Compensates class imbalance
- Balanced error contribution
- Frequency-based weights
Best for:
- Imbalanced datasets
- Minority class importance
- Skewed distributions
- Fair classification needs
Note: May increase sensitivity to noise in rare classes
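The balanced rule can be verified by hand against scikit-learn's helper; the label array below is an illustrative 9:1 imbalance:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 90 + [1] * 10)
classes, counts = np.unique(y, return_counts=True)

manual = len(y) / (len(classes) * counts)  # w_i = n / (k * n_i) -> [0.556, 5.0]
auto = compute_class_weight("balanced", classes=classes, y=y)
assert np.allclose(manual, auto)
```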
RandomState
u64Random number generator seed. Used for:
- Reproducible results with stochastic solvers (sag, saga, liblinear)
- Consistent data shuffling
- Benchmark comparisons
Same seed guarantees identical results across runs.
MaxIter
u64Maximum number of solver iterations. Consider:
- Increase if model warns about non-convergence
- Typical ranges: 100-500 for simple problems, 500-1000 for complex or large datasets, 1000+ for difficult convergence cases
Balance between convergence quality and computation time.
MultiClass
enumStrategy for handling multiple classes. Affects both model architecture and training:
- Auto: Automatically choose based on data and solver
- Ovr (One-vs-Rest):
- Trains N binary classifiers
- Memory efficient
- Works with all solvers
- Good for imbalanced classes
- Multinomial:
- Single model for all classes
- More accurate probability estimates
- Only with newton-cg, sag, saga, lbfgs
- Better when classes are balanced
L1Ratio
f64ElasticNet mixing parameter [0, 1]. Controls L1 vs L2 penalty mix:
- 0.0: Pure L2 regularization
- 1.0: Pure L1 regularization
- 0.0-1.0: Mix of both
Only used with the elasticnet penalty. Values outside [0, 1] disable elasticnet.
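A minimal sketch of the corresponding scikit-learn setup; elasticnet is only accepted together with the saga solver:

```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(
    solver="saga",
    penalty="elasticnet",
    l1_ratio=0.5,    # even mix of L1 and L2
    max_iter=1000,   # saga often needs extra iterations to converge
)
```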
Exhaustive search over specified parameter grid to find optimal model configuration. Similar to sklearn.model_selection.GridSearchCV.
Search process:
- Tests all parameter combinations
- Uses cross-validation for each combination
- Selects best parameters based on scoring metric
Performance considerations:
- Computation time grows exponentially with parameters
- Memory usage depends on data size and cv folds
- Consider RandomizedSearchCV for large parameter spaces
Best practices:
- Start with broad parameter ranges
- Refine ranges based on initial results
- Monitor for overfitting across folds
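A sketch of the equivalent scikit-learn grid search; the grid values are illustrative, not this node's defaults:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)
param_grid = {
    "C": [0.001, 0.01, 0.1, 1, 10],  # broad logarithmic range first
    "penalty": ["l1", "l2"],
    "solver": ["liblinear"],         # supports both penalties above
}
search = GridSearchCV(LogisticRegression(max_iter=500), param_grid,
                      cv=3, scoring="roc_auc")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```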
Penalty
[enum, ...]Regularization penalty type controlling model complexity and feature selection:
Selection impact:
- Controls overfitting prevention
- Influences feature selection
- Affects model sparsity
- Determines solution stability
Common use cases:
- L2: General purpose, dense features
- L1: Feature selection, sparse solutions
- ElasticNet: Combined benefits of L1 and L2
- None: When regularization not needed
Mathematical form: the penalty term $r(w)$ added to the loss is $\|w\|_1$ (L1), $\tfrac{1}{2}\|w\|_2^2$ (L2), or $\rho\|w\|_1 + \tfrac{1-\rho}{2}\|w\|_2^2$ (ElasticNet, with mixing ratio $\rho$); None applies no penalty.
No regularization penalty applied
Characteristics:
- Uncontrolled model complexity
- Maximum flexibility
- Risk of overfitting
- Full parameter range
Best for:
- Very small datasets
- Theoretical analysis
- When bias undesirable
- Testing/debugging purposes
Warning: Use with caution as it may lead to overfitting
L1 penalty (Lasso): $r(w) = \|w\|_1$
Characteristics:
- Absolute magnitude penalty
- Produces sparse solutions
- Feature selection capability
- Path algorithms possible
Best for:
- Feature selection
- High-dimensional data
- When sparse solutions desired
- Eliminating irrelevant features
L2 penalty (Ridge): $r(w) = \tfrac{1}{2}\|w\|_2^2$
Characteristics:
- Squared magnitude penalty
- Shrinks all weights toward zero
- Handles correlated features well
- Produces dense solutions
Best for:
- Most classification tasks
- When all features potentially relevant
- Dealing with multicollinearity
- Stable solutions needed
ElasticNet penalty: $r(w) = \rho\|w\|_1 + \tfrac{1-\rho}{2}\|w\|_2^2$
Characteristics:
- Combines L1 and L2 penalties
- Controls sparsity via ratio
- Group selection capability
- More stable than pure L1
Best for:
- Correlated features
- When both sparsity and stability needed
- Group feature selection
- Balanced regularization
Solver
[enum, ...]Optimization algorithm for minimizing the logistic regression cost function:
Mathematical objective: $\min_{w,b} \; r(w) + C \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i (w^\top x_i + b)}\right)$, where $r(w)$ is the chosen penalty.
Selection criteria:
- Dataset characteristics: sample size (n_samples), feature count (n_features), sparsity level
- Memory constraints: low (liblinear, saga), medium (lbfgs, newton-cg), high (newton-cholesky)
- Problem type: binary classification, multiclass problems, different regularizations
Performance considerations:
- Convergence speed
- Memory usage
- Numerical stability
- Scalability
Limited-memory BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm:
Characteristics:
- Quasi-Newton method
- Memory efficient
- Good convergence
- Handles many features
Best for:
- Medium to large datasets
- High-dimensional data
- When memory limited
Supports: L2 regularization, multinomial
Coordinate Descent algorithm from LIBLINEAR library:
Characteristics:
- Highly optimized
- Memory efficient
- Fast for small datasets
- Supports L1 regularization
Best for:
- Small to medium datasets
- Binary classification
- Sparse features
Supports: L1/L2 regularization, one-vs-rest
Newton-Conjugate Gradient algorithm:
Characteristics:
- Second-order method
- Accurate convergence
- Handles large features
- More iterations needed
Best for:
- When accuracy critical
- High-dimensional data
- Multinomial problems
Supports: L2 regularization, multinomial
Newton method using Cholesky decomposition:
Characteristics:
- Direct second-order method
- Very fast convergence
- High memory usage
- Numerically stable
Best for:
- Small to medium features
- Dense data
- When memory available
Supports: L2 regularization, binary only
Stochastic Average Gradient descent:
Characteristics:
- Fast convergence
- Linear memory usage
- Efficient for large samples
- Smooth optimization
Best for:
- Large datasets
- Online learning
- L2 regularization
Supports: L2 regularization, multinomial
SAGA (Stochastic Average Gradient descent variant):
Characteristics:
- Unbiased SAG variant
- Supports all penalties
- Good convergence
- Memory efficient
Best for:
- Large datasets
- Sparse data
- L1 regularization
Supports: All regularizations, multinomial
CFactor
[f64, ...]Regularization strength values to test. Recommended ranges:
- Logarithmic scale: [0.001, 0.01, 0.1, 1, 10, 100]
- Fine-tuning: [0.1, 0.2, 0.5, 1.0]
Smaller values = stronger regularization. Tips:
- Start with broad range
- Refine around best values
- Consider dataset size in selection
Dual
[bool, ...]Dual formulation options to test. Considerations:
- [false]: Standard for n_samples > n_features
- [true, false]: When optimal formulation unclear
Only relevant for liblinear solver with L2 penalty.
Tolerance
[f64, ...]Convergence tolerance values to test. Typical ranges:
- Coarse: [1e-2, 1e-3, 1e-4]
- Fine: [1e-4, 1e-5, 1e-6]
Trade-off:
- Lower values: Better precision, slower convergence
- Higher values: Faster training, less precise
InterceptScaling
[f64, ...]Synthetic feature scaling values for the intercept. Relevant when:
- Using liblinear solver
- fit_intercept is true
Typical values: 1.0 (default) to 100.0. Increase if the model has trouble learning the intercept.
ClassWeights
[enum, ...]Class weight adjustment strategy for handling imbalanced datasets:
Mathematical form:
- Balanced: $w_i = \frac{n}{k \, n_i}$
- Uniform: $w_i = 1$ for all classes
Impact on model:
- Affects class importance during training
- Influences decision boundary placement
- Controls misclassification penalties
- Balances precision vs recall trade-off
Selection criteria:
- Class distribution in data
- Cost of different error types
- Business/domain requirements
- Performance metrics priorities
Equal weights for all classes:
Characteristics:
- No adjustment for class frequencies
- Natural class proportions preserved
- Faster training process
- Original data distribution maintained
Best for:
- Balanced datasets (similar class frequencies)
- When natural proportions matter
- Representative sampling
- When all errors equally costly
Warning: May underperform on imbalanced data
Weights inversely proportional to class frequencies:
Formula: $w_i = \frac{n}{k \, n_i}$
where:
- $w_i$ is the weight for class $i$
- $n$ is the total number of samples
- $k$ is the number of classes
- $n_i$ is the number of samples in class $i$
Properties:
- Automatic weight adjustment
- Compensates class imbalance
- Balanced error contribution
- Frequency-based weights
Best for:
- Imbalanced datasets
- Minority class importance
- Skewed distributions
- Fair classification needs
Note: May increase sensitivity to noise in rare classes
WarmStart
boolWhen set to true, reuses the solution of the previous call to fit as initialization for the next fit; otherwise, the previous solution is erased. Effects:
- true: Faster for multiple similar fits
- false: Fresh start each time
Useful for:
- Path algorithms
- Incremental learning
RandomState
u64Random seed for reproducibility. Ensures:
- Consistent cross-validation splits
- Reproducible solver behavior
- Comparable grid search results
Essential for research and debugging. Used when solver is sag, saga, or liblinear to shuffle the data.
MaxIter
[u64, ...]Maximum iterations for each fit. Common ranges:
- Standard: [100, 200, 500]
- Extended: [100, 500, 1000, 2000]
Tips:
- Include larger values if seeing non-convergence
- Consider computation budget
- Monitor convergence warnings
MultiClass
[enum, ...]Strategy for handling multiple classes. Affects both model architecture and training:
- Auto: Automatically choose based on data and solver
- Ovr (One-vs-Rest):
- Trains N binary classifiers
- Memory efficient
- Works with all solvers
- Good for imbalanced classes
- Multinomial:
- Single model for all classes
- More accurate probability estimates
- Only with newton-cg, sag, saga, lbfgs
- Better when classes are balanced
L1Ratio
[f64, ...]ElasticNet mixing parameter values. Common patterns:
- Broad search: [0.0, 0.25, 0.5, 0.75, 1.0]
- Fine-tuning: [0.1, 0.2, 0.3, 0.4]
Only relevant with the elasticnet penalty. Values outside [0, 1] disable elasticnet.
RefitScore
enumMetric for evaluating model performance during training and validation:
- Default: Uses estimator's built-in scoring
- Accuracy: Proportion of correct predictions
- BalancedAccuracy: Arithmetic mean of recall for each class
- LogLoss: Negative log-likelihood of true labels
- RocAuc: Area under ROC curve (threshold-independent)
Choose based on:
- Class balance (balanced_accuracy for imbalanced)
- Need for probability calibration (log_loss)
- Binary vs multiclass (roc_auc for binary)
Split
oneofStandard train-test split configuration optimized for general classification tasks.
Configuration:
- Test size: 20% (0.2)
- Random seed: 98
- Shuffling: Enabled
- Stratification: Based on target distribution
Advantages:
- Preserves class distribution
- Provides reliable validation
- Suitable for most datasets
Best for:
- Medium to large datasets
- Independent observations
- Initial model evaluation
Splitting uses the ShuffleSplit or StratifiedShuffleSplit strategy, depending on the stratified field. Note: if shuffle is false, then stratified must also be false.
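As a sketch, the same default split written with scikit-learn (synthetic data as a stand-in for the input table):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, weights=[0.8], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=98, shuffle=True, stratify=y)
```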
Configurable train-test split parameters for specialized requirements. Allows fine-tuning of data division strategy for specific use cases or constraints.
Use cases:
- Time series data
- Grouped observations
- Specific train/test ratios
- Custom validation schemes
RandomState
u64Random seed for reproducible splits. Ensures:
- Consistent train/test sets
- Reproducible experiments
- Comparable model evaluations
Same seed guarantees identical splits across runs.
Shuffle
boolData shuffling before splitting. Effects:
- true: Randomizes order, better for i.i.d. data
- false: Maintains order, important for time series
When to disable:
- Time dependent data
- Sequential patterns
- Grouped observations
TrainSize
f64Proportion of data for training. Considerations:
- Larger (e.g., 0.8-0.9): Better model learning
- Smaller (e.g., 0.5-0.7): Better validation
Common splits:
- 0.8: Standard (80/20 split)
- 0.7: More validation emphasis
- 0.9: More training emphasis
Stratified
boolMaintain class distribution in splits. Important when:
- Classes are imbalanced
- Small classes present
- Representative splits needed
Requirements:
- Classification tasks only
- Cannot use with shuffle=false
- Sufficient samples per class
Cv
oneofStandard cross-validation configuration using stratified 3-fold splitting.
Configuration:
- Folds: 3
- Method: StratifiedKFold
- Stratification: Preserves class proportions
Advantages:
- Balanced evaluation
- Reasonable computation time
- Good for medium-sized datasets
Limitations:
- May be insufficient for small datasets
- Higher variance than larger fold counts
- May miss some data patterns
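The default scheme as a scikit-learn sketch: stratified 3-fold cross-validation over illustrative synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y,
                         cv=StratifiedKFold(n_splits=3))
print(scores.mean(), scores.std())
```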
Configurable stratified k-fold cross-validation for specific validation requirements.
Features:
- Adjustable fold count, with NFolds determining the number of splits
- Stratified sampling
- Preserved class distributions
Use cases:
- Small datasets (more folds)
- Large datasets (fewer folds)
- Detailed model evaluation
- Robust performance estimation
NFolds
u32Number of cross-validation folds. Guidelines:
- 3-5: Large datasets, faster training
- 5-10: Standard choice, good balance
- 10+: Small datasets, thorough evaluation
Trade-offs:
- More folds: Better evaluation, slower training
- Fewer folds: Faster training, higher variance
Must be at least 2.
K-fold cross-validation without stratification. Divides data into k consecutive folds for iterative validation.
Process:
- Splits data into k equal parts
- Each fold serves as validation once
- Remaining k-1 folds form training set
Use cases:
- Regression problems
- Large, balanced datasets
- When stratification unnecessary
- Continuous target variables
Limitations:
- May not preserve class distributions
- Less suitable for imbalanced data
- Can create biased splits with ordered data
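A minimal sketch of plain k-fold in scikit-learn; each fold serves as the validation set exactly once:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)
for train_idx, test_idx in KFold(n_splits=5).split(X):
    print(train_idx, test_idx)  # consecutive folds unless shuffle=True
```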
NSplits
u32Number of folds for cross-validation. Recommended values:
- 5: Standard choice (default)
- 3: Large datasets/quick evaluation
- 10: Thorough evaluation/smaller datasets
Trade-offs:
- Higher values: More thorough, computationally expensive
- Lower values: Faster, potentially higher variance
Must be at least 2 for valid cross-validation.
RandomState
u64Random seed for fold generation when shuffling. Important for:
- Reproducible results
- Consistent fold assignments
- Benchmark comparisons
- Debugging and validation
Set specific value for reproducibility across runs.
Shuffle
boolWhether to shuffle data before splitting into folds. Effects:
- true: Randomized fold composition (recommended)
- false: Sequential splitting
Enable when:
- Data may have ordering
- Better fold independence needed
Disable for:
- Time series data
- Ordered observations
Stratified K-fold cross-validation maintaining class proportions across folds.
Key features:
- Preserves class distribution in each fold
- Handles imbalanced datasets
- Ensures representative splits
Best for:
- Classification problems
- Imbalanced class distributions
- When class proportions matter
Requirements:
- Classification tasks only
- Sufficient samples per class
- Categorical target variable
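A sketch showing the stratification effect: with a 9:1 class ratio, every test fold keeps the same proportions:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 90 + [1] * 10)
X = np.zeros((100, 1))  # features are irrelevant to the split itself
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for _, test_idx in skf.split(X, y):
    print(np.bincount(y[test_idx]))  # -> [18 2] in every fold
```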
NSplits
u32Number of stratified folds. Typical values:
- 5: Standard for most cases
- 3: Quick evaluation/large datasets
- 10: Detailed evaluation/smaller datasets
Considerations:
- Must allow sufficient samples per class per fold
- Balance between stability and computation time
- Consider smallest class size when choosing
RandomState
u64Seed for reproducible stratified splits. Ensures:
- Consistent fold assignments
- Reproducible results
- Comparable experiments
- Systematic validation
Fixed seed guarantees identical stratified splits.
Shuffle
boolData shuffling before stratified splitting. Impact:
- true: Randomizes while maintaining stratification
- false: Maintains data order within strata
Use cases:
- true: Independent observations
- false: Grouped or sequential data
Class proportions maintained regardless of setting.
Random permutation cross-validator with independent sampling.
Characteristics:
- Random sampling for each split
- Independent train/test sets
- More flexible than K-fold
- Can have overlapping test sets
Advantages:
- Control over test size
- Fresh splits each iteration
- Good for large datasets
Limitations:
- Some samples might never be tested
- Others might be tested multiple times
- No guarantee of complete coverage
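A sketch of ShuffleSplit in scikit-learn; note that test sets can overlap across iterations:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.arange(20).reshape(-1, 1)
ss = ShuffleSplit(n_splits=3, test_size=0.25, random_state=0)
for train_idx, test_idx in ss.split(X):
    print(sorted(test_idx))  # a fresh random sample on each split
```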
NSplits
u32Number of random splits to perform. Common values:
- 5: Standard evaluation
- 10: More thorough assessment
- 3: Quick estimates
Trade-offs:
- More splits: Better estimation, longer runtime
- Fewer splits: Faster, less stable estimates
Balance between computation and stability.
RandomState
u64Random seed for reproducible shuffling. Controls:
- Split randomization
- Sample selection
- Result reproducibility
Important for:
- Debugging
- Comparative studies
- Result verification
TestSize
f64Proportion of samples for the test set. Common ratios:
- 0.2: Standard (80/20 split)
- 0.25: More validation emphasis
- 0.1: More training data
Considerations:
- Dataset size
- Model complexity
- Validation requirements
It must be between 0.0 and 1.0.
Stratified random permutation cross-validator combining shuffle-split with stratification.
Features:
- Maintains class proportions
- Random sampling within strata
- Independent splits
- Flexible test size
Ideal for:
- Imbalanced datasets
- Large-scale problems
- When class distributions matter
- Flexible validation schemes
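A sketch with an 80/20 class ratio: each random test set preserves the proportions of y:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

y = np.array([0] * 80 + [1] * 20)
X = np.zeros((100, 1))
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
for _, test_idx in sss.split(X, y):
    print(np.bincount(y[test_idx]))  # -> [16 4] every time
```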
NSplits
u32Number of stratified random splits. Recommended values:
- 5: Standard evaluation
- 10: Detailed analysis
- 3: Quick assessment
Consider:
- Sample size per class
- Computational resources
- Stability requirements
RandomState
u64Seed for reproducible stratified sampling. Ensures:
- Consistent class proportions
- Reproducible splits
- Comparable experiments
Critical for:
- Benchmarking
- Research studies
- Quality assurance
TestSize
f64Fraction of samples for the stratified test set. Common splits:
- 0.2: Balanced evaluation
- 0.3: More thorough testing
- 0.15: Preserve training size
Consider:
- Minority class size
- Overall dataset size
- Validation objectives
It must be between 0.0 and 1.0.
Time Series cross-validator. Provides train/test indices to split time series data samples that are observed at fixed time intervals. It is a variation of k-fold that returns the first k folds as the train set and the (k+1)-th fold as the test set. Note that unlike standard cross-validation methods, successive training sets are supersets of those that come before them. It also adds all surplus data to the first training partition, which is always used to train the model.
Key features:
- Maintains temporal dependence
- Expanding window approach
- Forward-chaining splits
- No future data leakage
Use cases:
- Sequential data
- Financial forecasting
- Temporal predictions
- Time-dependent patterns
Note: Training sets are supersets of previous iterations.
NSplits
u32Number of temporal splits. Typical values:
- 5: Standard forward chaining
- 3: Limited historical data
- 10: Long time series
Impact:
- Affects training window growth
- Determines validation points
- Influences computational load
MaxTrainSize
u64Maximum size of training set. Should be strictly less than the number of samples. Applications:
- 0: Use all available past data
- >0: Rolling window of fixed size
Use cases:
- Limit historical relevance
- Control computational cost
- Handle concept drift
- Memory constraints
TestSize
u64Number of samples in each test set. When 0:
- Auto-calculated as n_samples/(n_splits+1)
- Ensures equal-sized test sets
Considerations:
- Forecast horizon
- Validation requirements
- Available future data
Gap
u64Number of samples to exclude from the end of each training set, immediately before the test set. Uses:
- Avoid data leakage
- Model forecast lag
- Buffer periods
Common scenarios:
- 0: Continuous prediction
- >0: Forward gap for realistic evaluation
- Match business forecasting needs
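A closing sketch tying the TimeSeriesSplit parameters together; training indices always precede test indices, and the gap samples are dropped from the end of each training window:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)
tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=1)
for train_idx, test_idx in tscv.split(X):
    print(train_idx, test_idx)
# First split: train [0 1 2 3 4], test [6 7] -- index 5 is the gap
```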