LogisticRegression / Classifier Layer
Logistic Regression Classifier - a fundamental classification algorithm that models probability of discrete outcomes. Similar to sklearn.linear_model.LogisticRegression.
Mathematical form: $P(y=1 \mid x) = \frac{1}{1 + e^{-(w^\top x + b)}}$, where $w$ are the model weights and $x$ is the feature vector.
Key characteristics:
- Linear decision boundary
- Probabilistic predictions
- Binary and multiclass support
- Multiple regularization options
- Efficient for sparse data
Common applications:
- Risk assessment (credit scoring, medical diagnosis)
- Customer behavior prediction
- Email spam detection
- Binary text classification
- Marketing response prediction
- Quality control pass/fail
Outputs:
- Predicted Table: Input data with added prediction columns
- Validation Results: Cross-validation metrics on training data
- Test Metric: Performance metrics on test set
- ROC Curve Data: True/False positive rates for threshold tuning
- Confusion Matrix: Detailed classification performance breakdown
- Feature Importances: Coefficient magnitudes showing feature relevance
Note: Works best with preprocessed, standardized numerical features.
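As a rough sketch of the equivalent scikit-learn workflow (the synthetic dataset, the scaling step, and the variable names are illustrative, not part of this component):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the node's input table
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X = StandardScaler().fit_transform(X)  # standardized features work best

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X[:5])   # probabilistic predictions
importance = np.abs(clf.coef_[0])  # coefficient magnitudes as feature relevance
```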
SelectFeatures
[column, ...]Feature columns for model training. Selection tips:
- Include relevant predictive variables
- Avoid highly correlated features
- Consider feature scaling requirements
- Balance between information and model complexity
Best practices:
- Remove redundant features
- Include domain-important variables
- Consider feature interactions
- Check for multicollinearity
If empty, uses all numeric columns except target.
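A hypothetical illustration of that fallback rule plus a quick multicollinearity check, assuming a pandas DataFrame `df` with a target column named `label` (both placeholders):

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 32, 47, 51], "income": [40e3, 52e3, 71e3, 66e3],
                   "city": ["a", "b", "a", "b"], "label": [0, 1, 1, 0]})

# All numeric columns except the target become features
feature_cols = df.select_dtypes(include="number").columns.drop("label")

# Inspect pairwise correlations to spot highly correlated features
print(df[feature_cols].corr())
```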
SelectTarget
columnTarget column for classification. Requirements:
- Categorical or numeric labels
- At least two unique classes
- No missing values
Preprocessing tips:
- Encode categorical labels
- Check class balance
- Consider label noise
- Verify label consistency
Params
oneofStandard parameter configuration optimized for general-purpose classification tasks.
Default configuration:
- L2 regularization (prevents overfitting)
- LBFGS solver (memory-efficient quasi-Newton method)
- Balanced class weights (handles imbalanced datasets)
- C=0.1 (moderate regularization strength)
- 100 max iterations
Best suited for:
- Medium-sized datasets
- Relatively balanced classes
- When feature scaling is applied
- Initial model exploration
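The default configuration expressed as scikit-learn arguments, as a sketch (the node's internal wiring may differ):

```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(
    penalty="l2",             # L2 regularization (prevents overfitting)
    solver="lbfgs",           # memory-efficient quasi-Newton method
    class_weight="balanced",  # handles imbalanced datasets
    C=0.1,                    # moderate regularization strength
    max_iter=100,
)
```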
Configurable parameters for fine-tuning the logistic regression model. Allows detailed control over regularization, optimization, and convergence behavior. Essential for adapting the model to specific data characteristics and performance requirements.
SolverPenalty
enumThe combination of solver and penalty to use. Each combination offers different trade-offs:
Performance characteristics:
- Small datasets (n < 10k): Choose liblinear
- Large datasets: Prefer sag or saga
- Many features: newton-cg or lbfgs
- Sparse data: saga with L1
Memory usage:
- Low memory: liblinear, saga
- Medium: lbfgs, newton-cg
- High: newton-cholesky (quadratic in features)
Multiclass support:
- Full support: newton-cg, sag, saga, lbfgs
- Binary only: newton-cholesky
- One-vs-rest only: liblinear
L-BFGS with no penalty
L-BFGS with L2 penalty
Liblinear with L1 penalty
Liblinear with L2 penalty
Newton-CG with no penalty
Newton-CG with L2 penalty
Newton-Cholesky with no penalty
Newton-Cholesky with L2 penalty
SAG with no penalty
SAG with L2 penalty
SAGA with no penalty
SAGA with L2 penalty
SAGA with L1 penalty
SAGA with ElasticNet penalty
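A small sketch of how these compatibility rules surface in scikit-learn: a supported pair fits normally, while an unsupported pair raises a ValueError at fit time (the data here is a toy placeholder):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.rand(20, 3)
y = np.array([0, 1] * 10)

LogisticRegression(solver="saga", penalty="l1", max_iter=500).fit(X, y)  # supported
try:
    LogisticRegression(solver="lbfgs", penalty="l1").fit(X, y)  # unsupported pair
except ValueError as err:
    print(err)
```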
Dual
boolOptimization formulation selection. Can be used only for liblinear solver with L2 penalty. Dual formulation is advantageous when n_samples < n_features. Primal (false) is preferred when n_samples > n_features. Automatically ignored if incompatible with solver.
Tolerance
f64Convergence criterion threshold. Controls optimization precision:
- Smaller values: More precise but slower convergence
- Larger values: Faster but potentially less optimal solution
Typical range: 1e-6 to 1e-3. Adjust based on precision needs.
CFactor
f64Inverse regularization strength (C parameter). Controls model complexity:
- Small values (<1): Stronger regularization, simpler model
- Large values (>1): Weaker regularization, more complex model
Common ranges:
- 0.001 to 0.1: High regularization (noisy data)
- 0.1 to 1: Moderate regularization (typical)
- 1 to 10: Low regularization (clean data)
Must be positive. Scale often requires adjustment with dataset size.
FitIntercept
boolWhether to include bias term (intercept). Effects:
- true: Model can learn offset from origin (recommended)
- false: Decision boundary through origin
Set false only when data is pre-centered or for theoretical analysis.
InterceptScaling
f64Scaling factor for the synthetic intercept feature. Only used with the liblinear solver and fit_intercept=true. Effects:
- Larger values: More emphasis on intercept fitting
- Smaller values: Less emphasis on intercept
Useful for handling numerical stability in specific cases.
ClassWeights
enumClass weight adjustment strategy for handling imbalanced datasets:
Mathematical form:
- Balanced: $w_i = \frac{n}{k \, n_i}$
- Uniform: $w_i = 1$ for all classes
Impact on model:
- Affects class importance during training
- Influences decision boundary placement
- Controls misclassification penalties
- Balances precision vs recall trade-off
Selection criteria:
- Class distribution in data
- Cost of different error types
- Business/domain requirements
- Performance metrics priorities
Equal weights for all classes:
Characteristics:
- No adjustment for class frequencies
- Natural class proportions preserved
- Faster training process
- Original data distribution maintained
Best for:
- Balanced datasets (similar class frequencies)
- When natural proportions matter
- Representative sampling
- When all errors equally costly
Warning: May underperform on imbalanced data
Weights inversely proportional to class frequencies:
Formula: $w_i = \frac{n}{k \, n_i}$
where:
- $w_i$ is the weight for class $i$
- $n$ is the total number of samples
- $k$ is the number of classes
- $n_i$ is the number of samples in class $i$
Properties:
- Automatic weight adjustment
- Compensates class imbalance
- Balanced error contribution
- Frequency-based weights
Best for:
- Imbalanced datasets
- Minority class importance
- Skewed distributions
- Fair classification needs
Note: May increase sensitivity to noise in rare classes
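The balanced rule can be verified by hand against scikit-learn's helper; the label array below is an illustrative 9:1 imbalance:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 90 + [1] * 10)
classes, counts = np.unique(y, return_counts=True)

manual = len(y) / (len(classes) * counts)  # w_i = n / (k * n_i) -> [0.556, 5.0]
auto = compute_class_weight("balanced", classes=classes, y=y)
assert np.allclose(manual, auto)
```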
RandomState
u64Random number generator seed. Used for:
- Reproducible results with stochastic solvers (sag, saga, liblinear)
- Consistent data shuffling
- Benchmark comparisons
Same seed guarantees identical results across runs.
MaxIter
u64Maximum number of solver iterations. Consider:
- Increase if model warns about non-convergence
- Typical ranges: 100-500 for simple problems, 500-1000 for complex or large datasets, 1000+ for difficult convergence cases
Balance between convergence quality and computation time.
MultiClass
enumStrategy for handling multiple classes. Affects both model architecture and training:
- Auto: Automatically choose based on data and solver
- Ovr (One-vs-Rest):
- Trains N binary classifiers
- Memory efficient
- Works with all solvers
- Good for imbalanced classes
- Multinomial:
- Single model for all classes
- More accurate probability estimates
- Only with newton-cg, sag, saga, lbfgs
- Better when classes are balanced
L1Ratio
f64ElasticNet mixing parameter [0, 1]. Controls L1 vs L2 penalty mix:
- 0.0: Pure L2 regularization
- 1.0: Pure L1 regularization
- 0.0-1.0: Mix of both
Only used with the elasticnet penalty. Values outside [0, 1] disable elasticnet.
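A minimal sketch of the corresponding scikit-learn setup; elasticnet is only accepted together with the saga solver:

```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(
    solver="saga",
    penalty="elasticnet",
    l1_ratio=0.5,    # even mix of L1 and L2
    max_iter=1000,   # saga often needs extra iterations to converge
)
```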
Exhaustive search over specified parameter grid to find optimal model configuration. Similar to sklearn.model_selection.GridSearchCV.
Search process:
- Tests all parameter combinations
- Uses cross-validation for each combination
- Selects best parameters based on scoring metric
Performance considerations:
- Computation time grows exponentially with parameters
- Memory usage depends on data size and cv folds
- Consider RandomizedSearchCV for large parameter spaces
Best practices:
- Start with broad parameter ranges
- Refine ranges based on initial results
- Monitor for overfitting across folds
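A sketch of the equivalent scikit-learn grid search; the grid values are illustrative, not this node's defaults:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)
param_grid = {
    "C": [0.001, 0.01, 0.1, 1, 10],  # broad logarithmic range first
    "penalty": ["l1", "l2"],
    "solver": ["liblinear"],         # supports both penalties above
}
search = GridSearchCV(LogisticRegression(max_iter=500), param_grid,
                      cv=3, scoring="roc_auc")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```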
Penalty
[enum, ...]Regularization penalty type controlling model complexity and feature selection:
Selection impact:
- Controls overfitting prevention
- Influences feature selection
- Affects model sparsity
- Determines solution stability
Common use cases:
- L2: General purpose, dense features
- L1: Feature selection, sparse solutions
- ElasticNet: Combined benefits of L1 and L2
- None: When regularization not needed
Mathematical form: the penalty term $r(w)$ added to the loss is $\|w\|_1$ (L1), $\tfrac{1}{2}\|w\|_2^2$ (L2), or $\rho\|w\|_1 + \tfrac{1-\rho}{2}\|w\|_2^2$ (ElasticNet, with mixing ratio $\rho$); None applies no penalty.
No regularization penalty applied
Characteristics:
- Uncontrolled model complexity
- Maximum flexibility
- Risk of overfitting
- Full parameter range
Best for:
- Very small datasets
- Theoretical analysis
- When bias undesirable
- Testing/debugging purposes
Warning: Use with caution as it may lead to overfitting
L1 penalty (Lasso): $r(w) = \|w\|_1$
Characteristics:
- Absolute magnitude penalty
- Produces sparse solutions
- Feature selection capability
- Path algorithms possible
Best for:
- Feature selection
- High-dimensional data
- When sparse solutions desired
- Eliminating irrelevant features
L2 penalty (Ridge): $r(w) = \tfrac{1}{2}\|w\|_2^2$
Characteristics:
- Squared magnitude penalty
- Shrinks all weights toward zero
- Handles correlated features well
- Produces dense solutions
Best for:
- Most classification tasks
- When all features potentially relevant
- Dealing with multicollinearity
- Stable solutions needed
ElasticNet penalty: $r(w) = \rho\|w\|_1 + \tfrac{1-\rho}{2}\|w\|_2^2$
Characteristics:
- Combines L1 and L2 penalties
- Controls sparsity via ratio
- Group selection capability
- More stable than pure L1
Best for:
- Correlated features
- When both sparsity and stability needed
- Group feature selection
- Balanced regularization
Solver
[enum, ...]Optimization algorithm for minimizing the logistic regression cost function:
Mathematical objective: $\min_{w,b} \; r(w) + C \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i (w^\top x_i + b)}\right)$, where $r(w)$ is the chosen penalty.
Selection criteria:
- Dataset characteristics: sample size (n_samples), feature count (n_features), sparsity level
- Memory constraints: low (liblinear, saga), medium (lbfgs, newton-cg), high (newton-cholesky)
- Problem type: binary classification, multiclass problems, different regularizations
Performance considerations:
- Convergence speed
- Memory usage
- Numerical stability
- Scalability
Limited-memory BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm:
Characteristics:
- Quasi-Newton method
- Memory efficient
- Good convergence
- Handles many features
Best for:
- Medium to large datasets
- High-dimensional data
- When memory limited
Supports: L2 regularization, multinomial
Coordinate Descent algorithm from LIBLINEAR library:
Characteristics:
- Highly optimized
- Memory efficient
- Fast for small datasets
- Supports L1 regularization
Best for:
- Small to medium datasets
- Binary classification
- Sparse features
Supports: L1/L2 regularization, one-vs-rest
Newton-Conjugate Gradient algorithm:
Characteristics:
- Second-order method
- Accurate convergence
- Handles large features
- More iterations needed
Best for:
- When accuracy critical
- High-dimensional data
- Multinomial problems
Supports: L2 regularization, multinomial
Newton method using Cholesky decomposition:
Characteristics:
- Direct second-order method
- Very fast convergence
- High memory usage
- Numerically stable
Best for:
- Small to medium features
- Dense data
- When memory available
Supports: L2 regularization, binary only
Stochastic Average Gradient descent:
Characteristics:
- Fast convergence
- Linear memory usage
- Efficient for large samples
- Smooth optimization
Best for:
- Large datasets
- Online learning
- L2 regularization
Supports: L2 regularization, multinomial
SAGA (Stochastic Average Gradient descent variant):
Characteristics:
- Unbiased SAG variant
- Supports all penalties
- Good convergence
- Memory efficient
Best for:
- Large datasets
- Sparse data
- L1 regularization
Supports: All regularizations, multinomial
CFactor
[f64, ...]Regularization strength values to test. Recommended ranges:
- Logarithmic scale: [0.001, 0.01, 0.1, 1, 10, 100]
- Fine-tuning: [0.1, 0.2, 0.5, 1.0]
Smaller values = stronger regularization. Tips:
- Start with broad range
- Refine around best values
- Consider dataset size in selection
Dual
[bool, ...]Dual formulation options to test. Considerations:
- [false]: Standard for n_samples > n_features
- [true, false]: When optimal formulation unclear
Only relevant for liblinear solver with L2 penalty.
Tolerance
[f64, ...]Convergence tolerance values to test. Typical ranges:
- Coarse: [1e-2, 1e-3, 1e-4]
- Fine: [1e-4, 1e-5, 1e-6]
Trade-off:
- Lower values: Better precision, slower convergence
- Higher values: Faster training, less precise
InterceptScaling
[f64, ...]Synthetic feature scaling values for the intercept. Relevant when:
- Using liblinear solver
- fit_intercept is true
Typical values: 1.0 (default) to 100.0. Increase if the model has trouble learning the intercept.
ClassWeights
[enum, ...]Class weight adjustment strategy for handling imbalanced datasets:
Mathematical form:
- Balanced: $w_i = \frac{n}{k \, n_i}$
- Uniform: $w_i = 1$ for all classes
Impact on model:
- Affects class importance during training
- Influences decision boundary placement
- Controls misclassification penalties
- Balances precision vs recall trade-off
Selection criteria:
- Class distribution in data
- Cost of different error types
- Business/domain requirements
- Performance metrics priorities
Equal weights for all classes:
Characteristics:
- No adjustment for class frequencies
- Natural class proportions preserved
- Faster training process
- Original data distribution maintained
Best for:
- Balanced datasets (similar class frequencies)
- When natural proportions matter
- Representative sampling
- When all errors equally costly
Warning: May underperform on imbalanced data
Weights inversely proportional to class frequencies:
Formula: $w_i = \frac{n}{k \, n_i}$
where:
- $w_i$ is the weight for class $i$
- $n$ is the total number of samples
- $k$ is the number of classes
- $n_i$ is the number of samples in class $i$
Properties:
- Automatic weight adjustment
- Compensates class imbalance
- Balanced error contribution
- Frequency-based weights
Best for:
- Imbalanced datasets
- Minority class importance
- Skewed distributions
- Fair classification needs
Note: May increase sensitivity to noise in rare classes
WarmStart
boolWhen set to true, reuses the solution of the previous call to fit as initialization for the next fit; otherwise, the previous solution is erased. Effects:
- true: Faster for multiple similar fits
- false: Fresh start each time
Useful for:
- Path algorithms
- Incremental learning
RandomState
u64Random seed for reproducibility. Ensures:
- Consistent cross-validation splits
- Reproducible solver behavior
- Comparable grid search results
Essential for research and debugging. Used when solver is sag, saga, or liblinear to shuffle the data.
MaxIter
[u64, ...]Maximum iterations for each fit. Common ranges:
- Standard: [100, 200, 500]
- Extended: [100, 500, 1000, 2000]
Tips:
- Include larger values if seeing non-convergence
- Consider computation budget
- Monitor convergence warnings
MultiClass
[enum, ...]Strategy for handling multiple classes. Affects both model architecture and training:
- Auto: Automatically choose based on data and solver
- Ovr (One-vs-Rest):
- Trains N binary classifiers
- Memory efficient
- Works with all solvers
- Good for imbalanced classes
- Multinomial:
- Single model for all classes
- More accurate probability estimates
- Only with newton-cg, sag, saga, lbfgs
- Better when classes are balanced
L1Ratio
[f64, ...]ElasticNet mixing parameter values. Common patterns:
- Broad search: [0.0, 0.25, 0.5, 0.75, 1.0]
- Fine-tuning: [0.1, 0.2, 0.3, 0.4]
Only relevant with the elasticnet penalty. Values outside [0, 1] disable elasticnet.
RefitScore
enumMetric for evaluating model performance during training and validation:
- Default: Uses estimator's built-in scoring
- Accuracy: Proportion of correct predictions
- BalancedAccuracy: Arithmetic mean of recall for each class
- LogLoss: Negative log-likelihood of true labels
- RocAuc: Area under ROC curve (threshold-independent)
Choose based on:
- Class balance (balanced_accuracy for imbalanced)
- Need for probability calibration (log_loss)
- Binary vs multiclass (roc_auc for binary)
Split
oneofStandard train-test split configuration optimized for general classification tasks.
Configuration:
- Test size: 20% (0.2)
- Random seed: 98
- Shuffling: Enabled
- Stratification: Based on target distribution
Advantages:
- Preserves class distribution
- Provides reliable validation
- Suitable for most datasets
Best for:
- Medium to large datasets
- Independent observations
- Initial model evaluation
Splitting uses the ShuffleSplit or StratifiedShuffleSplit strategy, depending on the stratified field. Note: if shuffle is false, then stratified must also be false.
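As a sketch, the same default split written with scikit-learn (synthetic data as a stand-in for the input table):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, weights=[0.8], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=98, shuffle=True, stratify=y)
```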
Configurable train-test split parameters for specialized requirements. Allows fine-tuning of data division strategy for specific use cases or constraints.
Use cases:
- Time series data
- Grouped observations
- Specific train/test ratios
- Custom validation schemes
RandomState
u64Random seed for reproducible splits. Ensures:
- Consistent train/test sets
- Reproducible experiments
- Comparable model evaluations
Same seed guarantees identical splits across runs.
Shuffle
boolData shuffling before splitting. Effects:
- true: Randomizes order, better for i.i.d. data
- false: Maintains order, important for time series
When to disable:
- Time dependent data
- Sequential patterns
- Grouped observations
TrainSize
f64Proportion of data for training. Considerations:
- Larger (e.g., 0.8-0.9): Better model learning
- Smaller (e.g., 0.5-0.7): Better validation
Common splits:
- 0.8: Standard (80/20 split)
- 0.7: More validation emphasis
- 0.9: More training emphasis
Stratified
boolMaintain class distribution in splits. Important when:
- Classes are imbalanced
- Small classes present
- Representative splits needed
Requirements:
- Classification tasks only
- Cannot use with shuffle=false
- Sufficient samples per class
Cv
oneofStandard cross-validation configuration using stratified 3-fold splitting.
Configuration:
- Folds: 3
- Method: StratifiedKFold
- Stratification: Preserves class proportions
Advantages:
- Balanced evaluation
- Reasonable computation time
- Good for medium-sized datasets
Limitations:
- May be insufficient for small datasets
- Higher variance than larger fold counts
- May miss some data patterns
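The default scheme as a scikit-learn sketch: stratified 3-fold cross-validation over illustrative synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y,
                         cv=StratifiedKFold(n_splits=3))
print(scores.mean(), scores.std())
```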
Configurable stratified k-fold cross-validation for specific validation requirements.
Features:
- Adjustable fold count, with NFolds determining the number of splits
- Stratified sampling
- Preserved class distributions
Use cases:
- Small datasets (more folds)
- Large datasets (fewer folds)
- Detailed model evaluation
- Robust performance estimation
NFolds
u32Number of cross-validation folds. Guidelines:
- 3-5: Large datasets, faster training
- 5-10: Standard choice, good balance
- 10+: Small datasets, thorough evaluation
Trade-offs:
- More folds: Better evaluation, slower training
- Fewer folds: Faster training, higher variance
Must be at least 2.
K-fold cross-validation without stratification. Divides data into k consecutive folds for iterative validation.
Process:
- Splits data into k equal parts
- Each fold serves as validation once
- Remaining k-1 folds form training set
Use cases:
- Regression problems
- Large, balanced datasets
- When stratification unnecessary
- Continuous target variables
Limitations:
- May not preserve class distributions
- Less suitable for imbalanced data
- Can create biased splits with ordered data
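A minimal sketch of plain k-fold in scikit-learn; each fold serves as the validation set exactly once:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)
for train_idx, test_idx in KFold(n_splits=5).split(X):
    print(train_idx, test_idx)  # consecutive folds unless shuffle=True
```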
NSplits
u32Number of folds for cross-validation. Recommended values:
- 5: Standard choice (default)
- 3: Large datasets/quick evaluation
- 10: Thorough evaluation/smaller datasets
Trade-offs:
- Higher values: More thorough, computationally expensive
- Lower values: Faster, potentially higher variance
Must be at least 2 for valid cross-validation.
RandomState
u64Random seed for fold generation when shuffling. Important for:
- Reproducible results
- Consistent fold assignments
- Benchmark comparisons
- Debugging and validation
Set specific value for reproducibility across runs.
Shuffle
boolWhether to shuffle data before splitting into folds. Effects:
- true: Randomized fold composition (recommended)
- false: Sequential splitting
Enable when:
- Data may have ordering
- Better fold independence needed
Disable for:
- Time series data
- Ordered observations
Stratified K-fold cross-validation maintaining class proportions across folds.
Key features:
- Preserves class distribution in each fold
- Handles imbalanced datasets
- Ensures representative splits
Best for:
- Classification problems
- Imbalanced class distributions
- When class proportions matter
Requirements:
- Classification tasks only
- Sufficient samples per class
- Categorical target variable
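A sketch showing the stratification effect: with a 9:1 class ratio, every test fold keeps the same proportions:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 90 + [1] * 10)
X = np.zeros((100, 1))  # features are irrelevant to the split itself
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for _, test_idx in skf.split(X, y):
    print(np.bincount(y[test_idx]))  # -> [18 2] in every fold
```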
NSplits
u32Number of stratified folds. Typical values:
- 5: Standard for most cases
- 3: Quick evaluation/large datasets
- 10: Detailed evaluation/smaller datasets
Considerations:
- Must allow sufficient samples per class per fold
- Balance between stability and computation time
- Consider smallest class size when choosing
RandomState
u64Seed for reproducible stratified splits. Ensures:
- Consistent fold assignments
- Reproducible results
- Comparable experiments
- Systematic validation
Fixed seed guarantees identical stratified splits.
Shuffle
boolData shuffling before stratified splitting. Impact:
- true: Randomizes while maintaining stratification
- false: Maintains data order within strata
Use cases:
- true: Independent observations
- false: Grouped or sequential data
Class proportions maintained regardless of setting.
Random permutation cross-validator with independent sampling.
Characteristics:
- Random sampling for each split
- Independent train/test sets
- More flexible than K-fold
- Can have overlapping test sets
Advantages:
- Control over test size
- Fresh splits each iteration
- Good for large datasets
Limitations:
- Some samples might never be tested
- Others might be tested multiple times
- No guarantee of complete coverage
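A sketch of ShuffleSplit in scikit-learn; note that test sets can overlap across iterations:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.arange(20).reshape(-1, 1)
ss = ShuffleSplit(n_splits=3, test_size=0.25, random_state=0)
for train_idx, test_idx in ss.split(X):
    print(sorted(test_idx))  # a fresh random sample on each split
```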
NSplits
u32Number of random splits to perform. Common values:
- 5: Standard evaluation
- 10: More thorough assessment
- 3: Quick estimates
Trade-offs:
- More splits: Better estimation, longer runtime
- Fewer splits: Faster, less stable estimates
Balance between computation and stability.
RandomState
u64Random seed for reproducible shuffling. Controls:
- Split randomization
- Sample selection
- Result reproducibility
Important for:
- Debugging
- Comparative studies
- Result verification
TestSize
f64Proportion of samples for the test set. Common ratios:
- 0.2: Standard (80/20 split)
- 0.25: More validation emphasis
- 0.1: More training data
Considerations:
- Dataset size
- Model complexity
- Validation requirements
It must be between 0.0 and 1.0.
Stratified random permutation cross-validator combining shuffle-split with stratification.
Features:
- Maintains class proportions
- Random sampling within strata
- Independent splits
- Flexible test size
Ideal for:
- Imbalanced datasets
- Large-scale problems
- When class distributions matter
- Flexible validation schemes
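A sketch with an 80/20 class ratio: each random test set preserves the proportions of y:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

y = np.array([0] * 80 + [1] * 20)
X = np.zeros((100, 1))
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
for _, test_idx in sss.split(X, y):
    print(np.bincount(y[test_idx]))  # -> [16 4] every time
```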
NSplits
u32Number of stratified random splits. Recommended values:
- 5: Standard evaluation
- 10: Detailed analysis
- 3: Quick assessment
Consider:
- Sample size per class
- Computational resources
- Stability requirements
RandomState
u64Seed for reproducible stratified sampling. Ensures:
- Consistent class proportions
- Reproducible splits
- Comparable experiments
Critical for:
- Benchmarking
- Research studies
- Quality assurance
TestSize
f64Fraction of samples for the stratified test set. Common splits:
- 0.2: Balanced evaluation
- 0.3: More thorough testing
- 0.15: Preserve training size
Consider:
- Minority class size
- Overall dataset size
- Validation objectives
It must be between 0.0 and 1.0.
Time Series cross-validator. Provides train/test indices to split time series data samples that are observed at fixed time intervals. It is a variation of k-fold that returns the first k folds as the train set and the (k+1)-th fold as the test set. Note that unlike standard cross-validation methods, successive training sets are supersets of those that come before them. It also adds all surplus data to the first training partition, which is always used to train the model.
Key features:
- Maintains temporal dependence
- Expanding window approach
- Forward-chaining splits
- No future data leakage
Use cases:
- Sequential data
- Financial forecasting
- Temporal predictions
- Time-dependent patterns
Note: Training sets are supersets of previous iterations.
NSplits
u32Number of temporal splits. Typical values:
- 5: Standard forward chaining
- 3: Limited historical data
- 10: Long time series
Impact:
- Affects training window growth
- Determines validation points
- Influences computational load
MaxTrainSize
u64Maximum size of training set. Should be strictly less than the number of samples. Applications:
- 0: Use all available past data
- >0: Rolling window of fixed size
Use cases:
- Limit historical relevance
- Control computational cost
- Handle concept drift
- Memory constraints
TestSize
u64Number of samples in each test set. When 0:
- Auto-calculated as n_samples/(n_splits+1)
- Ensures equal-sized test sets
Considerations:
- Forecast horizon
- Validation requirements
- Available future data
Gap
u64Number of samples to exclude from the end of each training set, immediately before the test set. Uses:
- Avoid data leakage
- Model forecast lag
- Buffer periods
Common scenarios:
- 0: Continuous prediction
- >0: Forward gap for realistic evaluation
- Match business forecasting needs
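A closing sketch tying the TimeSeriesSplit parameters together; training indices always precede test indices, and the gap samples are dropped from the end of each training window:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)
tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=1)
for train_idx, test_idx in tscv.split(X):
    print(train_idx, test_idx)
# First split: train [0 1 2 3 4], test [6 7] -- index 5 is the gap
```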