PassiveAggressive / Classifier Layer

Passive-Aggressive Classifier for online learning. Similar to sklearn.linear_model.PassiveAggressiveClassifier.

Mathematical form: w_{t+1} = w_t + τ_t * y_t * x_t, where the step size τ_t is determined by the loss and regularization.

Key characteristics:

  • Online learning capability
  • No learning rate parameter
  • Adaptive updates on margin violations
  • Linear decision boundary
  • Memory efficient

Common applications:

  • Large-scale learning
  • Stream data classification
  • Text classification
  • Online prediction systems
  • Real-time learning

Outputs:

  1. Predicted Table: Predictions added to input
  2. Validation Results: Cross-validation metrics
  3. Test Metric: Test set performance
  4. ROC Curve Data: ROC analysis data
  5. Confusion Matrix: Classification breakdown
  6. Feature Importances: Feature coefficients

Note: Particularly effective for online learning and large datasets.
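
A minimal sketch of the online-learning workflow, assuming scikit-learn's PassiveAggressiveClassifier (the synthetic mini-batches and parameter values below are illustrative):

    import numpy as np
    from sklearn.linear_model import PassiveAggressiveClassifier

    clf = PassiveAggressiveClassifier(C=1.0, loss="hinge", random_state=0)
    classes = np.array([0, 1])  # all labels must be declared on the first partial_fit
    rng = np.random.default_rng(0)

    for step in range(10):  # simulate a data stream arriving in mini-batches
        X_batch = rng.normal(size=(32, 5))
        y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
        clf.partial_fit(X_batch, y_batch, classes=classes)

    print(clf.predict(rng.normal(size=(3, 5))))

Each call to partial_fit performs one pass of passive-aggressive updates on the incoming batch, so the model keeps learning without revisiting earlier data.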

Ports:

  • Input 0: Table
  • Output 0: Predicted Table
  • Output 1: Validation Results
  • Output 2: Test Metric
  • Output 3: ROC Curve Data
  • Output 4: Confusion Matrix
  • Output 5: Feature Importances

SelectFeatures

[column, ...]

Feature columns for classification. Selection guidance:

  • Choose relevant predictive features
  • Consider feature interactions
  • Avoid redundant variables
  • Handle missing values first

Best practices:

  • Scale numerical features
  • Encode categorical variables
  • Remove highly correlated features
  • Consider feature importance

If empty, uses all numeric columns except target.
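
For example, a small preprocessing sketch (column names are hypothetical) that scales numeric features and encodes categoricals before fitting:

    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import PassiveAggressiveClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    numeric_cols = ["age", "income"]    # assumed numeric feature columns
    categorical_cols = ["country"]      # assumed categorical feature column

    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    model = make_pipeline(preprocess, PassiveAggressiveClassifier(max_iter=1000))
    # model.fit(X_train, y_train)  # X_train: a DataFrame containing the columns above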

Target column for classification. Requirements:

  • Must be categorical labels
  • No missing values
  • At least two classes

Preprocessing steps:

  • Encode categorical labels
  • Check class balance
  • Handle missing values
  • Verify label quality

Params

oneof
DefaultParams

Standard configuration optimized for general use cases:

Default settings:

  • C=1.0: Moderate regularization
  • Hinge loss: Standard linear penalty
  • Early stopping: Disabled
  • Max iterations: 1000
  • Random State: 98
  • Tolerance: 0.01
  • Balanced weights: Handle class imbalance

Best for:

  • Initial model exploration
  • Balanced datasets
  • Standard classification tasks
  • Quick prototyping
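
As a rough sketch, these defaults might map onto scikit-learn's estimator as follows (the exact mapping used by this layer is an assumption):

    from sklearn.linear_model import PassiveAggressiveClassifier

    clf = PassiveAggressiveClassifier(
        C=1.0,                    # moderate regularization
        loss="hinge",             # standard linear penalty
        early_stopping=False,     # early stopping disabled
        max_iter=1000,
        random_state=98,
        tol=0.01,
        class_weight="balanced",  # handle class imbalance
    )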

Configurable parameters for fine-tuning model behavior. Allows detailed control over:

  • Learning process
  • Convergence criteria
  • Class handling
  • Model complexity

Essential for optimizing performance for specific use cases.

Loss

enum
Hinge

Loss function determining how the model penalizes misclassifications:

Characteristics:

  • Hinge: Linear penalty, similar to SVM
  • SquaredHinge: Quadratic penalty, smoother

Selection guidelines:

  • Hinge: More robust to outliers
  • SquaredHinge: Stronger penalties for large violations

Impact on learning:

  • Affects update step size
  • Influences convergence behavior
  • Changes margin sensitivity
Hinge ~

Standard hinge loss (L1): max(0, 1 - y*f(x)). Characteristics:

  • Linear penalty for violations
  • Sparse solutions
  • Similar to SVM loss
  • Better for noisy data
SquaredHinge ~

Squared hinge loss (L2): max(0, 1 - y*f(x))². Features:

  • Quadratic penalty
  • Smoother gradients
  • More sensitive to outliers
  • Often faster convergence
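
A small numeric sketch of the two penalties for a single sample, where margin = y*f(x):

    import numpy as np

    margins = np.array([-0.5, 0.0, 0.5, 1.0, 2.0])
    hinge = np.maximum(0.0, 1.0 - margins)               # linear penalty
    squared_hinge = np.maximum(0.0, 1.0 - margins) ** 2  # quadratic penalty
    print(hinge)          # 1.5, 1.0, 0.5, 0.0, 0.0
    print(squared_hinge)  # 2.25, 1.0, 0.25, 0.0, 0.0

The quadratic variant punishes large margin violations much harder, which is why it reacts more strongly to outliers.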

Aggressiveness parameter (C). Controls update step size:

  • Smaller values: More conservative updates
  • Larger values: More aggressive updates

Range guide:

  • 0.1: Very conservative
  • 1.0: Balanced (default)
  • 10.0: Very aggressive
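
In the PA-I formulation used with hinge loss, the per-example step size is tau = min(C, loss / ||x||²), so C caps how far a single example can move the weights. A minimal sketch, assuming that formulation:

    import numpy as np

    def pa1_step_size(C, loss, x):
        # PA-I: tau = min(C, loss / ||x||^2); C caps the per-example update
        return min(C, loss / np.dot(x, x))

    x = np.array([1.0, 2.0])             # squared norm is 5
    print(pa1_step_size(0.1, 1.5, x))    # 0.1 -> conservative, capped by C
    print(pa1_step_size(10.0, 1.5, x))   # 0.3 -> aggressive, loss-driven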
true

Whether to calculate the intercept term:

  • true: Learn bias term (recommended)
  • false: Assume centered data

Set false only when data is pre-centered or for testing.

0.0001

Convergence criterion threshold:

  • Smaller: More precise, slower convergence
  • Larger: Faster, less precise

Typical range: 1e-5 to 1e-3. The iterations will stop when (loss > previous_loss - tol)

Balanced

If Uniform, all classes are given weight one. If Balanced, class weights are computed as n_samples / (n_classes * np.bincount(y)), so less frequent classes receive proportionally larger weights.

Impact:

  • Affects model's sensitivity to different classes
  • Controls class bias in learning
  • Helps with imbalanced datasets

Selection criteria:

  • Data balance
  • Class importance
  • Error costs
Uniform ~

Equal weights for all classes. Use when:

  • Classes are naturally balanced
  • Equal importance of classes
  • Default behavior needed
Balanced ~

Weights inversely proportional to class frequencies. Ideal for:

  • Imbalanced datasets
  • When minority classes matter
  • Equalizing class influence
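
A quick sketch of the balanced weighting formula on an imbalanced label vector:

    import numpy as np

    y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])     # 80/20 class split
    n_samples, n_classes = len(y), len(np.unique(y))
    print(n_samples / (n_classes * np.bincount(y)))  # [0.625 2.5] -> minority class weighted up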
true

Whether or not the training data should be shuffled after each epoch.

  • true: Better convergence, randomized updates
  • false: Deterministic, ordered updates

Enable for:

  • Better generalization
  • Avoiding local optima
  • Independent sample updates
false

Weight averaging for final model:

  • true: More stable predictions, averaged weights
  • false: Last iteration weights

Benefits of true:

  • Reduces variance
  • Better generalization
  • More robust predictions
true

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.

  • true: Continue previous training
  • false: Fresh start each time

Use true for:

  • Incremental learning
  • Fine-tuning models
  • Transfer learning scenarios
100

The maximum number of passes over the training data (aka epochs). Guidelines:

  • 100: Quick problems
  • 500-1000: Complex problems
  • 1000+: Difficult convergence

Consider:

  • Dataset size
  • Problem complexity
  • Convergence behavior

Random number generator seed. Important for:

  • Reproducible training
  • Consistent shuffling
  • Benchmark comparisons

Set specific value for reproducibility.

false

Enable validation-based early stopping:

  • true: Stop when validation score plateaus
  • false: Run for max_iter epochs

Benefits:

  • Prevents overfitting
  • Reduces training time
  • Automatic stopping criterion

Proportion of training data held out for validation. Typical values:

  • 0.1: Standard (10% validation)
  • 0.2: More validation emphasis
  • 0.15: Balanced split

Only used when early_stopping=true

Early stopping patience parameter:

  • Higher: More chances to improve
  • Lower: Earlier stopping

Common values:

  • 5: Quick stopping
  • 10: More patience
  • 20: Very patient
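
A configuration sketch using scikit-learn's early-stopping parameters (the values are illustrative):

    from sklearn.linear_model import PassiveAggressiveClassifier

    clf = PassiveAggressiveClassifier(
        early_stopping=True,      # stop when the validation score stops improving
        validation_fraction=0.1,  # hold out 10% of the training data
        n_iter_no_change=5,       # patience: epochs without improvement
        max_iter=1000,
    )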

Exhaustive hyperparameter optimization through grid search.

Process:

  • Tests all parameter combinations
  • Uses cross-validation
  • Selects best parameters

Considerations:

  • Computational cost grows exponentially
  • Memory usage scales with parameters
  • Balance between coverage and efficiency
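
A sketch of such a grid search with scikit-learn's GridSearchCV (grid values are illustrative):

    from sklearn.linear_model import PassiveAggressiveClassifier
    from sklearn.model_selection import GridSearchCV

    param_grid = {
        "loss": ["hinge", "squared_hinge"],
        "C": [0.1, 1.0, 10.0],
        "max_iter": [100, 500, 1000],
    }
    search = GridSearchCV(
        PassiveAggressiveClassifier(random_state=0),
        param_grid,
        cv=3,                  # cross-validation folds
        scoring="accuracy",
    )
    # search.fit(X_train, y_train); search.best_params_ holds the winning combination

The number of fitted models is the product of the list lengths times the fold count, which is where the exponential cost comes from.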

Loss

[enum, ...]
Hinge

Loss function determining how the model penalizes misclassifications:

Characteristics:

  • Hinge: Linear penalty, similar to SVM
  • SquaredHinge: Quadratic penalty, smoother

Selection guidelines:

  • Hinge: More robust to outliers
  • SquaredHinge: Stronger penalties for large violations

Impact on learning:

  • Affects update step size
  • Influences convergence behavior
  • Changes margin sensitivity
Hinge ~

Standard hinge loss (L1): max(0, 1 - y*f(x)). Characteristics:

  • Linear penalty for violations
  • Sparse solutions
  • Similar to SVM loss
  • Better for noisy data
SquaredHinge ~

Squared hinge loss (L2): max(0, 1 - y*f(x))². Features:

  • Quadratic penalty
  • Smoother gradients
  • More sensitive to outliers
  • Often faster convergence

CFactor

[f64, ...]
1

Aggressiveness parameters to test. Recommended ranges:

  • Logarithmic scale: [0.1, 1.0, 10.0]
  • Fine-tuning: [0.8, 1.0, 1.2]
  • Wide search: [0.01, 0.1, 1.0, 10.0, 100.0]

FitIntercept

[bool, ...]
true

Whether to fit intercept. Options:

  • [true]: Standard approach
  • [true, false]: Compare both

Usually keep default [true] unless data is centered

Tolerance

[f64, ...]
0.0001

Convergence tolerances to test. Common ranges:

  • Fine tolerance: [1e-5, 1e-4, 1e-3]
  • Coarse search: [1e-4, 1e-3, 1e-2]

Balance precision vs. speed

ClassWeights

[enum, ...]
Balanced

If Uniform, all classes are given weight one. If Balanced, class weights are computed as n_samples / (n_classes * np.bincount(y)), so less frequent classes receive proportionally larger weights.

Impact:

  • Affects model's sensitivity to different classes
  • Controls class bias in learning
  • Helps with imbalanced datasets

Selection criteria:

  • Data balance
  • Class importance
  • Error costs
Uniform ~

Equal weights for all classes. Use when:

  • Classes are naturally balanced
  • Equal importance of classes
  • Default behavior needed
Balanced ~

Weights inversely proportional to class frequencies. Ideal for:

  • Imbalanced datasets
  • When minority classes matter
  • Equalizing class influence

Shuffle

[bool, ...]
true

Data shuffling options:

  • [true]: Recommended for most cases
  • [false]: For ordered data
  • [true, false]: Compare impact
false

Weight averaging setting:

  • true: Use averaged weights
  • false: Use final weights

Consider true for more stable models

true

When set to true, reuse the solution of the previous call to fit as initialization; otherwise, erase the previous solution.

  • true: Reuse previous solutions
  • false: Fresh start each time

Affects optimization efficiency

MaxIter

[u64, ...]
100

Maximum iterations to test. Common ranges:

  • Basic: [100, 500, 1000]
  • Extended: [100, 500, 1000, 2000]
  • Complex: [1000, 2000, 5000]

Random seed for reproducibility. Controls:

  • Training data shuffling
  • Cross-validation splits
  • Parameter search reproducibility

Important for:

  • Benchmarking
  • Research experiments
  • Debugging issues
false

Early stopping flag. Configuration:

  • true: Use validation-based stopping
  • false: Run full iterations

Enable when:

  • Training time is critical
  • Overfitting is a concern
  • Quick model evaluation needed

Validation set size for early stopping. Common splits:

  • 0.1: Standard validation size
  • 0.15: More validation emphasis
  • 0.2: Large validation set

Consider:

  • Dataset size
  • Validation stability needs
  • Training data requirements

Patience for early stopping. Impact:

  • Smaller values: Faster stopping, might be premature
  • Larger values: More chances to improve

Typical settings:

  • 5: Quick stopping
  • 10: Standard patience
  • 20: Extended training opportunity
Accuracy

Metric for evaluating model performance:

Selection guidelines:

  • Default: Use model's built-in scoring
  • Accuracy: When classes are balanced
  • BalancedAccuracy: For imbalanced datasets

Impact:

  • Affects model selection in CV
  • Guides optimization
  • Determines best parameters
Default ~
Accuracy ~
BalancedAccuracy ~

Split

oneof
DefaultSplit

Standard train-test split configuration optimized for general classification tasks.

Configuration:

  • Test size: 20% (0.2)
  • Random seed: 98
  • Shuffling: Enabled
  • Stratification: Based on target distribution

Advantages:

  • Preserves class distribution
  • Provides reliable validation
  • Suitable for most datasets

Best for:

  • Medium to large datasets
  • Independent observations
  • Initial model evaluation

Splitting uses the ShuffleSplit strategy or StratifiedShuffleSplit strategy depending on the field stratified. Note: If shuffle is false then stratified must be false.
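
A sketch of that strategy with scikit-learn splitters, choosing the stratified variant when stratification is enabled (values illustrative):

    from sklearn.model_selection import ShuffleSplit, StratifiedShuffleSplit

    stratified = True
    splitter_cls = StratifiedShuffleSplit if stratified else ShuffleSplit
    splitter = splitter_cls(n_splits=1, train_size=0.8, random_state=98)
    # train_idx, test_idx = next(splitter.split(X, y))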

Configurable train-test split parameters for specialized requirements. Allows fine-tuning of data division strategy for specific use cases or constraints.

Use cases:

  • Time series data
  • Grouped observations
  • Specific train/test ratios
  • Custom validation schemes

Random seed for reproducible splits. Ensures:

  • Consistent train/test sets
  • Reproducible experiments
  • Comparable model evaluations

Same seed guarantees identical splits across runs.

true

Data shuffling before splitting. Effects:

  • true: Randomizes order, better for i.i.d. data
  • false: Maintains order, important for time series

When to disable:

  • Time dependent data
  • Sequential patterns
  • Grouped observations
0.8

Proportion of data for training. Considerations:

  • Larger (e.g., 0.8-0.9): Better model learning
  • Smaller (e.g., 0.5-0.7): Better validation

Common splits:

  • 0.8: Standard (80/20 split)
  • 0.7: More validation emphasis
  • 0.9: More training emphasis
false

Maintain class distribution in splits. Important when:

  • Classes are imbalanced
  • Small classes present
  • Representative splits needed

Requirements:

  • Classification tasks only
  • Cannot use with shuffle=false
  • Sufficient samples per class

Cv

oneof
DefaultCv

Standard cross-validation configuration using stratified 3-fold splitting.

Configuration:

  • Folds: 3
  • Method: StratifiedKFold
  • Stratification: Preserves class proportions

Advantages:

  • Balanced evaluation
  • Reasonable computation time
  • Good for medium-sized datasets

Limitations:

  • May be insufficient for small datasets
  • Higher variance than larger fold counts
  • May miss some data patterns

Configurable stratified k-fold cross-validation for specific validation requirements.

Features:

  • Adjustable fold count with NFolds determining the number of splits.
  • Stratified sampling
  • Preserved class distributions

Use cases:

  • Small datasets (more folds)
  • Large datasets (fewer folds)
  • Detailed model evaluation
  • Robust performance estimation
3

Number of cross-validation folds. Guidelines:

  • 3-5: Large datasets, faster training
  • 5-10: Standard choice, good balance
  • 10+: Small datasets, thorough evaluation

Trade-offs:

  • More folds: Better evaluation, slower training
  • Fewer folds: Faster training, higher variance

Must be at least 2.
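
A sketch of running cross-validation with this splitter in scikit-learn (fold count and seed illustrative):

    from sklearn.linear_model import PassiveAggressiveClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    # scores = cross_val_score(PassiveAggressiveClassifier(), X, y, cv=cv)
    # print(scores.mean(), scores.std())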

K-fold cross-validation without stratification. Divides data into k consecutive folds for iterative validation.

Process:

  • Splits data into k equal parts
  • Each fold serves as validation once
  • Remaining k-1 folds form training set

Use cases:

  • Regression problems
  • Large, balanced datasets
  • When stratification unnecessary
  • Continuous target variables

Limitations:

  • May not preserve class distributions
  • Less suitable for imbalanced data
  • Can create biased splits with ordered data

Number of folds for cross-validation. Recommended values:

  • 5: Standard choice (default)
  • 3: Large datasets/quick evaluation
  • 10: Thorough evaluation/smaller datasets

Trade-offs:

  • Higher values: More thorough, computationally expensive
  • Lower values: Faster, potentially higher variance

Must be at least 2 for valid cross-validation.

Random seed for fold generation when shuffling. Important for:

  • Reproducible results
  • Consistent fold assignments
  • Benchmark comparisons
  • Debugging and validation

Set specific value for reproducibility across runs.

true

Whether to shuffle data before splitting into folds. Effects:

  • true: Randomized fold composition (recommended)
  • false: Sequential splitting

Enable when:

  • Data may have ordering
  • Better fold independence needed

Disable for:

  • Time series data
  • Ordered observations

Stratified K-fold cross-validation maintaining class proportions across folds.

Key features:

  • Preserves class distribution in each fold
  • Handles imbalanced datasets
  • Ensures representative splits

Best for:

  • Classification problems
  • Imbalanced class distributions
  • When class proportions matter

Requirements:

  • Classification tasks only
  • Sufficient samples per class
  • Categorical target variable

Number of stratified folds. Typical values:

  • 5: Standard for most cases
  • 3: Quick evaluation/large datasets
  • 10: Detailed evaluation/smaller datasets

Considerations:

  • Must allow sufficient samples per class per fold
  • Balance between stability and computation time
  • Consider smallest class size when choosing

Seed for reproducible stratified splits. Ensures:

  • Consistent fold assignments
  • Reproducible results
  • Comparable experiments
  • Systematic validation

Fixed seed guarantees identical stratified splits.

false

Data shuffling before stratified splitting. Impact:

  • true: Randomizes while maintaining stratification
  • false: Maintains data order within strata

Use cases:

  • true: Independent observations
  • false: Grouped or sequential data

Class proportions maintained regardless of setting.

Random permutation cross-validator with independent sampling.

Characteristics:

  • Random sampling for each split
  • Independent train/test sets
  • More flexible than K-fold
  • Can have overlapping test sets

Advantages:

  • Control over test size
  • Fresh splits each iteration
  • Good for large datasets

Limitations:

  • Some samples might never be tested
  • Others might be tested multiple times
  • No guarantee of complete coverage

Number of random splits to perform. Common values:

  • 5: Standard evaluation
  • 10: More thorough assessment
  • 3: Quick estimates

Trade-offs:

  • More splits: Better estimation, longer runtime
  • Fewer splits: Faster, less stable estimates

Balance between computation and stability.

Random seed for reproducible shuffling. Controls:

  • Split randomization
  • Sample selection
  • Result reproducibility

Important for:

  • Debugging
  • Comparative studies
  • Result verification
0.2

Proportion of samples for the test set. Common ratios:

  • 0.2: Standard (80/20 split)
  • 0.25: More validation emphasis
  • 0.1: More training data

Considerations:

  • Dataset size
  • Model complexity
  • Validation requirements

It must be between 0.0 and 1.0.

Stratified random permutation cross-validator combining shuffle-split with stratification.

Features:

  • Maintains class proportions
  • Random sampling within strata
  • Independent splits
  • Flexible test size

Ideal for:

  • Imbalanced datasets
  • Large-scale problems
  • When class distributions matter
  • Flexible validation schemes

Number of stratified random splits. Recommended values:

  • 5: Standard evaluation
  • 10: Detailed analysis
  • 3: Quick assessment

Consider:

  • Sample size per class
  • Computational resources
  • Stability requirements

Seed for reproducible stratified sampling. Ensures:

  • Consistent class proportions
  • Reproducible splits
  • Comparable experiments

Critical for:

  • Benchmarking
  • Research studies
  • Quality assurance
0.2

Fraction of samples for the stratified test set. Common splits:

  • 0.2: Balanced evaluation
  • 0.3: More thorough testing
  • 0.15: Preserve training size

Consider:

  • Minority class size
  • Overall dataset size
  • Validation objectives

It must be between 0.0 and 1.0.

Time series cross-validator. Provides train/test indices to split time series samples observed at fixed time intervals. It is a variation of k-fold that uses the first k folds as the training set and the (k+1)-th fold as the test set, so, unlike standard cross-validation methods, successive training sets are supersets of those that come before them. Any surplus data is added to the first training partition, which is always used to train the model. Key features:

  • Maintains temporal dependence
  • Expanding window approach
  • Forward-chaining splits
  • No future data leakage

Use cases:

  • Sequential data
  • Financial forecasting
  • Temporal predictions
  • Time-dependent patterns

Note: Training sets are supersets of previous iterations.

Number of temporal splits. Typical values:

  • 5: Standard forward chaining
  • 3: Limited historical data
  • 10: Long time series

Impact:

  • Affects training window growth
  • Determines validation points
  • Influences computational load

Maximum size of the training set; must be strictly less than the number of samples. Values:

  • 0: Use all available past data
  • >0: Rolling window of fixed size

Use cases:

  • Limit historical relevance
  • Control computational cost
  • Handle concept drift
  • Memory constraints

Number of samples in each test set. When 0:

  • Auto-calculated as n_samples/(n_splits+1)
  • Ensures equal-sized test sets

Considerations:

  • Forecast horizon
  • Validation requirements
  • Available future data

Gap

u64
0

Number of samples to exclude from the end of each train set before the test set, i.e. the gap between train and test sets. Uses:

  • Avoid data leakage
  • Model forecast lag
  • Buffer periods

Common scenarios:

  • 0: Continuous prediction
  • >0: Forward gap for realistic evaluation
  • Match business forecasting needs
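
A sketch of forward-chaining splits with a one-sample gap, assuming scikit-learn's TimeSeriesSplit (sizes are illustrative):

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    X = np.arange(20).reshape(-1, 1)
    tscv = TimeSeriesSplit(n_splits=3, test_size=4, gap=1)
    for train_idx, test_idx in tscv.split(X):
        # the training window grows each fold; the sample just before each
        # test block is skipped because gap=1
        print(train_idx[-1], test_idx)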