Dummy / Classifier Layer

A baseline classifier that predicts from simple rules based on the target's class distribution (or a user-supplied constant) and ignores the input features. Similar to sklearn.dummy.DummyClassifier.

Key characteristics:

Serves as performance baseline
Ignores feature values
Uses simple statistical rules
Helps detect data leakage

Common applications:

Baseline performance measurement
Model sanity checking
Data leakage detection
Null hypothesis testing
Minimal performance benchmarking

Outputs:

Predicted Table: Input data with predictions
Validation Results: Cross-validation metrics
Test Metric: Test set performance
ROC Curve Data: ROC analysis data
Confusion Matrix: Classification breakdown
Feature Importances: Always zero or null, because features are ignored

Note: Not suitable for real-world predictions. Use only as a baseline reference.
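As a rough illustration of baseline measurement, the sketch below computes the accuracy of a majority-class baseline and compares it with a model score. The function names and the model_accuracy value are hypothetical and are not part of this layer's API.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)

def majority_baseline_accuracy(y_train, y_test):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [majority] * len(y_test))

# Toy, made-up labels: 90% of the rows belong to the "neg" class.
y_train = ["neg"] * 90 + ["pos"] * 10
y_test = ["neg"] * 18 + ["pos"] * 2

baseline = majority_baseline_accuracy(y_train, y_test)

# Hypothetical score of a real model on the same test set (assumed value).
model_accuracy = 0.91
lift = model_accuracy - baseline
```

On imbalanced data like this, a model accuracy that looks high in isolation may be only marginally better than the baseline, which is the comparison the baseline is meant to expose.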

Table

Predicted Table

Validation Results

Test Metric

ROC Curve Data

Confusion Matrix

Feature Importances

TargetCol

column

Target column for prediction. Requirements:

Must contain class labels
No missing values
At least one class

Used to determine class distribution for strategies.
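The class distribution mentioned here can be sketched as a simple frequency count over the target column. This is a minimal illustration under the requirements listed above; class_priors is a hypothetical helper, not the layer's implementation.

```python
from collections import Counter

def class_priors(labels):
    """Return {class_label: relative frequency} for a target column."""
    if not labels:
        raise ValueError("target column must contain at least one label")
    if any(label is None for label in labels):
        raise ValueError("target column must not contain missing values")
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# Toy target column with two classes.
priors = class_priors(["spam", "ham", "ham", "ham"])
```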

Strategy

enum

Prior

Prediction strategy determining how the classifier makes decisions. Each strategy provides different baseline behavior:

Use cases:

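Stratified: Random guessing that follows the observed class distribution, useful on imbalanced datasets
MostFrequent: Majority-class baseline
Prior: Baseline for probability calibration
Uniform: Random-chance baseline with equal probability per class
Constant: Baseline focused on one specific class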

Stratified ~

the predict_proba method randomly samples one-hot vectors from a multinomial distribution parameterized by the empirical class prior probabilities. The predict method returns the class label which got probability one in the one-hot vector of predict_proba. Each sampled row of both methods is therefore independent and identically distributed.
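A minimal sketch of the Stratified behavior described above, using only the Python standard library. It assumes that a multinomial draw with a single trial is equivalent to picking one class index with the prior weights; all function names here are hypothetical.

```python
import random
from collections import Counter

def fit_priors(y):
    """Return sorted class labels and their empirical prior probabilities."""
    counts = Counter(y)
    classes = sorted(counts)
    return classes, [counts[c] / len(y) for c in classes]

def stratified_predict_proba(classes, priors, n_rows, seed):
    """Sample one one-hot vector per row, weighted by the class priors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        # A multinomial draw with one trial selects exactly one class.
        picked = rng.choices(range(len(classes)), weights=priors, k=1)[0]
        rows.append([1.0 if i == picked else 0.0 for i in range(len(classes))])
    return rows

def stratified_predict(classes, proba_rows):
    """Return the label whose entry is 1.0 in each one-hot row."""
    return [classes[row.index(1.0)] for row in proba_rows]

classes, priors = fit_priors(["a", "a", "a", "b"])
proba = stratified_predict_proba(classes, priors, n_rows=5, seed=0)
labels = stratified_predict(classes, proba)
```

Over many rows, the share of each predicted label approaches its prior, so the baseline's accuracy reflects the class balance rather than any feature signal.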

MostFrequent ~

the predict method always returns the most frequent class label in the observed y argument passed to fit. The predict_proba method returns the matching one-hot encoded vector.

Prior ~

the predict method always returns the most frequent class label in the observed y argument passed to fit (like MostFrequent). The predict_proba method always returns the empirical class distribution of y, also known as the empirical class prior distribution.
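The difference between MostFrequent and Prior shows up only in the probabilities, not in the predicted label. The sketch below illustrates this with hypothetical helper functions; the tie-breaking rule (first label in sorted order) is an assumption made for the example.

```python
from collections import Counter

def fit(y):
    """Return sorted classes, their priors, and the majority label."""
    counts = Counter(y)
    classes = sorted(counts)
    priors = [counts[c] / len(y) for c in classes]
    # Assumption: ties go to the first label in sorted order.
    majority = max(classes, key=lambda c: counts[c])
    return classes, priors, majority

def predict(majority, n_rows):
    """Both strategies predict the majority label for every row."""
    return [majority] * n_rows

def predict_proba_most_frequent(classes, majority, n_rows):
    """MostFrequent: one-hot vector on the majority class."""
    one_hot = [1.0 if c == majority else 0.0 for c in classes]
    return [one_hot[:] for _ in range(n_rows)]

def predict_proba_prior(priors, n_rows):
    """Prior: the empirical class distribution, repeated for every row."""
    return [priors[:] for _ in range(n_rows)]

classes, priors, majority = fit(["no", "no", "no", "yes"])
labels = predict(majority, 2)
proba_most_frequent = predict_proba_most_frequent(classes, majority, 2)
proba_prior = predict_proba_prior(priors, 2)
```

Because Prior reports the class distribution instead of a one-hot vector, it tends to be the more informative baseline for probability-based metrics such as log loss.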

Uniform ~

generates predictions uniformly at random from the list of unique classes observed in y, i.e. each class has equal probability.
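A minimal sketch of the Uniform strategy: every observed class is equally likely, regardless of how often it appears in y. The function names are hypothetical.

```python
import random

def uniform_predict(classes, n_rows, seed):
    """Pick one class per row with equal probability."""
    rng = random.Random(seed)
    return [rng.choice(classes) for _ in range(n_rows)]

def uniform_predict_proba(classes, n_rows):
    """Every class gets probability 1 / number_of_classes."""
    p = 1.0 / len(classes)
    return [[p] * len(classes) for _ in range(n_rows)]

classes = ["a", "b", "c", "d"]
labels = uniform_predict(classes, n_rows=6, seed=1)
proba = uniform_predict_proba(classes, n_rows=2)
```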

Constant ~

always predicts a constant label that is provided by the user. This is useful for metrics that evaluate a non-majority class.
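To illustrate why a constant prediction is useful for non-majority-class metrics, the sketch below compares recall and precision for the minority class under a constant prediction and under a majority-class prediction. The data and the function name are made up for the example.

```python
def recall_and_precision(y_true, y_pred, positive):
    """Recall and precision for one class, with 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision

# Toy labels: "fraud" is the minority class (2 of 10 rows).
y_true = ["ok"] * 8 + ["fraud"] * 2

constant_pred = ["fraud"] * len(y_true)  # Constant strategy with value "fraud"
majority_pred = ["ok"] * len(y_true)     # what MostFrequent would predict

constant_scores = recall_and_precision(y_true, constant_pred, "fraud")
majority_scores = recall_and_precision(y_true, majority_pred, "fraud")
```

The constant prediction reaches full recall on the minority class with precision equal to that class's prevalence, which gives a floor that a real model should beat on precision.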

RandomState

u32

Random seed for reproducible predictions. Important for:

Stratified strategy
Uniform strategy
Reproducible baselines

The other strategies (MostFrequent, Prior, Constant) are deterministic and do not depend on the seed.
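The effect of a fixed seed can be sketched with a private random generator: the same seed yields the same sequence of sampled labels. sample_labels is a hypothetical helper and does not reflect how this layer seeds its generator internally.

```python
import random

def sample_labels(classes, weights, n_rows, seed):
    """Draw labels with a private generator so the global state is untouched."""
    rng = random.Random(seed)
    return rng.choices(classes, weights=weights, k=n_rows)

# Two runs with the same seed produce identical predictions.
run_a = sample_labels(["a", "b"], [0.7, 0.3], n_rows=20, seed=42)
run_b = sample_labels(["a", "b"], [0.7, 0.3], n_rows=20, seed=42)
```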

ConstantValue

string

Prediction value for Constant strategy. Usage:

Must be a valid class label
Only used with Constant strategy
Useful for specific class focus
Common in minority class analysis
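The "valid class label" requirement could be checked along the following lines. This sketch assumes that labels are compared as strings, since ConstantValue is a string parameter; validate_constant_value is a hypothetical helper.

```python
def validate_constant_value(constant_value, y):
    """Return constant_value if it matches a label observed in y, else raise."""
    # Assumption: labels are compared by their string form.
    classes = {str(label) for label in y}
    if constant_value not in classes:
        raise ValueError(
            f"ConstantValue {constant_value!r} is not one of {sorted(classes)}"
        )
    return constant_value

checked = validate_constant_value("1", [0, 1, 1, 0])
```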