Dummy / Classifier Layer

A baseline classifier that predicts from the class distribution of the target column alone and ignores the input features. Its behavior is similar to sklearn.dummy.DummyClassifier.

Key characteristics:

  • Serves as performance baseline
  • Ignores feature values
  • Uses simple statistical rules
  • Helps detect data leakage

Common applications:

  • Baseline performance measurement
  • Model sanity checking
  • Data leakage detection
  • Null hypothesis testing
  • Minimal performance benchmarking

Outputs:

  1. Predicted Table: Input data with predictions
  2. Validation Results: Cross-validation metrics
  3. Test Metric: Test set performance
  4. ROC Curve Data: ROC analysis data
  5. Confusion Matrix: Classification breakdown
  6. Feature Importances: Always zero/null

Note: Not suitable for real-world predictions. Use only as baseline reference.
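
As a rough illustration of the baseline role, the sketch below uses sklearn.dummy.DummyClassifier directly (the class this component is described as similar to, not the component itself) and compares it with a real model. The synthetic dataset, the choice of LogisticRegression, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced two-class data (illustrative only).
X, y = make_classification(
    n_samples=500, n_features=5, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# The baseline looks only at y_train; X_train is ignored.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)
model_acc = model.score(X_test, y_test)

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"model accuracy:    {model_acc:.3f}")
```

A model whose score is not clearly above the baseline has probably learned little from the features.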

Input:

  0. Table

Output indices:

  0. Predicted Table
  1. Validation Results
  2. Test Metric
  3. ROC Curve Data
  4. Confusion Matrix
  5. Feature Importances

TargetCol (column)

Target column for prediction. Requirements:

  • Must contain class labels
  • No missing values
  • At least one class

Used to determine class distribution for strategies.

Prior

Prediction strategy that determines how the classifier makes decisions. Each strategy provides a different baseline behavior.

Use cases:

  • Stratified: Random baseline that preserves class proportions, for imbalanced datasets
  • MostFrequent: Baseline for cases where the majority class dominates
  • Prior: Baseline for probability calibration
  • Uniform: Random-chance baseline
  • Constant: Baseline focused on one specific class

Stratified ~

The predict_proba method randomly samples one-hot vectors from a multinomial distribution parameterized by the empirical class prior probabilities. The predict method returns the class label that received probability one in the one-hot vector from predict_proba. The rows sampled by both methods are therefore independent and identically distributed.
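
The sampling behavior can be checked with scikit-learn's DummyClassifier, which uses the same wording for its "stratified" strategy. This is a minimal sketch, assuming the component behaves like the scikit-learn class; the toy data and the 80/20 class split are made up for the example.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Features are irrelevant, so a column of zeros is enough.
X = np.zeros((1000, 1))
y = np.array([0] * 800 + [1] * 200)  # empirical prior: 0.8 / 0.2

clf = DummyClassifier(strategy="stratified", random_state=0).fit(X, y)

proba = clf.predict_proba(X)  # each row is a sampled one-hot vector
pred = clf.predict(X)         # labels drawn with the empirical prior

# Share of class 1 among the sampled labels; it should be near 0.2.
share_of_class_1 = pred.mean()
print(proba[:5])
print(share_of_class_1)
```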

MostFrequent ~

The predict method always returns the most frequent class label in the observed y argument passed to fit. The predict_proba method returns the matching one-hot encoded vector.

Prior ~

The predict method always returns the most frequent class label in the observed y argument passed to fit (like MostFrequent). The predict_proba method always returns the empirical class distribution of y, also known as the empirical class prior distribution.
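
MostFrequent and Prior differ only in their probability output. The sketch below shows this with scikit-learn's DummyClassifier, assuming the component mirrors its "most_frequent" and "prior" strategies; the labels "a" and "b" and the 7/3 split are illustrative.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((10, 1))
y = np.array(["a"] * 7 + ["b"] * 3)

most_frequent = DummyClassifier(strategy="most_frequent").fit(X, y)
prior = DummyClassifier(strategy="prior").fit(X, y)

# Both strategies predict the majority label "a".
mf_label = most_frequent.predict(X[:1])[0]
prior_label = prior.predict(X[:1])[0]

# The probabilities differ: one-hot versus the empirical distribution.
mf_proba = most_frequent.predict_proba(X[:1])  # [[1.0, 0.0]]
prior_proba = prior.predict_proba(X[:1])       # [[0.7, 0.3]]
```

Because Prior returns non-degenerate probabilities, it is the more informative baseline for metrics computed from predicted probabilities.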

Uniform ~

Generates predictions uniformly at random from the list of unique classes observed in y, i.e. each class has equal probability.

Constant ~

Always predicts a constant label provided by the user. This is useful for metrics that evaluate a non-majority class.

Random seed for reproducible predictions. It matters only for the strategies that sample at random:

  • Stratified strategy
  • Uniform strategy

Fixing the seed makes these baselines reproducible. The other strategies are deterministic.
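
The effect of the seed can be sketched with scikit-learn's DummyClassifier, whose random_state parameter plays the same role. The helper uniform_predictions and the color labels are hypothetical names introduced for this example.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((50, 1))
y = np.array(["red"] * 30 + ["green"] * 15 + ["blue"] * 5)

def uniform_predictions(seed):
    """Fit a Uniform-strategy baseline with the given seed and predict."""
    clf = DummyClassifier(strategy="uniform", random_state=seed).fit(X, y)
    return clf.predict(X)

# Same seed, same random draws.
run_a = uniform_predictions(42)
run_b = uniform_predictions(42)
seeds_match = np.array_equal(run_a, run_b)

# A deterministic strategy needs no seed: it always predicts the majority class.
prior_pred = DummyClassifier(strategy="prior").fit(X, y).predict(X)
```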

Prediction value for Constant strategy. Usage:

  • Must be a valid class label
  • Only used with Constant strategy
  • Useful for specific class focus
  • Common in minority class analysis
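
A minority-class baseline can be sketched with scikit-learn's DummyClassifier, assuming the component's Constant strategy behaves like strategy="constant" there. The 95/5 class split and the use of recall and precision are illustrative choices.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import precision_score, recall_score

X = np.zeros((100, 1))
y = np.array([0] * 95 + [1] * 5)  # class 1 is the minority

# Always predict the minority class.
clf = DummyClassifier(strategy="constant", constant=1).fit(X, y)
pred = clf.predict(X)

recall = recall_score(y, pred, pos_label=1)        # every positive is found
precision = precision_score(y, pred, pos_label=1)  # but only 5 of 100 are right

# In scikit-learn, a constant that never appears in y is rejected at fit time.
try:
    DummyClassifier(strategy="constant", constant=2).fit(X, y)
    invalid_label_rejected = False
except ValueError:
    invalid_label_rejected = True
```

Any real model aimed at the minority class should beat this precision while keeping recall reasonably high.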