Dummy / Classifier Layer
A simple baseline classifier that makes predictions using basic rules, ignoring input features. Similar to sklearn.dummy.DummyClassifier.
Key characteristics:
- Serves as performance baseline
- Ignores feature values
- Uses simple statistical rules
- Helps detect data leakage
Common applications:
- Baseline performance measurement
- Model sanity checking
- Data leakage detection
- Null hypothesis testing
- Minimal performance benchmarking
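The baseline idea can be sketched with scikit-learn's DummyClassifier, which the description above names as the closest analogue. This is only an illustration, not this layer's implementation: the synthetic dataset, the LogisticRegression comparison model, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data (roughly 80% / 20%); purely illustrative.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: always predicts the most frequent training label.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
# A real model to compare against the baseline (an arbitrary choice).
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)
model_acc = model.score(X_test, y_test)
print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"model accuracy:    {model_acc:.3f}")

# The baseline ignores feature values: zeroed features give identical predictions.
unchanged = np.array_equal(
    baseline.predict(X_test), baseline.predict(np.zeros_like(X_test))
)
print("predictions unchanged by features:", unchanged)
```

A model that scores no better than the baseline has learned little from the features. A model that scores far above it on a task expected to be hard is worth checking for leakage.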
Outputs:
- Predicted Table: Input data with predictions
- Validation Results: Cross-validation metrics
- Test Metric: Test set performance
- ROC Curve Data: ROC analysis data
- Confusion Matrix: Classification breakdown
- Feature Importances: Always zero/null
Note: Not suitable for real-world predictions. Use only as baseline reference.
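As a rough picture of what the outputs above look like for a dummy model, the following sketch computes analogous quantities with scikit-learn. It assumes the Prior strategy corresponds to scikit-learn's "prior" and evaluates on the training data for brevity; the toy data and variable names are hypothetical, and the layer's own output format may differ.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score

# Toy target with an 80 / 20 class split; features are irrelevant to a dummy model.
y = np.array([0] * 80 + [1] * 20)
X = np.zeros((100, 1))

clf = DummyClassifier(strategy="prior")

# Cross-validation metrics: each stratified fold scores the majority-class share.
cv_scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")

clf.fit(X, y)
pred = clf.predict(X)

# Confusion matrix: every row lands in the majority-class column.
cm = confusion_matrix(y, pred)

# ROC analysis: constant scores cannot rank samples, so AUC is 0.5.
auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])

print(cv_scores)
print(cm)
print(auc)
```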
TargetCol
column
Target column for prediction. Requirements:
- Must contain class labels
- No missing values
- At least one class
Used to determine class distribution for strategies.
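To illustrate how a class distribution is derived from the target column, here is a minimal sketch using scikit-learn's DummyClassifier, whose fitted attributes classes_ and class_prior_ expose the observed labels and their relative frequencies. The labels and the placeholder feature matrix are made up for the example.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical target column with two class labels.
y = np.array(["cat", "cat", "cat", "dog"])
# Placeholder features: the classifier never looks at them.
X = np.zeros((len(y), 1))

clf = DummyClassifier(strategy="prior").fit(X, y)

print(clf.classes_)      # unique labels seen in the target
print(clf.class_prior_)  # empirical frequency of each label
```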
Strategy
enum
Prediction strategy that determines how the classifier makes decisions. Each strategy provides a different baseline behavior.
Use cases:
- Stratified: For imbalanced datasets
- MostFrequent: When majority class is important
- Prior: For probability calibration baseline
- Uniform: For random chance baseline
- Constant: For specific class focus
Strategy behavior:
- Stratified: The predict_proba method randomly samples one-hot vectors from a multinomial distribution parameterized by the empirical class prior probabilities. The predict method returns the class label that received probability one in the one-hot vector of predict_proba. Each sampled row of both methods is therefore independent and identically distributed.
- MostFrequent: The predict method always returns the most frequent class label in the observed y argument passed to fit. The predict_proba method returns the matching one-hot encoded vector.
- Prior: The predict method always returns the most frequent class label in the observed y argument passed to fit (like MostFrequent). The predict_proba method always returns the empirical class distribution of y, also known as the empirical class prior distribution.
- Uniform: Generates predictions uniformly at random from the list of unique classes observed in y, i.e. each class has equal probability.
- Constant: Always predicts a constant label provided by the user. This is useful for metrics that evaluate a non-majority class.
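The differences between the strategies are easiest to see side by side. The sketch below uses scikit-learn's DummyClassifier and assumes this layer's strategies map onto scikit-learn's "stratified", "most_frequent", "prior", "uniform", and "constant" options; the toy data is hypothetical.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Toy target: class 0 appears 75% of the time, class 1 appears 25%.
y = np.array([0, 0, 0, 1])
X = np.zeros((4, 1))      # features are ignored
X_new = np.zeros((3, 1))  # rows to predict

most_frequent = DummyClassifier(strategy="most_frequent").fit(X, y)
prior = DummyClassifier(strategy="prior").fit(X, y)
uniform = DummyClassifier(strategy="uniform", random_state=0).fit(X, y)
stratified = DummyClassifier(strategy="stratified", random_state=0).fit(X, y)
constant = DummyClassifier(strategy="constant", constant=1).fit(X, y)

# Most frequent: always class 0, with one-hot probabilities.
print(most_frequent.predict(X_new), most_frequent.predict_proba(X_new))
# Prior: same labels as most frequent, but probabilities are the class frequencies.
print(prior.predict(X_new), prior.predict_proba(X_new))
# Uniform: equal probability for every class; labels drawn at random.
print(uniform.predict(X_new), uniform.predict_proba(X_new))
# Stratified: each row is a random one-hot vector drawn from the class frequencies.
print(stratified.predict(X_new), stratified.predict_proba(X_new))
# Constant: always the user-supplied label.
print(constant.predict(X_new))
```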
RandomState
u32
Random seed for reproducible predictions. Important for:
- Stratified strategy
- Uniform strategy
- Reproducible baselines
Other strategies are deterministic.
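The effect of the seed can be checked with a short sketch using scikit-learn's DummyClassifier, assuming RandomState behaves like scikit-learn's integer random_state. The helper predict_with_seed and the toy data are hypothetical names introduced only for this example.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

y = np.array([0, 0, 0, 1, 1, 2])
X = np.zeros((len(y), 1))
X_new = np.zeros((20, 1))

def predict_with_seed(strategy, seed):
    """Fit a fresh dummy classifier with the given seed and predict X_new."""
    clf = DummyClassifier(strategy=strategy, random_state=seed).fit(X, y)
    return clf.predict(X_new)

# Random strategies: the same seed reproduces the same predictions.
same_stratified = np.array_equal(
    predict_with_seed("stratified", 42), predict_with_seed("stratified", 42)
)
same_uniform = np.array_equal(
    predict_with_seed("uniform", 42), predict_with_seed("uniform", 42)
)

# Deterministic strategy: the seed has no effect at all.
same_most_frequent = np.array_equal(
    predict_with_seed("most_frequent", 1), predict_with_seed("most_frequent", 2)
)

print(same_stratified, same_uniform, same_most_frequent)
```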
ConstantValue
string
Prediction value for Constant strategy. Usage:
- Must be a valid class label
- Only used with Constant strategy
- Useful for specific class focus
- Common in minority class analysis
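A minority-class analysis with a constant prediction might look like the sketch below, again using scikit-learn's DummyClassifier as a stand-in. Here the constant is an integer label because the toy target is numeric; how this layer converts its string ConstantValue to a label is not covered here. In scikit-learn, a constant that is not among the observed labels is rejected at fit time with a ValueError.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import precision_score, recall_score

# Toy target: 95 negatives, 5 positives (the minority class of interest).
y = np.array([0] * 95 + [1] * 5)
X = np.zeros((100, 1))

# Always predict the minority class.
always_positive = DummyClassifier(strategy="constant", constant=1).fit(X, y)
pred = always_positive.predict(X)

# Every positive is found, but almost all predictions are wrong.
recall = recall_score(y, pred, pos_label=1)
precision = precision_score(y, pred, pos_label=1)
print(recall, precision)

# A constant outside the observed labels is rejected when fitting.
try:
    DummyClassifier(strategy="constant", constant=2).fit(X, y)
    rejected = False
except ValueError:
    rejected = True
print("invalid constant rejected:", rejected)
```

These two numbers give the floor that any minority-class detector should beat: perfect recall is trivial to achieve, so it only means something alongside precision.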