Dummy Regressor Layer
Dummy Regressor: A baseline regression model using simple rules.
Purpose:
- Provides regression baseline
- Validates more complex models
- Sanity checks performance
- Tests pipeline infrastructure
Strategies:
- Mean prediction
- Median prediction
- Quantile-based prediction
- Constant value prediction
Use cases:
- Model comparison baseline
- Pipeline debugging
- Null hypothesis testing
- Data quality assessment
Outputs:
- Predicted Table: Results with predictions
- Validation Results: Cross-validation metrics
- Test Metric: Hold-out performance
- Feature Importances: Always zero (dummy model)
Note: Ignores feature values in predictions
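The note above can be illustrated with a minimal sketch. The class name `MeanBaseline` and its `fit`/`predict` methods are hypothetical, chosen only for illustration; they are not this layer's actual interface. The point is that the features are accepted but never read, so every row receives the same prediction.

```python
from statistics import mean


class MeanBaseline:
    """Hypothetical baseline: learns one number from y and ignores X."""

    def fit(self, X, y):
        # The features in X are never inspected; only the target is used.
        self.value_ = mean(y)
        return self

    def predict(self, X):
        # One identical prediction per input row.
        return [self.value_ for _ in X]


X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
y_train = [2.0, 4.0, 9.0]

model = MeanBaseline().fit(X_train, y_train)
preds = model.predict([[100.0, -5.0], [0.0, 0.0]])
print(preds)  # [5.0, 5.0]
```

Because the prediction does not depend on the inputs, any real model that fails to beat this baseline is not extracting useful signal from the features.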
SelectTarget
column
Target column specification for regression:
Requirements:
- Data type:
  - Numeric values only
  - Continuous or discrete
  - No missing values
  - Real-valued targets
- Quality checks:
  - Value range verification
  - Distribution assessment
  - Outlier detection
  - Scale consideration
- Statistical properties:
  - Central tendency
  - Spread measures
  - Distribution shape
  - Outlier impact
Note: Must be a single numeric column
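The data-type requirements above could be checked before training with something like the following sketch. The helper `validate_target` is hypothetical, and treating booleans as non-numeric is an assumption made here, not documented behavior of the layer.

```python
import math
from numbers import Real


def validate_target(values):
    """Return a list of problems found in a candidate target column."""
    problems = []
    for i, v in enumerate(values):
        if v is None:
            problems.append(f"row {i}: missing value")
        elif isinstance(v, bool) or not isinstance(v, Real):
            # Assumption: booleans and strings do not count as numeric targets.
            problems.append(f"row {i}: non-numeric value {v!r}")
        elif math.isnan(v):
            problems.append(f"row {i}: NaN")
    return problems


good = validate_target([1.5, 2, 3.25])
bad = validate_target([1.5, None, "7", float("nan")])
print(good)      # []
print(len(bad))  # 3
```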
Strategy
enum
Prediction strategy for baseline regression:
Selection criteria:
- Data distribution
- Target characteristics
- Baseline requirements
- Performance comparison needs
Usage:
- Model validation
- Performance benchmarking
- Statistical testing
- Pipeline verification
Mean-based prediction strategy:
Formula: ŷ = (y₁ + ... + yₙ) / n
Characteristics:
- Minimizes MSE among constant predictions
- Central tendency measure
- Sensitive to outliers
- Simple computation
Best for:
- Normal distributions
- MSE optimization
- General baselines
- Initial benchmarks
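The claim that the mean minimizes MSE can be checked numerically. The sketch below uses made-up target values and a hand-picked list of candidate constants; it only compares constant predictions against each other.

```python
from statistics import mean


def mse(y, c):
    """Mean squared error of predicting the constant c for every target."""
    return sum((v - c) ** 2 for v in y) / len(y)


y = [1.0, 2.0, 3.0, 10.0]  # illustrative data
y_mean = mean(y)           # 4.0

# Try several constants, including the mean itself.
candidates = [2.0, 2.5, 3.0, y_mean, 5.0, 6.0]
best = min(candidates, key=lambda c: mse(y, c))

print(best)            # 4.0
print(mse(y, y_mean))  # 12.5
```

Note that the single large value (10.0) pulls the mean above three of the four targets, which is the outlier sensitivity listed above.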
Median-based prediction strategy:
Formula: Middle value of sorted {y₁, ..., yₙ}
Properties:
- Robust to outliers
- Minimizes MAE among constant predictions
- Central position measure
- Order statistic
Best for:
- Skewed distributions
- Outlier presence
- MAE optimization
- Robust baselines
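To make the robustness property concrete, the following sketch compares the mean and the median on two small made-up samples that differ only in one extreme value.

```python
from statistics import mean, median


def mae(y, c):
    """Mean absolute error of predicting the constant c for every target."""
    return sum(abs(v - c) for v in y) / len(y)


y_clean = [10.0, 11.0, 12.0, 13.0, 14.0]
y_outlier = [10.0, 11.0, 12.0, 13.0, 1000.0]  # last value replaced by an outlier

# The median does not move; the mean is dragged toward the outlier.
print(median(y_clean), median(y_outlier))  # 12.0 12.0
print(mean(y_clean))                       # 12.0

# On the contaminated sample the median gives the lower MAE.
mae_median = mae(y_outlier, median(y_outlier))
mae_mean = mae(y_outlier, mean(y_outlier))
print(mae_median < mae_mean)               # True
```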
Quantile-based prediction strategy:
Formula: q-th quantile of {y₁, ..., yₙ}
Features:
- User-specified quantile
- Distribution percentiles
- Flexible prediction point
- Risk-level modeling
Best for:
- Quantile regression
- Risk assessment
- Asymmetric losses
- Specific percentiles
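The link between quantiles and asymmetric losses can be sketched with the pinball (quantile) loss, a standard asymmetric loss that is not named in the text above and is introduced here as an illustration. With q = 0.75, under-predicting costs three times as much per unit as over-predicting, and the constant that minimizes the loss on this made-up sample is its third quartile.

```python
def pinball_loss(y, c, q):
    """Mean pinball loss of the constant prediction c at quantile level q."""
    total = 0.0
    for v in y:
        diff = v - c
        # Under-prediction (diff >= 0) is weighted by q,
        # over-prediction (diff < 0) by 1 - q.
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y)


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]  # illustrative data
q = 0.75

# Search over the observed values for the best constant prediction.
best = min(y, key=lambda c: pinball_loss(y, c, q))
print(best)  # 7.0
```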
Constant value prediction strategy:
Method: ŷ = c (user-defined constant)
Applications:
- Fixed value prediction
- Known target level
- Specific baseline
- Domain knowledge
Best for:
- Known reference points
- Simple comparisons
- Specific hypotheses
- Custom baselines
Quantile
f64
The quantile to predict when using the “quantile” strategy. A quantile of 0.5 corresponds to the median, 0.0 to the minimum, and 1.0 to the maximum.
Range: [0.0, 1.0] where:
- 0.0: Minimum value
- 0.5: Median (default)
- 1.0: Maximum value
Common values:
- 0.25: First quartile
- 0.75: Third quartile
- 0.95: 95th percentile
Use cases:
- Risk assessment
- Confidence bounds
- Distribution analysis
- Extreme value study
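The endpoints of the range can be verified with a small quantile function. The sketch below assumes linear interpolation between order statistics; the interpolation rule this layer actually uses is not stated here, so results for quantiles that fall between data points may differ.

```python
def quantile(values, q):
    """q-th quantile, assuming linear interpolation between sorted values."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0.0, 1.0]")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)          # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


y = [7.0, 1.0, 5.0, 3.0, 9.0]  # illustrative data

print(quantile(y, 0.0))   # 1.0 (minimum)
print(quantile(y, 0.25))  # 3.0 (first quartile)
print(quantile(y, 0.5))   # 5.0 (median)
print(quantile(y, 0.75))  # 7.0 (third quartile)
print(quantile(y, 1.0))   # 9.0 (maximum)
```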
Fixed prediction value used by the Constant strategy:
Usage:
- Domain-specific constant
- Known reference point
- Theoretical value
- Baseline comparison
Applications:
- Null hypothesis testing
- Known standards
- Specific benchmarks
- Reference comparisons
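One way to use a fixed reference value as a benchmark is to score it like any other model. The sketch below computes R² by hand for the mean prediction and for a hypothetical reference constant of 3.0 on made-up data; both the data and the constant are assumptions for illustration.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y = [2.0, 4.0, 6.0, 8.0]  # illustrative data
reference = 3.0           # hypothetical domain constant

r2_mean = r2_score(y, [mean(y)] * len(y))
r2_const = r2_score(y, [reference] * len(y))

print(r2_mean)       # 0.0
print(r2_const < 0)  # True
```

Predicting the mean of the evaluated data scores exactly zero, so a negative score for the reference value indicates that it fits this sample worse than the sample mean does.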