Dummy / Regressor Layer

Dummy Regressor: A baseline regression model that predicts with simple rules derived from the training targets alone.

Purpose:

  • Provides regression baseline
  • Validates more complex models
  • Sanity checks performance
  • Tests pipeline infrastructure

Strategies:

  • Mean prediction
  • Median prediction
  • Quantile-based prediction
  • Constant value prediction

Use cases:

  • Model comparison baseline
  • Pipeline debugging
  • Null hypothesis testing
  • Data quality assessment

Outputs:

  1. Predicted Table: Results with predictions
  2. Validation Results: Cross-validation metrics
  3. Test Metric: Hold-out performance
  4. Feature Importances: Always zero (dummy model)

Note: Predictions ignore the input feature values; every row receives the same predicted value.
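
The behaviour described above can be sketched in a few lines. This is a minimal illustration, not the node's actual implementation: the class name DummyBaseline and its fit/predict interface are hypothetical, and the quantile strategy is left out for brevity.

```python
from statistics import mean, median


class DummyBaseline:
    """Hypothetical baseline regressor: learns a single number from the targets."""

    def __init__(self, strategy="mean", constant=None):
        self.strategy = strategy
        self.constant = constant
        self.value_ = None

    def fit(self, X, y):
        # X is accepted for interface compatibility but never inspected.
        if self.strategy == "mean":
            self.value_ = mean(y)
        elif self.strategy == "median":
            self.value_ = median(y)
        elif self.strategy == "constant":
            if self.constant is None:
                raise ValueError("constant strategy requires a value")
            self.value_ = self.constant
        else:
            raise ValueError(f"unknown strategy: {self.strategy}")
        return self

    def predict(self, X):
        # One identical prediction per input row, whatever the row contains.
        return [self.value_ for _ in X]


X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
y_train = [2.0, 4.0, 9.0]

model = DummyBaseline(strategy="mean").fit(X_train, y_train)
preds = model.predict([[100.0, -5.0], [0.0, 0.0]])
print(preds)  # both rows receive the mean of y_train
```

Because the fitted value never depends on X, any notion of feature importance for such a model is zero by construction.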

Ports:

  • Input 0: Table
  • Output 0: Predicted Table
  • Output 1: Validation Results
  • Output 2: Test Metric
  • Output 3: Feature Importances

Target column specification for regression:

Requirements:

  1. Data type:

    • Numeric values only
    • Continuous or discrete
    • No missing values
    • Real-valued targets
  2. Quality checks:

    • Value range verification
    • Distribution assessment
    • Outlier detection
    • Scale consideration
  3. Statistical properties:

    • Central tendency
    • Spread measures
    • Distribution shape
    • Outlier impact

Note: Must be a single numeric column
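
A quick pre-check of the data type requirements might look like the sketch below. The helper name check_target is hypothetical, and the rules it applies (numeric, not missing, finite) are one reading of the requirements listed above, not the node's own validation logic.

```python
import math


def check_target(values):
    """Return a list of problems found in a candidate target column."""
    problems = []
    for i, v in enumerate(values):
        if v is None:
            problems.append(f"row {i}: missing value")
        elif isinstance(v, bool) or not isinstance(v, (int, float)):
            # bool is excluded explicitly because it is a subclass of int.
            problems.append(f"row {i}: non-numeric value {v!r}")
        elif math.isnan(v) or math.isinf(v):
            problems.append(f"row {i}: not a finite number")
    return problems


good = [3.2, 4, 5.5]
bad = [3.2, None, "7", float("nan")]

print(check_target(good))  # no problems expected for a clean numeric column
for problem in check_target(bad):
    print(problem)
```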

Quantile

Prediction strategy for baseline regression:

Selection criteria:

  • Data distribution
  • Target characteristics
  • Baseline requirements
  • Performance comparison needs

Usage:

  • Model validation
  • Performance benchmarking
  • Statistical testing
  • Pipeline verification

Mean

Mean-based prediction strategy:

Formula: Arithmetic mean of {y₁, ..., yₙ}, i.e. ŷ = (y₁ + ... + yₙ) / n

Characteristics:

  • Minimizes MSE
  • Central tendency measure
  • Sensitive to outliers
  • Simple computation

Best for:

  • Normal distributions
  • MSE optimization
  • General baselines
  • Initial benchmarks
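
The claim that the mean minimizes MSE can be checked numerically. The sketch below uses made-up target values and compares the mean against a few nearby constants; the helper mse is defined here for illustration only.

```python
from statistics import mean


def mse(y, c):
    """Mean squared error of predicting the constant c for every target."""
    return sum((v - c) ** 2 for v in y) / len(y)


y = [1.0, 2.0, 3.0, 10.0]  # hypothetical targets with one large value
y_hat = mean(y)

# Try the mean and a few constants on either side of it.
candidates = [y_hat - 1.0, y_hat - 0.1, y_hat, y_hat + 0.1, y_hat + 1.0]
best = min(candidates, key=lambda c: mse(y, c))

print(y_hat, mse(y, y_hat))
print(best == y_hat)
```

For any constant c, the MSE equals the MSE at the mean plus (c - mean)², which is why no other constant can do better.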

Median

Median-based prediction strategy:

Formula: Middle value of sorted {y₁, ..., yₙ}

Properties:

  • Robust to outliers
  • Minimizes MAE
  • Central position measure
  • Order statistic

Best for:

  • Skewed distributions
  • Outlier presence
  • MAE optimization
  • Robust baselines
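
A small numeric illustration of the robustness and MAE properties, using made-up targets with a single outlier. The helper mae is defined here for illustration only.

```python
from statistics import mean, median


def mae(y, c):
    """Mean absolute error of predicting the constant c for every target."""
    return sum(abs(v - c) for v in y) / len(y)


y = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical targets; 100.0 is an outlier

# The outlier drags the mean upward but leaves the median in place.
print("mean:", mean(y), "median:", median(y))

# The median gives the lower absolute error of the two constants.
print("MAE at median:", mae(y, median(y)))
print("MAE at mean:  ", mae(y, mean(y)))
```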

Quantile

Quantile-based prediction strategy:

Formula: q-th quantile of {y₁, ..., yₙ}

Features:

  • User-specified quantile
  • Distribution percentiles
  • Flexible prediction point
  • Risk-level modeling

Best for:

  • Quantile regression baselines
  • Risk assessment
  • Asymmetric losses
  • Specific percentiles
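
The link to asymmetric losses can be made concrete with the pinball (quantile) loss, which penalizes under- and over-prediction with different weights. The sketch below is an illustration under assumptions: the helper pinball_loss and the target values are made up, and the search is restricted to the observed target values.

```python
def pinball_loss(y, c, q):
    """Average pinball loss of predicting the constant c at quantile level q."""
    total = 0.0
    for v in y:
        diff = v - c
        # Under-prediction (diff >= 0) costs q per unit; over-prediction costs 1 - q.
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y)


y = [float(v) for v in range(1, 12)]  # hypothetical targets 1.0 .. 11.0
q = 0.9

# Pick the observed value with the lowest pinball loss at level q.
best = min(y, key=lambda c: pinball_loss(y, c, q))
print(best)
```

With q = 0.9, under-prediction is nine times as costly as over-prediction, so the best constant sits high in the distribution, not at its center.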

Constant

Constant value prediction strategy:

Method: ŷ = c (user-defined constant)

Applications:

  • Fixed value prediction
  • Known target level
  • Specific baseline
  • Domain knowledge

Best for:

  • Known reference points
  • Simple comparisons
  • Specific hypotheses
  • Custom baselines

The quantile to predict when the “quantile” strategy is selected. A quantile of 0.5 corresponds to the median, 0.0 to the minimum, and 1.0 to the maximum.

Range: [0.0, 1.0] where:

  • 0.0: Minimum value
  • 0.5: Median (default)
  • 1.0: Maximum value

Common values:

  • 0.25: First quartile
  • 0.75: Third quartile
  • 0.95: 95th percentile

Use cases:

  • Risk assessment
  • Confidence bounds
  • Distribution analysis
  • Extreme value study
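
The mapping from q to a predicted value can be sketched as follows. The interpolation rule used here (linear interpolation between neighbouring sorted values) is an assumption, since the text does not state which rule the node applies; the helper name quantile is hypothetical.

```python
def quantile(values, q):
    """q-th quantile of values, using linear interpolation (assumed rule)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0.0, 1.0]")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)            # fractional index into the sorted data
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


y = [7.0, 1.0, 5.0, 3.0, 9.0]  # hypothetical targets

for q in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(q, quantile(y, q))
```

Different interpolation rules agree at q = 0.0, 0.5 (for an odd number of values), and 1.0, but can differ slightly in between.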

Fixed prediction value used by the Constant strategy:

Usage:

  • Domain-specific constant
  • Known reference point
  • Theoretical value
  • Baseline comparison

Applications:

  • Null hypothesis testing
  • Known standards
  • Specific benchmarks
  • Reference comparisons
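
One way to use a constant as a reference comparison is to express a model's error relative to the error of always predicting that constant. The sketch below is illustrative only: the target values, the model predictions, and the reference level of 100.0 are all made up.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [98.0, 101.0, 103.0, 100.0]
model_pred = [99.0, 100.0, 102.0, 100.0]   # hypothetical model output
constant = 100.0                            # hypothetical known reference level
baseline_pred = [constant] * len(y_true)

# Skill relative to the constant baseline: 1.0 is perfect, 0.0 matches the
# baseline, and negative values mean the model is worse than the constant.
skill = 1.0 - mse(y_true, model_pred) / mse(y_true, baseline_pred)

print("model MSE:   ", mse(y_true, model_pred))
print("baseline MSE:", mse(y_true, baseline_pred))
print("skill:       ", round(skill, 3))
```

A skill near zero suggests the model adds little beyond the known reference value.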