Qcut / Manipulation Layer

Converts continuous data into quantile-based categories, similar to pandas' qcut() or quantile-based categorization in R.

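Mathematical concept: Splitting the data into n quantile groups (for example, n = 4 for quartiles) places approximately N/n records in each group, where N is the total record count. This requires n - 1 break points.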

Common applications:

  • Customer segmentation (top 25%, middle 50%, bottom 25%)
  • Performance ranking (quartile distribution)
  • Equal-sized grouping (terciles, quartiles, quintiles)
  • Distribution analysis (percentile ranges)
  • Outlier identification (extreme quantiles)
  • Relative positioning (below median, above median)
  • Balanced sampling (stratified selection)
  • Normalized grouping (population-based divisions)

Unlike fixed-width binning (cut), qcut produces groups of approximately equal size when the break points are evenly spaced.
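The difference between the two binning styles can be illustrated with a small standard-library sketch. This is not the tool's implementation: the `quantile` helper assumes linear interpolation between sorted values (tools differ in the method they use), and `assign_bins` is a hypothetical helper that assumes right-closed bins.

```python
from bisect import bisect_left
from collections import Counter

def quantile(sorted_vals, q):
    """Linear-interpolation quantile (an assumed method; tools differ)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def assign_bins(values, edges):
    """Hypothetical helper: index of the right-closed bin each value falls in."""
    return [bisect_left(edges, v) for v in values]

# Skewed data: most values are small, one is very large.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 100]

# Fixed-width (cut-style): split the range [1, 100] into 4 equal widths.
lo, hi = min(values), max(values)
width_edges = [lo + (hi - lo) * k / 4 for k in (1, 2, 3)]
cut_counts = Counter(assign_bins(values, width_edges))

# Quantile-based (qcut-style): break at the 25th, 50th and 75th percentiles.
s = sorted(values)
q_edges = [quantile(s, q) for q in (0.25, 0.5, 0.75)]
qcut_counts = Counter(assign_bins(values, q_edges))

print(dict(cut_counts))   # fixed-width: almost everything lands in the first bin
print(dict(qcut_counts))  # quantile-based: three records per bin
```

With this data the fixed-width split puts 11 of the 12 records in the first bin, while the quantile split places three records in each of the four bins.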

Select

column

Column to divide into quantile-based groups. Suitable for:

  • Continuous measurements
  • Ranked values
  • Performance metrics
  • Distribution data
  • Sequential measurements

Quantiles

[f64, ...]
0.25, 0.5, 0.75

Quantile break points (0.0 to 1.0). Common patterns:

  • Quartiles: [0.25, 0.50, 0.75]
  • Quintiles: [0.2, 0.4, 0.6, 0.8]
  • Deciles: [0.1, 0.2, ..., 0.9]
  • Custom: [0.05, 0.95] for extreme grouping
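As a rough illustration of the patterns above, evenly spaced break points for a given number of groups can be generated rather than typed by hand. The helper name `even_breaks` is hypothetical and not part of the tool.

```python
def even_breaks(groups):
    """Evenly spaced break points for the given number of groups (hypothetical helper)."""
    if groups < 2:
        raise ValueError("need at least 2 groups")
    # n groups need n - 1 break points strictly between 0 and 1.
    return [i / groups for i in range(1, groups)]

quartiles = even_breaks(4)    # [0.25, 0.5, 0.75]
quintiles = even_breaks(5)    # [0.2, 0.4, 0.6, 0.8]
deciles = even_breaks(10)     # nine break points, 0.1 through 0.9
custom = [0.05, 0.95]         # three groups: bottom 5%, middle 90%, top 5%
```

Note that a list of k break points always yields k + 1 groups, so the custom pattern produces three groups of deliberately unequal size.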

Labels

[string, ...]
0-25%, 25-50%, 50-75%, 75-100%

Names for the quantile groups, one per group (one more than the number of break points). Examples:

  • Quartiles: ['Bottom 25%', 'Lower Mid', 'Upper Mid', 'Top 25%']
  • Rankings: ['Bronze', 'Silver', 'Gold', 'Platinum']
  • Relative: ['Below Average', 'Average', 'Above Average']
  • Descriptive: ['Low', 'Moderate', 'High', 'Very High']
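The relationship between break points and labels can be sketched as follows. The cut values in `edges` are made-up numbers standing in for computed quantiles, and `label_values` is a hypothetical helper that assumes right-closed bins. Whether the tool itself rejects a mismatched label count is an assumption here.

```python
from bisect import bisect_left

def label_values(values, edges, labels):
    """Map each value to the label of its bin (hypothetical helper)."""
    # k cut values define k + 1 groups, so k + 1 labels are needed.
    if len(labels) != len(edges) + 1:
        raise ValueError(
            f"expected {len(edges) + 1} labels, got {len(labels)}"
        )
    return [labels[bisect_left(edges, v)] for v in values]

edges = [3.75, 6.5, 9.25]  # illustrative cut values for three break points
labels = ["Bottom 25%", "Lower Mid", "Upper Mid", "Top 25%"]

result = label_values([2, 5, 8, 50], edges, labels)
print(result)  # one label per input value
```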
false

Defines interval closure, which matters for values that fall exactly on a quantile boundary:

  • false (default): Right-closed (a,b]
  • true: Left-closed [a,b)
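The effect of closure on a value that sits exactly on a boundary can be sketched with the standard `bisect` module. The `left_closed` argument is a hypothetical stand-in for the toggle described above, and the cut values are illustrative. A right-closed interval is written (a, b] and includes its upper edge; a left-closed interval is written [a, b) and includes its lower edge.

```python
from bisect import bisect_left, bisect_right

def bin_index(value, edges, left_closed=False):
    """Return the bin index for value; left_closed is a hypothetical toggle."""
    if left_closed:
        # [a, b): a value equal to an edge belongs to the bin that starts there.
        return bisect_right(edges, value)
    # (a, b]: a value equal to an edge belongs to the bin that ends there.
    return bisect_left(edges, value)

edges = [3.75, 6.5, 9.25]  # illustrative cut values
on_boundary = 6.5

right_closed_bin = bin_index(on_boundary, edges)                   # bin (3.75, 6.5]
left_closed_bin = bin_index(on_boundary, edges, left_closed=True)  # bin [6.5, 9.25)

# Values strictly inside an interval are unaffected by the setting.
interior = (bin_index(5.0, edges), bin_index(5.0, edges, left_closed=True))
```

Only values equal to a cut value change group when the setting is flipped, which is why the option matters most for data with many repeated or rounded values.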
false

Handles duplicate values at quantile boundaries, which is critical for discrete data with many repeated values:

  • false (default): Raise an error if duplicate boundaries would cause unequal bins
  • true: Allow duplicates, potentially creating uneven groups
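A small sketch shows how duplicate boundaries arise and what the two settings might do. It assumes linear-interpolation quantiles and assumes that allowing duplicates means collapsing repeated cut values into one. The tool's exact behavior may differ, and `resolve_edges` and its `allow_duplicates` argument are hypothetical names.

```python
from bisect import bisect_left
from collections import Counter

def quantile(sorted_vals, q):
    """Linear-interpolation quantile (an assumed method; tools differ)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def resolve_edges(edges, allow_duplicates=False):
    """Reject or collapse repeated cut values (hypothetical helper)."""
    unique = sorted(set(edges))
    if len(unique) < len(edges) and not allow_duplicates:
        raise ValueError("quantile boundaries are not unique")
    return unique

# Discrete data dominated by one repeated value.
values = [0, 0, 0, 0, 0, 0, 1, 2]
s = sorted(values)
edges = [quantile(s, q) for q in (0.25, 0.5, 0.75)]  # two cut values coincide at 0

try:
    resolve_edges(edges)
    raised = False
except ValueError:
    raised = True

kept = resolve_edges(edges, allow_duplicates=True)
counts = Counter(bisect_left(kept, v) for v in values)
```

Here the 25th and 50th percentiles are both 0, so the strict setting raises an error. The permissive setting leaves two distinct cut values and groups of very different sizes.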

AsColumn

Name for the new column. If not provided, the system generates a unique name. If AsColumn matches an existing column, that column is replaced. The name should follow valid column naming conventions.
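The naming rules can be sketched with a table modeled as a plain dictionary of column lists. The `add_column` helper and the `qcut_0`, `qcut_1` naming scheme are hypothetical; the names the system actually generates are not documented here.

```python
def add_column(table, values, as_column=None):
    """Return a copy of table with the new column added (hypothetical helper)."""
    if as_column is None:
        # Hypothetical scheme: pick the first unused name of the form qcut_<i>.
        i = 0
        as_column = f"qcut_{i}"
        while as_column in table:
            i += 1
            as_column = f"qcut_{i}"
    out = dict(table)
    # Assigning to an existing key replaces that column.
    out[as_column] = list(values)
    return out

table = {"score": [10, 90]}

replaced = add_column(table, ["Low", "High"], as_column="score")
generated = add_column(table, ["Low", "High"])
generated_again = add_column(generated, ["Low", "High"])
```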