Qcut / Manipulation Layer
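Converts continuous data into quantile-based categories, similar to pandas' qcut() or, in R, cut() with breaks computed by quantile().
Mathematical concept: for n quantile groups (defined by n - 1 break points), each group contains approximately N/n records, where N is the total record count. The split is rarely exact because of tied values and interpolation between records.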
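The N/n rule can be checked with a small sketch. It assumes that break points are computed by linear interpolation between sorted values (NumPy's default "linear" quantile method) and that the outermost groups are open-ended. The tool's actual interpolation method is not stated here, and the helper names `linear_quantile` and `bin_counts` are hypothetical.

```python
import math
from bisect import bisect_left


def linear_quantile(sorted_vals, p):
    """Quantile by linear interpolation between order statistics (assumed method)."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bin_counts(values, probs):
    """Count how many values land in each of len(probs) + 1 right-closed groups."""
    s = sorted(values)
    breaks = [linear_quantile(s, p) for p in probs]
    counts = [0] * (len(breaks) + 1)
    for v in values:
        # bisect_left keeps a value equal to a break in the lower group: (a, b]
        counts[bisect_left(breaks, v)] += 1
    return counts


values = list(range(1, 11))                      # N = 10
counts = bin_counts(values, [0.25, 0.5, 0.75])   # n = 4 groups
print(counts)                                    # [3, 2, 2, 3]
print(len(values) / len(counts))                 # 2.5
```

With 10 records and 4 groups, N/n is 2.5. No group can hold half a record, so the sizes come out as 3, 2, 2, 3.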
Common applications:
- Customer segmentation (top 25%, middle 50%, bottom 25%)
- Performance ranking (quartile distribution)
- Equal-sized grouping (terciles, quartiles, quintiles)
- Distribution analysis (percentile ranges)
- Outlier identification (extreme quantiles)
- Relative positioning (below median, above median)
- Balanced sampling (stratified selection)
- Normalized grouping (population-based divisions)
Unlike fixed-width binning (cut), which splits the value range into intervals of equal width, qcut places its boundaries by rank so that groups are approximately equal in size.
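The difference between cut and qcut shows up most clearly on skewed data. The sketch below compares equal-width edges with edges placed at the quantiles. It assumes linear-interpolation quantiles and right-closed, open-ended outer groups. All function names are illustrative.

```python
import math
from bisect import bisect_left


def linear_quantile(sorted_vals, p):
    """Quantile by linear interpolation between order statistics (assumed method)."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def counts_for(values, inner_edges):
    """Count values per right-closed group; the outer groups are open-ended."""
    counts = [0] * (len(inner_edges) + 1)
    for v in values:
        counts[bisect_left(inner_edges, v)] += 1
    return counts


def equal_width_counts(values, n_bins):
    """cut-style: split the value range into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return counts_for(values, [lo + width * k for k in range(1, n_bins)])


def equal_frequency_counts(values, n_bins):
    """qcut-style: place edges at the k / n_bins quantiles of the data."""
    s = sorted(values)
    return counts_for(values, [linear_quantile(s, k / n_bins) for k in range(1, n_bins)])


skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(equal_width_counts(skewed, 4))      # [9, 0, 0, 1]
print(equal_frequency_counts(skewed, 4))  # [3, 2, 2, 3]
```

A single outlier stretches the equal-width intervals so far that two of them end up empty. The quantile edges follow the data instead.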
Select
column: Column to divide into quantile-based groups. Suitable for:
- Continuous measurements
- Ranked values
- Performance metrics
- Distribution data
- Sequential measurements
Quantiles
[f64, ...]: Quantile break points, each between 0.0 and 1.0. k break points produce k + 1 groups. Common patterns:
- Quartiles: [0.25, 0.50, 0.75]
- Quintiles: [0.2, 0.4, 0.6, 0.8]
- Deciles: [0.1, 0.2, ..., 0.9]
- Custom: [0.05, 0.95] to separate the bottom 5% and top 5% from the middle 90%
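Each probability in the list is turned into a value threshold taken from the data. The sketch below does this for the patterns above, on scores from 0 to 100. It assumes linear-interpolation quantiles and checks that every probability falls within [0.0, 1.0]. Whether the tool also requires the list to be sorted is not documented here, so the sketch does not check that. `break_values` is a hypothetical helper.

```python
import math


def linear_quantile(sorted_vals, p):
    """Quantile by linear interpolation between order statistics (assumed method)."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def break_values(values, probs):
    """Translate probability break points into value thresholds."""
    if not all(0.0 <= p <= 1.0 for p in probs):
        raise ValueError("quantile break points must lie in [0.0, 1.0]")
    s = sorted(values)
    return [linear_quantile(s, p) for p in probs]


scores = list(range(101))  # 0, 1, ..., 100
patterns = {
    "quartiles": [0.25, 0.50, 0.75],
    "quintiles": [0.2, 0.4, 0.6, 0.8],
    "deciles": [k / 10 for k in range(1, 10)],
    "custom": [0.05, 0.95],
}
for name, probs in patterns.items():
    # Rounding hides floating-point noise from products such as 0.3 * 100
    cuts = [round(c, 6) for c in break_values(scores, probs)]
    print(name, cuts, "->", len(probs) + 1, "groups")
```

For the quartile pattern, the cut points on these scores are 25.0, 50.0 and 75.0, which gives 4 groups.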
Labels
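[string, ...]: Names for the quantile groups, one per group (one more than the number of break points, e.g. four labels for the three quartile break points). Examples: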
- Quartiles: ['Bottom 25%', 'Lower Mid', 'Upper Mid', 'Top 25%']
- Rankings: ['Bronze', 'Silver', 'Gold', 'Platinum']
- Relative: ['Below Average', 'Average', 'Above Average']
- Descriptive: ['Low', 'Moderate', 'High', 'Very High']
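The sketch below shows how labels might be paired with groups. It assumes that the first label names the lowest group and that the label count must equal the number of break points plus one. It also assumes right-closed groups. The break values are the linear-interpolation quartiles of the numbers 1 to 10. `label_values` is a hypothetical helper.

```python
from bisect import bisect_left


def label_values(values, breaks, labels):
    """Map each value to a label; expects one more label than break points."""
    if len(labels) != len(breaks) + 1:
        raise ValueError(f"expected {len(breaks) + 1} labels, got {len(labels)}")
    # bisect_left gives the index of the right-closed group (a, b] holding v
    return [labels[bisect_left(breaks, v)] for v in values]


breaks = [3.25, 5.5, 7.75]  # quartile cut points of 1..10
labels = ["Bottom 25%", "Lower Mid", "Upper Mid", "Top 25%"]
print(label_values([1, 5, 6, 10], breaks, labels))
# ['Bottom 25%', 'Lower Mid', 'Upper Mid', 'Top 25%']
```

A three-label set such as the "Relative" example pairs with two break points (terciles).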
LeftClosed
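bool: Defines interval closure. This matters for values that fall exactly on a quantile boundary.
- false (default): right-closed intervals (a, b]; a boundary value belongs to the lower group
- true: left-closed intervals [a, b); a boundary value belongs to the upper group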
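Closure only changes the outcome for a value that equals a break point exactly. The sketch below uses the standard `bisect` module to show both rules. `bin_index` is an illustrative helper, and the break points are made up.

```python
from bisect import bisect_left, bisect_right


def bin_index(value, breaks, left_closed=False):
    """Return the group index of value for sorted break points."""
    if left_closed:
        # [a, b): a value equal to a break moves up into the next group
        return bisect_right(breaks, value)
    # (a, b]: a value equal to a break stays in the lower group
    return bisect_left(breaks, value)


breaks = [10, 20, 30]
print(bin_index(20, breaks))                    # 1, i.e. (10, 20]
print(bin_index(20, breaks, left_closed=True))  # 2, i.e. [20, 30)
print(bin_index(15, breaks), bin_index(15, breaks, left_closed=True))  # 1 1
```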
Duplicates
bool: Controls how duplicate break points are handled. Duplicates occur when repeated values cause two or more quantiles to fall on the same value.
- false (default): raise an error
- true: allow duplicate break points, potentially creating uneven groups
This is critical for discrete data with many repeated values.
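Repeated values can make several quantiles land on the same number. The sketch below shows that case and both settings. It assumes that allowing duplicates means collapsing repeated break points into one. This is how pandas qcut (`duplicates='drop'`) and Polars qcut (`allow_duplicates=True`) behave, but the tool may handle it differently. The sketch also assumes linear-interpolation quantiles. `quantile_breaks` is a hypothetical helper.

```python
import math
from bisect import bisect_left


def linear_quantile(sorted_vals, p):
    """Quantile by linear interpolation between order statistics (assumed method)."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def quantile_breaks(values, probs, allow_duplicates=False):
    """Compute break points; raise or collapse when some of them coincide."""
    s = sorted(values)
    breaks = [linear_quantile(s, p) for p in probs]
    unique = sorted(set(breaks))
    if len(unique) < len(breaks):
        if not allow_duplicates:
            raise ValueError(f"duplicate break points: {breaks}")
        breaks = unique  # assumed behaviour: repeated edges are merged
    return breaks


data = [1, 1, 1, 1, 1, 1, 2, 3, 4, 5]
try:
    quantile_breaks(data, [0.25, 0.5, 0.75])
except ValueError as err:
    print(err)  # duplicate break points: [1.0, 1.0, 2.75]

breaks = quantile_breaks(data, [0.25, 0.5, 0.75], allow_duplicates=True)
counts = [0] * (len(breaks) + 1)
for v in data:
    counts[bisect_left(breaks, v)] += 1  # right-closed groups
print(breaks, counts)  # [1.0, 2.75] [6, 1, 3]
```

Under this assumption, the request for quartiles yields only three groups of very different sizes. Any label list would then need adjusting to match.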
AsColumn
name: Name for the new column. If not provided, the system generates a unique name. If AsColumn matches an existing column, the existing column is replaced. The name should follow valid column naming conventions.
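The sketch below models the add-or-replace rule with a simple dict-of-lists table. The naming scheme for generated names (`<source>_qcut`, then `_2`, `_3`, ...) is purely hypothetical, because the tool's real generated names are not documented here. The same applies to the helper `write_column`.

```python
def write_column(table, values, as_column=None, source="value"):
    """Add or replace a column in a dict-of-lists table (hypothetical model)."""
    if as_column is None:
        # Hypothetical naming scheme; the real tool's generated names may differ
        base = f"{source}_qcut"
        as_column, k = base, 1
        while as_column in table:
            k += 1
            as_column = f"{base}_{k}"
    table[as_column] = values  # an existing column with this name is replaced
    return as_column


table = {"score": [10, 55, 90]}
print(write_column(table, ["Low", "Mid", "High"], source="score"))  # score_qcut
print(write_column(table, ["L", "M", "H"], source="score"))         # score_qcut_2
print(write_column(table, ["a", "b", "c"], as_column="score"))      # score
print(table["score"])                                               # ['a', 'b', 'c']
```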