QcutUniform / Manipulation Layer
Create equal-sized groups using uniform quantile divisions. Similar to pandas' qcut(q=n) or R's quantile-based binning with uniform probabilities.
Mathematical concept: For n bins, the cut probabilities are 0, 1/n, 2/n, ..., 1, so bin k (k = 1, ..., n) covers the probability interval from (k-1)/n to k/n.
Common applications:
- Balanced tier assignment (equal-sized groups)
- Normalized scoring (percentile-based grades)
- Distribution partitioning (equal probability bins)
- Balanced sampling frameworks
- Fair resource allocation
- Standardized benchmarking
- Population segmentation
- Ranking systems
Example: With 4 bins, creates quartiles with approximately 25% of data in each group.
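The quartile example can be sketched in plain Python. This is an illustration only, not the layer's implementation: the helper names `uniform_quantile_edges` and `bin_index` are hypothetical, and the edges use linear interpolation between sorted values, which is an assumption about how quantiles are computed.

```python
from bisect import bisect_left

def uniform_quantile_edges(values, n_bins):
    """Return n_bins + 1 edges at probabilities 0, 1/n, ..., 1."""
    data = sorted(values)
    edges = []
    for k in range(n_bins + 1):
        pos = (k / n_bins) * (len(data) - 1)  # fractional index into sorted data
        lo = int(pos)
        hi = min(lo + 1, len(data) - 1)
        # Assumption: linear interpolation between neighbouring sorted values.
        edges.append(data[lo] + (data[hi] - data[lo]) * (pos - lo))
    return edges

def bin_index(x, edges):
    """Right-closed bins (a, b]; the minimum value is kept in bin 0."""
    return max(bisect_left(edges, x) - 1, 0)

scores = [12, 45, 7, 88, 33, 61, 29, 94]
edges = uniform_quantile_edges(scores, 4)

counts = [0, 0, 0, 0]
for s in scores:
    counts[bin_index(s, edges)] += 1

print(edges)   # quartile edges
print(counts)  # number of scores per quartile
```

With eight distinct scores and four bins, each quartile receives two values, that is, 25% of the data.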
Select
column: Column to divide into uniform quantile groups. Ideal for:
- Continuous measurements requiring equal grouping
- Raw scores needing standardization
- Sequential data requiring balanced division
- Ranked values needing uniform distribution
- Metrics requiring equal-sized categorization
Bins
u32: Number of equal-sized groups to create. Common choices:
- 2: Median split (above/below)
- 3: Terciles (low/middle/high)
- 4: Quartiles (four equal parts)
- 5: Quintiles (five equal parts)
- 10: Deciles (ten equal parts)
- 100: Percentiles (hundred equal parts)
Labels
[string, ...]: Names for the uniform quantile groups. Examples:
- Descriptive: ['Bottom', 'Lower Middle', 'Upper Middle', 'Top']
- Tiered: ['Bronze', 'Silver', 'Gold', 'Platinum']
- Graded: ['Q1', 'Q2', 'Q3', 'Q4']
- Relative: ['Low', 'Medium-Low', 'Medium-High', 'High']
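As a rough sketch of how labels relate to bins, the snippet below maps 0-based bin indices to label strings. The function name `apply_labels` is hypothetical, and the rule that exactly one label is supplied per bin, ordered from lowest to highest, is an assumption rather than documented behaviour.

```python
def apply_labels(bin_indices, labels, n_bins):
    """Map 0-based bin indices to label strings."""
    # Assumption: exactly one label per bin, ordered from lowest to highest bin.
    if len(labels) != n_bins:
        raise ValueError(f"expected {n_bins} labels, got {len(labels)}")
    return [labels[i] for i in bin_indices]

tiers = ["Bronze", "Silver", "Gold", "Platinum"]
bin_indices = [0, 2, 0, 3, 1, 2, 1, 3]  # example result of a 4-bin split
tier_column = apply_labels(bin_indices, tiers, n_bins=4)
print(tier_column)
```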
LeftClosed
bool: Controls which side of each interval is closed:
- false (default): Right-closed intervals (a,b]
- true: Left-closed intervals [a,b)
Affects how values that fall exactly on a quantile boundary are assigned.
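The effect on a boundary value can be illustrated with a small sketch. It assumes the standard convention that a right-closed interval is (a, b] and a left-closed interval is [a, b). The function `bin_index` and its `left_closed` argument are hypothetical stand-ins for the layer's behaviour, and clamping the outermost values into the first and last bins is also an assumption.

```python
from bisect import bisect_left, bisect_right

def bin_index(x, edges, left_closed=False):
    """Return the 0-based bin of x for sorted edges e0 < e1 < ... < en."""
    n_bins = len(edges) - 1
    if left_closed:
        i = bisect_right(edges, x) - 1  # [a, b): a boundary value moves to the upper bin
    else:
        i = bisect_left(edges, x) - 1   # (a, b]: a boundary value stays in the lower bin
    # Assumption: the minimum and maximum are clamped into the outer bins.
    return min(max(i, 0), n_bins - 1)

edges = [0.0, 25.0, 50.0, 75.0, 100.0]
right_closed_bin = bin_index(50.0, edges, left_closed=False)
left_closed_bin = bin_index(50.0, edges, left_closed=True)
print(right_closed_bin, left_closed_bin)
```

Only values that sit exactly on an edge change bins; every other value is assigned identically under both settings.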
Duplicates
bool: Controls handling of duplicate values at quantile boundaries:
- false (default): Raise an error when duplicates would produce unequal bins
- true: Allow unequal bins when duplicates cross boundaries
Important for discrete data with repeated values.
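To see why repeated values matter, consider the data [1, 1, 1, 1, 2, 3, 4, 5] split into 4 bins: with linearly interpolated quantiles the first two edges both equal 1. The sketch below shows one plausible handling, modelled on the `duplicates='drop'` option of pandas' `qcut`. The function `resolve_edges` is hypothetical, and merging the repeated edges is an assumption about what allowing unequal bins means here.

```python
def resolve_edges(edges, allow_duplicates=False):
    """Validate quantile edges; optionally collapse repeated ones."""
    unique = sorted(set(edges))
    if len(unique) == len(edges):
        return list(edges)
    if not allow_duplicates:
        raise ValueError("quantile edges are not unique; bins would be unequal")
    # Assumption: repeated edges are merged, leaving fewer (unequal) bins.
    return unique

# Edges for [1, 1, 1, 1, 2, 3, 4, 5] with 4 bins (linear interpolation).
edges = [1.0, 1.0, 1.5, 3.25, 5.0]

try:
    resolve_edges(edges)  # default: duplicates are an error
except ValueError as err:
    print("error:", err)

merged = resolve_edges(edges, allow_duplicates=True)
print(merged)  # three bins remain instead of four
```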
AsColumn
name: Name for the new column. If not provided, the system generates a unique name. If AsColumn matches an existing column, the existing column is replaced. The name should follow valid column naming conventions.
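The naming rules can be pictured with a dictionary standing in for a table. This is a sketch only: `resolve_output_name`, the base name `qcut_uniform`, and the numeric suffix scheme are all hypothetical, since the source does not say how the unique name is generated.

```python
def resolve_output_name(columns, as_column=None, base="qcut_uniform"):
    """Pick the output column name for the binned result."""
    if as_column is not None:
        return as_column  # may match an existing column, which is then replaced
    # Assumption: a unique name is built by appending a counter to a base name.
    candidate, suffix = base, 1
    while candidate in columns:
        candidate = f"{base}_{suffix}"
        suffix += 1
    return candidate

table = {"score": [12, 45, 7, 88], "tier": ["old"] * 4}
new_values = ["Q1", "Q3", "Q1", "Q4"]  # placeholder bin labels

# AsColumn given and matching an existing column: "tier" is overwritten.
table[resolve_output_name(table, as_column="tier")] = new_values

# AsColumn omitted: a name not yet present in the table is generated.
auto_name = resolve_output_name(table)
table[auto_name] = new_values
print(list(table))
```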