Quantile / GroupBy Layer
Calculate quantiles within groups, similar to pandas groupby().quantile() or dplyr's group_by() with summarize(quantile()). Useful for understanding value distributions within different categories or segments.
Example scenarios:
- Income Distribution Analysis
Configuration | Result |
---|---|
by=[region, education] | Group by location and education |
select=[annual_income] | Analyze income distribution |
quantile=0.75 | Find 75th percentile (upper quartile) |
- Performance Metrics
Configuration | Result |
---|---|
by=[department, quarter] | Group by dept and time period |
select=[response_time] | Analyze service levels |
quantile=0.95 | Find 95th percentile (SLA analysis) |
Common applications:
- Salary distribution analysis
- Performance benchmarking
- Response time analysis
- Quality control thresholds
- Risk assessment metrics
- Customer segmentation
By
[column, ...]Columns to group by. Creates separate quantile calculations for each unique combination. Common grouping columns:
- Geographic: country, region, city
- Temporal: year, quarter, month
- Categorical: product_line, customer_segment
Select
[column, ...]Numeric columns to calculate quantiles for. Selected columns must contain numeric data suitable for quantile calculation. Common metrics:
- Financial: price, revenue, cost
- Performance: duration, score, rating
- Measurements: temperature, weight, distance
Quantile
f64Quantile value between 0 and 1. Common values:
- 0.25: First quartile (Q1)
- 0.50: Median
- 0.75: Third quartile (Q3)
- 0.95: 95th percentile (common for SLAs)
- 0.99: 99th percentile (outlier analysis)
Interpolation
enumMethods for estimating quantile values between discrete data points in each group. Choice affects results when exact quantile falls between observations.
Use lower value (floor). Conservative estimate, ensures value exists in dataset. Example: If 75th percentile falls between 100 and 101, uses 100.
Use higher value (ceiling). Liberal estimate, ensures value exists in dataset. Example: If 75th percentile falls between 100 and 101, uses 101.
Average of lower and higher values. Balanced approach between extremes. Example: If 75th percentile falls between 100 and 101, uses 100.5.
Use nearest value (round). Minimizes absolute distance to true quantile. Example: If 75th percentile falls between 100 and 101, uses closest value.
Linear interpolation between points. Provides smooth transitions. Example: If 75th percentile falls 60% between 100 and 101, uses 100.6.