Quantile / GroupBy Layer

Calculate quantiles within groups, similar to pandas groupby().quantile() or dplyr's group_by() with summarize(quantile()). Useful for understanding value distributions within different categories or segments.
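The core computation can be sketched in plain Python. Rows are bucketed by the tuple of their By values, and each Select column's values within a bucket are reduced to a single quantile. This is a minimal sketch, not the layer's implementation. The function names `groupby_quantile` and `quantile_linear` are hypothetical, and the sketch uses linear interpolation (the default for pandas `groupby().quantile()`) with the common `(n - 1) * q` position convention.

```python
from collections import defaultdict

def quantile_linear(values, q):
    """Linear-interpolated quantile of a non-empty list, 0 <= q <= 1."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q           # fractional index into the sorted values
    lo = int(pos)                     # floor, since pos >= 0
    hi = min(lo + 1, len(xs) - 1)     # upper neighbor, clamped at the last value
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac

def groupby_quantile(rows, by, select, q):
    """Return {group_key: {column: quantile}} for each unique combination of `by` values."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = tuple(row[c] for c in by)   # one bucket per unique combination
        for col in select:
            groups[key][col].append(row[col])
    return {
        key: {col: quantile_linear(vals, q) for col, vals in cols.items()}
        for key, cols in groups.items()
    }

rows = [
    {"region": "North", "education": "BSc", "annual_income": 50_000},
    {"region": "North", "education": "BSc", "annual_income": 60_000},
    {"region": "North", "education": "BSc", "annual_income": 70_000},
    {"region": "South", "education": "MSc", "annual_income": 80_000},
    {"region": "South", "education": "MSc", "annual_income": 90_000},
]
print(groupby_quantile(rows, by=["region", "education"], select=["annual_income"], q=0.75))
```

The sample rows are made up. The North/BSc group yields 65000.0 and the South/MSc group yields 87500.0.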

Example scenarios:

Income Distribution Analysis

Configuration	Result
by=[region, education]	Group by location and education
select=[annual_income]	Analyze income distribution
quantile=0.75	Find 75th percentile (upper quartile)

Performance Metrics

Configuration	Result
by=[department, quarter]	Group by dept and time period
select=[response_time]	Analyze service levels
quantile=0.95	Find 95th percentile (SLA analysis)
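The Performance Metrics configuration could be applied to raw response times roughly as shown below. This version uses Python's standard-library `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between neighboring values. The sample data and the 250 ms SLA target (`SLA_MS`) are invented for illustration. On Python versions before 3.13, `statistics.quantiles` raises an error for a group with fewer than two values.

```python
from collections import defaultdict
from statistics import quantiles

SLA_MS = 250  # hypothetical SLA target for the 95th percentile, in milliseconds

rows = [  # (department, quarter, response_time in ms)
    ("Support", "Q1", 100), ("Support", "Q1", 200), ("Support", "Q1", 300),
    ("Support", "Q1", 400), ("Support", "Q1", 500),
    ("Sales", "Q1", 120), ("Sales", "Q1", 150), ("Sales", "Q1", 180),
]

# by=[department, quarter], select=[response_time]
groups = defaultdict(list)
for department, quarter, response_time in rows:
    groups[(department, quarter)].append(response_time)

# quantile=0.95: n=100 returns the 1st..99th percentiles, so index 94 is the 95th.
p95 = {key: quantiles(vals, n=100, method="inclusive")[94] for key, vals in groups.items()}

for key, value in p95.items():
    status = "breach" if value > SLA_MS else "ok"
    print(key, value, status)
```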

Common applications:

Salary distribution analysis
Performance benchmarking
Response time analysis
Quality control thresholds
Risk assessment metrics
Customer segmentation

Table

By

[column, ...]

Columns to group by. A separate quantile is computed for each unique combination of values in these columns. Common grouping columns:

Geographic: country, region, city
Temporal: year, quarter, month
Categorical: product_line, customer_segment

Select

[column, ...]

Numeric columns to calculate quantiles for. Each selected column gets its own quantile within each group. Common metrics:

Financial: price, revenue, cost
Performance: duration, score, rating
Measurements: temperature, weight, distance

Quantile

f64

Quantile to compute, from 0 to 1 inclusive (for example, 0.75 means the 75th percentile). Common values:

0.25: First quartile (Q1)
0.50: Median
0.75: Third quartile (Q3)
0.95: 95th percentile (common for SLAs)
0.99: 99th percentile (outlier analysis)
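A quantile value can be read as a fractional position in a group's sorted values. The sketch below assumes the `(n - 1) * q` convention used by pandas and NumPy. The layer's exact convention is not stated above, so treat this as an assumption. The hypothetical helper `quantile_position` validates `q` and reports the two observations the position falls between. The Interpolation setting then decides how to combine those two values.

```python
import math

def quantile_position(n, q):
    """Return (fractional index, lower index, upper index) for quantile q of n sorted values."""
    if not 0.0 <= q <= 1.0:
        raise ValueError(f"quantile must be between 0 and 1, got {q}")
    pos = (n - 1) * q                 # exact position, assuming the (n - 1) * q convention
    return pos, math.floor(pos), math.ceil(pos)

values = sorted([12, 7, 3, 9, 15, 4, 10, 8, 6, 11])  # one group, n = 10
for q in (0.25, 0.50, 0.75, 0.95, 0.99):
    pos, lo, hi = quantile_position(len(values), q)
    print(f"q={q}: position {pos:.2f}, between {values[lo]} and {values[hi]}")
```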

Interpolation

enum

Default: Lower

Method for estimating the quantile when its exact position falls between two observations in a group's sorted values. The choice only matters in that case. When the position lands exactly on an observation, every method returns that observation.

Lower

Uses the lower of the two neighboring values (floor). The result always exists in the group and never exceeds the linearly interpolated value. Example: If the 75th percentile falls between 100 and 101, uses 100.

Higher

Uses the higher of the two neighboring values (ceiling). The result always exists in the group and is never below the linearly interpolated value. Example: If the 75th percentile falls between 100 and 101, uses 101.

Midpoint

Averages the two neighboring values, splitting the difference between Lower and Higher. The result may not exist in the group. Example: If the 75th percentile falls between 100 and 101, uses 100.5.

Nearest

Uses whichever neighboring value is closer to the exact quantile position (rounding). The result always exists in the group. Example: If the 75th percentile falls 60% of the way from 100 to 101, uses 101.

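Linear

Interpolates linearly between the two neighboring values in proportion to where the exact position falls, so results change smoothly as the quantile changes. Example: If the 75th percentile falls 60% of the way from 100 to 101, uses 100.6.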
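The five interpolation options can be sketched side by side as follows. The `quantile` function is hypothetical and assumes the `(n - 1) * q` position convention. A Nearest tie, where the position sits exactly halfway between two values, goes to the higher value here. Libraries differ on how they break that tie.

```python
import math

def quantile(values, q, interpolation="linear"):
    """Quantile of a non-empty sequence, using the (n - 1) * q position convention."""
    if not 0.0 <= q <= 1.0:
        raise ValueError(f"quantile must be between 0 and 1, got {q}")
    xs = sorted(values)
    pos = (len(xs) - 1) * q          # exact (fractional) position of the quantile
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo                  # how far the position sits from xs[lo] toward xs[hi]
    if interpolation == "lower":     # floor: always an observed value
        return xs[lo]
    if interpolation == "higher":    # ceiling: always an observed value
        return xs[hi]
    if interpolation == "midpoint":  # average of the two neighbors
        return (xs[lo] + xs[hi]) / 2
    if interpolation == "nearest":   # round the position; ties go to xs[hi] in this sketch
        return xs[lo] if frac < 0.5 else xs[hi]
    if interpolation == "linear":    # neighbors weighted by frac
        return xs[lo] + (xs[hi] - xs[lo]) * frac
    raise ValueError(f"unknown interpolation: {interpolation}")

group = [101, 94, 106, 90, 100, 97, 92, 103, 95, 99]
for method in ("lower", "higher", "midpoint", "nearest", "linear"):
    print(method, quantile(group, 0.75, method))
```

With these ten made-up values, the 75th percentile sits 75% of the way from 100 to 101. Lower and Higher return the two bracketing observations (100 and 101), Midpoint returns 100.5, Nearest rounds up to 101, and Linear returns 100.75.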