Quantile / GroupBy Layer

Calculate quantiles within groups, similar to pandas groupby().quantile() or dplyr's group_by() combined with summarize(quantile()). This is useful for understanding how values are distributed within different categories or segments.

Example scenarios:

  1. Income Distribution Analysis

     Configuration            Result
     by=[region, education]   Group by location and education
     select=[annual_income]   Analyze income distribution
     quantile=0.75            Find 75th percentile (upper quartile)

  2. Performance Metrics

     Configuration              Result
     by=[department, quarter]   Group by dept and time period
     select=[response_time]     Analyze service levels
     quantile=0.95              Find 95th percentile (SLA analysis)
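The Income Distribution Analysis scenario can be sketched in plain Python to show what the layer computes. This is a minimal illustration, not the layer's implementation: the helper names linear_quantile and group_quantile, the row-dictionary input format and the sample incomes are all hypothetical, and the position convention (n - 1) * q with linear interpolation is assumed from the pandas/NumPy default.

```python
import math
from collections import defaultdict


def linear_quantile(values, q):
    """Quantile with linear interpolation at position (n - 1) * q.

    Assumption: this mirrors the pandas/NumPy default; it is not confirmed for this layer.
    """
    data = sorted(values)
    h = (len(data) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(data) - 1)  # clamp so a one-value group still works
    return data[lo] + (h - lo) * (data[hi] - data[lo])


def group_quantile(rows, by, select, q):
    """Return {group_key: {column: quantile}} for each unique combination of `by` values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in by)  # one group per unique combination
        groups[key].append(row)
    return {
        key: {col: linear_quantile([r[col] for r in members], q) for col in select}
        for key, members in groups.items()
    }


# Hypothetical sample data for the "Income Distribution Analysis" scenario.
rows = [
    {"region": "North", "education": "BSc", "annual_income": 40000},
    {"region": "North", "education": "BSc", "annual_income": 50000},
    {"region": "North", "education": "BSc", "annual_income": 60000},
    {"region": "North", "education": "BSc", "annual_income": 70000},
    {"region": "North", "education": "BSc", "annual_income": 80000},
    {"region": "South", "education": "MSc", "annual_income": 55000},
    {"region": "South", "education": "MSc", "annual_income": 65000},
]

result = group_quantile(rows, by=["region", "education"], select=["annual_income"], q=0.75)
print(result)
```

For this sample the 75th percentile is 70000.0 for (North, BSc), where the position lands exactly on the fourth value, and 62500.0 for (South, MSc), where it falls 75% of the way from 55000 to 65000.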

Common applications:

  • Salary distribution analysis
  • Performance benchmarking
  • Response time analysis
  • Quality control thresholds
  • Risk assessment metrics
  • Customer segmentation

By

[column, ...]

Columns to group by. The quantile is calculated separately for each unique combination of values in these columns. Common grouping columns:

  • Geographic: country, region, city
  • Temporal: year, quarter, month
  • Categorical: product_line, customer_segment
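To make "each unique combination" concrete, the small sketch below counts the groups produced by two different By settings. The rows and the group_sizes helper are hypothetical; only the grouping behavior is meant to mirror the description above.

```python
from collections import Counter

# Hypothetical rows; only the grouping columns matter here.
rows = [
    {"country": "US", "year": 2023},
    {"country": "US", "year": 2023},
    {"country": "US", "year": 2024},
    {"country": "DE", "year": 2023},
]


def group_sizes(rows, by):
    """Count rows per unique combination of the `by` columns; each key is one group."""
    return Counter(tuple(row[col] for col in by) for row in rows)


by_country = group_sizes(rows, by=["country"])
by_country_year = group_sizes(rows, by=["country", "year"])
print(len(by_country), len(by_country_year))
```

Grouping by country alone gives two groups; adding year splits the US rows into 2023 and 2024, so three separate quantile calculations are made.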

Select

[column, ...]

Numeric columns to calculate quantiles for. Each selected column must contain numeric data. Common metrics:

  • Financial: price, revenue, cost
  • Performance: duration, score, rating
  • Measurements: temperature, weight, distance
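A pre-check for the numeric requirement might look like the sketch below. The layer's actual validation rules are not documented here, so this makes its own assumptions: any real number counts as numeric, and Python booleans are rejected even though bool is a subclass of int. The helper names and sample rows are hypothetical.

```python
import numbers


def is_numeric(value):
    # Assumption: bools are treated as non-numeric, although bool subclasses int in Python.
    return isinstance(value, numbers.Real) and not isinstance(value, bool)


def non_numeric_columns(rows, select):
    """Return the selected columns that contain at least one non-numeric value."""
    return [col for col in select if not all(is_numeric(row[col]) for row in rows)]


# Hypothetical rows: "rating" contains a string, so it is not suitable for quantiles.
rows = [
    {"price": 9.99, "duration": 120, "rating": 4},
    {"price": 12.5, "duration": 95, "rating": "good"},
]
print(non_numeric_columns(rows, ["price", "duration", "rating"]))
```

Only rating is reported, because one of its values is the string "good".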

Quantile

The quantile to calculate, as a value between 0 and 1. Common values:

  • 0.25: First quartile (Q1)
  • 0.50: Median
  • 0.75: Third quartile (Q3)
  • 0.95: 95th percentile (common for SLAs)
  • 0.99: 99th percentile (outlier analysis)
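The quantile value determines where in each sorted group the result is read. Assuming the pandas/NumPy default position h = (n - 1) * q (not confirmed for this layer), the hypothetical helper below validates q and computes that position. For a group of five values, Q1, the median and Q3 land exactly on indices 1, 2 and 3, while 0.95 and 0.99 land between indices 3 and 4 (at 3.8 and 3.96). Only in that second case does the interpolation method matter.

```python
def quantile_position(n, q):
    """0-based position of quantile q in a sorted group of n values.

    Assumption: uses h = (n - 1) * q, the pandas/NumPy default convention.
    """
    if not 0 <= q <= 1:
        raise ValueError(f"quantile must be between 0 and 1, got {q}")
    if n < 1:
        raise ValueError("group must contain at least one value")
    return (n - 1) * q


# Where the common quantile values land in a sorted group of 5 values (indices 0..4).
for q in (0.25, 0.50, 0.75, 0.95, 0.99):
    print(q, quantile_position(5, q))
```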

Interpolation method used when the exact quantile position falls between two observations in a group. The options below differ only in that case: when the position lands exactly on an observation, every method returns that observation.

Lower

Uses the lower of the two surrounding values (floor). A conservative estimate that always returns a value present in the group. Example: if the 75th percentile falls between 100 and 101, uses 100.

Higher

Uses the higher of the two surrounding values (ceiling). A liberal estimate that always returns a value present in the group. Example: if the 75th percentile falls between 100 and 101, uses 101.

Midpoint

Averages the lower and higher values, a balanced compromise between the two extremes. The result may not be a value present in the group. Example: if the 75th percentile falls between 100 and 101, uses 100.5.

Nearest

Uses whichever of the two surrounding values is closer to the exact quantile position (rounding). Always returns a value present in the group. Example: if the 75th percentile falls 60% of the way from 100 to 101, uses 101.

Linear

Interpolates linearly between the two surrounding values, so the result changes smoothly as the quantile moves between observations. Example: if the 75th percentile falls 60% of the way from 100 to 101, uses 100.6.