GroupBy Layer
Perform advanced grouping operations with multiple aggregations, similar to pandas groupby() with agg() or dplyr's group_by() with summarize().
Example scenarios:
- Sales Analysis
Operation | Purpose |
---|---|
by=[region, product_type] | Group by region and product type |
agg=Sum(sales_amount) | Total sales in each group |
agg=Count(transaction_id) | Number of transactions |
agg=Mean(unit_price) | Average price point |
- Customer Behavior
Operation | Purpose |
---|---|
by=[customer_segment, month] | Group by segment and time |
agg=NUnique(customer_id) | Number of unique customers |
agg=Mean(purchase_value) | Average purchase value |
agg=Max(transaction_amount) | Largest transaction amount |
- Quality Control
Operation | Purpose |
---|---|
by=[batch_id, production_line] | Group by batch and line |
agg=Mean(measurement) | Average measurements |
agg=Std(measurement) | Measurement variation |
agg=Count(defect_id) | Number of defects |
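
As a rough illustration of the Sales Analysis scenario, the sketch below reproduces its three aggregations in plain Python. It does not use the layer's own configuration syntax, which this page does not show; the sample rows and the output key names (`sales_amount_sum` and so on) are assumptions made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; column names follow the Sales Analysis table.
rows = [
    {"region": "East", "product_type": "Widget", "transaction_id": 1,
     "sales_amount": 100.0, "unit_price": 10.0},
    {"region": "East", "product_type": "Widget", "transaction_id": 2,
     "sales_amount": 50.0, "unit_price": 12.0},
    {"region": "West", "product_type": "Gadget", "transaction_id": 3,
     "sales_amount": 70.0, "unit_price": 7.0},
]


def sales_summary(rows):
    """Group rows by (region, product_type) and compute Sum, Count and Mean."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["region"], row["product_type"])].append(row)
    return {
        key: {
            "sales_amount_sum": sum(r["sales_amount"] for r in members),
            "transaction_id_count": len(members),
            "unit_price_mean": mean(r["unit_price"] for r in members),
        }
        for key, members in groups.items()
    }


summary = sales_summary(rows)
```

Each key of `summary` is one (region, product type) combination, and each value holds one result per aggregation, which is the shape of output the tables above describe.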
Common applications:
- Sales analysis by region/product
- Customer behavior segmentation
- Performance metrics by category
- Time-based aggregations
- Quality metrics by batch
- Resource utilization analysis
By
[column, ...] Columns to group by. Creates distinct groups based on unique combinations of values in these columns. Common grouping columns:
- Categorical: region, product_type, customer_segment
- Temporal: year, month, day
- Hierarchical: department, team, employee
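
To make "unique combinations of values" concrete, here is a minimal plain-Python sketch. It assumes that a group exists only for combinations that actually occur in the data, which is how pandas behaves for ordinary columns; this page does not state the layer's behavior for unobserved combinations. The helper name `group_keys` and the sample rows are hypothetical.

```python
# Hypothetical rows using the hierarchical columns mentioned above.
rows = [
    {"department": "Sales", "team": "A"},
    {"department": "Sales", "team": "B"},
    {"department": "Sales", "team": "A"},
    {"department": "Ops", "team": "A"},
]


def group_keys(rows, by):
    """Return the distinct key tuples for the `by` columns, in first-seen order."""
    seen = []
    for row in rows:
        key = tuple(row[column] for column in by)
        if key not in seen:
            seen.append(key)
    return seen


keys = group_keys(rows, ["department", "team"])
```

Four rows collapse into three groups here: the repeated ("Sales", "A") row joins an existing group, and no group is created for ("Ops", "B") because that combination never appears.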
Aggregations
[, ...] Defines aggregation operations to apply to grouped data. Multiple aggregations can be combined to create comprehensive group summaries.
Select
[column, ...] Columns to aggregate. Requirements vary by aggregation type:
- Numeric required: Mean, Sum, Std, Var, Min, Max, Median
- Any type allowed: Count, NUnique, First, Last, None
Multiple columns can be selected for the same aggregation.
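
The type requirements above can be sketched as a simple check. This is an illustration in plain Python, not the layer's actual validation logic; `check_selection` and `NUMERIC_ONLY` are hypothetical names, and the second half shows one aggregation (Mean) applied to several selected columns at once.

```python
from statistics import mean

# Aggregations that this page lists as requiring numeric columns.
NUMERIC_ONLY = {"Mean", "Sum", "Std", "Var", "Min", "Max", "Median"}


def check_selection(agg, values):
    """Return True if `values` are acceptable input for `agg` under the rules above."""
    if agg in NUMERIC_ONLY:
        return all(isinstance(v, (int, float)) for v in values)
    return True  # Count, NUnique, First, Last, None accept any type


# One group with two numeric columns and one text column (made-up data).
group = {"height": [1.0, 3.0], "weight": [10.0, 20.0], "label": ["a", "b"]}

# Selecting several columns for the same aggregation gives one result per column.
means = {column: mean(group[column]) for column in ("height", "weight")}
```

Under these rules `check_selection("Mean", group["label"])` is rejected, while `check_selection("NUnique", group["label"])` is accepted.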
Agg
enum Available aggregation functions for grouped data analysis. Different functions suit different analytical needs and data types.
Min: Minimum value in each group. Used for:
- Lowest price points
- Best performance times
- Temperature minimums
Max: Maximum value in each group. Used for:
- Peak sales amounts
- Highest temperatures
- Maximum load points
Mean: Arithmetic mean of each group. Used for:
- Average transaction value
- Typical response times
- Mean daily temperatures
Median: Middle value in each group. Used for:
- Typical household income
- Central tendency analysis
- Robust average measures
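
The "robust average" point is easiest to see next to the mean. The short sketch below uses Python's `statistics` module on made-up income figures for a single group; one extreme value moves the mean a long way but leaves the median untouched.

```python
from statistics import mean, median

# Hypothetical household incomes in one group, with a single outlier.
incomes = [30_000, 32_000, 35_000, 38_000, 1_000_000]

group_mean = mean(incomes)      # pulled upward by the outlier
group_median = median(incomes)  # middle value of the sorted list
```

Here the mean is 227,000 while the median is 35,000, so the median is the better summary of a typical household in this group.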
NUnique: Count of unique values per group. Used for:
- Distinct customer count
- Product variety analysis
- Unique error codes
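
A unique-value count differs from a plain row count whenever values repeat within a group. The sketch below shows both side by side in plain Python; the data and the helper `nunique_and_count` are hypothetical, and how the layer treats missing values is not covered here because this page does not say.

```python
from collections import defaultdict

# Hypothetical (customer_segment, customer_id) purchase rows.
purchases = [
    ("retail", "c1"), ("retail", "c2"), ("retail", "c1"),
    ("wholesale", "c3"), ("wholesale", "c3"),
]


def nunique_and_count(pairs):
    """Return the distinct-value count and the row count for each group."""
    ids_by_group = defaultdict(list)
    for group, customer_id in pairs:
        ids_by_group[group].append(customer_id)
    return {
        group: {"nunique": len(set(ids)), "count": len(ids)}
        for group, ids in ids_by_group.items()
    }


result = nunique_and_count(purchases)
```

The retail segment has three purchases but only two distinct customers, which is the gap a unique count is meant to expose.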
Sum: Sum of values in each group. Used for:
- Total revenue by region
- Cumulative quantities
- Total expenses
Std: Standard deviation of group values. Used for:
- Price variation analysis
- Quality control measures
- Performance consistency
Variance (Var): Variance of group values. Used for:
- Risk assessment
- Spread analysis
- Variability measures
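
Standard deviation and variance each come in a population form (divide by n) and a sample form (divide by n - 1). This page does not say which one the layer computes, so the sketch below shows both using Python's `statistics` module on made-up measurements; pandas, for comparison, defaults to the sample form.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical measurements from one group; their mean is 5.0.
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_var = pvariance(measurements)  # sum of squared deviations / n
sample_var = variance(measurements)       # sum of squared deviations / (n - 1)
population_std = pstdev(measurements)     # square root of population_var
sample_std = stdev(measurements)          # square root of sample_var
```

The two forms diverge most for small groups, so it is worth checking which one applies before comparing results against another tool.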
First: First row in each group. Used for:
- Initial readings
- Starting values
- First occurrences
Last: Last row in each group. Used for:
- Final status
- Most recent values
- Ending conditions
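
First and last results depend on row order, so they are only meaningful when the input is sorted in a known way. The plain-Python sketch below assumes rows keep their input order within each group, which this page does not confirm for the layer; the event data and the helper `first_and_last` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (order_id, status) rows, already in chronological order.
events = [
    ("order-1", "created"), ("order-2", "created"),
    ("order-1", "shipped"), ("order-1", "delivered"),
    ("order-2", "cancelled"),
]


def first_and_last(pairs):
    """Return (first value, last value) per group, preserving input order."""
    values = defaultdict(list)
    for key, value in pairs:
        values[key].append(value)
    return {key: (vals[0], vals[-1]) for key, vals in values.items()}


status = first_and_last(events)
```

If the events were not sorted by time beforehand, the "last" status would simply be whichever row happened to come last, not the most recent one.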
Count: Row count per group. Used for:
- Transaction frequency
- Event counts
- Occurrence analysis
None: No aggregation; original values are kept as they are. Used for:
- Keeping original data
- Preparation for further processing
- Custom aggregations