Advanced / GroupBy Layer

Perform advanced grouping operations with multiple aggregations, similar to pandas groupby() with agg() or dplyr's group_by() with summarize().

Example scenarios:

  1. Sales Analysis

     Operation                         Purpose
     by=[region, product_type]         Group by region and product
     agg=Sum(sales_amount)             Total sales in each group
     agg=Count(transaction_id)         Number of transactions
     agg=Mean(unit_price)              Average price point

  2. Customer Behavior

     Operation                         Purpose
     by=[customer_segment, month]      Group by segment and time
     agg=NUnique(customer_id)          Unique customers
     agg=Mean(purchase_value)          Average purchase value
     agg=Max(transaction_amount)       Largest transactions

  3. Quality Control

     Operation                         Purpose
     by=[batch_id, production_line]    Group by batch and line
     agg=Mean(measurement)             Average measurements
     agg=Std(measurement)              Measurement variation
     agg=Count(defect_id)              Number of defects
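
The Sales Analysis scenario can be sketched in plain Python to show what the three aggregations produce per group. This is an illustration of the semantics only, not the layer's implementation: the sample rows, the helper names (group_rows, sales_summary), and the output keys such as sales_amount_sum are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; column names follow the Sales Analysis scenario.
rows = [
    {"region": "East", "product_type": "A", "sales_amount": 100.0,
     "transaction_id": 1, "unit_price": 10.0},
    {"region": "East", "product_type": "A", "sales_amount": 50.0,
     "transaction_id": 2, "unit_price": 12.0},
    {"region": "West", "product_type": "B", "sales_amount": 70.0,
     "transaction_id": 3, "unit_price": 7.0},
]

def group_rows(rows, by):
    """Bucket rows by the tuple of values found in the `by` columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in by)].append(row)
    return groups

def sales_summary(rows):
    """Apply Sum, Count and Mean to each (region, product_type) group."""
    summary = {}
    for key, members in group_rows(rows, ["region", "product_type"]).items():
        summary[key] = {
            # Output key names are made up for this sketch.
            "sales_amount_sum": sum(r["sales_amount"] for r in members),
            "transaction_id_count": len(members),
            "unit_price_mean": mean(r["unit_price"] for r in members),
        }
    return summary

summary = sales_summary(rows)
```

Each distinct (region, product_type) pair yields one summary entry holding all three aggregated values.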

Common applications:

  • Sales analysis by region/product
  • Customer behavior segmentation
  • Performance metrics by category
  • Time-based aggregations
  • Quality metrics by batch
  • Resource utilization analysis

By

[column, ...]

Columns to group by. Creates distinct groups based on unique combinations of values in these columns. Common grouping columns:

  • Categorical: region, product_type, customer_segment
  • Temporal: year, month, day
  • Hierarchical: department, team, employee
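
As a rough illustration of "unique combinations of values", the sketch below lists the group keys that a given By selection would create. The rows and the group_keys helper are hypothetical. The first-seen ordering is an assumption of this sketch; the order in which the layer emits groups is not stated here.

```python
# Hypothetical rows containing only the grouping columns.
rows = [
    {"region": "East", "product_type": "A"},
    {"region": "East", "product_type": "B"},
    {"region": "East", "product_type": "A"},
    {"region": "West", "product_type": "A"},
]

def group_keys(rows, by):
    """Return the distinct value combinations of the `by` columns,
    in first-seen order (an assumption of this sketch)."""
    seen = {}
    for row in rows:
        seen.setdefault(tuple(row[col] for col in by), None)
    return list(seen)

keys_two = group_keys(rows, ["region", "product_type"])
keys_one = group_keys(rows, ["region"])
```

Grouping by both columns gives three groups, while grouping by region alone gives two. Adding a column to By can only keep or increase the number of groups.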

Defines aggregation operations to apply to grouped data. Multiple aggregations can be combined to create comprehensive group summaries.

Select

[column, ...]

Columns to aggregate. Requirements vary by aggregation type:

  • Numeric required: Mean, Sum, Std, Var, Min, Max, Median
  • Any type allowed: Count, NUnique, First, Last, None

Multiple columns can be selected for the same aggregation.
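
The type requirements above can be expressed as a small pre-check. This is a hypothetical helper (check_selection), not part of the layer; how the layer itself validates or reports a non-numeric column is not described here.

```python
from numbers import Number

# Aggregation names as listed in the requirements above.
NUMERIC_ONLY = {"Mean", "Sum", "Std", "Var", "Min", "Max", "Median"}
ANY_TYPE = {"Count", "NUnique", "First", "Last", "None"}

def check_selection(agg, values):
    """Return True if `values` satisfy the type requirement listed for `agg`."""
    if agg in ANY_TYPE:
        return True
    if agg in NUMERIC_ONLY:
        return all(isinstance(v, Number) for v in values)
    raise ValueError(f"unknown aggregation: {agg}")

ok_numeric = check_selection("Mean", [1, 2.5, 3])
bad_numeric = check_selection("Mean", ["low", "high"])
ok_any = check_selection("NUnique", ["low", "high"])
```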

Agg

enum
Min

Available aggregation functions for grouped data analysis. Different functions suit different analytical needs and data types.

Min ~

Minimum value in each group. Used for:

  • Lowest price points
  • Best performance times
  • Temperature minimums
Max ~

Maximum value in each group. Used for:

  • Peak sales amounts
  • Highest temperatures
  • Maximum load points
Mean ~

Arithmetic mean of each group. Used for:

  • Average transaction value
  • Typical response times
  • Mean daily temperatures
Median ~

Middle value in each group. Used for:

  • Typical household income
  • Central tendency analysis
  • Robust average measures
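
To see why Median is described as a robust average, compare it with Mean on a group containing one extreme value. The income figures are invented for illustration.

```python
from statistics import mean, median

# Hypothetical household incomes in one group, with a single outlier.
incomes = [30_000, 32_000, 35_000, 38_000, 1_000_000]

group_mean = mean(incomes)      # pulled upward by the outlier
group_median = median(incomes)  # middle value, unaffected by the outlier
```

Here the mean is 227,000 while the median is 35,000, which is much closer to what a typical household in the group earns.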
NUnique ~

Count of unique values per group. Used for:

  • Distinct customer count
  • Product variety analysis
  • Unique error codes
Sum ~

Sum of values in each group. Used for:

  • Total revenue by region
  • Cumulative quantities
  • Total expenses
Std ~

Standard deviation of group values. Used for:

  • Price variation analysis
  • Quality control measures
  • Performance consistency
Var ~

Variance of group values. Used for:

  • Risk assessment
  • Spread analysis
  • Variability measures
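
Std and Var describe the same spread on different scales: the standard deviation is the square root of the variance. The sketch below uses Python's statistics module on invented measurements. This page does not state whether the layer uses the sample (n - 1) or population (n) form, so both are shown.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical measurements for one group (mean is 5.0).
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sample_var = variance(measurements)       # divides by n - 1
sample_std = stdev(measurements)          # sqrt of sample variance
population_var = pvariance(measurements)  # divides by n
population_std = pstdev(measurements)     # sqrt of population variance
```

Because the two forms differ noticeably for small groups, it is worth checking which one a tool reports before comparing results across systems.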
First ~

First value in each group, taken from the group's first row. Used for:

  • Initial readings
  • Starting values
  • First occurrences
Last ~

Last value in each group, taken from the group's last row. Used for:

  • Final status
  • Most recent values
  • Ending conditions
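
First and Last depend on the order of rows within each group. The sketch below assumes groups preserve input row order, which is an assumption rather than documented behavior. The readings and the first_last helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical readings, in the order they arrive.
readings = [
    {"batch_id": "B1", "measurement": 9.8},
    {"batch_id": "B2", "measurement": 10.4},
    {"batch_id": "B1", "measurement": 10.1},
    {"batch_id": "B1", "measurement": 10.0},
]

def first_last(rows, by, column):
    """Return {group: (first value, last value)} using input row order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[by]].append(row[column])
    return {key: (vals[0], vals[-1]) for key, vals in grouped.items()}

result = first_last(readings, "batch_id", "measurement")
```

If "most recent" should mean latest by timestamp, sorting the data by that timestamp before grouping is the usual way to make Last meaningful.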
Count ~

Row count per group. Used for:

  • Transaction frequency
  • Event counts
  • Occurrence analysis
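
Count and NUnique answer different questions about the same column: how many rows a group has versus how many distinct values appear in it. A minimal sketch with invented data follows. Whether the layer's Count skips missing values is not stated here, and this sketch simply counts every row.

```python
from collections import defaultdict

# Hypothetical (customer_segment, customer_id) pairs, one per transaction.
transactions = [
    ("retail", "c1"),
    ("retail", "c1"),
    ("retail", "c2"),
    ("wholesale", "c3"),
]

def count_and_nunique(pairs):
    """Return {group: (row count, distinct value count)}."""
    grouped = defaultdict(list)
    for group, value in pairs:
        grouped[group].append(value)
    return {g: (len(vals), len(set(vals))) for g, vals in grouped.items()}

stats = count_and_nunique(transactions)
```

The retail segment has three transactions but only two distinct customers, so Count gives 3 and NUnique gives 2.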
None ~

No aggregation, maintains original values. Used for:

  • Keeping original data
  • Preparation for further processing
  • Custom aggregations