DataframeDescribe / Aggregation Layer

Calculates multiple statistical measures across dataframe columns, similar to pandas describe() or R's summary(). Returns a new dataframe in which each row is one statistic and each column keeps its original column name.
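To make the output shape concrete, here is a minimal sketch in plain Python. The names table, STATISTICS, and describe are hypothetical stand-ins for illustration, not this layer's actual API; the table is modeled as a dict of column lists.

```python
from statistics import mean, median

# Hypothetical stand-in for a table: column name -> list of values.
table = {
    "price": [10.0, 12.0, 11.0, 15.0],
    "quantity": [1, 4, 2, 3],
}

# Hypothetical mapping from statistic name to a function over one column.
STATISTICS = {"Mean": mean, "Median": median}

def describe(table, statistics):
    """Return one row per statistic, keyed by the original column names."""
    rows = []
    for name in statistics:
        row = {"statistic": name}
        for column, values in table.items():
            row[column] = STATISTICS[name](values)
        rows.append(row)
    return rows

summary = describe(table, ["Mean", "Median"])
for row in summary:
    print(row)
```

Each printed row corresponds to one requested statistic, and the keys after "statistic" are the unchanged input column names.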

Key features:

  • Handles multiple data types (numeric, boolean, categorical)
  • Supports selective column analysis
  • Provides comprehensive statistical measures
  • Maintains null-value awareness

Common applications:

  • Exploratory Data Analysis (EDA)
  • Data quality assessment
  • Distribution analysis
  • Outlier detection
  • Dataset summarization
  • Statistical reporting

Select

[column, ...]

Columns to analyze. If empty, all compatible columns are included. Select specific columns to focus the analysis or improve performance. The selected columns should match the data types required by the chosen statistics (e.g., numeric columns for Mean).
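The selection rule can be sketched as follows. This is an illustration only: select_columns and is_numeric are hypothetical helpers, and the way compatibility is decided here (every non-null value is a non-boolean number) is an assumption, not the layer's documented behavior.

```python
from numbers import Number

# Hypothetical table with one numeric, one boolean, and one categorical column.
table = {
    "price": [10.0, 12.5, None],
    "in_stock": [True, False, True],
    "category": ["a", "b", "a"],
}

def is_numeric(values):
    """True if the column has non-null values and all are non-boolean numbers."""
    present = [v for v in values if v is not None]
    return bool(present) and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in present
    )

def select_columns(table, select):
    """An empty selection means every column; otherwise keep the named ones."""
    names = list(table) if not select else select
    return {name: table[name] for name in names}

all_columns = select_columns(table, [])
# Assumed compatibility filter for a numeric statistic such as Mean.
numeric_only = {k: v for k, v in all_columns.items() if is_numeric(v)}
print(list(numeric_only))
```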

Statistics

[enum, ...]
Mean, Median

Available statistical measures for column analysis. Different statistics are applicable to different data types (numeric, boolean, categorical).

ArgMax

Row index of the maximum value. Useful for:

  • Finding peak locations
  • Identifying extreme events
  • Maximum value referencing
ArgMin

Row index of the minimum value. Important for:

  • Locating lowest points
  • Finding first occurrences
  • Minimum value referencing
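As a small illustration of ArgMax and ArgMin, the sketch below finds the row positions of the extreme values in a plain list. Zero-based indexing and resolving ties to the first matching row are assumptions made for this example; the layer's actual conventions are not stated here.

```python
values = [3.0, 9.5, 1.2, 9.5, 1.2]

# Assumptions: indices are zero-based and ties resolve to the first matching row.
arg_max = max(range(len(values)), key=values.__getitem__)
arg_min = min(range(len(values)), key=values.__getitem__)

print(arg_max, values[arg_max])
print(arg_min, values[arg_min])
```

The index, unlike Max or Min, lets you look up the rest of the row where the extreme value occurred.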
Length

Total number of rows. Used for:

  • Dataset size verification
  • Completeness checks
  • Sample size confirmation
HasNulls

Boolean indicator for presence of null values. Critical for:

  • Data quality assessment
  • Missing data detection
  • Preprocessing decisions
All

True if all boolean values are True. For boolean columns only. Used in:

  • Condition verification
  • Quality checks
  • Validation rules
Any

True if any boolean value is True. For boolean columns only. Applied in:

  • Event detection
  • Flag checking
  • Condition testing
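The All and Any reductions over a boolean column can be sketched with Python's built-ins. Skipping null entries before reducing is an assumption made for this example, not documented behavior of the layer.

```python
# Hypothetical boolean column with one missing entry.
flags = [True, True, None, False]

# Assumption: null entries are skipped before reducing.
present = [f for f in flags if f is not None]

all_true = all(present)  # every remaining value must be True
any_true = any(present)  # at least one remaining value must be True

print(all_true, any_true)
```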
Max

Maximum value (numeric columns). Essential for:

  • Range analysis
  • Outlier detection
  • Upper bound identification
Mean

Arithmetic mean (numeric columns). Key for:

  • Central tendency analysis
  • Expected value estimation
  • General data characterization
Median

Middle value (numeric columns). Important for:

  • Robust central tendency
  • Skewness assessment
  • Outlier-resistant analysis
Mode

Most frequent value (numeric columns). Useful for:

  • Peak detection
  • Common value identification
  • Discrete data analysis
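To see why Mean, Median, and Mode answer different questions, the sketch below applies Python's standard statistics module to a small made-up sample containing one outlier. It illustrates the three measures themselves, not this layer's implementation.

```python
from statistics import mean, median, mode

# Made-up sample with a single large outlier.
values = [2, 3, 3, 4, 100]

center_mean = mean(values)      # pulled upward by the outlier
center_median = median(values)  # middle of the sorted values
center_mode = mode(values)      # most frequent value

print(center_mean, center_median, center_mode)
```

The mean lands far from most of the data, while the median and mode stay at 3, which is why the median is listed above as the outlier-resistant choice.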
Min

Minimum value (numeric columns). Critical for:

  • Range analysis
  • Lower bound detection
  • Baseline identification
NUnique

Count of unique values. Applied in:

  • Cardinality analysis
  • Categorical assessment
  • Diversity measurement
NullCount

Count of null values. Essential for:

  • Missing data analysis
  • Data quality metrics
  • Completeness assessment
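The count-style statistics (Length, HasNulls, NUnique, NullCount) can be sketched together on one column. Nulls are modeled as None, and leaving nulls out of the unique count is an assumption for this example; some dataframe libraries count null as a distinct value.

```python
# Hypothetical categorical column with missing entries represented as None.
column = ["red", None, "blue", "red", None]

length = len(column)                          # Length: every row, nulls included
null_count = sum(v is None for v in column)   # NullCount
has_nulls = null_count > 0                    # HasNulls

# Assumption: nulls are not counted as a distinct value here.
n_unique = len({v for v in column if v is not None})  # NUnique

# A derived completeness ratio, useful for data quality reports.
completeness = 1 - null_count / length

print(length, null_count, has_nulls, n_unique, completeness)
```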
Variance

Sample variance, computed with ddof=1 so that the sum of squared deviations is divided by n - 1 (numeric columns). Used for:

  • Spread measurement
  • Variability analysis
  • Uncertainty quantification
Std

Sample standard deviation, computed with ddof=1 (numeric columns). Important for:

  • Dispersion analysis
  • Error estimation
  • Quality control
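The effect of ddof=1 on Variance and Std can be checked by hand. The helper sample_variance below is a hypothetical illustration that divides by n - ddof; Python's statistics.variance and statistics.stdev use the same n - 1 divisor and serve as a cross-check.

```python
from math import sqrt
from statistics import stdev, variance

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

def sample_variance(xs, ddof=1):
    """Sum of squared deviations from the mean, divided by (n - ddof)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

var_ddof1 = sample_variance(values)          # divides by n - 1 = 7
var_ddof0 = sample_variance(values, ddof=0)  # divides by n = 8
std_ddof1 = sqrt(var_ddof1)

print(var_ddof1, var_ddof0, std_ddof1)
```

With ddof=1 the result is slightly larger than the population figure, and the gap shrinks as the number of rows grows.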
Sum

Sum of all values (numeric columns). Applied in:

  • Total calculations
  • Aggregate analysis
  • Cumulative measurements
Product

Product of all values (numeric columns). Used for:

  • Geometric calculations
  • Compound growth
  • Multiplicative aggregation
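As an example of the compound growth use case, the sketch below multiplies made-up period-over-period growth factors with math.prod to get the overall growth over all periods.

```python
from math import prod

# Made-up growth factors: +10%, -5%, +20%.
growth_factors = [1.10, 0.95, 1.20]

# The product of the factors is the cumulative growth factor.
total_growth = prod(growth_factors)

print(f"{total_growth:.3f}")
```

A total of about 1.254 means the quantity ends 25.4% above its starting value, which is not what summing the three percentage changes would suggest.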
Skew

Skewness measurement (numeric columns). Essential for:

  • Distribution shape analysis
  • Asymmetry detection
  • Normality assessment
Kurtosis

Kurtosis measurement (numeric columns). Important for:

  • Tail heaviness analysis
  • Peak sharpness assessment
  • Distribution shape characterization
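Skew and Kurtosis have several definitions in common use. The sketch below uses the moment-based (biased) versions and reports excess kurtosis, where a normal distribution scores 0. Which definition and bias correction this layer applies is not stated here, so treat these formulas and the helper names as assumptions.

```python
def central_moment(xs, k):
    """k-th central moment using the population (1/n) convention."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Moment-based skewness: m3 / m2^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2^2 - 3."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 2.0, 10.0]

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))
```

The symmetric sample has zero skewness, while the sample with a long right tail has positive skewness.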