DataframeDescribe / Aggregation Layer
Calculates multiple statistical measures across dataframe columns, similar to pandas describe() or R's summary(). Returns a new dataframe in which each row holds one statistic and each column keeps its original column name.
Key features:
- Handles multiple data types (numeric, boolean, categorical)
- Supports selective column analysis
- Provides comprehensive statistical measures
- Reports null values through dedicated statistics (null presence and null count)
Common applications:
- Exploratory Data Analysis (EDA)
- Data quality assessment
- Distribution analysis
- Outlier detection
- Dataset summarization
- Statistical reporting
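The output shape described above (one row per statistic, one column per original column) can be illustrated with a minimal pandas sketch. The dataframe and its column names are made up for illustration, and pandas' `DataFrame.agg` is only an analogue of this node, not its implementation.

```python
import pandas as pd

# Hypothetical input data; column names are illustrative only.
df = pd.DataFrame({
    "price": [10.0, 12.5, None, 9.0],
    "qty": [1, 4, 2, 2],
})

# Each requested statistic becomes a row; original column names are kept.
summary = df.agg(["mean", "min", "max"])
print(summary)
```

Note that pandas skips nulls when computing these statistics; how this node treats nulls for each statistic is not stated here, so check the result against the null-related statistics.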
Select
[column, ...] Columns to analyze. If empty, all compatible columns are included. Select specific columns to focus the analysis or to improve performance. The selected columns must have a data type that the chosen statistics support (e.g., numeric columns for Mean).
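The selection rule can be sketched as follows, assuming that "compatible" means numeric when the chosen statistic is Mean. The helper `resolve_columns` and the sample data are hypothetical and exist only for this illustration.

```python
import pandas as pd

def resolve_columns(df, selected):
    """Return the selected columns, or every numeric column if none are selected."""
    if selected:
        return list(selected)
    # Assumption: for a numeric statistic, "compatible" means numeric dtype.
    return df.select_dtypes(include="number").columns.tolist()

df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Pune"],
    "temp_c": [4.0, 19.5, 31.0],
    "humidity": [80, 75, 40],
})

all_numeric = resolve_columns(df, [])          # empty selection: numeric columns only
only_temp = resolve_columns(df, ["temp_c"])    # explicit selection is used as given
summary = df[only_temp].agg(["mean"])
print(all_numeric)
print(summary)
```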
Statistics
[enum, ...] Available statistical measures for column analysis. Different statistics apply to different data types (numeric, boolean, categorical).
Row index of maximum value. Useful for:
- Finding peak locations
- Identifying extreme events
- Maximum value referencing
Row index of minimum value. Important for:
- Locating lowest points
- Finding first occurrences
- Minimum value referencing
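The two index statistics above return a row position rather than a value. A minimal pure-Python sketch, assuming zero-based row indices and that ties resolve to the first occurrence (the tie rule of this node is not stated here):

```python
values = [3.2, 7.9, 1.4, 7.9, 1.4]

# Position of the largest / smallest value. Python's max() and min()
# return the first matching element when several are tied.
index_of_max = max(range(len(values)), key=values.__getitem__)
index_of_min = min(range(len(values)), key=values.__getitem__)

print(index_of_max, values[index_of_max])
print(index_of_min, values[index_of_min])
```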
Total number of rows. Used for:
- Dataset size verification
- Completeness checks
- Sample size confirmation
Boolean indicator for presence of null values. Critical for:
- Data quality assessment
- Missing data detection
- Preprocessing decisions
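As a rough pandas analogue of the row count and null-presence statistics (sample data is illustrative; the pandas calls are stand-ins, not this node's API):

```python
import pandas as pd

df = pd.DataFrame({
    "reading": [1.0, None, 3.0],
    "label": ["x", "y", "z"],
})

row_count = len(df)             # total rows, nulls included
has_nulls = df.isna().any()     # one boolean per column

print(row_count)
print(has_nulls)
```

A column that reports nulls here is a candidate for imputation or filtering before statistics such as the mean are interpreted.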
True if all boolean values are True. For boolean columns only. Used in:
- Condition verification
- Quality checks
- Validation rules
True if any boolean value is True. For boolean columns only. Applied in:
- Event detection
- Flag checking
- Condition testing
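The two boolean statistics behave like Python's built-in `all` and `any`. The sketch below uses plain lists with no nulls; how this node treats nulls in boolean columns is not specified here, so that case is left out.

```python
# Hypothetical boolean columns.
checks_passed = [True, True, True]
alerts = [False, False, True, False]

every_check_passed = all(checks_passed)   # True only if no value is False
any_alert_raised = any(alerts)            # True if at least one value is True
every_alert_raised = all(alerts)

print(every_check_passed, any_alert_raised, every_alert_raised)
```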
Maximum value (numeric columns). Essential for:
- Range analysis
- Outlier detection
- Upper bound identification
Arithmetic mean (numeric columns). Key for:
- Central tendency analysis
- Expected value estimation
- General data characterization
Median: the middle value of the sorted data (numeric columns). Important for:
- Robust central tendency
- Skewness assessment
- Outlier-resistant analysis
Most frequent value, i.e. the mode (numeric columns). Useful for:
- Peak detection
- Common value identification
- Discrete data analysis
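To see why the three central-tendency statistics above can disagree, here is a small sketch using Python's standard `statistics` module on made-up data with one outlier:

```python
import statistics

values = [2, 3, 3, 4, 100]   # 100 is a deliberate outlier

mean_value = statistics.mean(values)      # pulled upward by the outlier
median_value = statistics.median(values)  # middle of the sorted values
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)
```

The mean lands far from most of the data, while the median and mode stay at 3, which is why the median is the usual choice for outlier-resistant analysis.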
Minimum value (numeric columns). Critical for:
- Range analysis
- Lower bound detection
- Baseline identification
Count of unique values. Applied in:
- Cardinality analysis
- Categorical assessment
- Diversity measurement
Count of null values. Essential for:
- Missing data analysis
- Data quality metrics
- Completeness assessment
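Unique-value and null counts interact: a library may or may not count null as a distinct value. The pandas sketch below shows both conventions on illustrative data; which one this node follows is not stated here.

```python
import pandas as pd

df = pd.DataFrame({"grade": ["A", "B", "A", None, "C", None]})

null_count = int(df["grade"].isna().sum())
unique_excluding_null = df["grade"].nunique()              # pandas default: nulls ignored
unique_including_null = df["grade"].nunique(dropna=False)  # null counted as one value

print(null_count, unique_excluding_null, unique_including_null)
```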
Sample variance with ddof=1, i.e. the sum of squared deviations divided by n - 1 (numeric columns). Used for:
- Spread measurement
- Variability analysis
- Uncertainty quantification
Sample standard deviation with ddof=1, i.e. the square root of the variance computed with an n - 1 divisor (numeric columns). Important for:
- Dispersion analysis
- Error estimation
- Quality control
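The effect of ddof=1 can be checked by hand. The sketch below computes the variance with an n - 1 divisor and compares it with Python's `statistics.variance` and `statistics.stdev`, which use the same sample convention; the data is made up.

```python
import math
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(values)
mean_value = sum(values) / n

# ddof=1: divide the sum of squared deviations by n - 1 instead of n.
squared_deviations = sum((x - mean_value) ** 2 for x in values)
sample_variance = squared_deviations / (n - 1)
sample_std = math.sqrt(sample_variance)

# For comparison: ddof=0 (population variance) divides by n.
population_variance = squared_deviations / n

print(sample_variance, statistics.variance(values))
print(sample_std, statistics.stdev(values))
print(population_variance)
```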
Sum of all values (numeric columns). Applied in:
- Total calculations
- Aggregate analysis
- Cumulative measurements
Product of all values (numeric columns). Used for:
- Geometric calculations
- Compound growth
- Multiplicative aggregation
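As an illustration of the compound-growth use case, the product of per-period growth factors gives the overall growth factor, whereas the sum does not. The figures below are invented for the example.

```python
import math

# Hypothetical growth factors for three periods: +10%, -5%, +20%.
growth_factors = [1.10, 0.95, 1.20]

total_growth = math.prod(growth_factors)   # overall multiplicative change
factor_sum = sum(growth_factors)           # additive total, shown for contrast

print(total_growth)
print(factor_sum)
```

Products of many values can overflow or underflow in floating point, so very long columns may need care.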
Skewness measurement (numeric columns). Essential for:
- Distribution shape analysis
- Asymmetry detection
- Normality assessment
Kurtosis measurement (numeric columns). Important for:
- Tail heaviness analysis
- Outlier propensity assessment
- Distribution shape characterization
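Skewness and kurtosis have several competing definitions. The sketch below assumes the population (biased) moment formulas and reports excess kurtosis (a normal distribution scores 0); this node may use a different convention, so treat the functions `skewness` and `excess_kurtosis` as illustrative only.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    n = len(values)
    mean_value = sum(values) / n
    return sum((x - mean_value) ** order for x in values) / n

def skewness(values):
    # Assumed definition: m3 / m2^(3/2), without bias correction.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    # Assumed definition: m4 / m2^2 - 3 (Fisher's excess kurtosis).
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

right_skewed = [1, 2, 3, 4, 10]
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed))    # positive: long right tail
print(skewness(symmetric))       # zero for symmetric data
print(excess_kurtosis(right_skewed))
```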