FillNullByStrategy / Manipulation Layer

Fills null values in columns using predefined strategies, similar to pandas fillna() or na.fill() from R's zoo package. Each strategy handles missing data differently and suits different analytical needs.

Common applications:

  • Time series gap filling
  • Sensor data cleanup
  • Survey response completion
  • Financial data preprocessing
  • Machine learning dataset preparation

Multiple columns can be processed simultaneously, each with its own filling strategy, making it efficient for bulk data cleaning operations.

Input: Table
Output: Table

Transforms

[, ...]

Each transform defines a fill operation for a single column together with its strategy. Multiple transforms can be applied in parallel, and each may use a different strategy depending on the column's data characteristics and the analysis requirements.
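As a rough illustration of per-column transforms, the sketch below applies a list of transforms to a table held as a dict of lists. The field names "column" and "strategy", the STRATEGIES registry, and apply_transforms are hypothetical names chosen for this example and are not taken from the operation's actual schema.

```python
from typing import Callable, Dict, List, Optional

Column = List[Optional[float]]

# Hypothetical registry holding two of the simplest strategies
# (constant replacement). The real operation offers more.
STRATEGIES: Dict[str, Callable[[Column], Column]] = {
    "Zero": lambda values: [0 if v is None else v for v in values],
    "One": lambda values: [1 if v is None else v for v in values],
}


def apply_transforms(table: Dict[str, Column], transforms: List[dict]) -> Dict[str, Column]:
    """Apply each transform to its own column; other columns pass through."""
    result = dict(table)  # shallow copy so the input table is not modified
    for transform in transforms:
        fill = STRATEGIES[transform["strategy"]]
        result[transform["column"]] = fill(table[transform["column"]])
    return result


table = {"clicks": [3, None, 5], "scale": [None, 2.0, None]}
filled = apply_transforms(
    table,
    [
        {"column": "clicks", "strategy": "Zero"},  # assumed field names
        {"column": "scale", "strategy": "One"},
    ],
)
print(filled)  # {'clicks': [3, 0, 5], 'scale': [1, 2.0, 1]}
```

Each transform only reads and writes its own column, which is what allows several of them to be evaluated independently.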

The column containing null values to be filled. The column's data type must be compatible with the chosen strategy (e.g., numeric for Mean strategy).

Forward

Available strategies for null value replacement, each designed for specific data patterns and analysis requirements.

Forward ~

Propagate last valid value forward (LOCF - Last Observation Carried Forward). Ideal for:

  • Time series with stable periods
  • State-based data
  • Sensor readings with gaps
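
A minimal sketch of forward filling on a plain Python list, with None standing in for null. The function name fill_forward is hypothetical. In this sketch, nulls that appear before the first valid value stay null. The source does not say how the operation itself treats that case.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def fill_forward(values: List[Optional[T]]) -> List[Optional[T]]:
    """Carry the last non-null value forward over following nulls."""
    last: Optional[T] = None
    out: List[Optional[T]] = []
    for v in values:
        if v is not None:
            last = v  # remember the most recent valid observation
        out.append(last)
    return out


readings = [None, 20.5, None, None, 21.0, None]
print(fill_forward(readings))  # [None, 20.5, 20.5, 20.5, 21.0, 21.0]
```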

Backward ~

Fill with next valid value (NOCB - Next Observation Carried Backward). Useful for:

  • Retroactive data updates
  • Backward-looking analysis
  • End-of-period assignments
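
Backward filling mirrors forward filling by scanning from the end of the column. The sketch below uses a hypothetical fill_backward helper on a plain list. Here, trailing nulls with no later valid value stay null, which is an assumption about edge-case behavior rather than something the source states.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def fill_backward(values: List[Optional[T]]) -> List[Optional[T]]:
    """Fill each null with the next non-null value that follows it."""
    upcoming: Optional[T] = None
    out: List[Optional[T]] = []
    for v in reversed(values):  # walk from the end toward the start
        if v is not None:
            upcoming = v
        out.append(upcoming)
    out.reverse()  # restore the original row order
    return out


balances = [None, 100, None, None, 130, None]
print(fill_backward(balances))  # [100, 100, 130, 130, 130, None]
```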

Mean ~

Fill with column's mean value. Requires numeric data. Appropriate for:

  • Statistical analysis
  • Normal distributions
  • Centered data estimation
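
A sketch of mean filling, assuming the mean is computed over the non-null values only (the source does not spell this out). The fill_mean name is hypothetical, and a column with no valid values is returned unchanged in this sketch.

```python
from statistics import fmean
from typing import List, Optional


def fill_mean(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace nulls with the mean of the non-null values."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to average; leave the column as-is
    mean = fmean(present)
    return [mean if v is None else v for v in values]


scores = [4.0, None, 5.0, 3.0, None]
print(fill_mean(scores))  # [4.0, 4.0, 5.0, 3.0, 4.0]
```

Note that the fill value is a float even when the column holds integers, which is one reason the strategy requires numeric data.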

Min ~

Fill with column's minimum value. Suitable for:

  • Conservative estimates
  • Lower bound analysis
  • Worst-case scenarios

Max ~

Fill with column's maximum value. Useful for:

  • Upper bound analysis
  • Best-case scenarios
  • Ceiling estimates
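
Min and Max differ only in which observed value is chosen as the fill. The sketch below captures both with one hypothetical helper, fill_extreme, that takes Python's built-in min or max. It assumes the extreme is taken over non-null values only.

```python
from typing import Callable, List, Optional


def fill_extreme(
    values: List[Optional[float]],
    pick: Callable[[List[float]], float],
) -> List[Optional[float]]:
    """Replace nulls with the column's observed minimum or maximum."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # no valid values to choose from
    fill = pick(present)
    return [fill if v is None else v for v in values]


latency_ms = [120, None, 95, 310, None]
print(fill_extreme(latency_ms, min))  # [120, 95, 95, 310, 95]
print(fill_extreme(latency_ms, max))  # [120, 310, 95, 310, 310]
```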

Zero ~

Replace nulls with zero. Common in:

  • Count data
  • Financial calculations
  • Binary indicators

One ~

Replace nulls with one. Useful for:

  • Multiplicative operations
  • Unit-based calculations
  • Neutral scaling factors

MinBound ~

Fill with data type's minimum value (e.g., INT_MIN). Used for:

  • Range-based analysis
  • System boundaries
  • Edge case handling

MaxBound ~

Fill with data type's maximum value (e.g., INT_MAX). Applied in:

  • System limits
  • Boundary testing
  • Maximum range scenarios
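
Unlike Min and Max, the bound strategies ignore the observed data and use the limits of the column's data type. The sketch below illustrates this with a hypothetical TYPE_BOUNDS table. The type names and the choice of the largest finite double (rather than infinity) for floats are assumptions, not documented behavior.

```python
import sys
from typing import List, Optional, Union

Number = Union[int, float]

# Assumed type names mapped to (lowest, highest) representable values.
TYPE_BOUNDS = {
    "int32": (-2**31, 2**31 - 1),  # INT_MIN / INT_MAX for a 32-bit int
    "int64": (-2**63, 2**63 - 1),
    "float64": (-sys.float_info.max, sys.float_info.max),
}


def fill_bound(values: List[Optional[Number]], dtype: str, upper: bool) -> List[Optional[Number]]:
    """Replace nulls with the data type's lower or upper bound."""
    low, high = TYPE_BOUNDS[dtype]
    fill = high if upper else low
    return [fill if v is None else v for v in values]


ids = [7, None, 42]
print(fill_bound(ids, "int32", upper=False))  # [7, -2147483648, 42]
print(fill_bound(ids, "int32", upper=True))   # [7, 2147483647, 42]
```

Because these fills are far outside typical data ranges, they work as sentinels but will distort aggregates such as sums and means.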

Name for the new column. If not provided, the system generates a unique name. If AsColumn matches an existing column, the existing column is replaced. The name should follow valid column naming conventions.
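
The naming rules above can be sketched as follows. Only the replace-on-match behavior comes from the description. The "_filled" suffix scheme for generated names, and the helper names resolve_output_name and write_column, are invented for illustration, since the source does not say how unique names are generated.

```python
from typing import Dict, List, Optional


def resolve_output_name(existing: Dict[str, list], as_column: Optional[str], source: str) -> str:
    """Pick the output column name for a filled column."""
    if as_column is not None:
        return as_column  # may match an existing column, which is then replaced
    # Hypothetical scheme: derive a name that does not clash with any column.
    candidate = f"{source}_filled"
    n = 1
    while candidate in existing:
        n += 1
        candidate = f"{source}_filled_{n}"
    return candidate


def write_column(table: Dict[str, list], name: str, values: List) -> Dict[str, list]:
    """Add the column, or replace it if the name already exists."""
    result = dict(table)
    result[name] = values
    return result


table = {"price": [1.0, None]}
filled_values = [1.0, 0.0]

# AsColumn matches an existing column: that column is replaced.
replaced = write_column(table, resolve_output_name(table, "price", "price"), filled_values)
print(replaced)  # {'price': [1.0, 0.0]}

# AsColumn omitted: a fresh, non-clashing name is generated.
added = write_column(table, resolve_output_name(table, None, "price"), filled_values)
print(added)  # {'price': [1.0, None], 'price_filled': [1.0, 0.0]}
```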