CaseWhen / Manipulation Layer
Create a new column by applying multiple conditional statements across one or more columns. Similar to SQL's CASE WHEN or pandas' np.select(). Evaluates conditions in sequence and assigns values based on the first matching condition.
Common applications:
- Order processing status
- Data quality validation
- Customer categorization
- Product classification
- Transaction flagging
Example: Order Processing Priority Assignment
Order Type | Shipping | Value ($) | Status | Priority |
---|---|---|---|---|
Express | Next Day | 150 | Processing | Urgent |
Standard | Ground | 350 | Pending | High |
Express | Ground | 75 | Processing | Medium |
Standard | Ground | 50 | Complete | Low |
Standard | Ground | 25 | Processing | Normal |
Implementation logic:
CASE
WHEN shipping = 'Next Day' THEN 'Urgent'
WHEN value > 300 THEN 'High'
WHEN order_type = 'Express' THEN 'Medium'
WHEN status = 'Complete' THEN 'Low'
ELSE 'Normal'
END
Key points:
- Each condition evaluates a single column
- Conditions are checked in sequence
- First matching condition determines result
- Non-matching rows use the Otherwise value
- More specific conditions should be listed first
Note: The order of conditions is crucial as each row's value comes from the first matching condition. Consider business logic priority when ordering conditions.
CaseList
[, ...]Defines a single conditional case with its comparison and result. Cases are evaluated in order until a match is found. Similar to an IF-THEN statement in programming.
SelectWhen
columnThe column to evaluate in this condition. Forms the left side of the comparison. Example: age, salary, status_code
Operator
enumThe operator to use for comparison
Equal.
Not equal.
Less than.
Less than or equal.
Greater than.
Greater than or equal.
Contains.
Does not contain.
Begins with.
Does not begin with.
Ends with.
Does not end with.
CompareWith
stringThe value to compare against. Forms the right side of the comparison. Must be a literal value. Example: 18 for age comparison, 'Active' for status
ReplaceWith
stringThe value to use when this condition is true. This value will be converted to the specified AsDatatype
. Example: 'Minor' for age category, 1 for binary flags
Otherwise
oneofSpecifies that when no conditions match, the resulting value should be null. This is a conservative approach to handling unmatched cases, making it explicit that no conditions were satisfied.
ReplaceWith
stringDefault value to use when no conditions match. Similar to ELSE in SQL CASE statement. Provides a fallback value for unmatched cases. Example: 'Other', 'Unknown', -1
ReplaceFrom
columnColumn to draw default values from when no conditions match. Allows dynamic fallback values based on existing data. Useful for preserving original values in specific cases.
AsColumn
nameName for the new column. If not provided, the system generates a unique name. If AsColumn
matches an existing column, the existing column is replaced. The name should follow valid column naming conventions.
AsDatatype
enumSpecifies the output data type for the new column. Enables type control for the conditional results.
8-bit signed integer (-128 to 127). For small-range discrete values.
16-bit signed integer (-32,768 to 32,767). For moderate-range whole numbers.
32-bit signed integer (-2^31 to 2^31-1). Common for general integer data.
64-bit signed integer (-2^63 to 2^63-1). For large whole numbers.
8-bit unsigned integer (0 to 255). For small positive numbers.
16-bit unsigned integer (0 to 65,535). For moderate positive ranges.
32-bit unsigned integer (0 to 4,294,967,295). For larger positive numbers.
64-bit unsigned integer (0 to 2^64-1). For very large positive numbers.
32-bit floating point. For decimal numbers with moderate precision.
64-bit floating point. For high-precision decimal numbers.
Text data type. Default type for flexible value representation.
Categorical string type. For efficient storage of repeated string values.
Boolean type. For true/false results.