CaseWhen / Manipulation Layer

Create a new column by applying multiple conditional statements across one or more columns. Similar to SQL's CASE WHEN or pandas' np.select(). Evaluates conditions in sequence and assigns values based on the first matching condition.

Common applications:

  • Order processing status
  • Data quality validation
  • Customer categorization
  • Product classification
  • Transaction flagging

Example: Order Processing Priority Assignment

Order TypeShippingValue ($)StatusPriority
ExpressNext Day150ProcessingUrgent
StandardGround350PendingHigh
ExpressGround75ProcessingMedium
StandardGround50CompleteLow
StandardGround25ProcessingNormal

Implementation logic:

CASE 
  WHEN shipping = 'Next Day' THEN 'Urgent'
  WHEN value > 300 THEN 'High'
  WHEN order_type = 'Express' THEN 'Medium'
  WHEN status = 'Complete' THEN 'Low'
  ELSE 'Normal'
END

Key points:

  • Each condition evaluates a single column
  • Conditions are checked in sequence
  • First matching condition determines result
  • Non-matching rows use the Otherwise value
  • More specific conditions should be listed first

Note: The order of conditions is crucial as each row's value comes from the first matching condition. Consider business logic priority when ordering conditions.

Table
0
0
Table

CaseList

[, ...]

Defines a single conditional case with its comparison and result. Cases are evaluated in order until a match is found. Similar to an IF-THEN statement in programming.

The column to evaluate in this condition. Forms the left side of the comparison. Example: age, salary, status_code

EqualTo

The operator to use for comparison

EqualTo ~

Equal.

NotEqualTo ~

Not equal.

LessThan ~

Less than.

LessThanOrEqualTo ~

Less than or equal.

GreaterThan ~

Greater than.

GreaterThanOrEqualTo ~

Greater than or equal.

StrContains ~

Contains.

StrDoesNotContain ~

Does not contain.

StrStartsWith ~

Begins with.

StrDoesNotStartWith ~

Does not begin with.

StrEndsWith ~

Ends with.

StrDoesNotEndWith ~

Does not end with.

The value to compare against. Forms the right side of the comparison. Must be a literal value. Example: 18 for age comparison, 'Active' for status

The value to use when this condition is true. This value will be converted to the specified AsDatatype. Example: 'Minor' for age category, 1 for binary flags

FillNull

Specifies that when no conditions match, the resulting value should be null. This is a conservative approach to handling unmatched cases, making it explicit that no conditions were satisfied.

Default value to use when no conditions match. Similar to ELSE in SQL CASE statement. Provides a fallback value for unmatched cases. Example: 'Other', 'Unknown', -1

Column to draw default values from when no conditions match. Allows dynamic fallback values based on existing data. Useful for preserving original values in specific cases.

Name for the new column. If not provided, the system generates a unique name. If AsColumn matches an existing column, the existing column is replaced. The name should follow valid column naming conventions.

String

Specifies the output data type for the new column. Enables type control for the conditional results.

Int8 ~

8-bit signed integer (-128 to 127). For small-range discrete values.

Int16 ~

16-bit signed integer (-32,768 to 32,767). For moderate-range whole numbers.

Int32 ~

32-bit signed integer (-2^31 to 2^31-1). Common for general integer data.

Int64 ~

64-bit signed integer (-2^63 to 2^63-1). For large whole numbers.

Uint8 ~

8-bit unsigned integer (0 to 255). For small positive numbers.

Uint16 ~

16-bit unsigned integer (0 to 65,535). For moderate positive ranges.

Uint32 ~

32-bit unsigned integer (0 to 4,294,967,295). For larger positive numbers.

Uint64 ~

64-bit unsigned integer (0 to 2^64-1). For very large positive numbers.

Float32 ~

32-bit floating point. For decimal numbers with moderate precision.

Float64 ~

64-bit floating point. For high-precision decimal numbers.

String ~

Text data type. Default type for flexible value representation.

Categorical ~

Categorical string type. For efficient storage of repeated string values.

Bool ~

Boolean type. For true/false results.