CastDatatype / Manipulation Layer

Convert columns to different data types, similar to pandas' astype() or R's as.*() conversion functions (as.integer(), as.numeric(), as.character()).

Key features:

  • Multiple data type options
  • Batch conversion support
  • Safe type conversion handling

Common applications:

  • Memory optimization
  • Data validation
  • Format standardization
  • Database compatibility
  • Algorithm requirements

Note: Conversions that could lose data precision or cause overflow will raise errors unless explicitly handled.
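The note above can be illustrated with a small, library-independent sketch. The function name `checked_int_cast` and its `bits` and `signed` parameters are hypothetical and are not part of CastDatatype; the sketch only shows what "raise on overflow or precision loss" can mean for an integer target.

```python
def checked_int_cast(value, bits=8, signed=True):
    """Cast a numeric value to a fixed-width integer, raising instead of wrapping."""
    if signed:
        lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    else:
        lo, hi = 0, 2 ** bits - 1

    as_int = int(value)
    if as_int != value:
        # e.g. 1.5 -> 1 would silently drop the fractional part
        raise ValueError(f"{value!r} would lose its fractional part")
    if not lo <= as_int <= hi:
        # e.g. 128 does not fit a signed 8-bit integer
        raise OverflowError(f"{value!r} is outside [{lo}, {hi}]")
    return as_int
```

With the defaults (a signed 8-bit target), `checked_int_cast(127)` succeeds, while `checked_int_cast(128)` and `checked_int_cast(1.5)` raise rather than returning a wrapped or truncated value.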

Input: Table
Output: Table

Transforms

[, ...]

List of column type conversions to perform. Multiple transforms allow batch processing of type conversions across different columns.
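As a rough illustration of batch conversion, the sketch below applies a list of transforms to a table held as a plain dictionary of column lists. The key names `column` and `dtype`, the `CASTERS` mapping, and `apply_transforms` are assumptions made for this example, not the node's actual configuration schema, and the Python built-ins stand in for the real target types without any range checking.

```python
# Hypothetical shape of a transforms list: one entry per column to convert.
transforms = [
    {"column": "age", "dtype": "Int8"},
    {"column": "price", "dtype": "Float64"},
    {"column": "sku", "dtype": "String"},
]

# Plain-Python stand-ins for a few of the target types (no width checks here).
CASTERS = {"Int8": int, "Float64": float, "String": str}

def apply_transforms(table, transforms):
    """Return a new table (dict of column lists) with every transform applied."""
    result = dict(table)
    for t in transforms:
        cast = CASTERS[t["dtype"]]
        result[t["column"]] = [cast(v) for v in table[t["column"]]]
    return result

table = {"age": ["31", "45"], "price": ["9.99", "12"], "sku": [1001, 1002]}
converted = apply_transforms(table, transforms)
```

Each entry is independent of the others, so one pass can convert several columns to different types.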

Select

column

Source column for type conversion. The current type and data content should be compatible with the target data type to avoid conversion errors.

Int8

Available target data types for conversion. Choose based on requirements for precision, range, and memory usage.

Int8

8-bit signed integer (-128 to 127). Use for:

  • Small categorical codes
  • Byte-sized data
  • Memory-efficient counters
Int16

16-bit signed integer (-32,768 to 32,767). Ideal for:

  • Year values
  • Medium-range counts
  • Audio sample data
Int32

32-bit signed integer (-2^31 to 2^31-1). Common for:

  • General purpose integers
  • Population counts
  • Time differences
Int64

64-bit signed integer (-2^63 to 2^63-1). Suitable for:

  • Large counts
  • Timestamps
  • Big integer calculations
Uint8

8-bit unsigned integer (0 to 255). Perfect for:

  • Color values
  • Small positive counts
  • Binary flags
Uint16

16-bit unsigned integer (0 to 65,535). Use for:

  • Port numbers
  • Image pixel values
  • Medium positive counts
Uint32

32-bit unsigned integer (0 to 2^32-1). Good for:

  • Large positive counts
  • File sizes
  • Network addresses
Uint64

64-bit unsigned integer (0 to 2^64-1). Ideal for:

  • Very large counts
  • Unique identifiers
  • Microsecond timestamps
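The integer ranges listed above all follow from the bit width: an n-bit signed type covers -2^(n-1) to 2^(n-1)-1, and an n-bit unsigned type covers 0 to 2^n-1. The sketch below derives those ranges and, as one possible approach to memory optimization, picks the narrowest listed type that holds a set of values. The helper names `int_range` and `smallest_int_type` are illustrative only.

```python
def int_range(bits, signed):
    """Return (min, max) for an integer type of the given width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

# Candidate types, narrowest first; unsigned is tried before signed at each width.
INT_TYPES = [
    ("Uint8", 8, False), ("Int8", 8, True),
    ("Uint16", 16, False), ("Int16", 16, True),
    ("Uint32", 32, False), ("Int32", 32, True),
    ("Uint64", 64, False), ("Int64", 64, True),
]

def smallest_int_type(values):
    """Name the narrowest integer type whose range contains every value."""
    lo, hi = min(values), max(values)
    for name, bits, signed in INT_TYPES:
        type_lo, type_hi = int_range(bits, signed)
        if type_lo <= lo and hi <= type_hi:
            return name
    raise OverflowError("values do not fit any 64-bit integer type")
```

For example, `smallest_int_type([1900, 2024])` selects Uint16, while a single negative value in the data rules out every unsigned type.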
Float32

32-bit floating point (single precision). Used for:

  • Basic scientific calculations
  • Memory-efficient decimals
  • Graphics coordinates
Float64

64-bit floating point (double precision). Standard for:

  • Financial calculations
  • Scientific computing
  • High-precision analytics
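To make the difference between single and double precision concrete, the following sketch rounds ordinary Python floats (which are 64-bit) through a 32-bit representation using the standard `struct` module. The helper name `to_float32` is made up for this example; it only demonstrates the precision a Float64 to Float32 cast can lose and says nothing about how CastDatatype performs the conversion internally.

```python
import struct

def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

exact = to_float32(0.5)          # 0.5 is a power of two, so nothing is lost
rounded = to_float32(0.1)        # 0.1 is approximated more coarsely in 32 bits
big = to_float32(16_777_217.0)   # 2**24 + 1 needs more than a 24-bit significand

print(exact == 0.5)       # True
print(rounded == 0.1)     # False
print(big)                # 16777216.0
```

Whole numbers above 2^24 are the first place where Float32 starts skipping integers, which matters when large counts or identifiers are stored as floats.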
String

Text data type. Essential for:

  • Human-readable data
  • Textual features
  • Identifier preservation
Categorical

Optimized storage for repeated strings. Perfect for:

  • Factor variables
  • Enumerated types
  • Grouped text data
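One common way to get "optimized storage for repeated strings" is dictionary encoding: each distinct string is stored once, and the column itself holds small integer codes. Whether CastDatatype uses exactly this layout is an assumption; the functions below are a minimal sketch of the idea with hypothetical names.

```python
def encode_categorical(values):
    """Split a list of strings into (distinct categories, integer codes)."""
    categories = []   # distinct strings, in order of first appearance
    index = {}        # string -> integer code
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(categories)
            categories.append(v)
        codes.append(index[v])
    return categories, codes

def decode_categorical(categories, codes):
    """Rebuild the original list of strings from categories and codes."""
    return [categories[c] for c in codes]

colors = ["red", "blue", "red", "red", "blue"]
categories, codes = encode_categorical(colors)
print(categories)  # ['red', 'blue']
print(codes)       # [0, 1, 0, 0, 1]
```

The saving grows with repetition: five strings here collapse to two stored strings plus five small integers.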
Bool

Boolean true/false values. Used for:

  • Binary flags
  • Condition indicators
  • Yes/No data

AsColumn

Name for the new column. If not provided, the system generates a unique name. If AsColumn matches an existing column, the existing column is replaced. The name should follow valid column naming conventions.
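The naming rules for AsColumn can be sketched as follows. The `_cast` suffix scheme is purely an assumption, since the source does not say how the generated unique name is built, and `resolve_output_name` and `write_column` are hypothetical helpers operating on a table held as a dictionary of column lists.

```python
def resolve_output_name(existing, source, as_column=None):
    """Pick the output column name; `as_column` mirrors the AsColumn field."""
    if as_column:
        # May equal an existing name, in which case that column gets replaced.
        return as_column
    # Assumed scheme: "<source>_cast", then "<source>_cast_2", "<source>_cast_3", ...
    candidate = f"{source}_cast"
    n = 2
    while candidate in existing:
        candidate = f"{source}_cast_{n}"
        n += 1
    return candidate

def write_column(table, name, values):
    """Return a copy of the table with `values` stored under `name`."""
    updated = dict(table)
    updated[name] = values   # overwrites a column that already has this name
    return updated
```

Passing the source column's own name as AsColumn therefore converts the column in place, while leaving it empty keeps the original column and adds the converted copy under a fresh name.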