SplitWords / String Layer

Splits each string into a list of words, using whitespace as the delimiter. Similar to Python's str.split(), R's strsplit(x, '\\s+'), or Rust's split_whitespace(). The result is a list column containing the separated words.

Handles multiple types of whitespace:

  • Spaces
  • Tabs
  • Line breaks
  • Other Unicode whitespace characters
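
As a rough illustration of these categories, the sketch below uses Python's str.split(), which the description above names as a similar operation. It is an assumption that this node treats exactly the same characters as whitespace as Python does. The sample string is made up for the example.

```python
# Sample text mixing several whitespace kinds (illustrative only).
sample = "alpha beta\tgamma\ndelta\u00a0epsilon\u3000zeta"

# With no arguments, str.split() splits on any run of whitespace,
# including Unicode spaces such as U+00A0 (no-break space) and
# U+3000 (ideographic space).
words = sample.split()
print(words)
```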

Common applications:

  • Text tokenization for NLP
  • Processing space-separated data
  • Extracting words from sentences
  • Parsing log file entries
  • Breaking down full names
  • Analyzing word frequencies
  • Processing structured text data
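
One of these applications, word-frequency analysis, can be sketched in plain Python. The sentences are invented sample data, and str.split() stands in for this node on the assumption that the two behave alike.

```python
from collections import Counter

# Invented sample sentences; irregular spacing is deliberate.
sentences = ["the quick brown fox", "the lazy dog", "  the   end "]

# Split every sentence on whitespace and count the resulting words.
counts = Counter(word for s in sentences for word in s.split())
print(counts.most_common(1))  # [('the', 3)]
```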

Note: Consecutive whitespace characters are treated as a single delimiter, and leading/trailing whitespace is ignored.
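
The difference this makes can be seen by comparing Python's no-argument str.split() with a split on a literal single space. This is a sketch of the analogous Python behaviour, not of this node's implementation.

```python
text = "  one   two \t\n three  "

# No-argument split: runs of whitespace collapse into one delimiter,
# and leading/trailing whitespace produces no empty entries.
collapsed = text.split()

# Splitting on a literal single space keeps an empty string for every
# extra delimiter, which is the outcome the note above rules out.
literal = "  one   two".split(" ")

print(collapsed)  # ['one', 'two', 'three']
print(literal)    # ['', '', 'one', '', '', 'two']
```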

Select

column

The string column to split. Examples of input and results:

  • 'hello world' → ['hello', 'world']
  • 'John Doe Smith' → ['John', 'Doe', 'Smith']
  • ' spaces trim ' → ['spaces', 'trim']
  • 'line1\nline2\tword' → ['line1', 'line2', 'word']
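
The examples above can be reproduced with a small Python sketch that models a table as a list of row dictionaries. The function name split_words, the row model, and the column names "text" and "words" are hypothetical, chosen only for illustration. How the node handles missing values is not covered here.

```python
def split_words(rows, column, as_column):
    """Return new rows where `as_column` holds the words of `column`."""
    return [{**row, as_column: row[column].split()} for row in rows]

rows = [
    {"text": "hello world"},
    {"text": "John Doe Smith"},
    {"text": " spaces trim "},
    {"text": "line1\nline2\tword"},
]

result = split_words(rows, column="text", as_column="words")
for row in result:
    print(repr(row["text"]), "->", row["words"])
```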

AsColumn

Name for the new column. If not provided, the system generates a unique name. If the name matches an existing column, that column is replaced. The name should follow valid column naming conventions.
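
The naming rules can be sketched as follows, with a table modeled as a dictionary of column lists. The helper names, the default base name "words", and the numeric-suffix scheme for generated names are all assumptions. The source only states that a generated name is unique, not how it is built.

```python
def resolve_output_name(existing, as_column=None, base="words"):
    """Pick the output column name (hypothetical naming scheme)."""
    if as_column:
        # An explicit name is used as given; if it matches an existing
        # column, that column is replaced by the caller.
        return as_column
    # Assumed fallback: append a counter until the name is unused.
    name, n = base, 1
    while name in existing:
        name = f"{base}_{n}"
        n += 1
    return name

def apply_split_words(table, column, as_column=None):
    """table: dict mapping column name -> list of cell values."""
    name = resolve_output_name(table.keys(), as_column)
    out = dict(table)  # shallow copy so the input is left unchanged
    out[name] = [value.split() for value in table[column]]
    return out

table = {"full_name": ["John Doe Smith"], "words": ["unrelated"]}

# Explicit name matching an existing column: that column is replaced.
replaced = apply_split_words(table, "full_name", as_column="full_name")

# No name given: a fresh, non-clashing name is generated.
generated = apply_split_words(table, "full_name")
print(list(generated))
```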