RandomStringSelect / New Layer
Generate a new column by randomly sampling from a predefined set of strings. Similar to numpy.random.choice() or sample() in R, but for text values. Useful for creating categorical variables, simulating discrete text outcomes, or generating test data with specific text patterns.
Values
[string, ...] List of strings (Values) to randomly sample from. Must contain at least one string. Common uses include:
- Category labels (e.g., [Low, Medium, High])
- Status values (e.g., [Pending, Active, Completed])
- Test data (e.g., [Alice, Bob, Charlie])
Each value has equal probability of being selected.
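As a rough analogue in plain Python (not this layer's actual implementation), equal-probability selection from a list can be sketched as follows. The helper name `sample_strings` and the `n_rows` parameter are hypothetical, introduced only for illustration.

```python
import random


def sample_strings(values, n_rows, rng=None):
    """Pick one string per row, each with equal probability (hypothetical helper)."""
    if not values:
        # Mirrors the documented rule that Values needs at least one string.
        raise ValueError("Values must contain at least one string")
    if rng is None:
        rng = random.Random()  # unseeded, so results differ between runs
    return [rng.choice(values) for _ in range(n_rows)]


column = sample_strings(["Low", "Medium", "High"], n_rows=5)
print(column)
```

With a single-element list, every row receives that one string, which is why one value is the minimum.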
Seed
oneof
- Use system-provided randomization. Each execution produces different selections. Suitable for security-sensitive applications, simulation scenarios, or cases where true randomness is required.
- Use seeded random generation for reproducible results. Like random.seed() in Python or set.seed() in R. Essential for testing and reproducible experiments. Note: Not suitable for security-sensitive applications where predictability could be a vulnerability.
Value
u64 Seed value for the random number generator. The same Value guarantees an identical sequence of selections from Values. Should not be used for security-critical operations.
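The reproducibility guarantee can be illustrated with Python's standard `random` module. This is only a sketch of the general idea: the helper `seeded_selection` is hypothetical, and the layer's own generator will not necessarily produce the same sequence as Python's for a given seed.

```python
import random

VALUES = ["Pending", "Active", "Completed"]


def seeded_selection(values, n_rows, seed):
    """Hypothetical helper: a generator seeded with `seed` yields a repeatable sequence."""
    rng = random.Random(seed)
    return [rng.choice(values) for _ in range(n_rows)]


first = seeded_selection(VALUES, 10, seed=42)
second = seeded_selection(VALUES, 10, seed=42)
print(first == second)  # True: same seed and same values give the same sequence
```

Because anyone who knows the seed can regenerate the whole sequence, seeded output is predictable by design, which is the reason for the security caveat.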
AsColumn
name Name for the new column. If not provided, the system generates a unique name. If AsColumn matches an existing column, the existing column is replaced. The name should follow valid column naming conventions.
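The naming behaviour can be sketched with a table modelled as a dict of column name to list. Everything here is an assumption made for illustration: the function `add_random_column`, the dict-based table, and especially the `random_N` pattern for generated names, since the names the system actually generates are not specified above.

```python
import random


def add_random_column(table, values, as_column=None, seed=None):
    """Return a copy of `table` (dict of column name -> list) with a sampled column added."""
    n_rows = len(next(iter(table.values()), []))
    rng = random.Random(seed)
    if as_column is None:
        # Assumed naming scheme: first unused "random_N". Not the tool's documented scheme.
        i = 0
        while f"random_{i}" in table:
            i += 1
        as_column = f"random_{i}"
    result = dict(table)
    # Assigning to an existing key replaces that column in place.
    result[as_column] = [rng.choice(values) for _ in range(n_rows)]
    return result


table = {"id": [1, 2, 3], "status": ["old", "old", "old"]}
replaced = add_random_column(table, ["Pending", "Active"], as_column="status", seed=1)
generated = add_random_column(table, ["Pending", "Active"], seed=1)
print(list(replaced))   # ['id', 'status']
print(list(generated))  # ['id', 'status', 'random_0']
```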