IsDuplicatedMask / Boolean Layer
Creates boolean mask columns identifying duplicate values in the specified columns. Similar to pandas duplicated(keep=False). Unlike base R's duplicated(), which flags only the second and later occurrences, this returns True for ALL occurrences of values that appear more than once.
Common applications:
- Detecting duplicate records
- Finding repeated transactions
- Identifying redundant entries
- Data cleaning
- Quality control checks
Example:
| Index | Value | Is Duplicated |
|---|---|---|
| 0 | apple | true |
| 1 | banana | true |
| 2 | apple | true |
| 3 | orange | false |
| 4 | banana | true |
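The table above can be reproduced with pandas, which the description names as the closest analogue. This is a minimal sketch of the equivalent pandas call, not the layer's own implementation; the column names are taken from the example table.

```python
import pandas as pd

# Same values as the example table.
df = pd.DataFrame({"Value": ["apple", "banana", "apple", "orange", "banana"]})

# keep=False flags every occurrence of a repeated value,
# not only the second and later ones.
df["Is Duplicated"] = df["Value"].duplicated(keep=False)

print(df)
```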
Mask
[, ...]
List of duplicate-checking operations to perform. Each mask creates a new boolean column. Common scenarios:
- Finding all duplicate customer records
- Identifying all repeated transactions
- Detecting all redundant measurements
- Marking all duplicate entries
At least one mask must be specified.
Select
column
The column to check for duplicates. Works with any data type. The mask is a boolean column where:
- True indicates a value that appears multiple times (ALL occurrences)
- False indicates values that appear exactly once
Useful for finding ALL rows involved in data redundancy.
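To illustrate what "ALL occurrences" means in practice, the sketch below contrasts the pandas call that matches this mask (keep=False) with the pandas default (keep="first"), then uses the mask to pull out every row involved in a duplication. It assumes pandas semantics as a stand-in for the layer; the variable names are illustrative only.

```python
import pandas as pd

values = pd.Series(["apple", "banana", "apple", "orange", "banana"])

# Matches the mask described here: every member of a duplicated group is True.
all_occurrences = values.duplicated(keep=False)

# pandas default (keep="first"): the first occurrence of each value stays False.
later_only = values.duplicated()

# Filtering with the mask returns every row that takes part in a duplication.
redundant_rows = values[all_occurrences]
print(redundant_rows)
```

With the default, the first "apple" and the first "banana" would be missed, which is why the all-occurrences form is the one to use when every redundant row has to be reviewed.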
AsColumn
name
Name for the new column. If not provided, the system generates a unique name. If AsColumn matches an existing column, the existing column is replaced. The name should follow valid column naming conventions.
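The interaction between Mask, Select and AsColumn can be sketched in pandas as follows. The helper apply_duplicated_masks, the dictionary layout of each mask, and the generated-name scheme are all hypothetical and chosen for illustration; the layer's real configuration syntax and naming rule are not specified here.

```python
import pandas as pd

def apply_duplicated_masks(df, masks):
    """Hypothetical helper: add one boolean duplicate mask per entry in masks."""
    if not masks:
        raise ValueError("At least one mask must be specified.")
    out = df.copy()
    for mask in masks:
        select = mask["Select"]
        name = mask.get("AsColumn")
        if name is None:
            # Assumed naming scheme: derive a name and make it unique.
            base = f"{select}_is_duplicated"
            name, suffix = base, 1
            while name in out.columns:
                name = f"{base}_{suffix}"
                suffix += 1
        # Assigning to an existing name replaces that column.
        out[name] = df[select].duplicated(keep=False)
    return out

df = pd.DataFrame({"customer": ["a", "b", "a"], "amount": [10, 20, 30]})
result = apply_duplicated_masks(
    df,
    [
        {"Select": "customer", "AsColumn": "customer_dup"},
        {"Select": "amount"},  # no AsColumn: a name is generated
    ],
)
print(result)
```

Each mask reads from the original input column, so one mask never changes the result of another, even when AsColumn overwrites an existing column.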