DropDuplicateRows / Manipulation Layer

Remove duplicate rows based on specified columns. Similar to pandas' drop_duplicates() or R's distinct().

Common applications:

  • Data deduplication (removing identical records)
  • Transaction uniqueness (eliminating double entries)
  • Event log cleaning (removing duplicate logs)
  • Contact list deduplication (unique entries)
  • Time series cleaning (removing repeated measurements)
  • Survey response validation (unique submissions)
  • Data quality assurance (ensuring unique records)
  • Dataset normalization (removing redundancy)

Example: From records [A,1], [B,2], [A,1], [C,3], keeping the first occurrence yields [A,1], [B,2], [C,3].
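For reference, here is the same example expressed with pandas' drop_duplicates(), which the node is compared to above. This is only an illustrative sketch; the column names key and value are made up for the example.

    import pandas as pd

    # The records from the example: [A,1], [B,2], [A,1], [C,3]
    df = pd.DataFrame({"key": ["A", "B", "A", "C"], "value": [1, 2, 1, 3]})

    # Keep the first occurrence of each fully identical row
    deduped = df.drop_duplicates(keep="first")
    print(deduped)
    #   key  value
    # 0   A      1
    # 1   B      2
    # 3   C      3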

Input: Table
Output: Table
Select: [column, ...]

Columns to check for duplicates. Examples:

  • Key identifiers for unique entities
  • Combination of fields defining uniqueness
  • Relevant attributes for deduplication
  • Matching criteria columns

If empty, compares all columns (see the sketch below).
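A minimal sketch of how the column selection changes the result, using pandas' subset argument as a stand-in for this parameter (the column names are hypothetical):

    import pandas as pd

    orders = pd.DataFrame({
        "customer_id": [101, 101, 102],
        "order_id":    [1, 1, 2],
        "channel":     ["web", "mobile", "web"],  # differs between the two 101/1 rows
    })

    # Duplicates are judged on the selected key columns only;
    # the differing "channel" values are ignored.
    by_keys = orders.drop_duplicates(subset=["customer_id", "order_id"])

    # With no columns selected, all columns are compared;
    # here every row is distinct, so nothing is dropped.
    by_all = orders.drop_duplicates()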
First (default)

Strategy for handling multiple occurrences of identical rows. Determines which duplicate record to retain (see the code sketch after the option descriptions).

First

Retain earliest occurrence of duplicate rows. Useful for:

  • Preserving original entries
  • First-touch attribution
  • Initial occurrence tracking

Last

Retain most recent occurrence of duplicate rows. Suitable for:

  • Latest state retention
  • Most recent update keeping
  • Final value preservation

None

Remove every row that appears more than once, keeping no occurrence. Used for:

  • Strict uniqueness enforcement
  • Complete duplicate elimination
  • Unique-only analysis

Any

Keep an arbitrary occurrence from each group of duplicate rows. Appropriate for:

  • Performance optimization
  • When order doesn't matter
  • Cases where any representative row from a duplicate group suffices
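The four strategies mirror the keep options of Polars' DataFrame.unique(). Whether this node is backed by Polars is an assumption, but the semantics sketched below match the descriptions above (data reuses the earlier example):

    import polars as pl

    df = pl.DataFrame({"key": ["A", "B", "A", "C"], "value": [1, 2, 1, 3]})

    first = df.unique(keep="first", maintain_order=True)  # [A,1], [B,2], [C,3]
    last  = df.unique(keep="last",  maintain_order=True)  # [B,2], [A,1], [C,3]
    none  = df.unique(keep="none",  maintain_order=True)  # [B,2], [C,3]  (both [A,1] rows removed)
    any_  = df.unique(keep="any")                         # one arbitrary row per duplicate group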
false (default)

Preserve original row order after deduplication:

  • false (default): Faster processing, may reorder rows
  • true: Maintains original sequence, slower performance

Note: Not supported for DataFrames containing List type columns.
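A short sketch of the trade-off, again written against a Polars-style unique() (an assumption; column names are illustrative):

    import polars as pl

    df = pl.DataFrame({"city": ["C", "A", "C", "B"], "sales": [3, 1, 3, 2]})

    # Default: faster, but surviving rows may come back in any order.
    fast = df.unique(keep="first")

    # maintain_order=True keeps the original relative order (C, A, B here)
    # at the cost of extra processing time.
    ordered = df.unique(keep="first", maintain_order=True)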