Join / Manipulation Layer
Combine two DataFrames based on matching values in specified columns. Similar to SQL JOIN operations, pandas merge(), or R's join functions.
Example joining customer and order data:
Left Table (Customers):
ID | Name |
---|---|
1 | Alice |
2 | Bob |
3 | Carol |
Right Table (Orders):
ID | Amount |
---|---|
1 | 100 |
2 | 200 |
4 | 300 |
Inner Join Result:
ID | Name | Amount |
---|---|---|
1 | Alice | 100 |
2 | Bob | 200 |
Common applications:
- Combining related datasets
- Enriching data with attributes
- Relationship analysis
- Data consolidation
- Cross-reference validation
- Master data integration
- Historical data linkage
- Reference data lookup
SelectLeft
[column, ...]Join key columns from left table. Examples:
- Primary keys
- Foreign keys
- Natural keys
- Composite keys (multiple columns)
Strategy
enumJoin strategy. Default is Left
.
Keep all left table rows with matching data from right table where available.
Set notation:
Example:
Left Table:
id | name |
---|---|
1 | Alice |
2 | Bob |
3 | Carol |
Right Table:
id | amount |
---|---|
1 | 100 |
3 | 300 |
Result (LEFT JOIN):
id | name | amount |
---|---|---|
1 | Alice | 100 |
2 | Bob | null |
3 | Carol | 300 |
Key characteristics:
- Preserves all left table rows
- Includes matching right table data
- Fills non-matches with null
- Maintains left table order
Common applications:
- Enriching base dataset
- Optional relationship handling
- Master data lookup
- Preserving core records
Keep only rows that have matching keys in both tables.
Set notation:
Example:
Left Table:
id | name |
---|---|
1 | Alice |
2 | Bob |
3 | Carol |
Right Table:
id | amount |
---|---|
1 | 100 |
3 | 300 |
4 | 400 |
Result (INNER JOIN):
id | name | amount |
---|---|---|
1 | Alice | 100 |
3 | Carol | 300 |
Key characteristics:
- Only includes matching rows
- No null values in join columns
- Maintains left table order
- Potentially fewer rows than either input
Common applications:
- Strict relationship enforcement
- Guaranteed complete records
- Data validation
- Matched pair analysis
Keep all rows from both tables, matching where possible.
Set notation:
Example:
Left Table:
id | name |
---|---|
1 | Alice |
2 | Bob |
3 | Carol |
Right Table:
id | amount |
---|---|
1 | 100 |
3 | 300 |
4 | 400 |
Result (FULL OUTER JOIN):
id | name | amount |
---|---|---|
1 | Alice | 100 |
2 | Bob | null |
3 | Carol | 300 |
4 | null | 400 |
Key characteristics:
- Preserves all rows from both tables
- Matches where possible
- Fills non-matches with null
- Shows complete relationship view
Common applications:
- Data completeness analysis
- Relationship gap detection
- Full data integration
- Migration validation
Cartesian product - combine every row with every row.
Set notation:
Example:
Left Table:
id | name |
---|---|
1 | Alice |
2 | Bob |
Right Table:
type | value |
---|---|
A | 100 |
B | 200 |
Result (CROSS JOIN):
id | name | type | value |
---|---|---|---|
1 | Alice | A | 100 |
1 | Alice | B | 200 |
2 | Bob | A | 100 |
2 | Bob | B | 200 |
Key characteristics:
- No key matching required
- Result size = left rows × right rows
- No null values
- All possible combinations
Common applications:
- Generating all possibilities
- Feature combination creation
- Product variations
- Multi-factor analysis
Keep left table rows that have matches in right table.
Set notation:
Example:
Left Table:
id | name |
---|---|
1 | Alice |
2 | Bob |
3 | Carol |
Right Table:
id | amount |
---|---|
1 | 100 |
3 | 300 |
4 | 400 |
Result (SEMI JOIN):
id | name |
---|---|
1 | Alice |
3 | Carol |
Key characteristics:
- Only keeps matching left rows
- Maintains left table structure
- No duplicates
- No right table columns
Common applications:
- Filtering by existence
- Finding matched records
- Data subset selection
- Relationship verification
Keep left table rows that have no matches in right table.
Set notation:
Example:
Left Table:
id | name |
---|---|
1 | Alice |
2 | Bob |
3 | Carol |
Right Table:
id | amount |
---|---|
1 | 100 |
3 | 300 |
4 | 400 |
Result (ANTI JOIN):
id | name |
---|---|
2 | Bob |
Key characteristics:
- Only keeps non-matching left rows
- Maintains left table structure
- No right table columns
- Inverse of semi join
Common applications:
- Finding missing relationships
- Data gap analysis
- Exception identification
- Quality control checks
Validation
enumCardinality validation ensuring relationship integrity between tables
No uniqueness requirements:
- Multiple matches allowed
- No validation performed
- Most flexible option
Unique keys in both tables:
- One-to-one relationships
- No duplicates allowed
- Strict 1:1 mapping
Unique keys in right table only:
- Many-to-one relationships
- Right side as lookup table
- Common for reference data
Unique keys in left table only:
- One-to-many relationships
- Left side as master record
- Common for parent-child data
SelectRight
[column, ...]Join key columns from right table. Must correspond to left keys:
- Matching data types
- Same number of columns
- Corresponding order
- Related content