Join / Manipulation Layer

Combine two DataFrames based on matching values in specified columns. Similar to SQL JOIN operations, pandas merge(), or R's join functions.

Example joining customer and order data:

Left Table (Customers):

IDName
1Alice
2Bob
3Carol

Right Table (Orders):

IDAmount
1100
2200
4300

Inner Join Result:

IDNameAmount
1Alice100
2Bob200

Common applications:

  • Combining related datasets
  • Enriching data with attributes
  • Relationship analysis
  • Data consolidation
  • Cross-reference validation
  • Master data integration
  • Historical data linkage
  • Reference data lookup
LeftTable
0
RightTable
1
0
Table

SelectLeft

[column, ...]

Join key columns from left table. Examples:

  • Primary keys
  • Foreign keys
  • Natural keys
  • Composite keys (multiple columns)
Inner

Join strategy. Default is Left.

Left ~

Keep all left table rows with matching data from right table where available.

Set notation:

Example:

Left Table:

idname
1Alice
2Bob
3Carol

Right Table:

idamount
1100
3300

Result (LEFT JOIN):

idnameamount
1Alice100
2Bobnull
3Carol300

Key characteristics:

  • Preserves all left table rows
  • Includes matching right table data
  • Fills non-matches with null
  • Maintains left table order

Common applications:

  • Enriching base dataset
  • Optional relationship handling
  • Master data lookup
  • Preserving core records
Inner ~

Keep only rows that have matching keys in both tables.

Set notation:

Example:

Left Table:

idname
1Alice
2Bob
3Carol

Right Table:

idamount
1100
3300
4400

Result (INNER JOIN):

idnameamount
1Alice100
3Carol300

Key characteristics:

  • Only includes matching rows
  • No null values in join columns
  • Maintains left table order
  • Potentially fewer rows than either input

Common applications:

  • Strict relationship enforcement
  • Guaranteed complete records
  • Data validation
  • Matched pair analysis
Outer ~

Keep all rows from both tables, matching where possible.

Set notation:

Example:

Left Table:

idname
1Alice
2Bob
3Carol

Right Table:

idamount
1100
3300
4400

Result (FULL OUTER JOIN):

idnameamount
1Alice100
2Bobnull
3Carol300
4null400

Key characteristics:

  • Preserves all rows from both tables
  • Matches where possible
  • Fills non-matches with null
  • Shows complete relationship view

Common applications:

  • Data completeness analysis
  • Relationship gap detection
  • Full data integration
  • Migration validation
Cross ~

Cartesian product - combine every row with every row.

Set notation:

Example:

Left Table:

idname
1Alice
2Bob

Right Table:

typevalue
A100
B200

Result (CROSS JOIN):

idnametypevalue
1AliceA100
1AliceB200
2BobA100
2BobB200

Key characteristics:

  • No key matching required
  • Result size = left rows × right rows
  • No null values
  • All possible combinations

Common applications:

  • Generating all possibilities
  • Feature combination creation
  • Product variations
  • Multi-factor analysis
Semi ~

Keep left table rows that have matches in right table.

Set notation:

Example:

Left Table:

idname
1Alice
2Bob
3Carol

Right Table:

idamount
1100
3300
4400

Result (SEMI JOIN):

idname
1Alice
3Carol

Key characteristics:

  • Only keeps matching left rows
  • Maintains left table structure
  • No duplicates
  • No right table columns

Common applications:

  • Filtering by existence
  • Finding matched records
  • Data subset selection
  • Relationship verification
Anti ~

Keep left table rows that have no matches in right table.

Set notation:

Example:

Left Table:

idname
1Alice
2Bob
3Carol

Right Table:

idamount
1100
3300
4400

Result (ANTI JOIN):

idname
2Bob

Key characteristics:

  • Only keeps non-matching left rows
  • Maintains left table structure
  • No right table columns
  • Inverse of semi join

Common applications:

  • Finding missing relationships
  • Data gap analysis
  • Exception identification
  • Quality control checks
ManyToMany

Cardinality validation ensuring relationship integrity between tables

ManyToMany ~

No uniqueness requirements:

  • Multiple matches allowed
  • No validation performed
  • Most flexible option
OneToOne ~

Unique keys in both tables:

  • One-to-one relationships
  • No duplicates allowed
  • Strict 1:1 mapping
ManyToOne ~

Unique keys in right table only:

  • Many-to-one relationships
  • Right side as lookup table
  • Common for reference data
OneToMany ~

Unique keys in left table only:

  • One-to-many relationships
  • Left side as master record
  • Common for parent-child data

SelectRight

[column, ...]

Join key columns from right table. Must correspond to left keys:

  • Matching data types
  • Same number of columns
  • Corresponding order
  • Related content