Sample / Manipulation Layer

Selects a random subset of rows from the DataFrame, similar to pandas' DataFrame.sample() or dplyr's sample_n()/sample_frac() in R.

Sampling methods:

  • Fixed size (n rows)
  • Proportional (fraction of rows)
  • With/without replacement
  • Ordered/shuffled results

Common applications:

  • Dataset reduction for testing
  • Training data selection
  • Validation set creation
  • Statistical sampling
  • Performance testing
  • Population studies
  • Exploratory analysis
  • Monte Carlo simulations

Example: From 1000 rows, sample 100 (n=100) or 10% (fraction=0.1) for analysis.
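
The example above can be sketched in plain Python. This is a minimal illustration built on the standard library's random module, not this layer's actual implementation. The helper name sample_rows, the list standing in for a table, and the choice to round the fraction to the nearest row count are all assumptions made for the sketch.

```python
import random

def sample_rows(rows, n=None, fraction=None, seed=0):
    """Return a random subset of rows sized by n or by fraction (hypothetical helper)."""
    if (n is None) == (fraction is None):
        raise ValueError("pass exactly one of n or fraction")
    if fraction is not None:
        # Assumption: the fraction is converted to a row count by rounding.
        n = round(len(rows) * fraction)
    rng = random.Random(seed)
    # random.sample draws without replacement.
    return rng.sample(rows, n)

rows = list(range(1000))                 # stand-in for a 1000-row table
by_count = sample_rows(rows, n=100)
by_fraction = sample_rows(rows, fraction=0.1)
print(len(by_count), len(by_fraction))   # 100 100
```

With 1000 input rows, n=100 and fraction=0.1 request the same sample size, so the two calls differ only in how the size is specified.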

Fraction
0.1

Proportion of rows to sample (0.0 to 1.0). Common uses:

  • 0.1 (10%) for quick analysis
  • 0.5 (50%) for balanced splitting
  • 0.75 (75%) for training sets
  • 0.01 (1%) for large dataset exploration
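
As a rough illustration of the 0.75 training-set case, the sketch below samples a fraction of the rows and keeps the remainder as a held-out set. It uses only Python's standard library; split_by_fraction is a hypothetical helper, and rounding to the nearest row count is an assumption rather than documented behavior of this layer.

```python
import random

def split_by_fraction(rows, fraction, seed=0):
    """Split rows into a sampled part and the remainder (hypothetical helper)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)
    k = round(len(rows) * fraction)      # assumption: round to the nearest row count
    picked = set(rng.sample(range(len(rows)), k))
    # Walk the rows in their original order so both parts keep that order.
    sampled = [row for i, row in enumerate(rows) if i in picked]
    rest = [row for i, row in enumerate(rows) if i not in picked]
    return sampled, rest

rows = [f"row-{i}" for i in range(200)]
train, holdout = split_by_fraction(rows, 0.75)
print(len(train), len(holdout))          # 150 50
```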

N

u32
0

Fixed number of rows to sample. Typical scenarios:

  • Small samples (10-100) for testing
  • Medium samples (100-1000) for validation
  • Large samples (1000+) for modeling
  • Balanced class representation

Control sampling method:

  • false (default): Each row is selected at most once
  • true: The same row may be selected multiple times

Critical for: bootstrap sampling, simulation studies, probability analysis
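
Bootstrap sampling is the typical reason to sample with replacement. The sketch below is a minimal standard-library illustration of the idea, not this layer's implementation; bootstrap_means and its parameters are hypothetical names.

```python
import random
import statistics

def bootstrap_means(values, n_resamples=1000, seed=0):
    """Estimate the spread of the mean by resampling with replacement (hypothetical helper)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # choices() draws with replacement, so a value can repeat within one resample.
        resample = rng.choices(values, k=len(values))
        means.append(statistics.fmean(resample))
    return means

values = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
means = bootstrap_means(values)
print(f"bootstrap mean range: {min(means):.2f} to {max(means):.2f}")
```

Each resample has the same size as the input, which is only possible in general because rows may repeat; without replacement, a full-size sample would just be a reordering of the original rows.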

Control output order:

  • false (default): Maintain the relative order of the selected rows
  • true: Randomize the order of the sampled rows

Important for: random batching, unbiased selection, order-sensitive analysis
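
The difference between the two output orders can be sketched with row indices. This is an illustration under stated assumptions, using Python's standard library; sample_indices and its shuffle flag are hypothetical names that mirror the option described above.

```python
import random

def sample_indices(n_rows, k, shuffle=False, seed=0):
    """Pick k distinct row indices, optionally leaving them in random order (hypothetical helper)."""
    rng = random.Random(seed)
    picked = rng.sample(range(n_rows), k)   # distinct indices, in random order
    if not shuffle:
        picked.sort()                        # restore the rows' original relative order
    return picked

ordered = sample_indices(20, 5)
shuffled = sample_indices(20, 5, shuffle=True)
print(ordered)
print(shuffled)
```

Both calls select the same rows because they share a seed; only the order of the result differs.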

Seed

u32
0

Random seed for reproducible sampling. Essential for:

  • Reproducible research
  • Consistent test cases
  • Debuggable workflows
  • Comparative studies

The same seed produces identical samples.
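
Seeded reproducibility can be illustrated with a short standard-library sketch. The helper seeded_sample is hypothetical and does not reflect how this layer generates random numbers internally.

```python
import random

def seeded_sample(rows, k, seed):
    """Draw k rows without replacement using a dedicated, seeded generator (hypothetical helper)."""
    return random.Random(seed).sample(rows, k)

rows = list(range(1000))
first = seeded_sample(rows, 10, seed=42)
again = seeded_sample(rows, 10, seed=42)
other = seeded_sample(rows, 10, seed=7)

print(first == again)   # True: same seed, same input, same sample
print(first == other)   # a different seed generally gives a different sample
```

Using a dedicated generator per call, rather than the module-level random state, keeps the result independent of any other random draws made elsewhere in a workflow.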