Sample / Manipulation Layer
Select a random subset of rows from the DataFrame, similar to pandas' DataFrame.sample() or dplyr's sample_n()/sample_frac() in R.
Sampling methods:
- Fixed size (n rows)
- Proportional (fraction of rows)
- With/without replacement
- Ordered/shuffled results
Common applications:
- Dataset reduction for testing
- Training data selection
- Validation set creation
- Statistical sampling
- Performance testing
- Population studies
- Exploratory analysis
- Monte Carlo simulations
Example: From 1000 rows, sample 100 (n=100) or 10% (fraction=0.1) for analysis.
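The example above can be sketched in plain Python. This is only an illustration using the standard library, not the node's actual implementation; the helper names `sample_n` and `sample_fraction` are hypothetical, and rounding the fractional row count to the nearest integer is an assumption.

```python
import random

# Stand-in for a 1000-row table; each integer represents one row.
rows = list(range(1000))

def sample_n(rows, n, seed=None):
    """Pick exactly n distinct rows (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(rows, n)

def sample_fraction(rows, fraction, seed=None):
    """Pick a proportion of the rows.

    Assumption: the row count is rounded to the nearest integer.
    """
    n = round(len(rows) * fraction)
    return sample_n(rows, n, seed)

by_count = sample_n(rows, 100, seed=42)
by_fraction = sample_fraction(rows, 0.1, seed=42)
print(len(by_count), len(by_fraction))  # 100 100
```

For a 1000-row input, n=100 and fraction=0.1 select the same number of rows; the two modes differ only in how the count is specified.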
Input: Table (0)
Output: Table (0)
SampleType (oneof): either Fraction or N
Fraction (f64): Proportion of rows to sample (0.0 to 1.0). Common uses:
- 0.1 (10%) for quick analysis
- 0.5 (50%) for balanced splitting
- 0.75 (75%) for training sets
- 0.01 (1%) for large dataset exploration
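As a rough sketch of the 75% training-set use, the snippet below samples a fraction of the rows and also keeps the rows that were not selected. The function `split_by_fraction` is hypothetical; returning the unselected rows as a holdout set is an assumption added for illustration and is not something this page says the node does. Truncating the row count with `int()` is likewise an assumption.

```python
import random

def split_by_fraction(rows, fraction, seed=None):
    """Return (sampled, rest), preserving the original row order in both."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)
    k = int(len(rows) * fraction)  # assumption: truncate toward zero
    chosen = set(rng.sample(range(len(rows)), k))
    sampled = [row for i, row in enumerate(rows) if i in chosen]
    rest = [row for i, row in enumerate(rows) if i not in chosen]
    return sampled, rest

rows = [f"row-{i}" for i in range(200)]
train, holdout = split_by_fraction(rows, 0.75, seed=7)
print(len(train), len(holdout))  # 150 50
```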
N (u32): Fixed number of rows to sample. Typical scenarios:
- Small samples (10-100) for testing
- Medium samples (100-1000) for validation
- Large samples (1000+) for modeling
- Balanced class representation
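One way to get balanced class representation from a fixed-size sample is to draw the same N from each class separately. The sketch below assumes rows are dictionaries with a label field; the helper `sample_n_per_class` and the grouping step are hypothetical and are not described as built-in behavior of this node.

```python
import random
from collections import defaultdict

def sample_n_per_class(rows, label_key, n, seed=None):
    """Draw n rows without replacement from every class."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    result = []
    for label in sorted(groups):  # fixed iteration order keeps runs reproducible
        members = groups[label]
        if len(members) < n:
            raise ValueError(f"class {label!r} has only {len(members)} rows")
        result.extend(rng.sample(members, n))
    return result

# Imbalanced toy data: 50 "spam" rows and 450 "ham" rows.
rows = [{"id": i, "label": "spam" if i % 10 == 0 else "ham"} for i in range(500)]
balanced = sample_n_per_class(rows, "label", 20, seed=1)
print(len(balanced))  # 40
```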
WithReplacement (bool): Control sampling method:
- false (default): Each row is selected at most once
- true: The same row may be selected multiple times
Critical for: bootstrap sampling, simulation studies, probability analysis
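The difference between the two settings, and why replacement matters for bootstrapping, can be illustrated with the standard library. This is a minimal sketch under assumed names (`sample_rows`, `with_replacement`); it mirrors the documented behavior but is not the node's code.

```python
import random
import statistics

def sample_rows(rows, n, with_replacement=False, seed=None):
    rng = random.Random(seed)
    if with_replacement:
        # Rows may repeat, so n may exceed len(rows).
        return rng.choices(rows, k=n)
    # Each row at most once; raises ValueError if n > len(rows).
    return rng.sample(rows, n)

values = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 10.0, 12.0]

# Bootstrap: resample the data with replacement many times and
# look at the spread of the resampled means.
boot_means = sorted(
    statistics.mean(sample_rows(values, len(values), with_replacement=True, seed=i))
    for i in range(1000)
)
low, high = boot_means[25], boot_means[974]  # approximate 95% percentile interval
print(f"mean={statistics.mean(values):.3f}, interval=({low:.3f}, {high:.3f})")
```

Without replacement, a sample as large as the input is just a permutation of it, so every resampled mean would be identical; replacement is what gives the bootstrap its variation.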
Shuffle (bool): Control output order:
- false (default): Maintain the relative order of selected rows
- true: Randomize the order of sampled rows
Important for: random batching, unbiased selection, order-sensitive analysis
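A simple way to picture the two output orders: pick row positions at random, then either sort them back into their original order or leave them in the random order they were drawn. The sketch below assumes this approach; `sample_rows` and its `shuffle` argument are hypothetical names.

```python
import random

def sample_rows(rows, n, shuffle=False, seed=None):
    rng = random.Random(seed)
    # Positions come back in random draw order.
    positions = rng.sample(range(len(rows)), n)
    if not shuffle:
        # Restore the relative order the rows had in the input.
        positions.sort()
    return [rows[i] for i in positions]

rows = list(range(100))
ordered = sample_rows(rows, 10, shuffle=False, seed=3)
shuffled = sample_rows(rows, 10, shuffle=True, seed=3)

print(ordered)   # ascending, same relative order as the input
print(shuffled)  # the same rows in random order
```

With the same seed, both calls select the same rows; only the order of the result differs.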
Seed (u32): Random seed for reproducible sampling. Essential for:
- Reproducible research
- Consistent test cases
- Debuggable workflows
- Comparative studies
The same seed produces identical samples.
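The reproducibility guarantee can be demonstrated with a seeded generator. This is an illustrative sketch, not the node's implementation: `sample_rows` is a hypothetical helper, and the range check simply mirrors the u32 type given for Seed.

```python
import random

def sample_rows(rows, n, seed):
    # Mirror the u32 type of the Seed field (assumption: out-of-range seeds are rejected).
    if not 0 <= seed < 2**32:
        raise ValueError("seed must fit in an unsigned 32-bit integer")
    rng = random.Random(seed)  # private generator; global random state is untouched
    return rng.sample(rows, n)

rows = list(range(1000))
first = sample_rows(rows, 5, seed=2024)
second = sample_rows(rows, 5, seed=2024)
other = sample_rows(rows, 5, seed=2025)

print(first == second)  # True: same seed, same input, same sample
print(first, other)     # a different seed generally gives a different sample
```

Reproducibility also depends on the input being unchanged: the same seed applied to a table with different rows or row order will generally select different data.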