Skip to content
/ samp Public

A simple CLI that randomly samples lines from standard input

License

Notifications You must be signed in to change notification settings

jerluc/samp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

samp (sampler)

samp is a simple command-line program that randomly samples lines from standard input. This can be used to trim down on large newline-delimited files for testing or other purposes.

Installation

To get started, install samp from source:

# Directly from Github
cargo install --git https:/jerluc/samp.git

# Or from local source
git clone https:/jerluc/samp.git && cd samp/ && cargo install --path .

Usage

To use samp:

Usage: samp [-r <ratio>] [-s <seed>]

Sample stdin

Options:
  -r, --ratio       sample ratio
  -s, --seed        seed string
  --help            display usage information

For example, here's how you can randomly sample ~10% of your computer's dictionary file:

cat /usr/share/dict/words | samp -r 0.1

And here's how you can randomly sample ~5% of "War and Peace" using a reproducible text seed:

# Save sample to file
curl -s https://www.gutenberg.org/cache/epub/2600/pg2600.txt | samp -r 0.05 -s tolstoy > wp.txt

# Save second sample to another file
curl -s https://www.gutenberg.org/cache/epub/2600/pg2600.txt | samp -r 0.05 -s tolstoy > wp2.txt

diff wp.txt wp2.txt
# No differences!

Motivations

I basically had two motivations in creating this software:

  1. I often find myself working with very large, newline-delimited data; I use samp to randomly down-sample this data for running various tests
  2. I wanted an excuse to practice some more Rust :)

Contributing

When contributing to this repository, please follow the steps below:

  1. Fork the repository
  2. Submit your patch in one commit, or a series of well-defined commits
  3. Submit your pull request and make sure you reference the issue you are addressing

License

See LICENSE

About

A simple CLI that randomly samples lines from standard input

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Languages