rlite - Text sorting, analysis, cross de-duplication
-
Title
: rlite
Author
: CynosurePrime
URL
: https://github.com/Cynosureprime/rlite
Description
: rlite focuses on providing a well-rounded set of tools to sort, de-duplicate, cross de-duplicate, line-count, frequency-count and index lists while requiring minimal dependencies. rlite was developed to push the envelope of multi-threaded computing, and it demonstrates this with its ability to push system resources, including disk read/write operations, processing cores and memory usage.
Sort and de-duplicate input.txt and write the output to output.txt. Where possible, use -o so that rlite can handle output buffering internally and write faster, especially to flash memory.
rlite input.txt -o output.txt
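For comparison, the same sort-and-de-duplicate result can be approximated with standard POSIX tools. This is a minimal sketch, not rlite's implementation (rlite does the work multi-threaded). The helper name `sort_dedup` is hypothetical, and `LC_ALL=C` assumes bytewise ordering, which may differ from rlite's own sort order.

```shell
# sort_dedup IN OUT (hypothetical helper, not part of rlite):
# sort IN bytewise and write each distinct line once to OUT.
# Roughly comparable to `rlite IN -o OUT`, but single-threaded.
sort_dedup() {
  LC_ALL=C sort -u -o "$2" "$1"
}
```

Usage: `sort_dedup input.txt output.txt`. sort's -o names the output file directly and, unlike a shell redirect, is safe even when the output file is the input file.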
Sort and de-duplicate stdin and write the output to stdout
rlite stdin
Remove from input.txt every line that also appears in bigfile.txt or somefile.txt, and write the result to stdout
rlite input.txt bigfile.txt somefile.txt
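A rough coreutils equivalent of this cross de-duplication is sketched below, assuming a line is removed if it appears in any of the other files. It merges the removal lists into one sorted file and uses `comm -23` to keep only the lines unique to the sorted input. The helper name `remove_common` is hypothetical, and this is not how rlite implements the operation.

```shell
# remove_common IN REMOVE... (hypothetical helper, not part of rlite):
# print the distinct lines of IN that appear in none of the REMOVE files,
# sorted bytewise. Requires at least one REMOVE file.
remove_common() {
  src=$1; shift
  tmp=$(mktemp) || return 1
  # Merge every removal list into one sorted, de-duplicated file.
  LC_ALL=C sort -u "$@" > "$tmp" || { rm -f "$tmp"; return 1; }
  # comm -23 suppresses lines only in file 2 and lines in both,
  # leaving lines found only in the first (sorted) stream.
  LC_ALL=C sort -u "$src" | LC_ALL=C comm -23 - "$tmp"
  rc=$?
  rm -f "$tmp"
  return $rc
}
```

Usage: `remove_common input.txt bigfile.txt somefile.txt`. Both inputs to comm must be sorted in the same collation, which is why every sort and comm call runs under `LC_ALL=C`.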
Remove from input.txt every line that also appears in verybigfile.txt or somefile.txt, use the -p switch to assume that input.txt is already sorted, and write the result to stdout
rlite input.txt verybigfile.txt somefile.txt -p
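Why a presorted input helps can be illustrated with standard tools: a sorted file only needs adjacent duplicates dropped (`uniq`) before a streaming `comm` merge, instead of a full re-sort. This sketch assumes the input is already in bytewise (`LC_ALL=C`) order; the helper `remove_common_presorted` is hypothetical, and it shows the general idea rather than rlite's -p internals.

```shell
# remove_common_presorted IN REMOVE... (hypothetical helper, not part of rlite):
# like a sort-based cross de-duplication, but IN is assumed to be sorted
# bytewise already, so it is only de-duplicated, never re-sorted.
remove_common_presorted() {
  src=$1; shift
  tmp=$(mktemp) || return 1
  # The removal lists still need sorting so comm can merge against them.
  LC_ALL=C sort -u "$@" > "$tmp" || { rm -f "$tmp"; return 1; }
  # uniq removes adjacent duplicates, which is sufficient on sorted input.
  LC_ALL=C uniq "$src" | LC_ALL=C comm -23 - "$tmp"
  rc=$?
  rm -f "$tmp"
  return $rc
}
```

If the input is not actually sorted, comm may warn and the result will be wrong, which is the trade-off any "assume sorted" option makes.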
Keep only the lines of input.txt that are common with streets.txt & addresses.txt (the -c switch), and append the output to output.txt
rlite input.txt streets.txt addresses.txt -c >> output.txt
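The inverse operation can be approximated with `comm -12`, which keeps only lines present in both sorted streams. This sketch assumes a line of the input is kept if it appears in at least one of the other files; rlite's exact -c semantics may differ, and the helper name `keep_common` is hypothetical.

```shell
# keep_common IN OTHER... (hypothetical helper, not part of rlite):
# print the distinct lines of IN that also appear in at least one OTHER
# file, sorted bytewise. Requires at least one OTHER file.
keep_common() {
  src=$1; shift
  tmp=$(mktemp) || return 1
  LC_ALL=C sort -u "$@" > "$tmp" || { rm -f "$tmp"; return 1; }
  # comm -12 suppresses lines unique to either side, leaving shared lines.
  LC_ALL=C sort -u "$src" | LC_ALL=C comm -12 - "$tmp"
  rc=$?
  rm -f "$tmp"
  return $rc
}
```

Usage: `keep_common input.txt streets.txt addresses.txt >> output.txt`, where `>>` appends to output.txt rather than overwriting it.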
Analysis tools
rlite input.txt -L
Reports the line count and the longest line
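A single awk pass can report both values, as a rough stand-in for this analysis. The helper name `count_and_longest` and its output layout (count on the first line, longest line on the second) are assumptions for illustration; rlite's -L output format may differ.

```shell
# count_and_longest FILE (hypothetical helper, not part of rlite):
# print the number of lines, then the longest line (first one wins on ties).
# LC_ALL=C makes awk's length() count bytes rather than characters.
count_and_longest() {
  LC_ALL=C awk '
    length($0) > max { max = length($0); longest = $0 }
    END { print NR; print longest }
  ' "$1"
}
```

Note that awk's NR also counts a final line that lacks a trailing newline, whereas `wc -l` counts newline characters and would not.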
rlite input.txt -q
Counts how often each line occurs in input.txt and outputs each line with its occurrence count in TSV format, ordered from most to least frequent
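The classic `sort | uniq -c | sort -rn` pipeline produces a comparable frequency report. This sketch reformats uniq's space-padded counts into TAB-separated `count<TAB>line` rows; that column order is an assumption, and rlite's exact -q layout may differ. The helper name `freq_tsv` is hypothetical.

```shell
# freq_tsv FILE (hypothetical helper, not part of rlite):
# print each distinct line with its occurrence count as "count<TAB>line",
# most frequent first.
freq_tsv() {
  LC_ALL=C sort "$1" | uniq -c | LC_ALL=C sort -rn |
    LC_ALL=C awk '{
      n = $1
      # Strip uniq -c padding and count, keeping any leading spaces of the line.
      sub(/^ *[0-9]+ /, "")
      print n "\t" $0
    }'
}
```

Unlike rlite, this sorts the whole file before counting; a hash-based counter avoids that sort, at the cost of holding every distinct line in memory.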