close_range - Linux
Overview
close_range is a command-line tool that measures how similar two files or pieces of text are. It supports three comparison methods (Jaccard similarity, Levenshtein edit distance, and cosine similarity) and reports a score that can be checked against a threshold. Typical uses include detecting plagiarism, comparing code versions, and checking whether two copies of a file have diverged.
Syntax
close_range [options] file1 file2
Options/Flags
- -m, --method METHOD: Algorithm used for the comparison. Options:
  - jaccard: Calculates the Jaccard similarity coefficient
  - levenshtein: Measures the edit distance between strings
  - cosine: Computes cosine similarity
- -p, --precision PRECISION: Number of decimal places in the reported similarity. Default: 4
- -t, --threshold THRESHOLD: Minimum similarity score for two files to be considered similar. Range: 0-1. Default: 0.85
- -h, --help: Displays help information
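To make the three methods and the -p and -t options concrete, here is a minimal Python sketch of how such scores are commonly computed. It is not the tool's actual implementation. The word tokenization, the conversion of edit distance into a 0-1 score, the inclusive threshold check, and every function name below are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split text into lowercase word tokens (tokenization rule is an assumption)."""
    return re.findall(r"\w+", text.lower())


def jaccard(a, b):
    """Jaccard coefficient: size of the token-set intersection over the union."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0  # two empty inputs are treated as identical
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine of the angle between the two token-count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def levenshtein(a, b):
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        cur = [i]
        for j, ch_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                    # deletion
                           cur[j - 1] + 1,                 # insertion
                           prev[j - 1] + (ch_a != ch_b)))  # substitution
        prev = cur
    return prev[-1]


def levenshtein_similarity(a, b):
    """Map edit distance onto 0-1 (this normalization is an assumption)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


METHODS = {
    "jaccard": jaccard,
    "cosine": cosine,
    "levenshtein": levenshtein_similarity,
}


def compare(a, b, method="jaccard", precision=4, threshold=0.85):
    """Return (rounded score, whether the score meets the threshold)."""
    score = METHODS[method](a, b)
    return round(score, precision), score >= threshold
```

For example, `compare("a b c", "b c d")` returns `(0.5, False)`: the two token sets share two of four distinct tokens, and 0.5 falls below the default threshold of 0.85.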
Examples
Simple Usage:
close_range file1.txt file2.txt
Specifying Comparison Method:
close_range -m jaccard file3.txt file4.txt
Setting Similarity Threshold:
close_range -t 0.9 file5.html file6.html
Customizing Precision:
close_range -p 6 file7.py file8.py
Common Issues
- No output: Ensure both input files exist and contain valid content.
- Unexpected similarity values: Adjust the comparison method or threshold to suit the data being compared.
- Slow performance: Large files take longer to process. Edit distance in particular typically takes time proportional to the product of the two input lengths. Consider comparing smaller inputs or switching to a different method with -m.
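The "No output" case above can be ruled out before running a comparison. The following is a small sketch of such a check in Python. The helper name `preflight` is hypothetical and is not part of close_range.

```python
from pathlib import Path


def preflight(*paths):
    """Return a list of problems found; an empty list means the inputs look usable."""
    problems = []
    for path in map(Path, paths):
        if not path.is_file():
            problems.append(f"{path}: not a regular file")
        elif path.stat().st_size == 0:
            problems.append(f"{path}: file is empty")
    return problems
```

Calling `preflight("file1.txt", "file2.txt")` before the comparison gives a readable reason for each input that is missing or empty.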
Integration
Compare File Diff:
diff file1.txt file2.txt | close_range -m levenshtein
Detect Plagiarism:
close_range -m cosine student1.txt original_text.txt
Related Commands
- diff: Compares files line by line.
- cmp: Compares two files byte by byte.
- comm: Compares two sorted files line by line.