close_range - Linux


Overview

close_range is a command-line tool that measures the similarity between two files or pieces of text. It computes a similarity score using a selectable method (Jaccard coefficient, Levenshtein edit distance, or cosine similarity), which makes it useful for detecting plagiarism, comparing code versions, or verifying data integrity.

Syntax

close_range [options] file1 file2

Options/Flags

  • -m, --method METHOD: Algorithm used for comparison. Options:
    • jaccard: Calculates the Jaccard similarity coefficient
    • levenshtein: Measures the edit distance between strings
    • cosine: Computes cosine similarity
  • -p, --precision PRECISION: Number of decimal places in the similarity result. Default: 4
  • -t, --threshold THRESHOLD: Similarity threshold at which two files are considered similar. Range: 0-1. Default: 0.85
  • -h, --help: Displays help information
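
To make the three method names concrete, here is a minimal Python sketch of the textbook definitions. It is not the tool's own code: the function names, the whitespace tokenization, and the normalization of edit distance to a 0-1 score are all assumptions made for illustration.

```python
import math
from collections import Counter


def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient over sets of whitespace-separated tokens."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0  # assumption: two empty inputs count as identical
    return len(sa & sb) / len(sa | sb)


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance using two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer input."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cosine(a: str, b: str) -> float:
    """Cosine similarity between token-count vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

Jaccard ignores how often a token occurs, cosine weights tokens by frequency, and Levenshtein is sensitive to character order, so the three can give quite different scores for the same pair of inputs.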

Examples

Simple Usage:

close_range file1.txt file2.txt

Specifying Comparison Method:

close_range -m jaccard file3.txt file4.txt

Setting Similarity Threshold:

close_range -t 0.9 file5.html file6.html
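
The threshold check can be pictured with a short Python sketch. The function name is hypothetical, and whether the tool treats a score exactly equal to the threshold as similar is an assumption here; only the 0-1 range and the 0.85 default come from the option list.

```python
def is_similar(score: float, threshold: float = 0.85) -> bool:
    """Return True when the score reaches the threshold.

    The inclusive comparison (>=) is an assumption for illustration.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return score >= threshold
```

With -t 0.9, a pair scoring 0.88 would be reported as not similar under this reading, while the default of 0.85 would accept it.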

Customizing Precision:

close_range -p 6 file7.py file8.py
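
As a rough picture of what the precision setting controls, the Python sketch below formats a score to a fixed number of decimal places. The function name is hypothetical, and the assumption that the tool rounds rather than truncates is not confirmed by the option description.

```python
def format_score(score: float, precision: int = 4) -> str:
    """Render a similarity score with a fixed number of decimal places."""
    if precision < 0:
        raise ValueError("precision must be non-negative")
    return f"{score:.{precision}f}"


print(format_score(2 / 3))     # default precision of 4
print(format_score(2 / 3, 6))  # equivalent of passing -p 6
```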

Common Issues

  • No output: Ensure both input files exist and contain valid content.
  • Unexpected similarity values: Adjust the comparison method or threshold to suit the data being compared.
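  • Slow performance: Large files take longer to process. Consider splitting or trimming the input, or switching to a different comparison method with -m; the standard Levenshtein algorithm, for instance, takes time proportional to the product of the two input lengths.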

Integration

Compare File Diff:

diff file1.txt file2.txt | close_range -m levenshtein

Detect Plagiarism:

close_range -m cosine student1.txt original_text.txt

Related Commands

  • diff: Compares files line by line.
  • cmp: Compares two files byte by byte.
  • comm: Compares two sorted files line by line.
