comm - Linux


Overview

The comm command in Linux is used to compare two sorted files line by line. Its primary purpose is to identify common lines between the two files as well as lines that are unique to each file. This command is particularly useful in scripts and data analysis where a quick comparison of sorted lists, like lists of user names or inventory items, is needed.

Syntax

The basic syntax of comm is as follows:

comm [OPTIONS] FILE1 FILE2

Both FILE1 and FILE2 are required, and both must already be sorted in the collation order that comm itself uses. If either file is unsorted, comm may misreport lines as unique or common. GNU comm prints a warning when it detects out-of-order input.
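
As a minimal sketch of this syntax, the following prepares two small inputs, sorts one in place, and feeds the other to comm through standard input. It relies on comm accepting - in place of one file name to mean standard input. The file names and contents are made up for illustration.

```shell
# Hypothetical sample data; names and contents are illustrative only.
workdir=$(mktemp -d)
printf '%s\n' cherry apple banana > "$workdir/file1.txt"
printf '%s\n' date cherry banana > "$workdir/file2.txt"

# Sort FILE1 in place so it meets comm's sorted-input requirement.
sort -o "$workdir/file1.txt" "$workdir/file1.txt"

# Sort FILE2 on the fly and hand it to comm on standard input ("-").
# -12 keeps only the lines present in both inputs.
common=$(sort "$workdir/file2.txt" | comm -12 "$workdir/file1.txt" -)
printf '%s\n' "$common"
```

Only one of the two arguments can be -, since comm has a single standard input to read from.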

Options/Flags

Here are the commonly used options in comm:

  • -1: Suppress the output of lines unique to FILE1.
  • -2: Suppress the output of lines unique to FILE2.
  • -3: Suppress the output of lines that appear in both files.
  • --check-order: Check that the input is correctly sorted, even if all input lines are pairable (every line has a match in the other file).
  • --nocheck-order: Do not check that the input is correctly sorted.
  • --output-delimiter=STRING: Separate columns with STRING instead of the TAB character. A visible delimiter makes it easier to tell which column a line belongs to.

The suppression flags can be combined. For example, comm -12 file1 file2 suppresses columns 1 and 2 and therefore prints only the lines common to FILE1 and FILE2.
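
The three useful two-flag combinations can be sketched as follows. The sample lists (alice, bob, and so on) and the variable names are assumptions chosen for illustration, and the inputs are written already sorted.

```shell
# Hypothetical, already-sorted sample lists.
workdir=$(mktemp -d)
printf '%s\n' alice bob carol > "$workdir/file1"
printf '%s\n' bob carol dave > "$workdir/file2"

# -23 hides columns 2 and 3: lines found only in file1 remain.
only_in_file1=$(comm -23 "$workdir/file1" "$workdir/file2")
# -13 hides columns 1 and 3: lines found only in file2 remain.
only_in_file2=$(comm -13 "$workdir/file1" "$workdir/file2")
# -12 hides columns 1 and 2: lines found in both files remain.
in_both=$(comm -12 "$workdir/file1" "$workdir/file2")

printf 'only in file1:\n%s\n' "$only_in_file1"
printf 'only in file2:\n%s\n' "$only_in_file2"
printf 'in both:\n%s\n' "$in_both"
```

When a column is suppressed, comm also drops the TAB that would otherwise indent the remaining columns, so each result above is a plain list with no leading whitespace.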

Examples

  1. Basic Comparison:

    comm file1.txt file2.txt
    

    This will output three columns: lines unique to file1.txt, lines unique to file2.txt, and common lines.

  2. Suppressing Columns:

    comm -23 file1.txt file2.txt
    

    This command outputs lines that are unique to file1.txt by suppressing the output from the 2nd and 3rd columns (lines unique to file2.txt and common lines).

  3. Using Custom Delimiters:

    comm --output-delimiter=" : " file1.txt file2.txt
    

    This outputs the comparison with " : " as the column delimiter instead of the default TAB character.
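
To make the three-column layout from the examples above visible, here is a small sketch that swaps the TAB for a | character. It assumes GNU comm, since --output-delimiter is a GNU option. The file contents are invented for illustration.

```shell
# Hypothetical, already-sorted sample files.
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/file1.txt"
printf '%s\n' banana cherry date > "$workdir/file2.txt"

# "|" replaces the TAB so the column indentation can be seen directly.
layout=$(comm --output-delimiter='|' "$workdir/file1.txt" "$workdir/file2.txt")
printf '%s\n' "$layout"
# With GNU comm this prints:
# apple        (column 1: only in file1.txt, no delimiter)
# ||banana     (column 3: in both files, two delimiters)
# ||cherry
# |date        (column 2: only in file2.txt, one delimiter)
```

The number of leading delimiters identifies the column, which is how the default TAB-indented output should be read as well.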

Common Issues

  • Unsorted Input: The most common problem with comm is unsorted input. Make sure both input files are sorted. If unsure, sort them in place first, for example: sort -o file1 file1.

  • Locale and Sorting: Sort order depends on the locale settings (LC_COLLATE), and comm expects its input in the order defined by the locale it runs under. For consistent results, run both sort and comm under the same locale, for example by setting LC_ALL=C.
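
Both issues can be illustrated with a short sketch. It assumes GNU comm for --check-order, and the mixed-case sample data is invented to show a case where collation order depends on the locale.

```shell
workdir=$(mktemp -d)
# Hypothetical mixed-case data: its sort order differs between locales.
printf '%s\n' apple Banana cherry > "$workdir/list1"
printf '%s\n' Banana cherry date > "$workdir/list2"

# Sort and compare under the same locale (C means plain byte order,
# in which uppercase letters sort before lowercase ones).
LC_ALL=C sort -o "$workdir/list1" "$workdir/list1"
LC_ALL=C sort -o "$workdir/list2" "$workdir/list2"
common=$(LC_ALL=C comm -12 "$workdir/list1" "$workdir/list2")
printf '%s\n' "$common"

# With --check-order, comm exits with an error on unsorted input,
# which lets a script detect the problem instead of trusting bad output.
printf '%s\n' zebra apple > "$workdir/unsorted"
if LC_ALL=C comm --check-order "$workdir/unsorted" "$workdir/list2" >/dev/null 2>&1; then
  order_ok=yes
else
  order_ok=no
fi
echo "unsorted input accepted: $order_ok"
```

Setting LC_ALL on each command keeps the change local to that command rather than altering the locale of the whole shell session.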

Integration

comm can be integrated with other commands for more complex text processing. For example:

sort file1 > sorted1
sort file2 > sorted2
comm -12 sorted1 sorted2 | grep "specific_pattern"

This command sequence sorts two files, compares them to find common lines, and then uses grep to filter lines that match a specific pattern.
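
The same sequence can be wrapped in a small function so that the intermediate sorted files are temporary and cleaned up afterwards. This is only a sketch: common_lines is a hypothetical helper name, not part of coreutils, and the host names in the sample data are invented.

```shell
# common_lines FILE1 FILE2 PATTERN
# Hypothetical helper: prints lines common to both files that match PATTERN.
common_lines() {
  tmpdir=$(mktemp -d) || return 1
  sort "$1" > "$tmpdir/a"
  sort "$2" > "$tmpdir/b"
  comm -12 "$tmpdir/a" "$tmpdir/b" | grep -- "$3"
  status=$?            # exit status of grep, the last command in the pipeline
  rm -rf "$tmpdir"     # remove the temporary sorted copies
  return "$status"
}

# Illustrative, unsorted input data.
workdir=$(mktemp -d)
printf '%s\n' web02 db01 web01 > "$workdir/file1"
printf '%s\n' web01 cache01 db01 > "$workdir/file2"

matches=$(common_lines "$workdir/file1" "$workdir/file2" '^web')
printf '%s\n' "$matches"
```

The function sticks to POSIX shell features. In bash, process substitution (comm -12 <(sort file1) <(sort file2)) is a common alternative that avoids temporary files altogether.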

Related Commands

  • diff: Compare files line by line, without requiring sorted input.
  • sort: Sort lines of text files.
  • uniq: Report or omit repeated lines.

More detailed information on comm can be found in the GNU coreutils manual.