comm - Linux
Overview
The comm command in Linux compares two sorted files line by line. Its primary purpose is to identify the lines common to both files as well as the lines unique to each file. It is particularly useful in scripts and data analysis where a quick comparison of sorted lists, such as lists of user names or inventory items, is needed.
Syntax
The basic syntax of comm is as follows:
comm [OPTIONS] FILE1 FILE2
Both FILE1 and FILE2 are required arguments and should be sorted before using the comm command. If either file is not sorted, the comparison may be wrong, and GNU comm may warn that the input is not in sorted order.
Options/Flags
Here are the commonly used options in comm:
-1: Suppress the output of lines unique to FILE1 (column 1).
-2: Suppress the output of lines unique to FILE2 (column 2).
-3: Suppress the output of lines that appear in both files (column 3).
--check-order: Check that the input is correctly sorted, even if all input lines are pairable.
--nocheck-order: Do not check that the input is correctly sorted.
--output-delimiter=STRING: Use STRING as the output delimiter instead of TAB characters. This can make the columns easier to distinguish or parse.
For example, running comm -12 file1 file2 would output only the lines common to FILE1 and FILE2, because columns 1 and 2 are suppressed.
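The column-suppression flags can be combined in pairs to isolate a single column. The sketch below assumes GNU comm and uses hypothetical scratch files (the names file1 and file2 and their contents are made up for illustration); it is one way to try the flags, not a canonical recipe.

```shell
# Hypothetical sample data; file names and contents are illustrative only.
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/file1"
printf '%s\n' banana cherry date  > "$tmpdir/file2"

# LC_ALL=C keeps comm's collation consistent with the byte-sorted data above.
only_in_1=$(LC_ALL=C comm -23 "$tmpdir/file1" "$tmpdir/file2")  # keep column 1
only_in_2=$(LC_ALL=C comm -13 "$tmpdir/file1" "$tmpdir/file2")  # keep column 2
in_both=$(LC_ALL=C comm -12 "$tmpdir/file1" "$tmpdir/file2")    # keep column 3

printf 'only in file1:\n%s\n' "$only_in_1"
printf 'only in file2:\n%s\n' "$only_in_2"
printf 'in both:\n%s\n' "$in_both"

rm -rf "$tmpdir"
```

With this data, -23 leaves apple, -13 leaves date, and -12 leaves banana and cherry.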
Examples
- Basic Comparison:
comm file1.txt file2.txt
This will output three columns: lines unique to file1.txt, lines unique to file2.txt, and common lines.
- Suppressing Columns:
comm -23 file1.txt file2.txt
This command outputs only the lines that are unique to file1.txt by suppressing the 2nd and 3rd columns (lines unique to file2.txt and common lines).
- Using Custom Delimiters:
comm --output-delimiter=" : " file1.txt file2.txt
Outputs the comparison result with " : " as the column delimiter instead of the default TAB character.
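To see how the three columns are laid out, the examples above can be run against small sample files. This is a minimal sketch assuming GNU comm; the contents written to file1.txt and file2.txt are invented for illustration.

```shell
# Hypothetical sample data; contents are illustrative only.
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/file1.txt"
printf '%s\n' banana cherry date  > "$tmpdir/file2.txt"

# Default output: column 2 is indented by one TAB, column 3 by two TABs.
tabbed=$(LC_ALL=C comm "$tmpdir/file1.txt" "$tmpdir/file2.txt")

# Same comparison, with " : " replacing each TAB.
delimited=$(LC_ALL=C comm --output-delimiter=' : ' \
  "$tmpdir/file1.txt" "$tmpdir/file2.txt")

printf '%s\n\n' "$tabbed"
printf '%s\n' "$delimited"

rm -rf "$tmpdir"
```

In the delimited output, a line unique to file2.txt is prefixed by one delimiter (" : date") and a common line by two (" :  : banana"), mirroring the one and two TABs of the default layout.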
Common Issues
- Unsorted Input: The most common issue with comm is improperly sorted input. Make sure both input files are sorted. If unsure, you can sort a file in place using the sort command like so: sort -o file1 file1.
- Locale and Sorting: Sorting can be influenced by the locale settings (LC_COLLATE). To get consistent results, consider setting the locale to C (for example, with LC_ALL=C) both when sorting and when running comm.
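Both issues can be checked from a script. The sketch below assumes GNU comm, whose --check-order option makes the command exit with a non-zero status on unsorted input; the scratch files and the variable names before and after are hypothetical.

```shell
# Hypothetical sample data; file1 is deliberately left unsorted.
tmpdir=$(mktemp -d)
printf '%s\n' banana apple > "$tmpdir/file1"
printf '%s\n' apple banana > "$tmpdir/file2"

# With --check-order, comm fails when it meets an out-of-order line.
if LC_ALL=C comm --check-order "$tmpdir/file1" "$tmpdir/file2" >/dev/null 2>&1; then
  before=sorted
else
  before=unsorted
fi

# Sort in place under the same locale that comm will use, then re-check.
LC_ALL=C sort -o "$tmpdir/file1" "$tmpdir/file1"

if LC_ALL=C comm --check-order "$tmpdir/file1" "$tmpdir/file2" >/dev/null 2>&1; then
  after=sorted
else
  after=unsorted
fi

echo "before: $before, after: $after"
rm -rf "$tmpdir"
```

Using LC_ALL=C for both sort and comm means the two commands agree on the ordering, which is the point of the locale advice above.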
Integration
comm can be integrated with other commands for more complex text processing. For example:
sort file1 > sorted1
sort file2 > sorted2
comm -12 sorted1 sorted2 | grep "specific_pattern"
This command sequence sorts two files, compares them to find common lines, and then uses grep to filter lines that match a specific pattern.
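A variation on this pipeline avoids one of the intermediate files by passing - as a file argument, which makes comm read that input from standard input. The sketch below uses hypothetical file contents and the pattern '^a' purely for illustration.

```shell
# Hypothetical sample data; names and contents are illustrative only.
tmpdir=$(mktemp -d)
printf '%s\n' carol alice bob > "$tmpdir/file1"
printf '%s\n' dave bob alice  > "$tmpdir/file2"

# Sort the first file to disk; stream the second one through a pipe.
LC_ALL=C sort -o "$tmpdir/sorted1" "$tmpdir/file1"

# "-" tells comm to read FILE2 from standard input.
matches=$(LC_ALL=C sort "$tmpdir/file2" \
  | LC_ALL=C comm -12 "$tmpdir/sorted1" - \
  | grep '^a')

echo "$matches"
rm -rf "$tmpdir"
```

Here the common lines are alice and bob, and grep keeps only the one that starts with "a".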
Related Commands
diff: Compare files line by line, without requiring sorted input.
sort: Sort lines of text files.
uniq: Report or omit repeated lines.
Further reading and more detailed information on comm can be found in the GNU coreutils manual.