uniq - Linux


Overview

The uniq command filters adjacent matching lines from its input, so it is typically used on sorted data. It can collapse repeated lines, print only the repeated or only the non-repeated lines, and count occurrences. This makes it useful for processing log files, cleaning up data, and other analysis tasks where unique entries need to be identified.

Syntax

The basic syntax of the uniq command is:

uniq [OPTIONS] [INPUT [OUTPUT]]
  • INPUT is the name of the input file. If no input file is specified, uniq reads from the standard input.
  • OUTPUT is the name of the output file. If no output file is specified, uniq writes to the standard output.
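
As a minimal sketch of the two calling styles described above, the following uses a scratch directory and made-up file names (input.txt and output.txt are assumptions for illustration only):

```shell
# Work in a scratch directory so no real files are touched.
tmpdir=$(mktemp -d)
printf 'apple\napple\nbanana\n' > "$tmpdir/input.txt"

# Two operands: read input.txt and write the result to output.txt.
uniq "$tmpdir/input.txt" "$tmpdir/output.txt"
from_files=$(cat "$tmpdir/output.txt")

# No operands: read standard input and write standard output.
from_stdin=$(printf 'apple\napple\nbanana\n' | uniq)

# Both results contain the same two lines: apple, banana.
printf '%s\n' "$from_files" "$from_stdin"

rm -r "$tmpdir"
```

Note that an existing OUTPUT file is overwritten, so it should not be the same file as INPUT.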

Options/Flags

  • -c, --count: Prefix lines by the number of occurrences.
  • -d, --repeated: Only print duplicate lines, one for each group of identical lines.
  • -i, --ignore-case: Ignore differences in case when comparing lines.
  • -u, --unique: Only print lines that are not repeated, that is, lines with no identical adjacent line.
  • -z, --zero-terminated: Use a zero byte (ASCII NUL) instead of a newline as the line delimiter, for both input and output.
  • -f N, --skip-fields=N: Skip the first N fields in each line before checking for uniqueness (a field is a string of non-blank characters separated by blanks).
  • -s N, --skip-chars=N: Skip the first N characters in each line before checking for uniqueness.
  • -w N, --check-chars=N: Compare no more than N characters per line, counted after any skipped fields and characters.
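
The comparison options are easiest to see on a small input. The sketch below assumes GNU uniq (the -w option is not part of POSIX) and uses invented log lines purely for illustration:

```shell
# Hypothetical log lines: a timestamp field followed by a message.
log=$(printf '10:01 disk full\n10:02 disk full\n10:03 login ok\n')

# -f 1 skips the first blank-separated field (the timestamp) when comparing.
by_field=$(printf '%s\n' "$log" | uniq -f 1)

# -s 6 skips the first six characters ("10:01 ") when comparing.
by_chars=$(printf '%s\n' "$log" | uniq -s 6)

# -w 3 compares at most the first three characters of each line.
by_prefix=$(printf 'abc1\nabc2\nabd3\n' | uniq -w 3)

printf '%s\n' "$by_field" "$by_chars" "$by_prefix"
```

In each case uniq keeps the first line of a matching group, so the two "disk full" lines collapse to the 10:01 entry and abc1/abc2 collapse to abc1.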

Examples

  1. Basic Usage: Collapse each run of adjacent duplicate lines in a file into a single line.

    uniq myfile.txt
    
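  2. Count Occurrences: Count how many times each line appears in a sorted file. Counts are per run of adjacent lines, so unsorted input can report the same line more than once.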

    uniq -c sorted_file.txt
    
  3. Find Duplicates: Print only the lines that repeat in a file.

    uniq -d sorted_file.txt
    
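  4. Case-Insensitive Comparison: Treat adjacent lines that differ only in case as duplicates. Because the input here is unsorted, only lines that happen to be adjacent are merged.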

    uniq -i unsorted_file.txt
    
  5. Print Only Unique Lines: Print only the lines that are not repeated.

    uniq -u sorted_file.txt
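
To compare what these invocations print, here is a small sketch that runs them on the same made-up, already sorted input (the fruit names are illustrative only):

```shell
# Sample input, already sorted so that equal lines are adjacent.
sample=$(printf 'apple\napple\nbanana\ncherry\ncherry\ncherry\n')

deduped=$(printf '%s\n' "$sample" | uniq)      # apple, banana, cherry
repeated=$(printf '%s\n' "$sample" | uniq -d)  # apple, cherry
singles=$(printf '%s\n' "$sample" | uniq -u)   # banana
counts=$(printf '%s\n' "$sample" | uniq -c)    # counts 2, 1, 3 before each line

printf 'default:\n%s\n-d:\n%s\n-u:\n%s\n-c:\n%s\n' \
    "$deduped" "$repeated" "$singles" "$counts"
```

The count printed by -c is padded with leading spaces, so scripts that parse it should split on whitespace rather than rely on a fixed column.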
    

Common Issues

  • Non-adjacent Duplicates: uniq only removes duplicates that are adjacent. To handle non-adjacent duplicates, the input should be sorted with sort before using uniq.
    Example:

    sort myfile.txt | uniq
    
  • Case Sensitivity: By default, uniq is case-sensitive. Use the -i option to ignore case.
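
Both issues can be reproduced on a four-line input. This is a sketch under the assumption that sort and uniq behave as in GNU coreutils; LC_ALL=C is set only to make the sort order predictable:

```shell
# Unsorted input with a non-adjacent duplicate ("b") and a case variant ("A").
unsorted=$(printf 'b\na\nb\nA\n')

# No two equal lines are adjacent, so uniq alone removes nothing.
without_sort=$(printf '%s\n' "$unsorted" | uniq)

# Sorting first brings the two "b" lines together: A, a, b.
with_sort=$(printf '%s\n' "$unsorted" | LC_ALL=C sort | uniq)

# sort -f folds case while sorting, and uniq -i then merges "A" with "a".
folded=$(printf '%s\n' "$unsorted" | LC_ALL=C sort -f | uniq -i)

printf '%s\n' "$without_sort" "$with_sort" "$folded"
```

When -i is used, sorting with sort -f keeps lines that differ only in case next to each other, which a plain byte-order sort does not guarantee.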

Integration

  • Sort, Unique, and Count: Chain sort and uniq to count how often each distinct line occurs in a file:

    sort file.txt | uniq -c
    
  • Piping with grep:
    Combine uniq with grep to list each distinct line containing "Error" once (the match is case-sensitive):

    grep "Error" log.txt | sort | uniq
    
  • sort: Usually run before uniq so that identical lines become adjacent.
  • awk and sed: Useful for more advanced text manipulation before or after uniq.
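
A common extension of the sort and uniq -c chain above is a frequency ranking. The following is a sketch using invented request lines; the second sort and the head call are additions for illustration, not part of the examples above:

```shell
# Hypothetical request log, one entry per line.
log=$(printf 'GET /a\nGET /b\nGET /a\nGET /c\nGET /a\nGET /b\n')

# Count each distinct line, then order by count (highest first) and keep two.
top=$(printf '%s\n' "$log" | sort | uniq -c | sort -rn | head -n 2)

# Prints the two most frequent lines, each preceded by its count.
printf '%s\n' "$top"
```

Here sort -rn orders the output numerically by the leading count in descending order, so "GET /a" (3 occurrences) is listed before "GET /b" (2 occurrences).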

See the official GNU coreutils documentation for uniq for more detailed information and advanced usage scenarios.