uniq - macOS


Overview

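uniq filters adjacent duplicate lines. It reads a file (or standard input), compares each line with the one before it, and writes one copy of each run of identical lines. Because only adjacent lines are compared, input is usually sorted first. Options let it count occurrences, print only repeated lines, or print only lines that are not repeated. It’s commonly used in data analysis, log inspection, and system administration.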

Syntax

uniq [options] [input_file [output_file]]

Options/Flags

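  • -c, --count: Prefix each output line with the number of times it occurred consecutively.
  • -d, --repeated: Print one copy of each line that is repeated.
  • -i, --ignore-case: Ignore case when comparing lines.
  • -u, --unique: Print only lines that are not repeated.
  • -z, --zero-terminated: Treat input as NUL-terminated rather than newline-terminated. This is a GNU coreutils option and is not part of the BSD uniq that ships with macOS.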
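
The difference between the default behavior, -d, and -u is easiest to see on a small input. The following is a minimal sketch using made-up sample lines (the fruit names and variable names are illustrative only); the input is already grouped so that duplicates are adjacent.

```shell
# Hypothetical sample input; duplicates are already adjacent.
input=$(printf 'apple\napple\nbanana\ncherry\ncherry\ncherry\n')

# Default: collapse each run of identical adjacent lines to one line.
collapsed=$(printf '%s\n' "$input" | uniq)

# -d: one copy of each line that appears more than once in a run.
repeated=$(printf '%s\n' "$input" | uniq -d)

# -u: only the lines that are not repeated.
singles=$(printf '%s\n' "$input" | uniq -u)

printf 'default:\n%s\n\n-d:\n%s\n\n-u:\n%s\n' "$collapsed" "$repeated" "$singles"
```

Here the default output lists all three fruits once, -d lists apple and cherry, and -u lists only banana.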

Examples

Count how many times each line occurs (sorting first makes identical lines adjacent):

sort file.txt | uniq -c

Print only duplicate lines:

uniq -d file.txt

Print lines that occur exactly once (with counts):

uniq -c file.txt | awk '$1 == 1'

Ignore case when comparing lines:

uniq -i file.txt
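
The counting and case-insensitive examples above can be combined. The sketch below assumes some made-up log levels as input; it sorts case-insensitively with sort -f, counts with uniq -ci, and then filters on the count field with awk. Because uniq -c pads the count with a varying number of spaces, awk is also used here to normalize the spacing.

```shell
# Hypothetical sample data: unsorted, mixed case.
data=$(printf 'Error\nerror\nwarn\nERROR\ninfo\nwarn\n')

# sort -f groups lines regardless of case; uniq -i then treats them as equal.
# awk reprints "count line" with a single space and a lowercased line.
counts=$(printf '%s\n' "$data" | sort -f | uniq -ci | awk '{print $1, tolower($2)}')

# Keep only the lines whose count field is exactly 1.
once=$(printf '%s\n' "$counts" | awk '$1 == 1 {print $2}')

printf 'counts:\n%s\n\nseen once:\n%s\n' "$counts" "$once"
```

Filtering with awk on the first field matches the count itself, whereas a plain text match on "1" would also match counts such as 10 or any line whose text contains the digit.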

Common Issues

  • Improper file handling: Ensure the specified file exists and has correct permissions.
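  • Incorrect options: GNU and BSD versions of uniq differ in the options they accept, so check man uniq on the system in use.
  • Duplicates not removed: uniq compares only adjacent lines, so unsorted input keeps duplicates that are not next to each other. Sort the input first (sort file.txt | uniq) or use sort -u.
  • Large files: uniq reads its input as a stream and keeps only the previous line for comparison, so memory is rarely the bottleneck. The sort step that usually precedes it is the expensive part on very large files.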
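
A frequent source of confusion is that uniq only compares neighboring lines. The sketch below uses made-up input to show the effect of sorting first. It also shows an alternative that is not uniq at all: the awk idiom '!seen[$0]++', which keeps the first occurrence of each line in original order, at the cost of holding every distinct line in memory.

```shell
# Hypothetical input in which the duplicates are not adjacent.
log=$(printf 'b\na\nb\na\n')

# Without sorting, no two neighboring lines match, so nothing is removed.
unsorted=$(printf '%s\n' "$log" | uniq | wc -l | tr -d ' ')

# After sorting, identical lines are adjacent and collapse as expected.
sorted=$(printf '%s\n' "$log" | sort | uniq | wc -l | tr -d ' ')

# Alternative (awk, not uniq): first occurrence of each line, original order.
first_seen=$(printf '%s\n' "$log" | awk '!seen[$0]++')

printf 'unsorted: %s lines, sorted: %s lines\n' "$unsorted" "$sorted"
printf 'order-preserving:\n%s\n' "$first_seen"
```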

Integration

Combine with grep: Filter input before removing duplicates:

grep pattern file.txt | uniq

Pipe to xargs: Execute commands on each unique line:

uniq file.txt | xargs command
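
By default xargs splits its input on any whitespace, so a line containing spaces would be passed as several arguments. One way around this, sketched below with made-up input and echo standing in for the real command, is to convert newlines to NUL bytes with tr and use xargs -0, so that each line is passed as a single argument.

```shell
# Hypothetical input; the first line contains a space.
names=$(printf 'alpha beta\nalpha beta\ngamma\n')

# tr turns each newline into a NUL byte; xargs -0 splits on NUL only.
# -n 1 runs the command once per line; echo stands in for a real command.
out=$(printf '%s\n' "$names" | uniq | tr '\n' '\0' | xargs -0 -n 1 echo "item:")

printf '%s\n' "$out"
```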
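
Related Commands

  • sort: Sort input so that duplicate lines become adjacent before running uniq.
  • sed: Perform more complex text manipulations.
  • comm: Compare two sorted files line by line and report lines unique to each file or common to both.
  • GNU uniq manual: additional documentation for the GNU coreutils version; run man uniq for the version installed on macOS.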
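
Where uniq works within a single input, comm compares two sorted files. The following is a small sketch with made-up file contents written to temporary files (the variable names are illustrative): -23 suppresses columns 2 and 3, leaving lines found only in the first file, and -12 leaves lines common to both.

```shell
# Create two temporary, already-sorted files with hypothetical contents.
a=$(mktemp)
b=$(mktemp)
printf 'apple\nbanana\ncherry\n' > "$a"
printf 'banana\ncherry\ndate\n' > "$b"

# -23: hide lines only in the second file and lines in both.
only_a=$(comm -23 "$a" "$b")

# -12: hide lines unique to either file, leaving the common lines.
both=$(comm -12 "$a" "$b")

rm -f "$a" "$b"

printf 'only in first:\n%s\n\nin both:\n%s\n' "$only_a" "$both"
```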