sort - Linux


Overview

The sort command in Linux sorts lines of text read from files or from standard input. It can order lines alphabetically, numerically, or by month name. It is useful for organizing data, producing readable output, checking that a dataset is in a specified order, and preparing data for further processing.

Syntax

The basic syntax of the sort command is as follows:

sort [OPTION]... [FILE]...

If no file is specified, or if the file is “-”, sort reads from standard input.
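
As a small illustration of the standard-input behavior, the sketch below feeds three made-up lines to sort through a pipe, first with no FILE argument and then with "-" given explicitly. The sample words are hypothetical.

```shell
# No FILE argument: sort reads the piped lines from standard input.
sorted=$(printf 'pear\napple\nfig\n' | sort)
printf '%s\n' "$sorted"
# apple
# fig
# pear

# "-" names standard input explicitly and gives the same result.
sorted_dash=$(printf 'pear\napple\nfig\n' | sort -)
printf '%s\n' "$sorted_dash"
```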

Options/Flags

  • -b, --ignore-leading-blanks: Ignore leading blanks.
  • -d, --dictionary-order: Only consider blanks and alphanumeric characters.
  • -f, --ignore-case: Fold lower case to upper case characters for sorting.
  • -n, --numeric-sort: Compare according to string numerical value.
  • -r, --reverse: Reverse the result of comparisons.
  • -k, --key=KEYDEF: Sort via a key; KEYDEF gives location and type.
  • -m, --merge: Merge already sorted files; do not sort.
  • -o, --output=FILE: Write result to FILE instead of standard output.
  • -t, --field-separator=SEP: Use SEP as the field separator instead of the transition from non-blank to blank.
  • -u, --unique: Output only the first of each run of lines that compare equal; with -c, check for strict ordering.
  • -c, --check, --check=diagnose-first: Check for sorted input; do not sort.
  • --help: Display a help message and exit.
  • --version: Output version information and exit.
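
The overview mentions sorting by month, which the list above does not cover. In GNU sort this is done with -M (--month-sort). The sketch below assumes GNU coreutils and forces the C locale so that the English abbreviations are recognized; the sample input is made up.

```shell
# -M compares month-name abbreviations (Jan < Feb < ... < Dec).
# LC_ALL=C pins the month names to English for a predictable result.
months=$(printf 'Mar\nJan\nDec\nFeb\n' | LC_ALL=C sort -M)
printf '%s\n' "$months"
# Jan
# Feb
# Mar
# Dec
```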

Examples

  1. Simple Sort:
    sort file.txt
    
  2. Numeric Sort:
    sort -n file.txt
    
  3. Reverse Order Sort:
    sort -r file.txt
    
  4. Sort and Save Output:
    sort -o sorted_file.txt file.txt
    
  5. Sort on a Specific Key (field):
    sort -k2,2 file.txt
    
  6. Dictionary Order and Unique Lines:
    sort -d -u file.txt
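
The examples above do not show -t or -c in action. The following sketch combines a field separator with a numeric key and then checks whether the input was already sorted. The colon-separated name:score data and the temporary file are hypothetical.

```shell
# Hypothetical sample data: one "name:score" record per line.
data=$(mktemp)
printf 'carol:9\nalice:42\nbob:7\n' > "$data"

# -t: sets the separator; -k2,2n sorts on field 2 only, numerically.
by_score=$(sort -t: -k2,2n "$data")
printf '%s\n' "$by_score"
# bob:7
# carol:9
# alice:42

# -c sorts nothing; it exits 0 if the input is already in order.
if sort -c "$data" 2>/dev/null; then
    check_result="sorted"
else
    check_result="not sorted"
fi
echo "$check_result"
# not sorted

rm -f "$data"
```

Writing the key as -k2,2 rather than -k2 stops the key at the end of field 2; with more than two fields, -k2 would extend to the end of the line.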
    

Common Issues

  • Locale-specific sorting issues: The collation order depends on the current locale, so the same file can sort differently on different systems. Use LC_ALL=C sort file.txt for consistent, byte-value ordering.
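  • Memory limits on large files: sort spills to temporary files when the input does not fit in its buffer, so make sure the temporary directory has enough space (-T, --temporary-directory=DIR). Alternatively, split the file, sort the pieces, and combine them with sort -m. Note that --batch-size=NMERGE only limits how many inputs are merged at once; it does not cap memory.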
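  • Performance issues with large datasets: Use -S, --buffer-size=SIZE to set the size of the main memory buffer (for example -S 1G), and --parallel=N to control how many sorts run concurrently.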
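
The points above can be sketched on a toy scale. This is a rough illustration assuming GNU coreutils (sort, split, seq). The file names under the temporary directory are hypothetical, and the 16M buffer size is an arbitrary value chosen for the example, not a recommendation.

```shell
# 1. Locale: in the C locale comparison is by byte value, so "B" (0x42)
#    sorts before "a" (0x61); many other locales would put "apple" first.
c_order=$(printf 'apple\nBanana\ncherry\n' | LC_ALL=C sort)
printf '%s\n' "$c_order"
# Banana
# apple
# cherry

# 2. Split, sort the pieces, then merge them.
work=$(mktemp -d)
seq 10 -1 1 > "$work/big.txt"                # lines 10, 9, ..., 1
split -l 4 "$work/big.txt" "$work/chunk."    # chunk.aa, chunk.ab, chunk.ac
for f in "$work"/chunk.*; do
    sort -n -o "$f" "$f"                     # sort each piece in place
done
merged=$(sort -n -m "$work"/chunk.*)         # -m merges sorted inputs only

# 3. Direct sort with an explicit buffer size and temporary directory.
direct=$(sort -n -S 16M -T "$work" "$work/big.txt")

[ "$merged" = "$direct" ] && echo "merge matches direct sort"
rm -rf "$work"
```

The pieces passed to sort -m must each be sorted with the same ordering options (here -n), otherwise the merged output is not guaranteed to be in order.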

Integration

The sort command can be integrated into pipelines for complex data processing:

sort file.txt | uniq -c

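Here, the output of sort is piped to uniq -c, which counts how many times each distinct line occurs. Sorting first matters because uniq only collapses adjacent duplicate lines. sort is also often used before awk or sed for further processing.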

  • uniq: Often paired with sort for removing duplicates.
  • awk: For data extraction and reporting, after sorting.
  • sed: For stream editing after sorting data.
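
A slightly longer pipeline, sketched here with made-up input, shows these tools working together: sort groups identical lines, uniq -c counts them, a second sort ranks the counts, and awk reformats the result.

```shell
# Count occurrences of each line, then rank from most to least frequent.
ranked=$(printf 'b\na\nb\nc\nb\na\n' \
    | sort \
    | uniq -c \
    | sort -k1,1nr -k2,2 \
    | awk '{ print $2 ": " $1 }')   # awk strips the padding uniq -c adds
printf '%s\n' "$ranked"
# b: 3
# a: 2
# c: 1
```

The second sort uses -k1,1nr to order by count, descending, and -k2,2 to break ties alphabetically.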

For more details, refer to the GNU Coreutils manual or run man sort in your terminal.