awk - macOS
Overview
awk is a versatile programming language and command-line tool for pattern scanning and text processing. It is well suited to extracting and transforming text data in files or streams, and is most often used to manipulate structured, field-oriented data and to generate formatted reports.
Syntax
The basic syntax of the awk command is:
awk [options] 'program' input-file1 input-file2 ...
- 'program': A set of pattern-action statements for awk to execute, normally enclosed in single quotes so the shell passes it through unchanged.
- input-file1, input-file2, ...: The files that contain the data to process. If no input files are specified, awk reads from standard input.
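As a minimal sketch of the two input modes described above, the following runs the same program once against a file operand and once against standard input. The file name people.txt and its contents are made up for illustration.

```shell
# Hypothetical sample data in a scratch directory.
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/awkdemo.XXXXXX")
printf 'alice 30\nbob 25\n' > "$tmpdir/people.txt"

# 1) Input given as a file operand.
from_file=$(awk '{print $1}' "$tmpdir/people.txt")

# 2) No file operand: awk reads standard input instead.
from_stdin=$(printf 'alice 30\nbob 25\n' | awk '{print $1}')

echo "$from_file"
echo "$from_stdin"

rm -r "$tmpdir"
```

Both invocations should print the same first-column values, since only the source of the input differs.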
Options/Flags
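- -F fs: Specifies the input field separator fs. The default is white space (runs of spaces and tabs).
- -v var=value: Assigns value to the variable var before the program starts, so it is already set inside a BEGIN block.
- -f file: Reads the awk program source from the specified file instead of the command line.
- -m[fr] n: A historical option from older Bell Labs awk releases that set memory limits (f for the maximum number of fields, r for the maximum record size). It does not control pattern matches or CPU cores. Current implementations typically ignore or reject it, so avoid it in new scripts.
- --: Denotes the end of options. Everything after it is treated as an operand (the program text or file names) rather than an option.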
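The -F, -v, and -f options above can be sketched together as follows. The colon-separated file users.txt, the variable name min, and the script name count_dev.awk are all hypothetical, chosen only for this illustration.

```shell
# Hypothetical colon-separated sample data.
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/awkdemo.XXXXXX")
printf 'alice:30:admin\nbob:25:dev\ncarol:41:dev\n' > "$tmpdir/users.txt"

# -F: split each line on colons instead of white space.
names=$(awk -F: '{print $1}' "$tmpdir/users.txt")

# -v: pass a value into the program as the awk variable "min".
older=$(awk -F: -v min=28 '$2 > min {print $1}' "$tmpdir/users.txt")

# -f: keep the program in a file instead of on the command line.
printf '%s\n' '$3 == "dev" { n++ }' 'END { print n + 0 }' > "$tmpdir/count_dev.awk"
dev_count=$(awk -F: -f "$tmpdir/count_dev.awk" "$tmpdir/users.txt")

echo "$names"
echo "$older"
echo "$dev_count"

rm -r "$tmpdir"
```

Using -v rather than splicing a shell variable into the program text keeps the program inside single quotes and avoids quoting mistakes.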
Examples
- Print Columns: To print the first and fourth columns of a file:
awk '{print $1, $4}' filename.txt
- Sum a Column: Sum all the values in the second column:
awk '{sum += $2} END {print sum}' filename.txt
- Filtering with Conditions: Print lines where the third column is greater than 10:
awk '$3 > 10' filename.txt
- Using Multiple Commands: Print the sum and average of the values in column 2 (the check on count avoids a division by zero on empty input):
awk '{sum += $2; count++} END {if (count > 0) print "Sum:", sum, "Average:", sum/count}' filename.txt
- Format Output: Use printf to display each line prefixed with its row number:
awk '{printf "Row %d: %s\n", NR, $0}' filename.txt
Common Issues
- Field Separator Confusion: Incorrect use of the -F option can lead to unexpected results, such as an entire line ending up in $1. Always double-check the field delimiter in your data.
- Syntax Errors in Script: Errors in awk syntax, often due to missing braces or quotes, can lead to failures. Ensure each action block is enclosed in {}, string literals inside the program are in double quotes, and the program as a whole is in single quotes.
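The two pitfalls above can be reproduced with a short sketch. The comma-separated sample line is made up for illustration.

```shell
# Hypothetical comma-separated record.
line='alice,30,admin'

# Without -F, awk splits on white space, so the whole line is field 1
# and $2 is empty.
wrong=$(printf '%s\n' "$line" | awk '{print $2}')
nf_default=$(printf '%s\n' "$line" | awk '{print NF}')

# With the correct separator, $2 is the second comma-separated value.
right=$(printf '%s\n' "$line" | awk -F, '{print $2}')

# Quoting: single quotes around the whole program, double quotes
# around string literals inside it.
greeting=$(printf '%s\n' "$line" | awk -F, '{print "user: " $1}')

echo "fields seen by default: $nf_default"
echo "with -F,: $right"
echo "$greeting"
```

Printing NF is a quick way to check whether awk is splitting a line the way you expect before writing the rest of the program.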
Integration
awk can be combined with other Unix commands like sort, cut, and grep for more complex data processing:
cat data.txt | grep "pattern" | awk '{print $2, $3}' | sort
This pipeline filters lines containing “pattern”, extracts the 2nd and 3rd fields, and sorts the output.
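Because awk can read files and match patterns itself, the same result can often be produced with fewer processes. The sketch below compares the pipeline above with a shorter variant on made-up data. It assumes the search term is a plain string, since grep and awk use slightly different regular expression dialects by default.

```shell
# Hypothetical sample data.
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/awkdemo.XXXXXX")
printf 'x pattern b\ny other a\nz pattern a\n' > "$tmpdir/data.txt"

# The pipeline as written above: cat, grep, awk, sort.
original=$(cat "$tmpdir/data.txt" | grep "pattern" | awk '{print $2, $3}' | sort)

# Shorter variant: awk reads the file and filters with /pattern/ itself.
compact=$(awk '/pattern/ {print $2, $3}' "$tmpdir/data.txt" | sort)

echo "$original"
echo "$compact"

rm -r "$tmpdir"
```

Both forms should produce identical output here. The longer pipeline can still be the clearer choice when grep options such as -i or -v are doing real work.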
Related Commands
- sed: Stream editor for filtering and transforming text.
- grep: Command-line utility for searching plain-text data for lines matching a regular expression.
- cut: Extracts selected fields, characters, or bytes from each line of its input.
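Additional resources and in-depth learning can be found in the official GNU Awk documentation. Note that the awk shipped with macOS is not GNU awk, so GNU-specific extensions described there may not be available. Run man awk for the options your system supports.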