How to use Linux awk programming and regular expressions to read a big log file?

Use the Linux tail command to inspect the end of the log file and get a feel for the pattern of its entries.
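
As a rough illustration, the commands below print the last few lines of a log so the entry layout can be inspected. The log contents are invented stand-in lines (not real DB2 output), the temporary file replaces db2diag.log so the snippet runs anywhere, and the line count of 3 is arbitrary.

```shell
# Build a tiny stand-in log; these lines are invented for illustration,
# not real DB2 output.
log=$(mktemp)
cat > "$log" <<'EOF'
2008-01-16-16.59.58.000001+480 I100G300          LEVEL: Event
detail line A
2008-01-16-17.00.01.000002+480 I400G300          LEVEL: Event
detail line B
detail line C
EOF

# Show the last 3 lines; against the real file this would be:
#   tail -n 3 db2diag.log
last_lines=$(tail -n 3 "$log")
printf '%s\n' "$last_lines"
rm -f "$log"
```

In this made-up sample the output starts with a timestamp line followed by its detail lines, which is the pattern the rest of this post relies on.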

Using db2diag.log as an example, each event or incident begins with a line that contains the date and time:

2008-01-02-10.52.47.720435+480 I1840G300          LEVEL: Event
Then I use awk and its regular expression support to pick out all log entries that match the day and hour of interest:

First, find the record number of the first log entry that matches the date and time pattern, using awk's regular expression (RegEx) matching:

awk '{if ($1 ~ /2008-01-16-17/){print NR}}' < db2diag.log | head -1
Next, find the record number of the last log entry that matches the date and time pattern:

awk '{if ($1 ~ /2008-01-16-17/){print NR}}' < db2diag.log | tail -1
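
The two commands above each scan the whole file. As a possible alternative, a single awk pass can record both numbers at once. This is only a sketch: the sample log is invented stand-in data, the variable names first and last are my own, and the ^ anchor is an addition that restricts the match to the start of the field.

```shell
# Invented stand-in log; not real DB2 output.
log=$(mktemp)
cat > "$log" <<'EOF'
2008-01-16-16.59.58.000001+480 I100G300          LEVEL: Event
detail line A
2008-01-16-17.00.01.000002+480 I400G300          LEVEL: Event
detail line B
2008-01-16-17.30.00.000003+480 I700G300          LEVEL: Event
detail line C
2008-01-16-18.00.00.000004+480 I900G300          LEVEL: Event
detail line D
EOF

# Store the first matching record number once; keep overwriting the last.
range=$(awk '$1 ~ /^2008-01-16-17/ { if (!first) first = NR; last = NR }
             END { if (first) print first, last }' "$log")
echo "$range"
rm -f "$log"
```

For this sample the command prints the first and last matching record numbers on one line, separated by a space.
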
Finally, use awk again to extract all log entries within the range of first and last record numbers found in the previous two steps (7529 and 8382 in this example):

awk '{if (NR >= 7529 && NR <= 8382){print $0}}' < db2diag.log
Because of the way db2diag.log is laid out, the last record number reported by awk points at the timestamp line of the final matching entry, so the range stops before the details of that DB2 event or incident. I therefore pad the last record number on purpose (for example, if awk reports 8382 as the last record number, I raise it to 8390):

awk '{if (NR >= 7529 && NR <= 8390){print $0}}' < db2diag.log >tempfile
To write the extracted log entries to a temporary file, redirect the standard output of the awk command to a file of your choice (e.g. append >tempfile to the end of the command, as in the last sample).
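
Padding the last record number by hand works, but the amount to add is a guess. A possible alternative, sketched below, toggles a flag on every line that looks like an entry header, so the detail lines of the last matching entry are kept automatically. It assumes every entry begins with a timestamp of the form YYYY-MM-DD-HH in the first field, as in the sample line shown earlier, and that detail lines never start that way. The sample data and the variable name keep are hypothetical.

```shell
# Invented stand-in log; not real DB2 output.
log=$(mktemp)
cat > "$log" <<'EOF'
2008-01-16-16.59.58.000001+480 I100G300          LEVEL: Event
detail line A
2008-01-16-17.00.01.000002+480 I400G300          LEVEL: Event
detail line B
2008-01-16-17.30.00.000003+480 I700G300          LEVEL: Event
detail line C
2008-01-16-18.00.00.000004+480 I900G300          LEVEL: Event
detail line D
EOF

extracted=$(awk '
  # A header line starts with a timestamp such as 2008-01-16-17.
  # On each header, decide whether the entry belongs to the wanted hour.
  $1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9]/ {
      keep = ($1 ~ /^2008-01-16-17/)
  }
  # Print every line while the flag is set, headers and details alike.
  keep
' "$log")
printf '%s\n' "$extracted"
rm -f "$log"
```

With the sample data this keeps both 17:00 entries together with their detail lines and drops the 16:59 and 18:00 entries.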

Brief notes on the awk syntax used in the samples above:

$1 ~ /2008-01-16-17/ checks whether the text of the 1st field/column matches the regular expression 2008-01-16-17. The pattern is not anchored, so it matches anywhere within the field.


Unless a field separator (FS) is specified, awk splits fields on runs of whitespace (spaces and tabs) by default.

The first field (a.k.a. column) of a line (awk treats each line as a record) is denoted as $1, the 2nd field as $2, and so forth. $0 simply means the whole line/record, i.e. all of its fields/columns.

Thus, the combination of awk programming and organized text files can form a simple database system!
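
To make the field numbering concrete, the sketch below splits the sample log line from earlier in this post and prints individual fields. The -F option is included only to illustrate overriding the default separator; splitting on + is an arbitrary choice for this demonstration, and the shell variable names are my own.

```shell
line='2008-01-02-10.52.47.720435+480 I1840G300          LEVEL: Event'

# Default separator: a run of blanks counts as one separator,
# so the wide gap before LEVEL: does not create empty fields.
f1=$(printf '%s\n' "$line" | awk '{ print $1 }')
f2=$(printf '%s\n' "$line" | awk '{ print $2 }')
f4=$(printf '%s\n' "$line" | awk '{ print $4 }')

# Custom separator: split on a literal + (written as a bracket
# expression), so $1 becomes the timestamp without the +480 offset.
ts=$(printf '%s\n' "$line" | awk -F'[+]' '{ print $1 }')

echo "$f1"
echo "$f2"
echo "$f4"
echo "$ts"
```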


An awk regular expression pattern is enclosed in a pair of slash characters (/).

The awk RegEx match operator is the tilde/swung dash character (~). (Refer to the GNU awk notes on Regular Expression.)
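
A small sketch of the match operator in use, run against the sample log line from earlier in this post. The negated operator !~ and the bare /pattern/ form (which tests the whole record) are not used elsewhere in this post; they are shown here as standard awk companions to ~. The shell variable names are my own.

```shell
line='2008-01-02-10.52.47.720435+480 I1840G300          LEVEL: Event'

# $1 ~ /re/ tests only the first field.
m1=$(printf '%s\n' "$line" | awk '$1 ~ /^2008-01-02-10/ { print "match" }')

# $1 !~ /re/ is the negated form: true when the field does not match.
m2=$(printf '%s\n' "$line" | awk '$1 !~ /^2008-01-16/ { print "no match" }')

# A bare /re/ tests the whole record ($0).
m3=$(printf '%s\n' "$line" | awk '/LEVEL: Event/ { print "match" }')

echo "$m1"
echo "$m2"
echo "$m3"
```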

print NR prints the record number (NR), i.e. the line number in the log file. To print the number of fields/columns in a line/record, use NF.
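
As a quick illustration of the two built-in variables, the sketch below feeds two made-up lines to awk and prints NR and NF for each. The input text is arbitrary.

```shell
# Two made-up input lines: the first has 4 fields, the second has 3.
out=$(printf '%s\n' 'a b c d' 'detail line A' | awk '{ print NR, NF }')
echo "$out"
```

Each output line pairs the record number with that record's field count.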

Posted by CEOinIRVINE