Moon River

A tranquil niche for contemplation

Split Logging File By dd

Task Scenario Description

The latest information needs to be extracted from a logging file while the application is running.

The constraints are as follows:

  • The target platform is a subset of QNX.
  • The logging file is larger than 100 MB.
  • The shell (ksh) environment is missing some common commands, e.g. split and tac.

Solution

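#!/bin/ksh

# Size of the log in 1 KB units. du reports allocated disk blocks,
# so this only approximates the real file length.
log_size=`du -k logfile_name | awk '{print $1}'`
echo "The size of logfile_name is $log_size k"

# Skip everything except the last 64 KB; never skip a negative amount.
skip_size=$((log_size - 64))
[ "$skip_size" -lt 0 ] && skip_size=0

# Copy only the tail of the log into a temporary file.
dd if=logfile_name bs=1k count=64 skip=$skip_size of=log_tmp

# Print the last non-empty line in the tail that contains the keyword.
grep 'keyword' log_tmp | awk 'NF{a=$0}END{print a}'
rm log_tmp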

To reduce the response time, I do not want to search the whole file, so the most efficient approach seems to be searching from the end of the file. However, the environment has no reverse-output command such as tac. My next idea was to cut the file into temporary pieces, but the environment frustrated me again: there is no split either.

Fortunately, I found dd, an old command that is used far less often than df and du. Its usage is summarized here.

mnemonics:
   ``dd`` :  disk divide
   ``du`` :  disk used
   ``df`` :  disk free

PS: The name dd may be an allusion to the DD statement found in IBM's Job Control Language (JCL), where the acronym stands for "Data Description".


Syntax
     dd [Options]

Key
   if=FILE
      Input file : Read from FILE instead of standard input.

   of=FILE
      Output file : Write to FILE instead of standard output.  Unless `conv=notrunc'
      is given, `dd' truncates FILE to zero bytes (or the size specified
      with `seek=').

   ibs=BYTES
      Read BYTES bytes at a time.

   obs=BYTES
      Write BYTES bytes at a time.

   bs=BYTES
      Block size, both read and write BYTES bytes at a time.  This overrides `ibs'
      and `obs'.

   skip=BLOCKS
      Skip BLOCKS `ibs'-byte blocks in the input file before copying.

   count=BLOCKS
      Copy BLOCKS `ibs'-byte blocks from the input file, instead of
      everything until the end of the file.
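To make the interaction between bs, skip and count concrete, here is a minimal sketch that builds a 16-byte sample file and copies single blocks out of it. The file name dd_demo.txt and the helper nth_block are made up for this illustration; 2>/dev/null only hides the transfer statistics that dd prints on standard error.

```shell
#!/bin/sh
# Hypothetical demo file: four 4-byte blocks, no trailing newline.
demo_file=dd_demo.txt
printf 'AAAABBBBCCCCDDDD' > "$demo_file"

# nth_block FILE BLOCK_SIZE INDEX
# Print the block at zero-based INDEX: skip INDEX blocks, then copy one.
nth_block() {
    dd if="$1" bs="$2" skip="$3" count=1 2>/dev/null
}

nth_block "$demo_file" 4 2; echo    # prints CCCC
nth_block "$demo_file" 8 1; echo    # prints CCCCDDDD
```

Both skip and count are measured in blocks, so changing bs changes how many bytes each of them covers. Remove dd_demo.txt once you are done with it.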

The numeric-valued options (BYTES and BLOCKS) can be followed by a multiplier: b=512, c=1, w=2, xM=M, or any of the standard block size suffixes like `k'=1024.

The notation xM=M can be confusing: M does not refer to megabytes. It stands for a number, so NxM means N multiplied by M. For a block size of 1 megabyte, you could write bs=1024x1024.
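A quick way to check how a block-size expression is interpreted is to copy exactly one block from /dev/zero and count the bytes. This is only a sketch: the helper name one_block_size is invented here, and it assumes that the local dd accepts the x multiplier and that /dev/zero, wc and awk exist on the system.

```shell
#!/bin/sh
# one_block_size SIZE_EXPR
# Copy a single block of SIZE_EXPR bytes from /dev/zero and print its length.
one_block_size() {
    dd if=/dev/zero bs="$1" count=1 2>/dev/null | wc -c | awk '{print $1}'
}

one_block_size 1k           # prints 1024
one_block_size 1024x1024    # prints 1048576
```

The awk step only strips the padding that some wc implementations put in front of the number.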

For me, skip is the key option: it lets me extract only the last part of the logging file. With split, I would have to wait until the whole file had been divided into several pieces.
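The same idea can be wrapped in a small reusable function. The sketch below is a variation on the script above, not a replacement for it: the names tail_kb and last_match are my own, it takes the file length from wc -c instead of du -k (assuming wc is available on the target), and it pipes the tail straight into grep instead of writing a temporary file.

```shell
#!/bin/sh
# tail_kb FILE KB
# Write at least the last KB kilobytes of FILE (up to 1023 bytes more)
# to standard output, starting on a 1 KB block boundary.
tail_kb() {
    file=$1
    want=$2
    # Byte length of the file (assumption: wc exists on the target).
    bytes=$(wc -c < "$file" | awk '{print $1}')
    # Whole 1 KB blocks in the file, minus the blocks we want to keep.
    skip=$(( bytes / 1024 - want ))
    # A file shorter than the requested tail is copied from the start.
    if [ "$skip" -lt 0 ]; then
        skip=0
    fi
    # No count= here: dd copies from the skip point to the end of the file.
    dd if="$file" bs=1k skip="$skip" 2>/dev/null
}

# last_match FILE KB KEYWORD
# Print the last non-empty line in the tail of FILE that contains KEYWORD.
last_match() {
    tail_kb "$1" "$2" | grep "$3" | awk 'NF{a=$0}END{print a}'
}
```

For example, last_match logfile_name 64 keyword searches only the final 64 KB or so of the log. Leaving out count also means that lines appended after the size was measured are still included.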
