Friday, August 29, 2008

Enhanced bhist source code

LSF is one of the popular batch schedulers in supercomputing. Like many other folks out there, I used LSF for a while until we switched to another scheduler. bhist is an LSF command that displays historical information about jobs.

I was assigned to develop a shell script that lists the jobs of a given user that consumed more than a given amount of wall clock time. So there are two input parameters: the wall clock time and the user ID. The output should be the list of jobs that used more than the given wall clock time, with detailed information about each job. Finally, the output should be suitable for printing and reporting.

This is what I came up with after a day or two. To align the columns for the printed layout, it includes lots of intentional tabs and white space.
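
As an aside, the same alignment can be had from printf with fixed-width fields instead of hand-counted tabs. This is only a minimal sketch: the function name print_row and the column widths are my own illustrative choices, not part of the original script.

```shell
#!/bin/sh
# Sketch only: print_row and the widths below are assumptions for illustration.
# Arguments: date jobid user job_name pend psusp run ssusp total
print_row() {
    # %-Ns left-aligns text columns, %Ns right-aligns numeric columns
    printf '%-15s  %-7s %-8s %-12s %6s %6s %7s %6s %7s\n' "$@"
}

print_row DATE JOBID USER JOB_NAME PEND PSUSP RUN SSUSP TOTAL
print_row "Aug 29 13:33:51" 16464 xxx test1 10 0 82 0 92
```

Because every field has a fixed width, the row stays aligned without special-casing the length of the job name, as long as each value fits in its column.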

Here is sample output:

Extended bhist

=====================================================================================
DATE            JOBID   USER    JOB_NAME        PEND    PSUSP   RUN     SSUSP   TOTAL
-------------------------------------------------------------------------------------
Aug 29 13:33:51 16464   xxx     test1           10      0       82      0       92
Aug 29 13:46:51 16465   xxx     test1           10      0       4402    0       4412
...
...
...
Aug 29 15:20:37 16471   xxx     test1           8       0       13639   0       13647
=====================================================================================
                                                Total CPU time: 569:53:52
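
The total in the footer is a number of seconds shown as HH:MM:SS. A minimal sketch of that conversion using shell arithmetic and printf follows; fmt_hms is a hypothetical helper name, not something from the script.

```shell
#!/bin/sh
# Sketch only: fmt_hms is a hypothetical name.
# Converts a number of seconds ($1) to HH:MM:SS, zero-padding each field
# to at least two digits.
fmt_hms() {
    total=$1
    printf '%02d:%02d:%02d\n' \
        $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

fmt_hms 2051632   # prints 569:53:52
```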


Source code

#!/bin/sh
#
# NAME : ebhist.new
# Display output of bhist with time & date information of each job of LSF.
#
# AUTHOR : Brian Kim
# Supercomputer Center
#
# DATE : SEPTEMBER 12, 2005
#

function print_title {
    echo $1 $2 $3
    echo "Extended bhist : " $1 $2 $3
    echo
}

function print_usage {
    echo "Usage: ebhist WALLTIME ACCOUNT"
    echo
    echo " Display job information of [ACCOUNT]"
    echo " which consumed more than [WALLTIME] second"
    echo
    echo "Example:"
    echo " ebhist 1000 guest"
    echo
}

function print_heading {
    # echo "\tDATE\tJOBID\tUSER\tJOB_NAME\tPEND\tPSUSP\tRUN\tUSUSP\tSSUSP\tUNKWN\tTOTAL"
    # echo "==============================================================================="
    echo "====================================================================================="
    echo "DATE\t\tJOBID\tUSER\tJOB_NAME\tPEND\tPSUSP\tRUN\tSSUSP\tTOTAL"
    echo "-------------------------------------------------------------------------------------"
    # echo "-------------------------------------------------------------------------------"
}

function print_footer {
    # Convert the total in seconds ($1) to zero-padded HH:MM:SS
    a=$1
    hh=`expr $a \/ 3600`
    tmp=`expr $a \% 3600`
    mm=`expr $tmp \/ 60`
    ss=`expr $tmp \% 60`
    if [ $ss -lt 10 ]
    then
        ss="0"$ss
    fi
    if [ $mm -lt 10 ]
    then
        mm="0"$mm
    fi
    if [ $hh -lt 10 ]
    then
        hh="0"$hh
    fi
    echo "====================================================================================="
    # echo "==============================================================================="
    echo "\t\t\t\t\t\tTotal CPU time: "$hh:$mm:$ss
    # echo $hh:$mm:$ss
}

function set_parameter {
    min_cpu_time=$1
    id=$2
}

# Start of program


print_title $0 $1 $2

if [ $# -ne 2 ]
then
    print_usage
    exit
fi

set_parameter $1 $2
print_heading

MYSUM=0
my_date=""
bhist -a -u $id | \
while read jobid user job_name pend psusp run ususp ssusp unkwn total
do
    # Skip blank lines and the header lines of the bhist output
    case $jobid in
        "") continue ;;
        JOBID) continue ;;
        Summary) continue ;;
    esac

    if [ $run -lt $min_cpu_time ]
    then
        continue
    else
        # Long job names need one tab fewer to keep the columns aligned
        if [ ${#job_name} -gt 7 ]
        then
            NUM_TAB='\t'
        else
            NUM_TAB='\t\t'
        fi
        MYSUM=`expr $MYSUM + $run`
        # echo $MYSUM $run
        # echo `bhist -l $jobid | grep Submitted | cut -d: -f1-3` "\t$jobid\t$user\t$job_name$NUM_TAB$pend\t$psusp\t$run\t$ususp\t$ssusp\t$unkwn\t$total"
        my_date=`bhist -l $jobid | grep Submitted | cut -c5-19`
        # wc -c counts the trailing newline, so a full 15-character date gives 16
        my_date_length=`echo $my_date | wc -c`
        if [ $my_date_length -lt 16 ]
        then
            PADDING=" "
        else
            PADDING=""
        fi

        echo $my_date$PADDING" $jobid\t$user\t$job_name$NUM_TAB$pend\t$psusp\t$run\t$ssusp\t$total"
    fi
done

print_footer $MYSUM
# End of program
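
One portability note on the total: in some shells (bash and dash, for example) the while loop on the right-hand side of a pipe runs in a subshell, so MYSUM would still be 0 when print_footer is called. ksh runs the last stage of a pipeline in the current shell, so the script behaves as intended there. A shell-independent way to get the total is to let awk do the filtering and summing. The sketch below is only an illustration: sum_long_runs is a hypothetical helper, and it assumes the same column order as the read statement in the script (RUN is the sixth field).

```shell
#!/bin/sh
# Sketch only: sum_long_runs is a hypothetical helper, not part of the script.
# Reads bhist-style summary lines on stdin and prints the sum of the RUN
# column for jobs whose RUN time is at least $1 seconds.
sum_long_runs() {
    awk -v min="$1" '
        # Skip blank lines and the header lines, as the case statement does
        $1 == "" || $1 == "JOBID" || $1 == "Summary" { next }
        # Field 6 is RUN, assuming the column order used by the script
        $6 + 0 >= min { sum += $6 }
        END { print sum + 0 }
    '
}
```

It would be used as `total=$(bhist -a -u "$id" | sum_long_runs "$min_cpu_time")`, after which the total is available in the main shell regardless of how the pipeline is executed.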