This document describes a procedure to attempt to recover from an IPL hang
at LED 553. It applies to AIX Versions 5 and 6.
- About LED 553
- Recovery procedure
- Sample /etc/inittab file for AIX Versions 5 and 6
- Sample /etc/environment file for AIX Versions 5 and 6
About LED 553
An LED value of 553 is a checkpoint code displayed to indicate the system's
transition to phase 3 of IPL. A halt or hang at LED 553 is often the result
of a corrupted or missing /etc/inittab file. It can also be caused by
full / (root) or /tmp file systems, or by inconsistencies in
startup configuration files, Object Data Manager (ODM) object class databases,
or system library files. Additionally, a number of other issues, such as
problems with file permissions or invalid hard links in the root file system,
have been observed to cause a hang at LED 553.
Recovery procedure
Summary of the recovery procedure
To attempt to isolate the cause of an LED 553 hang, start by checking the
root file systems with the fsck command. Then check /dev/hd3 and
/dev/hd4 for space problems, and erase files if necessary. Check the
/etc/inittab file for corruption, and fix it if necessary. If the
inittab file is not corrupted, check the shell profile and environment
files, the /bin/bsh file, and other system configuration files. Next, check
the consistency of all installed files within the installed fileset base and
update the boot image. To conclude, run the configuration manager to find out
if there is a hang during device configuration.
- Boot your system into a limited function maintenance shell (Service or
Maintenance mode) from AIX bootable media.
Please refer to your system user's or installation and service guide for
specific IPL procedures related to your type and model of hardware. You can
also refer to the document titled "Booting in Service Mode", available at
http://techsupport.services.ibm.com/server/aix.srchBroker for more
information.
- With bootable media of the same version and level as the system,
boot the system.
The bootable media can be any ONE of the following:
- Bootable CD-ROM
- mksysb
- Bootable Install Tape
Follow the screen prompts to the Welcome to Base OS menu.
- Choose Start Maintenance Mode for System Recovery (Option 3).
The next screen contains prompts for the Maintenance menu.
- Choose Access a Root Volume Group (Option 1).
The next screen displays a warning that indicates you will not be able to
return to Base OS menu without rebooting.
- Choose 0 to continue.
The next screen displays information about all volume groups on the system.
- Select the root volume group by number. The logical volumes in rootvg
will be displayed with two options below.
- Choose Access this volume group and start a shell before
mounting file systems (Option 2).
If you get errors from the preceding option, do not continue with the rest of
this procedure. Correct the problem causing the error. If you need assistance
correcting the problem causing the error, contact one of the following:
- local branch office
- your point of sale
- your AIX support center
If no errors occur, proceed with the following steps.
- Run the following series of commands to check and repair file systems.
fsck -p /dev/hd4
fsck -p /dev/hd2
fsck -p /dev/hd3
fsck -p /dev/hd9var
fsck -p /dev/hd1
NOTE: The -p option used above lets the fsck command fix minor problems
automatically without prompting. The -y option gives the fsck command broader
permission to repair file system corruption when necessary. The -y flag can be
used to avoid having to manually answer multiple confirmation prompts; however,
use of this flag can cause permanent data loss in some situations.
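The five checks above can also be run in a loop, so that a failure on one logical volume is recorded without skipping the rest. The following is a minimal sketch, not part of the original procedure. The function name check_rootvg_filesystems and the FSCK_CMD override variable are hypothetical, introduced here only for illustration.

```shell
# Hypothetical helper: run fsck on each rootvg logical volume named in
# the step above and report which ones return a nonzero exit status.
# FSCK_CMD is an assumed override so the command line can be changed
# (for example, to use -y instead of -p).
check_rootvg_filesystems() {
    failed=""
    for lv in hd4 hd2 hd3 hd9var hd1; do
        # Unquoted on purpose so "fsck -p" splits into command and flag.
        ${FSCK_CMD:-fsck -p} "/dev/$lv" || failed="$failed /dev/$lv"
    done
    if [ -n "$failed" ]; then
        echo "fsck reported problems on:$failed"
        return 1
    fi
    echo "all file systems checked cleanly"
}
```

Calling check_rootvg_filesystems from the maintenance shell would then print a single summary line. Any logical volume it lists should be rechecked by hand before continuing.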
- To format the default jfslog for the rootvg Journaled
File Systems (JFS), run the following command:
/usr/sbin/logform /dev/hd8
Answer yes when asked if you want to destroy the log.
- Type exit to exit from the shell. The file systems should
automatically mount after you type exit. If you receive error
messages at this point, reboot into a limited function maintenance shell
again to attempt to address the failure causes.
- Use the df command to check for free space in /dev/hd3 and
/dev/hd4.
df /dev/hd3
df /dev/hd4
- If the output from the df command shows that either file system is
out of space, erase some files from
that file system. Three files you may want to erase are /smit.log,
/smit.script and /.sh_history.
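When several file systems need checking, the df output can be filtered rather than read by eye. The sketch below assumes POSIX-format output, as produced by df -P, where the fifth column is the capacity percentage and the sixth is the mount point. The function name report_full_filesystems and its threshold argument are hypothetical.

```shell
# Hypothetical helper: read POSIX-format df output (as from "df -P") on
# standard input and print every file system whose Capacity column is at
# or above the given percentage (default 100).
report_full_filesystems() {
    awk -v limit="${1:-100}" '
        NR > 1 {
            pct = $5
            sub(/%$/, "", pct)          # strip the trailing percent sign
            if (pct + 0 >= limit + 0)
                print $6 " at " $5 " capacity (" $1 ")"
        }'
}
```

A possible invocation is df -P /dev/hd3 /dev/hd4 | report_full_filesystems 95, which would list only the file systems that are 95% full or more.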
- Next, check the /etc/inittab file for corruption. It may be empty or
missing, or it may have an incorrect entry. For comparison, see the section
"Sample /etc/inittab file" at the end of this document.
- If the inittab file is corrupt, set your terminal type in preparation for
editing the file. (xxx stands for a terminal type, such as lft,
ibm3151, or vt100.)
TERM=xxx
export TERM
Now use an editor to create the /etc/inittab file. For an example, see
the section "Sample /etc/inittab file" in this document.
If your /etc/inittab file was corrupt and you recreated it, the
following steps may not be necessary.
There are only three entries which must be in the /etc/inittab file to
successfully boot the system. If your /etc/inittab file is missing or
corrupted AND you are unable to use an editor while in Service mode, do the
following to create a minimal inittab file to boot the machine into run
level 2 (Normal mode).
mv /etc/inittab /etc/inittab.MMDDYY
touch /etc/inittab
chmod 544 /etc/inittab
chown root:system /etc/inittab
echo 'init:2:initdefault:' >> /etc/inittab
echo 'brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1' >> /etc/inittab
echo 'cons:0123456789:respawn:/usr/sbin/getty /dev/console' >> /etc/inittab
MMDDYY represents the current date as a two-digit month, day, and year,
in that order.
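After recreating the file, it can be worth confirming that the three required entries are actually present. The following sketch checks only for the init, brc and cons identifiers named above. The function name check_inittab_required is hypothetical, and the check does not validate the rest of each line.

```shell
# Hypothetical helper: confirm that an inittab-style file contains the
# three entries the text describes as required (init, brc and cons).
# Prints each missing identifier and returns nonzero if any is absent.
check_inittab_required() {
    file=${1:-/etc/inittab}
    missing=0
    for id in init brc cons; do
        # Match only lines whose first colon-separated field is the
        # identifier; lines starting with ":" are comments and never match.
        grep "^$id:" "$file" >/dev/null 2>&1 || {
            echo "missing entry: $id"
            missing=1
        }
    done
    return $missing
}
```

Run with no argument, the function reads /etc/inittab. A path can be passed to check a saved copy instead.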
- Use the following command to check for any modifications or problems with
permissions on shell startup files.
NOTE: The /.kshrc and /.profile files are not necessary for
the system to boot into run level 2 (Normal mode) and, in fact, may not
exist on your system.
ls -al /.kshrc /.profile /etc/environment /etc/profile
Sample output:
-rw-r--r-- 1 root system 71 Dec 14 1993 /.kshrc
-rw-r--r-- 1 root system 158 Dec 14 1993 /.profile
-rw-rw-r-- 1 root system 1389 Oct 26 1993 /etc/environment
-rw-r-xr-x 1 bin bin 1214 Jan 22 1993 /etc/profile
/etc/profile or /.profile may contain a command that is valid only in the
Korn shell. Change the command to something that is also valid in the Bourne
shell. For example, change the following:
export PATH=/bin:/usr/bin/:/etc:/usr/ucb:.
to the following:
PATH=/bin:/usr/bin/:/etc:/usr/ucb:.
export PATH
/etc/environment is a special case. The only commands it may contain are
simple variable assignments, such as statements of the form
(varname)=(value). Check this file with an editor to verify the
format. See the section "Sample /etc/environment file" at the end of
this document.
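The format rule for /etc/environment can also be checked mechanically. The following is a rough sketch that lists every line that is neither blank, a # comment, nor a simple name=value assignment. The function name find_non_assignments is hypothetical, and the pattern used for a valid variable name is an assumption.

```shell
# Hypothetical helper: print (with line numbers) every line of an
# environment-style file that is not blank, not a "#" comment, and not
# a simple NAME=value assignment.
find_non_assignments() {
    awk '
        /^[ \t]*$/ { next }                       # blank line
        /^[ \t]*#/ { next }                       # comment
        /^[A-Za-z_][A-Za-z0-9_]*=/ { next }       # simple assignment
        { print FILENAME ":" NR ": " $0 }
    ' "${1:-/etc/environment}"
}
```

Any line it prints is a candidate for closer inspection in an editor. No output means every line fits one of the three accepted forms.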
- Check for missing or moved files, or changed ownership/permissions with
the following command:
ls -al /bin /bin/bsh /bin/sh /lib /unix /u
Sample output:
lrwxrwxrwx 1 bin bin 8 Aug 5 1994 /bin -> /usr/bin
-r-xr-xr-x 3 bin bin 25622 4 Jun 4 1993 /bin/bsh
-r-xr-xr-x 3 bin bin 25622 4 Jun 4 1993 /bin/sh
lrwxrwxrwx 1 bin bin 8 Aug 5 1994 /lib -> /usr/lib
lrwxrwxrwx 1 bin bin 5 Aug 5 1994 /u -> /home
lrwxrwxrwx 1 root system 18 Aug 5 1994 /unix -> /usr/lib/boot/unix
If any of these files are missing, the problem may be a missing symbolic
link. Use the commands from the following list that correspond to the missing
links.
ln -s /usr/bin /bin
ln -s /usr/lib/boot/unix /unix
ln -s /usr/lib /lib
ln -s /home /u
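To decide which of the ln commands above are needed, each link can be tested in turn. The sketch below only reports; it does not create anything. The function name report_missing_links and its optional root-prefix argument are hypothetical.

```shell
# Hypothetical helper: for each link/target pair listed in the step
# above, report whether the link exists and, if not, print the ln
# command that would recreate it. The optional argument is an assumed
# path prefix (empty on a live system) for checking a copy of root.
report_missing_links() {
    prefix=${1:-}
    while read -r link target; do
        if [ -L "$prefix$link" ] || [ -e "$prefix$link" ]; then
            echo "ok: $link"
        else
            echo "missing: ln -s $target $link"
        fi
    done <<EOF
/bin /usr/bin
/unix /usr/lib/boot/unix
/lib /usr/lib
/u /home
EOF
}
```

The printed ln commands match the list above and can be run by hand once the output has been reviewed.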
- Use the following command to make sure that rc.boot is not
missing or corrupt.
ls -l /sbin/rc.boot
Sample output:
-rwxrwxr-- 1 root system 33760 Aug 30 1993 /sbin/rc.boot
- Make sure the /etc/inittab file is for AIX Version 5 or 6. For these versions, the line that begins with brc is:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1
See the section "Sample /etc/inittab file" in
this document for an example.
- If you have not found any obvious problems, try substituting ksh for bsh
with the following series of commands. (The first command saves your
bsh before you copy over it.)
cp /bin/bsh /bin/bsh.orig
cp /bin/ksh /bin/bsh
If you can then reboot successfully, this indicates that one of the profiles
was causing problems for bsh. Check the profiles again by running the
following:
/bin/bsh.orig /.profile
/bin/bsh.orig /etc/profile
/bin/bsh.orig /etc/environment
If you receive errors with any of the preceding commands, this indicates that
there is a command in that profile that bsh cannot handle.
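One form that the Bourne shell rejects, as described in the earlier step on shell startup files, is export combined with an assignment. A static scan for that pattern can complement running the profiles under bsh.orig. The sketch below is illustrative only: the function name find_ksh_exports is hypothetical, and it detects just this one construct, not every Korn-only command.

```shell
# Hypothetical helper: list lines that combine export with an
# assignment (export NAME=value), the Korn shell form the text says
# should be rewritten for the Bourne shell. Missing files are skipped.
find_ksh_exports() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        awk '/^[ \t]*export[ \t]+[A-Za-z_][A-Za-z0-9_]*=/ {
                 print FILENAME ":" FNR ": " $0
             }' "$f"
    done
}
```

A possible invocation is find_ksh_exports /.profile /etc/profile. Each reported line can then be split into a plain assignment followed by a separate export statement.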
- To run a checksum validation of all files in the installed fileset base
and a consistency check of the fileset installation, run the following commands:
lppchk -c
lppchk -v
lppchk -l
NOTE: These commands should not produce output. If they do, examine the
messages to assess whether they point to a potential cause of the hang.
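Because any output from these commands is significant, it can help to capture it per flag for later review. The following is a minimal sketch under stated assumptions. The function name run_lppchk_checks and the output directory argument are hypothetical, and the directory must have free space.

```shell
# Hypothetical wrapper: run the three lppchk checks from the step above,
# saving any output per flag so it can be reviewed afterwards.
# The optional argument is an assumed output directory (default /tmp).
run_lppchk_checks() {
    outdir=${1:-/tmp}
    for flag in c v l; do
        log="$outdir/lppchk.$flag.out"
        lppchk "-$flag" >"$log" 2>&1
        if [ -s "$log" ]; then
            echo "lppchk -$flag produced output; see $log"
        else
            rm -f "$log"        # nothing reported, so discard the file
        fi
    done
}
```

If the function prints nothing, all three checks were silent, which is the expected result described in the note above.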
- Determine the boot drive and update the boot image. First, run the
following command:
lslv -m hd5
Sample output:
hd5:N/A
LP PP1 PV1 PP2 PV2 PP3 PV3
0001 0001 hdisk0
The disk number under the PV1 column is the disk name you should
use to run the following two commands:
bosboot -ad /dev/hdisk0
bootlist -m normal hdisk0
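The disk name can be extracted from the lslv output instead of being copied by hand. The sketch below assumes the layout shown in the sample output: a header line beginning with LP, followed by mapping lines. The function name boot_disk_from_lslv is hypothetical.

```shell
# Hypothetical helper: read "lslv -m hd5" output on standard input and
# print the disk name found under the PV1 column of the first mapping
# line. Assumes the header line starts with "LP", as in the sample.
boot_disk_from_lslv() {
    awk '
        $1 == "LP" {
            # Locate the PV1 column in the header line.
            for (i = 1; i <= NF; i++) if ($i == "PV1") col = i
            next
        }
        col { print $col; exit }'
}
```

For example, disk=$(lslv -m hd5 | boot_disk_from_lslv) would set disk to hdisk0 for the sample output above. The value should be confirmed before it is passed to bosboot or bootlist.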
- To check the device configuration routines, run the following command,
which should identify any problems associated with them:
cfgmgr -vp 2
If the cfgmgr command hangs, this is likely the cause of the system hang.
You may be able to stop the command by pressing Ctrl-C; however, a reboot is
often required to get back into Service mode and continue troubleshooting the
problem.
- If your model has a mode select key, turn it to the Normal position.
- Attempt to reboot the system into Normal mode by running the following
command:
sync;sync;sync;reboot
If you followed all of the preceding steps and the system still stops at LED
553 during a reboot in Normal mode, you may want to consider reinstalling your
system from a recent backup. Isolating the cause of the hang could be
excessively time-consuming and may not be cost-effective in your operating
environment. Isolating the possible cause of the hang would require a debug
boot of the system. Instructions for doing this are included in the document
"Capturing Boot Debug", available at
http://techsupport.services.ibm.com/server/aix.srchBroker.
Even then, isolating the problem may show that a restore
or reinstall of AIX is necessary to correct it.
If you wish, you may pursue further system recovery assistance from one of the
following:
- local branch office
- your point of sale
- your AIX support center
Sample /etc/inittab file for AIX Versions 5 and 6
: US Government Users Restricted Rights - Use, duplication or
: disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
:
: Note - initdefault and sysinit should be the first and second entry.
:
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console # Power Failure Detection
load64bit:2:once:/etc/methods/cfg64 >/dev/console 2>&1 # Enable 64-bit execs
rc:2:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
fbcheck:2:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # run /etc/firstboot
srcmstr:2:respawn:/usr/sbin/srcmstr # System Resource Controller
rctcpip:2:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
rcnfs:2:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
cron:2:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pio/etc/pioinit >/dev/null 2>&1 # pb cleanup
uprintfd:2:respawn:/usr/sbin/uprintfd
logsymp:2:once:/usr/lib/ras/logsymptom # for system dumps
pmd:2:wait:/usr/bin/pmd > /dev/console 2>&1 # Start PM daemon
diagd:2:once:/usr/lpp/diagnostics/bin/diagd >/dev/console 2>&1
dt:2:wait:/etc/rc.dt
cons:0123456789:respawn:/usr/sbin/getty /dev/console
Sample /etc/environment file for AIX Versions 5 and 6
# @(#)18 1.21 src/bos/etc/environment/environment, cmdsh, bos430, ...
PATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/local/netscape:/usr/local/bin
TZ=CST6CDT
LANG=en_US
LOCPATH=/usr/lib/nls/loc
MOZILLA_HOME=/local/netscape
export MOZILLA_HOME
NLSPATH=/usr/lib/nls/msg/%L/%N:/usr/lib/nls/msg/%L/%N.cat
LC__FASTMSG=true
PS1='MYSYSTEM $PWD=>'
set -o vi
# ODM routines use ODMDIR to determine which objects to operate on
# the default is /etc/objrepos - this is where the device objects
# reside, which are required for hardware configuration
ODMDIR=/etc/objrepos