Question

Why did AIX shut down and reboot, or shut down and halt?

Answer

Why did AIX shut down?

- Introduction
- How does a system go down?
- How does a system boot up after going down?
- What system logs and commands can I use to investigate an unexpected
shut down?
- System shut down events
- Conclusion

Introduction
Sometimes an AIX operating system shuts down for no apparent reason,
with no evidence that any user initiated the shut down by running a
command. When this happens, the system should be investigated for clues
as to what caused the shut down. While it is possible for AIX to
perform a delayed shut down in response to a shutdown command with the
appropriate timing options, AIX will never shut itself down unless the
system crashes or there is a hardware failure. Even though AIX does not
shut itself down automatically, cluster management software such as IBM
HACMP, Oracle RAC, or Veritas Cluster Server might force a system down
under certain conditions.

If a system goes down unexpectedly, various AIX commands and system
logs can be used to find information about the cause. If a node in a
cluster goes down unexpectedly, the cluster manager logs should also be
reviewed to see if the shut down might have been initiated by the
cluster management software. If a system dump was generated during the
shut down, it should be analyzed by AIX Software Support to see what
the system was doing at the time the dump was captured.

How does a system go down?
An AIX system can be in one of three states: running, halted, or hung.
A running system that is fully operational should respond to commands.
A halted system is not running AIX, and power may or may not be turned
off on the machine. A hung system is running AIX but for some reason is
no longer responding to commands. The most common way for a system to
go down is for a user, script, or program with root authority to run
one of the AIX shut down commands. If the system is an LPAR, a user
with hscroot authority can also run a restart or shutdown command on
the HMC. A hung system might be powered off or reset by an operator.

Below is a list of events that can cause a running or hung system to
shut down and reboot, or shut down and halt:

- A user with root authority runs one of the shut down commands on
the command line to bring the system down immediately, or after a
specified period of time.
- A script or program running on the system with root authority
executes one of the shut down commands. This could be a script or
program started from a cron job or in inittab, or started from the
command line. Cluster management software is one example of this type
of software.
- A user with hscroot authority selects a menu option in WebSM or
on the newer HMC web application to restart or shut down an LPAR, or
uses the command line on the HMC to run a shutdown command. If
appropriate options are selected, a system dump will be generated if
the dump facility is properly configured.
- A user manually resets the system by pressing the reset button on
the front console, or by selecting specific functions on a system with
a multi-button front panel display. A system dump should be
written if the dump facility is properly configured.
- The system crashes due to an operating system defect.
- The system crashes due to a hardware malfunction.
- The system is powered off.
- The system loses power and a UPS does not provide power backup.

How does a system boot up after going down?

There are four basic boot types.

- Warm boot:
Booting a running system by performing a shut down and reboot in a
single operation. This type of boot is also known as a soft IPL
(Initial Program Load).
- Cold boot:
Booting a halted system.
- Timed boot:
Booting a halted system automatically after a specified period of time.
- Crash boot:
Automatic reboot of a system that has crashed.

When a system goes down for any reason other than catastrophic
hardware failure, it will either automatically reboot, or it will
remain in the halted state. If a system is brought down by one of the
shut down commands, it will only reboot if an appropriate reboot option
is included with the command, or if the command itself specifies that
the machine will be rebooted. For example, the -r flag of the shutdown
command causes the system to automatically reboot after the shut down,
and the reboot command shuts down a system and then automatically
reboots it. If a system goes down and then remains in the halted state,
the power button must be pressed on a stand-alone system to boot, or if
the system is an LPAR, the LPAR must be activated using the HMC.

If a system crashes, or if a user resets the system or runs the
sysdumpstart command to force a system dump, it will automatically
reboot if the autorestart flag is enabled. Use the following command to
view the current value of the autorestart flag (the -E flag lists the
values currently in effect, whereas -D lists only the defaults):

# lsattr -E -l sys0 | grep auto
autorestart true Automatically REBOOT system after a crash
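
When the same check has to be repeated on several systems, the value can be pulled out of the lsattr output with a small filter. The following is only a sketch: the function name autorestart_value is made up for illustration, and it assumes the "attribute value description" column order shown in the sample output above.

```shell
# Hypothetical helper: read lsattr output on stdin and print the value
# column of the autorestart attribute. Returns non-zero if the attribute
# is not present in the input.
autorestart_value() {
    awk '$1 == "autorestart" { print $2; found = 1 } END { exit !found }'
}

# Example use on AIX (-E lists the values currently in effect):
#   lsattr -E -l sys0 -a autorestart | autorestart_value
```

Reading from standard input keeps the filter independent of how the lsattr output was collected, so it can also be run against output saved from another system.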

The default value for this flag is true. To change this value, use the
following command, where value is true or false:

# chdev -l sys0 -a autorestart=value

What system logs and commands can I use to investigate an unexpected shut down?

A number of AIX system logs and commands can be used to investigate the
cause of an unexpected shut down. Also on HMC managed LPARs, logs are
maintained on the HMC that can provide information about shut down and
dump related commands that have been executed on the HMC. These logs
and commands should be used to help determine the cause of an
unexpected system shut down. Because the logs cannot be read until a
system has been booted, logs will usually show boot entries just
after the shut down entries from the most recent shut down.

Below is a list of the most useful system logs and commands that can be
used to investigate an unexpected shut down:

- AIX error log (read with the errpt command)
- /var/adm/wtmp account file (read with the last command)
- /var/adm/pacct account files (read with the lastcomm command)
- AIX console log (read with the alog -t console -o command)
- su log file (read with cat /var/adm/sulog)
- Shell history file (read with the fc command)
- /etc/shutdown.log file (read with cat /etc/shutdown.log)
- HMC log files (consult HMC documentation)
- AIX audit log
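
As a rough sketch, the AIX-side commands from the list above can be run in one pass so that their output ends up in a single place for review. The function name collect_shutdown_evidence is hypothetical. The shell history (fc), HMC logs, and audit log are left out because they depend on the interactive shell or on separate configuration.

```shell
# Hypothetical helper: run each log-reading command named in the list
# above and label its output. Commands that are missing or fail are
# reported instead of stopping the run.
collect_shutdown_evidence() {
    for cmd in "errpt" "last" "lastcomm" "alog -t console -o" \
               "cat /var/adm/sulog" "cat /etc/shutdown.log"; do
        echo "=== $cmd ==="
        # ${cmd%% *} is the command name without its arguments.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            # $cmd is intentionally unquoted so that it splits into words.
            $cmd 2>&1 || echo "(command failed or log not present)"
        else
            echo "(command not available on this system)"
        fi
    done
}

# Example use, run as root (the output file name is only an illustration):
#   collect_shutdown_evidence > /tmp/shutdown_evidence.txt
```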

The AIX error report

The AIX error log can be read with the errpt command. This log contains
many different types of error and informational entries. Some of the
most useful entries for providing information about a system shut down
and reboot are REBOOT_ID, ERRLOG_ON, ERRLOG_OFF, SYS_RESET, DUMP_STATS,
and MINIDUMP_LOG. Some of the entries that will be logged if a system
crashes due to a software problem are DSI_PROC, ISI_PROC, and
PROGRAM_INT. Some of the entries that might be logged if a system
crashes due to a hardware malfunction are SCAN_ERROR_CHRP and SCANOUT.

- REBOOT_ID
This entry is written into the error log whenever a system boots, so it
is logged for both warm boots and cold boots. The Detail Data section
in the entry specifies whether the boot was warm, cold, or timed.

---------------------------------------------------------------------------
LABEL:           REBOOT_ID
IDENTIFIER:      2BFA76F6

Date/Time:       Sun Nov 23 13:45:12 CST 2008
Sequence Number: 199
Machine Id:      0002FBB2D900
Node Id:         vegas
Class:           S
Type:            TEMP
Resource Name:   SYSPROC

Description
SYSTEM SHUTDOWN BY USER

Probable Causes
SYSTEM SHUTDOWN

Detail Data
USER ID
           0
0=SOFT IPL 1=HALT 2=TIME REBOOT
           0
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
           0
---------------------------------------------------------------------------

The Date/Time is the time that the entry was logged into the error
report at the beginning of the boot process, and so is a good
approximation of the time when the system was booted. If the system was
warm booted, meaning that a running system was rebooted with one of the
reboot commands, this time stamp will be later than the time that the
reboot command was actually executed, because of the additional time
required for the system to shut down before the reboot. There is no
dedicated entry in the error report for a normal system shut down that
gives the precise time that the shut down command was executed.
However, under normal circumstances the AIX error log is turned off
when a system is shutting down, so the ERRLOG_OFF entry, described
below, can be used to approximate the time that the shut down was
initiated. Also, the last command, described below, can be used to read
the shutdown record to find the exact time that a system was shut down.
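
The boot type code in the Detail Data of a REBOOT_ID entry can be extracted and translated with two small helpers. This is a sketch under stated assumptions: the function names reboot_codes and boot_type are made up, and the parser assumes the code appears on the line directly after the "0=SOFT IPL 1=HALT 2=TIME REBOOT" legend, as in the sample entry above.

```shell
# Hypothetical helper: read detailed error report text on stdin and print
# the boot type code that follows each "0=SOFT IPL ..." legend line.
reboot_codes() {
    awk '/0=SOFT IPL/ { if ((getline) > 0) print $1 }'
}

# Hypothetical helper: translate a boot type code into words.
boot_type() {
    case "$1" in
        0) echo "warm boot (soft IPL)" ;;
        1) echo "cold boot (system was halted)" ;;
        2) echo "timed boot" ;;
        *) echo "unknown boot code: $1"; return 1 ;;
    esac
}

# Example use on AIX:
#   errpt -a -J REBOOT_ID | reboot_codes | while read code; do
#       boot_type "$code"
#   done
```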

The boot type is displayed in the Detail Data section.

- 0=SOFT IPL:
Warm boot, meaning that a running system was shut down and rebooted.
- 1=HALT:
Cold boot, meaning that a halted system was booted.
- 2=TIME REBOOT:
Timed boot, meaning that a halted system was booted automatically after
a specified period of time.

- ERRLOG_OFF, ERRLOG_ON
The AIX error logging system is turned off whenever a system is shut
down normally using any of the shut down commands, except for
sysdumpstart. It is always turned on when the system is booted. So if a
system is shut down and rebooted in the normal way, the error log
should contain an ERRLOG_OFF that is written when the system is
shutting down, and then an ERRLOG_ON that is written after the system
boots back up. The time stamp on the ERRLOG_OFF entry will approximate
the actual time that the shut down was initiated, and the time stamp on
the ERRLOG_ON will approximate the actual time the system was rebooted.
If a system shuts down in the normal way, these two entries will
normally exist in the error report, one right after the other. But they
can also be written one after the other if error logging is manually
turned off and then immediately turned back on.

- DUMP_STATS
This entry is written into the error log to show that a system dump was
attempted. If the dump facility is properly configured, a system dump
will be captured when a system crashes, or when a dump is forced by a
user. Whenever a dump is written, the system is shut down immediately
with no warnings, and running processes are killed rather than
terminated in an orderly way. The time stamp on this entry is the time
that the entry was written into the error log after the system was
rebooted, and so is not the time that the system actually went down. A
second time stamp in the detail section of the entry reports the time
that the system dump was started, and this would be the time when the
system crashed, or was reset. Contact AIX Software Support for
assistance with analyzing a system dump to determine the reason why the
dump was created.

- SYS_RESET
This entry is written into the error report when a system is manually
reset by pressing the reset button or function buttons on the front
panel, or by selecting the Restart menu on an HMC. When a system is
reset, a system dump will be created if the dump facility is properly
configured. The SYS_RESET entry is not written into the error report
when the sysdumpstart command is executed.

- DSI_PROC, ISI_PROC, PROGRAM_INT
These entries are written into the error report when a system crashes
due to some type of defect in the kernel, kernel extensions, or device
drivers. If the system dump facility is properly configured, a
DUMP_STATS entry should also be written into the error report at about
the same time as one of these entries. Contact AIX Software Support for
assistance with these types of errors.

- SCAN_ERROR_CHRP, SCANOUT
These are hardware related entries and might be logged into the error
report if the system crashes due to a hardware malfunction, or if
there is an unexpected loss of power. Contact IBM Hardware Support for
assistance with these types of errors.

The /var/adm/wtmp account file

This binary file is used to store various types of login information.
One type of information stored in this file is user login records.
These records document the user name and time of login. Pseudo user
names are used for shutdown and reboot. So when a system is shut down
using one of the shut down commands, a record with the user name
shutdown will be logged into the wtmp file. Similarly, when a system is
booted, a record with the user name reboot will be written into the
wtmp file. Some shut down commands have flags that can be used to
suppress login records in the wtmp file.

Note: Technically a reboot is a warm boot, but the pseudo user name
reboot is written into the wtmp file for both warm boots and cold boots.
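
Because the pseudo user names appear in the user column, the shutdown and boot records can be separated from ordinary login records with a simple filter on the output of the last command. This is a minimal sketch: the function name boot_records is made up, and it assumes the user name is the first whitespace-delimited field of each record.

```shell
# Hypothetical helper: keep only the records written under the reboot
# and shutdown pseudo user names. Reads "last" output on stdin.
boot_records() {
    awk '$1 == "reboot" || $1 == "shutdown"'
}

# Example use on AIX:
#   last | boot_records
```

A boot record with no shutdown record before it suggests that the system went down without one of the logging shut down commands, for example after a crash or a reset.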

The wtmp file can be read by using the last command. Below is example
output from the last command. Descriptions of the records are contained
within parentheses.

# last
root      pts/0   sig-9-65-19-99.mts.ibm.com   Nov 23 13:50   still logged in.
(a root user logged in at 13:50)
reboot    ~                                    Nov 23 13:45
(the system was booted at 13:45)
shutdown  pts/0                                Nov 23 13:44
(the system was shut down at 13:44)
root      pts/0   sig-9-65-19-99.mts.ibm.com   Nov 23 13:43 - System is halted by system administrator. (00:00)
(a root user logged back in at 13:43 and, after 00:00 minutes, or
almost immediately, shut down the system)
root      pts/0   sig-9-65-19-99.mts.ibm.com   Nov 23 13:29 - 13:43  (00:13)
(a root user logged in at 13:29 and logged out at 13:43, for a total
login time of about 13 minutes)

The /var/adm/pacct account files

Files in the /var/adm/pacct directory store information about the last
commands that have been executed on the system. These files are read
with the lastcomm command. This command displays information, in
reverse chronological order, about all of the previously executed
commands that are still recorded in the files in the /var/adm/pacct
directory. The /usr/sbin/acct/startup command must be executed before
the lastcomm command can be used. The startup command does not persist
across a reboot, so to keep command recording active at all times, add
the startup command to /etc/inittab or some other startup script. The
shut down commands themselves, such as shutdown, reboot, and halt, are
not recorded. However, other commands that are executed during the shut
down process are recorded. When a script is executed, all commands
called within the script are logged, so the lastcomm command output can
be difficult to read. See the man page for more information about this
command.

The AIX console log

The AIX console log is a binary log file that can be read with the
following command:

# alog -t console -o

A number of AIX system processes log information into the console log
when starting up during the boot process, and when shutting down during
the shut down process. Time stamps are written with each entry, so this
log can contain valuable information that can be used to investigate an
unexpected shut down.

su log file

The su log file is used to log attempts to become a superuser. This log
can be useful when trying to track down who might have gained root
access to shut down a system. The su log file is located in
/var/adm/sulog and has messages that look like this:

# cat /var/adm/sulog
SU 07/08 10:57 + pts/0 root-root
SU 07/11 12:44 + pts/0 root-nobody
SU 07/25 16:37 + pts/5 dcoca-root
SU 09/11 10:21 + pts/1 mrj1-root

Shell history file

If the root account uses a shell that supports a history file, this
file can be used to view a history of commands that were executed by a
root user. The Korn shell writes the history to the file named in the
HISTFILE environment variable ($HOME/.sh_history by default). Of
course, it is possible for a user who has gained root access to disable
the file temporarily before running commands. The Korn shell history
file is read with the fc command.

/etc/shutdown.log

This log file is created or appended to if the -l option is used with
the shutdown command. The file contains a time stamp to show the time
of the shut down. It also logs the shut down of specific subsystems
such as syslogd, the unmounting of file systems, and bringing down
network interfaces. Here is example output from this log file:

# cat /etc/shutdown.log
Sun Nov 30 11:45:31 CST 2008
shutdown: THE SYSTEM IS BEING SHUT DOWN NOW

User(s) currently logged in:
root

Stopping some active subsystems...

0513-044 The syslogd Subsystem was requested to stop.
0513-044 The hostmibd Subsystem was requested to stop.
0513-044 The snmpmibd Subsystem was requested to stop.
...

Unmounting the file systems...

/lgfs unmounted successfully.
/download unmounted successfully.
...
umount: 0506-349 Cannot unmount /dev/hd3: The requested resource is busy.

Bringing down network interfaces:

detached en0 from the network interface list
detached lo0 from the network interface list

HMC system logs

The HMC maintains a system log file that records information about
commands that have been executed on the HMC, or through a web interface
such as WebSM or the newer HMC web application. If an LPAR is shut
down, restarted, halted, or dumped using the HMC, the HMC log should
contain a record of the command and the time the command was executed.
Consult your HMC documentation for details about how to access this
log.

AIX audit log

AIX includes an auditing subsystem that can be used to log
information about commands that have been executed on the system. If a
system is going down repeatedly due to one of the shut down commands,
auditing can be enabled to help provide information about who or what
is executing the command. For an overview of the AIX auditing
subsystem, see technote T1000212.

System shut down events

The table below contains a list of the most common events that cause a
system to shut down and reboot, or shut down and halt. Note that only
the most commonly used command options are listed. Consult the AIX man
pages and HMC manuals for more comprehensive documentation.

Event: shutdown, shutdown -h, shutdown -v

Description:
Shuts down a running system with multiple users in an orderly way, and
then halts. Notifies users with the wall command of the impending shut
down. If this command is used on a system with software control of the
power supply, power will be turned off.
Note: All three of these commands shut down the system essentially the
same way, and generate identical entries in AIX logs.

Logs:

Error Report
  LABEL: REBOOT_ID
  0=SOFT IPL 1=HALT 2=TIME REBOOT
  1
  ----------
  LABEL: ERRLOG_ON
  ----------
  LABEL: ERRLOG_OFF

wtmp
  reboot    ~                 Nov 23 15:20
  shutdown  vty0              Nov 23 15:16
  root      pts/0   hostname  Nov 23 15:13 - 15:16 (00:03)

Event: shutdown -r, shutdown -Fr

Description:
Shuts down a running system with multiple users in an orderly way and
then calls the reboot command to reboot the system. Notifies users with
the wall command of the impending shut down, unless the -F flag is
used, in which case the system is shut down as quickly as possible with
no user notification.

Logs:

Error Report
  LABEL: REBOOT_ID
  0=SOFT IPL 1=HALT 2=TIME REBOOT
  0
  ----------
  LABEL: ERRLOG_ON
  ----------
  LABEL: ERRLOG_OFF

wtmp
  reboot    ~                 Nov 23 16:33
  shutdown  vty0              Nov 23 16:32
  root      pts/0   hostname  Nov 23 16:28 - 16:32 (00:04)

Event: shutdown -l

Description:
The -l option can be used alone or added to other options to create or
append to the AIX system log file /etc/shutdown.log. This option can be
used to debug problems with the shut down process.

Logs:

/etc/shutdown.log
  Sun Nov 30 11:45:31 CST 2008
  shutdown: THE SYSTEM IS BEING SHUT DOWN NOW

  User(s) currently logged in:
  root

  Stopping some active subsystems...

  0513-044 The syslogd Subsystem was requested to stop.
  ...

Note: The Error Report and wtmp entries are the same as above,
depending on options used in addition to the -l option.

Event: reboot, fastboot

Description:
Shuts down a running system in an orderly way and then reboots. This
command should not be used if other users are logged into the system.
Use shutdown -r instead.
Note: fastboot is identical to reboot and is provided for BSD
compatibility.

Logs:

Error Report
  LABEL: REBOOT_ID
  0=SOFT IPL 1=HALT 2=TIME REBOOT
  0
  ----------
  LABEL: ERRLOG_ON
  ----------
  LABEL: ERRLOG_OFF

wtmp
  reboot    ~                 Nov 23 13:45
  shutdown  vty0              Nov 23 13:44
  root      pts/0   hostname  Nov 23 13:43 - System is halted by system administrator. (00:00)

Event: reboot -l
Note: the -n and -q options imply -l

Description:
Shuts down a running system in an orderly way and then reboots, but
does not log a shutdown record in the /var/adm/wtmp accounting file.
This command should not be used if other users are logged into the
system. Use shutdown -r instead.
Note: The -l option should normally not be used by a system
administrator. It is intended for other commands such as shutdown -r
that call the reboot command but log an entry in wtmp themselves.

Logs:

Error Report
  LABEL: REBOOT_ID
  0=SOFT IPL 1=HALT 2=TIME REBOOT
  0
  ----------
  LABEL: ERRLOG_ON
  ----------
  LABEL: ERRLOG_OFF

wtmp
  reboot    ~                 Dec 06 14:29
  root      pts/0   hostname  Dec 06 14:22 - System halted abnormally. (00:06)

Event: reboot -nq
Note: the -l option is implied

Description:
Shuts down a running system as quickly as possible and then reboots,
but does not log a shutdown record in the /var/adm/wtmp accounting
file. Does not call sync to flush file buffers and does not send
processes a SIGTERM. Normally this command should not be used by a
system administrator.
Note: This command is sometimes used by cluster management software
such as Oracle RAC to evict a node as quickly as possible to preserve
the integrity of the database.

Logs:

Error Report
  LABEL: REBOOT_ID
  0=SOFT IPL 1=HALT 2=TIME REBOOT
  0
  ----------
  LABEL: ERRLOG_ON

wtmp
  reboot    ~                 Dec 05 11:32
  root      pts/0   hostname  Dec 05 11:03 - System halted abnormally. (00:29)

Note: There is no ERRLOG_OFF entry in the error report because the -q
option causes the reboot command to shut down immediately without
sending processes a SIGTERM to shut them down in an orderly way.

Event: halt, fasthalt

Description:
Shuts down a running system in an orderly way and then halts. This
command should not be used if other users are logged into the system.
Use shutdown -h instead.
Note: fasthalt is identical to halt and is provided for BSD
compatibility.

Logs:

Error Report
  LABEL: REBOOT_ID
  0=SOFT IPL 1=HALT 2=TIME REBOOT
  1
  ----------
  LABEL: ERRLOG_ON
  ----------
  LABEL: ERRLOG_OFF

wtmp
  reboot    ~                 Nov 23 14:35
  shutdown  vty0              Nov 23 14:32
  root      pts/0   hostname  Nov 23 14:30 - System is halted by system administrator. (00:01)

Event: halt -l
Note: the -n and -q options imply -l

Description:
The same as a halt with no options, except that a shutdown record will
not be logged in the /var/adm/wtmp accounting file.
Note: The -l option should normally not be used by a system
administrator. It is intended for other commands such as shutdown -h
that call the halt command but log an entry in wtmp themselves.

Logs:

Error Report
  LABEL: REBOOT_ID
  0=SOFT IPL 1=HALT 2=TIME REBOOT
  1
  ----------
  LABEL: ERRLOG_ON
  ----------
  LABEL: ERRLOG_OFF

wtmp
  reboot    ~                 Dec 06 16:08
  root      pts/0   hostname  Dec 06 15:51 - 16:04 (00:12)

Event: sysdumpstart -p

Description:
Immediately stops AIX and initiates a system dump to the primary dump
device if the dump facility is properly configured. Afterwards the
system will automatically reboot if the autorestart flag is true.
Otherwise the system will halt.
Note: This command is sometimes used by cluster management software
such as Oracle RAC to evict a node and create a system dump. If a node
in an Oracle RAC is evicted with the sysdumpstart command, contact
Oracle Support and IBM AIX Support for assistance with analyzing the
system dump.

Logs:

Error Report
  LABEL: DUMP_STATS
  Description
  SYSTEM DUMP
  User Causes
  SYSTEM DUMP REQUESTED BY USER
  Detail Data
  DUMP DEVICE
  /dev/lg_dumplv
  DUMP SIZE
  39740416
  TIME
  Sun Dec 7 09:45:21 2008
  ...
  ----------
  LABEL: MINIDUMP_LOG
  ----------
  LABEL: ERRLOG_ON

wtmp
  reboot    ~                 Dec 07 09:52
  root      pts/0   hostname  Dec 07 09:44 - System halted abnormally. (00:07)

# sysdumpdev -L
0453-039
Device name:         /dev/lg_dumplv
Size:                39740416 bytes
Uncompressed Size:   398315509 bytes
Date/Time:           Sun Dec 7 09:45:21 CST 2008
Dump status:         0
dump completed successfully

Note: No REBOOT_ID entry is logged in the error report. Also there is
no ERRLOG_OFF entry in the error report because this command shuts down
the system immediately without sending processes a SIGTERM to shut them
down in an orderly way. No shutdown record is logged in the wtmp file.

Event: Forced reset on system front panel
OR
Restart command with the dump option is executed on an HMC

Description:
The system is reset by pressing the reset button on the front panel of
the machine. Or if the machine has function buttons, the system is
reset by executing one or more functions on the front panel. Initiates
a system dump to the primary dump device if the dump facility is
properly configured. The system will automatically reboot if the
autorestart flag is set to true.
If the system is an LPAR managed by an HMC, the system is reset by
running the Restart command with the dump option selected on the HMC.

Logs:

Error Report
  LABEL: DUMP_STATS
  Description
  SYSTEM DUMP
  User Causes
  SYSTEM DUMP REQUESTED BY USER
  Detail Data
  DUMP DEVICE
  /dev/lg_dumplv
  DUMP SIZE
  56760832
  TIME
  Wed Dec 3 08:14:41 2008
  ...
  ----------
  LABEL: MINIDUMP_LOG
  ----------
  LABEL: SYS_RESET
  Description
  SYSTEM RESET INTERRUPT RECEIVED
  ----------
  LABEL: ERRLOG_ON

wtmp
  reboot    ~                 Dec 03 08:21
  root      pts/0   hostname  Dec 03 08:04 - 08:12 (00:07)

# sysdumpdev -L
0453-039
Device name:         /dev/lg_dumplv
Size:                56760832 bytes
Uncompressed Size:   399727524 bytes
Date/Time:           Wed Dec 3 08:14:41 CST 2008
Dump status:         0
dump completed successfully

Note: No REBOOT_ID entry is logged in the error report. Also there is
no ERRLOG_OFF entry in the error report because a system reset shuts
down the system immediately without sending processes a SIGTERM to shut
them down in an orderly way. No shutdown record is logged in the wtmp
file.

Event: Software system crash

Description:
A software system crash or kernel panic is most often caused by some
type of problem in the kernel, kernel extensions, or device drivers. If
the system dump facility is properly configured, a system dump will be
created. The system will automatically reboot if the autorestart flag
is set to true.
If a system crashes, contact IBM AIX Support for assistance.

Logs:

Error Report
  LABEL: DUMP_STATS
  Description
  SYSTEM DUMP
  User Causes
  SYSTEM DUMP REQUESTED BY USER
  Detail Data
  DUMP DEVICE
  /dev/hd7
  DUMP SIZE
  189803520
  TIME
  Fri Oct 17 02:02:02 2008
  ...
  Note: The DUMP_STATS entry might report that the system dump was
  requested by user, even though the system crashed.
  ----------
  LABEL: MINIDUMP_LOG
  ----------
  LABEL: PROGRAM_INT
  OR
  LABEL: DSI_PROC
  OR
  LABEL: ISI_PROC
  ----------
  LABEL: ERRLOG_ON

wtmp
  reboot    ~                 Oct 17 02:05

# sysdumpdev -L
0453-039
Device name:         /dev/hd7
Size:                189803520 bytes
Uncompressed Size:   6548975244 bytes
Date/Time:           Fri Oct 17 02:02:02 CST 2008
Dump status:         0
dump completed successfully

Note: No REBOOT_ID entry is logged in the error report. Also there is
no ERRLOG_OFF entry in the error report because a system crash shuts
down the system immediately without sending processes a SIGTERM to shut
them down in an orderly way. No shutdown record is logged in the wtmp
file.

Event: Hardware system crash

Description:
A hardware system crash is caused by some type of hardware failure.
Even if the system dump facility is properly configured, depending on
the type of hardware failure, a system dump might not be created. The
system will automatically reboot if the autorestart flag is set to
true, and if the hardware is operational enough to allow the system to
boot.
If a system crashes due to hardware failure, contact IBM Hardware
Support for assistance.

Logs:

Error Report
  LABEL: DUMP_STATS
  Description
  SYSTEM DUMP
  User Causes
  SYSTEM DUMP REQUESTED BY USER
  Detail Data
  DUMP DEVICE
  /dev/lg_dumplv
  DUMP SIZE
  284503126
  TIME
  Sat Oct 18 04:01:05 2008
  ...
  Note: The DUMP_STATS entry might report that the system dump was
  requested by user, even though the system crashed.
  ----------
  LABEL: MINIDUMP_LOG
  ----------
  LABEL: SCAN_ERROR_CHRP
  OR
  LABEL: SCANOUT
  OR
  possibly other hardware related entries
  ----------
  LABEL: ERRLOG_ON

wtmp
  reboot    ~                 Oct 18 04:05

# sysdumpdev -L
0453-039
Device name:         /dev/lg_dumplv
Size:                284503126 bytes
Date/Time:           Sat Oct 18 04:01:05 CST 2008
Dump status:         0
dump completed successfully

Note: No REBOOT_ID entry is logged in the error report. Also there is
no ERRLOG_OFF entry in the error report because a system crash shuts
down the system immediately without sending processes a SIGTERM to shut
them down in an orderly way. No shutdown record is logged in the wtmp
file.

Event: Loss of power

Description:
A power failure occurs and a UPS does not provide backup power. Or, the
power button is pressed without first shutting the system down with one
of the shut down commands.

Logs:

Error Report
  LABEL: SCAN_ERROR_CHRP
  Note: The reference code in this entry will indicate an unexpected
  loss of power.

Event: HMC menu commands such as:
  Operations:Activate
  Operations:Shutdown
  Operations:Restart
OR
HMC command line commands

Description:
An LPAR can also be shut down and rebooted using commands on the HMC.
For example, commands can be executed using WebSM or the newer HMC web
application that will shut down and halt an LPAR, shut down and reboot
an LPAR, or initiate a system dump on an LPAR. Discussion of these
methods is beyond the scope of this document. Consult your HMC
documentation for details.

Logs:

HMC Console Log

hscroot@pkdahmc5:~> lssvcevents -t console
time=03/12/2008 08:04:34,text=HSCE2174 User hscroot Login from remote host pcp684467pcs.central.sprint with IP address 10.86.10.151 was successful.
time=03/06/2008 13:43:22,text=HSCE2016 User name hscroot Logical Partition dkda0177 with ID 1 of managed system 9133-55A*10C104G has been activated with profile pkda0177.
time=03/06/2008 13:43:21,text=HSCE2245 User name 1: Activating the partition 9133-55A*10C104G succeeded on managed system {2}.
time=03/06/2008 13:42:14,text=HSCE2121 User name hscroot: Immediate shut down executed successfully on partition dkda0177 with ID 1 on the managed system Server-9133-55A-SN10C104G.
time=03/06/2008 13:42:14,text=HSCE2254 User name 1*9133-55A*10C104G: Dump to load source for partition 9133-55A*10C104G succeeded on managed system {2}.

Note: The log output above is presented only as an example. Consult
your HMC documentation for details about how to view HMC logs.

Note: Some of the shut down and restart commands on the HMC have
options that will cause the HMC to send AIX shut down commands to the
LPAR. These options are documented in the HMC interface with the text
"Operating System". If an Operating System command is executed on the
HMC, the AIX system logs will be the same as if the command had been
executed directly with the Unix command line on the LPAR.

Conclusion

If AIX shuts down unexpectedly, there are a number of log files and
commands that can be used to investigate the cause. AIX does not shut
itself down on its own, but software running on AIX might initiate a
shut down. This document shows that there are a number of ways for a
system to go down, and provides information about system log files and
commands that can help to determine the cause. If a node in a cluster
shuts down unexpectedly, review the cluster manager logs to see if
there are any entries related to the shut down that might provide
additional information about the cause. If a system dump was created
after the shut down, contact AIX Software Support for assistance with
analyzing the system dump.
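
The error report signatures listed for each event can be condensed into a rough first-pass triage. The sketch below is illustrative only: the function name classify_shutdown and the wording of its results are made up, and it assumes the caller supplies only the error labels logged around a single shut down and boot, one label per line. It does not replace analysis of a system dump by AIX Software Support.

```shell
# Hypothetical helper: read error labels on stdin (one per line) and print
# a rough guess at the kind of shut down, based on the label combinations
# described in this document.
classify_shutdown() {
    labels=$(cat)
    # has LABEL: succeeds if LABEL appears as a whole line in the input.
    has() { printf '%s\n' "$labels" | grep -qx "$1"; }

    if has SCAN_ERROR_CHRP || has SCANOUT; then
        echo "hardware crash or power loss"
    elif has DSI_PROC || has ISI_PROC || has PROGRAM_INT; then
        echo "software crash"
    elif has SYS_RESET; then
        echo "manual reset"
    elif has DUMP_STATS; then
        echo "forced dump"
    elif has REBOOT_ID && has ERRLOG_OFF; then
        echo "orderly shut down"
    elif has REBOOT_ID; then
        echo "immediate reboot without orderly shut down"
    else
        echo "undetermined"
    fi
}

# Example use on AIX, after saving the detailed entries of interest to a
# file (the file name is only an illustration):
#   awk '/^LABEL:/ { print $2 }' entries.txt | classify_shutdown
```

The order of the checks matters: crash and reset signatures are tested first because DUMP_STATS and ERRLOG_ON also appear alongside them.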