Archive for

May, 2012

...

List WWPN of all LPARs or a specific LPAR using HMC command


Question:

How do I list the WWPNs of all LPARs, or of a specific LPAR, using an HMC command?

 

Solution:

 

To list the WWPNs of all LPARs on a managed system:

 

MS=Dev-BOX-9117-MMB-SN129SBCP
lshwres -r virtualio --rsubtype fc -m $MS --level lpar -F lpar_name,slot_num,wwpns --header |grep -v null
lpar_name,slot_num,wwpns
devaix101,5,"c05076036cfc002e,c05076036cfc002f"
devaix101,4,"c05078044abc002c,c05078044abc002d"
devaix101,3,"c05078044abc002a,c05078044abc002b"
devaix101,2,"c05078044abc0028,c05078044abc0029"
devaix102,5,"c05078044abc0538,c05078044abc0539"
devaix102,4,"c05078044abc0536,c05078044abc0537"
devaix102,3,"c05078044abc0534,c05078044abc0535"
devaix102,2,"c05078044abc0532,c05078044abc0533"
devaix103,5,"c05078044abc00c6,c05078044abc00c7"
devaix103,4,"c05078044abc00c4,c05078044abc00c5"
devaix103,3,"c05078044abc00c2,c05078044abc00c3"
devaix103,2,"c05078044abc00c0,c05078044abc00c1"

 

Note that slot_num is useful for identifying the FC adapter on AIX.

For example:

On server devaix103

lsdev -Ccadapter|grep fcs
fcs0 Available C2-T1 Virtual Fibre Channel Client Adapter
fcs1 Available C3-T1 Virtual Fibre Channel Client Adapter
fcs2 Available C4-T1 Virtual Fibre Channel Client Adapter
fcs3 Available C5-T1 Virtual Fibre Channel Client Adapter

 

The location of fcs0 is C2, which means its slot_num is 2.
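
The slot-to-location mapping can also be applied to the HMC output directly. The following is a minimal sketch, assuming the CSV layout shown above (lpar_name,slot_num,"wwpn1,wwpn2") and assuming the client adapter's location code is C followed by slot_num, as in the devaix103 example. The function name wwpn_by_location is hypothetical, and the script is meant to run wherever awk is available, for example against output saved from the HMC.

```shell
# Hypothetical helper: reads "lpar_name,slot_num,wwpns" CSV on stdin and
# prints "lpar location wwpn1 wwpn2", where location is C<slot_num>.
wwpn_by_location() {
    # Split on runs of commas and double quotes so the quoted WWPN pair
    # becomes two separate fields.
    awk -F'[,"]+' '
        $1 == "lpar_name" { next }    # skip the --header line
        NF >= 4 { printf "%s C%s %s %s\n", $1, $2, $3, $4 }
    '
}

# Example (assumes the lshwres output was saved to a file first):
#   wwpn_by_location < /tmp/lshwres_fc.csv
```

Each output line can then be matched against the C2-T1 style location shown by lsdev on the LPAR.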

 

To list the WWPNs of a specific LPAR:

MS=Dev-BOX-9117-MMB-SN129SBCP
LPAR=devaix103
lshwres -r virtualio --rsubtype fc -m $MS --filter lpar_names=$LPAR --level lpar -F lpar_name,slot_num,wwpns --header

lpar_name,slot_num,wwpns
devaix103,5,"c05078044abc00c6,c05078044abc00c7"
devaix103,4,"c05078044abc00c4,c05078044abc00c5"
devaix103,3,"c05078044abc00c2,c05078044abc00c3"
devaix103,2,"c05078044abc00c0,c05078044abc00c1"


Yup, another PowerHA bug! cl_chfs: VG sharedvg is concurrent


Problem:

 

When trying to extend a shared filesystem in an Enhanced-Capable Concurrent mode VG in PowerHA, I always get the following message:

 

cl_chfs: VG sharedvg is concurrent

This happens whether I use smit cspoc -> filesystem

or the command:

/usr/es/sbin/cluster/sbin/cl_chfs -cspoc -nnode1,node2 -a size=+100G -A no /usr/opt/db2/dat1

Or

/usr/es/sbin/cluster/cspoc/cli_chfs -a size=+100G /usr/opt/db2/dat1

 

Solution:

 

The problem is due to a typo in line 139 of /usr/es/sbin/cluster/utilities/clresactive

if print -- "$lsvg_out" | grep -i -q "passive_only"

It should be

  if print -- "$lsvg_out" | grep -i -q "passive-only"

 

Please note it should be - (hyphen) instead of _ (underscore) between passive and only.

 

Modify /usr/es/sbin/cluster/utilities/clresactive on all the PowerHA nodes, and extending the shared filesystem now works!
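
The edit can be scripted so that it is applied the same way on every node. This is only a sketch, assuming the pattern appears in the file exactly as quoted above; fix_passive_pattern is a hypothetical helper name, not part of PowerHA, and the line number may differ between PowerHA levels, so check the file with grep before and after.

```shell
# Hypothetical helper: replace the underscore form of the pattern with the
# hyphen form in the given file, keeping a backup as <file>.orig.
fix_passive_pattern() {
    file=$1
    cp -p "$file" "$file.orig" || return 1    # back up the original first
    # Rewrite from the backup into the original path (keeps its permissions).
    sed 's/"passive_only"/"passive-only"/' "$file.orig" > "$file"
}

# Example, to be run as root on each node:
#   fix_passive_pattern /usr/es/sbin/cluster/utilities/clresactive
#   grep -n 'passive.only' /usr/es/sbin/cluster/utilities/clresactive
```

Keeping the .orig copy makes it easy to revert if a later PowerHA fix replaces the file.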


Determining if TSAMP/RSCT rebooted a node/server


Problem (Abstract)

This document will discuss the various ways to determine if RSCT has rebooted your node.

Symptom

The node reboots without an operator issuing the reboot command.

Resolving the problem

TSAMP cannot reboot your node; the core TSAMP application has no functionality for this. However, RSCT (Reliable Scalable Cluster Technology), the cluster provider that TSAMP 'rides' upon, can and will reboot your node in a few different situations. This technote will not discuss why the node was rebooted, only some of the ways to determine whether RSCT was the culprit that initiated the reboot.

Syslogs:

The easiest way to see that RSCT has rebooted a node is to check your syslogs for the following message:
Jan 11 10:15:38 node03 ConfigRM[1418]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID:  :::Template ID: 0:::Details File:  :::Location: RSCT,PeerDomain.C,1.99.25.16,18148            :::CONFIGRM_NOQUORUM_ER#012The operational quorum state of the active peer domain has changed to NO_QUORUM. #012This indicates that recovery of cluster resources can no longer occur and that #012the node may be rebooted or halted in order to ensure that critical resources #012are released so that they can be recovered by another sub-domain that may have #012operational quorum.

Jan 11 10:15:38 node03 ConfigRM[1418]: (Recorded using libct_ffdc.a cv 2):::Error ID: :::Reference ID:  :::Template ID: 0:::Details File:  :::Location: RSCT,PeerDomain.C,1.99.25.16,21028            :::CONFIGRM_REBOOTOS_ER#012The operating system is being rebooted to ensure that critical resources are #012stopped so that another sub-domain that has operational quorum may recover #012these resources without causing corruption or conflict.

The easiest way to do this is to grep for "REBOOTOS" in your syslog output file.
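
As a minimal sketch of that search, the function below prints only the timestamp and host name of each matching entry. It assumes the standard syslog line layout shown in the sample messages above (month, day, time, host as the first four fields); rsct_reboots_in_syslog is a hypothetical name, and the syslog file location depends on your syslog configuration, so it is passed as an argument.

```shell
# Hypothetical helper: list when and where RSCT logged an OS reboot.
# $1: path to a syslog output file (location varies by system).
rsct_reboots_in_syslog() {
    # Match the CONFIGRM_REBOOTOS_ER entries and keep "Mon DD HH:MM:SS host".
    awk '/CONFIGRM_REBOOTOS_ER/ { print $1, $2, $3, $4 }' "$1"
}

# Example (the path is an assumption):
#   rsct_reboots_in_syslog /var/log/messages
```

No output means no RSCT reboot message was found in that file.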

ConfigRM Trace File:
Your IBM.ConfigRM trace file can be found in the following directory and needs to be formatted to be read:
/var/ct/IW/log/mc/IBM.ConfigRM/
To format the file:
rpttr -odtic /var/ct/IW/log/mc/IBM.ConfigRM/trace > /tmp/IBM.ConfigRM_Trace.out
** Note: Your specific file names for the trace file may differ from what is shown above **

01/11/2011 10:15:38 AM.886775 T(11844464) _CFD !!!!!!!!!!!!!!!!! PeerDomainRcp::haltOS Entered. !!!!!!!!!!!!!!!!!!!!!
01/11/2011 10:15:38 AM.886856 T(11844464) _CFD logerr: In file=/project/spreljan/build/rjans002a/src/rsct/rm/ConfigRM/PeerDomain.C (Version=1.99.25.16 Line=21028) :
CONFIGRM_REBOOTOS_ER
The operating system is being rebooted to ensure that critical resources are
stopped so that another sub-domain that has operational quorum may recover
these resources without causing corruption or conflict.

After running the above rpttr command to format the traces, search the output file for "rebootos". Use a case-insensitive search (grep -i), because the trace records the message as CONFIGRM_REBOOTOS_ER.

Error Report (AIX only):
Create or view the error report with the following command:
errpt -a > /tmp/error_report.out

Search for the following message:
-----------------------------------------------------------------------
LABEL: KERNEL_PANIC
IDENTIFIER: 225E3B63

Date/Time:       Fri Sep  9 17:35:14 2011
Sequence Number: 163592
Machine Id:      00C54C6E4C00
Node Id:         ccdev32
Class:           S
Type:            TEMP
WPAR:            Global
Resource Name:   PANIC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
ASSERT STRING

PANIC STRING
RSCT reboot caused by critical resource protection
-----------------------------------------------------------------------

On both AIX 5.3 and AIX 6.1 the "IDENTIFIER" is the same value, so searching the error report for "225E3B63" will locate and identify the RSCT reboot.
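
The search of the saved report can be sketched as follows. This assumes the detailed report layout shown above, where each entry has an IDENTIFIER: line followed by a Date/Time: line; rsct_reboots_in_errpt is a hypothetical helper name.

```shell
# Hypothetical helper: print the Date/Time of every entry in a saved
# "errpt -a" report whose IDENTIFIER is 225E3B63.
# $1: path to the report file, e.g. /tmp/error_report.out
rsct_reboots_in_errpt() {
    awk '
        # Remember whether the current entry carries the RSCT reboot identifier.
        /^IDENTIFIER:/ { hit = ($2 == "225E3B63") }
        # For a matching entry, print its timestamp without the field label.
        hit && /^Date\/Time:/ { sub(/^Date\/Time:[ \t]*/, ""); print; hit = 0 }
    ' "$1"
}

# Example:
#   errpt -a > /tmp/error_report.out
#   rsct_reboots_in_errpt /tmp/error_report.out
```

On a live AIX system, errpt -j 225E3B63 selects entries by the same identifier without saving the report first.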

Event Viewer (Windows only):
Search the event viewer for "0xDEADDEAD" to find the RSCT-initiated reboots.
