Information Technology Grimoire

Version .0.0.1

IT Notes from various projects because I forget, and hopefully they help you too.

clusterxl

ClusterXL Troubleshooting

SmartView Tracker > Type > Control Origin > Cluster Members

Review the logs for the cluster state and the reason for any state change.

cphaprob state

Cluster Mode: HA (Active Up) with IGMP Membership

Number		Unique Address	Assigned Load	State
1 (Local) 	x.x.x.2			0%				Down
2			x.x.x.3			100%			Active Attention
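A minimal Python sketch for flagging unhealthy members in pasted output like the above. It assumes the column layout shown here (number, optional "(Local)", address, load percentage, state). The names parse_cphaprob_state, unhealthy and the HEALTHY set are illustrative choices, not part of any Check Point tool.

```python
import re

# Assumption: only these two states count as healthy; everything else
# (Down, Active Attention, ...) gets flagged.
HEALTHY = {"Active", "Standby"}

# One member row: number, optional "(Local)", address, load %, state text.
ROW = re.compile(
    r"^\s*(?P<num>\d+)\s*(?P<local>\(Local\))?\s+"
    r"(?P<addr>\S+)\s+(?P<load>\d+)%\s+(?P<state>.+?)\s*$"
)

def parse_cphaprob_state(text):
    """Return one dict per member row found in the pasted output."""
    members = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            members.append({
                "number": int(m.group("num")),
                "local": m.group("local") is not None,
                "address": m.group("addr"),
                "load": int(m.group("load")),
                "state": m.group("state"),
            })
    return members

def unhealthy(members):
    """Return the members whose state is not in HEALTHY."""
    return [m for m in members if m["state"] not in HEALTHY]

sample = """\
Cluster Mode: HA (Active Up) with IGMP Membership

Number\t\tUnique Address\tAssigned Load\tState
1 (Local) \tx.x.x.2\t\t\t0%\t\t\t\tDown
2\t\t\tx.x.x.3\t\t\t100%\t\t\tActive Attention
"""

for m in unhealthy(parse_cphaprob_state(sample)):
    print(m["number"], m["address"], m["state"])
```

With the sample above, both members are flagged: member 1 is Down and member 2 is Active Attention.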

cphaprob -l list

Lists all devices on the pnotes (problem notification) list and their status

cphaprob list

Show summary of devices in problem state only

Device Name:  Interface Active Check
Current state: problem
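A small Python sketch for pulling the problem devices out of pasted output like the above. It assumes each device appears as a "Device Name:" line followed by a "Current state:" line, as in the sample; problem_devices is a hypothetical helper name.

```python
def problem_devices(text):
    """Return names of devices whose 'Current state' is 'problem'."""
    problems = []
    name = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        key, value = key.strip().lower(), value.strip()
        if key == "device name":
            name = value
        elif key == "current state" and name is not None:
            if value.lower() == "problem":
                problems.append(name)
            name = None  # wait for the next device block
    return problems

sample = """\
Device Name:  Interface Active Check
Current state: problem
"""

print(problem_devices(sample))  # ['Interface Active Check']
```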

ifconfig vs ethtool

  • ifconfig is unreliable for link state; ethtool shows link status plus speed/duplex

cphaprob -a if

Show status of interface as ClusterXL sees it

eth0	Disconnected	non sync, multicast
eth1	UP				non sync, multicast
eth2	UP				non sync, multicast
eth3	DOWN (2344 sec)	sync(secured), multicast

Disconnected vs Down

  • Disconnected: a non-sync interface that is not monitored, so not a problem
  • Down: a monitored interface (here the sync interface), and a problem
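The Disconnected vs Down distinction can be sketched in Python against pasted cphaprob -a if output. This assumes the line layout shown in these notes (name, status, optional "(N sec)", flags); parse_cphaprob_if and down_interfaces are made-up helper names, and only DOWN is treated as a problem, following the rule above.

```python
import re

# One interface line: name, status, optional "(N sec)", then the flags.
LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<status>UP|DOWN|Disconnected)"
    r"(?:\s+\((?P<secs>\d+) sec\))?\s+(?P<flags>.+?)\s*$"
)

def parse_cphaprob_if(text):
    """Return one dict per interface line found in the pasted output."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        rows.append({
            "name": m.group("name"),
            "status": m.group("status"),
            "seconds": int(m.group("secs")) if m.group("secs") else None,
            "sync": m.group("flags").startswith("sync"),
        })
    return rows

def down_interfaces(rows):
    """DOWN is a problem; Disconnected is not monitored, so it is skipped."""
    return [r for r in rows if r["status"] == "DOWN"]

sample = """\
eth0\tDisconnected\tnon sync, multicast
eth1\tUP\t\t\t\tnon sync, multicast
eth2\tUP\t\t\t\tnon sync, multicast
eth3\tDOWN (2344 sec)\tsync(secured), multicast
"""

for r in down_interfaces(parse_cphaprob_if(sample)):
    kind = "sync" if r["sync"] else "non sync"
    print(r["name"], r["seconds"], kind)
```

For the sample, only eth3 is reported, as a sync interface that has been down for 2344 seconds.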

ethtool down stats

  • If ethtool shows the link down on both members, it’s probably your switch

Active / Active Troubleshooting

cphaprob stat
Cluster Mode: High Availability (Active Up) with IGMP Membership

Number		Unique Address	Assigned Load	State
1 (Local) 	x.x.x.2			100%			Active
2			x.x.x.3			0%				ClusterXL Inactive or Machine is Down

Number		Unique Address	Assigned Load	State
1			x.x.x.2			0%				ClusterXL Inactive or Machine is Down
2 (Local)	x.x.x.3			100%			Active

  1. Can the cluster members see each other? View the topology, run ifconfig eth3 on each member, then ping the other member's eth3 address. Does it reply?

  2. Is any device in a problem state? cphaprob list should report “there are no pnotes in problem state”.

  3. Does sync traffic exist? fw ctl pstat shows the sync packet counts; watch them with watch -n 1 "fw ctl pstat | tail -12". Is only the sent or only the received counter increasing? If both members are only sending, the cpha processes are not seeing each other.

  4. Check the CCP packets on UDP 8116 with tcpdump -nnei eth3. Do you see CCP? Review mac_magic, id of member cluster_global_D, id of member.

     The 5th byte of the source MAC has to be the same on all members; 254 (FE) is the default. If a cluster member was reinstalled, update all members to the same value. cphaconf cluster_id get shows the current value (for example 155 = x9b, 133 = x85); cphaconf cluster_id set 155 changes it and survives reboot.
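The decimal/hex matching above is easy to get wrong by eye, so here is a small Python sketch. It converts a cluster ID to the hex byte you would look for in tcpdump, and checks whether a set of MAC addresses share the same 5th byte. The MAC addresses in the example are invented for illustration, and the helper names are hypothetical.

```python
def cluster_id_to_hex(cluster_id):
    """Decimal cluster ID (0-255) as the two-digit hex seen in a MAC."""
    if not 0 <= cluster_id <= 255:
        raise ValueError("cluster ID must fit in one byte")
    return format(cluster_id, "02x")

def fifth_byte(mac):
    """Return the 5th byte of a colon-separated MAC as an integer."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError("expected six colon-separated bytes: " + mac)
    return int(parts[4], 16)

def same_cluster(macs):
    """True when every MAC carries the same 5th byte."""
    return len({fifth_byte(m) for m in macs}) == 1

print(cluster_id_to_hex(155))  # 9b
print(cluster_id_to_hex(133))  # 85
print(cluster_id_to_hex(254))  # fe (the default)

# Invented example addresses, not taken from a real capture.
print(same_cluster(["00:00:00:00:9b:00", "00:00:00:00:9b:01"]))  # True
print(same_cluster(["00:00:00:00:9b:00", "00:00:00:00:fe:01"]))  # False
```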

Load Sharing Unicast ClusterXL

Collect the relevant debug info on all members at the same time, limited to a specific connection and its interfaces: https://youtu.be/7g6PdcLOIzU?t=169