weird network problem with B100s/B1600 chassis

From: Adam Levin (levins@westnet.com)
Date: Mon Apr 25 2005 - 11:34:50 EDT


Hey all, we've got a weird problem, and I'm not sure how to proceed
because these systems are out of warranty at this point.

We've got a Sun Blade B1600 chassis full of servers. We're using both
internal switches. SWT0 is on one VLAN, SWT1 is on another, both going to
our core Cisco switches.

The chassis is full of B100s blades serving as web servers (Apache on
Solaris 8 04/01, patched as of a few months ago).

Recently, two of the blades failed, and we bought two new ones. The
failure mode was that the second interface, ce1, stopped seeing anything
on the rest of the network.

We replaced them and jumpstarted the replacements, but we still can't
see anything else on that network:

[11:31:35]root@http-b01.prod:/root$ ping -s 10.20.50.255
PING 10.20.50.255: 56 data bytes
64 bytes from http-b01.san (10.20.50.135): icmp_seq=0. time=1. ms
64 bytes from http-b01.san (10.20.50.135): icmp_seq=0. time=5. ms
64 bytes from http-b01.san (10.20.50.135): icmp_seq=1. time=0. ms
64 bytes from http-b01.san (10.20.50.135): icmp_seq=1. time=0. ms
64 bytes from http-b01.san (10.20.50.135): icmp_seq=2. time=0. ms
64 bytes from http-b01.san (10.20.50.135): icmp_seq=2. time=0. ms
64 bytes from http-b01.san (10.20.50.135): icmp_seq=3. time=0. ms
64 bytes from http-b01.san (10.20.50.135): icmp_seq=3. time=0. ms
^C
----10.20.50.255 PING Statistics----
4 packets transmitted, 8 packets received, 2.00 times amplification
round-trip (ms) min/avg/max = 0/0/5

We should be seeing a large number of servers responding, as we do from
all the other blades in the chassis.
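
For anyone who wants to compare blades quickly, here is a rough Python
sketch that counts the distinct addresses answering a broadcast ping. It
assumes the reply-line format shown in the output above; the bare-IP form
(no hostname) is my assumption about what an unresolved responder would
look like, and the function name is made up for illustration.

```python
import re

# Matches reply lines such as:
#   64 bytes from http-b01.san (10.20.50.135): icmp_seq=0. time=1. ms
# and (assumed format for an unresolved host):
#   64 bytes from 10.20.50.134: icmp_seq=0. time=0. ms
REPLY_RE = re.compile(
    r"bytes from (?:\S+ \()?(\d{1,3}(?:\.\d{1,3}){3})\)?:"
)


def distinct_responders(ping_output):
    """Return the set of IP addresses that replied in `ping -s` output."""
    responders = set()
    for line in ping_output.splitlines():
        match = REPLY_RE.search(line)
        if match:
            responders.add(match.group(1))
    return responders


# First few lines of the output pasted above.
sample = """\
PING 10.20.50.255: 56 data bytes
64 bytes from http-b01.san (10.20.50.135): icmp_seq=0. time=1. ms
64 bytes from http-b01.san (10.20.50.135): icmp_seq=0. time=5. ms
64 bytes from http-b01.san (10.20.50.135): icmp_seq=1. time=0. ms
"""

print(sorted(distinct_responders(sample)))
```

On a bad blade this reports a single address (the blade itself); on a
good blade the set should contain many.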

All other blades are functioning normally. The ce0 interfaces on the two
bad blades are functioning normally.

I logged in to the switch module and tried to ping 10.20.50.135 from
there, but it failed. I also tried to ping 10.20.50.134, a known good
server, and that failed too, even though the good server is, well, good.

Has anybody seen this before? I'm not sure what else to do, since it
*appears* that the switch module and chassis are OK, and this exact same
problem is happening on two blades, both of which were replaced with new
ones last week.

The switch configuration has not changed -- I've confirmed that by
comparing the running config with the boot config.
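
If it helps anyone reproduce that check, this is roughly how the
comparison could be scripted once both configs have been saved as text. It
is only a sketch: how the dumps get off the switch is not shown, the
function name and the "boot-config"/"running-config" labels are
illustrative, and the sample strings are placeholders, not real switch
configuration.

```python
import difflib


def config_diff(running_text, boot_text):
    """Return unified-diff lines between two config dumps.

    An empty list means the two dumps are identical line for line.
    """
    return list(difflib.unified_diff(
        boot_text.splitlines(),
        running_text.splitlines(),
        fromfile="boot-config",      # label only, not a real path
        tofile="running-config",     # label only, not a real path
        lineterm="",
    ))


# Placeholder text standing in for the two saved dumps.
boot = "line one\nline two\n"
running = "line one\nline two\n"

changes = config_diff(running, boot)
print("configs match" if not changes else "\n".join(changes))
```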

-Adam


