Author Topic: Intermittent Network issues

PeteK

  • Guru
  • ****
  • Posts: 408
Intermittent Network issues
« on: January 05, 2009, 10:41:33 pm »
Gang--

I've got an issue that constantly frustrates me and makes my LMCE setup unusable, and I'm hoping someone can help.  When I have MDs running, at some point the internal network fails.  By fails, I mean I can't ping any machine from the core, and vice versa.  The MDs freeze, and the Windows PCs indicate limited connectivity (no DHCP access).  My Squeezebox goes blank, indicating loss of communication with the server.  However, there are no indications on the core that anything is amiss: ifconfig shows the network connection as active, though the RX and TX packet counts climb very slowly and erratically.  I've never seen this happen when no MDs are running.  The outside NIC is fine, i.e. I can access the internet from the core.  Shutting off any running MDs immediately corrects the problem and the network returns to normal (i.e. the Windows PCs are routed to the internet and the Squeezebox works).
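
Since the freeze only shows up after 2 to 72 hours, it may help to log the packet counters automatically rather than watching ifconfig by hand. Below is a minimal Python sketch that reads the same counters from /proc/net/dev and flags an interface whose RX and TX rates have both dropped to almost nothing. It is not part of LinuxMCE: the function names are made up for illustration, and the threshold of 1 packet per second is an arbitrary assumption.

```python
# Sketch only: helper names are hypothetical, not part of LinuxMCE.

def parse_proc_net_dev(text):
    """Return {interface: (rx_packets, tx_packets)} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # not a complete counter row
        # After the colon: field 1 is RX packets, field 9 is TX packets.
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def packet_rate(before, after, seconds):
    """Packets per second (rx, tx) between two samples of one interface."""
    return ((after[0] - before[0]) / seconds,
            (after[1] - before[1]) / seconds)


def looks_stalled(rate, floor=1.0):
    """True when both directions fall below `floor` packets per second."""
    return rate[0] < floor and rate[1] < floor
```

To use it, read /proc/net/dev every few seconds, pass the two samples for the internal interface (eth1 is only a guess at the name) to `packet_rate`, and write a timestamp to a log whenever `looks_stalled` returns true. The timestamps can then be matched against what the MDs were doing at the time.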

My setup is:
Hybrid: Intel S5000PSL
MDs:
1 Abit AN-M2HD
1 ASROCK    ALiveNF7G-FullHD R3.0
Switch
Netgear Gigabit

I've tried the following:

1) Updated the kernel to 2.6.27 to try the new e1000e driver for the onboard Intel gigabit LAN
2) Swapped out network cables/ethernet switch
3) Tried each MD singly and together
4) Added D-link network card to core to test new hardware/driver
5) Tried 64 and 32 bit versions of LMCE.

Still, in anywhere from 2 to 72 hours, the problem recurs.  Has anyone seen this before?  Anybody have any ideas of where to look to troubleshoot? Any help would be greatly appreciated.

Thanks,
-PeteK



colinjones

  • Alumni
  • LinuxMCE God
  • *
  • Posts: 3003
Re: Intermittent Network issues
« Reply #1 on: January 05, 2009, 11:03:54 pm »
Pete - is it possible for you to install a network sniffer on the core to see what traffic is passing during the lock up?

The only things that spring to mind are the switch borking, perhaps flooded with frames it can't handle or perhaps STP is causing switch ports to go from Forwarding to Blocking mode - but if you have already changed the switch (for a different type, not the same model) then I presume that would eliminate that possibility.

Or the internal NIC or driver locking up, but it sounds like you have tried replacing both of those, too. Slim possibility, the routing table gets screwed up? But can't think why, or why turning off the MDs would fix this. I guess if the interface has an IP address and is enabled and online, and you can ping that interface from the core (ie ping itself), and the routing table looks normal, then you will have to sniff the traffic to see what is going on - traffic in only? Out only? Where from and to? What protocols and payloads?

EDIT: one thing I always recommend in network situations is to eliminate the switch completely by plugging the MD directly into the core (obviously, if your media is also on the internal network, you will have to copy some over to the core first to test with). Almost all NICs handle auto-crossover these days, so you don't actually need a switch or a crossover cable. This will eliminate the external networking, at least, except for one cable. Then test again and see if it recurs...
« Last Edit: January 05, 2009, 11:06:29 pm by colinjones »
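
For the "routing table looks normal" check suggested above, the kernel's own view can be read from /proc/net/route and compared against what is expected. The following is a rough Python sketch under stated assumptions: the helper names are hypothetical, the hex decoding assumes a little-endian (x86) core, and 192.168.80.0/24 on eth1 is only a guess at the internal network.

```python
import socket
import struct

# Sketch only: helper names are hypothetical.

def hex_to_ip(hex_le):
    """Convert the little-endian hex used by /proc/net/route to dotted quad."""
    return socket.inet_ntoa(struct.pack("<L", int(hex_le, 16)))


def parse_routes(text):
    """Return a list of route dicts from the text of /proc/net/route."""
    routes = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or truncated rows
        routes.append({
            "iface": fields[0],
            "dest": hex_to_ip(fields[1]),
            "gateway": hex_to_ip(fields[2]),
            "mask": hex_to_ip(fields[7]),
        })
    return routes


def has_route(routes, iface, dest, mask):
    """True if a route to dest/mask exists on the given interface."""
    return any(r["iface"] == iface and r["dest"] == dest and r["mask"] == mask
               for r in routes)
```

Running `has_route(parse_routes(open("/proc/net/route").read()), "eth1", "192.168.80.0", "255.255.255.0")` once while the network is healthy and again during a lock-up would show whether the internal route has disappeared or moved to another interface.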

Dale_K

  • Veteran
  • ***
  • Posts: 149
Re: Intermittent Network issues
« Reply #2 on: January 05, 2009, 11:19:20 pm »
I also had similar network/MD lock up problems.  The issue turned out to be a cable problem.  I had an MD that had an inferior patch cable.  The cable worked fine at 100Mb but failed intermittently at 1Gb.

I changed all my cable from CAT5e to CAT6 and have had no problems since.  It was probably unnecessary to change all my cable but I was looking to do that at some point anyway.

An easy way to see if you have this type of failure is to plug in a 100Mb switch and see if the lock ups still occur.  If not, I'd say you have a cable problem.  I realize that this does not rule out a failing switch but in my experience, when switches fail they fail completely.  The fact that an MD reboot corrects your problem indicates that it isn't a switch problem.  Most likely poor/defective cable is causing excessive re-transmissions causing a switch "lock-up".

Hope this helps
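
A marginal cable of the kind described above often shows up as receive errors on the NIC before anything locks up. As a hedged illustration, this Python sketch reads a few error counters that Linux exposes under /sys/class/net/<iface>/statistics and reports which ones grew between two readings. The helper names and the choice of counters are assumptions, and not every driver fills in every counter.

```python
import os

# Sketch only: helper names and the counter selection are assumptions.
CABLE_COUNTERS = ("rx_crc_errors", "rx_frame_errors", "rx_errors", "tx_errors")


def read_counters(iface, names=CABLE_COUNTERS, root="/sys/class/net"):
    """Read the named statistics files for one interface into a dict."""
    stats_dir = os.path.join(root, iface, "statistics")
    result = {}
    for name in names:
        with open(os.path.join(stats_dir, name)) as fh:
            result[name] = int(fh.read().strip())
    return result


def counters_that_grew(before, after):
    """Return {counter: increase} for every counter that went up."""
    return {name: after[name] - before[name]
            for name in after if after[name] > before[name]}
```

Taking one reading with the gigabit switch in place and another after some hours of MD traffic, then repeating with the 100Mb switch, would show whether the errors only climb at gigabit speed.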

jondecker76

  • Alumni
  • wants to work for LinuxMCE
  • *
  • Posts: 763
Re: Intermittent Network issues
« Reply #3 on: January 05, 2009, 11:20:21 pm »
I had the same problem with my old motherboard. The problem for me was that it used the forcedeth driver which was very flaky at gigabit speeds.

Maybe try swapping network interfaces (in the web admin) and swap network cables in the back - it worked for me on one of my machines. Either way, best experience for me has been to always use the onboard NIC on the external network, and a PCI or PCIe network card with good Linux support on the internal network.

colinjones

  • Alumni
  • LinuxMCE God
  • *
  • Posts: 3003
Re: Intermittent Network issues
« Reply #4 on: January 06, 2009, 06:48:06 am »
Ooh, another thought is autonegotiation. Setting auto for speed and particularly duplex on your internal NIC can cause problems, especially with NICs/switches that don't autonegotiate well. In some cases this will cause the NIC or switch port to drop to 10/half duplex, which under load will cause massive numbers of frame collisions that effectively lock your port up. Nothing will be able to communicate with the core, which means your MDs will die, but all other machines on that switch will be able to ping each other just fine.

The suggestion I made to connect direct to the core could eliminate this possibility, too. If your switch is manageable at all, you can force the switch port to 100/Full or 1G, and then do the same to the core NIC to prevent this. Also, if it is manageable, for a Cisco you can turn on PORTFAST which will eliminate certain STP blocking issues. Other brands of manageable switches will have an equivalent to this option as well. Can explain what this is if it turns out to be the cause.

But the patch cables suggested above are the first and most likely cause. Rule them out first, as it will be annoying to go through lots of advanced troubleshooting only to find you had a crappy cable :)
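
To see whether autonegotiation has dropped the core's internal NIC to a slow or half-duplex link, the negotiated values can be read from sysfs (ethtool reports the same information). This is a minimal Python sketch, assuming the kernel exposes /sys/class/net/<iface>/speed and /sys/class/net/<iface>/duplex. The helper names are hypothetical, and the expected speed of 1000 Mb/s is an assumption based on the gigabit switch.

```python
import os

# Sketch only: helper names are hypothetical.

def read_link(iface, root="/sys/class/net"):
    """Return (speed in Mb/s, duplex string) as negotiated by the NIC.

    Reading these files can raise OSError while the link is down.
    """
    base = os.path.join(root, iface)
    with open(os.path.join(base, "speed")) as fh:
        speed = int(fh.read().strip())
    with open(os.path.join(base, "duplex")) as fh:
        duplex = fh.read().strip()
    return speed, duplex


def link_problem(speed, duplex, expected_speed=1000):
    """Return a short description of a bad negotiation, or None if fine."""
    if duplex != "full":
        return "duplex is %s, expected full" % duplex
    if speed < expected_speed:
        return "speed is %d Mb/s, expected %d" % (speed, expected_speed)
    return None
```

Calling `link_problem(*read_link("eth1"))` during a lock-up (eth1 being a guess at the internal interface name) should return None if the link is still at gigabit full duplex, which would point away from a negotiation problem.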

PeteK

  • Guru
  • ****
  • Posts: 408
Re: Intermittent Network issues
« Reply #5 on: January 06, 2009, 05:26:17 pm »
Colin--

I have a web-managed switch in there normally, and it showed all ports at 1Gb, unfortunately.  I wired to a 100Mbit switch, and I still had a failure last night (of only 1 MD, which is a bit different).  I think I'll cut out the homemade cables to the MDs (I had already done so with the core) and replace them with pre-made CAT5e cables.


PeteK

  • Guru
  • ****
  • Posts: 408
Re: Intermittent Network issues
« Reply #6 on: January 21, 2009, 05:18:34 am »
Just an update--

I'm running 0810 alpha1 now and I'm not seeing the network freeze issue.  I don't know if it's something that's not currently operational yet in 0810 or something that was fixed in either a Kubuntu or LMCE upgrade.  In any case, I'll have a good delta if things break in a future release of 0810.  This is a huge relief from the 0710 frustration.  Good job dev team, and thanks.

-PeteK