Topic: Horrible Network instability problems

jondecker76

  • Alumni
  • wants to work for LinuxMCE
  • *
  • Posts: 763
Horrible Network instability problems
« on: August 06, 2008, 12:06:43 pm »
We have been discussing this problem along with my RAID problem here: http://forum.linuxmce.org/index.php?topic=5892.0. Both problems have become so severe that I decided to separate my network/stability issues into this thread.

For a few weeks now, it appeared that my core was locking up every night without fail (I couldn't SSH in, the network was down, etc.). After more poking around, I found that the core was not actually locking up; it was killing the network on eth0 (the onboard NIC of my Asus M2NPV-VM, a well-tested and supported motherboard) every single night.

Please note that I have been using this same setup for over 6 months with absolutely no problems!

The only changes have been the addition of a new switch (Netgear GS524) and access point (Netgear WG302). I recently spoke with Netgear tech support to ensure it is all set up properly.

Here are some different entries from syslog - not sure if any of them are related to my problem:

Code:
Aug  6 05:39:20 dcerouter kernel: [  690.972000] rtc: lost 27 interrupts
Aug  6 05:39:22 dcerouter kernel: [  693.024000] rtc: lost 28 interrupts
Aug  6 05:39:24 dcerouter kernel: [  695.076000] rtc: lost 28 interrupts
Aug  6 05:39:30 dcerouter kernel: [  701.232000] rtc: lost 28 interrupts
Aug  6 05:39:32 dcerouter kernel: [  703.288000] rtc: lost 27 interrupts
I get the above all the time in my syslog. I'm not sure whether it has always been like this, since these problems started before the 6 days' worth of archived logs kept on the core.

Code:
Aug  6 05:40:42 dcerouter kernel: [  772.760000] eth0: too many iterations (6) in nv_nic_irq.
Aug  6 05:40:43 dcerouter kernel: [  773.752000] eth0: too many iterations (6) in nv_nic_irq.
Aug  6 05:40:44 dcerouter kernel: [  774.752000] eth0: too many iterations (6) in nv_nic_irq.
Aug  6 05:40:46 dcerouter kernel: [  776.756000] eth0: too many iterations (6) in nv_nic_irq.
I see the above pretty often in syslog as well. I'm not sure exactly what it means, though.

Code:
Aug  6 04:00:15 dcerouter kernel: [81443.716000] printk: 1 messages suppressed.
Aug  6 04:00:15 dcerouter kernel: [81443.716000] rtc: lost 27 interrupts
Aug  6 04:00:19 dcerouter kernel: [81447.820000] printk: 1 messages suppressed.
Aug  6 04:00:19 dcerouter kernel: [81447.820000] rtc: lost 28 interrupts
Aug  6 04:00:23 dcerouter kernel: [81451.924000] printk: 1 messages suppressed.
Aug  6 04:00:23 dcerouter kernel: [81451.924000] rtc: lost 28 interrupts
Aug  6 04:00:23 dcerouter kernel: [81452.000000] NETDEV WATCHDOG: eth0: transmit timed out
Aug  6 04:00:23 dcerouter kernel: [81452.000000] eth0: Got tx_timeout. irq: 00000036
Aug  6 04:00:23 dcerouter kernel: [81452.000000] eth0: Ring at 1fe0e000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] eth0: Dumping tx registers
Aug  6 04:00:23 dcerouter kernel: [81452.000000]   0: 00000036 000000ff 00000003 00df03ca 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000]  20: 00000000 f0000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000]  40: 0420e20e 0000a455 00002e20 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000]  60: 00000000 00000000 00000000 0000ffff 0000ffff 0000ffff 0000ffff 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000]  80: 003b0f3c 00000001 00000000 007f0028 0000061c 00000001 00200000 00007fc9
Aug  6 04:00:23 dcerouter kernel: [81452.000000]  a0: 0014050f 00000016 45f31800 00002e3b 00000001 00000000 8000cccd 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000]  c0: 10000002 00000001 00000001 00000001 00000001 00000001 00000001 00000001
Aug  6 04:00:23 dcerouter kernel: [81452.000000]  e0: 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 100: 1fe0e800 1fe0e000 007f00ff 00008000 00010032 00000000 00000017 1fe0ecd0
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 120: 1fe0e050 1c104840 a000ffeb 00000000 00000000 1fe0ecdc 1fe0e05c 0fe08000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 140: 00304120 80c02200 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 160: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 180: 00000016 00000008 0194796d 00008103 0000002a 00007800 0194796d 0000f903
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 1a0: 00000016 00000008 0194796d 00008103 0000002a 00007800 0194796d 0000f903
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 1c0: 00000016 00000008 0194796d 00008103 0000002a 00007800 0194796d 0000f903
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 1e0: 00000016 00000008 0194796d 00008103 0000002a 00007800 0194796d 0000f903
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 200: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 220: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 240: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 260: 00000000 00000000 fe020001 00000100 00000000 00000000 fe020001 00000100
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 280: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 2a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 2c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000001
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 2e0: 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 300: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 320: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 340: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 360: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 380: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 3a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 3c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 3e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 400: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 420: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 440: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 460: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 480: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 4a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 4c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 4e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 500: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 520: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 540: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 560: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 580: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 5a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 5c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 5e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 600: 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug  6 04:00:23 dcerouter kernel: [81452.000000] eth0: Dumping tx ring
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 000: 00000000 10394002 20000040 // 00000000 10394202 20000040 // 00000000 10394402 20000040 // 00000000 10394602 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 004: 00000000 10394802 20000040 // 00000000 10394a02 20000040 // 00000000 10394c02 20000040 // 00000000 10394e02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 008: 00000000 18e68002 20000040 // 00000000 18e68202 20000040 // 00000000 18e68402 20000040 // 00000000 18e68602 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 00c: 00000000 18e68802 20000040 // 00000000 18e68a02 20000040 // 00000000 18e68c02 20000040 // 00000000 18e68e02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 010: 00000000 19473002 20000040 // 00000000 19473202 20000040 // 00000000 19473402 20000040 // 00000000 19473602 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 014: 00000000 19473802 20000040 // 00000000 19473a02 20000040 // 00000000 19473c02 20000040 // 00000000 19473e02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 018: 00000000 19472002 20000040 // 00000000 19472202 20000040 // 00000000 19472402 20000040 // 00000000 19472602 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 01c: 00000000 19472802 20000040 // 00000000 19472a02 20000040 // 00000000 19472c02 20000040 // 00000000 19472e02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 020: 00000000 19471002 20000040 // 00000000 19471202 20000040 // 00000000 19471402 20000040 // 00000000 19471602 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 024: 00000000 19471802 20000040 // 00000000 19471a02 20000040 // 00000000 19471c02 20000040 // 00000000 19471e02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 028: 00000000 19470002 20000040 // 00000000 19470202 20000040 // 00000000 19470402 20000040 // 00000000 19470602 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 02c: 00000000 19470802 20000040 // 00000000 19470a02 20000040 // 00000000 19470c02 20000040 // 00000000 19470e02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 030: 00000000 1c107002 20000040 // 00000000 1c107202 20000040 // 00000000 1c107402 20000040 // 00000000 1c107602 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 034: 00000000 1c107802 20000040 // 00000000 1c107a02 20000040 // 00000000 1c107c02 20000040 // 00000000 1c107e02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 038: 00000000 1c106002 20000040 // 00000000 1c106202 20000040 // 00000000 1c106402 20000040 // 00000000 1c106602 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 03c: 00000000 1c106802 20000040 // 00000000 1c106a02 20000040 // 00000000 1c106c02 20000040 // 00000000 1c106e02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 040: 00000000 1c105002 20000040 // 00000000 1c105202 20000040 // 00000000 1c105402 20000040 // 00000000 1c105602 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 044: 00000000 1c105802 20000040 // 00000000 1c105a02 20000040 // 00000000 1c105c02 20000040 // 00000000 1c105e02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 048: 00000000 1c104002 20000040 // 00000000 1c104202 20000040 // 00000000 1c104402 20000040 // 00000000 1c104602 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 04c: 00000000 1c104802 20000040 // 00000000 20cfb002 20000040 // 00000000 20cfb202 20000040 // 00000000 20cfb402 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 050: 00000000 20cfb602 20000040 // 00000000 20cfb802 20000040 // 00000000 20cfba02 20000040 // 00000000 20cfbc02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 054: 00000000 20cfbe02 20000040 // 00000000 0f50d002 20000040 // 00000000 0f50d202 20000040 // 00000000 0f50d402 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 058: 00000000 0f50d602 20000040 // 00000000 0f50d802 20000040 // 00000000 0f50da02 20000040 // 00000000 0f50dc02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 05c: 00000000 0f50de02 20000040 // 00000000 0e454002 20000040 // 00000000 0e454202 20000040 // 00000000 0e454402 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 060: 00000000 0e454602 20000040 // 00000000 0e454802 20000040 // 00000000 0e454a02 20000040 // 00000000 0e454c02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 064: 00000000 0e454e02 20000040 // 00000000 1cd67002 20000040 // 00000000 1cd67202 20000040 // 00000000 1cd67402 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 068: 00000000 1cd67602 20000040 // 00000000 1cd67802 20000040 // 00000000 1cd67a02 20000040 // 00000000 1cd67c02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 06c: 00000000 1cd67e02 20000040 // 00000000 0d83f002 20000040 // 00000000 0d83f202 20000040 // 00000000 0d83f402 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 070: 00000000 0d83f602 20000040 // 00000000 0d83f802 20000040 // 00000000 0d83fa02 20000040 // 00000000 0d83fc02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 074: 00000000 0d83fe02 20000040 // 00000000 105f0002 20000040 // 00000000 105f0202 20000040 // 00000000 105f0402 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 078: 00000000 105f0602 20000040 // 00000000 105f0802 20000040 // 00000000 105f0a02 20000040 // 00000000 105f0c02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 07c: 00000000 105f0e02 20000040 // 00000000 2740b002 20000040 // 00000000 2740b202 20000040 // 00000000 2740b402 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 080: 00000000 2740b602 20000040 // 00000000 2740b802 20000040 // 00000000 2740ba02 20000040 // 00000000 2740bc02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 084: 00000000 2740be02 20000040 // 00000000 1bc58002 20000040 // 00000000 1bc58202 20000040 // 00000000 1bc58402 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 088: 00000000 1bc58602 20000040 // 00000000 1bc58802 20000040 // 00000000 1bc58a02 20000040 // 00000000 1bc58c02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 08c: 00000000 1bc58e02 20000040 // 00000000 208e7002 20000040 // 00000000 208e7202 20000040 // 00000000 208e7402 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 090: 00000000 208e7602 20000040 // 00000000 208e7802 20000040 // 00000000 208e7a02 20000040 // 00000000 208e7c02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 094: 00000000 208e7e02 20000040 // 00000000 36193002 20000040 // 00000000 36193202 20000040 // 00000000 36193402 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 098: 00000000 36193602 20000040 // 00000000 36193802 20000040 // 00000000 36193a02 20000040 // 00000000 36193c02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 09c: 00000000 36193e02 20000040 // 00000000 113dc002 20000040 // 00000000 113dc202 20000040 // 00000000 113dc402 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 0a0: 00000000 113dc602 20000040 // 00000000 113dc802 20000040 // 00000000 113dca02 20000040 // 00000000 113dcc02 20000040
Aug  6 04:00:23 dcerouter kernel: [81452.000000] 0a4: 00000000 113dce02 20000040 // 00000000 113dd002 20000040 // 00000000 113dd202 20000040 // 00000000 113dd402 20000040
...
...
truncated to stay under the 20000 character post limit

I have a slew (meaning thousands) of these from before my reboot this morning. The first one appeared in the log at 03:58, and they continued at least a few times a minute for the rest of the night until my reboot.
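For anyone who would rather tally these messages than scroll through thousands of them, here is a minimal Python sketch. It assumes the traditional syslog line layout shown in the excerpts above; the pattern names (`rtc_lost`, `nv_nic_irq`, `tx_timeout`) and the `summarize` helper are made up for illustration, and the match strings are copied from the log lines in this post.

```python
import re
from collections import Counter

# Matches the syslog prefix seen in the excerpts above, e.g.
# "Aug  6 04:00:23 dcerouter kernel: [81452.000000] <message>".
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"\[\s*[\d.]+\]\s+(?P<msg>.*)$"
)

# Hypothetical labels mapped to substrings copied from the log excerpts.
PATTERNS = {
    "rtc_lost": "rtc: lost",
    "nv_nic_irq": "too many iterations",
    "tx_timeout": "NETDEV WATCHDOG",
}


def summarize(lines):
    """Count each message type and remember the first timestamp it was seen."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # not a kernel line in the expected layout
        for name, needle in PATTERNS.items():
            if needle in match.group("msg"):
                counts[name] += 1
                first_seen.setdefault(name, match.group("stamp"))
    return counts, first_seen


# Small demo on lines taken from this post.
SAMPLE = [
    "Aug  6 05:39:20 dcerouter kernel: [  690.972000] rtc: lost 27 interrupts",
    "Aug  6 05:40:42 dcerouter kernel: [  772.760000] eth0: too many iterations (6) in nv_nic_irq.",
    "Aug  6 04:00:23 dcerouter kernel: [81452.000000] NETDEV WATCHDOG: eth0: transmit timed out",
]
counts, first_seen = summarize(SAMPLE)
for name in PATTERNS:
    print(name, counts[name], first_seen.get(name, "-"))
```

To run it against a real log, pass an open file to `summarize` instead of `SAMPLE`, for example `summarize(open("/var/log/syslog", errors="replace"))`; the path is an assumption and may differ on your install.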

OK, so what now? I can't fathom why, out of nowhere, I am having these problems after 6 months of near-perfect stability.

jondecker76

Re: Horrible Network instability problems
« Reply #1 on: August 06, 2008, 01:07:48 pm »
Thanks, I will definitely check out the ethernet tools when I get home tonight. I'm going to go ahead and order a network card tonight; I do have a pci-x slot open. Hardware failure is starting to sound like the only feasible explanation. I might even start looking at new motherboard options...

Regarding cooling: I have been checking that. I have 5 case fans running 24 hours a day, and I dust the internals once a week. I checked the other day thinking maybe heat was the problem, but my heatsink was cool to the touch, maybe just a bit above ambient temperature. The drives get pretty warm, but I don't think it's warm enough to cause such issues. I didn't think to check the southbridge temp; I'll do that as well when I get home.

Regarding network and cabling: the cabling is all new Cat5e, the same as when things ran stably. Also, I have tested this by disconnecting everything on the internal network and running only the core, and the same lockup still happened.

If I can't resolve this soon, and I continue to have RAID problems as well, it looks like it will be time for a new motherboard. It seems awfully odd that I've ended up with both network issues and RAID issues at the exact same time.

Thanks for sharing your experiences on this
« Last Edit: August 06, 2008, 01:11:18 pm by jondecker76 »

jondecker76

Re: Horrible Network instability problems
« Reply #2 on: August 06, 2008, 01:20:11 pm »
Looking through the wiki, I have found others with the network problem, exhibiting the same errors/behavior:
http://wiki.linuxmce.org/index.php/Gigabyte_GA-M61P-S3

So, rather than chase this in circles, I'm going to opt for a new motherboard. Currently taking suggestions!
I'd like to re-use my processor (AMD BE-2400) and RAM (DDR2 800 MHz), and I want as many PCI slots and as many SATA ports as possible. I'm going back to the wiki to read more now, but if you have suggestions, please post them!

jondecker76

Re: Horrible Network instability problems
« Reply #3 on: August 06, 2008, 11:12:41 pm »
Had 4 network dropouts today. There is a small heatsink on the north bridge that was very hot (I could only touch it for about 2 seconds), in an area with poor circulation. Just to see if this was the problem, I took the entire top off of the server case and put a box fan on top of it. The small heatsink is now cool to the touch, just slightly above ambient temperature. Now I'm going to wait about 24 hours (we had reached the point where the network dropped every 4-5 hours). If by this time tomorrow everything is still looking good, I may invest in a better cooling setup rather than changing out the motherboard and dealing with everything that would go with that (full install again?).
Time to wait and see.

jondecker76

Re: Horrible Network instability problems
« Reply #4 on: August 06, 2008, 11:46:35 pm »
The network already crashed again. I'm junking this board. Just ordered a new M2N-SLI Deluxe with an nVidia 7300 gfx card from Newegg. Hopefully I can get it changed out this weekend.

jondecker76

Re: Horrible Network instability problems
« Reply #5 on: August 07, 2008, 02:52:18 am »
I'll have the PSU looked at. It's only a few months old, so I haven't suspected it, but then again I've never had one die on me, so I wouldn't know what to look for.

Thanks for the tip

jondecker76

Re: Horrible Network instability problems
« Reply #6 on: August 07, 2008, 12:03:23 pm »
OK, my network stayed up all night... and in the process I've figured out why I had a stable system for so long.

Running ifconfig, it jumped out at me that I used to run eth0 on the external network and eth1 on the internal network. When I re-installed, I took everything apart and videotaped the process. Then, a little over a month ago, I did a full re-install, but this time eth0 ended up on the internal network (as Hari had suggested to me, NVidia NICs are crap). Last night I swapped the cabling and the interface assignments in the web admin, and finally enjoyed a reboot-free morning. I can't believe I didn't think of this earlier. I'm going to see what happens over the next couple of days.
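If you are unsure which interface is the onboard NVidia NIC before swapping roles, a rough way to check is to look at which kernel driver backs each interface. As far as I know, the `nv_nic_irq` message in the logs comes from the forcedeth driver, so that is the name to look for. The sketch below assumes a Linux sysfs layout where `/sys/class/net/<iface>/device/driver` is a symlink to the driver; the `nic_drivers` helper is a made-up name.

```python
import os


def nic_drivers(sysfs_root="/sys/class/net"):
    """Map each interface name to the name of its kernel driver (or None)."""
    result = {}
    if not os.path.isdir(sysfs_root):
        return result  # not a Linux system, or sysfs is not mounted
    for iface in sorted(os.listdir(sysfs_root)):
        driver_link = os.path.join(sysfs_root, iface, "device", "driver")
        if os.path.exists(driver_link):
            # The symlink target's last path component is the driver name.
            result[iface] = os.path.basename(os.path.realpath(driver_link))
        else:
            # Virtual interfaces such as lo have no backing device.
            result[iface] = None
    return result


for iface, driver in nic_drivers().items():
    print(iface, driver or "(no hardware driver)")
```

On a board like mine I would expect the onboard port to show up with forcedeth and the add-in card with whatever driver that card uses, but treat that as an assumption until you see the output on your own core.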

So, assuming that the M2NPV-VM motherboard is still good, maybe I'll make another media director out of it once my new M2N-SLI Deluxe comes in.

marrandy

  • Guru
  • ****
  • Posts: 162
Re: Horrible Network instability problems
« Reply #7 on: August 11, 2008, 02:22:59 am »
Yes, but are you still getting the RTC errors in the logs?

What is causing that?

I reversed NIC positions (M2NPV-M) as suggested, but I still see:

Aug 10 17:20:49 dcerouter kernel: [ 2084.920000] rtc: lost 28 interrupts
Aug 10 17:21:04 dcerouter kernel: [ 2099.780000] rtc: lost 27 interrupts
Aug 10 17:21:06 dcerouter kernel: [ 2101.904000] rtc: lost 27 interrupts
Aug 10 17:21:08 dcerouter kernel: [ 2104.024000] rtc: lost 27 interrupts
Aug 10 17:21:10 dcerouter kernel: [ 2106.148000] rtc: lost 27 interrupts
Aug 10 17:21:12 dcerouter kernel: [ 2108.272000] rtc: lost 27 interrupts
Aug 10 17:21:19 dcerouter kernel: [ 2114.640000] rtc: lost 27 interrupts
Aug 10 17:21:21 dcerouter kernel: [ 2116.768000] rtc: lost 27 interrupts
Aug 10 17:21:23 dcerouter kernel: [ 2118.892000] rtc: lost 27 interrupts
Aug 10 17:21:25 dcerouter kernel: [ 2121.012000] rtc: lost 27 interrupts
Aug 10 17:21:29 dcerouter kernel: [ 2125.256000] rtc: lost 28 interrupts
Aug 10 17:21:31 dcerouter kernel: [ 2127.380000] rtc: lost 28 interrupts

jondecker76

Re: Horrible Network instability problems
« Reply #8 on: August 11, 2008, 02:51:59 am »
Yes, I still get them, but they haven't given me any stability issues. In addition to curing my random network lockups, the swap also improved random lockups in general and the RAID problems I was having, and mythtv doesn't freeze on me anymore like it used to. I still get the rtc messages, but a ton fewer (by a factor of about 100) and with no stability problems, so I'm happy.

marrandy

Re: Horrible Network instability problems
« Reply #9 on: August 11, 2008, 04:39:27 pm »
Hi Jon.

That's what I thought.

Oh, I have an M2NPV-VM motherboard, one of the pluto/linuxmce reference motherboards that the developers use, according to the wiki.

I have run my PCLinuxOS livecd on the system overnight with both cards enabled and I don't see any issues (I wish they had based it on PCLinuxOS instead of Kubuntu; I think it's a much better system, but I digress).

The only times (that I remember) I have seen RTC issues in the logs in the past were with Asterisk. I have run a separate Asterisk server for 3+ years now, but with hardware cards, so it may not be the same as the linuxmce I have, which is using ztdummy.

I'm tempted to find a spare H/D and do a clean install of Asterisk on my hardware to see if the errors come back.

The error messages don't say what caused the issue. Isn't there something (systrace?) that will provide more detail on the process that is causing the error message?

marrandy

Re: Horrible Network instability problems
« Reply #10 on: August 11, 2008, 06:21:41 pm »
OK, there are a lot of reports about this (googled). It all depends on whether your motherboard has the old RTC (Real Time Clock) timer or the newer HPET (High Precision Event Timer). On top of that, if it has HPET, is it enabled? Does the OS kernel support it? Even if the kernel supports it, was the kernel built with it enabled and with the correct options? But wait, there's more. Are things like asterisk and ztdummy compiled/recompiled for it?

On top of that, there are a bunch of 'fixes' that disable the HPET and use the old RTC, change the max-user-freq, or turn ACPI off.

As the HPET is supposed to be a move forward and is supported in kernel 2.6.22, I'm not sure we should be going backwards.

http://en.wikipedia.org/wiki/High_Precision_Event_Timer
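One way to answer part of the "is it enabled?" question is to ask the running kernel which clocksources it knows about. This is only a sketch: it assumes the kernel exposes the clocksource files under `/sys/devices/system/clocksource/clocksource0`, the `read_clocksources` helper is a made-up name, and seeing `hpet` in the list does not by itself explain the rtc messages.

```python
import os

# Assumed sysfs location; kernels that do not expose it simply yield (None, []).
CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current clocksource, list of available clocksources)."""

    def read(name):
        try:
            with open(os.path.join(base, name)) as fh:
                return fh.read().strip()
        except OSError:
            return None  # file missing or unreadable

    current = read("current_clocksource")
    available = read("available_clocksource")
    return current, (available.split() if available else [])


current, available = read_clocksources()
print("current clocksource:", current)
print("available clocksources:", ", ".join(available) or "(unknown)")
print("hpet listed:", "hpet" in available)
```

If `hpet` is not listed, the usual suspects would be a BIOS setting or a kernel built without HPET support, but that is a guess to verify, not a diagnosis.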

As I'm not a programmer, a lot of this is way above me.

But I found this interesting, recent asterisk thread.

http://forums.digium.com/viewtopic.php?t=22387&highlight=rtc+lost+interrupts

At the bottom, it says:

"Okay... I fixed it!
Still I'm getting a delay in a meetme room when I'm sitting next to someone else trying, but I'm blaming that on the default meetme operations.

Right, what I did was:
1. Recompile the default Debian kernel with a timer of 1000Hz instead of 256Hz. (Needed for Zaptels timing).
2. Commented out the defines for the RTC from ztdummy.c
3. Recompiled zaptel & installed it (not the init scripts though).
4. Recompiled asterisk and installed it.
5. Ran modprobe ztdummy
6. Activated a meetme conference.

Will something like this also work for you?"
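Before recompiling anything as in step 1 of that quote, it may be worth checking what timer frequency the running kernel was actually built with. A minimal sketch, assuming the distribution ships the kernel build config as `/boot/config-<release>` (Debian and Ubuntu based systems normally do); `parse_config_hz` and `running_kernel_hz` are made-up helper names.

```python
import os
import platform
import re


def parse_config_hz(config_text):
    """Extract the CONFIG_HZ value from kernel config text, or None if absent."""
    # Anchored so that lines such as CONFIG_HZ_250=y are not matched.
    match = re.search(r"^CONFIG_HZ=(\d+)$", config_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def running_kernel_hz(boot_dir="/boot"):
    """Look up CONFIG_HZ for the running kernel, or None if it cannot be read."""
    path = os.path.join(boot_dir, "config-" + platform.release())
    try:
        with open(path) as fh:
            return parse_config_hz(fh.read())
    except OSError:
        return None  # config file not shipped or not readable


print("CONFIG_HZ for running kernel:", running_kernel_hz())
```

If this already reports 1000, step 1 of the quoted recipe would presumably be unnecessary on that kernel.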