Author Topic: MD wake on Lan not working  (Read 13514 times)

jamo

  • Guru
  • ****
  • Posts: 463
Re: MD wake on Lan not working
« Reply #15 on: June 28, 2012, 07:20:58 pm »
Mike

Thanks for the tips. I have been there, though, and I think I eliminated that as the problem, as per my comments higher up in the thread regarding WOL being enabled both in the BIOS and on the NIC via ethtool.
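For anyone landing here from a search, the ethtool check referred to above can be sketched roughly as follows. This is only an illustration: the interface name eth0 and the helper name wol_magic_enabled are assumptions, not something from this thread. The helper reads the output of "ethtool <iface>" and succeeds only if the active "Wake-on:" line contains "g" (wake on magic packet).

```shell
# Hypothetical helper: reads `ethtool <iface>` output on stdin and succeeds
# only if the active "Wake-on:" setting includes "g" (magic packet).
# The "Supports Wake-on:" line is ignored because its first field is "Supports".
wol_magic_enabled() {
    awk '$1 == "Wake-on:" { found = ($2 ~ /g/) } END { exit (found ? 0 : 1) }'
}

# Typical use on the MD, as root (eth0 is an assumption):
#   ethtool eth0 | wol_magic_enabled || ethtool -s eth0 wol g
```

If the check fails, "ethtool -s eth0 wol g" asks the driver to arm magic-packet wake. Whether the NIC honours that after poweroff is a separate question, which is what the rest of this thread is about.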

Thom

I know it chews up the devs' time when we seem to rehash things on the forum, but in my opinion it still (generally) helps, because the forum then becomes an awesome knowledge base for people searching on keywords and having similar issues. That's why I post the things I'm working through, and I try to do so without putting in too many red herrings and only after I've done a fair bit of digging myself.

Misguided, often, but good intentions at least... ;-)

jamo

Re: MD wake on Lan not working
« Reply #16 on: August 22, 2012, 08:48:38 am »
Bump

Still struggling with this WOL issue. It really is a pain, see, because the MD is a laptop that I want to keep closed to keep out dust and muck and ultimately I actually want to hide it in a cabinet or something so getting to the power switch is a no-no.

Progress-

Since WOL works fine with my other MDs, I'm certain it has something to do with the Broadcom BCM5787M NIC, the tg3 driver which controls it, or a combination of these. Most posts on the web state that the NIC has to be properly shut down in the poweroff sequence and, because this isn't happening, it can't react to WOL packets. That makes sense, because when I remove mains power (after shutting down) and then plug it back in, WOL works! So it seems that on shutdown the NIC is left in some state that is not right, and when power is removed the NIC is forcibly closed or dropped to the appropriate state.
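For reference, the WOL packet the NIC has to react to is a "magic packet": six 0xFF bytes followed by the target MAC address repeated 16 times. The sketch below only builds that payload as a hex string so it can be inspected. The function name magic_packet_hex is made up for illustration, and actually sending the packet is normally left to a tool such as wakeonlan or etherwake.

```shell
# Hypothetical helper: print the magic-packet payload for a MAC address as hex.
# Accepts aa:bb:cc:dd:ee:ff or aa-bb-cc-dd-ee-ff; fails on anything else.
magic_packet_hex() {
    _mac=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')
    case "$_mac" in
        *[!0-9a-f]*) return 1 ;;        # non-hex character found
    esac
    [ "${#_mac}" -eq 12 ] || return 1   # a MAC is exactly 12 hex digits
    _out=ffffffffffff                   # 6 bytes of 0xFF
    _i=0
    while [ "$_i" -lt 16 ]; do          # followed by the MAC, 16 times
        _out="$_out$_mac"
        _i=$((_i + 1))
    done
    printf '%s\n' "$_out"
}
```

The payload is always 102 bytes (204 hex digits), which is handy when checking a capture on the MD's switch port to confirm the packet really arrives.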

So I tried a couple of things-
1. I tried upgrading the tg3 driver to the latest in case this was fixed. Doesn't seem to have made a difference, although I must confirm that I *am* using the latest (just to be sure I created the correct initramfs etc.)
2. I tried purging network-manager which worked for some people. Didn't help me.
3. I tried adding a patch to the /etc/init.d/halt script that forcibly does an "ifconfig eth0 down", but this just causes the system to hang on shutdown. I suspected this was because the root file system was still mounted on NFS and therefore the system wouldn't take the NIC down. The reason I suspected NFS was still mounted is that there are some messages on shutdown like "/ still busy."
4. Based on 3 above I thought perhaps the shutdown scripts were incorrectly ordered (I came across a post somewhere that suggested this was possibly the case) so I modded that to make NBD shutdown come earlier (as per advice in the post) but that doesn't seem to have improved things.
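The suspicion in item 3 (root still on NFS when the NIC is taken down) can be tested from inside the halt script before touching the interface. What follows is a rough sketch, not the actual LinuxMCE halt script: root_is_nfs and down_nic_if_safe are hypothetical names, and the only assumption about the system is the standard /proc/mounts layout (device, mount point, filesystem type, ...).

```shell
# Hypothetical helper: succeed if the last entry mounted on "/" is an NFS mount.
# Takes an optional mounts file so it can be tried against a saved copy.
root_is_nfs() {
    awk '$2 == "/" { fs = $3 } END { exit ((fs ~ /^nfs/) ? 0 : 1) }' "${1:-/proc/mounts}"
}

# Hypothetical wrapper: only take the interface down when root is NOT on NFS,
# since downing it under an NFS root is what makes the shutdown hang.
down_nic_if_safe() {
    if root_is_nfs "${2:-/proc/mounts}"; then
        echo "root is on NFS; leaving $1 up" >&2
        return 1
    fi
    ifconfig "$1" down
}
```

On a diskless MD this will always refuse, which at least confirms the diagnosis instead of hanging the box.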

Next steps:
1. Confirm I am actually using the new tg3 driver (perhaps replace tg3.ko, which I think I'm using, with a nonsense file, rebuild initramfs and confirm that the system doesn't work!!).
2. Use live Ubuntu CDs for 8.10, 10.04 and 12.xx to confirm that the WOL problem exists for 10.04 only. Then, possibly force the NIC driver from 8.10 or 12.xx in? Sounds like a major hack but who knows? Is it possible to use an alternative, *generic* driver to tg3 that might work?
3. Stick an X-10 controller on the mains source so I can forcibly pull the plug and then re-power the machine each time I shut it down!!!

sambuca

  • Guru
  • ****
  • Posts: 462
Re: MD wake on Lan not working
« Reply #17 on: August 22, 2012, 11:49:24 am »
3. The nic is shut down when you do "ifconfig eth0 down". The reason your system hung is that NFS is trying to communicate with the core, and it can't as the network is gone.

Do you suspend the MD or do you shut it down?

br,
sambuca

jamo

Re: MD wake on Lan not working
« Reply #18 on: August 22, 2012, 02:07:26 pm »
Quote
3. The nic is shut down when you do "ifconfig eth0 down". The reason your system hung is that NFS is trying to communicate with the core, and it can't as the network is gone.
Makes sense... but how do I fix that? Should NFS not be shut down by this time if the halt script is the last to run in the shutdown sequence?
I'm thinking I need to get onto one of the working MDs and put a whole lot of verbose logging to screen and some log file into the scripts so I can see what's happening, then do the same on the problem child and compare the two. Of course the tricky part is that the logging is happening to an NFS share so as soon as NFS or ETH goes down, that's the end of it... but I guess the screen logging could still be visible.
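The logging idea could look something like the sketch below. It is only an illustration: log_step and the LOG_TARGET variable are made-up names, and the default of /dev/console is an assumption chosen because, as noted above, a log file on the NFS root stops being writable as soon as NFS or the interface goes down.

```shell
# Hypothetical tracing helper to source from the rc0 scripts.
# Writes "HH:MM:SS [script-name] message" to LOG_TARGET (default: the console).
log_step() {
    printf '%s [%s] %s\n' "$(date '+%H:%M:%S')" "${0##*/}" "$*" \
        >> "${LOG_TARGET:-/dev/console}"
}

# Example calls inside a shutdown script:
#   log_step "about to stop NFS"
#   log_step "about to call /sbin/halt"
```

Putting identical calls into the same scripts on a working MD and on the problem MD gives two traces that can be compared line by line, even if the last few lines only ever reach the screen.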

Quote
Do you suspend the MD or do you shut it down?
At the moment I'm shutting it down. I've tinkered briefly with suspend, not had any more luck with suspend than shutdown but ultimately I'd like to get shutdown sorted. I think when I tried suspend it didn't seem to show much difference either in terms of startup time or in terms of WOL response. Which probably means it's not really suspending. But that's a battle for another day. If I get thoroughly whipped with this one, I'll rebuild the image and start from scratch with suspend as an option.

jamo

Re: MD wake on Lan not working
« Reply #19 on: November 26, 2012, 02:08:50 pm »
Had another stab at this now that my system is rising, phoenix-like, from the ashes of the lightning strike. I still have a veritable dance of laptops available to me thanks to my work colleagues damaging their equipment, and I'm very loath to abandon these otherwise excellent media directors: HP6410b notebooks.

Quote
Next steps:
1. Confirm I am actually using the new tg3 driver (perhaps replace tg3.ko, which I think I'm using, with a nonsense file, rebuild initramfs and confirm that the system doesn't work!!).
2. Use live Ubuntu CDs for 8.10, 10.04 and 12.xx to confirm that the WOL problem exists for 10.04 only. Then, possibly force the NIC driver from 8.10 or 12.xx in? Sounds like a major hack but who knows? Is it possible to use an alternative, *generic* driver to tg3 that might work?
3. Stick an X-10 controller on the mains source so I can forcibly pull the plug and then re-power the machine each time I shut it down!!!

Regarding step 1: "modinfo tg3" reveals that the latest tg3 driver is being used, but still no dice. So my assumption is that it is not just the driver but some combination of kernel/driver interaction that causes this.
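One caveat worth checking: modinfo reads the module file on disk, while "ethtool -i" asks the driver that is actually bound to the interface, so the two can disagree if an older tg3.ko is still sitting in the initramfs. A small sketch for pulling the running driver name and version out of "ethtool -i" output follows. The helper name running_driver and the interface eth0 are assumptions for illustration.

```shell
# Hypothetical helper: reads `ethtool -i <iface>` output on stdin and prints
# "<driver> <version>" for the driver currently bound to the interface.
# The "firmware-version:" line is skipped because its key is not "version".
running_driver() {
    awk -F': *' '
        $1 == "driver"  { d = $2 }
        $1 == "version" { v = $2 }
        END { if (d == "") exit 1; print d, v }'
}

# Compare what is loaded with what is on disk (eth0 is an assumption):
#   ethtool -i eth0 | running_driver
#   modinfo -F version tg3
```

If the two versions differ, the new driver is installed but not the one actually driving the NIC at boot.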

Considering step 2: apparently this is fixed in Ubuntu 12.xx, so would it be possible to upgrade my media director to 12 while the rest run 10.04?

jamo

Re: MD wake on Lan not working
« Reply #20 on: November 27, 2012, 10:11:10 am »
Pretty much given up on this one.

Definitely got the latest tg3 driver working but it doesn't change anything.

Tore apart the rc.0 scripts last night and came to the following conclusions-

The problem with the Broadcom BCM5787M NIC is, as discussed on many LTS forums, that it won't respond to WOL packets unless it is shut down properly (à la ifdown eth0) or the power is completely removed from the NIC for a few seconds.

Most who've had issues with this have managed to ensure the NIC is shut down cleanly during the power off sequence by either creating a shutdown script with the above command, or hacking one of the other scripts. However, because we have completely diskless clients, we can't shut down the NIC until the very last moment because NFS is still mounted (the root file system).

When I've tried this at any point short of the last command that is called (/sbin/halt) the system hangs, apparently due to NFS being busy.

The final command, /sbin/halt, is part of the upstart package and accepts the parameter "-i", which is supposed to shut down all network interfaces but, at least in this version, simply doesn't. It just accepts the parameter for compatibility's sake. I've seen the code.

So.... blah. Run out of ideas. Going to mark these devices (the laptops) and NICs (BCM5787s) as working with caveats under 10.04.

If anyone has any other suggestions, please shoot.