LinuxMCE Forums

General => Users => Topic started by: powerbits on January 07, 2009, 06:02:55 am

Title: Suspend to RAM
Post by: powerbits on January 07, 2009, 06:02:55 am
Can a LinuxMCE machine go into suspend-to-RAM mode?
And wake up on a USB trigger?

Seems to me this would be a great way to save power and eliminate boot time.
Title: Re: Suspend to RAM
Post by: tkmedia on January 07, 2009, 06:18:05 am
Some have tried with 0710; check the wiki.
It looks promising with 0810.

Tim
Title: Re: Suspend to RAM
Post by: powerbits on January 07, 2009, 10:15:33 am
How do I do it?
Title: Re: Suspend to RAM
Post by: chrisbirkinshaw on January 07, 2009, 05:26:56 pm
I posted previously to this forum that when an MD is asleep and the core is restarted, or a certain time period passes, the MD is orphaned from the core and will no longer transmit or receive DCE messages, i.e. it is not possible to use the MD to control itself, or to control it from another orbiter.

Has this been addressed?
Title: Re: Suspend to RAM
Post by: tkmedia on January 07, 2009, 05:48:06 pm
Not that I am aware of. I was just commenting on 0810's default options to suspend to RAM. I would imagine the DCE message issue would need to be resolved.
Title: Re: Suspend to RAM
Post by: colinjones on January 07, 2009, 08:46:31 pm
Wouldn't this just be that the TCP socket is lost? The DCE devices open some TCP connections to communicate with the router, and these remain open. If you suspend an MD, the TCP connection would in theory remain open; the core would just get no response on that session. If you reboot the core, the socket info and state of that connection on the core end are lost (i.e. the source/destination IP/port numbers, and the fact that it exists and is owned by the DCERouter process). When the MD resumes, it will attempt to continue communicating on the same socket, which will not respond because it no longer exists on the other end. But the MD isn't going to know this, so it will effectively be permanently locked until something resets the process responsible for that session or reboots the box.

I imagine that not rebooting the core but leaving it long enough would trigger either an application (DCERouter) or OS cleanup of the socket, as it isn't responding to ACKs/keep-alives....

Perhaps the simplest way to bulletproof this is to hook a little piece of code into the MD suspend-to-RAM function that 1) sets the MD state to OFF and 2) cleanly closes the TCP connections. Then, on resume, some code tells the MD to re-establish the DCERouter connections.

EDIT: Actually this looks like it might be a case of having to stop all child devices of the MD on suspend (which will close all the DCERouter connections), then start them all again on resume to reinitiate the connections....
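
A minimal sketch of the close-on-suspend, reconnect-on-resume idea, written in Python purely for illustration. `DeviceConnection`, `on_suspend` and `on_resume` are hypothetical names, not part of the DCE code, and the host and port would be whatever the device normally uses to reach the router.

```python
import socket


class DeviceConnection:
    """Hypothetical wrapper around one device's TCP link to the router."""

    def __init__(self, host, port, timeout=5.0):
        self.host = host
        self.port = port
        self.timeout = timeout
        self.sock = None

    @property
    def connected(self):
        return self.sock is not None

    def connect(self):
        # Always start from a fresh socket; stale state is discarded first.
        self.close()
        self.sock = socket.create_connection(
            (self.host, self.port), timeout=self.timeout
        )

    def close(self):
        # Close cleanly so the peer sees the connection end instead of
        # being left with a silent, half-open session.
        if self.sock is not None:
            try:
                self.sock.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass  # the peer may already be gone
            self.sock.close()
            self.sock = None

    def on_suspend(self):
        # Assumed to be called just before the machine goes to sleep.
        self.close()

    def on_resume(self):
        # Assumed to be called right after wake-up.
        self.connect()
```

The point of the design is that neither end ever relies on a socket surviving the sleep: the router sees an orderly disconnect, and the device always comes back with a brand-new connection, which also covers the case where the core was rebooted in the meantime.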
Title: Re: Suspend to RAM
Post by: sambuca on January 07, 2009, 11:11:21 pm
I've used the method of restarting the devices on the MD at resume myself (this can be done with a script), but I see this as a hack, not a proper solution.

I agree that we should tell the core about our status, as it might need to know at some point. Perhaps it even should be some new status (MD_S2RAM/MD_S2DISK or whatever)?

Then there is of course the issue with maintaining or re-initiating the connections. Don't know the details of TCP programming, but it sure seems to just lock up the processes/devices in question after resume. Maybe a solution is to send a signal to every process to have it restart every connection to the core? At the other end, the core could restart the connections to the MD when it receives a MD_ON or MD_RESUMED status.
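
To make the signal idea concrete, here is a rough Python sketch of a process that treats SIGHUP as "reconnect to the core". `ReconnectOnHup` and the `reconnect` callback are hypothetical; nothing here is taken from the real device code, and whether the actual devices could be changed this way is an open question.

```python
import signal


class ReconnectOnHup:
    """Records SIGHUP and lets the main loop reconnect at a safe point."""

    def __init__(self, reconnect):
        self.reconnect = reconnect  # hypothetical callable that reopens the link
        self.pending = False

    def install(self):
        # Python only allows signal.signal() from the main thread.
        signal.signal(signal.SIGHUP, self._on_hup)

    def _on_hup(self, signum, frame):
        # Keep the handler tiny: just note that a reconnect was requested.
        self.pending = True

    def poll(self):
        # Call from the device's main loop; returns True if it reconnected.
        if self.pending:
            self.pending = False
            self.reconnect()
            return True
        return False
```

The handler only sets a flag, and the actual reconnect happens from the main loop, so the process never tears down a socket in the middle of using it.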

Just thinking out loud here...
Anyway, I suppose this kind of talk belongs in the dev forum :-)

br,
sambuca
Title: Re: Suspend to RAM
Post by: colinjones on January 07, 2009, 11:32:35 pm
Closing the connections rather than trying to maintain them is definitely tidier and much simpler.
Title: Re: Suspend to RAM
Post by: chrisbirkinshaw on January 07, 2009, 11:41:41 pm
Can you provide this script? I tried manually restarting the router on the core, but this still did not result in a responsive MD (as the MD obviously didn't get the message to reload!).

Would be useful to add this to the wiki: http://wiki.linuxmce.org/index.php/Suspend

Thanks,

Chris
Title: Re: Suspend to RAM
Post by: sambuca on January 08, 2009, 10:08:00 am

I really found this "solution" too hackish to add the script, but then again, as it is now, the whole suspend issue is quite hackish.

I just added the script to the wiki. It will basically kill the devices, causing the spawner to restart them. But note that it has a restart count of 50, so it will only work 50 times. You also need to adjust it to match the devices running on your MD.
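
For anyone wondering why a restart count matters, a supervisor with a restart budget behaves roughly like the sketch below. This is an illustration of the general technique only, not the actual spawner logic; `supervise` and `max_restarts` are made-up names.

```python
import subprocess


def supervise(cmd, max_restarts):
    """Run cmd, restarting it each time it exits, up to max_restarts restarts.

    Returns the total number of times the command was launched.
    """
    launches = 0
    while True:
        subprocess.run(cmd)  # blocks until the child exits or is killed
        launches += 1
        if launches > max_restarts:
            # Restart budget used up: stop bringing the child back.
            return launches
```

With a budget of 50, killing the child works as a "restart" 50 times; after that the supervisor gives up and the device stays down until something else starts it again.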

When I think about it, it should probably be possible to find the list of device ids running on this MD and do a restart on devices based on that.

Ok, just had to google this  :D
Found an interesting thread about TCP connections and suspend/resume: https://lists.linux-foundation.org/pipermail/linux-pm/2008-June/017742.html
According to this, connections can survive suspend, as long as there is no NAT and both sides of the connection are silent. So, if we could tell the DCERouter to "be silent" for all connections to the suspended MD, we should have solved one part of the problem. This is actually consistent with my findings: it's only the connections to the MD devices that are affected, and not the ones from the MD to the core (as the MD is very silent when suspended).
The other part of the problem is if the router recreates connections or otherwise does anything to them while the MD is suspended. Does it create new connections when reloaded?

br,
sambuca
Title: Re: Suspend to RAM
Post by: colinjones on January 08, 2009, 10:32:55 am
Of course if the connections are silent and there are no keep-alives then the connection will never know that a suspend occurred. However:

1) we are talking about engineering a complex (and not necessarily guaranteed, especially at the OS level of the TCP connection) way of sustaining TCP connections when it isn't necessary. The whole point of (LMCE) devices is that they can stop and start, disconnect and reconnect as required. Modularity. Why build this complex extra functionality into the DCERouter code, when it is more elegant and tidy just to close the device connections at suspend and reconnect on resume? Both require code changes, but the latter is much more bulletproof and tidy.

2) the retain-connections option cannot survive a core reboot, or even a DCERouter reload, so this case would have to be handled anyway... and the simplest way to handle this would be to implement the former option above... which would solve the issue in the first place....
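
For reference, keep-alives are a per-socket setting that a program has to switch on itself. The Python sketch below shows how a process could do that so a silent, dead peer is eventually noticed. `enable_keepalive` is a hypothetical helper and the timing values are arbitrary examples; nothing in this thread says whether DCERouter or the devices set these options.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keep-alive probes for sock.

    The idle/interval/probes tuning uses platform-specific options and is
    applied only where the socket module exposes them.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of silence before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

Note that this works against the "stay silent" approach: with keep-alives on, the awake side will probe the suspended MD and drop the connection once the probes go unanswered.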
Title: Re: Suspend to RAM
Post by: powerbits on January 08, 2009, 11:35:58 am
So, to conclude for a non-LMCE specialist:

- Will version 8.10 be able to do it?

Will it have some issues? Or do we need extra scripts?
Title: Re: Suspend to RAM
Post by: colinjones on January 08, 2009, 11:54:36 am
No, it will not, unless the devs decide to include that functionality. This is an application issue, not an OS issue, so 0810 makes little difference.
Title: Re: Suspend to RAM
Post by: powerbits on January 08, 2009, 01:02:09 pm
But is it possible? Because that is certainly a key feature!

And if it is, is it simple to do, or do you have to be a script guru? :)
Title: Re: Suspend to RAM
Post by: chrisbirkinshaw on January 08, 2009, 01:11:53 pm
I think in the meantime we would be best directing our attention to cleanly reopening connections from the MD to the core. Is there a better way than killing all the child processes? I remember trying to kill all SCREEN processes and then running the launch manager again from a script, but for some reason this was not 100% bulletproof. I will revisit this soon when I have time. Any better ideas?
Title: Re: Suspend to RAM
Post by: chriss on January 08, 2009, 03:41:05 pm
Just an idea: why not extend the launch manager to kill all devices before suspending and restart all devices after resuming? All the logic should be there (start/stop MD); it just needs to be connected to suspend/resume somehow.
Seems a cleaner solution, though I haven't researched it before.

/chriss
Title: Re: Suspend to RAM
Post by: colinjones on January 08, 2009, 10:35:10 pm
Hmmm... I was thinking, why can't we just send a TERM or HUP signal (not KILL) to LaunchManager? Surely all the devices are started as child processes of that, so this would cause them all to shut down cleanly. However, on researching it a little I'm now confused as to how LM works.

Each device is initiated using the Spawn_Device script, and so each device's PPID is the particular instance of Spawn_Device's PID that started it. But each of those Spawn_Device's PPID is not LM, as I expected. It points to Init, suggesting that LM plays no part in starting devices - that doesn't seem right?! Unless the LM is simply monitoring the devices starting up, not actually doing the launching itself. Either way, the fact that you can use LM to stop and start the devices means that it must have a hook into the functionality somewhere...
Title: Re: Suspend to RAM
Post by: sambuca on January 08, 2009, 11:44:24 pm
I guess that reconnecting the device would also restart the TCP connection from the core to the MD (because there are two, aren't there? One core->device and one device->core?)
If that is the case, I totally agree with this proposal, it keeps it simple  ;)

As colin mentions, sending the device process a signal could probably do it. Don't know if a process can provide a suspend/resume hook to the kernel, or if there is a special resume signal, but the SIGHUP signal seems like a good candidate (wikipedia says something like: SIGHUP tells programs to reload configuration and reinitialize).

If the signal approach is used, it could be done from both the resume scripts and from the launch manager. As the LM already knows which devices are present (and possibly which ones need restarting, if some don't), it makes sense to me to let it handle this.
This depends on how the LM works, but at least it should be possible to do some ps | grep and bash scripting to get the process IDs to send the signal to, using the device ID as a start.
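
As a rough illustration of the "find the processes, then signal them" step, here is a Python sketch that does the equivalent of ps | grep by reading the Linux /proc layout. `find_pids` and `send_signal` are hypothetical helpers, and what the device command lines actually contain (for example the Spawn_Device script name or a device ID) is an assumption that would need checking on a real MD.

```python
import os
import signal


def find_pids(pattern, proc_root="/proc"):
    """Return PIDs whose command line contains pattern (Linux /proc layout)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                # Arguments are NUL-separated in /proc/<pid>/cmdline.
                raw = f.read()
        except OSError:
            continue  # process exited, or we may not read it
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace")
        if pattern in cmdline:
            pids.append(int(entry))
    return sorted(pids)


def send_signal(pids, sig=signal.SIGHUP):
    """Send sig to each PID; return the PIDs that were actually signalled."""
    delivered = []
    for pid in pids:
        try:
            os.kill(pid, sig)
            delivered.append(pid)
        except (ProcessLookupError, PermissionError):
            pass  # already gone, or not ours to signal
    return delivered
```

Something like `send_signal(find_pids("Spawn_Device"))` would then deliver SIGHUP to every match. As with ps | grep, the search can match the searching process itself if the pattern appears on its own command line, so a real script should filter out its own PID.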

sambuca
Title: Re: Suspend to RAM
Post by: colinjones on January 09, 2009, 12:56:42 am
Pretty sure there is more than one connection, at least one in each direction, from something I read in the wiki. But I caution that I remember Thom correcting my understanding on this, and I don't know if it was just me misinterpreting the wiki or if the wiki is actually incorrect. Either way, I can't quite remember the detail of what Thom explained, but it was fewer connections than I thought.

I think the first approach should be to see if there is a programmatic way of triggering LM to shut down the devices normally (as if you pressed the button in the LM window). If this isn't easy, then move on to the SIGHUP approach. It should be easy to "ps" and "grep" the Spawn_Device processes, and send them all a SIGHUP. But I'm not sure how to restart them; again, research into interfaces to LM would help.

And yes, I'm pretty certain that the kernel has hooks you can attach scripts for suspend/resume to... just can't recall exactly where :)
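
One place such hooks commonly live is pm-utils, which runs the executable scripts in /etc/pm/sleep.d with a single argument: suspend, hibernate, resume or thaw. Assuming the MD uses pm-utils (not confirmed in this thread), the dispatch part of a hook could look like the Python sketch below. `dispatch`, `stop_devices` and `start_devices` are hypothetical names for whatever ends up stopping and restarting the devices.

```python
def dispatch(action, stop_devices, start_devices):
    """Map a pm-utils style action word to the stop/start callbacks."""
    if action in ("suspend", "hibernate"):
        # Going to sleep: close things down while the network still works.
        stop_devices()
        return "stopped"
    if action in ("resume", "thaw"):
        # Waking up: bring the devices (and their connections) back.
        start_devices()
        return "started"
    return "ignored"  # unknown action words are left alone
```

A real hook would call this with the first command-line argument it receives and must be marked executable. How the stop and start steps are implemented is exactly the open question above (LM interface versus signals).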