LinuxMCE Forums

General => Developers => Topic started by: colinjones on March 03, 2009, 06:10:35 am

Title: [Testers?] Suspend/Resume MDs....
Post by: colinjones on March 03, 2009, 06:10:35 am
I've been working on this code to try to make suspending and resuming MDs reliable. The basic issue, I believe, is that the MD and DCERouter expect a continuous TCP connection for DCE communication whenever an MD is running, and a suspended MD is effectively still in a running state. Once the MD has been suspended long enough for the TCP connection that the DCERouter is still holding open to be terminated for lack of response, the MD can never resume operation: it expects that TCP session still to be open, and the DCERouter can no longer reach it to initiate a Reload. This is true of all DCE devices on the MD (each has at least one DCE connection).

The objective is to create Suspend and Resume scripts to place in /etc/acpi/suspend.d and /etc/acpi/resume.d that will cleanly tell the DCE devices of the local MD to shut down, thus terminating the TCP connections correctly. Then, on resume, they tell the MD to reinitiate all these connections by getting it to spawn the DCE devices again, thus reconnecting it with the DCERouter.

So far I have got the script to shut down all the DCE devices and the MD DCE device itself successfully, and to output the device list to a file ready for the resume script to use. I have determined that it is unnecessary to disable the DCE devices before shutting them down, because using the SYSCOMMAND message with a value of 0 means that the spawning system does not attempt to restart the devices even if they are enabled. So I have commented out that bit. I'm finished for the day now.

I haven't written the resume script yet, but have manually started the MD's devices using /usr/pluto/bin/Start_LocalDevices.sh. The only problem with this is that it only starts the child devices, not the MD DCE Device itself. Although the MD seems to work fine, this will obviously prevent relaying of DCE messages to children. The MD DCE device itself, of course, also creates a DCE connection so it is necessary to stop that device as well.

Code: [Select]
. /usr/pluto/bin/Config_Ops.sh
. /usr/pluto/bin/SQL_Ops.sh
. /usr/pluto/bin/pluto.func

LocalIP=$(ip addr show dev eth0|grep "inet "|cut -f6 -d" "|cut -f1 -d/)

FindMDDeviceQ="SELECT PK_Device FROM Device
                WHERE IPAddress='$LocalIP';
"

DeviceID=$(RunSQL "$FindMDDeviceQ")

# NOTE: this hard-coded ID overrides the lookup above
DeviceID=56

FindChildrenQ="SELECT Device.PK_Device
                FROM Device
                JOIN DeviceTemplate ON Device.FK_DeviceTemplate=DeviceTemplate.PK_DeviceTemplate
                LEFT JOIN Device AS Device_Parent on Device.FK_Device_ControlledVia=Device_Parent.PK_Device
                LEFT JOIN DeviceTemplate AS DeviceTemplate_Parent
                ON Device_Parent.FK_DeviceTemplate=DeviceTemplate_Parent.PK_DeviceTemplate
                WHERE (Device.FK_Device_ControlledVia=$DeviceID
                OR (Device_Parent.FK_Device_ControlledVia=$DeviceID AND DeviceTemplate_Parent.FK_DeviceCategory IN (6,7,8) ) )
                AND DeviceTemplate.FK_DeviceCategory <> 1
                AND DeviceTemplate.ImplementsDCE=1
                AND Device.Registered=1;
"

DeviceList=$(RunSQL "$FindChildrenQ")

echo "$DeviceList" > /usr/pluto/var/Suspend_DeviceList_$DeviceID.log

#for Device in $DeviceList; do

#       DisableDeviceQ="Update Device
#               Set Disabled=1
#               Where PK_Device=$Device;
#       "

#       RunSQL "$DisableDeviceQ"

#done

#DisableDeviceQ="Update Device
#               Set Disabled=1
#               Where PK_Device=$DeviceID;
#"

#RunSQL "$DisableDeviceQ"

for Device in $DeviceList; do

        /usr/pluto/bin/MessageSend "$DCERouter" 0 "$Device" 7 0 163 "start_local_devices"

done

/usr/pluto/bin/MessageSend dcerouter 0 $DeviceID  7 0 163 "start_local_devices"
Title: Re: Suspend/Resume MDs....
Post by: colinjones on March 04, 2009, 02:59:10 am
I have gone a bit further. The script now successfully shuts down all devices tidily, and I can start them again using the lmce LM script. This all seems to work. Currently I have a hard coded sleep in here to allow VDR to close down as unfortunately the simple SQL query to identify all running devices does not recurse down the tree, so misses that VDR is still shutting down.

The main problem I am having now is working out how to convince my MD to suspend to RAM so that I can test that this script is triggered when in the /etc/acpi/suspend.d directory. The only options in my BIOS are POS and STR, but it doesn't specifically say that it is attaching that option to the power button... I chose STR, but when I hit the power button it halts the system. I tried sending "standby" or "mem" to /sys/power/state, but this seems to do funky things! Has anybody got any suggestions on how I can trigger a suspend to RAM at the command line?
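
For anyone else trying this from a shell: the kernel lists the sleep states it supports in /sys/power/state, and writing one of those names back to the file (as root) requests that state. A minimal sketch, assuming a kernel with sysfs power management; state_supported is a hypothetical helper name, and its second argument exists only so the check can be tried against a test file:

```shell
#!/bin/bash
# Hypothetical helper: succeed if the named sleep state (e.g. "mem" for
# suspend to RAM) is listed in the kernel's state file.
state_supported() {
    local state="$1" file="${2:-/sys/power/state}" s
    [ -r "$file" ] || return 1
    for s in $(cat "$file"); do
        [ "$s" = "$state" ] && return 0
    done
    return 1
}

# Usage (as root); left commented out because it really suspends the machine:
# if state_supported mem; then
#     echo mem > /sys/power/state
# fi
```

Checking first avoids writing a state the kernel does not advertise, which may be part of the "funky things" seen above.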

Code: [Select]
#!/bin/bash

. /usr/pluto/bin/Config_Ops.sh
. /usr/pluto/bin/SQL_Ops.sh
. /usr/pluto/bin/pluto.func

LocalIP=$(ip addr show dev eth0|grep "inet "|cut -f6 -d" "|cut -f1 -d/)

FindMDDeviceQ="SELECT PK_Device FROM Device
WHERE IPAddress='$LocalIP';
"

DeviceID=$(RunSQL "$FindMDDeviceQ")

# NOTE: this hard-coded ID overrides the lookup above
DeviceID=56

FindChildrenQ="SELECT Device.PK_Device
FROM Device
JOIN DeviceTemplate ON Device.FK_DeviceTemplate=DeviceTemplate.PK_DeviceTemplate
LEFT JOIN Device AS Device_Parent on Device.FK_Device_ControlledVia=Device_Parent.PK_Device
LEFT JOIN DeviceTemplate AS DeviceTemplate_Parent
ON Device_Parent.FK_DeviceTemplate=DeviceTemplate_Parent.PK_DeviceTemplate
WHERE (Device.FK_Device_ControlledVia=$DeviceID
OR (Device_Parent.FK_Device_ControlledVia=$DeviceID AND DeviceTemplate_Parent.FK_DeviceCategory IN (6,7,8) ) )
AND DeviceTemplate.FK_DeviceCategory <> 1
AND DeviceTemplate.ImplementsDCE=1
AND Device.Registered=1;
"

DeviceList=$(RunSQL "$FindChildrenQ")

echo "$DeviceList" > /usr/pluto/var/Suspend_DeviceList_$DeviceID.log

#for Device in $DeviceList; do

# DisableDeviceQ="Update Device
# Set Disabled=1
# Where PK_Device=$Device;
# "

# RunSQL "$DisableDeviceQ"

#done

#DisableDeviceQ="Update Device
# Set Disabled=1
# Where PK_Device=$DeviceID;
#"

#RunSQL "$DisableDeviceQ"

for Device in $DeviceList; do

/usr/pluto/bin/MessageSend "$DCERouter" 0 "$Device" 7 0 163 "start_local_devices"

done

/usr/pluto/bin/MessageSend dcerouter 0 $DeviceID  7 0 163 "start_local_devices"



#until [[ $(RunSQL "SELECT count(*)
# FROM Device
# JOIN DeviceTemplate ON Device.FK_DeviceTemplate=DeviceTemplate.PK_DeviceTemplate
# LEFT JOIN Device AS Device_Parent on Device.FK_Device_ControlledVia=Device_Parent.PK_Device
# LEFT JOIN DeviceTemplate AS DeviceTemplate_Parent
# ON Device_Parent.FK_DeviceTemplate=DeviceTemplate_Parent.PK_DeviceTemplate
# WHERE (Device.FK_Device_ControlledVia=$DeviceID
# OR (Device_Parent.FK_Device_ControlledVia=$DeviceID AND DeviceTemplate_Parent.FK_DeviceCategory IN (6,7,8) ) )
# AND DeviceTemplate.FK_DeviceCategory <> 1
# AND DeviceTemplate.ImplementsDCE=1
# AND Device.Registered=1;
#") == 0 ]]; do sleep 2; echo "pass"; done

sleep 20

LMProcesses=$(ps aux|grep lmce_launch_manager|grep -v grep|cut -c10-16)

echo "$LMProcesses"

for Process in $LMProcesses; do

kill -9 "$Process"

done
Title: Re: Suspend/Resume MDs....
Post by: tschak909 on March 04, 2009, 04:02:09 am
Look into the hibernate scripts.

-Thom
Title: Re: Suspend/Resume MDs....
Post by: colinjones on March 04, 2009, 04:07:52 am
Thom - whereabouts are these scripts? I looked in /usr/pluto/bin but couldn't see anything...
Title: Re: Suspend/Resume MDs....
Post by: tschak909 on March 04, 2009, 04:41:53 am
apt-cache search hibernate .... they are scripts that are in the ubuntu repository, not part of the pluto system.

-Thom
Title: Re: Suspend/Resume MDs....
Post by: colinjones on March 04, 2009, 07:44:38 am
damn - back to square one, I have a package dependency issue so can't install anything... guess I'll have to give up at this point.
Title: Re: Suspend/Resume MDs....
Post by: tschak909 on March 04, 2009, 07:57:52 am
:(
Title: Re: Suspend/Resume MDs....
Post by: sambuca on March 04, 2009, 12:31:49 pm
Have you looked at the http://wiki.linuxmce.org/index.php/Suspend page? It describes how to set up hibernation, and also how to add script execution on suspend/resume.
Doesn't help with the dependency issue though  :-\

Great work by the way, I think you are very close to a working solution. If you see in the wiki article mentioned, I did a very crude script to kill the processes I knew were running. But I didn't think of using the SQL approach you use.

sambuca
Title: Re: Suspend/Resume MDs....
Post by: colinjones on March 05, 2009, 01:09:51 am
sambuca

Thanks for the link - I didn't realise that you had extended this article to include the killalls. My update is, I have (stupidly) realised that the dependency issue is only on my core, not the MD! So I have got hibernate installed on my MD now. I was trying to work out the "scriptlets", and your example has given me a pointer - I'm assuming I can put any bash script in one of those functions and then declare it to be called within a suspend or resume hook? If so I will move my entire script into one of those.... I'm still not sure what /etc/acpi/suspend.d etc. is for... I guess using the hibernate command circumvents that entire system.

The other issue I want to fix is the hard coded "sleep" delay to allow the DCE devices to shut down. When I execute the SQL query, it only returns the child devices, not all descendants. Sending the DCE device shutdown command passes all the way down so that isn't an issue. But checking that all have completed shutdown is an issue. I don't see how to write a SQL query that will enumerate all descendant devices ... any suggestion would be greatly appreciated! Failing that, the only thing I can do is build a recursive function to progressively walk down the tree doing the same SQL query until I have a complete list - BUT it's been a long time since I've bent my brain around recursion like this (I get myself in knots!) so was hoping I could avoid it :)
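
On the descendants question, one way to avoid both a clever SQL query and explicit recursion is a simple work queue: start with the MD's ID, ask for the children of each ID in turn, and append whatever comes back until the queue is empty. A minimal sketch under assumptions: get_children is a hypothetical stand-in that returns a made-up tree so the sketch runs anywhere; on an MD it would presumably wrap a RunSQL call selecting PK_Device where FK_Device_ControlledVia equals the given ID.

```shell
#!/bin/bash
# Hypothetical stand-in for a database lookup: print the direct children of
# device $1. The IDs below are invented for illustration only.
get_children() {
    case "$1" in
        56) echo "57 58" ;;
        57) echo "60" ;;
        60) echo "61 62" ;;
        *)  echo "" ;;
    esac
}

# Print every descendant of device $1 (children, grandchildren, ...),
# breadth first, without recursion.
list_descendants() {
    local queue="$1" result="" current rest child
    while [ -n "$queue" ]; do
        # Pop the first ID off the front of the queue.
        current="${queue%% *}"
        rest="${queue#"$current"}"
        queue="${rest# }"
        # Record each child and queue it so its own children get visited.
        for child in $(get_children "$current"); do
            result="$result $child"
            queue="$queue $child"
        done
        queue="${queue# }"
    done
    echo "${result# }"
}

list_descendants 56   # prints: 57 58 60 61 62
```

The sketch assumes the device tree has no cycles; a device that ended up as its own ancestor would make the loop run forever.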
Title: Re: Suspend/Resume MDs....
Post by: colinjones on March 05, 2009, 03:28:38 am
Bit confused now - seems s2ram was left out of uswsusp for Gutsy so the ususpend method in hibernate doesn't work, and the sysfs-ram method doesn't seem to have a 'force' option to ignore unrecognised hardware, and produces these error messages for my script:

/usr/pluto/bin/Config_Ops.sh: 5: [[: not found
/usr/pluto/bin/LockUtils.sh: 5: [[: not found
/usr/pluto/bin/Config_Ops.sh: 34: Syntax error: Bad for loop variable

I'm wondering whether these errors are because I have put my whole script in the scriptlet function, and scriptlets don't like bash (I can easily separate it out and call it as a bash script, but I don't see much point until I can resolve the unrecognised hardware bit)

Has anyone any experience on this stuff?
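
Those "[[: not found" and "Bad for loop variable" messages are what a plain POSIX shell such as dash prints when it meets the bash-only [[ test and the C-style for loop, which suggests the scriptlet was being interpreted by sh rather than bash (an inference from the error text, not something confirmed against the hibernate documentation). A minimal sketch of the usual workaround, running the bash-specific part through bash explicitly; run_in_bash is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: run a snippet under bash no matter which shell the
# caller happens to be running in.
run_in_bash() {
    bash -c "$1"
}

# The [[ ... ]] pattern match is a bash extension; handed to bash it works:
run_in_bash '[[ "lmce_launch_manager" == lmce_* ]] && echo "bash handled [["'
```

Keeping the real work in a separate file with a #!/bin/bash first line, and calling that file from the scriptlet, achieves the same thing.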
Title: Re: Suspend/Resume MDs....
Post by: colinjones on March 05, 2009, 04:35:41 am
Update:

Downloaded the source for suspend-0.8 and compiled. This didn't compile s2ram for some reason, so I manually ran 'make s2ram' and then got the executable. Copied this into /sbin where the s2disk was and went back to the ususpend-ram method. Now this gets past the missing s2ram bit but has the same problems as the sysfs-ram method with the errors in my script.

I assumed this was because the scriptlet system doesn't support full bash scripts, and separated the script from the scriptlet:

Scriptlet:
Code: [Select]
AddSuspendHook 15 LMCESuspend

LMCESuspend() {

/root/suspend.sh &> /root/sus.log

}

And script:
Code: [Select]
#!/bin/bash

. /usr/pluto/bin/Config_Ops.sh
. /usr/pluto/bin/SQL_Ops.sh
. /usr/pluto/bin/pluto.func

echo "LMCE Suspend Process Started" > /var/log/pluto/sus.log

ethtool -s eth0 wol g

LocalIP=$(ip addr show dev eth0|grep "inet "|cut -f6 -d" "|cut -f1 -d/)

FindMDDeviceQ="SELECT PK_Device FROM Device
WHERE IPAddress='$LocalIP';
"

DeviceID=$(RunSQL "$FindMDDeviceQ")

# NOTE: this hard-coded ID overrides the lookup above
DeviceID=56

FindChildrenQ="SELECT Device.PK_Device
FROM Device
JOIN DeviceTemplate ON Device.FK_DeviceTemplate=DeviceTemplate.PK_DeviceTemplate
LEFT JOIN Device AS Device_Parent on Device.FK_Device_ControlledVia=Device_Parent.PK_Device
LEFT JOIN DeviceTemplate AS DeviceTemplate_Parent
ON Device_Parent.FK_DeviceTemplate=DeviceTemplate_Parent.PK_DeviceTemplate
WHERE (Device.FK_Device_ControlledVia=$DeviceID
OR (Device_Parent.FK_Device_ControlledVia=$DeviceID AND DeviceTemplate_Parent.FK_DeviceCategory IN (6,7,8) ) )
AND DeviceTemplate.FK_DeviceCategory <> 1
AND DeviceTemplate.ImplementsDCE=1
AND Device.Registered=1;
"

DeviceList=$(RunSQL "$FindChildrenQ")

echo "$DeviceList" > /usr/pluto/var/Suspend_DeviceList_$DeviceID.log

#for Device in $DeviceList; do

# DisableDeviceQ="Update Device
# Set Disabled=1
# Where PK_Device=$Device;
# "

# RunSQL "$DisableDeviceQ"

#done

#DisableDeviceQ="Update Device
# Set Disabled=1
# Where PK_Device=$DeviceID;
#"

#RunSQL "$DisableDeviceQ"

for Device in $DeviceList; do

/usr/pluto/bin/MessageSend "$DCERouter" 0 "$Device" 7 0 163 "start_local_devices"

done

/usr/pluto/bin/MessageSend dcerouter 0 $DeviceID  7 0 163 "start_local_devices"



#until [[ $(RunSQL "SELECT count(*)
# FROM Device
# JOIN DeviceTemplate ON Device.FK_DeviceTemplate=DeviceTemplate.PK_DeviceTemplate
# LEFT JOIN Device AS Device_Parent on Device.FK_Device_ControlledVia=Device_Parent.PK_Device
# LEFT JOIN DeviceTemplate AS DeviceTemplate_Parent
# ON Device_Parent.FK_DeviceTemplate=DeviceTemplate_Parent.PK_DeviceTemplate
# WHERE (Device.FK_Device_ControlledVia=$DeviceID
# OR (Device_Parent.FK_Device_ControlledVia=$DeviceID AND DeviceTemplate_Parent.FK_DeviceCategory IN (6,7,8) ) )
# AND DeviceTemplate.FK_DeviceCategory <> 1
# AND DeviceTemplate.ImplementsDCE=1
# AND Device.Registered=1;
#") == 0 ]]; do sleep 2; echo "pass"; done

sleep 20

# Quote the pattern so the shell cannot expand the * against file names
killall -s SIGKILL -r 'lmce_launch_manager*'

The script seems to be running but can't work out why it is not doing what it is intended to do... the PC goes into hibernate almost immediately without performing the tasks....
Title: [Testers please?] Suspend/Resume MDs....
Post by: colinjones on March 06, 2009, 03:23:22 am
OK, I have the suspend script working in conjunction with hibernate/s2ram.

My hardware doesn't seem to know how to resume at all.... is there anybody out there that would be interested in testing this script on their MDs? And possibly telling me whether it resumes?
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: krys on March 06, 2009, 04:00:52 pm
Colin,
I would be willing to be a guinea pig for you, but I just don't know when would be a good time since we are a few timezones apart. I can just give you access and let you set it up, then test it for you.
Let me know
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: colinjones on March 06, 2009, 08:02:38 pm
Krys

Start by:

sudo apt-get install hibernate uswsusp

Then confirm that uswsusp didn't install the s2ram tool (known problem in Gutsy), if so you will need to download the sources for uswsusp, then from its directory ./configure --minimal-install (I think), then make s2ram. You can now copy the s2ram executable to your /sbin directory.
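
To check whether the package actually shipped s2ram before building anything, something like this should do. A minimal sketch; have_tool is a hypothetical helper name, and only the standard "command -v" builtin is assumed:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named program can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool s2ram; then
    echo "s2ram found at $(command -v s2ram)"
else
    echo "s2ram missing: build it from the uswsusp sources as described above"
fi
```

Run it as root (or with /sbin on the PATH), since that is where s2ram is expected to live.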

Let me know when you have got that done successfully and I will give you the rest of the config instructions!

Col.
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: colinjones on March 06, 2009, 09:19:17 pm
EDIT: Updated instructions due to several mistakes.

jondecker is going to try as well, so I will start putting in the detail. The following is all on the MD you want to suspend, not the core.

- make sure Suspend to RAM (STR) is enabled in your BIOS.
- in /etc/acpi/events/powerbtn - make sure it is pointing at /etc/acpi/powerbtn.sh
- make a backup copy of powerbtn.sh, then edit the original to have these contents...

Code: [Select]
#!/bin/sh
# /etc/acpi/powerbtn.sh
# Initiates a suspend to RAM when the power button has been
# pressed.

# Skip if we are just in the middle of resuming.
test -f /var/lock/acpisleep && exit 0

hibernate-ram

- now create a file called /usr/pluto/bin/suspend.sh (with appropriate permissions for exec) and put this in it...

Code: [Select]
#!/bin/bash

. /usr/pluto/bin/Config_Ops.sh
. /usr/pluto/bin/SQL_Ops.sh
. /usr/pluto/bin/pluto.func

echo "LMCE Suspend Process Started" > /var/log/pluto/sus.log

## Turn on WOL

ethtool -s eth0 wol g

## Find local IP address

LocalIP=$(ip addr show dev eth0|grep "inet "|cut -f6 -d" "|cut -f1 -d/)

## Use local IP address to find MD device number

FindMDDeviceQ="SELECT PK_Device FROM Device
WHERE IPAddress='$LocalIP';
"

DeviceID=$(RunSQL "$FindMDDeviceQ")

## Identify immediate children DCE devices that are currently Registered

FindChildrenQ="SELECT Device.PK_Device
FROM Device
JOIN DeviceTemplate ON Device.FK_DeviceTemplate=DeviceTemplate.PK_DeviceTemplate
LEFT JOIN Device AS Device_Parent on Device.FK_Device_ControlledVia=Device_Parent.PK_Device
LEFT JOIN DeviceTemplate AS DeviceTemplate_Parent
ON Device_Parent.FK_DeviceTemplate=DeviceTemplate_Parent.PK_DeviceTemplate
WHERE (Device.FK_Device_ControlledVia=$DeviceID
OR (Device_Parent.FK_Device_ControlledVia=$DeviceID AND DeviceTemplate_Parent.FK_DeviceCategory IN (6,7,8) ) )
AND DeviceTemplate.FK_DeviceCategory <> 1
AND DeviceTemplate.ImplementsDCE=1
AND Device.Registered=1;
"

DeviceList=$(RunSQL "$FindChildrenQ")

## Send SYSCOMMAND_0 (device shutdown) message to all immediate children. These devices will relay the message to all descendants

for Device in $DeviceList; do

/usr/pluto/bin/MessageSend "$DCERouter" 0 "$Device" 7 0 163 "start_local_devices"

done

## Send SYSCOMMAND_0 message to MD device itself
/usr/pluto/bin/MessageSend dcerouter 0 $DeviceID  7 0 163 "start_local_devices"


## Wait for DCE devices to complete their shutdown

MaxLoopCount=50
for ((i = 0; i < MaxLoopCount; i++)); do
    # IDs of the devices spawned on this MD, joined with commas for the IN (...) clause
    Devices=$(grep -v '^$' /usr/pluto/locks/pluto_spawned_local_devices.txt | tr '\n' ',')
    Devices="${Devices%,}"
    if [[ -z "$Devices" ]]; then
        break
    fi
    Q="SELECT COUNT(*) FROM Device WHERE PK_Device IN ($Devices) AND Registered=1"
    RegCount=$(RunSQL "$Q")
    if [[ "$RegCount" -eq 0 ]]; then
        break
    fi
    echo "Waiting for $RegCount devices to shut down"
    sleep 1
done

echo "Done waiting"

## Kill the LM processes

killall -q -s SIGKILL -r lmce_launch_manager\*

- now edit /etc/hibernate/hibernate.conf - make sure that TryMethod ram.conf is the only TryMethod line left uncommented
- now edit /etc/hibernate/ram.conf - make sure that TryMethod ususpend-ram.conf is uncommented and the other two methods are commented out
- now edit /etc/hibernate/ususpend-ram.conf - if your hardware is not recognised when you test hibernate, you may need to uncomment USuspendRamForce yes. The other options are for various compatibility levels that affect whether the resume works or not... I haven't got my hardware to resume successfully yet... apparently that is a common problem with Linux suspend
- now create a file in /etc/hibernate/scriptlets.d - call it suspend and insert this code ....

Code: [Select]
AddSuspendHook 15 LMCESuspend

LMCESuspend() {

/usr/pluto/bin/suspend.sh

}

You are ready to test! From a shell, type hibernate-ram. You should see the MD suddenly drop out to the launch manager and then the launch manager die, followed by the MD going into suspend mode. In the shell you will see the hibernate script printing, every second, how many DCE devices are left to shut down. Note the VDR device takes about 15 seconds to shut down on my system, but all the others are pretty much immediate. So if you are using VDR, you may see a delay between the MD dropping to the launch manager and the launch manager actually dying.

If that all works ok, try resuming your system and let me know if it was successful. As I say, try some of the other options in ususpend-ram.conf if the resume is unsuccessful. Also, the acpi config above should allow you just to tap your power button and have it go into suspend from there.
Title: Re: [Testers please?] Suspend/Resume MDs....
Post by: totallymaxed on March 08, 2009, 09:55:01 am
Nice work Colin;

Colin we'll give this a test on Mon/Tues if I can grab some time on one of our test Cores... will post back here our experiences.

All the best

Andrew
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: jondecker76 on March 08, 2009, 04:58:13 pm
I'm just now getting some time to start on this. I had a question on one of the steps...
Quote
- make a copy of powerbtn.sh, then edit the original to point at /usr/pluto/bin/suspend.sh

I'm not sure exactly what you want to do at this step, can you explain a bit more?

thanks

Jon
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: colinjones on March 08, 2009, 09:00:52 pm
Sorry Jon, I completely screwed up the instructions because I was trying to convert them from how I hacked it together into how it should be done in retrospect! So I got mixed up as to what was in each file :) Give me 5 mins and I will update the instructions...

EDIT: OK, the instructions are now updated!
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: sambuca on March 09, 2009, 10:57:38 am
I'll give it a go. I was out of town this weekend, if not I would probably have tested it already  ;)

I already have a working suspend/resume on one of my MDs, so I think this should be easy to test. But I also have had my share of problems with suspending, or more correctly resuming. In 704 I had suspend to ram working, but when 710 came out, they had removed the s2ram utility.. bummer. To make a long story short, I instead set it up to suspend to disk using a compact flash card.

btw, nice work Colin  :)


Title: Re: [Testers?] Suspend/Resume MDs....
Post by: sambuca on March 09, 2009, 09:44:21 pm
ok, I tried your script, and it works fine for me. I used suspend to disk, but that should be the same from lmce's perspective.
It stops all devices and lastly kills the launch manager.

At resume, it does nothing (yet). You said you started the devices using Start_LocalDevices.sh, but that it does not start the MD DCE device? Not sure what you mean by that? The App server? Didn't have time to try this though, my wife wanted to use the TV  :(

But I did try to run the lmce_launch_manager.sh at resume via a resume hook, and that sort of worked. It started the launch manager and every device successfully, but it seemed like the resume script didn't end. I even tried to wrap the lmce_launch_manager.sh in a script with the fork (&) char at the end, but the script still did not return to the console. Didn't have time to try repeated suspend/resumes with this configuration though.

Btw. I noticed there is an enable_wol.sh script in the /usr/pluto/bin folder. I guess it might be good to use that instead of calling ethtool directly?

And one final suggestion: I've read (and experienced first hand) that suspend to disk is easier to get to work than suspend to ram. This is because when resuming from disk the normal BIOS POST happens and starts the graphics card normally, in contrast to resume from ram, where the OS has to do much more of the work itself. Maybe you will have more success trying s-to-disk?

I hope to do some more testing later this week.

br,
sambuca
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: colinjones on March 09, 2009, 10:37:53 pm
Great news, sambuca, thanks for testing...

Yes, I hadn't done the resume bit yet, but my intention was to use the lmce_launch_manager.sh - you correctly noted that Start_LocalDevices.sh does not start the MD device itself. In my testing so far, just running this command at the shell after using the shutdown scripts (but not actually suspending) seemed to start everything back up successfully, so I would be interested in any further testing you do on this. That script doesn't exit (not checked why, just assumed that was normal, certainly it continues running after a reboot of my core...) when I executed it I used the & character as well... I guess the only untidiness is that it will be a job in that bash session, but that bash session in a script will soon terminate and leave the orphaned process running, just like a normally booted system, so do you think this is a real issue?

Agreed on the WOL point, will try that. Also, will look into the suspend to disk option... I think I have a disk somewhere, just not sure what I have to do to prepare it for this... don't 0710 MDs detect HDDs and use them as swap? Does it need to be partitioned and formatted first? As what?
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: sambuca on March 09, 2009, 10:56:06 pm
Quote
In my testing so far, just running this command at the shell after using the shutdown scripts (but not actually suspending) seemed to start everything back up successfully, so I would be interested in any further testing you do on this. That script doesn't exit (not checked why, just assumed that was normal, certainly it continues running after a reboot of my core...) when I executed it I used the & character as well... I guess the only untidiness is that it will be a job in that bash session, but that bash session in a script will soon terminate and leave the orphaned process running, just like a normally booted system, so do you think this is a real issue?

The only problem might be that the script is started within the resume hook. Don't know how the resume system handles a script not exiting.

Quote
Agreed on the WOL point, will try that.

Not a big issue, I also used the ethtool directly in my scripts. I just noticed the other script today  :P

Quote
Also, will look into the suspend to disk option... I think I have a disk somewhere, just not sure what I have to do to prepare it for this... doesn't 0710 MDs detect HDDs and use them as swap? Does it need to be partitioned and formatted first? As what?

Yes, it does detect swap partitions on HDDs and uses them. (Which is something I'd like to avoid, as I use a CF card for suspend to disk). You need to create a swap partition and possibly also do a mkswap on it. Also see the suspend wiki article for more information, there are some special considerations for LMCE.

br,
sambuca
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: sambuca on March 10, 2009, 07:54:26 pm
Hi,

I successfully tested suspending/resuming using this script today, with repeated suspend/resume cycles.

I added a new script to handle resume:
/usr/pluto/bin/resume.sh
Code: [Select]
#!/bin/bash

/usr/bin/screen -d -m -S "LMCE Launch Manager" /usr/pluto/bin/lmce_launch_manager.sh
It wraps the launch manager in a screen session, which detaches it from the console, letting the resume scripts return normally.
And added a resume hook to /etc/hibernate/scriptlets.d/lmce which now looks like
Code: [Select]
AddSuspendHook 15 LMCESuspend
AddResumeHook 15 LMCEResume

LMCESuspend() {
    /usr/pluto/bin/suspend.sh
}
LMCEResume() {
    /usr/pluto/bin/resume.sh
}

As far as I can see, this works perfectly.
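
If screen is not installed on the MD, a similar effect can probably be had with nohup and a background job: the launcher is cut loose from the hook's terminal and output, so the hook returns immediately. A minimal sketch, not tested inside a hibernate resume hook; start_detached is a hypothetical helper name and the log path is only an example:

```shell
#!/bin/bash
# Hypothetical helper: start a command in the background, immune to hangup,
# with stdin closed off and all output appended to the log file given in $1.
start_detached() {
    local log="$1"
    shift
    nohup "$@" </dev/null >>"$log" 2>&1 &
}

# Usage on an MD (commented out here because the path only exists there):
# start_detached /var/log/pluto/resume.log /usr/pluto/bin/lmce_launch_manager.sh
```

Unlike the screen approach, this leaves no session to reattach to, so the log file is the only place to see what the launch manager printed.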

br,
sambuca
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: colinjones on March 10, 2009, 09:39:22 pm
Great, OK, so focusing on suspend to disk, what do you think we need to do to add this into 0810? Are we confident enough of all the config and script files working on most platforms that this can be just static files added to the installation process? Also, has s2ram been added back into hibernate/uswsusp in 0810 or do we need to compile a version (like we did manually) and add it to the build?
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: Marie.O on March 10, 2009, 10:35:41 pm
Especially for MDs, I do not think focusing on suspend to disk is such a grand idea. All my MDs are (going to be) diskless.
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: colinjones on March 10, 2009, 10:44:36 pm
I agree, but as you can see from the thread, there are specific hardware issues particular to chipsets that cause s2ram to fail or fail if the right options are not specified... dunno how we can code for this as it seems to be a black art getting the right options for your specific hardware. I thought if we can get s2disk working reliably, s2ram can be bolted on afterwards by presenting the options to choose from based on the fact that the underlying suspend functionality works at least for s2disk.... what do you think?

sambuca - forgot to ask, have you checked leaving your MD suspended for at least a day and then resuming? Just want to be sure that this mechanism genuinely deals with the TCP connection issues, and there isn't something else we need to cover as well.
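
One way to check the TCP side before and after a long suspend is to count the connections the MD still holds to the router. A minimal sketch that filters "netstat -tn" style output; count_established is a hypothetical helper name, and the port number in the usage line (3450) is an assumption about the DCERouter port that should be checked against your own setup:

```shell
#!/bin/sh
# Count ESTABLISHED TCP connections whose remote port is $1, reading
# "netstat -tn" style output on stdin (column 5 = remote address:port,
# column 6 = connection state).
count_established() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" && $5 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}

# Usage on an MD (3450 is an assumed router port; adjust to your setup):
# netstat -tn | count_established 3450
```

After the suspend script has run the count should be zero, and after a successful resume it should climb back to roughly one per DCE device.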
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: tschak909 on March 11, 2009, 12:17:30 am
I really don't think you'll be able to handle the logistics of handling both suspend to ram, and suspend to disk, it may be too much of a headache.

-Thom
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: colinjones on March 11, 2009, 01:18:18 am
Thom - if it is true, that suspend to disk is basically stable with static config files (ie no need for customised options based on the hardware you are using), then I think suspend to disk could be added to the build as a standard feature, couldn't it?
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: tschak909 on March 11, 2009, 01:21:15 am
but MD's are supposed to be _DISKLESS_ .

*hmm*

dammit, I do not wanna have this argument.

-Thom
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: colinjones on March 11, 2009, 01:24:00 am
Well, let's just give up then.
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: jondecker76 on March 11, 2009, 02:06:13 am
I still haven't had time to test this out yet (been extremely busy) - but it is a needed feature.

I agree that suspend to disk isn't a great option as a good percentage of users will have diskless MD's. I understand that it would be easier to implement, but it wouldn't be as good for our purposes as suspend to RAM. I hope you keep plugging away at it - I think we all know what it's like to feel like we are chasing our tails around in circles sometimes
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: tschak909 on March 11, 2009, 02:29:16 am
No. Suspend to RAM will work, with well-supported hardware.

But again, you don't have to prove me wrong. Just implement what you feel is right.

:)

-Thom
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: totallymaxed on March 11, 2009, 02:55:59 am
Great, OK, so focusing on suspend to disk, what do you think we need to do to add this into 0810? Are we confident enough of all the config and script files working on most platforms that this can be just static files added to the installation process? Also, has s2ram been added back into hibernate/uswsusp in 0810 or do we need to compile a version (like we did manually) and add it to the build?

Especially for MDs, I do not think focusing on suspend to disk is such a grand idea. All my MDs are (going to be) diskless.

I agree, but as you can see from the thread, there are specific hardware issues particular to chipsets that cause s2ram to fail or fail if the right options are not specified... dunno how we can code for this as it seems to be a black art getting the right options for your specific hardware. I thought if we can get s2disk working reliably, s2ram can be bolted on afterwards by presenting the options to choose from based on the fact that the underlying suspend functionality works at least for s2disk.... what do you think?

Sambuca - forgot to ask, have you checked leaving your MD suspended for at least a day and then resuming? Just want to be sure that this mechanism genuinely deals with the TCP connection issues, and there isn't something else we need to cover as well.

I tend to agree, Colin, that suspend to RAM is really what we need... we are moving strongly to diskless MDs wherever possible.

Andrew
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: colinjones on March 11, 2009, 03:19:09 am
I don't think anybody actually read my post, let me repeat it!!

I agree, but as you can see from the thread, there are specific hardware issues particular to chipsets that cause s2ram to fail or fail if the right options are not specified... dunno how we can code for this as it seems to be a black art getting the right options for your specific hardware. I thought if we can get s2disk working reliably, s2ram can be bolted on afterwards by presenting the options to choose from based on the fact that the underlying suspend functionality works at least for s2disk.... what do you think?

Sambuca - forgot to ask, have you checked leaving your MD suspended for at least a day and then resuming? Just want to be sure that this mechanism genuinely deals with the TCP connection issues, and there isn't something else we need to cover as well.

I never suggested that suspend to disk should be the solution... just that getting the overall mechanism to work with it would be easier. Once we had proved that it works, we could extend it to suspend to RAM by exposing the s2ram configuration options, so that users can choose the ones appropriate for their hardware. I don't know of any other way of progressing, unless the 0810 kernel has recently worked magic in making s2ram reliable across different hardware. Certainly suspend to disk is useless to me: I don't have a disk in my MD, and want to avoid putting one in. Potential false economy anyway - having to run a disk when one isn't otherwise needed, just to save power?! :) No, I've no idea where to go now... I cannot get suspend to RAM working on my MSI Wind PC. The only options for different hardware I found were in that s2ram config file, and I have tried all of those: one caused some kind of oops/panic and backtrace on resume, and all the other combinations just hung on resume... are there any more reliable suspend-to-RAM alternatives to uswsusp/hibernate/s2ram?
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: tschak909 on March 11, 2009, 03:21:08 am
my bad. I owe you a beer. :)

-Thom
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: colinjones on March 11, 2009, 03:23:24 am
I'm reeeealy thirsty, can you fedex it? :)
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: sambuca on March 11, 2009, 10:32:13 am
Colin, I have not yet tried leaving the MD off for some time. But it is suspended at this moment, so when I get back home today, I can resume it to check. Not that I think it will be a problem, as all devices are restarted the normal way.

I think that these scripts are good for *both* suspend to ram and to disk. LMCE does not need to take any special precautions with regards to suspending and resuming other than what your script already does. It is the suspend scripts' responsibility to make sure the specific hardware works after resume, not LMCE!

However, it is a fact that not all hardware works reliably with suspend/resume. But this is constantly improving.
And I think we all agree that suspend to ram is the ideal solution.

There are also other options for suspend/resume. I think ACPI has some support, and pm-utils also does suspend/resume, although pm-utils seems to use s2disk/s2ram in the background. I think pm-utils is the standard in 810, so we need to adapt the scripts to use that; see http://en.opensuse.org/Pm-utils. The hook scripts are a bit different, but a no-brainer to change.

As for how to integrate it in LMCE, I am not sure. But couldn't we just add support for both disk and RAM suspension? Let's say I install Ubuntu on a random computer, then try to suspend it. It will either work or not. The same goes for LinuxMCE. What do you think? At least we should provide the scripts necessary to do the LinuxMCE-specific stuff (Colin's script), and then users with enough know-how can enable them.

br,
sambuca
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: gollywog on March 11, 2009, 06:38:58 pm
Could the suspend to disk happen to a CF or flash drive of some description? Still boot off the net, but when suspending write the appropriate stuff to a (maybe USB-attached??) drive?

I don't know much about this so just a suggestion. If it's dumb just ignore :)

Gollywog
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: sambuca on March 11, 2009, 07:35:35 pm
gollywog,

I am using a CF disk to suspend at this moment. It works fine; the only problem is that LMCE tries to use it as swap, which would be bad, as CF cards have limited write cycles.
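To see whether the card is actually active as swap at a given moment, something along these lines could be used. It is only a sketch: it reads the kernel's list of active swap areas in /proc/swaps and changes nothing. The device name /dev/sda1 is a made-up example (not taken from this thread), and the optional second argument exists only so the function can be tried against a sample file.

```shell
#!/bin/bash
# Sketch: report whether a block device is currently active as swap.
# /proc/swaps lists the active swap areas; its first line is a header.

is_active_swap() {
    local device="$1"
    local swaps_file="${2:-/proc/swaps}"   # second argument only for trying sample data
    # Skip the header, then compare the first column with the device name.
    awk -v dev="$device" 'NR > 1 && $1 == dev { found = 1 } END { exit !found }' "$swaps_file"
}

# /dev/sda1 is a hypothetical name for the CF card partition.
if is_active_swap /dev/sda1; then
    echo "/dev/sda1 is currently active as swap"
fi
```

This only reports the state; whether and when the card should be taken out of swap is a separate decision.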

br,
sambuca
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: sambuca on March 11, 2009, 08:45:27 pm
Colin, I just resumed my MD, suspended from yesterday (~24h), and everything works perfectly. I use the addition to your script that I posted previously.

sambuca
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: jondecker76 on March 12, 2009, 12:19:33 pm
sambuca - can you post a speed comparison of booting from the lan and resuming from suspend?
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: sambuca on March 12, 2009, 01:38:42 pm
I can give you a rough estimate right away.
I know from previous tests that normal lan boot takes around 2 minutes. This is on a 100MBit lan, via a gigabit switch to the gigabit card in the core.
I actually did a rough count of seconds for resume from disk yesterday, and I think it ended up around half a minute. So it's quite an improvement, and do note that this is suspend to disk. Suspend to RAM should be even quicker. I remember that it resumed in 3-4 seconds with 704, but you'll also have to add in the time that the launch manager and devices take to start up.

I'll see if I can do a more thorough test one of these days.

sambuca
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: chriss on March 12, 2009, 02:49:31 pm
In my experience the performance of the disk and the size of the RAM are important factors when doing s2disk. On my WinXP laptop with 2GB RAM and a very slow 1.8" HDD, the resume takes about 2 minutes, of which ~95% is spent reading the memory image back from disk.

br,
/chriss
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: sambuca on March 12, 2009, 08:05:51 pm
Hi again,

I timed the resume process today, and I must admit, I don't know what I timed the last time. It certainly was not the resume... Now I measured it to about 1 minute in resume from disk. 25 seconds of those are spent reading from disk (512MB memory), and 15 is the normal BIOS boot, which leaves 20 seconds that lmce uses to start up again.
The time for the lmce restart of course depends on how many devices you have (and which devices).

On the other hand, I tried pm-utils today, and got my MD to suspend to RAM.  8) I also read that pm-utils uses the HAL database to find out what to do when suspending/resuming, which would mean that it supports more configurations more easily.
So I timed the pm-utils suspend to ram and got these times:
(It used 30 seconds to turn off, not that it matters very much)

00 power on
09 lmce launch manager started
18 Orbiter started
30 all devices started, complete
= 30 seconds to resume

So, I would definitely recommend pm-utils. Remember, this is the standard in Ubuntu 810.
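For anyone who wants to compare timings, the per-phase durations can be worked out from a list like the one above. A small sketch, assuming input lines of the form "seconds label" (elapsed time since power on); the function name phase_durations is made up for this example.

```shell
#!/bin/bash
# Sketch: print how long each phase took, given "seconds label" lines
# where the seconds are elapsed time since power on.

phase_durations() {
    awk '{
        t = $1                 # elapsed seconds at this milestone
        $1 = ""                # drop the time, keep the label
        sub(/^ +/, "")
        if (NR > 1) printf "%ds  %s\n", t - prev, $0
        prev = t
    }'
}

phase_durations <<'EOF'
00 power on
09 lmce launch manager started
18 Orbiter started
30 all devices started, complete
EOF
```

With the numbers above this gives 9 seconds until the launch manager is up, another 9 until Orbiter, and 12 more until all devices are started.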

br,
sambuca
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: colinjones on March 12, 2009, 08:19:00 pm
sambuca - care to outline what config you changed to use pm-utils instead of hibernate-ram/s2disk?

Also, I would be interested in which device is causing your suspend to take so long... are you using VDR? That device takes ages (~15s) to shut down on my MD, whereas all the other devices take ~2 secs. If you run hibernate-ram from the command line, I believe there are still echos in there that print the number of outstanding devices... also, you can keep hitting refresh on the child devices of the MD and see which one still says Registered: Yes; that will be the one that slows it down. I realise that suspend isn't the one we are interested in for speed, but it's still worth knowing...
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: sambuca on March 12, 2009, 08:41:22 pm
colinjones,

I actually did almost nothing. I only added a file, /etc/pm/sleep.d/01lmce, which is the pm-utils equivalent of /etc/hibernate/scriptlets.d/lmce. The syntax is a little different.
Code: [Select]
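#!/bin/bash
# pm-utils hook: run the LMCE suspend/resume scripts around sleep.

. /usr/lib/pm-utils/functions

RETVAL=0
case "$1" in
        hibernate|suspend)
                # Run the LMCE suspend script; its exit status is passed back to pm-utils.
                /usr/pluto/bin/suspend.sh
                RETVAL=$?
                ;;
        thaw|resume)
                # Run the LMCE resume script after wake-up.
                /usr/pluto/bin/resume.sh
                ;;
        *)
                ;;
esac

exit $RETVAL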

Other than that, I just installed the pm-utils package. Running pm-suspend suspends it to ram, and it resumed successfully afterwards. Keep in mind that I already had uswsusp and hibernate installed. I think pm-utils recommends hibernate.

Pm-suspend actually logs everything to /var/log/pm-suspend.log, so the "Waiting for device..." message is found there.

I changed one line in the suspend script:
Code: [Select]
echo "Waiting for $RegCount devices to shutdown ($Devices)"
adding the ($Devices) part, and suspended.
It turns out that it is the MythTV_Player that takes the most time to shut down.
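To pull the last device out of the log without reading through it by hand, a sketch like the following could be used. It assumes the log lines have exactly the form "Waiting for N devices to shutdown (list)" as produced by the modified echo, and that the log is /var/log/pm-suspend.log; the function name last_waiting_devices is made up for this example.

```shell
#!/bin/bash
# Sketch: print the device list from the last
# "Waiting for N devices to shutdown (...)" line in the pm-suspend log.

last_waiting_devices() {
    local log_file="${1:-/var/log/pm-suspend.log}"
    grep 'Waiting for .* devices to shutdown (' "$log_file" \
        | tail -n 1 \
        | sed 's/.*devices to shutdown (\(.*\))[^)]*$/\1/'
}

# Only run against the real log if it is there and readable.
if [ -r /var/log/pm-suspend.log ]; then
    last_waiting_devices
fi
```

Whatever is printed is what was still registered on the last pass of the wait loop, i.e. the slowest device(s) to shut down.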

I was thinking of adding a description to the wiki suspend article.

br,
sambuca
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: colinjones on March 12, 2009, 08:58:17 pm
can you post your /usr/pluto/bin/resume.sh?
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: sambuca on March 12, 2009, 09:37:16 pm
Here it is:
Code: [Select]
#!/bin/bash

/usr/bin/screen -d -m -S "LMCE Launch Manager" /usr/pluto/bin/lmce_launch_manager.sh

Basically just a variant of how the other devices are started.

br,
sambuca
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: colinjones on March 12, 2009, 10:23:00 pm
thx... hmmm... when I first ran pm-suspend, I hadn't set exec permissions on the script, but it seemed to work (albeit obviously without running my shutdown script): for the first time, when I hit power it resumed and I could use the MD... although it seemed not to respond as quickly as usual. Then I added the resume script and set exec permissions on both. Now it doesn't seem to work! Either way, the screen doesn't come back on any more, even if I use --quirk-dpms-on or xset dpms force on from the command line. Sometimes the command line comes back, but at the moment it doesn't seem to... oh well... will have a play with it again later!
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: sambuca on March 13, 2009, 08:34:24 am

Will it suspend and resume reliably when not running the lmce suspend/resume scripts?
Also check /var/log/pm-suspend.log for any clues.

br,
sambuca
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: sambuca on March 13, 2009, 11:20:52 pm
Hi

I looked into the issue with Myth shutdown times. In the log files there is an almost 20 second pause between the previous device shutdown and the MythTV_Player shutdown. But I cannot find any clue as to why.
I was trying to find out when each device receives the command, but I couldn't find anything in the logs. Do I need to enable some other log levels, perhaps?

sambuca
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: colinjones on March 14, 2009, 01:38:22 am
I would imagine that you would see the message in DCERouter.log...
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: donpaul on April 14, 2009, 03:33:40 pm
Works great on my MD with onboard sound, but the MD with Audigy (Coax Digital) card loses sound when it resumes.
Title: Re: [Testers?] Suspend/Resume MDs....
Post by: donpaul on April 14, 2009, 05:20:31 pm
Solution: Unload sound card module on suspend, and load module on resume. Adding the module to blacklisted-modules didn't seem to work so I created a script:

/etc/pm/sleep.d/99sound
Code: [Select]
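#!/bin/bash
# pm-utils hook: unload the sound card module (snd_ca0106) before
# suspending and load it again on resume.

case "$1" in
    suspend|hibernate)
        modprobe -r snd_ca0106
        ;;
    resume|thaw)
        modprobe snd_ca0106
        ;;
esac
exit $?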