Topic: [Testers?] Suspend/Resume MDs....

colinjones

  • Alumni
  • LinuxMCE God
  • Posts: 3003
[Testers?] Suspend/Resume MDs....
« on: March 03, 2009, 06:10:35 am »
I've been working on this code to try to make suspending and resuming MDs reliable. The basic issue, I believe, is that the MD and the DCERouter expect a continuous TCP connection for DCE communication whenever an MD is running, and a suspended MD is effectively still in a running state. Once the MD has been suspended long enough for the DCERouter to drop the TCP connection it is still holding open (due to lack of response), the MD can never resume operation: it expects that TCP session to still be open, and the DCERouter can no longer reach it to initiate a Reload. This is true of every DCE device on the MD (each has at least one DCE connection).
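If that explanation is right, the lingering sessions should be visible from the MD itself. The diagnostic below is a sketch under two assumptions not stated in this thread: that the DCERouter listens on TCP port 3450, and that the `ss` tool from iproute is installed. The hypothetical `count_dce_connections` function counts the MD's established connections to that port in `ss -tn` output, so the count before suspend can be compared with the count after resume.

```shell
#!/bin/bash
# Hypothetical diagnostic: count the MD's established TCP connections to the
# DCERouter. Port 3450 is an assumption; set DCE_PORT if your core differs.
DCE_PORT=${DCE_PORT:-3450}

# Reads `ss -tn` style output on stdin (columns: State Recv-Q Send-Q
# Local Address:Port Peer Address:Port) and prints how many ESTAB rows
# have a peer address ending in ":$DCE_PORT".
count_dce_connections() {
    awk -v port=":$DCE_PORT" '
        $1 == "ESTAB" && substr($5, length($5) - length(port) + 1) == port { n++ }
        END { print n + 0 }'
}

# Usage on an MD (not run here):
#   ss -tn | count_dce_connections
```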

The objective is to create Suspend and Resume scripts to place in /etc/acpi/suspend.d and /etc/acpi/resume.d. On suspend, the script will cleanly tell the local MD's DCE devices to shut down, thus closing their TCP connections correctly. On resume, it will tell the MD to re-establish all these connections by spawning the DCE devices again, thus reconnecting them to the DCERouter.

So far I have got the script to successfully shut down all the DCE devices and the MD DCE device itself, and to write the device list to a file ready for the resume script to use. I have determined that it is unnecessary to disable the DCE devices before shutting them down: sending the SYSCOMMAND message with a value of 0 means that the spawning system does not attempt to restart the devices even if they are enabled. So I have commented out that bit. I'm finished for the day now.

I haven't written the resume script yet, but I have manually started the MD's devices using /usr/pluto/bin/Start_LocalDevices.sh. The only problem with this is that it only starts the child devices, not the MD DCE device itself. Although the MD seems to work fine, this will obviously prevent relaying of DCE messages to the children. The MD DCE device itself also creates a DCE connection, of course, so it is necessary to stop that device as well.
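The shutdown order described above (each registered child DCE device first, then the MD's own DCE device, all with SYSCOMMAND 0) can be wrapped in a function with a dry-run switch. This is a sketch, not the script from this post: `shutdown_dce_devices` and the `MESSAGE_SEND` override are hypothetical names, and the `7 0 163 "start_local_devices"` arguments are copied verbatim from the script below, where the thread does not explain parameter 163.

```shell
#!/bin/bash
# Hypothetical wrapper around the MessageSend calls used in this thread.
# Set MESSAGE_SEND=echo for a dry run that only prints the commands.
MESSAGE_SEND=${MESSAGE_SEND:-/usr/pluto/bin/MessageSend}

# shutdown_dce_devices ROUTER MD_DEVICE_ID [CHILD_ID...]
shutdown_dce_devices() {
    local router=$1 md=$2
    shift 2
    local dev
    # Children first, as in the original script.
    for dev in "$@"; do
        "$MESSAGE_SEND" "$router" 0 "$dev" 7 0 163 "start_local_devices"
    done
    # Then the MD DCE device itself, mirroring the original ordering.
    "$MESSAGE_SEND" "$router" 0 "$md" 7 0 163 "start_local_devices"
}

# Usage (dry run): MESSAGE_SEND=echo shutdown_dce_devices dcerouter 56 61 62
```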

Code:
. /usr/pluto/bin/Config_Ops.sh
. /usr/pluto/bin/SQL_Ops.sh
. /usr/pluto/bin/pluto.func

LocalIP=$(ip addr show dev eth0|grep "inet "|cut -f6 -d" "|cut -f1 -d/)

FindMDDeviceQ="SELECT PK_Device FROM Device
                WHERE IPAddress='$LocalIP';
"

DeviceID=$(RunSQL "$FindMDDeviceQ")

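## Debug override: hard-codes the author's MD device number, replacing the lookup above. Remove on other systems.
DeviceID=56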

FindChildrenQ="SELECT Device.PK_Device
                FROM Device
                JOIN DeviceTemplate ON Device.FK_DeviceTemplate=DeviceTemplate.PK_DeviceTemplate
                LEFT JOIN Device AS Device_Parent on Device.FK_Device_ControlledVia=Device_Parent.PK_Device
                LEFT JOIN DeviceTemplate AS DeviceTemplate_Parent
                ON Device_Parent.FK_DeviceTemplate=DeviceTemplate_Parent.PK_DeviceTemplate
                WHERE (Device.FK_Device_ControlledVia=$DeviceID
                OR (Device_Parent.FK_Device_ControlledVia=$DeviceID AND DeviceTemplate_Parent.FK_DeviceCategory IN (6,7,8) ) )
                AND DeviceTemplate.FK_DeviceCategory <> 1
                AND DeviceTemplate.ImplementsDCE=1
                AND Device.Registered=1;
"

DeviceList=$(RunSQL "$FindChildrenQ")

echo "$DeviceList" > /usr/pluto/var/Suspend_DeviceList_$DeviceID.log

#for Device in $DeviceList; do

#       DisableDeviceQ="Update Device
#               Set Disabled=1
#               Where PK_Device=$Device;
#       "

#       RunSQL "$DisableDeviceQ"

#done

#DisableDeviceQ="Update Device
#               Set Disabled=1
#               Where PK_Device=$DeviceID;
#"

#RunSQL "$DisableDeviceQ"

for Device in $DeviceList; do

        /usr/pluto/bin/MessageSend "$DCERouter" 0 "$Device" 7 0 163 "start_local_devices"

done

/usr/pluto/bin/MessageSend dcerouter 0 $DeviceID  7 0 163 "start_local_devices"
« Last Edit: March 06, 2009, 03:26:12 am by colinjones »

colinjones

Re: Suspend/Resume MDs....
« Reply #1 on: March 04, 2009, 02:59:10 am »
I have gone a bit further. The script now shuts down all devices tidily, and I can start them again using the lmce LM script. This all seems to work. Currently I have a hard-coded sleep in here to allow VDR to close down, because unfortunately the simple SQL query that identifies all running devices does not recurse down the tree, so it misses the fact that VDR is still shutting down.
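A bounded polling loop can stand in for the hard-coded sleep: keep asking how many devices are still up, stop as soon as the answer is zero, and give up after a fixed number of tries. The sketch below is generic; `wait_for_zero` is a hypothetical helper, and on an MD the command passed to it would be something that runs a COUNT(*) query through RunSQL (the script in reply #14 ends up using an inline loop of the same shape).

```shell
#!/bin/bash
# Sketch: poll instead of a fixed sleep. COMMAND must print a number (how many
# devices are still registered/running); polling stops when it prints 0.
# wait_for_zero MAX_TRIES INTERVAL_SECONDS COMMAND [ARGS...]
wait_for_zero() {
    local max=$1 interval=$2
    shift 2
    local i count
    for ((i = 0; i < max; i++)); do
        count=$("$@")
        # Only a clean numeric 0 counts as "done"; an empty or failed
        # query keeps waiting rather than being mistaken for zero.
        if [[ "$count" =~ ^[0-9]+$ ]] && ((10#$count == 0)); then
            return 0
        fi
        echo "Waiting for ${count:-?} devices to shut down" >&2
        sleep "$interval"
    done
    return 1  # timed out; the caller decides whether to suspend anyway
}
```

Returning a status instead of exiting lets the suspend script choose whether a timeout should abort the suspend or just be logged.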

The main problem I am having now is working out how to convince my MD to suspend to RAM, so that I can test whether this script is triggered when placed in the /etc/acpi/suspend.d directory. The only options in my BIOS are POS and STR, but it doesn't specifically say that it attaches that option to the power button... I chose STR, but when I hit the power button the system halts. I tried sending "standby" or "mem" to /sys/power/state, but this seems to do funky things! Has anybody got any suggestions on how I can trigger a suspend to RAM from the command line?
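On the command-line question: the kernel lists the sleep states it supports in /sys/power/state, where "mem" is suspend-to-RAM (STR) and "standby" is the lighter power-on suspend, so writing "standby" does not give STR. The check below is a sketch with a hypothetical `supports_mem_sleep` function that confirms "mem" is offered before writing it; whether the hardware then resumes cleanly is a separate matter, as the rest of this thread shows.

```shell
#!/bin/bash
# Sketch: succeed only if the kernel offers suspend-to-RAM ("mem").
# The state file is a parameter only so the check can be tried on a copy;
# it defaults to the real /sys/power/state.
supports_mem_sleep() {
    local state_file=${1:-/sys/power/state}
    [[ -r "$state_file" ]] && grep -qw mem "$state_file"
}

# Usage as root on the MD (not run here):
#   supports_mem_sleep && echo mem > /sys/power/state
```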

Code:
#!/bin/bash

. /usr/pluto/bin/Config_Ops.sh
. /usr/pluto/bin/SQL_Ops.sh
. /usr/pluto/bin/pluto.func

LocalIP=$(ip addr show dev eth0|grep "inet "|cut -f6 -d" "|cut -f1 -d/)

FindMDDeviceQ="SELECT PK_Device FROM Device
WHERE IPAddress='$LocalIP';
"

DeviceID=$(RunSQL "$FindMDDeviceQ")

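## Debug override: hard-codes the author's MD device number, replacing the lookup above. Remove on other systems.
DeviceID=56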

FindChildrenQ="SELECT Device.PK_Device
FROM Device
JOIN DeviceTemplate ON Device.FK_DeviceTemplate=DeviceTemplate.PK_DeviceTemplate
LEFT JOIN Device AS Device_Parent on Device.FK_Device_ControlledVia=Device_Parent.PK_Device
LEFT JOIN DeviceTemplate AS DeviceTemplate_Parent
ON Device_Parent.FK_DeviceTemplate=DeviceTemplate_Parent.PK_DeviceTemplate
WHERE (Device.FK_Device_ControlledVia=$DeviceID
OR (Device_Parent.FK_Device_ControlledVia=$DeviceID AND DeviceTemplate_Parent.FK_DeviceCategory IN (6,7,8) ) )
AND DeviceTemplate.FK_DeviceCategory <> 1
AND DeviceTemplate.ImplementsDCE=1
AND Device.Registered=1;
"

DeviceList=$(RunSQL "$FindChildrenQ")

echo "$DeviceList" > /usr/pluto/var/Suspend_DeviceList_$DeviceID.log

#for Device in $DeviceList; do

# DisableDeviceQ="Update Device
# Set Disabled=1
# Where PK_Device=$Device;
# "

# RunSQL "$DisableDeviceQ"

#done

#DisableDeviceQ="Update Device
# Set Disabled=1
# Where PK_Device=$DeviceID;
#"

#RunSQL "$DisableDeviceQ"

for Device in $DeviceList; do

/usr/pluto/bin/MessageSend "$DCERouter" 0 "$Device" 7 0 163 "start_local_devices"

done

/usr/pluto/bin/MessageSend dcerouter 0 $DeviceID  7 0 163 "start_local_devices"



#until [[ $(RunSQL "SELECT count(*)
# FROM Device
# JOIN DeviceTemplate ON Device.FK_DeviceTemplate=DeviceTemplate.PK_DeviceTemplate
# LEFT JOIN Device AS Device_Parent on Device.FK_Device_ControlledVia=Device_Parent.PK_Device
# LEFT JOIN DeviceTemplate AS DeviceTemplate_Parent
# ON Device_Parent.FK_DeviceTemplate=DeviceTemplate_Parent.PK_DeviceTemplate
# WHERE (Device.FK_Device_ControlledVia=$DeviceID
# OR (Device_Parent.FK_Device_ControlledVia=$DeviceID AND DeviceTemplate_Parent.FK_DeviceCategory IN (6,7,8) ) )
# AND DeviceTemplate.FK_DeviceCategory <> 1
# AND DeviceTemplate.ImplementsDCE=1
# AND Device.Registered=1;
#") == 0 ]]; do sleep 2; echo "pass"; done

sleep 20

## Collect Launch Manager PIDs (cut -c10-16 assumes the PID sits in those columns of ps aux output)
LMProcesses=$(ps aux|grep lmce_launch_manager|grep -v grep|cut -c10-16)

echo "$LMProcesses"

for Process in $LMProcesses; do

kill -9 "$Process"

done

tschak909

  • LinuxMCE God
  • Posts: 5549
  • DOES work for LinuxMCE.
Re: Suspend/Resume MDs....
« Reply #2 on: March 04, 2009, 04:02:09 am »
Look into the hibernate scripts.

-Thom

colinjones

Re: Suspend/Resume MDs....
« Reply #3 on: March 04, 2009, 04:07:52 am »
Quote from tschak909: "Look into the hibernate scripts. -Thom"


Thom - whereabouts are these scripts? I looked in /usr/pluto/bin but couldn't see anything...

tschak909

Re: Suspend/Resume MDs....
« Reply #4 on: March 04, 2009, 04:41:53 am »
apt-cache search hibernate .... they are scripts in the Ubuntu repository, not part of the Pluto system.

-Thom

colinjones

Re: Suspend/Resume MDs....
« Reply #5 on: March 04, 2009, 07:44:38 am »
Damn - back to square one. I have a package dependency issue so I can't install anything... I guess I'll have to give up at this point.

tschak909

Re: Suspend/Resume MDs....
« Reply #6 on: March 04, 2009, 07:57:52 am »
:(

sambuca

  • Guru
  • Posts: 462
Re: Suspend/Resume MDs....
« Reply #7 on: March 04, 2009, 12:31:49 pm »
Have you looked at the http://wiki.linuxmce.org/index.php/Suspend page? It describes how to set up hibernation, and also how to add script execution on suspend/resume.
Doesn't help with the dependency issue though  :-\

Great work, by the way; I think you are very close to a working solution. As you can see in the wiki article I mentioned, I wrote a very crude script to kill the processes I knew were running, but I didn't think of using the SQL approach you use.

sambuca

colinjones

Re: Suspend/Resume MDs....
« Reply #8 on: March 05, 2009, 01:09:51 am »
sambuca

Thanks for the link - I didn't realise that you had extended this article to include the killalls. My update is that I have (stupidly) realised that the dependency issue is only on my core, not the MD! So I have got hibernate installed on my MD now. I was trying to work out the "scriptlets", and your example has given me a pointer. I'm assuming I can put any bash script in one of those functions and then declare it to be called within a suspend or resume hook? If so, I will move my entire script into one of those.... I'm still not sure what the point of /etc/acpi/suspend.d etc. is... I guess using the hibernate command circumvents that entire system.

The other issue I want to fix is the hard-coded "sleep" delay that allows the DCE devices to shut down. When I execute the SQL query, it only returns the child devices, not all descendants. Sending the DCE device shutdown command passes all the way down, so that isn't an issue, but checking that all of them have completed shutdown is. I don't see how to write a SQL query that will enumerate all descendant devices ... any suggestion would be greatly appreciated! Failing that, the only thing I can do is build a recursive function that progressively walks down the tree, doing the same SQL query until I have a complete list - BUT it's been a long time since I've bent my brain around recursion like this (I get myself in knots!) so I was hoping I could avoid it :)
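A single query cannot do this on MySQL before 8.0 (recursive common table expressions arrived in that release), but the tree walk does not need real recursion either: a breadth-first loop with a work queue visits every descendant. In the sketch below, `all_descendants` and `children_of` are hypothetical; `children_of` assumes a plain query on `FK_Device_ControlledVia` (the parent column the query in this thread already uses) run through `RunSQL` from SQL_Ops.sh, and it ignores the device-category special case in the original query.

```shell
#!/bin/bash
# Hypothetical helper: print the direct children of one device. On an MD
# this assumes SQL_Ops.sh has been sourced so that RunSQL exists.
children_of() {
    RunSQL "SELECT PK_Device FROM Device WHERE FK_Device_ControlledVia=$1"
}

# all_descendants ROOT_ID
# Breadth-first walk: a work queue replaces recursion. Every device found is
# printed and queued so its own children are fetched in turn. Assumes the
# ControlledVia tree has no cycles (a cycle would make this loop forever).
all_descendants() {
    local queue=("$1") current child
    while ((${#queue[@]} > 0)); do
        current=${queue[0]}
        queue=("${queue[@]:1}")   # pop the front of the queue
        for child in $(children_of "$current"); do
            echo "$child"
            queue+=("$child")
        done
    done
}

# Usage on an MD (not run here):
#   DescendantList=$(all_descendants "$DeviceID")
```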

colinjones

Re: Suspend/Resume MDs....
« Reply #9 on: March 05, 2009, 03:28:38 am »
Bit confused now - it seems s2ram was left out of uswsusp for Gutsy, so the ususpend method in hibernate doesn't work. The sysfs-ram method doesn't seem to have a 'force' option to ignore unrecognised hardware, and it produces these error messages for my script:

/usr/pluto/bin/Config_Ops.sh: 5: [[: not found
/usr/pluto/bin/LockUtils.sh: 5: [[: not found
/usr/pluto/bin/Config_Ops.sh: 34: Syntax error: Bad for loop variable

I'm wondering whether these errors occur because I have put my whole script in the scriptlet function and scriptlets don't like bash. (I can easily separate it out and call it as a bash script, but I don't see much point until I can resolve the unrecognised hardware bit.)

Has anyone got any experience with this stuff?

colinjones

Re: Suspend/Resume MDs....
« Reply #10 on: March 05, 2009, 04:35:41 am »
Update:

Downloaded the source for suspend-0.8 and compiled it. This didn't build s2ram for some reason, so I manually ran 'make s2ram' and got the executable. I copied it into /sbin, where s2disk was, and went back to the ususpend-ram method. This now gets past the missing s2ram bit, but it has the same problems with the errors in my script as the sysfs-ram method.

I assumed this was because the scriptlet system doesn't support full bash scripts, so I separated the script from the scriptlet:

Scriptlet:
Code:
AddSuspendHook 15 LMCESuspend

LMCESuspend() {

/root/suspend.sh &> /root/sus.log

}

And script:
Code:
#!/bin/bash

. /usr/pluto/bin/Config_Ops.sh
. /usr/pluto/bin/SQL_Ops.sh
. /usr/pluto/bin/pluto.func

echo "LMCE Suspend Process Started" > /var/log/pluto/sus.log

ethtool -s eth0 wol g

LocalIP=$(ip addr show dev eth0|grep "inet "|cut -f6 -d" "|cut -f1 -d/)

FindMDDeviceQ="SELECT PK_Device FROM Device
WHERE IPAddress='$LocalIP';
"

DeviceID=$(RunSQL "$FindMDDeviceQ")

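## Debug override: hard-codes the author's MD device number, replacing the lookup above. Remove on other systems.
DeviceID=56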

FindChildrenQ="SELECT Device.PK_Device
FROM Device
JOIN DeviceTemplate ON Device.FK_DeviceTemplate=DeviceTemplate.PK_DeviceTemplate
LEFT JOIN Device AS Device_Parent on Device.FK_Device_ControlledVia=Device_Parent.PK_Device
LEFT JOIN DeviceTemplate AS DeviceTemplate_Parent
ON Device_Parent.FK_DeviceTemplate=DeviceTemplate_Parent.PK_DeviceTemplate
WHERE (Device.FK_Device_ControlledVia=$DeviceID
OR (Device_Parent.FK_Device_ControlledVia=$DeviceID AND DeviceTemplate_Parent.FK_DeviceCategory IN (6,7,8) ) )
AND DeviceTemplate.FK_DeviceCategory <> 1
AND DeviceTemplate.ImplementsDCE=1
AND Device.Registered=1;
"

DeviceList=$(RunSQL "$FindChildrenQ")

echo "$DeviceList" > /usr/pluto/var/Suspend_DeviceList_$DeviceID.log

#for Device in $DeviceList; do

# DisableDeviceQ="Update Device
# Set Disabled=1
# Where PK_Device=$Device;
# "

# RunSQL "$DisableDeviceQ"

#done

#DisableDeviceQ="Update Device
# Set Disabled=1
# Where PK_Device=$DeviceID;
#"

#RunSQL "$DisableDeviceQ"

for Device in $DeviceList; do

/usr/pluto/bin/MessageSend "$DCERouter" 0 "$Device" 7 0 163 "start_local_devices"

done

/usr/pluto/bin/MessageSend dcerouter 0 $DeviceID  7 0 163 "start_local_devices"



#until [[ $(RunSQL "SELECT count(*)
# FROM Device
# JOIN DeviceTemplate ON Device.FK_DeviceTemplate=DeviceTemplate.PK_DeviceTemplate
# LEFT JOIN Device AS Device_Parent on Device.FK_Device_ControlledVia=Device_Parent.PK_Device
# LEFT JOIN DeviceTemplate AS DeviceTemplate_Parent
# ON Device_Parent.FK_DeviceTemplate=DeviceTemplate_Parent.PK_DeviceTemplate
# WHERE (Device.FK_Device_ControlledVia=$DeviceID
# OR (Device_Parent.FK_Device_ControlledVia=$DeviceID AND DeviceTemplate_Parent.FK_DeviceCategory IN (6,7,8) ) )
# AND DeviceTemplate.FK_DeviceCategory <> 1
# AND DeviceTemplate.ImplementsDCE=1
# AND Device.Registered=1;
#") == 0 ]]; do sleep 2; echo "pass"; done

sleep 20

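## Escape the * so the shell passes it to killall's regex instead of expanding it as a filename glob
killall -s SIGKILL -r lmce_launch_manager\*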

The script seems to be running, but I can't work out why it is not doing what it is intended to do... the PC goes into hibernate almost immediately without performing the tasks....
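One likely cause, assuming hibernate sources scriptlets under /bin/sh (dash on Ubuntu), which the dash-style "[[: not found" errors in reply #9 suggest: dash does not know bash's `&>` operator. It reads `/root/suspend.sh &> /root/sus.log` as "start suspend.sh in the background" followed by a bare redirection that only truncates sus.log, so hibernate carries straight on and suspends. The POSIX spelling `> file 2>&1` keeps the script in the foreground under both shells. The `run_logged` function below is a hypothetical sketch of that, written without bash-only syntax.

```shell
#!/bin/sh
# POSIX-sh sketch: run a script in the foreground with stdout and stderr
# logged, so a hibernate suspend hook waits for it to finish.
# run_logged SCRIPT LOGFILE  (returns the script's exit status)
run_logged() {
    "$1" > "$2" 2>&1
}

# In a scriptlet this might be used as (paths from this thread):
#   LMCESuspend() {
#       run_logged /root/suspend.sh /root/sus.log
#   }
```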

colinjones

[Testers please?] Suspend/Resume MDs....
« Reply #11 on: March 06, 2009, 03:23:22 am »
OK, I have the suspend script working in conjunction with hibernate/s2ram.

My hardware doesn't seem to know how to resume at all.... Is there anybody out there who would be interested in testing this script on their MDs, and possibly telling me whether it resumes?

krys

  • Addicted
  • Posts: 583
Re: [Testers?] Suspend/Resume MDs....
« Reply #12 on: March 06, 2009, 04:00:52 pm »
Colin,
I would be willing to be a guinea pig for you, but I just don't know when would be a good time, since we are a few time zones apart. I can just give you access and let you set it up, then test it for you.
Let me know

colinjones

Re: [Testers?] Suspend/Resume MDs....
« Reply #13 on: March 06, 2009, 08:02:38 pm »
Krys

Start by:

sudo apt-get install hibernate uswsusp

Then confirm whether uswsusp installed the s2ram tool (it is missing in Gutsy, a known problem). If it didn't, you will need to download the sources for uswsusp, then from its directory run ./configure --minimal-install (I think), then make s2ram. You can now copy the s2ram executable to your /sbin directory.
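The "confirm that s2ram exists" step can be scripted. The sketch below uses a hypothetical `find_s2ram` function; searching /sbin and /usr/sbin by default is an assumption (this post copies the binary to /sbin), and other directories can be passed as arguments.

```shell
#!/bin/bash
# Sketch: print the path of an executable s2ram, or fail if none is found.
# find_s2ram [DIR...]   (defaults to /sbin and /usr/sbin)
find_s2ram() {
    local dirs=("$@") dir
    ((${#dirs[@]})) || dirs=(/sbin /usr/sbin)
    for dir in "${dirs[@]}"; do
        if [[ -x "$dir/s2ram" ]]; then
            echo "$dir/s2ram"
            return 0
        fi
    done
    return 1
}

# Usage: find_s2ram || echo "s2ram missing: build it from the uswsusp sources"
```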

Let me know when you have got that done successfully and I will give you the rest of the config instructions!

Col.

colinjones

Re: [Testers?] Suspend/Resume MDs....
« Reply #14 on: March 06, 2009, 09:19:17 pm »
EDIT: Updated instructions due to several mistakes.

jondecker is going to try as well, so I will start putting in the detail. The following is all done on the MD you want to suspend, not the core.

- make sure Suspend to RAM (STR) is enabled in your BIOS.
- in /etc/acpi/events/powerbtn - make sure it is pointing at /etc/acpi/powerbtn.sh
- make a backup copy of powerbtn.sh, then edit the original to have these contents...

Code:
#!/bin/sh
# /etc/acpi/powerbtn.sh
# Initiates a shutdown when the power putton has been
# pressed.

# Skip if we just in the middle of resuming.
test -f /var/lock/acpisleep && exit 0

hibernate-ram

- now create a file called /usr/pluto/bin/suspend.sh (with appropriate permissions for exec) and put this in it...

Code:
#!/bin/bash

. /usr/pluto/bin/Config_Ops.sh
. /usr/pluto/bin/SQL_Ops.sh
. /usr/pluto/bin/pluto.func

echo "LMCE Suspend Process Started" > /var/log/pluto/sus.log

## Turn on WOL

ethtool -s eth0 wol g

## Find local IP address

LocalIP=$(ip addr show dev eth0|grep "inet "|cut -f6 -d" "|cut -f1 -d/)

## Use local IP address to find MD device number

FindMDDeviceQ="SELECT PK_Device FROM Device
WHERE IPAddress='$LocalIP';
"

DeviceID=$(RunSQL "$FindMDDeviceQ")

## Identify immediate children DCE devices that are currently Registered

FindChildrenQ="SELECT Device.PK_Device
FROM Device
JOIN DeviceTemplate ON Device.FK_DeviceTemplate=DeviceTemplate.PK_DeviceTemplate
LEFT JOIN Device AS Device_Parent on Device.FK_Device_ControlledVia=Device_Parent.PK_Device
LEFT JOIN DeviceTemplate AS DeviceTemplate_Parent
ON Device_Parent.FK_DeviceTemplate=DeviceTemplate_Parent.PK_DeviceTemplate
WHERE (Device.FK_Device_ControlledVia=$DeviceID
OR (Device_Parent.FK_Device_ControlledVia=$DeviceID AND DeviceTemplate_Parent.FK_DeviceCategory IN (6,7,8) ) )
AND DeviceTemplate.FK_DeviceCategory <> 1
AND DeviceTemplate.ImplementsDCE=1
AND Device.Registered=1;
"

DeviceList=$(RunSQL "$FindChildrenQ")

## Send the SYSCOMMAND_0 (device shutdown) message to all immediate children. These devices will relay the message to all descendants

for Device in $DeviceList; do

/usr/pluto/bin/MessageSend "$DCERouter" 0 "$Device" 7 0 163 "start_local_devices"

done

## Send SYSCOMMAND_0 message to MD device itself
/usr/pluto/bin/MessageSend dcerouter 0 $DeviceID  7 0 163 "start_local_devices"


## Wait for DCE devices to complete their shutdown

MaxLoopCount=50
for ((i = 0; i < MaxLoopCount; i++)); do
Devices=$(cat /usr/pluto/locks/pluto_spawned_local_devices.txt | grep -v '^$' | tr '\n' ',')
Devices="${Devices%,}"
if [[ -z "$Devices" ]]; then
break
fi
RegCount=0
Q="SELECT COUNT(*) FROM Device WHERE PK_Device IN ($Devices) AND Registered=1"
RegCount=$(RunSQL "$Q")
if [[ "$RegCount" -eq 0 ]]; then
break
fi
echo "Waiting for $RegCount devices to shutdown"
sleep 1
done
                                                                                                                       
echo "Done waiting"

## Kill the LM processes
                                                                                                                       
killall -q -s SIGKILL -r lmce_launch_manager\*

- now edit /etc/hibernate/hibernate.conf - make sure that TryMethod ram.conf is the only TryMethod line that is not commented out
- now edit /etc/hibernate/ram.conf - make sure that TryMethod ususpend-ram.conf is not commented out, but the other two methods are commented out.
- now edit /etc/hibernate/ususpend-ram.conf - if your hardware is not recognised when you test hibernate, you may need to uncomment USuspendRamForce yes. The other options set various compatibility levels that affect whether the resume works or not... I haven't got my hardware to resume successfully yet.... apparently that is a common problem with Linux suspend
- now create a file in /etc/hibernate/scriptlets.d - call it suspend and insert this code ....

Code:
AddSuspendHook 15 LMCESuspend

LMCESuspend() {

/usr/pluto/bin/suspend.sh

}
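Before running hibernate-ram, it can help to confirm the TryMethod edits above. The hypothetical `active_trymethods` function below prints the method file named on each uncommented TryMethod line of a hibernate config file; it only matches the spelling used in this post and does not try to cover every syntax hibernate's config parser may accept.

```shell
#!/bin/bash
# Sketch: list the method files on TryMethod lines that are not commented out.
# active_trymethods CONFIG_FILE
active_trymethods() {
    grep -E '^[[:space:]]*TryMethod[[:space:]]' "$1" | awk '{print $2}'
}

# Usage on the MD:
#   active_trymethods /etc/hibernate/hibernate.conf   # expect only: ram.conf
#   active_trymethods /etc/hibernate/ram.conf         # expect only: ususpend-ram.conf
```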

You are ready to test! From a shell, type hibernate-ram. You should see the MD suddenly drop out to the launch manager, then the launch manager die, followed by the MD going into suspend mode. In the shell you will see the hibernate script print how many DCE devices are left to shut down, every second. Note that the VDR device takes about 15 seconds to shut down on my system, but all the others are pretty much immediate. So if you are using VDR, you may see a delay between the MD dropping to the launch manager and the launch manager actually dying.

If that all works OK, try resuming your system and let me know whether it was successful. As I say, try some of the other options in ususpend-ram.conf if the resume is unsuccessful. Also, the acpi config above should let you just tap your power button to put the MD into suspend.
« Last Edit: March 08, 2009, 09:07:13 pm by colinjones »