LinuxMCE Forums

General => Users => Topic started by: tortho on December 05, 2010, 11:29:01 pm

Title: Zwave device stops working and fails to restart
Post by: tortho on December 05, 2010, 11:29:01 pm
Hi,

Aeon Labs Zstick.
It just stops working from time to time.
When I try to do a quick reload router the device fails to start (Device 112 on my core)
From the log of the device I find nothing usefull.

Anyone with hints of where to start locking for what is going wrong?
Below the last lines in the log.
Can also see in old log files that there is a lot of lines with "No callback received: awaiting_callback: xx timer: xxxxxxx"

Device ZWave, # 112   
Device Template ZWave, device category Lighting Interface, manufacturer Pluto.   
4 lines, displaying from 0 to 4   
1   
05   12/05/10 7:42:10.294      Going to rotate logs... <0xb608bb90>
05   12/05/10 8:17:59.157      Going to rotate logs... <0xb608bb90>
05   12/05/10 21:47:53.953      Got a reload command from 0  <0xb608bb90>
05   12/05/10 21:47:54.203      void ClientSocket::Disconnect() on this socket: 0x9230350 (m_Socket: 7) <0xb708e6c0>
Title: Re: Zwave device stops working and fails to restart
Post by: hari on December 06, 2010, 12:20:30 am
Enable more logging (levels 36, 40, 41)

Br hari
Title: Re: Zwave device stops working and fails to restart
Post by: mythtified on December 06, 2010, 01:50:02 am
Does it come back when you reboot the computer?  I have an ASRock mobo that usb ports would go to sleep.  So it would appear that my lighting interfaces had quit working.  Just one experience I had.
Title: Re: Zwave device stops working and fails to restart
Post by: tortho on December 06, 2010, 08:08:30 pm
Thanks both!
Have set full logging, so just have to be patient and wait for it to fail now :-)
Title: Re: Zwave device stops working and fails to restart
Post by: purps on December 06, 2010, 10:30:34 pm
Does it come back when you reboot the computer?  I have an ASRock mobo that usb ports would go to sleep.  So it would appear that my lighting interfaces had quit working.  Just one experience I had.

What model of ASRock was that? *please don't be the one I just bought*
Title: Re: Zwave device stops working and fails to restart
Post by: mythtified on December 06, 2010, 10:54:22 pm
K10N78

I did not pursue this too much.  I decided to turn it in to a mythtv box since I was not using the usb ports.  There may be a solution out there for this but as I said I never pursued it.
Title: Re: Zwave device stops working and fails to restart
Post by: purps on December 06, 2010, 11:40:54 pm
Oh FFS, guess which one I've just bought for 50 quid.

It was literally the only full ATX AM2 mobo I could find. Oh that's really fecked me off.
Title: Re: Zwave device stops working and fails to restart
Post by: b4rney on December 06, 2010, 11:52:06 pm
Might not be related but I found this page which suggests a possible fix/hack for a usb sleep issue:
http://www.cyfinity.com/2008/09/waking-up-usb-ports-on-ubuntu-hardy/

Barney
Title: Re: Zwave device stops working and fails to restart
Post by: mythtified on December 07, 2010, 01:35:07 am
Oh FFS, guess which one I've just bought for 50 quid.

It was literally the only full ATX AM2 mobo I could find. Oh that's really fecked me off.

I know what you mean.  It was the only one I could find that had 3 pci slots in it that was reasonably priced.  I realized I had a problem when some lights did not come on when they were scheduled to. 

The system never went into suspend mode.  It was always awake.  Just the usb ports would start working and the only thing that would recover them was a reboot although I don't believe I tried the lsusb command at the time.

Purps, go ahead and try it.  You have the board now and maybe your experience will be different.
Title: Re: Zwave device stops working and fails to restart
Post by: purps on December 07, 2010, 12:15:53 pm
Thanks for info guys, but to be honest I am getting tired of frigging around. Need my system up and working again before Christmas as I'm going away.

I've just bought a new AM3 mobo, RAM and CPU (£120), will just send the ASRock back, screw it.
Title: Re: Zwave device stops working and fails to restart
Post by: mythtified on December 08, 2010, 12:45:06 am
what board did you buy?
Title: Re: Zwave device stops working and fails to restart
Post by: purps on December 08, 2010, 12:59:39 am
I've been trying to fight it, but it's no use, I have gone AM3 http://www.ebuyer.com/product/241003

@tortho - feel free to jump back in at any point with your Z-wave issue! :) Sorry for the hijack.
Title: Re: Zwave device stops working and fails to restart
Post by: totallymaxed on December 08, 2010, 01:50:35 pm
Thanks for info guys, but to be honest I am getting tired of frigging around. Need my system up and working again before Christmas as I'm going away.

I've just bought a new AM3 mobo, RAM and CPU (£120), will just send the ASRock back, screw it.

Hi Purps,

Thats very strange indeed...

We sell/install a lot of ASrock based Cascade Core's and I cant say we've ever seen any USB issues like you describe...we've had the odd unit fail (as you do with any product line) but none of those issues were down to USB at all.

All the best


Andrew
Title: Re: Zwave device stops working and fails to restart
Post by: purps on December 08, 2010, 02:01:33 pm
Thanks for info guys, but to be honest I am getting tired of frigging around. Need my system up and working again before Christmas as I'm going away.

I've just bought a new AM3 mobo, RAM and CPU (£120), will just send the ASRock back, screw it.

Hi Purps,

Thats very strange indeed...

We sell/install a lot of ASrock based Cascade Core's and I cant say we've ever seen any USB issues like you describe...we've had the odd unit fail (as you do with any product line) but none of those issues were down to USB at all.

All the best


Andrew

Hi Andrew,

I haven't actually received this board yet (it's the K10N78D, have you used this one specifically?), so haven't tried it myself.

The problem mythtified describes is (I assume) for when a MCV dongle is being used with it (which is what I have) - I imagine you don't supply those to your customers due to the SIS thing? Could it be the combination of this particular board and this particular dongle that causes the sleepy USB issue?

Cheers,
Matt.
Title: Re: Zwave device stops working and fails to restart
Post by: totallymaxed on December 08, 2010, 04:44:39 pm
Thanks for info guys, but to be honest I am getting tired of frigging around. Need my system up and working again before Christmas as I'm going away.

I've just bought a new AM3 mobo, RAM and CPU (£120), will just send the ASRock back, screw it.

Hi Purps,

Thats very strange indeed...

We sell/install a lot of ASrock based Cascade Core's and I cant say we've ever seen any USB issues like you describe...we've had the odd unit fail (as you do with any product line) but none of those issues were down to USB at all.

All the best


Andrew

Hi Andrew,

I haven't actually received this board yet (it's the K10N78D, have you used this one specifically?), so haven't tried it myself.

The problem mythtified describes is (I assume) for when a MCV dongle is being used with it (which is what I have) - I imagine you don't supply those to your customers due to the SIS thing? Could it be the combination of this particular board and this particular dongle that causes the sleepy USB issue?

Cheers,
Matt.

Hi Matt,

We don't have any experience with that motherboard I'm afraid...so can't help I'm afraid.

Your right we dont use the MCV dongles in our customer systems at all. We use either Seluxit's or Tricklestar's - and we have no usb issues with either of those on the ASrock's. However I can imagine that if you attempted to use the MCV dongle as the inclusion controller and detached/re-attached it several times that you might encounter some PnP usb issues (these would not be hardware related however...but Pluto PnP code related).

We tend to disable the PnP signature for our dongles to avoid the PnP system re-detecting the dongle if it accidentally gets removed or needs to be re-attached again for some reason as re-attaching will cause the dongle to be re-detected and this often leads to the device failing to startup. If the PnP signature is removed, then as long as you plug it back into the same port again your good.

All the best


Andrew
Title: Re: Zwave device stops working and fails to restart
Post by: mythtified on December 08, 2010, 07:59:54 pm
Having the port disappear due to removing the dongle was not my case.  The machine would see the dongle early in the day.  I could watch the log as devices were polled/responding then later in the evening a light would fail to turn on on schedule or by event.  The dongle had not been removed during that time.  I would look at the logs and see no activity.  I could not control lights.  i would have to reboot the machine for the dongle to start working again.  Whether the usb ports were going to sleep or something else was going on I could not determine.  After playing with it for a while I decided to move to a different motherboard, Asus A8N32SLI.  Something I had laying around and have not had that problem again.  I never pursued the problem further.  I presently am using the board as a mythtv server and have little to no trouble from it.  Of course I am not using the usb ports either.
Title: Re: Zwave device stops working and fails to restart
Post by: totallymaxed on December 08, 2010, 09:00:14 pm
Having the port disappear due to removing the dongle was not my case.  The machine would see the dongle early in the day.  I could watch the log as devices were polled/responding then later in the evening a light would fail to turn on on schedule or by event.  The dongle had not been removed during that time.  I would look at the logs and see no activity.  I could not control lights.  i would have to reboot the machine for the dongle to start working again.  Whether the usb ports were going to sleep or something else was going on I could not determine.  After playing with it for a while I decided to move to a different motherboard, Asus A8N32SLI.  Something I had laying around and have not had that problem again.  I never pursued the problem further.  I presently am using the board as a mythtv server and have little to no trouble from it.  Of course I am not using the usb ports either.

The port does not disappear when removing a dongle - but you do need to re-insert it into the same port when PnP is disabled. What I was describing is not specific to the ASrock but a general problem.

It sounds more likely that your ZWave binary was dying after a few hours - it may have been that the usb chip used in the MCV was not happy with the usb ports on the ASrock. I know that MCV have had problems with their dongle when used in conjunction with their Vera product.

We have no such problems with ZWave or the ZWave interfaces we use and never have had either on any hardware let alone the ASrocks.

Anyway glad you found a solutions for your problem.

all the best

Andrew
Title: Re: Zwave device stops working and fails to restart
Post by: mythtified on December 08, 2010, 10:10:58 pm
Whoops.  I did not mean to imply that the port actually disappeared. Thx for the reply
Title: Re: Zwave device stops working and fails to restart
Post by: tortho on December 12, 2010, 11:55:19 pm
Well then back to the original thread :-)

Believe this is where it "dies"...
Have been away for a few days and now it is no0t working again..
If it were a sleep issue I believe it would have appeared the first night...

Quote from log:
Code: [Select]
36 12/08/10 13:04:31.171 Got basic report from node 2, value: 99 <0xb71a2b90>
40 12/08/10 13:04:31.275 Sending job 0xa3df048 (cb 243) - 0x1 0x9 0x0 0x13 0xff 0x2 0x31 0x4 0x5 0xf3 0xdb (######1####) <0xb71a2b90>
41 12/08/10 13:04:31.280 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb71a2b90>
36 12/08/10 13:04:31.280 ZW_SEND delivered to Z-Wave stack <0xb71a2b90>
41 12/08/10 13:04:31.300 0x1 0x5 0x0 0x13 0xf3 0x0 0x1a (#######) <0xb71a2b90>
36 12/08/10 13:04:31.301 ZW_SEND Response with callback 243 received <0xb71a2b90>
36 12/08/10 13:04:31.301 ZW_SEND was successful, removing job <0xb71a2b90>
36 12/08/10 13:05:01.451 We have been idle for 30 seconds, polling device states <0xb71a2b90>
40 12/08/10 13:05:01.547 Sending job 0xa3df268 (cb 244) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0xf4 0xcb (###### ####) <0xb71a2b90>
05 12/08/10 13:05:04.655 No callback received: await_callback: 244 timer: 31 <0xb71a2b90>
40 12/08/10 13:05:04.755 Sending job 0xa3df268 (cb 244) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0xf4 0xcb (###### ####) <0xb71a2b90>
05 12/08/10 13:05:07.855 No callback received: await_callback: 244 timer: 31 <0xb71a2b90>
40 12/08/10 13:05:07.955 Sending job 0xa3df268 (cb 244) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0xf4 0xcb (###### ####) <0xb71a2b90>
05 12/08/10 13:05:11.055 No callback received: await_callback: 244 timer: 31 <0xb71a2b90>
01 12/08/10 13:05:11.063 ERROR: Dropping command, no callback received after three resends <0xb71a2b90>
40 12/08/10 13:05:11.159 Sending job 0xa3df488 (cb 245) - 0x1 0xc 0x0 0x13 0xff 0x5 0x60 0x6 0x2 0x20 0x2 0x5 0xf5 0xac (######`## ####) <0xb71a2b90>
05 12/08/10 13:05:14.267 No callback received: await_callback: 245 timer: 31 <0xb71a2b90>
40 12/08/10 13:05:14.715 Sending job 0xa3df488 (cb 245) - 0x1 0xc 0x0 0x13 0xff 0x5 0x60 0x6 0x2 0x20 0x2 0x5 0xf5 0xac (######`## ####) <0xb71a2b90>
05 12/08/10 13:05:17.819 No callback received: await_callback: 245 timer: 31 <0xb71a2b90>
40 12/08/10 13:05:17.919 Sending job 0xa3df488 (cb 245) - 0x1 0xc 0x0 0x13 0xff 0x5 0x60 0x6 0x2 0x20 0x2 0x5 0xf5 0xac (######`## ####) <0xb71a2b90>
05 12/08/10 13:05:21.039 No callback received: await_callback: 245 timer: 31 <0xb71a2b90>
01 12/08/10 13:05:21.039 ERROR: Dropping command, no callback received after three resends <0xb71a2b90>
40 12/08/10 13:05:21.139 Sending job 0xa3df6a8 (cb 246) - 0x1 0x9 0x0 0x13 0xff 0x2 0x31 0x4 0x5 0xf6 0xde (######1####) <0xb71a2b90>
05 12/08/10 13:05:24.243 No callback received: await_callback: 246 timer: 31 <0xb71a2b90>
40 12/08/10 13:05:24.343 Sending job 0xa3df6a8 (cb 246) - 0x1 0x9 0x0 0x13 0xff 0x2 0x31 0x4 0x5 0xf6 0xde (######1####) <0xb71a2b90>
05 12/08/10 13:05:27.463 No callback received: await_callback: 246 timer: 31 <0xb71a2b90>
40 12/08/10 13:05:27.567 Sending job 0xa3df6a8 (cb 246) - 0x1 0x9 0x0 0x13 0xff 0x2 0x31 0x4 0x5 0xf6 0xde (######1####) <0xb71a2b90>
05 12/08/10 13:05:30.671 No callback received: await_callback: 246 timer: 31 <0xb71a2b90>
01 12/08/10 13:05:30.671 ERROR: Dropping command, no callback received after three resends <0xb71a2b90>
01 12/08/10 13:05:30.671 ERROR: Three dropped commands in a row, soft-resetting controller <0xb71a2b90>
36 12/08/10 13:05:30.671 Soft-resetting the Z-Wave chip <0xb71a2b90>
40 12/08/10 13:05:30.771 Sending job 0xa3df8c8 (cb 0) - 0x1 0x3 0x0 0x8 0xf4 (#####) <0xb71a2b90>
36 12/08/10 15:34:55.061 Received command for child <0xb61a0b90>
36 12/08/10 15:34:55.061 ON RECEIVED FOR CHILD 19/0 <0xb61a0b90>
36 12/09/10 6:00:00.001 Received command for child <0xb61a0b90>
36 12/09/10 6:00:00.051 ON RECEIVED FOR CHILD 19/0 <0xb61a0b90>
36 12/09/10 6:00:00.051 Received command for child <0xb61a0b90>
36 12/09/10 6:00:00.051 ON RECEIVED FOR CHILD 19/0 <0xb61a0b90>
36 12/09/10 6:00:00.052 Received command for child <0xb61a0b90>
36 12/09/10 6:00:00.052 ON RECEIVED FOR CHILD 19/0 <0xb61a0b90>
36 12/09/10 6:00:00.052 Received command for child <0xb61a0b90>
36 12/09/10 6:00:00.052 ON RECEIVED FOR CHILD 19/0 <0xb61a0b90>
Title: Re: Zwave device stops working and fails to restart
Post by: esev on December 13, 2010, 04:45:38 pm
Well then back to the original thread :-)

Believe this is where it "dies"...
Have been away for a few days and now it is no0t working again..
If it were a sleep issue I believe it would have appeared the first night...

Quote from log:
Code: [Select]
01 12/08/10 13:05:30.671 ERROR: Three dropped commands in a row, soft-resetting controller <0xb71a2b90>
36 12/08/10 13:05:30.671 Soft-resetting the Z-Wave chip <0xb71a2b90>
40 12/08/10 13:05:30.771 Sending job 0xa3df8c8 (cb 0) - 0x1 0x3 0x0 0x8 0xf4 (#####) <0xb71a2b90>
36 12/08/10 15:34:55.061 Received command for child <0xb61a0b90>
36 12/08/10 15:34:55.061 ON RECEIVED FOR CHILD 19/0 <0xb61a0b90>

Is it possible that the serial port goes away temporarily when the Z-Wave chip is reset?  I never encountered this when testing the soft-reset on Tricklestar hardware.  Do you see any Plug-and-play events happening around the same time as the soft-reset took place?

Just curious, do you have any wireless devices in your house operating around 900 MHz?  Maybe a cordless phone, wireless speakers, or a video monitor?
Title: Re: Zwave device stops working and fails to restart
Post by: tortho on December 13, 2010, 09:51:15 pm
Been through all log files in /var/log and in /var/log/pluto and none of them has anything happening 8th December at 13:05 which was the time of failure.

Yes I do have a wireless phone operating at 900Mhz, but Zwave european model uses 868.42MHz.

so I'm still a little lost..
Title: Re: Zwave device stops working and fails to restart
Post by: hari on December 13, 2010, 11:32:47 pm
I wonder why the last send attempt does not run into a timeout (after the soft reset)..
Title: Re: Zwave device stops working and fails to restart
Post by: tortho on December 14, 2010, 12:05:33 am
FYI,

- There is no timed events running at that time of the day (which I'm aware of or can see in the setup)
- No one at home used the zwave from the hybrid or md.
- The only thing that might have happened is that someone physically turned on or off one of the lights at home locally (Not through any remotes or computers)
Title: Re: Zwave device stops working and fails to restart
Post by: esev on December 14, 2010, 03:12:07 am
I wonder why the last send attempt does not run into a timeout (after the soft reset)..

Another question is why are there no more "Sending job" messages after the soft reset and no more 30 second polls.  It's like the main send/receive thread stopped working.
Title: Re: Zwave device stops working and fails to restart
Post by: hari on December 14, 2010, 03:43:39 pm
i'd suspect that we should not try to acquire the mutex a 2nd time in dropSendQueueJob()...

br Hari
Title: Re: Zwave device stops working and fails to restart
Post by: esev on December 14, 2010, 05:59:23 pm
i'd suspect that we should not try to acquire the mutex a 2nd time in dropSendQueueJob()...

When I added dropSendQueueJob(), I changed the mutex so that it is recursive.  dropSendQueueJob() should be able to acquire the mutex a second time from the same thread, as should zwSoftReset().
Title: Re: Zwave device stops working and fails to restart
Post by: hari on December 14, 2010, 10:09:51 pm
i'd suspect that we should not try to acquire the mutex a 2nd time in dropSendQueueJob()...

When I added dropSendQueueJob(), I changed the mutex so that it is recursive.  dropSendQueueJob() should be able to acquire the mutex a second time from the same thread, as should zwSoftReset().
seen that but it looks like it hangs there..

br Hari
Title: Re: Zwave device stops working and fails to restart
Post by: esev on December 14, 2010, 10:35:25 pm
i'd suspect that we should not try to acquire the mutex a 2nd time in dropSendQueueJob()...

When I added dropSendQueueJob(), I changed the mutex so that it is recursive.  dropSendQueueJob() should be able to acquire the mutex a second time from the same thread, as should zwSoftReset().
seen that but it looks like it hangs there..

I'm not understanding.  Are you saying the mutex is getting corrupted?

I see in the log it prints out "ERROR: Three dropped commands in a row, soft-resetting controller".  In dropSendQueueJob() this happens after the mutex is unlocked.  Then there is the "Soft-resetting the Z-Wave chip" from zwSoftReset().  Then I see the "Sending job..." which occurs from WriteSerialStringEx() back in the main receiveFunction() loop.

I was thinking the mutex was okay, otherwise the receiveFunction() wouldn't be able to acquire it again before calling WriteSerialStringEx().  But maybe, since this is all happening in the same thread, the mutex doesn't get unlocked as many times as it is locked - and then outside threads cannot add new things to the queue.

I was pretty sure there was a matching unlock for every mutex lock, but I'll take another look at the code again tonight.  Hari, if you could take a look too I'd appreciate having a second set of eyes looking at the code too.  It was from changeset 23385 (http://svn.linuxmce.org/trac.cgi/changeset/23385)
Title: Re: Zwave device stops working and fails to restart
Post by: valent on December 15, 2010, 05:00:55 pm
How do you disable PnP signature for your dongles?
Title: Re: Zwave device stops working and fails to restart
Post by: tortho on December 20, 2010, 04:47:30 pm
No soft reset on this "freeze"....

Code: [Select]
01 12/19/10 13:42:07.680 ERROR: Dropping command, no callback received after three resends <0xb7236b90>
36 12/19/10 13:42:37.828 We have been idle for 30 seconds, polling device states <0xb7236b90>
40 12/19/10 13:42:37.928 Sending job 0x859e190 (cb 128) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0x80 0xbf (###### ####) <0xb7236b90>
05 12/19/10 13:42:41.033 No callback received: await_callback: 128 timer: 31 <0xb7236b90>
40 12/19/10 13:42:41.133 Sending job 0x859e190 (cb 128) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0x80 0xbf (###### ####) <0xb7236b90>
05 12/19/10 13:42:44.237 No callback received: await_callback: 128 timer: 31 <0xb7236b90>
40 12/19/10 13:42:44.333 Sending job 0x859e190 (cb 128) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0x80 0xbf (###### ####) <0xb7236b90>
05 12/19/10 13:42:47.437 No callback received: await_callback: 128 timer: 31 <0xb7236b90>
01 12/19/10 13:42:47.437 ERROR: Dropping command, no callback received after three resends <0xb7236b90>
01 12/19/10 13:42:47.437 ERROR: Three dropped commands in a row, soft-resetting controller <0xb7236b90>
36 12/19/10 13:42:47.437 Soft-resetting the Z-Wave chip <0xb7236b90>
40 12/19/10 13:42:47.541 Sending job 0x859e3b0 (cb 129) - 0x1 0xc 0x0 0x13 0xff 0x5 0x60 0x6 0x2 0x20 0x2 0x5 0x81 0xd8 (######`## ####) <0xb7236b90>
05 12/19/10 13:42:50.641 No callback received: await_callback: 129 timer: 31 <0xb7236b90>
40 12/19/10 13:42:50.745 Sending job 0x859e3b0 (cb 129) - 0x1 0xc 0x0 0x13 0xff 0x5 0x60 0x6 0x2 0x20 0x2 0x5 0x81 0xd8 (######`## ####) <0xb7236b90>
05 12/19/10 13:42:53.849 No callback received: await_callback: 129 timer: 31 <0xb7236b90>
40 12/19/10 13:42:53.948 Sending job 0x859e3b0 (cb 129) - 0x1 0xc 0x0 0x13 0xff 0x5 0x60 0x6 0x2 0x20 0x2 0x5 0x81 0xd8 (######`## ####) <0xb7236b90>
05 12/19/10 13:42:57.053 No callback received: await_callback: 129 timer: 31 <0xb7236b90>
01 12/19/10 13:42:57.053 ERROR: Dropping command, no callback received after three resends <0xb7236b90>
40 12/19/10 13:42:57.153 Sending job 0x859e5d0 (cb 130) - 0x1 0x9 0x0 0x13 0xff 0x2 0x31 0x4 0x5 0x82 0xaa (######1####) <0xb7236b90>
05 12/19/10 13:43:00.265 No callback received: await_callback: 130 timer: 31 <0xb7236b90>
40 12/19/10 13:43:00.365 Sending job 0x859e5d0 (cb 130) - 0x1 0x9 0x0 0x13 0xff 0x2 0x31 0x4 0x5 0x82 0xaa (######1####) <0xb7236b90>
05 12/19/10 13:43:03.473 No callback received: await_callback: 130 timer: 31 <0xb7236b90>
40 12/19/10 13:43:03.573 Sending job 0x859e5d0 (cb 130) - 0x1 0x9 0x0 0x13 0xff 0x2 0x31 0x4 0x5 0x82 0xaa (######1####) <0xb7236b90>
05 12/19/10 13:43:06.681 No callback received: await_callback: 130 timer: 31 <0xb7236b90>
01 12/19/10 13:43:06.681 ERROR: Dropping command, no callback received after three resends <0xb7236b90>
40 12/19/10 13:43:06.776 Sending job 0x859e7f0 (cb 0) - 0x1 0x3 0x0 0x8 0xf4 (#####) <0xb7236b90>
05 12/19/10 14:22:28.360 Going to rotate logs... <0xb60ffb90>
36 12/19/10 15:34:10.034 Received command for child <0xb60ffb90>
36 12/19/10 15:34:10.035 ON RECEIVED FOR CHILD 19/0 <0xb60ffb90>
36 12/20/10 5:59:59.627 Received command for child <0xb60ffb90>
36 12/20/10 5:59:59.891 ON RECEIVED FOR CHILD 19/0 <0xb60ffb90>
36 12/20/10 6:00:00.001 Received command for child <0xb60ffb90>
Title: Re: Zwave device stops working and fails to restart
Post by: tortho on December 27, 2010, 01:27:09 pm
Hi,

Will anyone look into the possible issues as mentioned above regarding the threads and possible hangs?
Quite annoying. The Z-stick was working properly earlier...

Valent:
What do you mean about pnp signatures?
Title: Re: Zwave device stops working and fails to restart
Post by: donpaul on January 04, 2011, 03:33:53 pm
I am seeing the same issues with my stick. It has worked flawless for the past year, then I upgraded and Zwave hangs every day. I literally have to remove it, and plug it back in before it will work. After a day or so, it will stop responding again. zwave driver problem maybe?

Here is a truncated log. This repeated several times then stopped at 00:22.

05   01/02/11 21:55:04.913      ERROR: ZW_SEND could not be delivered to Z-Wave stack <0xb7133b90>
05   01/02/11 21:55:05.168      ERROR: callback id is invalid! <0xb7133b90>
05   01/02/11 21:58:11.936      No callback received: await_callback: 233 timer: 31 <0xb7133b90>
05   01/02/11 22:05:44.356      No callback received: await_callback: 22 timer: 31 <0xb7133b90>
01   01/04/11 0:20:42.978      ERROR! Out of frame flow!! <0xb71a9b90>
01   01/04/11 0:20:42.979      Checksum incorrect ff - sending NAK <0xb71a9b90>
01   01/04/11 0:20:42.980      ERROR! Out of frame flow!! <0xb71a9b90>
05   01/04/11 0:22:34.889      No callback received: await_callback: 147 timer: 31 <0xb71a9b90>
05   01/04/11 0:22:38.089      No callback received: await_callback: 147 timer: 31 <0xb71a9b90>
05   01/04/11 0:22:41.289      No callback received: await_callback: 147 timer: 31 <0xb71a9b90>
01   01/04/11 0:22:41.289      ERROR: Dropping command, no callback received after three resends <0xb71a9b90>
01   01/04/11 0:22:41.289      ERROR: Three dropped commands in a row, soft-resetting controller <0xb71a9b90>
Title: Re: Zwave device stops working and fails to restart
Post by: hari on January 04, 2011, 04:23:49 pm
Please provide the complete log
Title: Re: Zwave device stops working and fails to restart
Post by: hari on January 04, 2011, 05:00:03 pm
Hi,

Will anyone look into the possible issues as mentioned above regarding the threads and possible hangs?
Quite annoying. The Z-stick was working properly earlier...
I've commited a patch, should be available within a few hours with apt-get update

please test and report. Be aware that I never encountered those soft lockups so i cannot test it myself.

br Hari
Title: Re: Zwave device stops working and fails to restart
Post by: tortho on January 04, 2011, 06:50:11 pm
Thanks,

Will get my better half to install and restart today.
Will report back and also send the complete log on Friday when I get home if it is still running.
Title: Re: Zwave device stops working and fails to restart
Post by: donpaul on January 06, 2011, 01:27:32 am
So far so good for me.
Title: Re: Zwave device stops working and fails to restart
Post by: tortho on January 08, 2011, 01:26:19 pm
So far so good...

Looks like there is some errors in my log.. "wrong callback type" and some other errors..

Also I can see it mention node 19, but I can not find any node 19 in my system...

Smoke detectors still not logging anything..

Complete log here:
http://www.thorland.org/112_ZWave.log
Title: Re: Zwave device stops working and fails to restart
Post by: hari on January 08, 2011, 11:13:50 pm
You need to associate he smoke detector with the dongle
Title: Re: Zwave device stops working and fails to restart
Post by: tortho on January 09, 2011, 02:39:13 pm
Smoke detector, Okay, will try to associate.. so far only paired and showed up in webadmin.

Z-Stick:
Still fails to restart.
See the complete log here:
http://www.thorland.org/112_ZWave.log
and here:
http://www.thorland.org/112_ZWave.log.1
Title: Re: Zwave device stops working and fails to restart
Post by: esev on January 09, 2011, 02:52:53 pm
Z-Stick:
Still fails to restart.
See the complete log here:
http://www.thorland.org/112_ZWave.log
and here:
http://www.thorland.org/112_ZWave.log.1

Is the Z-Stick still stuck?  If so, can you get us some more information about it?  Login to your linuxmce computer as root and do the following:

First, figure out which process ID belongs to the ZWave driver.
Code: [Select]
ps -ef | grep 112_ZWaveYou should see a line like this:
Code: [Select]
root     23937 23870  0 Jan08 pts/22   00:00:05 /usr/pluto/bin/ZWave -d 183 -r localhost -l /var/log/pluto/183_ZWave.logIn the example above, 23937 is the process ID.

Next, run the following:
Code: [Select]
cd /usr/pluto/bin
gdb ZWave -p <processID>
thread apply all bt full

Keep pressing enter at each of the prompts.  When you get back to the (gdb) prompt, press q to exit and detach from the process.

Paste in the results of running thread apply all bt full, that'll help us track down the problem.  It should look something like this
Code: [Select]
root     23937 23870  0 Jan08 pts/22   00:00:05 /usr/pluto/bin/ZWave -d 183 -r localhost -l /var/log/pluto/183_ZWave.log
root     27399 26904  0 08:42 pts/0    00:00:00 grep ZW
dcerouter_:~# cd /usr/pluto/bin/
dcerouter_:/usr/pluto/bin# gdb ZWave -p 23937
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
Attaching to program: /usr/pluto/bin/ZWave, process 23937
Reading symbols from /usr/pluto/lib/libSerializeClass.so...done.
Loaded symbols for /usr/pluto/lib/libSerializeClass.so
Reading symbols from /usr/pluto/lib/libDCECommon.so...done.
Loaded symbols for /usr/pluto/lib/libDCECommon.so
Reading symbols from /usr/pluto/lib/libPlutoUtils.so...done.
Loaded symbols for /usr/pluto/lib/libPlutoUtils.so
Reading symbols from /lib/tls/i686/cmov/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 0xb71b66c0 (LWP 23937)]
[New Thread 0xb5ffeb90 (LWP 23949)]
[New Thread 0xb67ffb90 (LWP 23947)]
[New Thread 0xb71b5b90 (LWP 23946)]
Loaded symbols for /lib/tls/i686/cmov/libpthread.so.0
Reading symbols from /usr/lib/libstdc++.so.6...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/tls/i686/cmov/libm.so.6...done.
Loaded symbols for /lib/tls/i686/cmov/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/tls/i686/cmov/libc.so.6...done.
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /usr/lib/libmysqlclient_r.so.15...done.
Loaded symbols for /usr/lib/libmysqlclient_r.so.15
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/tls/i686/cmov/libcrypt.so.1...done.
Loaded symbols for /lib/tls/i686/cmov/libcrypt.so.1
Reading symbols from /lib/tls/i686/cmov/libnsl.so.1...done.
Loaded symbols for /lib/tls/i686/cmov/libnsl.so.1
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/tls/i686/cmov/libnss_files.so.2...done.
Loaded symbols for /lib/tls/i686/cmov/libnss_files.so.2
0xb782a430 in __kernel_vsyscall ()
(gdb) bt
#0  0xb782a430 in __kernel_vsyscall ()
#1  0xb7687bf7 in pthread_join () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x080589cf in main ()
(gdb) thread apply all bt full

Thread 4 (Thread 0xb71b5b90 (LWP 23946)):
#0  0xb782a430 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb74d9df1 in select () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2  0x0806e609 in ReadSerialStringEx ()
No locals.
#3  0x08079ce4 in ZWApi::ZWApi::receiveFunction ()
No locals.
#4  0x0807a532 in start ()
No locals.
#5  0xb768750f in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#6  0xb74e1a0e in clone () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.

Thread 3 (Thread 0xb67ffb90 (LWP 23947)):
#0  0xb782a430 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb768b075 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#2  0xb76c802b in PlutoLock::CondWait (this=0xb67ff33c) at MultiThreadIncludes.cpp:532
iResult = 154139124
#3  0xb778ba76 in DCE::Command_Impl::ProcessMessageQueue (this=0x92ff8f8) at Command_Impl.cpp:827
copyMessageQueue = {
  <std::_List_base<DCE::Message*, std::allocator<DCE::Message*> >> = {
    _M_impl = {
      <std::allocator<std::_List_node<DCE::Message*> >> = {
        <__gnu_cxx::new_allocator<std::_List_node<DCE::Message*> >> = {<No data fields>}, <No data fields>},
      members of std::_List_base<DCE::Message*, std::allocator<DCE::Message*> >::_List_impl:
      _M_node = {
        _M_next = 0xb770922c,
        _M_prev = 0xb77c9ff4
      }
    }
  }, <No data fields>}
itMessageQueue = {
  _M_node = 0xb67ff3b8
}
---Type <return> to continue, or q <return> to quit---
mq = {
  m_sFileName = {
    static npos = 4294967295,
    _M_dataplus = {
      <std::allocator<char>> = {
        <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
      members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider:
      _M_p = 0x930a73c "Command_Impl.cpp"
    }
  },
  m_Line = 822,
  m_LockNum = 104,
  m_bReleased = true,
  m_bLog = false,
  m_bLogErrorsOnly = false,
  m_bGotLock = true,
  m_bIgnoreDeadlock = false,
  m_sMessage = {
    static npos = 4294967295,
    _M_dataplus = {
      <std::allocator<char>> = {
        <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
      members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider:
      _M_p = 0xb767c694 ""
    }
  },
  m_pMyLock = 0x92ffa34,
  m_tTime = 0,
  m_thread = 3061840784
}
#4  0xb77889b0 in MessageQueueThread_DCECI (param=0x92ff8f8) at Command_Impl.cpp:48
p = (warning: RTTI symbol not found for class 'DCE::ZWave'
warning: RTTI symbol not found for class 'DCE::ZWave'
class DCE::Command_Impl *) 0x92ff8f8
#5  0xb768750f in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#6  0xb74e1a0e in clone () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.

Thread 2 (Thread 0xb5ffeb90 (LWP 23949)):
#0  0xb782a430 in __kernel_vsyscall ()
No symbol table info available.
---Type <return> to continue, or q <return> to quit---
#1  0xb74d9df1 in select () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2  0xb7775289 in DCE::Socket::ReceiveData (this=0x92ff8f8, iSize=1, pcData=0xb5ffd258 "\n", nTimeout=-1) at Socket.cpp:760
tv = {
  tv_sec = 0,
  tv_usec = 696000
}
tv_select_1 = {
  tv_sec = 1294580542,
  tv_usec = 723432
}
tv_select = {
  tv_sec = 0,
  tv_usec = 999664
}
iRet = 0
rfds = {
  fds_bits = {128, 0 <repeats 31 times>}
}
tv_total = {
  tv_sec = -662,
  tv_usec = 993200
}
tv_select_2 = {
  tv_sec = 1294580542,
  tv_usec = 723430
}
nInternalReceiveTimeout = 0
iBytesLeft = 1
sSM = {
  m_sFileName = {
    static npos = 4294967295,
    _M_dataplus = {
      <std::allocator<char>> = {
        <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
      members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider:
      _M_p = 0x934e73c "Socket.cpp"
    }
  },
  m_Line = 689,
---Type <return> to continue, or q <return> to quit---
  m_LockNum = 861,
  m_bReleased = false,
  m_bLog = 177,
  m_bLogErrorsOnly = true,
  m_bGotLock = true,
  m_bIgnoreDeadlock = true,
  m_sMessage = {
    static npos = 4294967295,
    _M_dataplus = {
      <std::allocator<char>> = {
        <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
      members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider:
      _M_p = 0xb767c694 ""
    }
  },
  m_pMyLock = 0x92ff948,
  m_tTime = 1294579881,
  m_thread = 3053448080
}
clk_start = 3300000
clk_select1 = 0
clk_select1b = 0
clk_select2 = 0
clk_select2b = 0
#3  0x080692fc in DCE::Command_Impl::ReceiveData ()
No locals.
#4  0xb77748c8 in DCE::Socket::ReceiveString (this=0x92ff8f8, sRefString=@0xb5ffe374, nTimeout=-1) at Socket.cpp:884
acBuf = "\n\000\000\000??b?\224?/\t\000\000\000\000\000\000\000\000\000???\000\000?e\000\000\000?\001\000\000\000?????????\204^\177\022?5@\002", '\0' <repeats 11 times>, "?\237U?\000\000\000\000?ZS?\220???\004\000\000\000\003\000\000\000@?U?0???????????\003\000\000\0000?4\tt\vG??\237g?0?4\t\224?/\t?????\204^\177\022?5@????0?4\t@?U?\002\000\000\000\030\000\000\000d?S?.", '\0' <repeats 15 times>, "????\f\000\000\000\006\000\000\000\000\000\000\000\000???\000\000\000\000\003\000"...
pcBuf = 0xb5ffd258 "\n"
iLen = 4095
#5  0x0806917b in DCE::Command_Impl::ReceiveString ()
No locals.
#6  0xb77a26f7 in DCE::HandleRequestSocket::RunThread (this=0x92ff8f8) at HandleRequestSocket.cpp:125
sMessage = {
  static npos = 4294967295,
  _M_dataplus = {
    <std::allocator<char>> = {
---Type <return> to continue, or q <return> to quit---
      <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
    members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider:
    _M_p = 0x93509e4 "MESSAGE 67", ' ' <repeats 12 times>
  }
}
#7  0xb77a25d7 in BeginHandleRequestThread (HRqSock=0x92ff8f8) at HandleRequestSocket.cpp:31
pCS = (warning: RTTI symbol not found for class 'DCE::ZWave'
warning: RTTI symbol not found for class 'DCE::ZWave'
DCE::HandleRequestSocket *) 0x92ff8f8
#8  0xb768750f in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#9  0xb74e1a0e in clone () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.

Thread 1 (Thread 0xb71b66c0 (LWP 23937)):
#0  0xb782a430 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb7687bf7 in pthread_join () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#2  0x080589cf in main ()
No locals.
#0  0xb782a430 in __kernel_vsyscall ()
(gdb) q
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/pluto/bin/ZWave, process 23937
Title: Re: Zwave device stops working and fails to restart
Post by: tortho on January 09, 2011, 03:06:57 pm
Did a reboot so it is working now.. will do the above once it hangs again..
Title: Re: Zwave device stops working and fails to restart
Post by: tortho on January 10, 2011, 05:49:29 pm
Stopped working again today... here is the output..
I left it as is, if more info is needed. (i.e. no restart yet)

Code: [Select]
ps -ef | grep 112_ZWave
Output:
Code: [Select]
linuxmce@dcerouter:~$ linuxmce@dcerouter:~$ ps -ef | grep 112_ZWave
n09 pts/18   00:00:04 /usr/pluto/bin/ZWave -d 112 -r localhost -l /var/log/pluto/112_ZWave.log
www-data 25391 16228  0 Jan09 ?        00:00:00 sh -c tail -f /var/log/pluto/112_ZWave.log | /usr/pluto/bin/ansi2html
www-data 25392 25391  0 Jan09 ?        00:00:00 tail -f /var/log/pluto/112_ZWave.log
linuxmce 26647 24873  0 17:36 pts/30   00:00:00 grep 112_ZWave
linuxmce@dcerouter:~$

Code: [Select]
cd /usr/pluto/bin
gdb ZWave -p <processID>
thread apply all bt full

Output:
Code: [Select]
dcerouter_1012936:/usr/pluto/bin# gdb ZWave -p 5376 thread apply all bt full
Excess command line arguments ignored. (thread ...)
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <xxxxorg/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
Attaching to program: /usr/pluto/bin/ZWave, process 5376
Reading symbols from /usr/pluto/lib/libSerializeClass.so...done.
Loaded symbols for /usr/pluto/lib/libSerializeClass.so
Reading symbols from /usr/pluto/lib/libDCECommon.so...done.
Loaded symbols for /usr/pluto/lib/libDCECommon.so
Reading symbols from /usr/pluto/lib/libPlutoUtils.so...done.
Loaded symbols for /usr/pluto/lib/libPlutoUtils.so
Reading symbols from /lib/tls/i686/cmov/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 0xb70c46c0 (LWP 5376)]
[New Thread 0xb5effb90 (LWP 6889)]
[New Thread 0xb68c2b90 (LWP 6884)]
[New Thread 0xb70c3b90 (LWP 6883)]
Loaded symbols for /lib/tls/i686/cmov/libpthread.so.0
Reading symbols from /usr/lib/libstdc++.so.6...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/tls/i686/cmov/libm.so.6...done.
Loaded symbols for /lib/tls/i686/cmov/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/tls/i686/cmov/libc.so.6...done.
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /usr/lib/libmysqlclient_r.so.15...done.
Loaded symbols for /usr/lib/libmysqlclient_r.so.15
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/tls/i686/cmov/libcrypt.so.1...done.
Loaded symbols for /lib/tls/i686/cmov/libcrypt.so.1
Reading symbols from /lib/tls/i686/cmov/libnsl.so.1...done.
Loaded symbols for /lib/tls/i686/cmov/libnsl.so.1
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/tls/i686/cmov/libnss_files.so.2...done.
Loaded symbols for /lib/tls/i686/cmov/libnss_files.so.2
0xb7738430 in __kernel_vsyscall ()
(gdb)
(gdb) thread apply all bt full

Thread 4 (Thread 0xb70c3b90 (LWP 6883)):
#0  0xb7738430 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb759c08b in write () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#2  0x0806ebf4 in WriteSerialStringEx ()
No locals.
#3  0x0807b15e in ZWApi::ZWApi::receiveFunction ()
No locals.
#4  0x0807b756 in start ()
No locals.
#5  0xb759550f in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#6  0xb73efa0e in clone () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.

Thread 3 (Thread 0xb68c2b90 (LWP 6884)):
#0  0xb7738430 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb7599075 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#2  0xb75d602b in PlutoLock::CondWait (this=0xb68c233c) at MultiThreadIncludes.cpp:532
        iResult = 165352524
#3  0xb7699a76 in DCE::Command_Impl::ProcessMessageQueue (this=0x9db1350) at Command_Impl.cpp:827
        copyMessageQueue = {<std::_List_base<DCE::Message*, std::allocator<DCE::Message*> >> = {
    _M_impl = {<std::allocator<std::_List_node<DCE::Message*> >> = {<__gnu_cxx::new_allocator<std::_List_node<DCE::Message*> >> = {<No data fields>}, <No data fields>},
      _M_node = {_M_next = 0xb761722c, _M_prev = 0xb76d7ff4}}}, <No data fields>}
        itMessageQueue = {_M_node = 0xb68c23b8}
        mq = {m_sFileName = {static npos = 4294967295, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
      _M_p = 0x9dc2e84 "Command_Impl.cpp"}}, m_Line = 822, m_LockNum = 146, m_bReleased = true, m_bLog = false, m_bLogErrorsOnly = false, m_bGotLock = true,
  m_bIgnoreDeadlock = false, m_sMessage = {static npos = 4294967295,
    _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0xb758a694 ""}}, m_pMyLock = 0x9db148c,
  m_tTime = 0, m_thread = 3062639504}
#4  0xb76969b0 in MessageQueueThread_DCECI (param=0x9db1350) at Command_Impl.cpp:48
        p = (class DCE::Command_Impl *) 0x9db1350
#5  0xb759550f in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#6  0xb73efa0e in clone () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.

Thread 2 (Thread 0xb5effb90 (LWP 6889)):
#0  0xb7738430 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb73e7df1 in select () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2  0xb7683289 in DCE::Socket::ReceiveData (this=0x9db1350, iSize=1, pcData=0xb5efe258 "\020ãﵧ@5·\202âïµdÇD·\001", nTimeout=-1) at Socket.cpp:760
        tv = {tv_sec = 0, tv_usec = 160000}
        tv_select_1 = {tv_sec = 1294677892, tv_usec = 19954}
        tv_select = {tv_sec = 0, tv_usec = 995998}
        iRet = 0
---Type <return> to continue, or q <return> to quit---
        rfds = {fds_bits = {128, 0 <repeats 31 times>}}
        tv_total = {tv_sec = -6253, tv_usec = 14254}
        tv_select_2 = {tv_sec = 1294677892, tv_usec = 19951}
        nInternalReceiveTimeout = 0
        iBytesLeft = 1
        sSM = {m_sFileName = {static npos = 4294967295, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
      _M_p = 0x9dc5804 "Socket.cpp"}}, m_Line = 689, m_LockNum = 1556, m_bReleased = false, m_bLog = 145, m_bLogErrorsOnly = true, m_bGotLock = true, m_bIgnoreDeadlock = true,
  m_sMessage = {static npos = 4294967295, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
      _M_p = 0xb758a694 ""}}, m_pMyLock = 0x9db13a0, m_tTime = 1294671639, m_thread = 3052403600}
        clk_start = 5250000
        clk_select1 = 0
        clk_select1b = 0
        clk_select2 = 0
        clk_select2b = 0
#3  0x080699c8 in DCE::Command_Impl::ReceiveData ()
No locals.
#4  0xb76828c8 in DCE::Socket::ReceiveString (this=0x9db1350, sRefString=@0xb5eff374, nTimeout=-1) at Socket.cpp:884
        acBuf = "\020ãﵧ@5·\202âïµdÇD·\001\000\000\000\020ãïµ\002\000\000\000ã?5·dÇD·\020ãïµ39.018\000\000Ðâïµ\002\000\000\0000ëïµ\000\000\000\000ô\177F·\000\000\000\000\004\000\000\000ÌI5·3\000\000\0009\000\000\000.\000\000\0000\000\000\0001\000\000\0008\000\000\000\003\000\000\000XXÜ\ttë7·ô\177X·\000\000\000\000\000\000\000 \000\000\000\000¤l\221´\033E\"@qqY·të7·ô\177X·\006\000\000\000\000\000\000\000\n\000\000\000ô\203S·\224\003Û\t\000\000\000\000\000\000\000\000 äïµ\000\000\030,\000\000\000 \005\000\000\000täïµ"...
        pcBuf = 0xb5efe258 "\020ãﵧ@5·\202âïµdÇD·\001"
        iLen = 4095
#5  0x08069847 in DCE::Command_Impl::ReceiveString ()
No locals.
#6  0xb76b06f7 in DCE::HandleRequestSocket::RunThread (this=0x9db1350) at HandleRequestSocket.cpp:125
        sMessage = {static npos = 4294967295, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
    _M_p = 0x9dc011c "MESSAGE 67", ' ' <repeats 12 times>}}
#7  0xb76b05d7 in BeginHandleRequestThread (HRqSock=0x9db1350) at HandleRequestSocket.cpp:31
        pCS = (DCE::HandleRequestSocket *) 0x9db1350
#8  0xb759550f in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#9  0xb73efa0e in clone () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.

Thread 1 (Thread 0xb70c46c0 (LWP 5376)):
#0  0xb7738430 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb7595bf7 in pthread_join () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#2  0x08058a2f in main ()
No locals.
#0  0xb7738430 in __kernel_vsyscall ()
(gdb)



Title: Re: Zwave device stops working and fails to restart
Post by: esev on January 10, 2011, 06:21:48 pm
That is interesting.  It locks up right after sending the soft-reset.  Can you share your /var/log/kern.log and /var/log/kern.log.0 with us?  The output of the following would be helpful too:

Code: [Select]
ls -al /proc/5376/fd
It would also help to have a copy of the ZWave process in its stuck state.  You can create a complete snapshot of that process with gcore.  Here is how to run it:
Code: [Select]
gcore <processID>The last line of output after running gcore will be the filename containing the process state.  If you can get us a copy of that file and of your /usr/pluto/bin/ZWave executable that would help us a lot!
Title: Re: Zwave device stops working and fails to restart
Post by: tortho on January 10, 2011, 06:41:15 pm
Trust the below is readable and what you asked for.
Have left the system in the same state if you need anything more... (Gcore uploading when i write this..)
(For some reason we are no longer allowed to post links so you have to modify the below..

Can you share your /var/log/kern.log and /var/log/kern.log.0 with us?  The output of the following would be helpful too
ww_w.thorland.org/kern.log
ww_w.thorland.org/kern.log.0

Code: [Select]
ls -al /proc/5376/fd
Result:
Code: [Select]

linuxmce@dcerouter:~$ linuxmce@dcerouter:~$ sudo ls -al /proc/5376/fd
total 0
dr-x------ 2 root root  0 2011-01-10 14:53 .
dr-xr-xr-x 7 root root  0 2011-01-10 06:23 ..
lrwx------ 1 root root 64 2011-01-10 14:53 0 -> /dev/pts/18
lrwx------ 1 root root 64 2011-01-10 14:53 1 -> /dev/pts/18
lrwx------ 1 root root 64 2011-01-10 14:53 2 -> /dev/pts/18
l-wx------ 1 root root 64 2011-01-10 14:53 3 -> /var/log/pluto/112_ZWave.log
lrwx------ 1 root root 64 2011-01-10 14:53 4 -> socket:[61143]
lrwx------ 1 root root 64 2011-01-10 14:53 5 -> socket:[63368]
lrwx------ 1 root root 64 2011-01-10 14:53 6 -> /dev/ttyUSB0
lrwx------ 1 root root 64 2011-01-10 14:53 7 -> socket:[63459]
linuxmce@dcerouter:~$

It would also help to have a copy of the ZWave process in its stuck state.  You can create a complete snapshot of that process with gcore.  Here is how to run it:
Code: [Select]
gcore <processID>ww_w.thorland.org/gcore-output.5376

The last line of output after running gcore will be the filename containing the process state.  If you can get us a copy of that file and of your /usr/pluto/bin/ZWave executable that would help us a lot!

ww_w.thorland.org/ZWave
Title: Re: Zwave device stops working and fails to restart
Post by: tortho on January 15, 2011, 12:23:38 pm
Just curious.. did anyone look at the files and logs I provided?
Title: Re: Zwave device stops working and fails to restart
Post by: esev on January 15, 2011, 03:35:33 pm
Just curious.. did anyone look at the files and logs I provided?

I took a look at them, but wasn't able to figure out why it is locking up for you.  At this point, I think we might need to make the soft-reset optional.  I'll have some time to look into this on Monday.
Title: Re: Zwave device stops working and fails to restart
Post by: hari on January 15, 2011, 07:08:56 pm
I took a look at them, but wasn't able to figure out why it is locking up for you.  At this point, I think we might need to make the soft-reset optional.  I'll have some time to look into this on Monday.
looks like it got stuck on the receive function somehow..

this is quite weird..

I'd say let's make the soft reset code optional as it causes issues. Afaict there are only a few people with broken dongles.

Sorry for any inconvenience tortho.. I could never test this myself as none of my dongles showed the lockup behavior (Seluxit, First gen aeon labs, ACT ZCU and Tricklestar).

br Hari