LinuxMCE Forums

General => Users => Topic started by: esev on October 26, 2010, 04:46:29 am

Title: ZWave ERROR: Dropping command, no callback received after three resends
Post by: esev on October 26, 2010, 04:46:29 am
Does anyone else get this error in their ZWave log?
   No callback received: await_callback: 105 timer: 31 <0xb7136b90>
   No callback received: await_callback: 105 timer: 31 <0xb7136b90>
   No callback received: await_callback: 105 timer: 31 <0xb7136b90>
   ERROR: Dropping command, no callback received after three resends <0xb7136b90>

I have the Tricklestar 300ZW and see these from time to time.  It seems once I start getting this error it continues till I reboot the computer.  While this error is occurring LinuxMCE is unable to control any ZWave devices.

Just curious if anyone else has seen this. Is it limited to the Tricklestar dongle? Is there any way to prevent it?
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: huh on October 26, 2010, 07:37:44 am
I get this too- on my aeon labs mi casa verde z-wave stick:

05      10/25/10 21:10:12.554           ^[[33;1mNo callback received: await_callback: 201 timer: 31^[[0m <0xb7253b90>
05      10/25/10 21:28:06.314           ^[[33;1mNo callback received: await_callback: 51 timer: 31^[[0m <0xb7253b90>
05      10/25/10 23:10:51.874           ^[[33;1mNo callback received: await_callback: 145 timer: 31^[[0m <0xb7253b90>


My lamppost automatically shuts off at 22:00.  As you can see, I had errors before and after, but I just confirmed that the light shut off.  Don't know if it was at 22:00, but don't doubt that it was.

Here is the start of my file:
05      10/25/10 8:23:04.167            ^[[33;1mGoing to rotate logs...^[[0m <0xb6251b90>
05      10/25/10 8:27:35.131            ^[[33;1mGoing to rotate logs...^[[0m <0xb6251b90>
05      10/25/10 8:30:40.158            ^[[33;1mNo callback received: await_callback: 71 timer: 31^[[0m <0xb7253b90>
05      10/25/10 8:46:31.967            ^[[33;1mNo callback received: await_callback: 164 timer: 31^[[0m <0xb7253b90>
05      10/25/10 8:48:37.426            ^[[33;1mNo callback received: await_callback: 176 timer: 31^[[0m <0xb7253b90>

Going through the zipped files for the last 6 days, all of them have between about 7:30 and 8:30 that line about rotating the logs.  After that there are 15 to 25 lines of "No callback received" stretching across the day in what seems to be random intervals.  I can't imagine a z-wave device that we would use at those times.

They do not stop my lamppost from turning on at sunset and off at 10:00.

FWIW
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: sambuca on October 26, 2010, 08:05:09 am
Hari will have to confirm, but if I've understood correctly, a missing callback means that the ZWave driver couldn't receive a confirmation for the message it sent. This might be caused by communication problems, like having radio interference or too far distance between nodes.

br,
sambuca
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: esev on October 26, 2010, 03:29:54 pm
I suspected communications problems at first too.  But my ZWave dongle has nearly direct line-of-sight to the closest device, and the two are less than 20ft away from each other.  The weird thing is that the errors go away immediately after a reboot.

The errors are always in a set of four: three await_callback and one Dropping command.  They will continue this way for hours, sometimes going back to normal on their own.  When everything is working fine I don't get any errors except for an occasional single await_callback.

I'm happy to try resetting my devices and moving the ZWave dongle to one of my Media Directors (will this work?) just to see if being in a different part of the house has any effect.

I should also mention, the last time I got these errors I issued a "#788 StatusReport" to the ZWave plugin.  I stopped getting these errors after sending that command.  I think that is because implementation of the StatusReport command Soft-Resets the dongle.  I've only tried this one time however, so it might have just been a coincidence that things started working again.
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: esev on October 26, 2010, 08:45:41 pm
Maybe sending the Soft-Reset does work.  I just noticed I was getting these error messages again.  I sent the StatusReport command to the ZWave device and, after the 30sec device polling was complete, the system went back to normal.

I'll enable debugging on the pl2303 module and also crank up the logging on the ZWave device.  Maybe between the logs I be able to catch something out of the ordinary going on.

Failing that, Hari, would you mind if I submit a patch to do a Soft-Reset if all devices fail to respond to the poll that takes place every 30 seconds?
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: hari on October 26, 2010, 09:41:59 pm
hi Eric,

Yes, the soft reset is triggered by that dce command. If I remember correctly I did implement that for domodude to test. I've never seen that device hang myself, but I'm using a seluxit dongle as primary device.

I'd appreciate such a patch as i did never find the time to implement it myself, and domodude was working around with a script that parsed the logs and used messagesend to reset the device when seeing a bunch of those messages in the logs..

The poll is probably the wrong place as this can be deactivated with a dce command.. Detecting excessive miss of callback (maybe a counter) would probably work...

Br, keep up the great work,
Hari
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: esev on October 26, 2010, 11:00:56 pm
Hari,

Thanks for the reply.  With the full debugging turned on, I'm seeing some weird things.  For example, I see the ZWave dongle send a message to the driver at the same time the ZWave driver sends a message to the dongle.  This is causing an issue as in Serial.cpp line 95 all the data in the input buffers is flushed when a message is transmitted to the dongle.  This causes a message from the dongle to be missed in the driver and sometimes winds up in a ERROR! Out of frame flow!! message.

Is there a reason for flushing the input buffer like that?  I'm hesitant to touch that code because it might be there to handle a special case with another ZWave dongle.

I haven't gotten into the infinite retry state yet with full debugging turned on.  Still waiting...

Thanks!
Eric
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: esev on October 27, 2010, 05:00:46 am
Hari,

I finally got the error to occur again.  However I cannot find anything telling in the logs that would indicate why it is failing.  Again, performing the soft-reset cleared things up.  If you'd like to take a look at the logs for yourself, you can download them from: http://esev.com/zwave.tar.bz2 (http://esev.com/zwave.tar.bz2)  Both the ZWave and kernel debugging logs are included.

One thing that did arrise from the logs is that when this issue occurs the ZWave controller is sending back notifications that the message cannot be sent to the ZWave stack: i.e. "ERROR: ZW_SEND could not be delivered to Z-Wave stack" shows up in the logs.

Since I cannot see any other obvious way to fix this, I'm using your suggestion of a counter to trigger a soft-reset after receiving two consecutive ZW_SEND delivery errors.  I chose two because the retry limit of a job is currently set to 3.  My thought was that if the reset was successful, the third retry would still have a chance to work.  To do this, however, I needed to insert the soft-reset job at the beginning of the ZWSendQueue.

Let me know if this will be acceptable.  I've attached a preliminary patch.  I'll wait to confirm this works before posting it to a ticket on Trac.  I can add the "Out of frame flow" fix from my previous post to this patch too, if you'd like.

TIA
Eric
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: Kooma on October 27, 2010, 08:28:32 am
I'm having a bit similar situation here with the Selexit dongle. Everything works perfect, except when using timed events to control lights, sometimes an excessive repetition of the command is broadcasted. It occurs about once every millisecond. Reloading the router gets things back to normal. The system may work several days OK yet sometimes this erroneous function takes place within a day.


The DCE router log shows (this goes on until reloading):
Code: [Select]
08      10/27/10 0:30:01.565            Received Message from 12 (Event Plug-in / ) to 503 (Lights (dim) - MH / ), type 1 id 193 Command:Off, retry none, parameters: <0x9f7fab90>
08      10/27/10 0:30:01.565              Parameter 97(PK_Pipe):  <0x9f7fab90>
08      10/27/10 0:30:01.566            Received Message from 12 (Event Plug-in / ) to 503 (Lights (dim) - MH / ), type 1 id 193 Command:Off, retry none, parameters: <0x9f7fab90>
08      10/27/10 0:30:01.566              Parameter 97(PK_Pipe):  <0x9f7fab90>
08      10/27/10 0:30:01.568            Received Message from 12 (Event Plug-in / ) to 503 (Lights (dim) - MH / ), type 1 id 193 Command:Off, retry none, parameters: <0x9f7fab90>

I hope this helps in finding out what is going on.

/Kooma
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: hari on October 27, 2010, 12:06:45 pm
I'm having a bit similar situation here with the Selexit dongle. Everything works perfect, except when using timed events to control lights, sometimes an excessive repetition of the command is broadcasted. It occurs about once every millisecond. Reloading the router gets things back to normal. The system may work several days OK yet sometimes this erroneous function takes place within a day.


The DCE router log shows (this goes on until reloading):
Code: [Select]
08      10/27/10 0:30:01.565            Received Message from 12 (Event Plug-in / ) to 503 (Lights (dim) - MH / ), type 1 id 193 Command:Off, retry none, parameters: <0x9f7fab90>
08      10/27/10 0:30:01.565              Parameter 97(PK_Pipe):  <0x9f7fab90>
08      10/27/10 0:30:01.566            Received Message from 12 (Event Plug-in / ) to 503 (Lights (dim) - MH / ), type 1 id 193 Command:Off, retry none, parameters: <0x9f7fab90>
08      10/27/10 0:30:01.566              Parameter 97(PK_Pipe):  <0x9f7fab90>
08      10/27/10 0:30:01.568            Received Message from 12 (Event Plug-in / ) to 503 (Lights (dim) - MH / ), type 1 id 193 Command:Off, retry none, parameters: <0x9f7fab90>

I hope this helps in finding out what is going on.

/Kooma

iirc this is a problem with the event plugin and not the zwave driver.. but i cannot remember the exact thread where this was discussed..

br Hari
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: hari on October 27, 2010, 12:09:42 pm
Hari,

Thanks for the reply.  With the full debugging turned on, I'm seeing some weird things.  For example, I see the ZWave dongle send a message to the driver at the same time the ZWave driver sends a message to the dongle.  This is causing an issue as in Serial.cpp line 95 all the data in the input buffers is flushed when a message is transmitted to the dongle.  This causes a message from the dongle to be missed in the driver and sometimes winds up in a ERROR! Out of frame flow!! message.
I'm not sure if I can follow..

Quote
Is there a reason for flushing the input buffer like that?  I'm hesitant to touch that code because it might be there to handle a special case with another ZWave dongle.
no afaict there is no special case for other dongles. I was just using the existing serial routines as I was too lazy to write my own...

br Hari
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: hari on October 27, 2010, 12:19:43 pm
regarding the patches, please visit us in #linuxmce-devel to get your svn account set up. Possy will create the account with your password hash.

br Hari
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: davegravy on October 27, 2010, 02:46:45 pm
I get the ZW_send periodically with the Aeon Labs Z-stick. See thread http://forum.linuxmce.org/index.php?topic=9739.0 (http://forum.linuxmce.org/index.php?topic=9739.0)

Unplugging my other USB devices consistently fixes the problem. The better fix I've found is to unplug the Aeon, delete the ZWave devices from webadmin, reboot, replug the Aeon, and then go through the setup wizard from scratch. I don't claim to understand why either of these procedures work.

It hasn't died on me in a month or so which is the longest it's gone without problems.
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: esev on October 27, 2010, 05:11:36 pm
Unplugging my other USB devices consistently fixes the problem.

Davegravy,

Which other USB devices do you have?  Maybe we have one in common that is causing this issue.  I've got the USB_UIRT, Hauppauge HD-PVR, Gyration GYR4101US in addition to my Tricklestar ZWave.
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: davegravy on October 27, 2010, 05:55:23 pm
Unplugging my other USB devices consistently fixes the problem.

Davegravy,

Which other USB devices do you have?  Maybe we have one in common that is causing this issue.  I've got the USB_UIRT, Hauppauge HD-PVR, Gyration GYR4101US in addition to my Tricklestar ZWave.

I have the USB_UIRT, Aeon Labs Z-stick, and a USB->Serial RS232 adapter.
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: esev on October 27, 2010, 06:56:14 pm
Hm. I doubt this is due to the USB_UIRT.

When the issue occurs for you, have you tried sending the StatusReport command to the ZWave device?  (device tree->ZWave->Send command to device)  If that works for you, I've got a fix in the works that will cure the issue as soon as it arises.
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: davegravy on October 27, 2010, 09:19:33 pm
Never tried it, but I will next time it happens!

I just worry that once we have your fix there will be little motivation to figure out what the exact cause of the problem is. Is there nothing else that can be done to narrow this down without wireless diagnostic equipment?
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: esev on October 27, 2010, 09:39:44 pm
I just worry that once we have your fix there will be little motivation to figure out what the exact cause of the problem is. Is there nothing else that can be done to narrow this down without wireless diagnostic equipment?

I have the same feeling.  This is a quick hack and might mask the real issue.  I was hoping to see something in the kernel logging that would indicate why this happens, but from what I could see the ZWave dongle was getting and properly understanding the commands the driver was sending.

You're welcome to take a look at those logs.  I'd start with the ZWave.log file, search for the error and note the time that it occurred.  Then open up the kern.log and search for events coming from the kernel around that same time.

The only other theory that I have is that maybe the dongle can't handle multiple requests coming in at the same time.  I noticed that when the ZWave driver does its polling it sends a broadcast message and doesn't wait for all the individual devices to reply before issuing the next command.  I'll try putting a delay between the messages and see if that helps.  Just takes a while to test because I don't know how to predictably reproduce the error.

I'd be curious to know if there is anyone who has turned off the polling feature in the ZWave driver and is still getting these issues.
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: esev on October 28, 2010, 12:46:50 am
davegravy,

Do you still have any logs around with the ZW_SEND could not be delivered to Z-Wave stack error message?  If so, I'd be interested in seeing them.  I'm really interested in the messages that are sent just prior to you seeing the first ZW_SEND delivery error popping up in your log.  If you can get me that, and maybe the previous 50 or so lines that would be perfect.

Maybe that would help me focus in on the issue.

Here is a sample from my log
Code: [Select]
36 10/26/10 21:20:24.189 We have been idle for 30 seconds, polling device states <0xb7251b90>
40 10/26/10 21:20:24.288 Sending job 0x952bde0 (cb 73) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0x49 0x76 (###### ##Iv) <0xb7251b90>
41 10/26/10 21:20:24.300 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:20:24.300 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
41 10/26/10 21:20:24.317 0x1 0x5 0x0 0x13 0x49 0x0 0xa0 (####I##) <0xb7251b90>
36 10/26/10 21:20:24.317 ZW_SEND Response with callback 73 received <0xb7251b90>
36 10/26/10 21:20:24.317 ZW_SEND was successful, removing job <0xb7251b90>
40 10/26/10 21:20:24.417 Sending job 0x952c000 (cb 74) - 0x1 0xc 0x0 0x13 0xff 0x5 0x60 0x6 0x2 0x20 0x2 0x5 0x4a 0x13 (######`## ##J#) <0xb7251b90>
41 10/26/10 21:20:24.426 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:20:24.426 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
41 10/26/10 21:20:24.449 0x1 0x5 0x0 0x13 0x4a 0x0 0xa3 (####J##) <0xb7251b90>
36 10/26/10 21:20:24.449 ZW_SEND Response with callback 74 received <0xb7251b90>
36 10/26/10 21:20:24.449 ZW_SEND was successful, removing job <0xb7251b90>
41 10/26/10 21:20:24.477 0x1 0x9 0x0 0x4 0x0 0x7 0x3 0x20 0x3 0xff 0x2a (####### ##*) <0xb7251b90>
36 10/26/10 21:20:24.477 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:24.477 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:24.477 Got basic report from node 7, value: 255 <0xb7251b90>
41 10/26/10 21:20:24.552 0x1 0x9 0x0 0x4 0x0 0x4 0x3 0x20 0x3 0x63 0xb5 (####### #c#) <0xb7251b90>
36 10/26/10 21:20:24.552 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:24.552 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:24.552 Got basic report from node 4, value: 99 <0xb7251b90>
40 10/26/10 21:20:24.652 Sending job 0x952c220 (cb 75) - 0x1 0x9 0x0 0x13 0xff 0x2 0x31 0x4 0x5 0x4b 0x63 (######1##Kc) <0xb7251b90>
41 10/26/10 21:20:24.656 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:20:24.656 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
41 10/26/10 21:20:24.680 0x1 0x5 0x0 0x13 0x4b 0x0 0xa2 (####K##) <0xb7251b90>
36 10/26/10 21:20:24.680 ZW_SEND Response with callback 75 received <0xb7251b90>
36 10/26/10 21:20:24.680 ZW_SEND was successful, removing job <0xb7251b90>
41 10/26/10 21:20:24.740 0x1 0x9 0x0 0x4 0x0 0x3 0x3 0x20 0x3 0x0 0xd1 (####### ###) <0xb7251b90>
36 10/26/10 21:20:24.740 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:24.740 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:24.740 Got basic report from node 3, value: 0 <0xb7251b90>
41 10/26/10 21:20:24.784 0x1 0x9 0x0 0x4 0x0 0x6 0x3 0x20 0x3 0x0 0xd4 (####### ###) <0xb7251b90>
36 10/26/10 21:20:24.784 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:24.784 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:24.784 Got basic report from node 6, value: 0 <0xb7251b90>
36 10/26/10 21:20:54.884 We have been idle for 30 seconds, polling device states <0xb7251b90>
40 10/26/10 21:20:54.986 Sending job 0x952c440 (cb 76) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0x4c 0x73 (###### ##Ls) <0xb7251b90>
41 10/26/10 21:20:54.993 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:20:54.993 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
41 10/26/10 21:20:55.012 0x1 0x5 0x0 0x13 0x4c 0x0 0xa5 (####L##) <0xb7251b90>
36 10/26/10 21:20:55.012 ZW_SEND Response with callback 76 received <0xb7251b90>
36 10/26/10 21:20:55.012 ZW_SEND was successful, removing job <0xb7251b90>
41 10/26/10 21:20:55.097 0x1 0x9 0x0 0x4 0x0 0x4 0x3 0x20 0x3 0x63 0xb5 (####### #c#) <0xb7251b90>
36 10/26/10 21:20:55.097 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:55.097 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:55.097 Got basic report from node 4, value: 99 <0xb7251b90>
41 10/26/10 21:20:55.180 0x1 0x9 0x0 0x4 0x0 0x3 0x3 0x20 0x3 0x0 0xd1 (####### ###) <0xb7251b90>
36 10/26/10 21:20:55.180 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:55.180 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:55.180 Got basic report from node 3, value: 0 <0xb7251b90>
40 10/26/10 21:20:55.280 Sending job 0x952c660 (cb 77) - 0x1 0xc 0x0 0x13 0xff 0x5 0x60 0x6 0x2 0x20 0x2 0x5 0x4d 0x14 (######`## ##M#) <0xb7251b90>
41 10/26/10 21:20:55.288 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:20:55.288 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
41 10/26/10 21:20:55.309 0x1 0x5 0x0 0x13 0x4d 0x0 0xa4 (####M##) <0xb7251b90>
36 10/26/10 21:20:55.309 ZW_SEND Response with callback 77 received <0xb7251b90>
36 10/26/10 21:20:55.309 ZW_SEND was successful, removing job <0xb7251b90>
40 10/26/10 21:20:55.408 Sending job 0x952c880 (cb 78) - 0x1 0x9 0x0 0x13 0xff 0x2 0x31 0x4 0x5 0x4e 0x66 (######1##Nf) <0xb7251b90>
41 10/26/10 21:20:55.417 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:20:55.417 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
41 10/26/10 21:20:55.436 0x1 0x5 0x0 0x13 0x4e 0x0 0xa7 (####N##) <0xb7251b90>
36 10/26/10 21:20:55.436 ZW_SEND Response with callback 78 received <0xb7251b90>
36 10/26/10 21:20:55.436 ZW_SEND was successful, removing job <0xb7251b90>
41 10/26/10 21:20:55.572 0x1 0x9 0x0 0x4 0x0 0x6 0x3 0x20 0x3 0x0 0xd4 (####### ###) <0xb7251b90>
36 10/26/10 21:20:55.572 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:55.572 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:55.572 Got basic report from node 6, value: 0 <0xb7251b90>
41 10/26/10 21:20:56.261 0x1 0x9 0x0 0x4 0x0 0x6 0x3 0x20 0x3 0x0 0xd4 (####### ###) <0xb7251b90>
36 10/26/10 21:20:56.261 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:56.261 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:56.261 Got basic report from node 6, value: 0 <0xb7251b90>
41 10/26/10 21:20:56.972 0x1 0x9 0x0 0x4 0x0 0x6 0x3 0x20 0x3 0x0 0xd4 (####### ###) <0xb7251b90>
36 10/26/10 21:20:56.972 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:56.972 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:56.972 Got basic report from node 6, value: 0 <0xb7251b90>
36 10/26/10 21:21:27.117 We have been idle for 30 seconds, polling device states <0xb7251b90>
40 10/26/10 21:21:27.217 Sending job 0x952caa0 (cb 79) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0x4f 0x70 (###### ##Op) <0xb7251b90>
41 10/26/10 21:21:27.225 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:21:27.225 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
05 10/26/10 21:21:30.333 No callback received: await_callback: 79 timer: 31 <0xb7251b90>
40 10/26/10 21:21:30.433 Sending job 0x952caa0 (cb 79) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0x4f 0x70 (###### ##Op) <0xb7251b90>
41 10/26/10 21:21:30.441 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:21:30.441 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
05 10/26/10 21:21:33.544 No callback received: await_callback: 79 timer: 31 <0xb7251b90>
40 10/26/10 21:21:33.644 Sending job 0x952caa0 (cb 79) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0x4f 0x70 (###### ##Op) <0xb7251b90>
41 10/26/10 21:21:33.648 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:21:33.648 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
05 10/26/10 21:21:36.749 No callback received: await_callback: 79 timer: 31 <0xb7251b90>
01 10/26/10 21:21:36.749 ERROR: Dropping command, no callback received after three resends <0xb7251b90>
40 10/26/10 21:21:36.848 Sending job 0x952ccc0 (cb 80) - 0x1 0xc 0x0 0x13 0xff 0x5 0x60 0x6 0x2 0x20 0x2 0x5 0x50 0x9 (######`## ##P#) <0xb7251b90>
41 10/26/10 21:21:36.854 0x1 0x4 0x1 0x13 0x0 0xe9 (######) <0xb7251b90>
36 10/26/10 21:21:36.854 ERROR: ZW_SEND could not be delivered to Z-Wave stack <0xb7251b90>
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: davegravy on October 28, 2010, 04:41:58 pm
davegravy,

Do you still have any logs around with the ZW_SEND could not be delivered to Z-Wave stack error message?  If so, I'd be interested in seeing them.  I'm really interested in the messages that are sent just prior to you seeing the first ZW_SEND delivery error popping up in your log.  If you can get me that, and maybe the previous 50 or so lines that would be perfect.

Maybe that would help me focus in on the issue.

I unfortunately don't have the logs any more, but I will have a go at trying to recreate the problem so I can get you new logs.
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: esev on October 28, 2010, 05:29:54 pm
Just had a small success.  On a hunch, I increased the time the driver waits for a callback from 3 seconds to 30 seconds.  Turns out that when I get this error, the controller/dongle actually takes just over 5 seconds to respond.  Here is the log:

Code: [Select]
36 10/28/10 11:05:28.257 We have been idle for 30 seconds, polling device states <0xb7116b90>
40 10/28/10 11:05:28.357 Sending job 0x9ce5320 (cb 1) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0x1 0x3e (###### ###>) <0xb7116b90>
41 10/28/10 11:05:28.368 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7116b90>
36 10/28/10 11:05:28.368 ZW_SEND delivered to Z-Wave stack <0xb7116b90>
01 10/28/10 11:05:31.369 No callback received after 3 seconds, waiting a little longer <0xb7116b90>
41 10/28/10 11:05:33.132 0x1 0x6 0x0 0x49 0x81 0x0 0x0 0x31 (###I###1) <0xb7116b90>
36 10/28/10 11:05:33.132 FUNC_ID_ZW_APPLICATION_UPDATE:UPDATE_STATE_NODE_INFO_REQ_FAILED received <0xb7116b90>
36 10/28/10 11:05:33.132 Generic callback handling for command 73, ERROR: wrong callback type: 19 <0xb7116b90>
41 10/28/10 11:05:33.404 0x1 0x6 0x0 0x49 0x81 0x0 0x0 0x31 (###I###1) <0xb7116b90>
36 10/28/10 11:05:33.404 FUNC_ID_ZW_APPLICATION_UPDATE:UPDATE_STATE_NODE_INFO_REQ_FAILED received <0xb7116b90>
36 10/28/10 11:05:33.404 Generic callback handling for command 73, ERROR: wrong callback type: 19 <0xb7116b90>
41 10/28/10 11:05:33.424 0x1 0x5 0x0 0x13 0x1 0x0 0xe8 (#######) <0xb7116b90>
36 10/28/10 11:05:33.424 ZW_SEND Response with callback 1 received <0xb7116b90>
36 10/28/10 11:05:33.424 ZW_SEND was successful, removing job <0xb7116b90>

See the log in my previous message to see what happens when the driver times out after 3 seconds.  I think maybe if the driver times out after 3 seconds, and tries to resend, it might be causing the controller/dongle to stop accepting new requests - requiring a soft-reset to get it un-stuck.

Hari, or anyone more familiar with the ZWave protocol, does this make sense?  Also, do you know what the FUNC_ID_ZW_APPLICATION_UPDATE:UPDATE_STATE_NODE_INFO_REQ_FAILED response means?

BTW: Getting rid of the buffer flush at the end of the Serial Write routine cleared up all my Out of frame flow errors.
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: esev on October 30, 2010, 09:19:56 pm
Darn.  That didn't fix it.  Back to the same old locked up dongle.

Hari explained a lot about how Z-Wave is supposed to work on IRC yesterday (thanks again Hari).  He explained that ZWave controllers/dongles aren't supposed to take any longer than about 2-3 seconds to respond.  So it looks like our controllers are not behaving correctly.

I've created a Trac ticket for this issue (http://svn.linuxmce.org/trac.cgi/ticket/874).  I'll be submitting the work-around that was discussed before - to soft-reset the controller if more than three consecutive commands fail - this evening after I've completed my testing.  I'll keep the ticket open for another few weeks just in case we find another solution, but unfortunately it looks like we have some bad hardware.
Title: Re: ZWave ERROR: Dropping command, no callback received after three resends
Post by: tortho on December 13, 2010, 02:24:18 pm
Related? http://forum.linuxmce.org/index.php?topic=11116.0