Author Topic: ZWave ERROR: Dropping command, no callback received after three resends  (Read 12285 times)

esev

  • Veteran
  • ***
  • Posts: 87
    • View Profile
    • Eric Severance
Re: ZWave ERROR: Dropping command, no callback received after three resends
« Reply #15 on: October 27, 2010, 06:56:14 pm »
Hm. I doubt this is due to the USB_UIRT.

When the issue occurs for you, have you tried sending the StatusReport command to the ZWave device?  (device tree->ZWave->Send command to device)  If that works for you, I've got a fix in the works that will cure the issue as soon as it arises.
Eric Severance
My setup

davegravy

  • Addicted
  • *
  • Posts: 551
    • View Profile
Re: ZWave ERROR: Dropping command, no callback received after three resends
« Reply #16 on: October 27, 2010, 09:19:33 pm »
Never tried it, but I will next time it happens!

I just worry that once we have your fix there will be little motivation to figure out what the exact cause of the problem is. Is there nothing else that can be done to narrow this down without wireless diagnostic equipment?

esev

  • Veteran
  • ***
  • Posts: 87
    • View Profile
    • Eric Severance
Re: ZWave ERROR: Dropping command, no callback received after three resends
« Reply #17 on: October 27, 2010, 09:39:44 pm »
I just worry that once we have your fix there will be little motivation to figure out what the exact cause of the problem is. Is there nothing else that can be done to narrow this down without wireless diagnostic equipment?

I have the same feeling.  This is a quick hack and might mask the real issue.  I was hoping to see something in the kernel logging that would indicate why this happens, but from what I could see the ZWave dongle was getting and properly understanding the commands the driver was sending.

You're welcome to take a look at those logs.  I'd start with the ZWave.log file, search for the error and note the time that it occurred.  Then open up the kern.log and search for events coming from the kernel around that same time.

The only other theory that I have is that maybe the dongle can't handle multiple requests coming in at the same time.  I noticed that when the ZWave driver does its polling it sends a broadcast message and doesn't wait for all the individual devices to reply before issuing the next command.  I'll try putting a delay between the messages and see if that helps.  Just takes a while to test because I don't know how to predictably reproduce the error.

I'd be curious to know if there is anyone who has turned off the polling feature in the ZWave driver and is still getting these issues.
Eric Severance
My setup

esev

  • Veteran
  • ***
  • Posts: 87
    • View Profile
    • Eric Severance
Re: ZWave ERROR: Dropping command, no callback received after three resends
« Reply #18 on: October 28, 2010, 12:46:50 am »
davegravy,

Do you still have any logs around with the ZW_SEND could not be delivered to Z-Wave stack error message?  If so, I'd be interested in seeing them.  I'm really interested in the messages that are sent just prior to you seeing the first ZW_SEND delivery error popping up in your log.  If you can get me that, and maybe the previous 50 or so lines that would be perfect.

Maybe that would help me focus in on the issue.

Here is a sample from my log
Code: [Select]
36 10/26/10 21:20:24.189 We have been idle for 30 seconds, polling device states <0xb7251b90>
40 10/26/10 21:20:24.288 Sending job 0x952bde0 (cb 73) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0x49 0x76 (###### ##Iv) <0xb7251b90>
41 10/26/10 21:20:24.300 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:20:24.300 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
41 10/26/10 21:20:24.317 0x1 0x5 0x0 0x13 0x49 0x0 0xa0 (####I##) <0xb7251b90>
36 10/26/10 21:20:24.317 ZW_SEND Response with callback 73 received <0xb7251b90>
36 10/26/10 21:20:24.317 ZW_SEND was successful, removing job <0xb7251b90>
40 10/26/10 21:20:24.417 Sending job 0x952c000 (cb 74) - 0x1 0xc 0x0 0x13 0xff 0x5 0x60 0x6 0x2 0x20 0x2 0x5 0x4a 0x13 (######`## ##J#) <0xb7251b90>
41 10/26/10 21:20:24.426 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:20:24.426 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
41 10/26/10 21:20:24.449 0x1 0x5 0x0 0x13 0x4a 0x0 0xa3 (####J##) <0xb7251b90>
36 10/26/10 21:20:24.449 ZW_SEND Response with callback 74 received <0xb7251b90>
36 10/26/10 21:20:24.449 ZW_SEND was successful, removing job <0xb7251b90>
41 10/26/10 21:20:24.477 0x1 0x9 0x0 0x4 0x0 0x7 0x3 0x20 0x3 0xff 0x2a (####### ##*) <0xb7251b90>
36 10/26/10 21:20:24.477 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:24.477 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:24.477 Got basic report from node 7, value: 255 <0xb7251b90>
41 10/26/10 21:20:24.552 0x1 0x9 0x0 0x4 0x0 0x4 0x3 0x20 0x3 0x63 0xb5 (####### #c#) <0xb7251b90>
36 10/26/10 21:20:24.552 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:24.552 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:24.552 Got basic report from node 4, value: 99 <0xb7251b90>
40 10/26/10 21:20:24.652 Sending job 0x952c220 (cb 75) - 0x1 0x9 0x0 0x13 0xff 0x2 0x31 0x4 0x5 0x4b 0x63 (######1##Kc) <0xb7251b90>
41 10/26/10 21:20:24.656 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:20:24.656 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
41 10/26/10 21:20:24.680 0x1 0x5 0x0 0x13 0x4b 0x0 0xa2 (####K##) <0xb7251b90>
36 10/26/10 21:20:24.680 ZW_SEND Response with callback 75 received <0xb7251b90>
36 10/26/10 21:20:24.680 ZW_SEND was successful, removing job <0xb7251b90>
41 10/26/10 21:20:24.740 0x1 0x9 0x0 0x4 0x0 0x3 0x3 0x20 0x3 0x0 0xd1 (####### ###) <0xb7251b90>
36 10/26/10 21:20:24.740 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:24.740 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:24.740 Got basic report from node 3, value: 0 <0xb7251b90>
41 10/26/10 21:20:24.784 0x1 0x9 0x0 0x4 0x0 0x6 0x3 0x20 0x3 0x0 0xd4 (####### ###) <0xb7251b90>
36 10/26/10 21:20:24.784 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:24.784 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:24.784 Got basic report from node 6, value: 0 <0xb7251b90>
36 10/26/10 21:20:54.884 We have been idle for 30 seconds, polling device states <0xb7251b90>
40 10/26/10 21:20:54.986 Sending job 0x952c440 (cb 76) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0x4c 0x73 (###### ##Ls) <0xb7251b90>
41 10/26/10 21:20:54.993 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:20:54.993 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
41 10/26/10 21:20:55.012 0x1 0x5 0x0 0x13 0x4c 0x0 0xa5 (####L##) <0xb7251b90>
36 10/26/10 21:20:55.012 ZW_SEND Response with callback 76 received <0xb7251b90>
36 10/26/10 21:20:55.012 ZW_SEND was successful, removing job <0xb7251b90>
41 10/26/10 21:20:55.097 0x1 0x9 0x0 0x4 0x0 0x4 0x3 0x20 0x3 0x63 0xb5 (####### #c#) <0xb7251b90>
36 10/26/10 21:20:55.097 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:55.097 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:55.097 Got basic report from node 4, value: 99 <0xb7251b90>
41 10/26/10 21:20:55.180 0x1 0x9 0x0 0x4 0x0 0x3 0x3 0x20 0x3 0x0 0xd1 (####### ###) <0xb7251b90>
36 10/26/10 21:20:55.180 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:55.180 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:55.180 Got basic report from node 3, value: 0 <0xb7251b90>
40 10/26/10 21:20:55.280 Sending job 0x952c660 (cb 77) - 0x1 0xc 0x0 0x13 0xff 0x5 0x60 0x6 0x2 0x20 0x2 0x5 0x4d 0x14 (######`## ##M#) <0xb7251b90>
41 10/26/10 21:20:55.288 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:20:55.288 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
41 10/26/10 21:20:55.309 0x1 0x5 0x0 0x13 0x4d 0x0 0xa4 (####M##) <0xb7251b90>
36 10/26/10 21:20:55.309 ZW_SEND Response with callback 77 received <0xb7251b90>
36 10/26/10 21:20:55.309 ZW_SEND was successful, removing job <0xb7251b90>
40 10/26/10 21:20:55.408 Sending job 0x952c880 (cb 78) - 0x1 0x9 0x0 0x13 0xff 0x2 0x31 0x4 0x5 0x4e 0x66 (######1##Nf) <0xb7251b90>
41 10/26/10 21:20:55.417 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:20:55.417 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
41 10/26/10 21:20:55.436 0x1 0x5 0x0 0x13 0x4e 0x0 0xa7 (####N##) <0xb7251b90>
36 10/26/10 21:20:55.436 ZW_SEND Response with callback 78 received <0xb7251b90>
36 10/26/10 21:20:55.436 ZW_SEND was successful, removing job <0xb7251b90>
41 10/26/10 21:20:55.572 0x1 0x9 0x0 0x4 0x0 0x6 0x3 0x20 0x3 0x0 0xd4 (####### ###) <0xb7251b90>
36 10/26/10 21:20:55.572 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:55.572 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:55.572 Got basic report from node 6, value: 0 <0xb7251b90>
41 10/26/10 21:20:56.261 0x1 0x9 0x0 0x4 0x0 0x6 0x3 0x20 0x3 0x0 0xd4 (####### ###) <0xb7251b90>
36 10/26/10 21:20:56.261 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:56.261 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:56.261 Got basic report from node 6, value: 0 <0xb7251b90>
41 10/26/10 21:20:56.972 0x1 0x9 0x0 0x4 0x0 0x6 0x3 0x20 0x3 0x0 0xd4 (####### ###) <0xb7251b90>
36 10/26/10 21:20:56.972 FUNC_ID_APPLICATION_COMMAND_HANDLER: <0xb7251b90>
36 10/26/10 21:20:56.972 COMMAND_CLASS_BASIC -  <0xb7251b90>
36 10/26/10 21:20:56.972 Got basic report from node 6, value: 0 <0xb7251b90>
36 10/26/10 21:21:27.117 We have been idle for 30 seconds, polling device states <0xb7251b90>
40 10/26/10 21:21:27.217 Sending job 0x952caa0 (cb 79) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0x4f 0x70 (###### ##Op) <0xb7251b90>
41 10/26/10 21:21:27.225 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:21:27.225 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
05 10/26/10 21:21:30.333 No callback received: await_callback: 79 timer: 31 <0xb7251b90>
40 10/26/10 21:21:30.433 Sending job 0x952caa0 (cb 79) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0x4f 0x70 (###### ##Op) <0xb7251b90>
41 10/26/10 21:21:30.441 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:21:30.441 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
05 10/26/10 21:21:33.544 No callback received: await_callback: 79 timer: 31 <0xb7251b90>
40 10/26/10 21:21:33.644 Sending job 0x952caa0 (cb 79) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0x4f 0x70 (###### ##Op) <0xb7251b90>
41 10/26/10 21:21:33.648 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7251b90>
36 10/26/10 21:21:33.648 ZW_SEND delivered to Z-Wave stack <0xb7251b90>
05 10/26/10 21:21:36.749 No callback received: await_callback: 79 timer: 31 <0xb7251b90>
01 10/26/10 21:21:36.749 ERROR: Dropping command, no callback received after three resends <0xb7251b90>
40 10/26/10 21:21:36.848 Sending job 0x952ccc0 (cb 80) - 0x1 0xc 0x0 0x13 0xff 0x5 0x60 0x6 0x2 0x20 0x2 0x5 0x50 0x9 (######`## ##P#) <0xb7251b90>
41 10/26/10 21:21:36.854 0x1 0x4 0x1 0x13 0x0 0xe9 (######) <0xb7251b90>
36 10/26/10 21:21:36.854 ERROR: ZW_SEND could not be delivered to Z-Wave stack <0xb7251b90>
Eric Severance
My setup

davegravy

  • Addicted
  • *
  • Posts: 551
    • View Profile
Re: ZWave ERROR: Dropping command, no callback received after three resends
« Reply #19 on: October 28, 2010, 04:41:58 pm »
davegravy,

Do you still have any logs around with the ZW_SEND could not be delivered to Z-Wave stack error message?  If so, I'd be interested in seeing them.  I'm really interested in the messages that are sent just prior to you seeing the first ZW_SEND delivery error popping up in your log.  If you can get me that, and maybe the previous 50 or so lines that would be perfect.

Maybe that would help me focus in on the issue.

I unfortunately don't have the logs any more, but I will have a go at trying to recreate the problem so I can get you new logs.

esev

  • Veteran
  • ***
  • Posts: 87
    • View Profile
    • Eric Severance
Re: ZWave ERROR: Dropping command, no callback received after three resends
« Reply #20 on: October 28, 2010, 05:29:54 pm »
Just had a small success.  On a hunch, I increased the time the driver waits for a callback from 3 seconds to 30 seconds.  Turns out that when I get this error, the controller/dongle actually takes just over 5 seconds to respond.  Here is the log:

Code: [Select]
36 10/28/10 11:05:28.257 We have been idle for 30 seconds, polling device states <0xb7116b90>
40 10/28/10 11:05:28.357 Sending job 0x9ce5320 (cb 1) - 0x1 0x9 0x0 0x13 0xff 0x2 0x20 0x2 0x5 0x1 0x3e (###### ###>) <0xb7116b90>
41 10/28/10 11:05:28.368 0x1 0x4 0x1 0x13 0x1 0xe8 (######) <0xb7116b90>
36 10/28/10 11:05:28.368 ZW_SEND delivered to Z-Wave stack <0xb7116b90>
01 10/28/10 11:05:31.369 No callback received after 3 seconds, waiting a little longer <0xb7116b90>
41 10/28/10 11:05:33.132 0x1 0x6 0x0 0x49 0x81 0x0 0x0 0x31 (###I###1) <0xb7116b90>
36 10/28/10 11:05:33.132 FUNC_ID_ZW_APPLICATION_UPDATE:UPDATE_STATE_NODE_INFO_REQ_FAILED received <0xb7116b90>
36 10/28/10 11:05:33.132 Generic callback handling for command 73, ERROR: wrong callback type: 19 <0xb7116b90>
41 10/28/10 11:05:33.404 0x1 0x6 0x0 0x49 0x81 0x0 0x0 0x31 (###I###1) <0xb7116b90>
36 10/28/10 11:05:33.404 FUNC_ID_ZW_APPLICATION_UPDATE:UPDATE_STATE_NODE_INFO_REQ_FAILED received <0xb7116b90>
36 10/28/10 11:05:33.404 Generic callback handling for command 73, ERROR: wrong callback type: 19 <0xb7116b90>
41 10/28/10 11:05:33.424 0x1 0x5 0x0 0x13 0x1 0x0 0xe8 (#######) <0xb7116b90>
36 10/28/10 11:05:33.424 ZW_SEND Response with callback 1 received <0xb7116b90>
36 10/28/10 11:05:33.424 ZW_SEND was successful, removing job <0xb7116b90>

See the log in my previous message to see what happens when the driver times out after 3 seconds.  I think maybe if the driver times out after 3 seconds, and tries to resend, it might be causing the controller/dongle to stop accepting new requests - requiring a soft-reset to get it un-stuck.

Hari, or anyone more familiar with the ZWave protocol, does this make sense?  Also, do you know what the FUNC_ID_ZW_APPLICATION_UPDATE:UPDATE_STATE_NODE_INFO_REQ_FAILED response means?

BTW: Getting rid of the buffer flush at the end of the Serial Write routine cleared up all my Out of frame flow errors.
Eric Severance
My setup

esev

  • Veteran
  • ***
  • Posts: 87
    • View Profile
    • Eric Severance
Re: ZWave ERROR: Dropping command, no callback received after three resends
« Reply #21 on: October 30, 2010, 09:19:56 pm »
Darn.  That didn't fix it.  Back to the same old locked up dongle.

Hari explained a lot about how Z-Wave is supposed to work on IRC yesterday (thanks again Hari).  He explained that ZWave controllers/dongles aren't supposed to take any longer than about 2-3 seconds to respond.  So it looks like our controllers are not behaving correctly.

I've created a Trac ticket for this issue.  I'll be submitting the work-around that was discussed before - to soft-reset the controller if more than three consecutive commands fail - this evening after I've completed my testing.  I'll keep the ticket open for another few weeks just in case we find another solution, but unfortunately it looks like we have some bad hardware.
Eric Severance
My setup