3. If the core dcerouter crashes, the media players, orbiters etc. should continue running and try to reconnect in the background. The current behaviour makes the system quite infuriating at times (and as a result I have moved all media playback to dedicated devices - such as Xtreamers - under IR control).
Generally, in our experience, MDs will auto-reconnect if the Core locks up and has to be restarted. It has to be said that it is very rare for even our development Cores to crash or lock up. However, if you use alpha/beta code in production systems then you have to expect instability and issues... helping to document and fix those with the devs is what using these incomplete builds is all about.
I think what chrisbirkinshaw is talking about on a higher level is making the different pieces of the system less dependent on each other whenever possible. The idea being that if one device or system goes "boom" others are not affected or maybe some other device or system takes over the dead one's work. In theory, I agree with this idea of a self-healing or redundant system. I wonder which pieces can be set up to act in this way and how much effort it would take in each case.
In my experience, if the dcerouter goes down, playback stops. This is happening much less frequently than it used to (0710 to current 0810 builds), but I don't know if this is due to dcerouter improvements, or just because everything that talks to the dcerouter is becoming more stable. In any case - it seems to me that media playback does not necessarily need to be so dependent on the dcerouter. Of course features like controlling playback from orbiters would be affected if the dcerouter dropped out, but not the basic playback. When dcerouter comes back online, it can tap into whatever is going on and update its state.
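To make the "keep playing and reconnect in the background" idea concrete, here is a minimal sketch in Python. It is not how LinuxMCE devices actually handle reconnection; the names `backoff_delays` and `reconnect`, and the `connect` callable that stands in for "open a connection to the dcerouter", are hypothetical and only illustrate retrying with an increasing delay.

```python
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0):
    """Yield an endless series of retry delays: base, base*factor, ... up to cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def reconnect(connect, max_attempts=None, sleep=time.sleep):
    """Call connect() until it succeeds.

    connect      -- hypothetical callable that opens the router connection
                    and raises OSError (e.g. ConnectionRefusedError) on failure
    max_attempts -- give up after this many failures; None means retry forever
    sleep        -- injectable so the wait can be skipped in tests

    Returns whatever connect() returns, or None if max_attempts was reached.
    """
    failures = 0
    for delay in backoff_delays():
        try:
            return connect()
        except OSError:
            failures += 1
            if max_attempts is not None and failures >= max_attempts:
                return None
            sleep(delay)  # wait longer after each failed attempt
```

A player could run `reconnect` in a daemon thread (`threading.Thread(target=..., daemon=True)`) so the playback loop is never blocked, and then report its current state to the router once the call returns a connection.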
There is also the issue of notifying the human when something has failed. It's important to know when something has gone wrong, without a doubt. This system is supposed to secure my home - it can't fail silently. Currently, most failures require us to dig through logs to figure out what devices are affected. I think someone was working on an e-mail message event responder - that might be one good way to get the word out. In order to make this robust, there would need to be a separate entity in charge of making sure everything is running properly - one that is not associated with anything that can fail. That probably means that it could not use DCE, or at least not have its messages routed through dcerouter.
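As a rough sketch of what such a separate entity might look like, the Python below probes services with plain TCP connections and hands any failures to a notifier, so nothing passes through DCE. The function names, the target names, and the host/port values in the usage note are all made up for illustration; `notify` can be any callable, for example one that sends an e-mail.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


def run_checks(targets, notify, probe=is_reachable):
    """Probe every target and report the ones that do not answer.

    targets -- dict mapping a label to a (host, port) pair (values are
               assumptions, chosen by whoever configures the watchdog)
    notify  -- callable taking one message string
    probe   -- injectable reachability test, defaults to a TCP connect

    Returns the list of labels that failed.
    """
    failed = [name for name, (host, port) in targets.items()
              if not probe(host, port)]
    for name in failed:
        notify(f"{name} is not responding")
    return failed
```

Run from cron on a machine other than the core, something like `run_checks({"core": ("192.168.80.1", 3450)}, print)` would only flag that a port stopped answering; the address and port here are placeholders, and a connect test says nothing about whether the service behind it is behaving correctly.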
How do those of you who use the security and telecom features approach reliability checks and error reporting? Or do you just "trust it"?