Voice Recognition and Scene Creation

archived · August 15, 2005, 06:15:52 AM

Hi,

I've been using MythTV for 12 months and have recently started playing with X10 products and Mister House. One of the things I love about Mister House is the voice recognition and AT&T natural voices support.

It's quite easy (well, easy for me) to set up and create scenes in Mister House that respond to X10 and/or voice commands.

Does (or will) Pluto have similar facilities?

archived · August 16, 2005, 07:33:24 PM

We've had lots of requests and comments about voice recognition and speech synthesis. We already include a speech synthesis module for Festival. It is a super simple wrapper, probably only 10 lines of code or so, so adding another module for AT&T Natural Voices is probably also a trivial task.

As far as speech recognition goes, the hooks are there, and we added a wrapper for Sphinx, an open-source voice recognition system. The problem was we found it was not very accurate particularly in a room with a microphone a few meters away. We also looked into a solution from Nuance, a commercial product. Nuance worked extremely well, but the licensing fee was also very high (in the millions). So until we are able to push a very large volume, the Nuance solution is not practical.

With Mister House, I assume you're using Microsoft's Windows Speech SDK? We did some experiments with that in-house as well, and found that although it did work better than Sphinx, it was still not in the same league as Nuance.

archived · August 16, 2005, 07:38:22 PM

By the way, as far as speech recognition goes, we flew in a speech recognition expert to work with us for about a week. His whole company deploys speech recognition systems for companies and commercial applications, and does usability studies. After the week or so, we were never able to get more than 90% recognition using the open-source SR software and a sub-$1000 mic solution. So we did put a lot of effort into it.

Please let me know what solution you found that you are satisfied with.

archived · August 19, 2005, 11:30:43 PM

Quote from: "aaron.b"
The problem was we found it was not very accurate particularly in a room with a microphone a few meters away.

VR is more than a technology problem, it's a physics problem. The problems with your "room with a microphone a few meters away" could have been:

* Reflectivity of floors and/or walls -- hardwood floors, for example, bounce sound like crazy.

* Actual distance to the mic element -- sometimes you need to be closer to the mic than you wish; that's the life of someone who wants VR... you need to learn your own setup and deal with it until things improve.

* Quality and installation of the microphone. Expensive doesn't mean good, you need the right kind of element. A PZM-11 is a phenomenal VR microphone but only costs $100 or so. You also need proper wiring and equipment. For example, you might need preamps, mixers, etc. Many people use the Shure SCM-810 or the Gentner AP-800.

I guess my point is that there are people who can get VR to work acceptably for them. It takes some work, money, and most importantly, a lot of honesty that you are not going to have that house that responds to you from across the room while you're playing the radio and asking for it to unlock the front doors.... it's going to work, but not the way it does in the movies.

archived · August 22, 2005, 06:00:34 AM

I don't expect voice recognition to work from across the room. There are too many variables to consider (room acoustics, background noise, etc). I also wouldn't expect to use it as the only means for controlling the house (as it would be useless if you had a loud party).

I've only had a brief play with the voice recognition part of Mister House and used a Logitech USB headphone/microphone. The headset cost me a whole AU$80 (about US$50). The voice recognition worked surprisingly well. But that would still have me tied to a PC, and I'm looking for something that can work throughout the home.

What I think I need is one of the following:

- An intercom with the output of one unit connected to the PC
- A small RF "walkie talkie" with an output connected to a PC
- A Bluetooth headset as used for a mobile phone

The Bluetooth headset might be easy to incorporate into your system since you're already using Bluetooth. The intercom and walkie talkie both have "push to talk", which would help eliminate misreadings from environmental noise when listening to music or watching video. I think the walkie talkie is probably the easiest to implement as it's portable and doesn't require any "installation". The intercom is the least portable of these options, but could be inexpensive to set up. I'd prefer to use one of the other options though, and have the system be able to use it to provide intercom functionality.

Depending on how the products work, it may even be possible to trigger a relay and get the system to drop the volume of any content when you hit the button (which should eliminate problems due to loud music or video).
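A minimal sketch of that "drop the volume while the button is held" idea, in Python. Nothing here is Pluto or Mister House code: the class name, the 0-100 volume scale, and `set_volume` (a stand-in for whatever relay or player call would really be used) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class DuckingController:
    """Lowers playback volume while a push-to-talk button is held.

    Hypothetical helper: set_volume() is a placeholder for the real
    relay or media-player call, and volumes are assumed to be 0-100.
    """
    current_volume: int = 80   # volume the content is playing at
    ducked_volume: int = 15    # volume to use while the user is talking
    normal_volume: int = 80    # remembered level to restore on release
    talking: bool = False

    def set_volume(self, level: int) -> None:
        # Placeholder for the real volume call; clamps to 0-100.
        self.current_volume = max(0, min(100, level))

    def button_pressed(self) -> None:
        # Ignore repeated press events while already talking.
        if not self.talking:
            self.talking = True
            self.normal_volume = self.current_volume
            # Never raise the volume if it is already below the ducked level.
            self.set_volume(min(self.current_volume, self.ducked_volume))

    def button_released(self) -> None:
        if self.talking:
            self.talking = False
            self.set_volume(self.normal_volume)
```

Remembering the level at press time, rather than hard-coding it, means the content comes back at whatever volume it was playing when the button was hit.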

Anybody have any better ideas?

archived · August 22, 2005, 06:51:25 AM

It would be nice to have VR integrated with the Asterisk/PBX system.

archived · August 22, 2005, 03:56:04 PM

As of Bluetooth 1.1, there is a profile that allows you to use a BT headset as a PC headset rather than a mobile phone headset. I'm not sure why you wanted a walkie talkie and an intercom, both of which produce significant noise and would make VR much more difficult (you'd probably have to train the VR engine using those devices... ugly).

The problem you'll have with that is that there isn't a clean "handoff".

Cheaper and infinitely more useful is a cordless phone with a headset (potentially even a BT headset if you don't want the wires). Just put the phone on your belt or in your pocket and walk around. If the phone rings, you can answer it. If you want to do VR, you might hit the "#" key one or two times -- you could probably get Asterisk to recognize this key sequence and execute a VR prompt.

That way you'd never need to worry about open-air mic systems, mixers, amplifiers, phantom power, etc.
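The "#" key sequence idea could be sketched roughly like this in Python. This is only the detection logic, not Asterisk code: in a real setup the PBX would presumably handle it in its own configuration. The class name, the "##" default, and the 1.5 second timeout are assumptions for illustration.

```python
import time
from typing import Callable, Optional


class KeySequenceTrigger:
    """Reports when a DTMF key sequence (e.g. "##") is typed quickly enough."""

    def __init__(self, sequence: str = "##", timeout: float = 1.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.sequence = sequence
        self.timeout = timeout      # max seconds allowed between key presses
        self.clock = clock          # injectable so the logic can be tested
        self._buffer = ""
        self._last_time: Optional[float] = None

    def on_digit(self, digit: str) -> bool:
        """Feed one key press; returns True when the sequence completes."""
        now = self.clock()
        # A long pause means the user is starting over.
        if self._last_time is not None and now - self._last_time > self.timeout:
            self._buffer = ""
        self._last_time = now
        # Keep only as many recent keys as the sequence is long.
        self._buffer = (self._buffer + digit)[-len(self.sequence):]
        if self._buffer == self.sequence:
            self._buffer = ""
            return True
        return False
```

When `on_digit` returns True, the caller would start the VR prompt. The timeout keeps two unrelated "#" presses minutes apart from triggering it by accident.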

There's a VOIP phone that Uniden makes that's extremely small (like a cellular phone) and uses WiFi. That might be a good place to start.

Dan

archived · August 24, 2005, 12:31:04 PM

Hi,

Let me add my 2 cents. Speech recognition is not yet ready to recognize everything you can imagine. But it can be a useful additional option, mainly for incoming phone calls: instead of pressing DTMF keys, just say the name of the person you want to talk to, or trigger some action. (Have you tried to press DTMF keys on a mobile while it's at your ear?)

There are experiments showing that we should wire microphones all over rooms to make recognition reliable enough, but there is another caveat: how would you feel in a room full of microphones? (Is only your VR listening, or some guys from another part of the world too?)

Do you feel comfortable in a room with video cameras? I guess not, and it's the same with mics. If we incorporate speech recognition in Asterisk, that could be a good start. If you want, you can grab a phone or wireless headset and talk to the system; if not, OK.

I've made some preliminary experiments incorporating speech recognition into Asterisk and posted instructions in another thread. Let's test and work on this; it's quite easy to get audio out of Asterisk.

HTH,

regards,

Rob.

archived · August 30, 2005, 06:23:39 AM

Seems to me that most orbiter devices, Pocket PCs, certainly mobile orbiters, etc. would all have microphones on them. Why not pipe the voice command from the orbiter back to the core for processing? Even if you don't do a stream, just a simple push-to-talk interface that automatically records a wave file and sends it as a DCE attachment? Just a thought. The current Orbiter interface is far from friendly. Even being tied to an orbiter may give an advantage over navigating the menu. Even using your voice to navigate the menu might be useful.
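A rough sketch of that push-to-talk idea, in Python using only the standard library. It is not Orbiter or DCE code: `PushToTalkRecorder`, `pcm_to_wav_bytes`, and the `send` callback (standing in for whatever would actually deliver the attachment to the core) are hypothetical names, and 16-bit mono audio at 16 kHz is an assumption.

```python
import io
import wave
from array import array


def pcm_to_wav_bytes(samples, sample_rate=16000):
    """Pack 16-bit mono PCM samples into an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(array("h", samples).tobytes())
    return buf.getvalue()


class PushToTalkRecorder:
    """Collects audio only while the button is held, then sends one WAV."""

    def __init__(self, send, sample_rate=16000):
        self.send = send              # hypothetical: delivers bytes to the core
        self.sample_rate = sample_rate
        self._samples = None          # None means the button is not held

    def press(self):
        self._samples = []

    def feed(self, chunk):
        # Audio arriving while the button is up is simply dropped.
        if self._samples is not None:
            self._samples.extend(chunk)

    def release(self):
        # Only send if something was actually recorded.
        if self._samples:
            self.send(pcm_to_wav_bytes(self._samples, self.sample_rate))
        self._samples = None
```

Sending one small file per button press avoids keeping a stream open, which is the simpler of the two options mentioned above.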

archived · September 10, 2005, 12:15:41 PM

That's a good idea. All our libraries are cross-platform, and you can mix Windows and Linux devices, so it would even be possible to have a Windows device do the speech recognition, since it seems Windows includes a fairly decent speech recognition engine. Regardless of what operating system is used, it would not be a huge challenge to have the orbiters feed their audio to a central device for processing, and the framework already has everything that would be needed. We can add this to the wish list, but it will probably be a while before anybody in-house will be able to get to it, since we're really swamped taking care of new dealers at the moment.

archived · December 24, 2006, 06:54:34 PM

Hi,

What about the possible inclusion of Lumenvox into Plutohome? The licensing fees are very reasonable, and I think the Lumenvox speech recognition engine is quite accurate. Check out http://lumenvox.com for more info. There's a section on the site concerning their open-source version (not free, but quite cheap). Also, try out the Pizza demo... it's pretty cool.

Plutohome rocks by the way!!

Mike C.

archived · December 24, 2006, 08:43:59 PM

Mike,

Lumenvox and Neospeech were very impressive. We've been considering offering a low-cost version of Pluto that includes some 3rd party licensed modules (like AMG/Gracenote audio fingerprinting, css decryption, etc.) that we can't include in the free version. We also have some commercial clients that would be interested. These would make great additions. Do you know anybody at Lumenvox and Neospeech that you could set me up with?

Aaron

archived · December 24, 2006, 11:38:54 PM

Quote from: "aaron.b"
Mike,

Lumenvox and Neospeech were very impressive. We've been considering offering a low-cost version of Pluto that includes some 3rd party licensed modules (like AMG/Gracenote audio fingerprinting, css decryption, etc.) that we can't include in the free version. We also have some commercial clients that would be interested. These would make great additions. Do you know anybody at Lumenvox and Neospeech that you could set me up with?

Aaron

Hi,

I don't know anyone, but I do know that Asterisk 1.4 includes such a feature. I think this could be used for all kinds of recognition (you just make an invisible phone call to a certain number and recognition is performed on the Core). It would be fun to also add a Simple phone to the WinXP Orbiters to start with.

I think that restricted dialogues can be made accurate enough for normal use (maybe some kind of extended IVRs for a start). Remember, a lot of research shows that a moderate vocabulary is enough for day-to-day usage of speech recognition.
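To illustrate the restricted-vocabulary point, here is a small Python sketch that maps whatever text a recognizer returns onto a short list of known commands and rejects everything else. The command phrases, scene names, and the 0.75 cutoff are made up for the example; only `difflib.get_close_matches` is a real standard-library call.

```python
import difflib

# Hypothetical vocabulary: spoken phrase -> scene to run.
COMMANDS = {
    "lights on": "scene_lights_on",
    "lights off": "scene_lights_off",
    "watch tv": "scene_watch_tv",
    "good night": "scene_good_night",
}


def match_command(heard, commands=COMMANDS, cutoff=0.75):
    """Return the scene for the closest known phrase, or None if nothing is close."""
    # Normalize case and whitespace before comparing.
    normalized = " ".join(heard.lower().split())
    best = difflib.get_close_matches(normalized, list(commands), n=1, cutoff=cutoff)
    return commands[best[0]] if best else None
```

With only a handful of phrases, a slightly misheard result can still land on the right command, while anything outside the vocabulary is ignored instead of guessed at.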

HTH,

Rob.

archived · December 24, 2006, 11:54:03 PM

Hi Aaron,

Call 1-877-977-0707 and ask to speak with Gerd. I am sorry but I don't have his last name. I've spoken with him before and he's very nice, and I think he would be very interested in working with you. If you like, I can try to get in contact with him this week to see if he can set up a conference call with you (I know you are busy), or you can call him at Lumenvox. Either way, please keep us on the forum posted as to your progress! I can hear Allison welcoming me home after a hard day at the office... "Good evening Mike, what would you like to watch?"

Cool stuff

archived · December 27, 2006, 06:36:44 PM

Mike Chapman contacted me today and suggested that I post my contact details, in case you would like to get in touch with me to talk about the Asterisk/LumenVox Speech Engine integration, which might be a good solution for PlutoHome.

For more detailed information, you can visit this link:

http://www.lumenvox.com/partners/integrator/digium/asterisk.aspx

Please let me know if you have any questions.

Gerd Graumann
Director of Business Development
P: 877-977-0707, just say "Gerd"
F: 858-707-7072
Gerd@LumenVox.com
www.lumenvox.com
