This was tried by Pluto, as early as 2005, with the earliest versions of the Version 2 codebase.
CMU Sphinx II was used as the test engine, and if you look in the source tree, you'll find code related to this effort.
I have experience with this type of work, so what I'm going to tell you won't be easy to swallow...
Short answer: it works maybe 30% of the time, and maybe 60% of the time with a highly optimized setup.
Making it work better would require a microphone close to the speaker, such as a headset...
HOWEVER,
given the large amount of codec processing that happens when using something like a Bluetooth headset, the resulting waveform will not be accurately matched by the hidden Markov models (HMMs) currently shipped with Sphinx. A new acoustic model would have to be trained (difficult), alongside a domain-specific corpus (easy compared to the former, though user interface issues must be considered).
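The corpus side has to start with the vocabulary the recognizer must cover. Here's a minimal sketch of that first step; the sentences, and the `build_vocab` helper, are invented placeholders for illustration, not anything from the Pluto tree. A real setup would feed these counts into a language-model toolkit to build an n-gram model.

```python
from collections import Counter

# Hypothetical domain sentences; a real corpus would come from actual
# user commands for the system being voice-controlled.
domain_sentences = [
    "turn on the kitchen lights",
    "turn off the kitchen lights",
    "what is the temperature in the living room",
    "play music in the bedroom",
]

def build_vocab(sentences):
    """Count word frequencies across the corpus. The resulting closed
    vocabulary defines what the acoustic and language models must cover."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts

vocab = build_vocab(domain_sentences)
print(sorted(vocab))  # the closed vocabulary for this (toy) domain
```

A small closed vocabulary like this is exactly why the domain-specific corpus is the easy half of the job: the hard half is retraining the acoustic model to match the codec-mangled audio.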
This is possible. But it's going to take some adventurous hackers to do it. Come on guys, which of you will take the challenge?
-Thom