
question from newbie - speech recognition in call-center

Posted: Thu Oct 15, 2009 12:45 pm
by johnyjj2
Hello :-)!

I need to create an application that behaves as follows:
1. The user calls a special number.
2. The user talks to the server, giving some numbers and additional information: only digits plus about five control commands. The language I'd like to use is not very popular, so it may be difficult to find a proper acoustic model for it; I can train an acoustic model myself, or at least try using an English-language model. I want the user to say twelve digits, and the server to recognize those digits, calculate a checksum, and answer the user with "code is proper" or "code is improper". So it should be a call center that uses speech recognition instead of humans.
3. The server saves some text data to its disk, based on the speech recognition and the communication with the user.
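To make the checksum step in item 2 concrete, here is a minimal sketch in Python. The thread does not say which checksum scheme is used, so the Luhn algorithm stands in purely as an assumption; the function names `luhn_valid` and `verdict` are hypothetical.

```python
def luhn_valid(code: str) -> bool:
    """Return True if `code` is exactly 12 ASCII digits and passes the Luhn check.

    Luhn is only a stand-in here; swap in the real checksum rule.
    """
    if len(code) != 12 or any(ch not in "0123456789" for ch in code):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, ch in enumerate(reversed(code)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def verdict(code: str) -> str:
    """Map the check result to the phrase the server should answer with."""
    return "code is proper" if luhn_valid(code) else "code is improper"
```

The recognized (or keyed-in) digits would be passed to `verdict` as one string, and the returned phrase played back to the caller.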

Can it be done with the use of ViciDial?

Thanks very much for your help in advance!
Greetings :-)!

Posted: Sat Oct 17, 2009 7:08 pm
by mflorell
What language do you need?

We would usually do something like this in a custom AGI script.
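For readers who have not seen an AGI script before, here is a rough sketch in Python of what one might look like. Asterisk sends a block of `agi_*: value` lines on the script's stdin, then the script writes commands to stdout and reads a `200 result=...` reply for each. The prompt name `enter-code` is a hypothetical sound file, and the helper names are made up for illustration.

```python
import re
import sys

# Successful AGI replies look like "200 result=<value>", optionally with extra text.
RESULT_RE = re.compile(r"^200 result=(\S*)")


def read_agi_env(stdin):
    """Read the 'agi_*: value' header block Asterisk sends at startup."""
    env = {}
    while True:
        line = stdin.readline().strip()
        if not line:  # a blank line (or EOF) ends the header block
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def agi_command(command, stdin, stdout):
    """Send one AGI command and return the text after 'result=', or None."""
    stdout.write(command + "\n")
    stdout.flush()
    reply = stdin.readline().strip()
    match = RESULT_RE.match(reply)
    return match.group(1) if match else None


def collect_code(stdin=sys.stdin, stdout=sys.stdout):
    """Answer the call and collect up to twelve keypad digits."""
    read_agi_env(stdin)
    agi_command("ANSWER", stdin, stdout)
    # GET DATA <sound file> <timeout in ms> <max digits>
    return agi_command("GET DATA enter-code 10000 12", stdin, stdout)
```

When run from the dialplan, the script would call `collect_code()` with the default streams. Passing the streams as parameters keeps the logic testable without a running Asterisk.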

Posted: Sun Oct 18, 2009 4:14 am
by johnyjj2
Thanks for your answer :-)!

It is Polish.

Greetings:)!

Posted: Mon Oct 19, 2009 2:57 pm
by mflorell
I think that you will first need to find a speech recognition app that will take your model. I don't have much experience with this, except for Sphinx, which is free but extremely CPU-hungry.

Posted: Mon Oct 19, 2009 9:45 pm
by williamconley
Your first version should attempt it without speech rec if possible. We've looked into speech rec and found it to be intense, but doable. Is there a need to avoid having clients enter this on the keypad? Are the entries uniformly shaped (numbers and characters always in the same place)? If so, you could have the user enter the first XX digits as numbers, and press a key a specific number of times for letters (5 = J, 55 = K, 555 = L). If there are a limited number of "letters" and they are in predictable places, this would work. Then the rest of the script becomes much simpler, and in fact, "doable". Then later add voice recognition.
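The keypad scheme above (5 = J, 55 = K, 555 = L) can be sketched in a few lines of Python. This assumes the standard phone keypad letter layout and that each letter's presses arrive as a separate string; how the end of a run is detected (a pause, a terminator key) is left out. `decode_multitap` is a hypothetical name.

```python
# Standard letter layout on a phone keypad.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}


def decode_multitap(presses: str) -> str:
    """Decode one letter from a run of identical key presses, e.g. '55' -> 'K'."""
    if not presses or len(set(presses)) != 1:
        raise ValueError("expected one or more presses of the same key")
    letters = KEYPAD.get(presses[0])
    if letters is None or len(presses) > len(letters):
        raise ValueError("no letter for this key/press count")
    # One press selects the first letter, two the second, and so on.
    return letters[len(presses) - 1]
```

Rejecting over-long runs instead of wrapping around is a design choice here: on a phone line it is safer to ask the caller again than to guess.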

That being said, if the recognition is only for the alphabet, then it could be done. Sphinx is the place to begin.

Posted: Tue Oct 20, 2009 12:05 am
by johnyjj2
Thanks for your answers :-)!

If not Sphinx, then what? Julius, HTK?

Yes, I can try using the keypad, but the most difficult thing for me is establishing the connection between the mobile phone and the server :-P.

What kind of device do I need on the server side? I thought about a SIP trunk or a normal analog phone line using a Sangoma or Digium card.

How would I connect (in the case of speech) from a mobile phone? By calling a special number or by opening a special application on the mobile?

Greetings :-)!

Posted: Tue Oct 20, 2009 1:02 am
by williamconley
Not even remotely difficult to establish the connection. It's just a phone call and a dial plan entry. Most of it could be handled quite easily with a survey within ViciDial.

A SIP trunk is a good place to start (cheapest test version). DTMF from a cell phone to ViciDial is quite reliable, more so than voice recognition. Use the ulaw or alaw codec to begin with.

It could be done with limewire modifications or with other survey applications. We've written our own survey applications as well. You play the greeting, then ask for the information you require (in pieces if you can, so that no single entry is too long, especially if text is involved), then run your "verification" algorithm, give the client the response, and either loop or terminate.
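The flow just described (greet, ask in pieces, verify, respond, loop or terminate) could be sketched like this in Python. The callbacks `ask`, `say` and `verify` are hypothetical stand-ins for whatever plays prompts, collects digits and checks the code; the split into three pieces of four digits and the limit of three attempts are assumptions.

```python
def run_survey(ask, say, verify, piece_lengths=(4, 4, 4), max_attempts=3):
    """Greet the caller, collect the code in pieces, verify it, then loop or stop.

    ask(n)       -> string of n digits entered by the caller
    say(text)    -> plays or speaks a message to the caller
    verify(code) -> True if the assembled code passes the check
    """
    say("greeting")
    for _ in range(max_attempts):
        # Ask in short pieces so no single entry is too long.
        code = "".join(ask(n) for n in piece_lengths)
        if verify(code):
            say("code is proper")
            return code
        say("code is improper")  # loop and let the caller try again
    return None  # terminate after too many failed attempts
```

Keeping the telephony behind callbacks means the same loop works whether the digits come from DTMF now or from speech recognition later.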

Trust me, though, you want to do the first version without voice just to get it running. If you can. Is there a "pattern" to the digits you are verifying? (text in specific places always?)