Ahhhhh okay. So here is my answer to that: there isn't any Speech to Text functionality within ViCiDial. However, that's not to say you can't incorporate that functionality into ViCiDial. I have worked on a project that is very similar to what you're describing here, and ViCiDial "can" do it. Technically speaking, though, you don't need ViCiDial in this particular case; you can use Asterisk on its own for this purpose.
I'm kinda interested in developing this functionality for ViCiDial, because the landscape is changing in telephony, voice bots are now a thing, and I think the project might benefit from at least having the option of a voice-controlled bot. The trouble is: does it really fit ViCiDial, or is it its own thing? I've found that it is its own thing. Some integration could be possible, but it doesn't currently exist.
So, the way it "could" flow:
- You would design a custom AGI script that would be called when your caller connects to your ViCi instance.
- Of course, you'll need a DID for the call to drop into. You can make outbound calls and have them dropped into a DID and thus into your voice bot.
- You'll need a Speech to Text engine to convert the caller's speech. Which service to use is beyond the scope of ViCiDial. For example, you may want to use IBM's Watson, Google's STT or some other service.
- You can transfer the audio to those engines or you can hook into a SIP service and relay the audio that way. Both have advantages and disadvantages.
- And then you'll need another engine to generate the Text to Speech so that the audio can be played back to the caller. Again, another service is needed here. Thankfully there are more TTS services available than STT ones, and there's also an effort within the Asterisk project which is still in active development but not released (https://wiki.asterisk.org/wiki/pages/vi ... d=45482453)
- Doing all this will be quite resource intensive, so you'll need to scope out your server capacity and network bandwidth.
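To make the flow above a bit more concrete, here is a rough Python sketch of what such an AGI script might look like. It is not working ViCiDial code, just an illustration under assumptions: the names `run_turn`, `transcribe` and `synthesize` are made up for this example, the two engine functions are placeholders you would wire to whichever STT/TTS service you choose, and the `/tmp` paths are arbitrary. The only Asterisk-specific parts are the standard AGI conventions: header lines on stdin ended by a blank line, commands such as `ANSWER`, `RECORD FILE`, `STREAM FILE` and `HANGUP` written to stdout, and `200 result=N` replies.

```python
#!/usr/bin/env python3
"""Sketch of one voice-bot turn over AGI: record, STT, TTS, play back."""
import re
import sys


def read_agi_env(stream):
    """Read the 'agi_*: value' header lines Asterisk sends when the script starts."""
    env = {}
    while True:
        line = stream.readline().strip()
        if not line:  # a blank line (or EOF) ends the header block
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def parse_agi_result(response):
    """Extract N from a '200 result=N ...' reply; raise on anything else."""
    match = re.match(r"200 result=(-?\d+)", response.strip())
    if match is None:
        raise RuntimeError(f"unexpected AGI response: {response!r}")
    return int(match.group(1))


def agi_command(command, inp, out):
    """Send one AGI command and return its numeric result."""
    out.write(command + "\n")
    out.flush()
    return parse_agi_result(inp.readline())


def transcribe(wav_path):
    """HYPOTHETICAL placeholder: send the recording to your STT service."""
    raise NotImplementedError("plug in your STT service here")


def synthesize(text, out_basename):
    """HYPOTHETICAL placeholder: write TTS audio and return its path without extension."""
    raise NotImplementedError("plug in your TTS service here")


def run_turn(inp, out, stt, tts):
    """Answer the call, record the caller, and play back a spoken reply."""
    env = read_agi_env(inp)
    agi_command("ANSWER", inp, out)

    # Arbitrary location for the recording, keyed on the call's unique id.
    basename = "/tmp/caller-" + env.get("agi_uniqueid", "unknown")

    # Record up to 10 s, stop on '#' or after 2 s of silence.
    agi_command(f'RECORD FILE {basename} wav "#" 10000 s=2', inp, out)

    text = stt(basename + ".wav")
    reply = tts(f"You said: {text}", basename + "-reply")

    # STREAM FILE takes the file name without its extension.
    agi_command(f'STREAM FILE {reply} ""', inp, out)
    agi_command("HANGUP", inp, out)
    return text
```

To try it for real you would add a final line calling `run_turn(sys.stdin, sys.stdout, transcribe, synthesize)` once the two placeholders are filled in, and point the DID at a dialplan extension that runs the script through the `AGI()` application. A real bot would loop over several turns rather than hang up after one, and would probably stream audio to the STT engine instead of recording to a file first.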
If you were looking for simple DTMF inputs from callers, ViCiDial would be totally fine with that. If you want to use your callers' speech, well, it gets tricky.
I did have a project on my dev kanban here to do a voice-controlled AGI environment. However, it's still in the planning stages and there's no code written as of right now. I've had feedback from the Asterisk guys and they don't really see a need for my particular voice-controlled bot, but nonetheless I will at least realise it as an exercise in building such bots...for personal growth.
I hope this helps! Thank you.