You need to add Microsoft to that, as they have a commercial AVR system as well as the on-the-fly voice-to-voice translation in Skype that they demoed: https://www.youtube.com/watch?v=rek3jjbYRLo
After all, if they can parse inbound voice, translate it, and respeak it, then outputting text instead of speech is trivial.
And yes, this requires a lot of processing. Basically, Google, Microsoft, and perhaps Amazon have the horsepower to do this.
Don't bet against them unless you already have a brilliant research insight that is groundbreaking at the academic level.
Good point, Greg! There are domain-specific tools aplenty.