Combining the Llama 2 70B model running on Groq with the Whisper speech model gets you near-zero-latency performance.
If this speed could be reached with GPT-4, or a later GPT-5, there would be a lot of room for imagination: you could write a book in almost no time, and real-time AI voice calls would be no problem!
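As a rough sanity check on that claim, here is some back-of-the-envelope arithmetic. The throughput number comes from the speeds reported below; the 100k-token book length is my own assumption.

```python
# Back-of-the-envelope: how long would a book take at Groq-like throughput?
tokens_per_second = 500    # reported Mixtral 8x7B speed on Groq (see below)
book_tokens = 100_000      # rough length of a full-length book (assumption)

print(book_tokens / tokens_per_second)  # → 200.0 (seconds, i.e. ~3.3 minutes)
```

So "a book in seconds" is a slight exaggeration, but a few minutes for a full manuscript is still remarkable.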
This thing is really powerful, and seriously fast, haha.
Output speed is close to 500 tokens/s on Mixtral 8x7B, and about 750 tokens/s on Llama 2 7B.
Nothing else comes close to it in speed, though the accuracy is not very good…
You can try it yourself: http://groq.com
Groq also provides an API, so you can run your own tests: http://wow.groq.com
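For a quick test, Groq exposes an OpenAI-compatible chat endpoint. Here is a minimal sketch assuming that endpoint and the `mixtral-8x7b-32768` model name (both may change; check Groq's docs), with your key in the `GROQ_API_KEY` environment variable:

```python
# Minimal sketch: one chat-completion request against Groq's
# OpenAI-compatible API. Endpoint and model name are assumptions.
import json
import os
import urllib.request

API_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(prompt, model="mixtral-8x7b-32768"):
    """Build the JSON payload for a single-turn chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt):
    """Send the prompt and return the model's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the API is OpenAI-compatible, the official `openai` Python client also works if you point its `base_url` at Groq.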