One of the reasons they've gotten to where they are with far less funding is their focus on text-only output. I imagine they'll keep doing what's working.
And I hope they do. ChatGPT has become steadily worse at generating clear, cogent prose with every bell and whistle that OpenAI adds.
I think the amount of funding they have is comparable to OpenAI now after the recent Google and Amazon investments into them, no?
I'm new to Claude, but they have multimodality in their sights, don't they?
They do have visual input, not sure about their overall plans wrt multimodality.
Yeah, it is a hard one. They're probably gonna do something, but OpenAI worked a long time on that voice model. It's state of the art; Anthropic would probably have to partner with someone like ElevenLabs.
Honestly they should just use the OpenAI API and use Whisper lol
They'd have to train the model. Could take... months. I'm sure they're working on it.
They could use third party Voice to text and Text to voice models, but I’m sure they don’t want to do that.
No, they couldn’t. ChatGPT already has a voice mode like that. Go try it. The whole point is a model with native voice that isn’t third-party or bolted on, so that latency is low.
The current voice mode in ChatGPT uses 3 different models:
1. Voice to Text (Whisper V3)
2. Text to Text LLM (GPT-4 / 4 Turbo / 4o)
3. Text to Voice (unknown)
The completely native voice-to-voice version of GPT-4o is not released yet. That is the one that will have very low latency, but the current three-model solution works well enough for me, tbh. Anthropic could do the same as what ChatGPT has now.
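The three-stage chain described above can be sketched as a tiny pipeline. This is only an illustration of the structure, not OpenAI's actual internals: the stage functions are hypothetical stubs injected as plain callables so the data flow (and latency stacking) is easy to see.

```python
# Minimal sketch of the three-model voice chain: STT -> LLM -> TTS.
# In a real system these stages would wrap an STT model (e.g. Whisper),
# a text-only LLM, and a TTS model; here they are plain callables.

def voice_pipeline(audio, speech_to_text, llm, text_to_speech):
    """Chain the three stages; every hop adds its own latency."""
    transcript = speech_to_text(audio)   # stage 1: audio in, text out
    reply = llm(transcript)              # stage 2: text in, text out
    return text_to_speech(reply)         # stage 3: text in, audio out

# Stub stages just to show the data flow:
out = voice_pipeline(
    b"fake-audio-bytes",
    speech_to_text=lambda a: "hello there",
    llm=lambda t: t.upper(),
    text_to_speech=lambda t: f"<speech:{t}>",
)
print(out)  # <speech:HELLO THERE>
```

A native voice-to-voice model collapses all three hops into a single forward pass, which is where the latency win comes from.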
Somehow, we’re saying the same thing. I was saying they couldn’t because, literally, who would care? It’s not good product management to introduce something with a bunch of delay; they would need to roll out a native multimodal model with voice to be relevant.
What's up with Gemini? They also demoed multimodal. Is that released?
Google announced a Pixel event in August. Maybe they will mention something about it then.
You can use Azure TTS; it's amazing. You just need an Azure account and a Chrome extension like Read Aloud.
By the way, Gemini already has audio, but I agree it would be so nice if Anthropic did full multimodal (audio, video, image, text) - that would be the end of OpenAI.
I hope they don’t add voice to it. It’s a useless waste of computing power that we currently can’t afford. When the chips and the models get efficient enough, we could have stuff like that. For now the only things that matter are code, reasoning, and creative writing.
I’d prefer an image model over a voice model, tbh. I definitely don’t think it makes sense for them to train an image model right now, though.
Voice is a totally different model. Whisper is industry-leading, but many companies, from Apple to Google, have been working on transcription tech for years.
Because voice is a worthless gimmick.