pleroma.debian.social

Opinion poll:

In your opinion, is speech-to-text "generative AI?"

I have never considered speech-to-text as "generative." I've always thought of it as transitioning information between contexts (aural to written).

In other words, if I speak words and then manually write them down, I generated them at the "speaking" part, not at the "writing them down" part.

I've had folks disagree as of late, though, saying that speech-to-text is "generative" in that it literally generates content where content did not exist.

This is a nuance I hadn't considered.

@veronica with a free software model trained on free data, free. Otherwise nonfree.

The reason I'm thinking about this distinction is captions.

I am a firm believer in captions, and I've gone to great lengths to manually write out captions for every video.

Speech-to-text has helped immensely with this task over the years. If I've done non-scripted content, using a speech-to-text program has saved me countless hours of retyping.

But much more importantly, I know the technology has helped folks who are deaf or hard of hearing. Live captioning using speech-to-text is a great tool, particularly when it's locally hosted.

I hadn't considered that, to some folks, that usage would count as "generative AI."

@veronica I think it depends a lot on the model. Ex: Whisper absolutely is "generative AI" because it's using an LLM to do the translation and will occasionally hallucinate.

I don't think things like Dragon or Talon fall in the same bucket.

@zrail @veronica I think the same

@dolanor @zrail I think I'm struggling to find a hard line between these two things (Whisper and something like Dragon).

@veronica I think it's contextual. Purely transcripts? Not generative. But translation starts crossing the line into generative, just because there aren't always word-for-word equivalents.

Somewhat similarly, I don't have a problem with something reading out a news article, but I do take issue with using AI for voice acting as it's more than just a change in medium. If that makes any sense.

@blakeshall I think that's a lot of it for me, personally! Yes, the context matters. Replacing a voice actor is one thing. Helping someone speak after a loss of voice is another.

Now, if the speech-to-text adds context, like changing the following text:

"I used grep to find these files"

into this:

"Veronica said she used grep to find these files"

I think that part is generative, if that makes sense? Am I making sense?

I'm slowly devolving back into my earliest form, "imagined scenarios for academic purposes"

@veronica I think this gets muddy given that people often use LLMs to do this these days (whereas previously they may not have).

Side note: Steno is such a neato art form. I wish I had the time to get into it.

@jessebot I'm with you, but a stenographer isn't going to follow a person who is deaf or hard of hearing around to do live captioning, and I think that's where the nuance really deepens for me.

@veronica

It's got the problem that the training data is closed, which makes it hard to call it "free software."

It's also still somewhat error-prone, but it seems like it's the conversational LLMs that cause the most social problems.