this post was submitted on 10 Jan 2024

Technology

Grimy@lemmy.world 0 points 2 years ago

Using publicly available data to train isn't stealing.

Daily reminder that the ones pushing this narrative are literally corporations like OpenAI. If you can't use copyrighted material freely for training, it drives up the cost so much that only a handful of companies can afford the data.

They want to kill the open-source scene and are manipulating you to do so. Don't build their moat for them.

givesomefucks@lemmy.world 1 point 2 years ago (edited)

And using publicly available data to train gets you a shitty chatbot...

Hell, even using copyrighted data to train isn't that great.

Like, what do you even think they're doing here for your conspiracy?

You think OpenAI is saying they should pay for the data? They're trying to use it for free.

Was this a meta joke and you had a chatbot write your comment?

webghost0101@sopuli.xyz 0 points 2 years ago (edited)

The point being made was that publicly available data includes a huge amount of copyrighted data to begin with, and it's pretty much impossible to filter it out. A prime example: the Eiffel Tower in Paris is not protected by copyright, but its light display is, so you can only use pictures of the Eiffel Tower taken during the day, and only if the photo itself isn't protected by the original photographer's copyright. Copyright law has so many complex caveats and exceptions that it's impossible to tell at a glance whether something is protected.

This in turn means that if AI cannot legally train on copyrighted material it finds online without paying huge sums of money, then effectively only megacorporations that can absorb copyright fines as a cost of doing business will be able to afford to train decent AI.

The only other option for producing this kind of AI is a very narrow, curated set of known materials with public-use licenses, but that alone is not going to get you anything competent.

EDIT: In case it isn't clear, I am clarifying what I understood from Grimy@lemmy.world's comment, not adding to it.

RainfallSonata@lemmy.world 1 point 2 years ago

I didn't want any of this shit. IDGAF if we don't have AI. I'm still not sure the internet actually improved anything, let alone what the benefits of AI are supposed to be.

RememberTheApollo@lemmy.world 1 point 2 years ago

It doesn’t matter what you want. What matters is if corporations can extract $ from you, gain an efficiency, or cut their workforce using it.

That’s what the drive for AI is all about.

myslsl@lemmy.world 1 point 2 years ago

Machine learning techniques are often thought of as fancy function approximation tools (i.e. for regression and classification problems). They take a set of input values and spit out a prediction, which may be discrete (classification) or continuous (regression).
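
To make the function-approximation framing concrete, here is a minimal Python sketch (an illustration, not something from the thread) of the simplest regression model: fitting a line y ≈ w·x + b to example points by ordinary least squares, then using the fitted line to predict unseen inputs. The names `fit_line` and `predict` and the toy data are hypothetical.

```python
# Minimal sketch: "learning" a function from examples by least-squares
# fitting a line y ≈ w*x + b (the simplest regression model).

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (w, b) minimising the sum of squared errors (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept passes through the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

def predict(w: float, b: float, x: float) -> float:
    """Apply the learned approximation to a new input."""
    return w * x + b

if __name__ == "__main__":
    # Hypothetical toy data generated from y = 2x + 1.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    w, b = fit_line(xs, ys)
    print(w, b)               # 2.0 1.0
    print(predict(w, b, 10))  # 21.0
```

Real models replace the line with something far more flexible (trees, neural networks), but the idea is the same: choose parameters so the function matches known examples, then evaluate it on new inputs.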

One use case: there are a lot of really hard and important problems in CS that we can't solve exactly in an efficient way (look up TSP, SOP, SAT and so on), but that we can solve using heuristics or approximations in reasonable time. Often the accuracy of the heuristic even determines the efficiency of our solution.
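
As a hedged illustration of "heuristic vs. exact", the sketch below compares the classic nearest-neighbour heuristic for TSP (greedily visit the closest unvisited city) with brute-force search over every tour. The heuristic runs in O(n²) time but is not guaranteed to find the shortest tour; the brute-force search is exact but O(n!), so it only works for a handful of cities. Function names and coordinates are hypothetical.

```python
import itertools
import math

Point = tuple[float, float]

def tour_length(points: list[Point], order: list[int]) -> float:
    """Total length of a closed tour visiting points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

def nearest_neighbour_tour(points: list[Point], start: int = 0) -> list[int]:
    """Greedy heuristic: always move to the closest unvisited point. O(n^2)."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        here = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(here, points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def brute_force_tour(points: list[Point]) -> list[int]:
    """Exact answer by trying every tour that starts at point 0. O(n!)."""
    rest = range(1, len(points))
    best = min(itertools.permutations(rest),
               key=lambda p: tour_length(points, [0, *p]))
    return [0, *best]

if __name__ == "__main__":
    cities = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (0.0, 4.0)]
    greedy = nearest_neighbour_tour(cities)
    exact = brute_force_tour(cities)
    print(greedy, tour_length(cities, greedy))
    print(exact, tour_length(cities, exact))
```

On this rectangle both approaches find a tour of length 14.0; on larger or less regular inputs the greedy tour can be noticeably longer than the optimum, which is the accuracy/efficiency trade-off described above.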

Additionally, sometimes we want predictions for other reasons. For example, we might want software that predicts user preferences, home values, the safety of an engineering plan, the likelihood that a person has cancer, or the likelihood that an object in a video frame is a human.
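
For the "likelihood" style of prediction, a common minimal model is logistic regression, which passes a linear score through the sigmoid function to get a probability between 0 and 1. The sketch below trains one with plain batch gradient descent on made-up data; the single standardized feature, the labels, the learning rate and the epoch count are all assumptions for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. the score
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict_proba(w: float, b: float, x: float) -> float:
    """Estimated probability that the label is 1 for input x."""
    return sigmoid(w * x + b)

if __name__ == "__main__":
    # Hypothetical data: one standardized feature, label 1 = "positive case".
    xs = [-4.0, -3.0, -2.0, -1.0, 1.0, 2.0, 3.0, 4.0]
    ys = [0, 0, 0, 0, 1, 1, 1, 1]
    w, b = train_logistic(xs, ys)
    for x in (-3.0, 0.5, 3.0):
        print(x, round(predict_proba(w, b, x), 3))
```

The output is a probability rather than a hard yes/no, which is what makes this kind of model suitable for "likelihood that a person has cancer"-style questions; a threshold (e.g. 0.5) can turn it into a classification when needed.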

These tools have legitimate and important use cases; it's just that a lot of the current hype is centered on the dumbest possible uses and a bunch of idiots trying to make money regardless of the ethical concerns or consequences.
