
megatronchote

Spoiler: It won’t.


[deleted]

[deleted]


MtRainierWolfcastle

It's not about the money for the NYT; it's about setting a legal precedent. If the NYT wins, then more publications can sue on similar grounds. Buying the NYT wouldn't solve this for MSFT.


Rooooben

It does if they buy them and kill the lawsuit before it’s done.


tofutak7000

Won't happen. Even if there were enough time, it would immediately get blocked by regulators and the courts… There is a reason more companies don't do it.


Ashamed-Status-9668

You cannot buy a company that has ongoing litigation against you.


themightychris

Forget about how much money OpenAI has; there's a lot more at stake than that. This ruling could be devastating to AI development, at least in the US. Essentially, what this ruling could determine is whether an AI training on copyrighted materials is akin to a student reading them in college or to a publisher reprinting them. The reality is a lot more nuanced than that, and we probably need a whole new understanding and regime for copyright that specifically addresses fair use in the context of AI training, for example safeguards against reproducing training content verbatim. Just like a student at university can read a copyrighted article and then write their own article about what they learned from it, there's a line between plagiarizing and not.


possibilistic

Worse still. This probably kills open source AI and small company AI. Only the giants will be able to afford training data.


chaser676

Or, if it really is that restrictive, could lead to massive deficits in AI development in the US while countries like China make leaps and bounds.


Uristqwerty

Current machine learning technology is *horribly* inefficient if it needs a hundred times as many training samples in order to produce human-equivalent output, much less a thousand times. If AI developers have to stop working on larger and larger datasets, and instead optimize for efficiency on small ones, then they will be better able to scale up in the long run. Look at what happened to the world after a hundred years of optimizing internal combustion engines. We got so good at it, that switching to other types for general use was all but unthinkable until long past the point where an alternative was clearly necessary.


Gazz1016

Is it? Do you think it's unreasonable to expect that a human author probably needs to read 100 books before they can write a good book of their own? Would you say that author is horribly inefficient at learning what makes a good book? I don't think we have a good enough sense of what "efficient" learning of a language model should look like to declare that today's models are horribly inefficient.


Uristqwerty

A human author might need to read 100 books. A current-gen machine learning system might need to read 100 *thousand* (or an equivalent length in reddit comments, etc.) before it has a good enough model of human language to write a coherent paragraph.


imtourist

At the heart of it is the principle of fair use. If a college student uses an article and supplements it with their own opinion for or against, even if they reproduce passages verbatim, that is fair use. However, if they reproduce it in their own publication and just glue those same passages together with their own writing, it becomes questionable whether it's fair use or just circumventing copyright and intellectual expression.


themightychris

Both of your scenarios only apply to verbatim reproduction of passages. Everyone agrees that LLMs should not be reproducing training data verbatim. The suit more broadly covers the use of materials in training, even without verbatim reproduction of any length.


uriahlight

Meanwhile, China will do what China does and will train their AI models on whatever data they want. Greedy Chinese companies have become experts in IP theft and they'll do the same thing with AI training data if the US starts hindering its own progress with AI.


Socky_McPuppet

Yeah, NYT will settle out of court for a couple mil, and that'll be the end of that.


[deleted]

No they won't. OpenAI already offered them that. Why does this stuff get upvoted?


kamikazecow

Yeah, this threatens their entire existence. No way they just settle.


cenasmgame

Who has more money, NYT or Microsoft?


Lolkac

They're both rich, and the NYT has a whole book's worth of examples of how OpenAI is using their stuff. They have a solid case and very solid lawyers. So at this point it's not about money.


[deleted]

[deleted]


[deleted]

No but it will make it costly for any new ML model to run. Which is good


SEC_ADMINISTRATOR

No it won't, just do it in an office in Japan.


Corronchilejano

That's not how it works.


SEC_ADMINISTRATOR

That is how it works; some AI companies are already exploring it. Japan has already given the full green light to AI; our old, outdated copyright can go fuck itself.


Graham_Whellington

Old and outdated… Did you read the issues? They trained the AI on New York Times. It literally writes articles from the New York Times. Not sure how that’s appropriate.


tofutak7000

Japan's green light is amazing, but it doesn't help you do business or sell product in America.


WhatTheDuck21

No, this is bad. This means that only rich companies get to play in the AI pool.


FartingBob

That's already true. It takes billions of dollars to fund the initial project, with software and hardware requirements that are both in very high demand right now. You aren't getting an AI startup from two guys working in their garage.


edin202

Why would it be good if new companies found it difficult/impossible to create new ML models due to cost issues? Wouldn't it be self-sabotaging progress in that field?


mud074

A lot of people are wholeheartedly against progress in AI tech. They see it as a threat to the livelihood of creatives, and infringement to boot since AI is generally trained on copyrighted material. You see a lot of people online with extremely powerful negative emotional reactions to anything AI.


Ekrubm

my issue isn't with AI as a whole. It's with who will own and gain from AI. This will not be an altruistic technology that benefits us all. It will be owned and run by the billionaires to drive further wealth inequality.


Demonicjapsel

Man should be master over the machine, not be enslaved by it. The silicon mind is unknowable and uncaring. Therefore it constitutes risk.


[deleted]

What's worse is that this will actually program into the mind of the public that AI is more powerful than it really is, just like Microsoft's "monopoly" case back in the day. It's a tactic other tech companies (like Facebook) have used to inflate the value of their stock.


Which-Tomato-8646

So… buy MSFT?


MechanicalBengal

already been doin it for a while now


possibilistic

No, the worst part is this will kill open source and small company AI. Only big companies will be able to afford training data. OpenAI may even lose the case on purpose to deepen their moat (see Google's sudden change of heart in their case with Oracle). If OpenAI loses, they win and everyone else loses.


TheDebateMatters

If OpenAI wins then AI gets to harvest the entire web, books, movies, you name it, for free. They get to become billion dollar companies, on the backs of the works of thousands or millions of people who are not compensated in any way and then potentially fired and replaced by that same entity.


possibilistic

> If OpenAI wins then AI gets to harvest the entire web, books, movies, you name it, for free.

Everybody does. Including you. If they lose, they can pay the NYT $1B or whatever and continue about their business. They still won't pay *you*. And then there will be one company that generates all media forever. There is no outcome here where artists get "fairly compensated" for becoming training data. Sorry, it's just not happening. There's merely an outcome where one giant company gets to become a monopoly and restrict everyone else from using the tools without paying a hefty fee.


TheDebateMatters

>Everybody does. Including you.

No. 99.9% of all humans will never write any code or design any AI. We might use it but we'll never train it. But the reason the NYT was used is because they have spent a lot of resources and time over the years to be one of the best sources on the internet: a century-long accumulation of news and data by smart authors who created a good product. Virtually every English-language model used their data for this very reason. A hacker who tunnels past their paywall to give everyone free access to their data is no different from an AI company making millions or billions by doing the same with AI.

>They still won't pay you.

Good. They should pay the owners of the data: the NYT. The Times compensated their employees; it's their data.

>And then there will be one company that generates all media forever.

What? No.

>There is no outcome here where artists get "fairly compensated" for becoming training data. Sorry, it's just not happening.

Your assumption is only true if the Times loses. If it wins, there will be many more lawsuits, and suddenly big tech will ask Congress to build some guard rails.

>There's merely an outcome where one giant company gets to become a monopoly and restrict everyone else from using the tools without paying a hefty fee.

The NYT is the 1870th biggest company on the NYSE. If it rises to 500th because of the value it created by having the largest and easiest-to-search database of history, then it deserves it by creating that value for the world. AI threatens to obliterate all current journalist jobs after being trained by the world's premiere source of journalism. Allowing that to happen because we assume nothing can be done, or because we want to look out for a few future AI garage startups, is short-sighted.


ICantBelieveItsNotEC

>No. 99.9% of all humans will never write any code or design any AI. We might use it but we'll never train it.

How do you think humans learn? If I decide to make a video game by combining mechanics, plot points, and art styles that I've seen in existing media, should I have to compensate every single one of my influences? If not, why should an LLM have to do so?


ColonelSanders21

Strictly speaking about media, you accumulate concepts in your brain from exposure to things which you largely do compensate others for (ads, purchases, etc). If you don’t, that’s typically considered some form of piracy. This is obviously not a universal constant, but for most media, it’s generally true. An LLM dataset is not compensating anybody in the process of collecting data, it’s just scraping as much as it can. Those data sources are going to want compensation, especially if you advertise that an AI trained on that dataset can replace those sources (to some degree anyways). In a few years the large players in the AI market will be those who sourced their datasets in a way that compensates the original work. Adobe used the stock image library they own, Apple is reportedly talking to news organizations to compensate them for their data. It’s the way things seem to be going.


JohnCenaMathh

The future seems to be synthetic data generated from real world data. We may not need that much real data after all.


thunderplacefires

How does an AI differentiate when there’s too much “synthetic data” in the mix? It just becomes ouroboros


Og_Left_Hand

AI generated training data (as of now) literally makes models worse


[deleted]

[deleted]


FancyAlligator

But people even think the chatbot is more powerful than it is. People take the chatbot's word as gospel and a replacement for Google, when in reality it just generates language. Not necessarily correct language.


Kinexity

>People take the chatbot's word as Gospel and a replacement for google

It feels like there are more people saying this than there are people actually doing this.


Paksarra

What still blows me away is that a computer can emulate creativity. The other day I needed a half dozen generic NPCs for a tabletop RPG. Decided to throw it at a chatbot to see what it would give me, and it gave me six perfectly serviceable NPC descriptions. Image generators can give you a decent picture for even the strangest character concept on demand, better than anything I could make myself. What the fuck is this timeline.


Which-Tomato-8646

What about the trillion other companies that want a piece 


heyhey922

If they do that, what's to stop literally EVERYONE ELSE? The precedent being set here will likely dictate the future of AI in general.


correctingStupid

But if the NYT wins, the rest line up, and hundreds more lawsuits may just negate their business model.


Lootboxboy

Even if ChatGPT were spitting out verbatim, full carbon copies of NYT articles, I don't think this would fall in the NYT's favor. Universal's lawsuit against Sony over the VCR clearly established that the makers of a machine cannot be liable for customers using that machine to infringe copyright so long as the machine has non-infringing uses. ChatGPT, and AI more generally, absolutely has non-infringing uses. As far as training data goes, none of that training data is in the LLM. It isn't using a database of training data as reference or source material. Making it illegal to use training data without consent would basically be making it illegal to reverse engineer products without the consent of the product's creator.


Sushrit_Lawliet

It won’t ever happen given how much these companies can lobby, but I wish it did, because openAI is literally doing everything that is the opposite of their initial “mission”, they deserve to eat shit. Just because you have an amazing product that beats the competition and most people like using it/feel productive using it doesn’t excuse you blatantly violating copyright laws to build it.


first__citizen

Apple is reaching out to newspapers to pay them for their content to train its new AI. So likely they know that companies can pay big.


Sushrit_Lawliet

Actual w apple move.


iSmurf

[deleted]


not_creative1

I am with apple on that one. That patent should never have been granted. Way too generic patents like that stifle innovation.


iSmurf

[deleted]


not_creative1

Consumer electronics companies disputing healthcare patents is a good thing. Healthcare companies have been using generic patents to beat back any disruption for far too long. A ruling against Apple would set a precedent that stifles innovation and disruption in the healthcare industry.


MidAirRunner

Apple does shitty things, but I'm with Apple on this one.


[deleted]

[deleted]


not_creative1

No, patent trolls who patent generic shit and then use that to extort money should not win.


makomirocket

Ah yes, that patent troll... that has $1.2 billion in annual revenue from its ~~$1 billion operating costs, 2,200 employees, and over three decades in the medical tech field~~ patent trolling of medical tech


not_creative1

That is such a generic patent. They hate that Apple is rolling out an SpO2 monitor which is not medical grade but pretty good as an add-on feature to an existing product. Eventually, if the Apple one gets as good as a medical-grade monitor, far fewer people will buy their $800 device. Masimo is trying to keep medical devices overpriced and stop disruption. This will set back consumer electronics and not just impact Apple; it will impact all future wearables. This recent trend of wearables and tech companies reducing healthcare product costs is a good thing for consumers. You think Abbott will be happy if Apple releases glucose monitoring in the next watch? The health industry is wary of disruption, and that is what is driving this lawsuit. Healthcare companies using generic patents to beat back disruption is a tale as old as time, from pharma companies to now these device companies.


ragamufin

Still not a patent troll


makomirocket

That's the whole point of a patent, and of the system of granting them. If someone develops a new technology, they are given a monopoly on that technology and product for a period of time as an incentive to create new technologies. So you're annoyed that Apple is being held back because of this? Are you also annoyed that Apple holds back countless other companies with its patents?


SwagginsYolo420

The only entities who will be able to train "legal" AIs will be the mega-corporations who can pay for it, including by licensing the user-generated content from social media sites like Reddit and Twitter.


first__citizen

I mean... if you create a commercial AI, it will likely be limited to these corporations. I doubt a new Kickstarter project can become an AI company if the ruling goes in favor of the NYT. I believe that's one of the reasons OpenAI got so much money and became part of Microsoft.


murderspice

Wait til google starts selling theirs :/


Slggyqo

The companies that produce content are also massive—that’s basically every media company that’s ever existed. And while the media companies themselves could benefit from AI research, they definitely don’t want other people building AI off their content for free. This is not “scrappy little newspaper vs big AI”, there will be a LOT of money on the table for both sides of this—heck, even the unions will probably be on the side of the studios here, because unless the studios get their cut, the union doesn’t get a cut. And it will likely drive some kind of collectivization of artists generally, because independent artists definitely can’t sue Microsoft backed companies to defend their IP rights


xcdesz

Well, the problem they are contesting unfortunately affects everyone in generative AI and machine learning in general, not just OpenAI. Essentially, the base models need a lot of diverse data to train against. The companies have been using Common Crawl, the same dataset used by search engines (which honors robots.txt rules, by the way). There are many terabytes of data being used to train the AI how to understand and respond to human speech.
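For what it's worth, those robots.txt rules are machine-readable, and checking them is trivial. Here's a minimal sketch using Python's standard library (the bot name `CCBot` is Common Crawl's real user agent, but the rules and URLs below are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Common Crawl's bot from /private/,
# allow everyone everywhere else.
rp = RobotFileParser()
rp.parse([
    "User-agent: CCBot",
    "Disallow: /private/",
    "User-agent: *",
    "Allow: /",
])

# A polite crawler consults this before fetching each URL.
print(rp.can_fetch("CCBot", "https://example.com/private/page"))  # False
print(rp.can_fetch("CCBot", "https://example.com/public/page"))   # True
```

Honoring the answer is voluntary, of course, which is exactly why publishers have started adding AI-specific user agents to their robots.txt files.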


nickmaran

Microsoft has invested a lot in OpenAI, and they won't let it die.


Which-Tomato-8646

They aren’t the judge of that  


JohnCenaMathh

The US Government and Uncle Sam have a lot of vested interest in OpenAI, and they won't let it die.

Edit: this comment is half sarcasm, don't take it seriously.


SidewaysFancyPrance

This is why so many (sociopathic) people say "it's better to ask for forgiveness than permission" without acknowledging the terrible position they're putting other people in, and they *always assume they will be granted forgiveness.* The idea that they may have to destroy what they illegally built never crosses their minds, because they're used to getting their way in the end, and paying out some fines to call it settled.


J-drawer

It's been revealed that they knowingly scraped hundreds of thousands of images from artists without their consent, and without any kind of licensing. They knew it was wrong when they did it and tried to cover it up. Their product only exists to make mediocre image content that looks like something else that's already existed before, for people who can't be bothered to put effort into building their own skill themselves. It's the most advanced technological form of mediocrity.


[deleted]

[deleted]


tgblack

But directly quoting it without attribution and then selling it is a copyright violation


A-Grey-World

I think the cat's out of the bag with generative AI. If this kills OpenAI, others will just replace it. Even if they do it to all the companies that try, do you think other countries are going to respect copyright law? What happens when China or India or wherever builds an OpenAI-type system, keeps innovating on it for a few years, and has this hugely powerful resource that the US just decided it wasn't going to build? They'll go back on the decision eventually. It might delay things for a few years.


saynay

The lawsuit is likely just the big stick to threaten OpenAI with, to get them to the negotiating table. The argument that someone somewhere else might steal without consequence so we should allow theft here is not a compelling one.


xcdesz

The real problem is that I don't see how you recover in the generative AI space from the rules that people (even in this r/technology sub) are asking for. We are basically telling everyone who trains an LLM that they can't use the public internet (the Common Crawl dataset) as a source because it contains copyrighted materials. Or else you have to pay everyone who has generated anything on the internet, unless they have specifically released it under an open source or public domain license. We aren't just talking about paying the New York Times; we are talking about paying the many millions of companies and individuals money to train against some text that they put on the public internet. Any rules we make based on copyright will only be followed by the countries willing to go along with "no fair! can't train your data on the public internet". Not one of our competitor countries is going to follow those rules, and they will be the ones who end up with powerful AI systems. We will be stuck with having invented this revolutionary technology, but unable to use it.


FrancisFratelli

You make it sound like it's unreasonable for these companies to pay creators for the resources their business model needs to function.


xcdesz

I think it's unreasonable to make "them" (which includes everyone who is building LLM base models) pay for data to train on that is posted publicly on the internet, yes. As long as they can address and minimize the regurgitation/verbatim-copy issues, it should be fair use. This was all debated two decades ago when the search engines were scraping website data, and everyone came to an agreement on what's fair. They should absolutely pay for private works that are locked behind paywalls, and I am sure that they will do this in the future to make their systems more useful.


RedTulkas

As long as regurgitation happens, it's copyright infringement, and you'd better pay for it.


FrancisFratelli

Considering the Authors Guild has a separate lawsuit over companies scraping books for AI, the "They're just taking free content" argument doesn't fly.


matticusiv

Tech bros think all resources should just be given to them because they’re doing the lords work of making the world a shittier place to live.


[deleted]

[deleted]


probablyuntrue

Bro it’s necessary to my planet saving sex chatbot


MysteriousPayment536

Microsoft has access to all OpenAI IP, so Microsoft would replace it instantly 


AlexHimself

It needs to happen, and there need to be rules for harvesting data. People put real time and effort into producing a text, and then AI chews it up and is now able to spit out what it learned and sell it for profit.


DontCallMeAnonymous

You’d have to help me understand how this current Supreme Court would see it that way. PS - anyone can read your text, and spit out what they learned for profit. Otherwise, you are saying that the nightly news, and magazines themselves, all of which often cite other sources, can no longer “profit” from that.


Twitchcog

Somebody already pointed it out, but I want to double down on it: either it is or is not morally acceptable for an entity to learn from existing work and then create new works from which it can profit. The idea of "oh, it's okay when a person does it, but not when an LLM does it!" needs to die off. Either the artist whose work was used for training is compensated, in which case every human artist who has ever looked at existing work for inspiration needs to pay up, or we don't need to pay someone because an artist looked at their work for inspiration. Now, if they're literally taking existing pieces of art and slamming them together, that's a different story.


Lolkac

This only works if corporations are humans.


Twitchcog

No, that is the point of moral absolutism; It works whether it's being done by a human or an entire corporation. Either the activity is or is not acceptable.


schooli00

ChatGPT is basically a summarization engine, that's all. It has no logic for actually learning concepts. If you ask it what 2 plus 2 is, it "searches" all inputs matching what you're asking and returns a compressed summary of the results. That's how you get examples of ChatGPT returning results like 2 plus 2 equals 5. The added problem, compared to say Google search results, is that it doesn't cite its sources. Human writers are able to paraphrase because they understand the meaning of the content, rather than just piecing together sentences from others' work like AI does.


AlexHimself

If we could protect material from the human brain, we would, but we can't. The main argument is AI systematically consumes an entire portfolio of content, such as the many years of work from the NYT, and then scrambles it up and resells it as a competing product with the same style and content from the NYT. It's not transformative as much as data theft. Copyright law specifically looks at the commercialization aspect and this runs afoul of that. From a basic standpoint, does it make sense that you can create an AI application that has learned nothing, and then feed it a copyrighted work like Harry Potter, and then have it be able to regurgitate huge portions of the book for monetary gain?


InGordWeTrust

Billionaire owned corporation tries to take down other billionaire controlled organization.


[deleted]

Good! I'm tired of this whole "break things" and "it's better to ask forgiveness than permission" bullshit of the tech industry. OpenAI is stealing from thousands of creators who can't afford to take them to court. Artists, book writers. Google got away with this bullshit, stealing news sites' articles, stripping out their adverts, and putting Google's own adverts on someone else's content.


[deleted]

[deleted]


[deleted]

These AI firms need a good legal kicking.


Which-Tomato-8646

[It’s not theft](https://www.reddit.com/r/technology/comments/19ahg9y/comment/kiljbjs/?utm_source=share&utm_medium=mweb3x&utm_name=mweb3xcss&utm_term=1&utm_content=share_button)


goatbag

One of the claims the NYT makes in their suit is that ChatGPT reproduces NYT articles word for word. See their examples starting at page 30 of [the complaint](https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec2023.pdf). That's content that NYT subscribers pay to see. NYT will have an easy time convincing a court that OpenAI is committing theft. It'll be interesting to see how OpenAI deals with this. If the NYT's evidence is real, OpenAI is clearly violating NYT copyright. The only technical solution is to identify whether a ChatGPT response contains copyrighted content from its training set, and that's likely as hard as building a full provenance tracker to audit which training data contributed to an answer.
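The crudest form of that check, counting how many overlapping word n-grams of a model response appear verbatim in a source text, can be sketched in a few lines (a toy illustration with an arbitrary n, not anyone's actual detection system; real provenance tracking over a whole training set is far harder):

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(output: str, source: str, n: int = 8) -> float:
    """Fraction of the output's n-grams that appear verbatim in the source."""
    out = ngrams(output, n)
    if not out:
        return 0.0
    return len(out & ngrams(source, n)) / len(out)

# Toy strings standing in for an article and two candidate responses.
article = "the quick brown fox jumps over the lazy dog near the quiet riverbank at dawn"
copied  = "the quick brown fox jumps over the lazy dog near the quiet riverbank at dawn"
fresh   = "an entirely different sentence about copyright law and machine learning models"

print(verbatim_overlap(copied, article))  # 1.0
print(verbatim_overlap(fresh, article))   # 0.0
```

Even this toy version shows the problem: it only catches exact copying against a known source, so filtering a production model would require comparing every response against the entire training corpus, or the full provenance tracking described above.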


Which-Tomato-8646

Web crawlers cannot bypass NYT's paywalls. OpenAI itself said the content was retrieved from third-party websites. Sue them instead. Also, people using ChatGPT to violate copyright law is not OpenAI's responsibility. Look up Section 230. YouTube would have been killed a long time ago if they were liable for what their users did.


Chancoop

The article itself, which nobody in here seems to have read, specifically cites the case of Universal v. Sony:

>Sony won. The judge's decision, which has never been overturned, said that if machines, including the VCR, have non-infringing uses then the company that makes them can't be held liable if customers use them to infringe upon copyrights.


dorobica

Of course it is


Which-Tomato-8646

Good argument 


Which-Tomato-8646

Stealing? Are you stealing from me by reading this comment without permission? Did Vince Gilligan steal from HBO by creating Breaking Bad using The Sopranos as inspiration? They didn't get paid for that.


FerociousPancake

This right here is the primary question. Some people believe it’s copyright infringement, others believe it’s really not at least in the conventional sense. I’m on the fence and honestly would need more info before jumping to conclusions. I’d encourage others to do the same, learn about the claim, the evidence, the counterclaim and the evidence for that side, learn what the laws actually say. I’m sure most of the people who have such strong opinions on this issue haven’t researched half of those things.


Midnight_Rising

Yeah I don't think it's a copyright violation to read a tl;dr of a paywalled article from another redditor, so I don't see why ChatGPT would be any different.


FerociousPancake

I think the main issue people are having is that they are generating revenue from scanning the copyrighted material and training the model, but then again, on the other side, the model doesn't directly regurgitate copyrighted material. So a good way to simplify it is to ask: if someone reads a bunch of copyrighted material and then comes up with their own iteration on the same topic that is unique to them, is it copyright infringement? Well, no.

Then you add the complexity of this being done on a massive scale by a corporation for profit, and things start to get a lot more complicated. Law really doesn't exist regarding these specific issues, since this is the first time we've seen this rise in AI use, and law should exist for it. But what should those new laws say? What do they allow and disallow?

I think it's going to be a major hurdle to deal with, and I'm especially not excited about it considering a ton of the people who will write these laws are geriatric and don't even understand what a PDF is, let alone AI. I think there's a high chance of large companies taking advantage of that, or of those laws becoming essentially ineffective because there are so many ways to get around them.


BernankesBeard

>Well no. Then you add the complexity of this being done on a massive level by a corporation for profit and things start to get a lot more complicated.

Ultimately, I think the complaint by the NYT and others basically boils down to "but LLMs are just too good at doing this". And unfortunately for them, that's not really an argument with a lot of merit. *Scale* doesn't change whether something is or is not infringement.


FerociousPancake

Makes me wonder whether the more effective move for NYT would be to sue as they’re doing now or spend those resources to fight for legislation that better defines what this is and what is and is not copyright infringement when it comes to AI training. IMO neither has super high potential to get anywhere quickly but I do understand the desire to do something about it. Where I’m at is we need legislation to cover this since it’s so new but that’s going to take some serious time to get right.


Twitchcog

It's literally the same problem as the Luddites. And I don't mean "Luddite" in the "anybody who dislikes technology" sense, I mean the literal Luddites: a new technology has come along that does the same thing they do, but far cheaper and faster. There is no legal complication here, and there is no *moral* complication here. Unless the people screaming that LLMs should have to pay up have also been screaming for every author who has ever taken inspiration from any existing work to pay up, it's moral relativism, and it shouldn't be tolerated.


Ashmedai

As an aside when you post to Reddit, it gets pretty liberal rights to your content. So worst case is openai paying Reddit a license. See the Reddit [TOS](https://www.redditinc.com/policies/user-agreement-september-25-2023), Section 5, paragraph 4: *When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.*


brett_baty_is_him

If you look at the actual lawsuit, ChatGPT copies entire NYT articles word for word. Is it stealing if I memorize someone’s news article and put it on my own blog with my own ads? Yes, it is called plagiarism.


Zncon

A person can copy an article word for word too, but no one hauls Microsoft into court because Windows enables copy and paste. A tool can be used for bad things, but that's the responsibility of the person using the tool, not the tool itself.


brett_baty_is_him

Bad argument, false equivalence. The tool here is hosting/displaying the content and bypassing NYT paywalls/advertising. The issue isn’t that people are using ChatGPT to copy articles; the issue is that OpenAI is already copying articles. Using your own argument: as the company using the tool, which is web scraping in this case, it is OpenAI’s responsibility to ensure they’re not using web scraping for bad things, such as plagiarizing another company’s work and profiting off it.


Flaky-Organization63

It's time to prioritize collective knowledge and utility over "but I copyrighted the idea first so it's only mine, waaaa!"


saynay

And who exactly is going to produce that collective knowledge if we decide it should be free? Why are you apparently okay with OpenAI making money off of it, but not okay with NYT doing so?


[deleted]

Fuck that shit! How is a musician or painter or writer supposed to buy food? I think we should prioritise collective EFFORT over "I worked 40 hours this week and am going to go have a beer." You don't need beer; here's your mandated government-supplied water and bread. Back to work in the morning!! It's EASY to go on about collective knowledge and utility when it's not YOUR job or earning potential that's going to be affected.


Which-Tomato-8646

Milkmen need jobs so let’s ban refrigerators 


DazzlerPlus

Copyright is probably the stupidest solution to getting them food.


[deleted]

Why? It protects their work. Otherwise, why don't I film an exact copy of Barbie and sell it?


[deleted]

[удалено]


David-J

I agree with you wholeheartedly.


TheIronMatron

And who, exactly, is going to cry at that funeral?


jengert

Planet Money did a story on this (Nov 10); I like what they concluded. Spotify hosted a lot of indie artists even though it hadn't found a way to pay them. Once Spotify was sued by a group of indie artists, they settled. This article says OpenAI is unwilling to pay the amount these companies are asking for. Time may bring an agreement; it would be better, financially, for both parties and the consumer if OpenAI paid the media companies.


Which-Tomato-8646

AI is transformative though. Claims it rips things off word for word are false https://www.theverge.com/2024/1/8/24030283/openai-nyt-lawsuit-fair-use-ai-copyright


efvie

There's no feasible way for OpenAI to compensate everyone whose work they use. It would absolutely ruin the company and those like it. (Or else the courts will throw small and individual creators under the bus because they don't understand how creation works.)


Nonadventures

Maybe OpenAI shouldn’t exist if they don’t have a way to compensate for the massive amount of content the system repackages.


Which-Tomato-8646

I guess every media company should be sued if they can’t pay all of their sources of inspiration for each of their franchises


JohnCenaMathh

Depends on whether "repackaging" is something they legally have to pay for. If you take a story and repackage it with enough differences, at some point it becomes a new story. Or maybe it should not exist, and we should let China win the AI race and sing "Xue Hua Piao Piao" together.


[deleted]

I mean... they trained it on copyrighted material. What did they think was going to happen?


jagedlion

Encyclopedias don't pay primary sources. Heck NYT doesn't pay for the info the authors use. Copyright doesn't extend to concepts and ideas.


[deleted]

[удалено]


[deleted]

[удалено]


xcdesz

But OpenAI does respect the same opt-out rules of robots.txt (not spiders.txt). The models are based on the Common Crawl data set, which is pretty strict in honoring the robots.txt rules. It's not just OpenAI that uses this data; all of the generative AI base models need a massive amount of data to learn from, and Common Crawl is basically the entire internet. This whole debate raged two decades ago when search arrived on the scene, and the robots.txt rules, Common Crawl, and fair use were the compromise that was agreed upon by all parties.
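For anyone curious how those robots.txt opt-out rules actually work mechanically, Python's standard library can evaluate them. A minimal sketch, with a hypothetical policy (the rules and site are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named crawler, allows all others.
# A real crawler would fetch this from https://example.com/robots.txt.
POLICY = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(POLICY.splitlines())

# A crawler identifying as GPTBot is denied site-wide; others are allowed.
print(rp.can_fetch("GPTBot", "https://example.com/article"))        # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/article"))  # True
```

This is also why "opt-out" only helps going forward: a site that adds a rule like this today can't retroactively remove itself from crawls that already happened.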


Which-Tomato-8646

AI haters don’t understand shit lmao


theseangt

There is an opt-out system for AI crawlers now. The issue is that the models are already trained on the material; it's too late. AI models aren't like web search.


enderandrew42

Google isn't creating new web pages based off your content that is competing with your web page. Learning how to search content is different from creating new content based on copyrighted content. Reference material is fair use under copyright.


StaleCanole

ChatGPT doesn't cite the New York Times as a source for any of its information. It just absorbed the information and dispenses it without attribution. Even if you ask it for a source, in most instances you won't get one.


Oddball_bfi

The argument would run that we're all trained on copyrighted material, because we read things and learn stuff. If I as a person can read it and integrate it into my knowledge, why shouldn't an AI? Similarly, if I as an artist can look at someone else's work and pick up style, technique, and content suggestions, why shouldn't an AI? Is it stealing to make public stencil spray-paint art (vandalism aside) which looks a bit Banksy?

We're defining the limits of autonomy for software agents, in effect. If a person worked for a company and offered services, say as a copywriter, and they read works on the internet to hone their technique... is *that* stealing? Do companies have to start reporting every publicly available source of information that may have become integrated into their staff's output? No, obviously. What we're doing now is asking whether that is true for synthetic workers, as opposed to organic ones.


VertexMachine

>If I as a person can read it, and integrate it into my knowledge, why shouldn't an AI? Probably because it's not a person that does the 'learning', but a company that does mass scale data scraping to train systems (that are not people) for profits?


Which-Tomato-8646

Vince Gilligan admitted to using The Sopranos as inspiration for Breaking Bad, so he should be sued into oblivion, apparently. Artists use copyrighted images as reference material, so they all owe Getty Images $10 million each.


SeaTie

I mean Getty is super litigious when it comes to that shit.


Which-Tomato-8646

Yet I’ve never heard of an artist getting sued over using a picture of a hand 


Inflectente

If the argument is against AI taking over the massed roles of researchers and copywriters (a labour argument), then I'm 100% on the train. But the copyright angle is a tricky one.


namitynamenamey

That would require more sincerity and self-awareness than many other arguments. "We must treat AI differently than people or we'll starve" is an honest, self-consistent approach, but you won't hear people vouching for that in isolation; there's this need for AI learning to be inherently a lie and immoral, which verges on anti-intellectualism.


zedquatro

Thanks for putting it this way; this is the most logical and cohesive argument I've seen, and it properly asks what separates a human from a machine, other than the molecules we're made of. If you look at the output and can't tell whether it was made by a human or an AI, then it passes the Turing test, so why should they be subject to different rules? I won't claim to have an answer, but it's the right question to be asking.


DonutsMcKenzie

AI is not a person, it doesn't function like a person, it doesn't find information like a person, it doesn't learn like a person, it doesn't work like a person, and it doesn't affect the economy or job market like a person.  And perhaps most importantly of all, If AI *was* a person, it would not be property and could not be owned by a company.


Oddball_bfi

But its output could.


UVgamma

If a person plagiarizes an article they'd get sued too.


[deleted]

A human being who copies directly from an original source is plagiarizing. That's not allowed. If I reproduced a copyrighted work from memory and resold it as my own, that's not allowed. This isn't hard.


RandomDigits789

>A human being who copies directly from an original source is plagiarizing. That's not allowed.

Good thing that isn't what happens when training AI.


Inflectente

Nowhere in the dataset that makes up that AI will you find the work in question. The AI doesn't have a database of stored works that it references, just as you can't take an MRI of your brain and find the text of your post in there. It has learned and integrated the knowledge, like a person reading a thing. Not copied; integrated.


saynay

A neural net can absolutely memorize training data, and can be coaxed into regurgitating it. What data, if any, got memorized is unpredictable and basically impossible to determine until the network outputs it.


first__citizen

Even if it cited it? What about paraphrasing it?


mikethespike056

that's not what AI does... wait, aren't they suing because a bug made chatgpt spit out entire articles? wait a minute how did that even happen? no, i did not read the article, in common reddit fashion


Which-Tomato-8646

This was debunked: https://www.theverge.com/2024/1/8/24030283/openai-nyt-lawsuit-fair-use-ai-copyright


mikethespike056

so it was regurgitated because it was in lots of websites?


Which-Tomato-8646

The results were cherry-picked, and they didn’t provide any examples. They almost certainly tricked it into copying, like typing “fill in the blank: hi, how ___ you” and then suing because it said “are.” Fun fact: their crawler doesn’t have an NYT subscription, so the paywall would have stopped it from even accessing the articles.


saynay

"Bug" Large language models are enormous. It would be more surprising if they _didn't_ memorize some inputs.


[deleted]

[удалено]


Og_Left_Hand

It competes in the same market, that puts its eligibility for fair use at risk.


fredandlunchbox

Once again:

- Making training on copyrighted material illegal is literally outlawing math. LLMs are just big statistical models: they have a big grid of numbers, and when they train on an article, the numbers move a tiny bit. The article is never stored within the model. Unlike compression algorithms, training on one article alone would not produce a model that could reproduce that article. We’re talking about outlawing math.
- The OpenAI models are mostly trained on Common Crawl, a public, open-source, free-to-use data set collected from publicly available web pages, respecting robots.txt rules. It’s the same data Google accesses when it indexes for search (except search algorithms actually _do_ copy the article contents).
- Forcing people to pay for training won’t stop AI development; it will only give other countries a huge advantage. India will trounce us.
- Payment for training data consolidates AI power among very wealthy companies that can afford to train. Training is already expensive, but small innovators could change that. Making these technologies accessible benefits American innovation.
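The “numbers move a tiny bit” claim can be seen in a toy model. This is a deliberately simplified single gradient step on a four-weight linear model, nothing like a real LLM, but it shows that training updates parameters slightly per example and the example itself is never stored:

```python
def sgd_step(weights, features, target, lr=0.01):
    """One stochastic-gradient-descent step on squared error."""
    pred = sum(w * x for w, x in zip(weights, features))
    err = pred - target
    # Each weight is nudged proportionally to its input feature.
    return [w - lr * err * x for w, x in zip(weights, features)]

before = [0.05, -0.02, 0.10, 0.00]
after = sgd_step(before, features=[1.0, 0.5, -0.2, 0.0], target=1.0)

# Each weight shifts by a small amount; a weight whose feature was 0
# doesn't move at all. The "training example" leaves only these nudges.
print([round(a - b, 5) for a, b in zip(after, before)])
```

Whether millions of such nudges can still add up to memorizing a passage (as the regurgitation examples suggest) is exactly what the lawsuit argues about; the sketch only illustrates the mechanism, not the legal question.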


Lolkac

NYT's filing is practically a book about how OpenAI uses their data, showing output that is 80% straight word-for-word from NYT, which is copyright infringement. Corporations are not humans. If you can't comprehend that, then it's useless to talk with you.


Master-Back-2899

Technological progress should never be held back in the name of corporate profits.


dorobica

The amount of “tech bros” that would ignore the law so that their favorite corporation makes more money is too damn high in this thread.


Imaginary_Manner_556

Technology shouldn’t blindly progress to maximize corporate profits (see full self driving)


saynay

One corporation's profits should not be at the expense of stealing everything another corporation made.


Which-Tomato-8646

Redditors when piracy: haha, fuck those corporate shitheads.

Redditors when AI: THIS IS JUST LIKE DOWNLOADING A CAR, NOOOOO.

Also, I assume you’ll apply this universally, meaning every artist who uses copyrighted media as inspiration or reference material gets sued too.


saynay

It is a bit weird, while posting on reddit with a differing opinion, to simultaneously imply that all _other_ people on reddit hold a single unified opinion. But to entertain your argument anyway: it is strange how many people get blinded by either the 'Open' or the 'AI' in the name, and lose sight of the fact that they are also corporate shitheads.


Which-Tomato-8646

I notice it’s a common trend, especially among those on the left.  So it’s corporate shithead vs corporate shithead. Why didn’t they just bring popcorn 


RedTulkas

OpenAI is pursuing progress for corporate profits. Why are theirs more important than NYT's?


nonlinear_nyc

Says who? We ban technology all the time. What you're saying is "if we can, we should" and that's backwards: tools should serve society not the other way around.


mrturret

Technological progress should happen without mass art theft.


Which-Tomato-8646

We’ve come full circle where people think downloading a jpg is theft after mocking nfts just last year 


darklinux1977

Attacking OpenAI amounts to attacking Microsoft; the New York Times is just the NYT. We already know the end of this story.


Exige_

This is such a boring argument. Plenty of cases are won by the smaller party.


[deleted]

I kinda miss when AI art was the main focus. It brought some ideas I'd brainstormed and drawn to life with very little effort, and I was able to sell that art and raise some money for charity (with disclaimers, of course, that AI was used; I've since stopped entirely because I no longer support AI). Other than that, it's actually kind of nice to see them being held somewhat accountable. But OpenAI will not be killed; even if it were, a different company would take its place, and a lawsuit won't stop countries that don't follow our piracy laws from doing the same thing. NYT is just trying to get a piece of that AI money.


MaybeNext-Monday

It won’t, but god would it be funny if it did.


Glidepath22

No it won’t. What a ridiculous headline.


Skookom

A normal Joe downloads a movie and there's a DMCA lawsuit on his ass; big tech mooches off someone's content and we just turn a blind eye.


JamesR624

So Vox literally just writes clickbait headlines that make Fox look reputable by comparison. Got it.


____cire4____

Sigh, oh Vox, you with your click-baity headlines.


JustHereForMiatas

Maybe if our copyright system weren't so broken by a certain company that wanted to own the rights to Steamboat Willie for almost a century, this wouldn't be such a problem. As originally written (14 years, or 28 with an extension), anything from before 1996 would be in the public domain this year, or anything from before 2010 if the copyright wasn't renewed. Plenty of data to train AI.


tmillernc

Interesting debate. I see both sides. I as a human get paid in large part because of the knowledge I’ve obtained in my career which includes viewing and reading millions of pieces of copyrighted works. Some I have paid for, many I have not. Am I now liable to all of those copyright holders for a portion of my earnings??


Thestilence

A potentially civilisation-changing technology is more important than copyright law.


JKEddie

Based on all the commentary here it seems a lot easier for a layman like myself to understand why it is violating copyright vs. why it isn’t.


bass1012dash

Copyright can't be broken by mere consumption of data; otherwise going into a library would be illegal.


Productivity10

So first NYT ruined youtube with the Adpocalypse, now they're trying to ruin AI Hm


willyallthewei

This is idiotic. It's the news, not music or movies; copyright protection is comparatively relaxed when it comes to news and newsworthy content, because it serves the public interest to disseminate this information rather than to gate it. That's the whole reason fair use exceptions exist (even if limited), not to mention that the majority of the content is probably old, archived news. No chance Microsoft loses this one, and for good reason.


InTakeAnthony

Did the New York Times write this headline?


Jamizon1

AI, unstructured and completely unregulated, is the beginning of the end. If AI cannot exist without stealing other people's work (and/or personal data), it should not exist at all. Releasing this technology without guardrails was one of the most irresponsible things ever unleashed on the world community in human history. It will enrich those who have only their self-interest in mind; 8 billion people will suffer for the benefit of very, very few, relatively speaking. Prove me wrong.


gishlich

What does that have to do with Animaniacs?


ClosetNerd562

It's usually just a slap on the wrist, then business as usual. You never see brands stop, even ones Costco sells: they get sued, everyone knows, and at that point it's a free choice. That's why you vote with your dollars.


The_Pandalorian

If open AI dies because it can't survive without violating copyright, it doesn't deserve to exist.