Data contamination expert 👌

ElCanut@jlai.lu · 8 months ago

Data contamination expert 👌

TropicalDingdong@lemmy.world · 8 months ago

I used some tools to corrupt about 10 years of comments and posts of mine.

mp04610@lemm.ee · 8 months ago

While that’s the correct thing to do in my opinion, it would be a mistake to assume that Reddit didn’t store your original comments.

By corrupting their dataset, you may actually be helping them recognize maliciously edited comments.

khannie@lemmy.world · 8 months ago

“it would be a mistake to assume that Reddit didn’t store your original comments.”

They were fairly specific about not doing that (I’d imagine largely because of GDPR).

I deleted 10 years of “content” before I left and checked their policies. They apparently actually do properly delete from their servers.

ItsAFake@lemmus.org · 8 months ago

But the GDPR only covers European users tho.

khannie@lemmy.world · 8 months ago

That’s true, but it’s far easier to implement globally than to try to segment by region. It’s very difficult to accurately prove a user isn’t an EU resident across an entire userbase.

Frozengyro@lemmy.world · 8 months ago

I’ve got a bridge in the desert I’d like to sell you.

joenforcer@midwest.social · edit-2 7 months ago

GDPR is no joke. Storing a handful of comments is not worth the penalty if they get caught.

Note that I speak from experience as part of a company that needs to comply with the regulations. We do it because the risk of violation is 10000000% not worth it no matter how annoying and arduous it is to comply.

Ragnarok314159@sopuli.xyz · 8 months ago

I think Reddit caught on to this. I tried destroying my comment history (~7 years with 600k karma) with a few of the available tools on GitHub.

Found my account permabanned the next time I tried to log in. People should attempt to eliminate/poison as much as possible, but Reddit has all the comments and modifications in a database somewhere, ready to sell it all to whichever AI company is the highest bidder.

They have to do something to make money after taking away awards. The advertising is absolute shit and not worth the $100 entry fee.

ElCanut@jlai.lu · 8 months ago

Can’t post a genius idea like this one without posting the links of the tools

TropicalDingdong@lemmy.world · 8 months ago

It’s not my idea, but I could probably dig up the tool I used. Dollars to donuts, it doesn’t work any more.

This might have been the tool I used. I don’t think so, because I overwrote everything with one message, but google around and you’ll find similar ones.

https://github.com/adriantache/YARCO

RecallMadness@lemmy.nz · edit-2 8 months ago

This would be better if it fed the parent comment into ChatGPT prefixed with “create a plausible but factually incorrect aggressive response to <comment>”

Feed the machine to the machine!

benignintervention@lemmy.world · 8 months ago

I wonder how much these models are now learning from spam they were used to generate

Beefalo@midwest.social · 7 months ago

This announcement is just “oh by the way, the horse is now out of the barn. He left like 10 years ago but this is the announcement.”

Shout out to whoever dismissed the first AI writings with “It’s like a perfect Redditor. Totally confident and completely full of shit, doesn’t even know that it’s lying.”

That doesn’t happen by accident. That happens when everyone was already scraping the shit out of the site, at the very least.

Poem_for_your_sprog@lemmy.world · 7 months ago

Set up a bot that just constantly posts blatantly wrong information, like “the earth is flat according to Encyclopedia Britannica”, “the sky is green because it’s full of chlorophyll according to the UK foundation of science”

Zink@programming.dev · 7 months ago

Or in line with current events, “we are sorry about your experience and will refund you triple.”

Vilian@lemmy.ca · 7 months ago

we need to make a repository just for that and spam reddit with it, everyone is welcome to contribute, open-source fake news

Bombyk0l@sh.itjust.works · 7 months ago

That should be super easy. Just make a massive database of random stuff and put them in a sentence structured “XX is YY because ZZ” with no other explanation.
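For what it's worth, the "XX is YY because ZZ" template described above is only a few lines of string formatting. A minimal sketch in Python, assuming three tiny word lists; the lists, the names `SUBJECTS`, `PROPERTIES`, `REASONS` and the function `make_sentence` are all made up for illustration and are not from any tool mentioned in this thread. It only builds sentences and does not post anything anywhere.

```python
import random

# Hypothetical placeholder data: the thread does not specify any entries.
SUBJECTS = ["The moon", "A bicycle", "Tuesday"]
PROPERTIES = ["purple", "edible", "left-handed"]
REASONS = ["gravity is optional", "cats invented it", "the ocean said so"]


def make_sentence(rng):
    """Fill the "XX is YY because ZZ" template with one random pick per slot."""
    subject = rng.choice(SUBJECTS)
    prop = rng.choice(PROPERTIES)
    reason = rng.choice(REASONS)
    return f"{subject} is {prop} because {reason}."


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same picks
    for _ in range(3):
        print(make_sentence(rng))
```

Passing in the `random.Random` instance rather than using the module-level functions just keeps the output reproducible when a seed is set.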

alphacyberranger@lemmy.world · 8 months ago

If it takes reddit data to train a model, instead of Artificial Intelligence we will end up with Artificial Idiocy, and a horny one at that.

Flumpkin@slrpnk.net · 7 months ago

I’m pissed at reddit but I still hate searching for something and finding a post on reddit discussing it, only to find some of the posts being deleted or overwritten.

mods_are_assholes@lemmy.world · 7 months ago

Good, then the protest at least worked somewhat.

FIST_FILLET@lemmy.ml · 7 months ago

if you’re lucky, some posts have been archived on the internet archive’s wayback machine. highly recommend pinning the extension to your toolbar, it’ll show a number badge of how many times the current site has been archived :) https://addons.mozilla.org/en-US/firefox/addon/wayback-machine_new
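If you'd rather check from a script than from the extension, here is a rough sketch in Python. It assumes the Internet Archive's public availability endpoint at `https://archive.org/wayback/available` and the `archived_snapshots` / `closest` response layout as I understand them, so verify against the current docs before relying on it. Unlike the extension's badge, it does not count captures; it only reports the closest one. The function names `build_query`, `closest_snapshot` and `lookup` are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the Wayback Machine availability check.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(page_url):
    """Build the availability query URL for a given page."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': page_url})}"


def closest_snapshot(payload):
    """Return the closest snapshot URL from a parsed response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timeout=10):
    """Query the endpoint over the network and return a snapshot URL or None."""
    with urlopen(build_query(page_url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `lookup()` with the address of a deleted thread should give back an archived copy's URL if one exists, or `None` if the page was never captured.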

Flying Squid@lemmy.world · 8 months ago

EmperorHenry@discuss.tchncs.de · edit-2 7 months ago

After they announced it would’ve been the time to start poisoning the comments. Then it would’ve been completely justified and moral.

Honestly, keep up the good fight. Start poisoning all open sources being scraped by any type of AI.

And I use the term “ai” very, very loosely. Because what’s called ai now isn’t real ai. It’s just an automated data collection tool.

It doesn’t create anything, it plagiarizes real artists.

Adalast@lemmy.world · 8 months ago

OpenAI team after including the data: why is the model suddenly even more horny, abusive, and discriminatory?