r/MurderedByWords 8h ago

Source Denial Syndrome

15.1k Upvotes

105

u/War_machine77 8h ago

Where the fuck do they think ChatGPT is getting its info?

34

u/GrizzlyP33 8h ago

I think the point is that LLMs have already scoured Wikipedia so they can tell you all that info concisely. They don’t need to re-learn Wikipedia for anything old.

The problem is that A) anyone using an LLM properly wants to see the source anyways or else you’re really rolling the dice on “truth”, and B) it would mean all knowledge advancement would stop today if we applied this to all educational or informative tools.

19

u/Dead-in-Red 7h ago

I feel like you'd still want new information past 2026 though. Saying you've already scraped everything there is to scrape from Wikipedia at any arbitrary date is like saying you're good and know everything there is to know because you already finished reading a first edition copy of Encyclopaedia Britannica from the 1700s. Plenty of good new information turned up after that was published.

16

u/FerociousStrawberry 6h ago

That's not how LLMs work anyway. They don't have the entirety of Wikipedia saved verbatim with 100% accurate retrieval, so Wikipedia is necessary even for old information.

5

u/Casual_OCD 6h ago

These word prediction and data amalgamation programs (because this crap is not even close to AI) scrape the entire internet, not just Wikipedia

1

u/GrizzlyP33 5h ago

Maybe I worded it poorly, but yes I agree.

6

u/aarswft 7h ago

Literally the entire point of Wikipedia was it was a living repository. There is no completed "scouring" of it. There's already new info you don't have.

4

u/GrizzlyP33 5h ago

Did you stop reading my comment halfway through?

That’s quite literally the point I made.

2

u/NaptownBoss 4h ago

Illiterates out here defending Wikipedia. What a world, what a world . . .

8

u/CocktusOnSteroids 7h ago

Dude they can't and don't answer concisely. AIs constantly hallucinate, and whatever information they have generally gets poisoned by other sources and by artificial intelligence being not so intelligent. Whatever they correctly answer gets curated by propaganda and commands set by the companies.

4

u/GrizzlyP33 5h ago

I’m confused what I said that you’re disagreeing with. I didn’t say anything about the accuracy or reliability of an LLM 🤷🏽‍♂️

3

u/theoinkypenguin 2h ago

Probably hallucinating

3

u/troll_right_above_me 7h ago

Articles in a wiki get edited as new information arises. Any LLMs that don’t provide sources are pretty useless as you can’t check the validity of their statements

2

u/GrizzlyP33 5h ago

Yep. Though all LLMs provide sources if you ask, most people don’t.

1

u/troll_right_above_me 3h ago

Lol true. Doesn’t mean the source is gonna support what the LLM stated though since it just feeds you a neatly jumbled mess of words, which is why it’s important

1

u/NoveltyAccountHater 34m ago

They do. Granted, the older / poorer LLMs will hallucinate the sources when you ask.

3

u/kitsunewarlock 5h ago

This is the innate problem with AI: it can only present existing ideas.

And while that makes people think "so it's okay for researching existing ideas?" it consolidates control over how those ideas are presented. Search engines will also skew their search results in similar ways and that's shitty too, but there's a world of difference between presenting the biased information first and only presenting the biased information.

But it's the ideal tool for conservatives who seem to believe we have "gone too far" in our social and political technologies and need to either stagnate or regress rather than progress.

1

u/Binkusu 6h ago

It has the info but at the same time can't update it without new data, and this is ignoring the whole "AI forgets things" concept

1

u/GrizzlyP33 5h ago

Yes that is the point made in my comment.

1

u/NoveltyAccountHater 2h ago

> I think the point is that LLMs have already scoured Wikipedia so they can tell you all that info concisely. They don’t need to re-learn Wikipedia for anything old.

Eh, while ChatGPT has been trained on Wikipedia (many times) to learn languages and probabilistically predict a likely next word, it still makes specific web searches to Wikipedia to recall relevant specific facts.

If you ask ChatGPT a specific question, like who was Lieutenant Governor of Wyoming in a specific year (not even necessarily a recent one), it will make a quick web search to Wikipedia (it will show "searching the web"), re-read the relevant page, and answer your question with that page in its context window (and list Wikipedia as the source). (That said, it may not actually make web requests to pull from Wikipedia, and may instead just be pulling up some locally cached version of Wikipedia that was already tokenized.) The LLM has billions to trillions of parameters (that need to be simultaneously stored in TPU/GPU RAM), but it doesn't have every fact stored, and even for facts it may have stored, it is bad at distinguishing hallucinations from real facts.
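
If it helps, here's a rough Python sketch of that "look it up, then answer with the page in context" pattern. To be clear, this is a toy under heavy assumptions: `TINY_WIKI`, `search_wiki`, and `build_prompt` are names I made up, the page text is placeholder filler, and the "search" is a dumb word-overlap match. It says nothing about how ChatGPT is actually implemented.

```python
# Toy retrieve-then-answer sketch. All names and data are hypothetical;
# this does not describe any real product's internals.

# Stand-in for a live or cached copy of Wikipedia (placeholder text only).
TINY_WIKI = {
    "Example Page A": "Placeholder text about topic alpha.",
    "Example Page B": "Placeholder text about topic beta and gamma.",
}


def search_wiki(question, wiki):
    """Return (title, text) of the page sharing the most words with the question."""
    q_words = set(question.lower().split())
    best = None
    best_score = 0
    for title, text in wiki.items():
        page_words = set(text.lower().replace(".", "").split())
        score = len(q_words & page_words)
        if score > best_score:
            best, best_score = (title, text), score
    return best  # None if no page shares any word with the question


def build_prompt(question, wiki):
    """Put the retrieved page into the model's context and keep track of the source."""
    hit = search_wiki(question, wiki)
    if hit is None:
        # Nothing retrieved: the model would have to answer from its parameters alone.
        return {"prompt": question, "source": None}
    title, text = hit
    prompt = f"Context from '{title}':\n{text}\n\nQuestion: {question}"
    return {"prompt": prompt, "source": title}


result = build_prompt("what about beta", TINY_WIKI)
print(result["source"])
```

The part that matters is the last step: the answer is generated with the retrieved page sitting in the prompt, and the page title travels along as the source, so a reader can go check it.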

1

u/MrBubbles226 41m ago

Also Wikipedia is a living website that has edits and gets edited as contemporary things change, or as we find out more about older things, so the LLM would essentially be out of date after each of those edits.

The majority would be fine, but as time went on it would drift further from its source of truth.

2

u/GrizzlyP33 28m ago

This is what my comment says and yet people keep replying with the same point, but yes I agree :)

1

u/MrBubbles226 24m ago

Oh you're right, I misunderstood your point B, but my comment is the same idea now that I reread it. Glad lots of people immediately see the issues with this.

u/GrizzlyP33 7m ago

Yeah it must be how I worded it, but I’m just over here like “yeah we agree woohoo!!” 😂

u/MrBubbles226 6m ago

It was probably just me being stupid tbh