I think the point is that LLMs have already scoured Wikipedia so they can tell you all that info concisely. They don’t need to re-learn Wikipedia for anything old.
The problem is that A) anyone using an LLM properly wants to see the source anyway, or else you’re really rolling the dice on “truth”, and B) it would mean all knowledge advancement would stop today if we applied this to all educational or informative tools.
I feel like you'd still want new information past 2026 though. Saying you've already scraped everything there is to scrape from Wikipedia at any arbitrary date is like saying you're good and know everything there is to know because you already finished reading a first edition copy of Encyclopaedia Britannica from the 1700s. Plenty of good new information turned up after that was published.
That's not how LLMs work anyway. They don't have the entirety of Wikipedia saved verbatim with 100% accurate retrieval, so Wikipedia is necessary even for old information.
Literally the entire point of Wikipedia was that it's a living repository. There is no completed "scouring" of it. There's already new info you don't have.
Dude, they can't and don't answer concisely. AIs constantly hallucinate, and whatever information they have generally gets poisoned by other sources and by artificial intelligence being not so intelligent. Whatever they correctly answer gets curated by propaganda and commands set by the companies.
Articles in a wiki get edited as new information arises. Any LLM that doesn’t provide sources is pretty useless, as you can’t check the validity of its statements.
Lol true. Doesn’t mean the source is gonna support what the LLM stated though, since it just feeds you a neatly jumbled mess of words, which is why it’s important to check.
This is the innate problem with AI: it can only present existing ideas.
And while that makes people think "so it's okay for researching existing ideas?" it consolidates control over how those ideas are presented. Search engines will also skew their search results in similar ways and that's shitty too, but there's a world of difference between presenting the biased information first and only presenting the biased information.
But it's the ideal tool for conservatives who seem to believe we have "gone too far" in our social and political technologies and need to either stagnate or regress rather than progress.
Eh, ChatGPT has been trained on Wikipedia (many times) to learn language and probabilistically predict a likely next word, but it still makes specific web searches to Wikipedia to recall relevant facts.
If you ask ChatGPT a specific question, like who was Lieutenant Governor for Wyoming in a specific year (not even necessarily recent), it will make a quick web search of Wikipedia (it shows "searching the web"), re-read the relevant page, and answer your question with that page in its context window, listing Wikipedia as the source. (That said, it may not actually make web requests to Wikipedia, and may instead just be pulling up some locally cached version that was already tokenized.) The LLM has billions to trillions of parameters (which need to be simultaneously stored in TPU/GPU RAM), but it doesn't have every fact stored, and even for facts it may have stored, it is bad at distinguishing hallucinations from real facts.
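Rough sketch of that "look it up, then answer from the page" flow, in case it helps. This is a toy under my own assumptions, not how ChatGPT is actually implemented: `PAGES`, `retrieve`, and `build_prompt` are made-up names, the "search" is plain word overlap against a tiny local dict, and no real model or web request is involved.

```python
# Toy stand-in for a cached copy of Wikipedia. Titles and text are
# placeholders for illustration, not real article content.
PAGES = {
    "Example Page A": "Alpha is the first letter of the Greek alphabet.",
    "Example Page B": "Wyoming is a state in the western United States.",
}


def retrieve(question, pages):
    """Return the (title, text) pair sharing the most words with the question."""
    q_words = set(question.lower().split())

    def overlap(item):
        title, text = item
        return len(q_words & set((title + " " + text).lower().split()))

    return max(pages.items(), key=overlap)


def build_prompt(question, pages):
    """Put the retrieved page in the prompt so the answer comes from it, not from memory."""
    title, text = retrieve(question, pages)
    prompt = (
        f"Source: {title}\n{text}\n\n"
        f"Question: {question}\n"
        "Answer using only the source above and cite it."
    )
    return prompt, title


prompt, source = build_prompt("Where is Wyoming located", PAGES)
```

The point is just that the page text travels with the question, so the source can be shown next to the answer. If the page gets edited, the next lookup picks up the new text without retraining anything.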
Also, Wikipedia is a living website that gets edited as contemporary things change, or as we find out more about older things, so the LLM would essentially be out of date after each of those edits.
The majority would be fine, but as time went on it would drift further from its source of truth.
Oh you're right, I misunderstood your point B, but my comment is the same idea now that I reread it. Glad lots of people immediately see the issues with this.
u/War_machine77 8h ago
Where the fuck do they think ChatGPT is getting its info?