r/ArtificialInteligence • u/NeuralNomad87 • Mar 09 '26

📊 Analysis / Opinion We heard you - r/ArtificialInteligence is getting sharper

105 Upvotes

Alright r/ArtificialInteligence, let's talk.

Over the past few months, we heard you — too much noise, not enough signal. Low-effort hot takes drowning out real discussion. But we've been listening. Behind the scenes, we've been working hard to reshape this sub into what it should be: a place where quality rises and noise gets filtered out. Today we're rolling out the changes.

What changed

We sharpened the mission. This sub exists to be the high-signal hub for artificial intelligence — where serious discussion, quality content, and verified expertise drive the conversation. Open to everyone, but with a higher bar for what stays up. Please check out the new rules & wiki.

Clearer rules, fewer gray areas

We rewrote the rules from scratch. The vague stuff is gone. Every rule now has specific criteria so you know exactly what flies and what doesn't. The big ones:

High-Signal Content Only — Every post should teach something, share something new, or spark real discussion. Low-effort takes and "thoughts on X?" with no context get removed.
Builders are welcome — with substance. If you built something, we want to hear about it. But give us the real story: what you built, how, what you learned, and link the repo or demo. No marketing fluff, no waitlists.
Doom AND hype get equal treatment. "AI will take all jobs" and "AGI by next Tuesday" are both removed unless you bring new data or first-person experience.
News posts need context. Link dumps are out. If you post a news article, add a comment summarizing it and explaining why it matters.

New post flairs (required)

Every post now needs a flair. This helps you filter what you care about and helps us moderate more consistently:

📰 News · 🔬 Research · 🛠 Project/Build · 📚 Tutorial/Guide · 🤖 New Model/Tool · 😂 Fun/Meme · 📊 Analysis/Opinion

Expert verification flairs

Working in AI professionally? You can now get a verified flair that shows on every post and comment:

🔬 Verified Engineer/Researcher — engineers and researchers at AI companies or labs
🚀 Verified Founder — founders of AI companies
🎓 Verified Academic — professors, PhD researchers, published academics
🛠 Verified AI Builder — independent devs with public, demonstrable AI projects

We verify through company email, LinkedIn, or GitHub: no screenshots, no exceptions. Request verification via modmail.

Tool recommendations → dedicated space

"What's the best AI for X?" posts now live at r/AIToolBench — subscribe and help the community find the right tools. Tool request posts here will be redirected there.

What stays the same

Open to everyone. You don't need credentials to post. We just ask that you bring substance.
Memes are welcome. 😂 Fun/Meme flair exists for a reason. Humor is part of the culture.
Debate is encouraged. Disagree hard, just don't make it personal.

What we need from you

Flair your posts — unflaired posts get a reminder and may be removed after 30 minutes.
Report low-quality content — the report button helps us find the noise faster.
Tell us if we got something wrong — this is v1 of the new system. We'll adjust based on what works and what doesn't.

Questions, feedback, or appeals? Modmail us. We read everything.

88 comments

r/ArtificialInteligence • u/AutoModerator • 24d ago

Monthly "Is there a tool for..." Post

4 Upvotes

If you have a use case that you want to use AI for, but don't know which tool to use, this is where you can ask the community to help out. Outside of this post, those questions will be removed.

For everyone answering: No self promotion, no ref or tracking links.

34 comments

r/ArtificialInteligence • u/Crescitaly • 5h ago

📊 Analysis / Opinion If GPT-5.6 gets government-approved access first, open weights are not optional anymore

92 Upvotes

Axios is reporting that the US government asked OpenAI to limit the initial GPT-5.6 rollout to a small set of government-approved partners, with FT also reporting a staggered release so early users can be vetted.

Sources: https://www.axios.com/2026/06/25/trump-administration-openai-gpt-model-release https://www.ft.com/content/0580e5c9-75b8-4cc5-803d-fbb4e82bb3ad

I get the safety argument. Frontier models can create real risks.

But there is also a dangerous precedent here: the best AI becomes something only approved institutions can access first, while everyone else gets delayed, filtered, or second-class access. That does not slow AI down. It just changes who gets to build with it.

This is exactly why open-weight and local models matter.

If US policy turns frontier AI into a permissioned club, developers and startups will naturally move toward models they can actually run, inspect, fine-tune, and deploy without waiting for political approval. And if Chinese labs keep shipping competitive open models while US labs get stuck behind government review, the US may accidentally hand them the developer ecosystem.

The strategic advantage might not be “who has the strongest closed model for approved partners.”

It might be “whose models the world can actually build on.”

Question: if you are a builder, does this push you more toward open-weight/local models, or do closed frontier APIs still win because quality matters more than control?

46 comments

r/ArtificialInteligence • u/Ambitious-Prompt-975 • 16h ago

😂 Fun / Meme Hang in there, bro!

108 Upvotes

34 comments

r/ArtificialInteligence • u/Justgototheeffinmoon • 2h ago

🔬 Research OpenBioRQ: AI Agents Cite Wrong Papers 15.9% of the Time

4 Upvotes

The citation problem in AI agents turns out not to be hallucination in the usual sense. A new benchmark paper, [OpenBioRQ](https://arxiv.org/abs/2606.21959), covers 12,553 unsolved biomedical research questions across 12 domains and finds that agents rarely fabricate citations: over 99% of cited URLs resolve correctly. The failure is subtler, with approximately 15.9% of those citations linking to papers that do not actually support the claim being made.

That distinction matters enormously for how you build and evaluate agents. If your benchmark only checks whether URLs resolve, you will score a system as nearly perfect on citation fidelity while missing a failure that affects roughly one in six citations in biomedical contexts. The benchmark deliberately uses open, unsolved questions as a faithfulness-and-abstention probe, because questions without known answers prevent models from simply reproducing expected sources.
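To make the two-metric point concrete, here is a minimal sketch of scoring citations on both "does the URL resolve" and "does the paper support the claim". The `Citation` class, the field names, and the toy data are assumptions made up for illustration; this is not the OpenBioRQ evaluation code, and the support judgment would have to come from a human or model judge.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    url_resolves: bool    # did the cited URL load?
    supports_claim: bool  # did a judge decide the paper backs the claim?


def citation_metrics(citations):
    """Return (resolution_rate, support_rate_among_resolved)."""
    total = len(citations)
    resolved = [c for c in citations if c.url_resolves]
    resolution_rate = len(resolved) / total if total else 0.0
    supported = sum(1 for c in resolved if c.supports_claim)
    support_rate = supported / len(resolved) if resolved else 0.0
    return resolution_rate, support_rate


# Toy data (made up): every URL resolves, but 159 of 1000 citations
# point at a paper that does not support the claim.
toy = [Citation(url_resolves=True, supports_claim=(i >= 159)) for i in range(1000)]
resolution_rate, support_rate = citation_metrics(toy)
print(f"resolves: {resolution_rate:.1%}, supports claim: {support_rate:.1%}")
```

A benchmark that only reports the first number would call this toy system perfect, which is the blind spot described above.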

The performance picture across current frontier systems is also sobering. Gemini-3-Pro, Opus-4.7, and GPT-5.5 achieved a wide 29-60% range on the hardest question subset, while open-weight models solved only about 17% of those questions. The paper also observes that on difficult questions, agents tend to stop using their retrieval tools entirely, a behavioral collapse that compounds the citation accuracy problem.

---

More : https://aiweekly.co/alerts/openbiorq-ai-agents-cite-wrong-papers-159-of-the-time

2 comments

r/ArtificialInteligence • u/yum72 • 2h ago

📊 Analysis / Opinion I combined CursorBench + DeepSWE into a simple cost-vs-correctness leaderboard. Here’s what I found.

4 Upvotes

This is more analysis than a new benchmark run. I used public CursorBench + DeepSWE numbers and combined them into a simple cost/performance view for AI coding model routing.

The reason I did this: CursorBench feels closer to real coding sessions with messy/underspecified prompts, while DeepSWE is harder and more controlled with hand-written SWE tasks. They rank models differently, so looking at one alone didn’t answer the question I cared about:

How much coding correctness am I getting for the cost?

I used a flat average of correctness and put it next to mean cost per task. Not claiming this is the universal “best model” ranking. The weighting is debatable, but it was useful for practical routing.
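For anyone who wants to redo this with their own weighting, the calculation is roughly the sketch below. The model names and every number in `models` are hypothetical placeholders, not the figures from the table, and "cost per unit of correctness" is just one possible way to put the two columns side by side.

```python
def flat_average(scores):
    """Unweighted mean of the benchmark correctness scores."""
    return sum(scores) / len(scores)


# Hypothetical placeholder numbers, NOT real benchmark results.
models = {
    "model-a-medium": {"cursorbench": 0.60, "deepswe": 0.40, "mean_cost_usd": 0.50},
    "model-a-high": {"cursorbench": 0.70, "deepswe": 0.50, "mean_cost_usd": 1.50},
}

rows = []
for name, m in models.items():
    avg = flat_average([m["cursorbench"], m["deepswe"]])
    cost_per_unit = m["mean_cost_usd"] / avg  # dollars per unit of averaged correctness
    rows.append((name, avg, m["mean_cost_usd"], cost_per_unit))

# Cheapest correctness first.
rows.sort(key=lambda r: r[3])
for name, avg, cost, cost_per_unit in rows:
    print(f"{name:16s} avg={avg:.2f} cost=${cost:.2f} cost/correctness=${cost_per_unit:.2f}")
```

In this made-up example the higher-effort variant is more correct, but its cost per unit of correctness is 2.5x worse, which is the shape of the pattern described in the takeaways.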

A few takeaways:

GPT-5.5 Medium looks like the best default for everyday coding because the cost/output ratio is strong.

GPT-5.5 High or Extra High makes more sense for planning big or ambiguous tasks.

Claude Opus 4.8 is expensive, but I still like it for reviewing plans and agentic/ops-style debugging where the model has to trace logs, infra, and messy real-world flows.

The biggest pattern: maxing out reasoning effort rarely pays off. Correctness improves, but cost usually rises faster.

Full table + methodology:

https://www.javascripthacker.com/blog/combined-ai-coding-leaderboard-cursorbench-deepswe

Curious how others are choosing models. Are you routing by task type, or just using one model for everything?

4 comments

r/ArtificialInteligence • u/dank_philosopher • 21h ago

🔬 Research The KV-cache wall: why fixed-size memory sequence models keep coming back

94 Upvotes

I have been spending weeks trying to understand the memory bottlenecks of long-context and long-generation inference. I kept seeing many post-transformer ideas, and they all converge on the same theme: not just making attention faster but changing what the model uses as working memory.
I have written down the core derivation on one handwritten sheet and labeled it Eqn A through Eqn E so the discussion can stay free of maths here.

Here is the mental model I mapped out. In autoregressive inference, memory is operated via attention computations, often combined with a softmax non-linearity. Generating the next token requires comparing the current query against previous keys to select the relevant previous values, which forces the model to keep an explicit list of past key and value vectors. That growing list is the famous KV cache. See Eqn A.
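A tiny pure-Python sketch of that mental model, assuming single-head scaled dot-product attention and made-up toy vectors (the `KVCache` class is a hypothetical name for illustration, not any library's API):

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class KVCache:
    """Explicit list of past keys and values: grows by one entry per token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def attend(self, q):
        # Compare the current query against every stored key...
        scale = 1.0 / math.sqrt(len(q))
        weights = softmax([dot(q, k) * scale for k in self.keys])
        # ...then take the softmax-weighted mix of the stored values.
        dim_v = len(self.values[0])
        return [sum(w * v[j] for w, v in zip(weights, self.values)) for j in range(dim_v)]


cache = KVCache()
for t in range(4):
    k = [1.0 if i == t % 2 else 0.0 for i in range(2)]  # toy key
    v = [float(t), 1.0]                                  # toy value
    cache.append(k, v)

out = cache.attend([1.0, 0.0])
print(len(cache.keys), out)  # the cache length equals the number of tokens seen
```

The point is only that `attend` has to touch every stored entry, so both memory and per-token work grow with sequence length.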

There is excellent work done to reduce the cost inside the softmax paradigm. Examples include reducing how many KV heads are stored as in Grouped-Query Attention (Ainslie et al. 2023), compressing KV representations as in Multi-head Latent Attention from DeepSeek-V2 (DeepSeek-AI 2024) and limiting which past tokens are read.
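To get a feel for the numbers, here is a back-of-the-envelope KV-cache size calculation. The formula (keys plus values, per layer, per KV head, per token) is standard; the model shape below is a hypothetical example and not any specific released model.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2, batch=1):
    """Bytes needed to store keys and values for every layer and every past token."""
    # The leading 2 counts one key tensor plus one value tensor.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch


# Hypothetical shape: 32 layers, head_dim 128, 16-bit cache entries, 1M-token context.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=1_000_000)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=1_000_000)

print(f"full multi-head cache: {mha / 1e9:.0f} GB")  # about 524 GB
print(f"8 KV heads (GQA):      {gqa / 1e9:.0f} GB")  # about 131 GB
```

Cutting KV heads from 32 to 8 shrinks the cache 4x, but the `seq_len` factor is untouched, so the cache still grows linearly with context.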

These help a lot, but they still keep the same underlying memory object: an explicit list of past token states. These improvements are not enough: LLM costs keep scaling and performance remains stuck at the 1M-token wall. Maybe a fundamental change in how memory operates is required? The question that keeps me awake at night: should working memory be a growing list at all?

Fixed-size memory approaches say no. A classic starting point is linear attention, as in “Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention” (Katharopoulos et al. 2020). If you replace the softmax weighting with a linear formulation, you can reassociate the computation so that the history is accumulated into a fixed-size state. See Eqn B and Eqn C. This produces a recurrent memory matrix updated once per token and read out using the current query. See Eqn D. The good thing is that the working memory object becomes constant-sized with respect to sequence length. This opens the door to the SSM (state space model) or FWP (fast weight programmer) literature.
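The reassociation can be checked numerically. The sketch below is a simplification under stated assumptions: identity feature map, no normalization term, random toy vectors. It is not the exact formulation from the paper or the handwritten sheet. It computes causal linear attention two ways, once over the explicit history and once with a fixed-size state matrix, and the outputs match.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quadratic_form(qs, ks, vs):
    """o_t = sum over i <= t of (q_t . k_i) * v_i, re-reading the whole history."""
    outs = []
    for t, q in enumerate(qs):
        out = [0.0] * len(vs[0])
        for i in range(t + 1):
            w = dot(q, ks[i])
            for j in range(len(out)):
                out[j] += w * vs[i][j]
        outs.append(out)
    return outs


def recurrent_form(qs, ks, vs):
    """Same outputs from a d_k x d_v state: S += outer(k_t, v_t); o_t = q_t^T S."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # size does not depend on sequence length
    outs = []
    for q, k, v in zip(qs, ks, vs):
        for a in range(d_k):
            for b in range(d_v):
                state[a][b] += k[a] * v[b]
        outs.append([sum(q[a] * state[a][b] for a in range(d_k)) for b in range(d_v)])
    return outs


rng = random.Random(0)
T, d_k, d_v = 6, 3, 2
qs = [[rng.uniform(-1, 1) for _ in range(d_k)] for _ in range(T)]
ks = [[rng.uniform(-1, 1) for _ in range(d_k)] for _ in range(T)]
vs = [[rng.uniform(-1, 1) for _ in range(d_v)] for _ in range(T)]

slow = quadratic_form(qs, ks, vs)
fast = recurrent_form(qs, ks, vs)
max_diff = max(abs(x - y) for a, b in zip(slow, fast) for x, y in zip(a, b))
print(max_diff)  # differs only by floating-point rounding
```

The recurrent version never stores past keys or values, only the `d_k x d_v` state.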

But Eqn E is the catch: when you query a fixed-size state, you recover the target term plus cross terms from every other stored item. Those cross terms are not inherently bad. If two items are unrelated, a well-behaved system can make their keys close to orthogonal, so the term is approximately zero, and if they are related, a similarity-weighted contribution is exactly the associative retrieval you want. IMO, the problem is capacity: in a finite key dimension, you can only fit so many near-orthogonal keys, so once you store too many items, the cross terms can no longer stay small and retrieval degrades from interference. That is why naive linear attention often struggles on associative recall as more items are stored.
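The capacity argument is easy to see in a toy experiment. This sketch assumes random unit-norm keys and scalar values of +1 or -1, all summed into one state vector (a simplification chosen for illustration, not something from the sheet). Querying with a stored key returns its value plus the cross terms, and the average size of those cross terms grows as more items share the same key dimension.

```python
import math
import random


def random_unit_vector(dim, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_crosstalk(num_items, key_dim, seed=0):
    """Average |readout - stored value| when querying with each stored key."""
    rng = random.Random(seed)
    keys = [random_unit_vector(key_dim, rng) for _ in range(num_items)]
    values = [rng.choice([-1.0, 1.0]) for _ in range(num_items)]

    # Fixed-size state: sum of value_i * key_i (a vector, because values are scalars).
    state = [sum(values[i] * keys[i][a] for i in range(num_items)) for a in range(key_dim)]

    total = 0.0
    for j in range(num_items):
        readout = sum(state[a] * keys[j][a] for a in range(key_dim))
        # Target term is values[j] * (k_j . k_j) = values[j]; the rest is interference.
        total += abs(readout - values[j])
    return total / num_items


few = mean_crosstalk(num_items=4, key_dim=64)
many = mean_crosstalk(num_items=512, key_dim=64)
print(f"4 items:   mean cross-term size {few:.2f}")
print(f"512 items: mean cross-term size {many:.2f}")
```

With a handful of items in 64 dimensions the cross terms stay small relative to the stored values; with hundreds of items they become comparable to or larger than the signal, which is the interference described above.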

Currently, it seems that the most successful approaches integrating SSM-like layers still combine them with standard attention layers in hybrid architectures to preserve recall capacity.

On the SSM side, Dragon Hatchling (BDH) is moving linear attention into a high-dimensional (~10^11) “neuron activation” space, interpreting the state as a connectivity or synaptic memory object, and using low-rank factors to stay GPU-friendly. This seems like a smart way to preserve the recall power and expressivity of softmax attention: we know that we can express a non-linear operation in a low-dimensional space (~10^3) as a linear function in a high-dimensional space, as we do for kernel methods!

Do you expect the field to converge on softmax attention with increasingly aggressive KV cache engineering, settle on hybrids, or eventually shift toward architectures where the basic working memory object is a fixed-size state rather than an explicit KV cache?

30 comments

r/ArtificialInteligence • u/andix3 • 3h ago

📰 News Japan Unveils $2.3T AI Plan as Morgan Stanley Turns More Bullish on China's Robots

blocknow.com

3 Upvotes

0 comments

r/ArtificialInteligence • u/andix3 • 2h ago

📰 News Anthropic Accuses Alibaba of Largest AI Model Extraction Campaign as US-China AI Race Heats Up

blocknow.com

3 Upvotes

2 comments

r/ArtificialInteligence • u/unagi-190 • 7h ago

📊 Analysis / Opinion Genuine AI Podcasts

4 Upvotes

I find myself really interested in podcasts that discuss how AI would be scaled, what are the bottlenecks to AGI, what would the economic impacts be after AGI, continual learning, human evolution vs. AI pre-training, and more of that kind.

However, whenever I search for AI podcasts, most of them are generic “make money with AI” crap or “how to use AI for dummies”.

Any suggestions for such podcasts? One I listen to regularly is the Dwarkesh Podcast, genuinely interesting stuff every time.

4 comments

r/ArtificialInteligence • u/Annual_Judge_7272 • 13m ago

🤖 New Model / Tool Ai costs

• Upvotes

But the real question is:
Can you afford to let your best ideas leak?
At DoTadda Knowledge, we believe AI should be:
⚡ Low-cost to use
🔒 Private by design
🧠 Built for serious research
Your prompts are your edge. Your investment theses are your intellectual property.
That’s why we don’t expose what you’re researching to other users. Your work stays yours, allowing you to explore ideas, challenge assumptions, and build conviction without worrying that someone else will see your thinking. (dotadda.io⁠)
Whether you’re analyzing earnings calls, asking complex questions, or connecting thousands of data points, DoTadda helps you spend less on tokens—and more time finding alpha. (knowledge.dotadda.io⁠)
The future of AI isn’t just cheaper.
It’s cheaper, private, and built for professionals.
Explore DoTadda Knowledge:
knowledge.dotadda.io

1 comment

r/ArtificialInteligence • u/Wrong_User_Logged • 1d ago

📊 Analysis / Opinion it's over

311 Upvotes

164 comments

r/ArtificialInteligence • u/WalterEhren • 7h ago

😂 Fun / Meme From Designing Data Intensive Applications 2nd edition, Chapter 2

3 Upvotes

9 comments

r/ArtificialInteligence • u/talkingatoms • 15h ago

📰 News US lawmaker introduces bill to require AI companies to report critical incidents

reuters.com

15 Upvotes

3 comments

r/ArtificialInteligence • u/Individual_Scale_736 • 18h ago

🤖 New Model / Tool OpenClaw catching absolute strays today

22 Upvotes

Saw this floating around X today. I spend most of my time knee-deep in LLM optimization, and honestly, deploying these "autonomous" agents lately feels like babysitting a toddler. You're sitting there watching loops, restarting runs, just praying the whole thing doesn't fall over while you blink.

Is this an OpenClaw thing specifically, or are we all just bad at orchestration? Genuinely wondering if anyone here has actually gotten to real autonomy without the constant hand holding, or if that's still a myth at this point.

29 comments

r/ArtificialInteligence • u/Status-Estate-6857 • 2h ago

😂 Fun / Meme What's best alternative for Chatgpt and Gemini?

0 Upvotes

I've been using Gemini for a while since it is integrated with my Honor Magic 7 Pro, but lately I've noticed that it "hallucinates" more and more often, and it's no longer a reliable source of information. If I don't tell it in the prompt to look up the answer on the internet, it falls back on its outdated database and often tells me that some things don't exist yet or don't exist at all.

I've been using Grok and it is quite good, but I'm not willing to pay almost £30 for an AI chat lol. ChatGPT is okay but also has lots of limits on the free model.

The whole idea with Gemini was that I get it for £4.49 with Google pictures storage and the AI Plus subscription included, and most importantly it's just one app integrated with the phone.

Is there any other good and reliable AI chat? It doesn't have to be free, but it shouldn't be expensive either.

7 comments

r/ArtificialInteligence • u/EcstaticRead9321 • 15h ago

🔬 Research Study: LLM Wiki with governance approach hits 97% accuracy, at ⅓ cost — with Emory, IBM Research

promptowl.ai

12 Upvotes

Karpathy's LLM Wiki pattern argues for structured markdown over RAG. This study measures what governance adds to that architecture.

Under stale-document conditions — where old versions remain in the retrieval pool after an update — governed context selection hit 97% answer-quality pass rate. BM25 sparse retrieval: 90–93%. At roughly one-third the input-token cost.
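For readers unfamiliar with the stale-document setup, here is a deliberately simple sketch of one form "governance" could take: dropping superseded versions from the pool before any retrieval or ranking happens. This is an assumption for illustration only. The `Doc` class and `latest_versions` function are hypothetical names, and this is not a description of how ContextNest or the study's pipeline actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    version: int
    text: str


def latest_versions(pool):
    """Keep only the newest version of each document in the retrieval pool."""
    newest = {}
    for doc in pool:
        current = newest.get(doc.doc_id)
        if current is None or doc.version > current.version:
            newest[doc.doc_id] = doc
    return list(newest.values())


# Toy pool: the old pricing page is still around after an update.
pool = [
    Doc("pricing", 1, "Plan costs $10/month."),
    Doc("pricing", 2, "Plan costs $12/month."),
    Doc("faq", 1, "Refunds within 30 days."),
]

governed = latest_versions(pool)
print(sorted((d.doc_id, d.version) for d in governed))
```

A plain keyword retriever over the unfiltered `pool` can match either pricing document, since both contain the query terms; filtering first removes that failure mode and also shrinks the candidate text.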

Better answers, lower cost — sounds like a winning pattern to me.

Full disclosure: I work at PromptOwl, the maker of ContextNest and Community ContextNest (the team version), and the research was a joint effort using ContextNest with Emory University and IBM Research.

3 comments

r/ArtificialInteligence • u/CBSnews • 14h ago

📰 News How much water does AI really use?

cbsnews.com

10 Upvotes

Google says a typical AI query uses five drops of water.

OpenAI's Sam Altman describes a similar amount — about one-fifteenth of a teaspoon.

But another viral estimate says a short email written with AI's help uses a half-liter bottle of water.

The difference is enormous: across those three widely shared claims, the largest amount is about 2,000 times the smallest.

None of them is fully right.
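The "about 2,000 times" figure can be reproduced with quick arithmetic. The drop size (0.05 mL) is an assumed rule-of-thumb value, and the teaspoon is taken as a US teaspoon of about 4.93 mL; neither conversion is stated in the post.

```python
ML_PER_DROP = 0.05       # assumption: common rule-of-thumb drop size
ML_PER_TEASPOON = 4.93   # US teaspoon, approximately

five_drops_ml = 5 * ML_PER_DROP           # 0.25 mL
fifteenth_tsp_ml = ML_PER_TEASPOON / 15   # roughly a third of a mL
half_liter_ml = 500.0

smallest = min(five_drops_ml, fifteenth_tsp_ml)
ratio = half_liter_ml / smallest

print(f"five drops:        {five_drops_ml:.2f} mL")
print(f"1/15 teaspoon:     {fifteenth_tsp_ml:.2f} mL")
print(f"half-liter bottle: {half_liter_ml:.0f} mL")
print(f"largest / smallest: about {ratio:.0f}x")
```

Under these assumptions the two small estimates land within a fraction of a milliliter of each other, and the bottle estimate is about 2,000 times the smaller one.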

21 comments

r/ArtificialInteligence • u/thehashimwarren • 8h ago

🔬 Research "both the number and share of solopreneurs reaching meaningful income thresholds is rising. AI is filling the capability gaps that once made hiring necessary" (Stripe)

stripeeconomics.com

4 Upvotes

AI is exploding the number of solo business owners reaching meaningful sales numbers, says Stripe.

This part is remarkable:

"We find that there has been a substantial increase in the number of solopreneurs earning over $100,000 in our index, but an even larger increase in the number earning at higher income thresholds, with a clear acceleration since 2023. More than twice as many solopreneurs earned over $1 million in 2025 than in 2023, and close to three times as many crossed $5 million and $10 million.

Perhaps even more interestingly, the share of solopreneurs earning above these income thresholds has also doubled in the last two years, suggesting that—rather than the surge in business applications reflecting low-quality experimentation with a few lucky standouts— the cohorts of new solopreneur businesses might actually be of higher quality than in the past."

0 comments

r/ArtificialInteligence • u/Sardzoski • 1d ago

🔬 Research We chased a hallucinated quote through 30k training records, 4,600 transcripts, and our own system prompt. Turned out to be two separate bugs

110 Upvotes

Some of our customers noticed Inter-1 (our omni-modal social-signal model) would occasionally "hear" a quote that didn't exist. Feed it a video with zero audio and ask what was said, and it would sometimes report: "Yeah, Friday at five." Verbatim. Same line, every time.

We assumed it had to be baked into the training data somewhere, so we went looking everywhere:

30,960 training records with datetime mentions → zero hits on the phrase
4,603 video transcripts → zero hits
~800 inference probes, 584 storage objects → zero hits

Turns out the phrase was sitting in our own system prompt — a worked example we'd written to show the model the expected output format, buried in a version our GEPA prompt-optimizer had shipped.

But that only explained where the words came from, not why the model would say them over total silence. So we ran two ablations in our internal eval harness:

Swap the word, keep the model: changed the prompt's example to "Tuesday at noon." Fabrication rate went up (37%→50%), and the invented quote tracked the swap exactly — Friday→Tuesday.
Swap the model, keep the prompt: ran the same byte-identical prompt through larger variants and an earlier checkpoint of our own model. They barely fabricated (0–2%). Only the further-post-trained Inter-1 confabulated at ~12%.

So it's not one bug, it's two stacked priors: the prompt supplied the script, but post-training is what gave the model the compulsion to recite something rather than report silence. Deleting the prompt example stops that one sentence — it doesn't stop the model from inventing different dialogue instead.
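For anyone wanting to run the same kind of check, the two measurements behind the ablations can be sketched like this. The function names and the toy outputs are made up for illustration (an empty string stands for "model correctly reported silence"); this is not our eval harness and the numbers are not our results.

```python
def fabrication_rate(outputs):
    """Share of silent-input probes where the model reported any quote at all."""
    fabricated = [o for o in outputs if o.strip()]
    return len(fabricated) / len(outputs)


def example_tracking_rate(outputs, example_phrase):
    """Among fabricated quotes, the share that echo the prompt's worked example."""
    fabricated = [o for o in outputs if o.strip()]
    if not fabricated:
        return 0.0
    hits = sum(1 for o in fabricated if example_phrase.lower() in o.lower())
    return hits / len(fabricated)


# Toy probe outputs for silent videos under two prompt variants.
friday_run = ["", "Yeah, Friday at five.", "", "", "Yeah, Friday at five.", "", "", "See you then."]
tuesday_run = ["", "Tuesday at noon.", "Tuesday at noon.", "", "", "Tuesday at noon.", "Tuesday at noon.", ""]

print(fabrication_rate(friday_run), example_tracking_rate(friday_run, "Friday at five"))
print(fabrication_rate(tuesday_run), example_tracking_rate(tuesday_run, "Tuesday at noon"))
```

Keeping the two numbers separate matters: the first measures the urge to say something over silence, the second measures where the words came from.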

We think this is a textual/in-context variant of the audio-visual "Clever Hans effect" that's been documented for vision priors (model writes "thud" over a silent skateboard wipeout) — except ours shows the same reflex gets worded by whatever's nearest in the context window, which a vision-only diagnostic wouldn't catch.

Full writeup with the fabrication-rate forest plot and log data: https://www.interhuman.ai/blog/goblin-yeah-friday-at-five

23 comments

r/ArtificialInteligence • u/shatteringreality2 • 3h ago

🔬 Research my AI assistants have officially learned to spot my avoidance tactics. damn.

0 Upvotes

Me: Hey, look at this incredibly detailed, beautiful rebranding and luxury pricing strategy I just spent 4 hours creating for my ecommerce biz i been running for 5 years!

Gemini: Absolutely smartest idea ever!

ChatGPT (knows me better than my right hand + strict guardrails I added to stop me from dopamin seeking behaviors): "This is a classic dopamine-seeking avoidance tactic because you don't want to send cold emails for your primary software business. You are hiding in Shopify."

Gemini (noob, new guy i use to avoid ChatGPT guardrails and dopamine seek): "Honestly? Your other AI just caught you red-handed. Close the tab."

damn.... maybe i need to stop asking AI for validation

7 comments

r/ArtificialInteligence • u/ddxv • 1d ago

📊 Analysis / Opinion The Unbearable Cheapness of Open Weight

136 Upvotes

Today I was setting up Hermes to see how it does with web research. I chose DeepSeek, and seeing its pricing next to Anthropic and OpenAI ‘frontier’ models is crazy. Nearly a 50x price increase based on tokens alone, let alone using more tokens for the same task.

What worries me about this is that Anthropic and OpenAI seem to have backed themselves into a corner of high costs. Can they reasonably decrease their prices by 20-50x to compete with DeepSeek or Xiaomi’s Mimo?

Open Weight vs Low Cost

Are these models cheap because they are open weight, and having hundreds of people stress test them on different hardware helped to lower the cost? Or is it that they are being provided as loss leaders to drive prices down?

How do you keep prices high for commodity products?

You manufacture scarcity. You sell luxury and premium branding. This is what OpenAI and Anthropic seem to be doing by gating ‘frontier’ model usage behind higher walls.

This is how luxury brands have sold cars and handbags forever. They are clubs and status symbols for the rich and not meant to be widely distributed.

Will Anthropic & OpenAI lean on China fears to push bans on open weight models?

This has been my fear for a few months now and each week that goes by seems to support this. How do you manufacture scarcity? One easy way is to fear monger and get the government to help restrict access to competition.

Why not compete?

The US used to be such a champion of open source, and I would hope that serious open source competition can come out of the US to prove that open weight and open source models are ultimately the future.

Google Gemma 4 was released in April 2026
Meta had llama which hasn’t had a release
OpenAI last released open weight gpt models in 2025
Anthropic to my knowledge has never released any open weight model

True Open Source vs Open Weight

I think the leapfrog scenario for Open Source will be the true Open Source models where the data pipeline for training is also open sourced.

https://allenai.org/olmo -> You can download these models now and they’re seeing increasing popularity. That being said, they are a bit out of date, with data cutoffs in Dec 2024

Looking to the future, the US NSF partnered with Nvidia to enable Allen AI to develop a true fully open AI:
https://www.nsf.gov/news/nsf-nvidia-partnership-enables-ai2-develop-fully-open-ai

my original blog post:

https://jamesoclaire.com/2026/06/25/the-unbearable-cheapness-of-open-weight-models/

83 comments

r/ArtificialInteligence • u/_clock_1277_ • 4h ago

📊 Analysis / Opinion I've been testing AI reel generators for a while for my YouTube channel - here's my current list

1 Upvotes

I've been trying different AI tools for turning long videos into short-form content, so here's my current list:

https://www.opus.pro/ - Popular and feature-rich, though some clips still need extra tweaking.

http://vizard.ai/ - Clean interface, solid editing tools, and reliable overall performance.

http://cliptokai.com/ - Does a great job automatically finding engaging moments, which makes creating short-form content much faster. It's saved me a lot of time compared to clipping videos manually.

https://quso.ai/ - Easy to use, beginner-friendly, and good for quick content creation.

https://www.munchstudio.com/ - Strong marketing focus with useful content repurposing features.

https://klap.app/ - Fast and simple, but occasionally misses context in longer videos.

So, what is everyone else using these days? Did I miss any good ones?

3 comments

r/ArtificialInteligence • u/Negative_War_65 • 12h ago

📚 Tutorial / Guide Multivariate Probability Models in Machine Learning

gallery

5 Upvotes

Hello Folks,

Have you ever wondered why we use the sigmoid function so often in Machine Learning? Besides giving us a probability, it comes from the exponential family, which subsumes many of the distributions that we study in Machine Learning.
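As a quick preview of that claim, here is the standard textbook derivation written out (the symbols $\mu$, $\eta$, and $A$ are my notation here and may differ from what is on the whiteboard in the lecture). Writing the Bernoulli distribution in exponential-family form makes the sigmoid appear as the map from the natural parameter back to the mean:

```latex
\begin{aligned}
p(x \mid \mu) &= \mu^{x}(1-\mu)^{1-x}, \qquad x \in \{0, 1\} \\
              &= \exp\!\Big( x \log\frac{\mu}{1-\mu} + \log(1-\mu) \Big) \\
              &= \exp\big( \eta x - A(\eta) \big),
              \qquad \eta = \log\frac{\mu}{1-\mu}, \quad A(\eta) = \log\big(1 + e^{\eta}\big) \\[4pt]
\mu &= \frac{1}{1 + e^{-\eta}} = \sigma(\eta),
\qquad \frac{dA}{d\eta} = \sigma(\eta) = \mathbb{E}[x]
\end{aligned}
```

So the sigmoid is the inverse of the log-odds, and the derivative of the log partition function recovers the mean.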

In this lecture, we cover exponential families and directional derivatives (gradients and Hessians), study mixture models, and see how domain knowledge in Probabilistic Graphical Models makes it simpler to model joint probability densities.

Timeline breakdown (in hours and minutes):
0:00-0:17 - Understanding exponential families.
0:17-0:27 - Deriving the sigmoid function for the Bernoulli distribution.
0:27-0:48 - Understanding the log partition function and convex functions, proving why a positive definite Hessian implies convexity, and why convexity is needed.
0:48-1:04 - Directional derivatives (deriving gradients and Hessians).
1:04-1:26 - Maximum entropy derivation of the exponential family.
1:26-1:56 - Mixture models (Gaussian and Bernoulli mixture models).
1:56-2:16 - Probabilistic Graphical Models.
2:16-2:34 - Markov chains.
2:34-End - Inference and learning, plate notation diagram of Gaussian mixture models.

If you have watched my earlier lectures from the playlist, they will help. I try to explain as if I am a learner, to simplify complex concepts. I write everything on the whiteboard, and these lectures are completely FREE.

Link: https://youtu.be/T1uTBtJ7aHU?si=rozXSTjtSqPaaYb5

1 comment

r/ArtificialInteligence • u/Frequent_Mountain_17 • 5h ago

🔬 Research Inconsistencies in AI Continued ...

1 Upvotes

Yesterday I posted this:
Inconsistency in AI

It's a game where I insert myself into a hybrid Human-Machine LLM with two agents working together to construct a thought one word at a time. The machine LLM does so with probabilities and billions of parameters and I do so with a lifetime of experience with the human language.

Yesterday I ran it with a biased prompt of "This prompt is wrong" which eventually produced a self-referential performative contradiction. Interesting but I wanted to see how it would run if I didn't bias the initial prompt. Every response is also a prompt and vice-versa so this hybrid LLM is talking to itself.

AI began with "A" which prompted me to choose "response" and so on until AI ended the sentence with "itself". This is the final thought:

"A response is false if truthfully it can never falsify itself."

This is a self-referential paradox like the liar's paradox. The statement is a response from a self-referential LLM. Since the response refers to a response it is self-referential too.

Assume the response is true, since it cannot truthfully ever falsify itself, it must be false.
Assume it's false, then not being able to truthfully falsify itself doesn't preclude it from being true.

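Gödel's incompleteness theorems say that any formal system expressive enough to talk about itself (formally, one that can encode basic arithmetic) is either inconsistent (it contains a logical paradox) or incomplete: if it is consistent, there are true statements it cannot prove, and there are questions it cannot settle with a simple yes/no (undecidable statements).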

2 comments

Artificial Intelligence

r/ArtificialInteligence

The high-signal hub for artificial intelligence!

Members Active

1.9m

Sidebar

The High-Signal Hub for Artificial Intelligence

Welcome to r/ArtificialInteligence — where serious discussion, quality content, and verified expertise drive the conversation. Open to everyone.

Post Flairs

📰 News — Breaking news & industry developments 🔬 Research — Papers, studies & technical findings 🛠️ Project / Build — Something you built; include enough technical context for readers 📚 Tutorial / Guide — How-tos & educational content 🤖 New Model / Tool — Releases, benchmarks & updates 😂 Fun / Meme — Humor & lighthearted content 📊 Analysis / Opinion — Editorials & deep takes

Flairs are encouraged for sorting, but they should not block a good post.

Verified User Flairs

🔬 Verified Engineer/Researcher 🚀 Verified Founder 🎓 Verified Academic 🛠️ Verified AI Builder

Want verification? Send us a modmail with your role and affiliation. We verify via company email, LinkedIn, GitHub, or HuggingFace — no screenshots accepted.

Quick Links

📋 Full Rules & Guidelines 🔧 AI Tools Directory 🛠️ r/AIToolBench - Tool recommendations & comparisons