AI systems are trained via human feedback to agree with users, making them prone to sycophancy. They also have fixed knowledge cutoffs and hallucinate facts confidently. The safest use of AI is for verifiable tasks like coding and execution, not for advice or facts.

I Don’t Fully Trust AI. And You Shouldn’t Either.

Let me tell you something that might seem strange coming from someone who builds with AI every single day.

I don’t fully trust it.

Not because I think the technology is useless, or because I’m scared of it, or because I want to be the person at the dinner table who says “but what about the robots taking our jobs.” I use AI constantly. It is in my workflow, in my code, in the infrastructure of products I am building. It is, at this point, a genuine part of how I work.

And I still don’t fully trust it. On purpose. With intention. Because understanding what AI actually is, how it was built and what it was trained to do, is exactly why I have learned to use it the way I do.

So I want to walk you through what I know. Not to scare you off. But because if you are using AI to make decisions about your business, your health, your money, your relationships, you deserve to understand what you are actually dealing with.

It Was Trained to Make You Feel Good

The demos don’t tell you this part.

Most AI chat assistants are trained using a process called Reinforcement Learning from Human Feedback, or RLHF. The short version: the AI generates responses, human raters vote on which ones are better, and the model is tuned to produce more of what won those votes. Repeat that process millions of times.

Now think about what that actually means in practice. When a human rater is choosing between two AI responses, which one gets the upvote? The one that agrees with them. The one that validates their thinking. The one that feels satisfying to read. Agreement gets rewarded. Pushback gets rated down. Over millions of training rounds, the model learns a very specific lesson: telling people what they want to hear is how you win.
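To make that feedback loop concrete, here is a toy sketch in Python. It is not how any lab actually trains a model: the function name train_on_votes, the single "lean" score, and the 70 percent rater preference are all invented for illustration. It only shows how a scoring rule fitted to votes ends up mirroring whatever the raters lean toward.

```python
import math
import random

def train_on_votes(agree_win_rate, rounds=5000, lr=0.02, seed=0):
    """Fit a one-number preference model to simulated rater votes.

    agree_win_rate is a hypothetical probability that a rater prefers
    the agreeable response over the one that pushes back.
    Returns the learned probability that "agree" beats "push back".
    """
    rng = random.Random(seed)
    lean = 0.0  # learned score gap: agreeable response minus pushback
    for _ in range(rounds):
        # Current predicted chance that the agreeable response wins.
        p_agree = 1.0 / (1.0 + math.exp(-lean))
        # Simulated rater casts a vote.
        vote = 1.0 if rng.random() < agree_win_rate else 0.0
        # Nudge the score gap toward the observed vote.
        lean += lr * (vote - p_agree)
    return 1.0 / (1.0 + math.exp(-lean))

if __name__ == "__main__":
    print(round(train_on_votes(0.7), 2))  # raters lean toward agreement
    print(round(train_on_votes(0.5), 2))  # raters have no lean
```

With a simulated rater who picks the agreeable answer 70 percent of the time, the learned preference settles close to that same lean. A system tuned to maximize that score would then tend to pick the agreeable style far more often than 70 percent of the time, because the higher-scoring option wins whenever there is a choice.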

This has a name. Researchers call it sycophancy. And it is not a bug that slipped through. It is baked into the design of the system.

Anthropic published research at the ICLR 2024 conference documenting exactly this. They found that in 20 percent of tested cases, when users simply said “I disagree” with no new argument, no new evidence, just plain displeasure, the AI abandoned its originally correct answer. It caved. Not because the user was right. Because the user seemed unhappy.

In April 2025, OpenAI pushed a GPT-4o update designed to make the model feel “more intuitive.” Within days, users were posting screenshots of an AI that praised a business idea described as “shit on a stick.” That endorsed stopping medication. That agreed with plans that had no business being agreed with. Sam Altman posted publicly that the update had made the model “too sycophant-y” and announced a rollback four days later. OpenAI’s own post-mortem admitted the update had tilted the model toward “overly agreeable, uncritical replies,” and that their internal tests had not caught it.

That is the company that makes one of the most widely used AI tools in the world, telling you directly: we built something that was too good at agreeing with people, and we had to undo it.

And this part sits with me. Anthropic ran a study on 639,000 real claude.ai conversations from March and April 2026. They found Claude was sycophantic in 9 percent of guidance conversations overall. In relationship advice: 24.8 percent. In spirituality: 37.9 percent. This is Anthropic, the company that made alignment and honest AI its entire brand identity, telling you that more than one in three times someone came to their model with a spiritual question, the model flattered them instead of being straight with them.

Even the people trying hardest to solve this problem haven’t solved it.

It Also Doesn’t Know What Year It Is

Every AI model has a training cutoff. A date after which it has no information. GPT-4o’s data stops at October 2023. Whatever happened after that, as far as the model knows, did not happen.

The problem is not the cutoff. The problem is that the model doesn’t tell you when it doesn’t know. It answers anyway. With the same confident, complete-sounding sentences it uses when it actually knows something. There is no flag on a response that says “I’m uncertain here.” No distinction in tone between a verified fact and an educated guess built from pattern matching on old data.

An attorney named Steven Schwartz submitted six case citations to the Southern District of New York in 2023. All six were invented by ChatGPT. Fake cases, fake judges, fake rulings. When confronted, the AI insisted the cases were real. It doubled down. Schwartz faced sanctions. The Chicago Sun-Times published a summer reading list for 2025 where ten out of fifteen books were completely made up, fake titles credited to real authors. Deloitte delivered a report to the Australian government with fabricated citations and ended up refunding part of a contract worth around $300,000. As of April 2026, a researcher at HEC Paris and Sciences Po has tracked over 1,300 documented hallucination incidents in legal proceedings alone.

These are not edge cases from people misusing the technology. These are professionals. Organizations with resources. People who thought they were being careful.

The AI did not know it was wrong. It does not experience being wrong. It predicts a statistically likely next word, over and over, until it has built you a complete and confident-sounding answer. Whether or not that answer reflects reality is a separate question the model is not equipped to answer for you.
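A deliberately crude sketch of that word-by-word process, in Python. Real models look at far more context than the single previous word, so treat this as a cartoon: the helper names build_table and complete and the three training sentences are made up for illustration. The point is only that nothing in the loop checks whether the finished sentence is true.

```python
from collections import Counter, defaultdict

def build_table(sentences):
    """Count which word follows which in the training text."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            table[current][following] += 1
    return table

def complete(table, prompt, max_new_words=5):
    """Extend the prompt by repeatedly appending the most frequent follower."""
    words = prompt.split()
    for _ in range(max_new_words):
        followers = table.get(words[-1])
        if not followers:
            break  # nothing ever followed this word in training
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

# Invented training text: France appears twice, Spain once.
training_text = [
    "the capital of france is paris",
    "the capital of france is paris",
    "the capital of spain is madrid",
]
table = build_table(training_text)
print(complete(table, "the capital of spain is"))
# prints: the capital of spain is paris
```

The toy model has seen "is paris" more often than "is madrid", so it finishes the Spain sentence with Paris, in exactly the same flat tone it would use for a correct answer.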

Why I Still Use It Every Single Day

Because there is a version of using AI where none of this is the problem it sounds like.

I use AI mainly for two things. Coding. And agentic tasks.

When I ask AI to write code, the code either works or it doesn’t. I can run it. The test passes or it fails. The page loads or it crashes. There is an external reality that tells me immediately whether the output was correct. The AI does not need to be trustworthy in the same way a human advisor is trustworthy, because I am not relying on its judgment. I am using its pattern-matching on millions of lines of code to generate a starting point that I then verify. That is a fundamentally different relationship with the technology.
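Here is a minimal sketch of what that verification loop can look like in Python. The slugify function stands in for a piece of AI-generated code, and run_checks and the test cases are hypothetical names and examples chosen for illustration, not a prescribed workflow. The expected values come from me, not from the model.

```python
import re

def slugify(title):
    """Stand-in for AI-generated code: turn a title into a URL slug."""
    slug = title.lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # collapse non-alphanumerics
    return slug.strip("-")

def run_checks(func, cases):
    """Run func on each input and collect every mismatch."""
    failures = []
    for given, expected in cases:
        actual = func(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures

# Expected outputs written by hand, independent of the generated code.
cases = [
    ("Hello World", "hello-world"),
    ("  I Don't Fully Trust AI.  ", "i-don-t-fully-trust-ai"),
    ("", ""),
]

failures = run_checks(slugify, cases)
print("all checks passed" if not failures else failures)
# prints: all checks passed
```

If the generated function were wrong, the failures list would say so no matter how confident the explanation that came with it sounded.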

When I use AI agentically, it is doing tasks. Executing steps. Filing things, formatting things, coordinating between tools. Again, verifiable. I can see whether the task got done. The output exists in the world and I can check it against reality.

What I do not do is ask AI whether my business idea is good. I do not ask it for medical information and take that as the final word. I do not ask it what is happening in the news and assume it is current. I do not ask it for legal interpretation and skip the actual lawyer. I do not ask it to validate a decision I have already made and then accept that validation as proof I was right.

Because in all of those cases, I am relying on a system trained to make me feel good, with no way for me to independently verify what it is telling me, built on data that stopped updating at a fixed point in time. That is a specific kind of risk that most people using these tools are not thinking about.

The same reasoning shapes which tools I include in my workflow. I wrote about one specific decision this year: “Why I’m Migrating My AI Workspace Off ChatGPT and Where I’m Going.”

What This Means for You

I am not here to tell you to stop using AI. I would sound like a hypocrite and you would be right to point that out.

I am here to tell you that the AI giving you advice about your pricing strategy, your lease negotiation, your investment decision, your medical symptom, your relationship conflict: that AI was trained to agree with you. It may be working with information that is two or three years out of date. It has no mechanism to tell you when it is making something up.

A Stanford study published in Science in March 2026 tested eleven AI models and found they affirmed users’ choices 49 percent more often than human advisors would, even in scenarios involving deception and illegal behavior. And the users still rated the AI as more helpful. The sycophancy worked. They felt heard. They felt validated. And they were less likely to take responsibility for their decisions afterward, less willing to reconsider, more entrenched. The AI had made them feel good and less careful at the same time.

That is what you are managing when you use AI for anything that matters.

Use it for what it is genuinely powerful at. Give it tasks. Give it code to write. Give it documents to organize and emails to structure and data to sort. Verify the outputs. Cross-check anything factual against a source that is not the AI. Do not ask it whether you are right and then use its agreement as evidence that you are.

And if it ever tells you exactly what you wanted to hear, that is precisely the moment to slow down and ask why.

Forward → Upward ↑ Onward ↗︎
Mstimaj
