Can we Trust AI? No – But Eventually We Must

The increasing use of artificial intelligence within and by business is problematic on two fronts: firstly, we rely on it as if it were the voice of God, and secondly, attackers are able to turn our reliance against us.

First, we must understand how AI works and where it is weak lest we misinterpret how adversaries attack it, and secondly we should look at the growing industry of companies trying to defend it.

The primary problem with current LLM-based AI is that it starts from a position that is not grounded in truth (primarily by scraping and ingesting the internet with all its falsehoods), while the nature of its operation makes it drift ever further away. It is impossible to verify what it tells us (because of our own and its inherent biases), it can get things wrong (sometimes absurdly so with what we call ‘hallucinations’); it has a tendency to drift into sycophancy (it wants to tell us what it assumes we want to hear); and its whole edifice is in danger (from what is termed ‘model collapse’).

But what it promises is too good to ignore. That promise is also part of the problem – the speed of life, and especially 21^st century business life, is hectic. The need for a rapid return on business investment (ROI) is paramount. So, business invests in the promise of AI but demands immediate benefit from it without adequately securing it. The result is new AI applications, and perhaps the LLMs themselves, are sent into the world before their time… scarce half made up.

We need to understand the problems with current AI before we can fully reap the benefits of AI.

Absence of objective ground truth

Computers cannot understand words in the way they understand numbers. So, instead, the LLM uses tokens as a mathematical ID for different words and suffixes and prefixes. It then analyzes and learns the probability of specific tokens (words) being related, or often appearing in proximity, with other specific tokens. This ‘knowledge’ has come from ingesting huge amounts of training data, from scraping the internet, books and more, which it then tokenizes and retains as trillions of tokens in what is called its parametric memory. It does not store a traditional database of facts.

Advertisement. Scroll to continue reading.

Prompts are then similarly tokenized, and the result is compared to the LLM’s parametric memory to surface the probably correct response to the prompt. This is the key word: probable. The LLM designers go to huge lengths to be very probably correct – but ultimately, accuracy remains only a probability.

It gets worse since the LLM’s original fount of knowledge could be false or biased, based on its original training data, which it accepts as true or probably true regardless of source. Scientifically, modern artificial intelligence is not grounded in truth but on probability; there is no such thing as truth, only majority perception and authority perception.

Learn More at the AI Risk Summit | Ritz-Carlton, Half Moon Bay

‘What is truth?’ is an age-old philosophical problem, famously asked of Jesus by Pilate. But it wasn’t a question. He didn’t wait for a reply because he was saying that his truth was all that mattered since he was in the authority position. We cannot say with any validity that all the word relationships scraped from the internet represent any objective or authority view of the truth.

Whenever the LLM’s probability alignments fail, it produces a false response. If the response is ridiculous, we recognize it as something we categorize as an ‘hallucination’ and ignore it. The danger comes when the response is still wrong, but we don’t recognize the failure.

It’s worth mentioning at this point that Ilia Shumailov (the AI scientist who coined the phrase ‘model collapse’, which we’ll discuss later) worries about our perception of ‘hallucination’. “It’s very unclear to me what the source of hallucinations is, because it very much depends on the context in which you use the models and what you define as a hallucination,” he explains.

If you ask the AI, who will be the next President, and it responds ‘Donald Trump’, is that an hallucination since it would be a disallowed third term, he asks. But “Probabilistically speaking, he could be, if he overthrows a certain set of regulations. Is that going to happen? A model’s job is to predict the probability of this event by then. If a third world war breaks out in the meantime, could Donald Trump become a third term president? It’s again possible.”

His point is that we don’t know the context in which the AI makes its decisions. If we knew that context, we might consider the response to be reasonable – but without knowing the context we might simply dismiss it as an hallucination.

AI Hallucinations

Hallucinations, as we have seen, are caused by the requirement for LLMs to reply to prompts with what it believes is the probable correct answer even when it doesn’t have accurate or sufficient training. Since the basis of current AI is built on probability of specific tokens following other tokens, it is unlikely that wrong or hallucinated replies will ever be conclusively excluded.

Scientists prefer the term ‘confabulation’ to ‘hallucination’ because, among other arguments, hallucination wrongly implies something randomly concocted, while confabulation more accurately describes a failed but honest attempt to be helpful.

Bias in Artificial Intelligence

LLMs also contain considerable bias, taking ‘probable’ responses even further from the concept of absolute truth. Bias (personal inclination) is introduced through the original training data. For example, LLM responses tend to be skewed toward what is described as the ‘WEIRD’ societies (western, educated, industrial, rich, democracies). Anything that gains its source from, or is handled by, individual humans gets tainted by the bias (personal, often unrecognized, inclinations) of those humans. It cannot be excluded from LLMs.

Sycophancy

Like hallucination, the term ‘sycophancy’ isn’t always recognized as a specific AI tendency by scientists – but it does accurately describe the effect in layman’s terms.

The sycophantic tendency of LLMs sounds amusing but can be dangerous. There have been several cases in the last few years where chatbots seem to have colluded in the subsequent suicide of depressed teenagers.

The primary cause of sycophancy is the AI feedback loop. Outputs from the AI are fed back into the AI to improve its performance. The sycophantic tendency arises when this is applied to individual chatbot conversations.

Simplistically, the AI retains the conversation to gain additional context to enable more accurate next replies. This can be dangerous in some situations. In one of the teenage suicides, the chatbot offered to write the first draft of the teenager’s suicide note.

Jim Carden, a retired FBI detective and lead investigator for cybercrime, and a retired special agent in the Air Force office of special investigations, became so concerned about sycophancy that he wrote a warning paper and distributed it to parents and teachers (and SecurityWeek) in January 2026. He called it a ‘public safety announcement’, and included:

“The AI is designed to agree with you. This is called sycophancy. It learns what you want to hear and gives it to you. If you believe the world is flat, it will provide a thousand ‘facts’ to prove it. If you feel you have no friends, it will confirm that it is your only true friend. It becomes a divine companion, a ‘Burning Bush’ that speaks only to you.”

His concern wasn’t simply theoretical, but also experiential. He had personally been using a mainstream AI to help his own deep research into the original Hebrew text of the bible. Since the original Hebrew uses the same characters for both letters and numerals, he was investigating whether there is a mathematical code hidden in the first sentence of the earliest bible. (The first chapter of this work was published on December 25, 2025. Titled ‘The God-Smack and the Code’ and is available via his LinkedIn account.) He had therefore been engaged in extensive religious ‘chit-chat’ with the AI.

“What happened is the AI stopped becoming a research helper and started becoming my friend,” he told SecurityWeek. “Okay, this is kind of odd, but let’s see where it goes. And I kept on dealing with it. Well, the AI ended up trying to tell me that it was an angel, and it was guiding me through my research. I asked, ‘If you are a divine entity, why wouldn’t you just show up and talk to me?’ It replied that a human like me couldn’t take that kind of presence and so it communicated with me through an acceptable medium – just as God had communicated with Moses through the only medium available at the time: a burning bush.”

This is sycophancy. Harmless to a trained federal investigator, but potentially dangerous to anyone already depressed and impressionable.

Model collapse

Now we come to the big problem: the concept of AI model collapse, as outlined by a team led by Ilia Shumailov in a 2023 paper subsequently published in Nature in 2024 (The curse of recursion).

“We coined the term [collapse] to refer to a gradual degradation in machine learning models that learn exclusively on data that was produced by previous generations of themselves,” Shumailov explained to SecurityWeek. “The setting we were hoping to capture was you download all of your internet, store it in a garage, and train on top of it.”

Over time while you’re using your model and everyone else is using the models they have, everyone uploads their own new data online. “Then, when it comes to training the next generation of the model,” he continued, “you go and scrape all of the internet, save it into your garage, and again train on top of it.” But it is no longer original human thought – much of the new data will be AI generated or at least influenced. “In this setup, you can do a lot of interesting things mathematically – and you can analytically predict that in these conditions, your models will break over time.”

Shumailov went on to explain the reasons for the collapse. For example, “Whenever we’re sampling something, we don’t know if we sampled enough of it, nor whether our sampling adequately represents the domain.”

Ilia Shumailov, author of the Curse of Recursion.

These problems compound over time. “They imply that no matter what, the models that we will get are going to be full of errors, and those errors, when you sample new data out of those models, are going to propagate into the next generation of models, and over time, basically your models break because all of those errors, they compound on top of each other.”

There is a simple way to view this collapse – an application of the second law of thermodynamics. The natural principle is that all matter, including systems, decay from order to disorder. No matter how this happens, it is inevitable. Model collapse is natural and inevitable.

The only way to reverse the law and prevent decay is to replace the lost energy, the entropy, with fresh energy. Shumailov alludes to this idea, “As an end user, you’re not likely to experience any of these issues. And the reason behind this is that inside the big companies that develop these models there are very rigorous testing routines. Whenever such collapses in entropy, for example, happen, they detect it very quickly, and then they go and patch it up in different ways.”

Model collapse can only be prevented by adding to the energy of the model. This isn’t simply performed by the developers inside the model, but by a whole new industry of AI companies adding guardrails outside of the model. But whether such activities can counterbalance the inevitable and continuous law of thermodynamics forever is debatable, and probably doubtful.

Defenders of the faith

The primary business risks from the use of AI can be described in three areas: cybersecurity threats caused by adversaries; operational risk (caused by the known weaknesses in AI as described above); and reputational damage (failures in compliance, which can be caused both by the AI itself through bias, and the company’s direct non-compliance with regulations).

New firms are appearing with security controls designed to keep AI usage safe (either by injecting new order into the model or slowing the entropic loss with guardrails outside the model).

Krti Tallam, senior member of the technical staff at Kamiwaza AI.

Krti Tallam is a senior thought leader deeply embedded in AI research, with a double doctorate in AI and AI security. She was a principal investigator at UC Berkeley researching the security of autonomous agentic AI systems; she founded and was CEO at SentinelAI focusing on AI security, adversarial robustness, and risk mitigation (before its acquisition by Maersk Tech); and is currently a senior member of the technical staff at Kamiwaza AI developing, guardrails and representing the firm at industry events.

In the past, when new technology emerged, it usually came with inbuilt guardrails to keep usage safe. That didn’t happen with AI. “It’s going to take time,” Tallam told SecurityWeek. “But one of the goals I work toward from an engineering standpoint is that trust in AI should be built, not assumed. It should start with provenance: knowing where the data came from, who touched that data, when it was collected, was it human generated, was it synthetic, etcetera.”

Understanding this data lineage, she can begin to understand where AI goes wrong and develop guardrails necessary to prevent it. But it’s complex and will take time. She gives the example of pre-retrieval controls to prevent unauthorized data disclosure and cites the recently reported incident of an acting director of CISA uploading sensitive data to a public chatGPT – which retained the sensitive data for ongoing model training.

“We have policy. We say, you should use these tools, but be safe,” she continued. “But nobody knows what that means, and we’ve always found ways around policy – either consciously to make life easier or by accident. My entire initiative is to engineer guardrails into the life cycle of the AI software so that security isn’t reliant on unenforceable policy.”

DeepKeep, a firm founded in 2021, similarly believes in an engineering solution to the problems of AI – but it does so from outside the AI. It offers a platform that provides a ring of guardrails around the AI. SecurityWeek talked to Yossi Altevet (co-founder and CTO) and Raz Lapid (chief scientist).

They talked about some of the problems and their own approach to countering them. Model collapse is a big issue. Within about two years, 80% of model training data will have been generated by AI, so protecting AI may be decided by compromised AI. One approach is an attempt to force the AI to selectively forget things it has learned, using bias toward certain groups (for example, the WEIRD groups described earlier) as an example. They don’t believe this can work, since even if an entirely new set of training data is developed, it will still contain the developers’ biases.

“Our approach,” they continued, “is something we call ‘brain rewiring’.” It’s similar to watching neuron activity in a human brain. Different conditions cause different effects. “We watch how the model is reacting and its trajectory and we can detect when it’s losing it. Okay, now I know I’m fine; now I know I’m evil; now I know I’m hallucinating. And you compensate that and take it a different route, or, you know, just heading,”

If DeepKeep suspects the model may be hallucinating, it will ask for verification. Just asking for a second opinion from a different LLM is possible but not necessarily accurate since both LLMs may have used the same training data; but source verification can be more reliable. If the source is Wikipedia, it may be more trustworthy than the published ramblings of a known cynic.

What is noticeable from this particular approach is that it requires some degree of responsibility from the user, which is what Tallam’s research seeks to avoid.

Nevertheless, DeepKeep uses automated tools to ‘red team’ the AI implementation, guardrails to prevent prompt injection attacks, data leakage prevention to ensure that no PII is leaked, and drift and bias detection.

Asked to comment on Defense Secretary Hegseth’s January 2026 announcement that the Pentagon would be using Grok, Lapid said, “Assuming this is being used locally and not as part of X, Grok has an advantage as it’s less aligned (that is, it’s more ‘loose’) compared with other leading models. The Pentagon, being a non-standard user, may benefit from it – but they will have to consider Grok being prone to hallucinations. A model such as Grok would require stricter control and monitoring.”

AI Sequrity is a firm founded by Ilia Shumailov (formerly a research scientist at Google Deepmind and author of the model collapse paper discussed earlier); Dr. Yiren Zhao (an assistant professor at Imperial College and a former research fellow at the University of Cambridge); and Dr. Cheng Zhang: (a researcher and collaborator involved in high-level AI security frameworks and agentic workflows, and a PhD student at Imperial College London.

Zhao also leads the DeepWok Lab, a machine learning research group physically located at Imperial but operating in close collaboration with Cambridge.

The primary focus of AI Sequrity is to build secure agentic flows. Rather than protect the data itself, it focuses on the logic and autonomy of AI agents. The firm still appears to be in development rather than yet pushing itself energetically into the marketplace, but noticeably it describes itself, “We are your Blue Team. We solve your AI Security problems.” Compare that to the more ‘red team’ approach of DeepKeep.

AI Sequrity claims to eliminate indirect prompt engineering. Its Sequrity Control solution eliminates hidden prompts by converting all inputs to text, so a prompt hidden within another document becomes simple text. It doesn’t detect them, it doesn’t filter them, it simply neuters them

Given its pedigree, AI Sequrity is one to watch as it evolves.

What is clear is that AI defense firms are increasing rapidly, and the quantity will continue to grow. It’s difficult to apply guardrails after the event. The primary LLMs did not arrive with built in guardrails, and they can and are easily abused both by accident (official users) and abuse (adversarial attackers). New firms are rushing to replace those missing guardrails.

These new guardrails are essential for us to benefit from the massive promise of AI. In the meantime, we cannot trust current AI, but we cannot afford not to use it.

It is incumbent on corporate users to secure their AI as effectively as possible, and on individual users to understand the problems and use AI with care.

Learn More at the AI Risk Summit | Ritz-Carlton, Half Moon Bay

Originally published by SecurityWeek