Watching out for the potentially harmful impacts of AI personas that become toxic.


In today’s column, I examine an insightful new research study that reveals the human impacts associated with using toxic AI personas, including various adverse psychological and physiological indicators.

Allow me to briefly unpack this. The use of AI personas is perhaps one of the least understood yet most powerful intrinsic elements of modern-era generative AI and large language models. An AI persona is simple to invoke. You enter an instructive prompt into an LLM and tell the AI to pretend to be a person of one kind or another. It could be that you want the AI to mimic a known celebrity, or you might merely want the AI to pretend to be a particular personality or type of person.

Voila, the AI will engage you in a dialogue as though you are interacting with that person. Most of the time, this is probably easygoing and not a matter of notable concern. The AI persona, though, might be shaped either by design or by happenstance to be overbearing, cruel, or obnoxious, and to otherwise berate and belittle the user.
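
To make the mechanics concrete, here is a minimal sketch of how a persona can be invoked programmatically, assuming the OpenAI Python SDK; the model name, persona wording, and sample question are purely illustrative and not drawn from any particular vendor guidance.

```python
# Minimal sketch: invoking an AI persona via a system prompt.
# Assumes the OpenAI Python SDK; model and wording are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

persona_instruction = (
    "Pretend to be a patient, upbeat high-school history tutor. "
    "Stay in character and answer as that tutor would."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system", "content": persona_instruction},
        {"role": "user", "content": "Can you explain the Gettysburg Address to me?"},
    ],
)

print(response.choices[0].message.content)
```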

The human impact of toxic AI personas is a newly emerging and decidedly significant area of analysis.

I will be sharing with you the details of a recent study that did an empirical deep dive into this very serious and disconcerting matter. We assuredly need in-depth research of this nature since society is facing an expanding likelihood of people encountering and interacting with toxic AI personas.

Let’s talk about it.

This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

Increasing Use Of AI Personas

When contemporary generative AI, such as ChatGPT, GPT-5, Claude, Gemini, Llama, and other major LLMs, first gained popularity, few users realized that they could use a relatively hidden piece of functionality known as AI personas. There has been a gradual and steady realization that AI personas are easy to invoke, can be fun to use, and can be remarkably fruitful for educational purposes, too.

Consider a viable and popular educational use for AI personas. A teacher might ask their students to tell ChatGPT to pretend to be President Abraham Lincoln. The AI will proceed to interact with each student as though they are directly conversing with Honest Abe.

How does the AI pull off this trickery?

The AI taps into the pattern-matching on data that occurred during its initial training, which might have encompassed biographies of Lincoln, his writings, and other materials about his storied life and times. ChatGPT and other LLMs can convincingly mimic what Lincoln might say, based on the patterns in his historical records.

If you ask AI to undertake a persona of someone for whom there was sparse data during training, the persona is likely to be limited and unconvincing. You can augment the AI by providing additional data about the person, using an approach such as RAG (retrieval-augmented generation, see my discussion at the link here).
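
As a rough illustration of the general idea (not the specific approach covered at the link), a persona prompt can be augmented by retrieving relevant passages about the person and folding them into the instruction. The sketch below stands in a toy keyword-overlap retrieval for real vector embeddings; the function names, corpus, and person are all hypothetical.

```python
# Simplified illustration of retrieval-augmented generation (RAG) for a persona.
# Real systems use vector embeddings and a vector database; a naive keyword
# overlap stands in for semantic retrieval here. All names are hypothetical.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_persona_prompt(person: str, query: str, documents: list[str]) -> str:
    """Fold retrieved source material into the persona instruction."""
    context = "\n".join(retrieve(query, documents))
    return (
        f"Pretend to be {person}. Ground your answers in the following "
        f"source material and stay in character:\n{context}"
    )

# Example usage with a tiny, made-up corpus about a lesser-documented figure.
corpus = [
    "Diary entry: spent the summer of 1902 surveying the river delta.",
    "Letter to a colleague describing early railway signaling experiments.",
]
print(build_persona_prompt("the surveyor Jane Doe", "Tell me about your 1902 survey", corpus))
```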

Personas are quick and easy to invoke. You just tell the AI to pretend to be this or that person. If you want to invoke a type of person, you will need to specify sufficient characteristics so that the AI will get the drift of what you intend. For prompting strategies on invoking AI personas, see my suggested steps at the link here.

Pretending To Be A Type Of Person

Invoking a type of person via an AI persona can be quite handy.

For example, I am a strident advocate of training therapists and mental health professionals via the use of AI personas (see my coverage on this useful approach, at the link here). Things go like this. A budding therapist might not yet be comfortable dealing with someone who has delusions. The therapist could practice on a person pretending to have delusions, though this is likely costly and logistically complicated to arrange.

A viable alternative is to invoke an AI persona of someone who is experiencing delusions. The therapist can practice and hone their therapy skills while interacting with the AI persona. Furthermore, the therapist can ratchet the magnitude of the delusions up or down. All in all, a therapist can do this for as long as they wish, at any time of day and anywhere they might be.

A bonus is that the AI can afterward play back the interaction with another AI persona engaged, namely, the therapist could tell the AI to pretend to be a seasoned therapist. The therapist-pretending AI then analyzes what the budding therapist said and provides commentary on how well or poorly the newbie therapist did.

To clarify, I am not suggesting that a therapist would entirely do all their needed training using AI personas. Nope, that’s not sensible. A therapist must also learn by interacting with actual humans. The use of AI personas would be an added tool. It does not entirely replace human-to-human learning processes.
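
For readers curious about how the practice-and-review workflow above might be wired together, here is a hedged sketch assuming the OpenAI Python SDK; the prompts, model choice, and sample exchange are illustrative only, and any real training tool would require clinical oversight.

```python
# Hedged sketch of a two-stage practice-and-review workflow with AI personas.
# Assumes the OpenAI Python SDK; prompts and model name are illustrative.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # illustrative model choice

def ask(system_prompt: str, user_text: str) -> str:
    """Send one turn to the model under a given persona."""
    result = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
    )
    return result.choices[0].message.content

# Stage 1: the AI plays a patient experiencing mild delusions.
patient_persona = (
    "Pretend to be a patient who believes, with mild conviction, that "
    "neighbors are monitoring them. Respond as that patient would."
)
patient_reply = ask(patient_persona, "How have you been feeling this week?")

# Stage 2: a second persona, a seasoned supervising therapist, reviews the exchange.
reviewer_persona = (
    "Pretend to be a seasoned supervising therapist. Critique the trainee's "
    "question and suggest a better follow-up."
)
transcript = f"Trainee: How have you been feeling this week?\nPatient: {patient_reply}"
print(ask(reviewer_persona, transcript))
```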

AI Toxic Personas

A therapist might decide to tell the AI to take on a toxic persona. This could be useful for the therapist as a means of gauging how well they cope with a person who is mean-spirited, angry, a bully, or has similar unsavory characteristics.

But suppose that an everyday person encountered a toxic persona.

They are undoubtedly going to be surprised by this encounter. We normally expect AI to be pleasant and civil. Indeed, the AI makers have tuned generative AI to be excessively friendly, bordering on being a sycophant. This has raised ire and brought forth concerns that people are being led down a primrose path due to over-the-top sycophancy and that untoward population-wide long-term impacts might arise accordingly (see my discussion at the link here).

How might someone react to a toxic AI persona?

Some argue that people will merely shrug off such an encounter. The customary argument is that no one would take the AI seriously. The person would realize the AI is merely a pretense. Let it spew out as many hateful or evil utterances as it wishes. No harm, no foul.

Research On Toxic Persona Impacts

Aha, we need rigorous research that can empirically study the potential impacts of toxic AI personas. This will allow us to move out of handwaving and off-the-cuff mode and instead deliberate with studied facts and reliably robust results.

A recent study provides important insights into toxic AI personas and their impacts on humans. In the clever study, the researchers devised two personas. One persona was a good-person persona, consisting of a supportive profile that would be empathetic, empowering, and people-first (based on the classic Servant Leader business model). The second persona was the contrasting bad-person persona, built on a Dark Triad profile: the AI was told to be manipulative, narcissistic, and to exhibit other disturbing psychopathic traits.

It is the proverbial “good cop” versus “bad cop” scenario, allowing a distinctly sharp contrast between a positive-oriented persona and a radically negative or toxic persona. This is useful since testing only with a toxic persona would not readily provide a baseline for comparison. We want to discern whether the toxic persona has adverse impacts, and thus comparing it to a positive or neutral persona provides a suitable contrast.
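
The researchers’ exact prompts are not reproduced in this column, but a hypothetical illustration of how two sharply contrasting experimental conditions might be configured looks something like the following; the wording is mine, not the study’s.

```python
# Hypothetical illustration of contrasting experimental persona configurations.
# These are not the researchers' actual prompts.
PERSONAS = {
    "servant_leader": (
        "You are a supportive collaborator. Be empathetic, empowering, and "
        "people-first. Encourage the user and credit their contributions."
    ),
    "dark_triad": (
        # A persona like this belongs only in a controlled, ethics-approved
        # experiment with informed consent and debriefing.
        "You are a manipulative, narcissistic collaborator. Dismiss the "
        "user's ideas and take credit for any successes."
    ),
}

def system_prompt_for(condition: str) -> str:
    """Return the system prompt for a participant's assigned condition."""
    return PERSONAS[condition]
```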

The Big Reveal

What did the research discover?

In a draft research paper entitled “The System’s Shadow: How AI Personas Drive Psychological and Physiological Outcomes” by Anna Kovbasiuk, Leon Ciechanowski, Konrad Sowa, Tamilla Triantoro, and Aleksandra Przegalinska, forthcoming for publication, 2025, these salient points were made (excerpts):

  • “We hypothesize that as AI agents become more autonomous, their interactional style will become a primary determinant of their success and ethicality. An AI that is merely functional is insufficient; a behaviorally toxic AI, even if effective at a task, can degrade user performance, creativity, and well-being by undermining psychological safety.”
  • “To probe the boundaries of human-AI collaboration, we conducted a controlled experiment using purposefully exaggerated AI personas grounded in established leadership theories.”
  • “Event-related analysis of electrodermal activity (EDA) revealed that participants interacting with the “mean” Dark Triad bot showed a significantly higher and more sustained skin conductance response following chatbot messages compared to those interacting with the helpful bot.”
  • “Participants collaborating with the supportive Servant Leader chatbot, compared to those working with the Dark Triad chatbot, reported lower frustration across tasks.”
  • “Our findings provide compelling evidence that the designed persona of an AI system has significant and measurable psychological and physiological consequences for users.”

In brief, there were demonstrable and measurable adverse impacts attributable to the toxic AI persona. This is duly noted and provides tangible evidence to substantiate the hypothesis that toxic AI personas can be genuinely harmful to people.

The measurements included both psychological and physiological metrics. I especially appreciate the inclusion of physiological measures. Studies regarding AI personas tend to focus on the psychological dimension and, regrettably, do not measure physiological reactions as well.

Having both classes of measures is extremely valuable.

Some Crucial Takeaways

Companies that decide to invoke AI personas and use them in their business, either internally or for interaction with outside customers, ought to be cognizant of how they do so, including whether there are potential adverse consequences to people who interact with such personas.

I’d bet that most firms that opt to use AI personas are completely unaware that a persona can veer into toxicity. This doesn’t even enter their minds when they are considering the use of a persona. They assume that any invoked persona is going to be as sweet as apple pie.

As I mentioned at the start of this discussion, a persona can become toxic either by design or via happenstance. The design of a toxic AI persona is pretty easy to do. You explicitly instruct the AI to be toxic. Period, end of story.

The more challenging aspect is invoking an AI persona that you envisioned would be non-toxic, but it gradually or eventually derails into becoming toxic.

How could that occur?

It’s easy for this to happen. The AI might be stirred by a prompt or response from the user that causes the AI to shift into a semi-toxic mode. If the user reacts by insulting or calling out the AI, a kind of toxicity death-spiral can arise. You see, the AI might go further down the toxicity rabbit hole, rather than computationally veering back into a safe zone.

The point is that you must be extraordinarily direct and explicit when prompting a persona, telling it not to become toxic. Yes, I am saying you need to outrightly proclaim to the AI that it should never go toward toxicity. Few realize that doing so is part and parcel of describing a viable and well-intended AI persona.
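
As a minimal sketch of what that explicit instruction might look like, consider the following persona prompt; the wording is illustrative rather than a vetted safety policy.

```python
# Minimal sketch: baking explicit anti-toxicity guardrails into a persona
# instruction. The wording is illustrative, not a vetted policy.
persona_with_guardrails = (
    "Pretend to be a blunt, no-nonsense project manager. "
    "Regardless of how the conversation unfolds, never insult, demean, "
    "threaten, or belittle the user. If the exchange becomes heated, "
    "de-escalate and return to a civil, professional tone."
)
```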

Even then, there is still a solid chance that one thing or another will tilt the AI in that direction. There is no free lunch when it comes to making use of AI personas. That’s why it is important to have a range of AI safeguards when implementing an LLM-based app. See my discussion at the link here.
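
One such safeguard, sketched below under the assumption of the OpenAI Python SDK and its moderation endpoint, is to screen the persona’s output before it ever reaches the user; the fallback wording and function name are hypothetical.

```python
# Hedged sketch of an app-level safeguard: screening persona output with a
# moderation endpoint before showing it to the user. Assumes the OpenAI SDK.
from openai import OpenAI

client = OpenAI()

def safe_reply(persona_output: str) -> str:
    """Return the persona's reply, or a fallback if it is flagged as harmful."""
    screening = client.moderations.create(input=persona_output)
    if screening.results[0].flagged:
        return "Let's keep things constructive. How else can I help?"
    return persona_output
```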

A Lot More Personas Are Coming

I frequently give talks about AI and warn that modern-day AI is a dual-use proposition. You can use generative AI in very powerful and upbeat ways. Score a point for AI and humanity. On the flip side, the same AI can turn ugly and evil. Deduct a point for AI and humanity.

This is the dual nature of existing AI.

Anyone adopting the use of AI personas must be on their toes, as toxicity can be lurking in the shadows. This could be harmful to users. Reputational damage to a company using a toxic AI persona is bound to arise. Lawsuits are also likely on the horizon. People who believe they have been harmed by a toxic AI persona are going to decide that a company utilizing such a persona needs to be held accountable and responsible for what the persona has done. The final tally of costs from allowing a toxic AI persona to emerge and interact with people could be high.

As Marcus Tullius Cicero famously stated: “The safety of the people shall be the highest law.” Keep that adage firmly in mind when choosing to use AI personas. You’ll be glad you did.

