
Using AI-generated data, known as synthesized data, provides tremendous upsides in a wide variety of applications, including research about the brain.
In today’s column, I examine the advantageous use of AI-generated synthetic data and showcase how the mysteries of the brain are being mapped and unlocked via an innovative research study underway at Stanford University regarding anatomically plausible 3D brain MRIs.
Readers may recall that I previously discussed an AI and mental health initiative, known as AI4MH, at Stanford’s Department of Psychiatry and Behavioral Sciences in the School of Medicine, which is being co-directed by Dr. Kilian Pohl, Professor of Psychiatry and Behavioral Sciences (see my coverage of AI4MH at the link here). Dr. Pohl’s research on using AI-generated synthetic data to devise MRIs is a leading-edge example of the immense value of smartly leveraging generative AI and large language models (LLMs) to make key breakthroughs in mental health and many other fields of study.
Let’s talk about it.
This analysis of AI advances is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).
AI And Mental Health Therapy
As a quick background, I’ve been extensively covering and analyzing a myriad of facets regarding the advent of modern-era AI that produces mental health advice and performs AI-driven therapy. This rising use of AI has principally been spurred by the evolving advances and widespread adoption of generative AI. For a quick summary of some of my posted columns on this evolving topic, see the link here, which briefly recaps about forty of the over one hundred column postings that I’ve made on the subject.
There is little doubt that this is a rapidly developing field and that there are tremendous upsides to be had, but at the same time, regrettably, hidden risks and outright gotchas come into these endeavors too. I frequently speak up about these pressing matters, including in an appearance last year on an episode of CBS’s 60 Minutes, see the link here.
Untapped Value Of Synthetic Data
Whenever you use generative AI or LLMs such as OpenAI's popular ChatGPT, you are essentially generating data. This AI-generated data is referred to as synthetic data. It is considered synthetic because it was produced by AI rather than written by a human.
The idea that you are generating data when using ChatGPT, Claude, Gemini, Grok, etc., might not seem obvious if you are merely asking the AI a question about how to cook an egg or fix your car. To you, the AI is simply answering your question. Period, end of story.
Any answer or indeed any response by the AI is a form of data. You are causing the AI to generate data. The data itself has value. Besides serving as an answer to your questions, the generated data could be used for other savvy purposes. For example, you could post the generated data on the Internet and thus share the data with others who might visit the posting.
Debates About Synthetic Data
Like nearly everything in life these days, the advent of synthetic data has become mired in heated debates. There are tradeoffs associated with the use of synthetic data. If used sensibly and properly, AI-generated data can be a huge boon. Regrettably, when synthetic data is used wantonly or without appropriate controls, things can go awry.
One primary line of concern is that we are going to fill up the Internet with synthetic data.
In a theory known as the dead Internet theory, there are worries that when you read something posted on the Internet, it will be text that was devised by AI. You won't necessarily realize that AI produced the text. You will assume that a living, breathing human wrote and posted their comments online.
The prevalence of synthesized data is construed as bad because the bulk of the Internet might eventually consist almost entirely of AI-generated content. Only tiny morsels of human-written content will remain, snippets as scarce as needles in a vast haystack.
Ongoing speculation about the degree to which the Internet is already tilting toward AI-generated data is a fiercely thorny matter of discourse. Arguments and counterarguments fly fast. For example, one viewpoint is that we might be better off with synthetic data in place of human-written data. Who’s to say that human-written data is necessarily any better than AI-generated data? On and on the fervent debate goes.
When I give talks about the latest AI trends, I often get asked about whether we should perhaps ban the use of synthetic data. Or maybe people should not be allowed to post AI-generated data onto the Internet. Make it a crime to do so. Keep the Internet a pristine preserve of human-written content only.
I stridently emphasize that this way of thinking about synthetic data is myopic. It is the proverbial mistake of tossing out the baby with the bathwater. AI-generated data has tremendous value. We ought to mindfully consider how to harness that value. Meanwhile, sure, we should be cautious about misusing synthetic data and take prudent steps accordingly (see my in-depth analysis about this topic, including debunking another voiced qualm about AI models collapsing due to synthetic data, at the link here).
Synthetic Data For Therapist-Client Sessions Analysis
As a brief example of how I’ve beneficially opted to use synthetic data, consider the use case of wanting to study how therapists interact with their clients and patients.
We can learn a lot about therapy and therapeutic practices by closely studying the interactions that occur when a therapist-client session is taking place. Some therapists record and transcribe their sessions, doing so with permission from their clients, and then use those materials to self-reflect on their therapeutic prowess. This can also be a handy means of reviewing sessions and garnering additional insights concerning a client while calmly doing a post-session analysis.
There is a lot more value associated with those transcribed sessions on a larger scale.
If a therapist anonymizes the transcripts, they might then make the transcribed sessions available to other therapists and researchers. By examining hundreds or perhaps many thousands of such transcripts, we can identify a big picture perspective of how various devised therapies seem to be undertaken during therapist-client sessions and discover crucial patterns that can advance mental health practices across the board.
The hitch about doing analyses of therapist-client sessions is that there aren’t vast digital stores of them, and they sometimes have a cost associated with tapping into them. Other issues include that such transcripts tend to require extensive data cleanup since the dialogues are often start-and-stop verbalized fragments. All in all, the desire to leverage therapist-client sessions in the name of research and advancing mental health theories and practices is hampered by the dearth of available transcripts, their costs to obtain, and the arduous effort to make them readily usable.
How might this be overcome?
One approach is to use generative AI and LLMs, suitably guided, to generate synthetic transcripts representing therapist-client dialogues. I've done this and describe the key ins and outs at the link here. It is important to use AI for this purpose in an upfront and proper way. The goal is to produce dialogues that are patterned on real-world dialogues. Equally important is to label the dialogues as synthetic so that other researchers will be aware of how the transcripts were produced.
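As a minimal sketch of the labeling step (the field names and helper here are my own illustrative choices, not from any specific study), each generated dialogue can be wrapped with explicit provenance metadata so downstream researchers can immediately tell it is synthetic:

```python
import json
from datetime import date

def package_synthetic_transcript(dialogue, model_name, prompt_summary):
    """Wrap an AI-generated therapist-client dialogue with provenance
    metadata so it is never mistaken for a real session transcript."""
    return {
        "provenance": {
            "synthetic": True,                 # explicit label, always present
            "generator": model_name,           # which LLM produced the text
            "prompt_summary": prompt_summary,  # how the generation was guided
            "generated_on": date.today().isoformat(),
        },
        "dialogue": dialogue,                  # list of {"speaker", "text"} turns
    }

record = package_synthetic_transcript(
    dialogue=[
        {"speaker": "therapist", "text": "What brings you in today?"},
        {"speaker": "client", "text": "I've been feeling anxious at work."},
    ],
    model_name="example-llm",
    prompt_summary="CBT-style intake session, anxiety presentation",
)
print(json.dumps(record["provenance"], indent=2))
```

The dialogue itself would come from prompting an LLM; the point of the wrapper is simply that the synthetic label travels with the data wherever it goes.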
Synthetic Data To Understand The Brain
At Stanford, there is an exciting effort using generative AI to produce synthesized brain MRIs. This provides another vivid example of the beneficial use of synthesized data.
Suppose you want to study MRIs to glean discoveries about how the brain operates. You might want to do this at scale and explore many MRIs to discern patterns. Another angle would be to dive into a specific MRI to look closely at crucial core elements and discover facets that could help reveal brain conditions, such as potential diseases or maladies.
How would you obtain enough MRIs and sufficient variety to undertake these types of brain-focused analyses?
A smart way to do this would be to use AI to generate MRIs that could then be analyzed and studied. The generated MRIs should be as realistic as possible. It would be untoward to wantonly generate MRIs that don't reflect the true conditions that humans encounter. The MRIs must be realistic if they are going to be of valid use.
As noted in a recent online posting entitled “GenAI Helps Stanford Researchers Better Understand Brain Diseases” (Stanford Report, October 7, 2025), these key points were made (excerpts):
- “Kilian M. Pohl, professor of psychiatry and behavioral sciences and, by courtesy, of electrical engineering at Stanford, says that ‘future breakthrough discoveries in neuroscience will rely on AI technology. The problem currently is that this technology tends to produce unreliable results, as most brain MRI studies are simply not large enough.’”
- “Pohl, who co-directs the AI for Mental Health Initiative and is a faculty affiliate of Stanford HAI and the Wu Tsai Neurosciences Institute, is most excited about applying BrainSynth toward learning about diseases that subtly affect the brain. ‘Many diseases or conditions that I study are ones that are not well understood, and the impact on the brain has subtle effects that you can’t often see with the naked eye,’ Pohl said. ‘I want to use this generative AI technology to capture those subtle effects.’”
The second point mentions an AI system that has been developed for the synthesis of MRIs and is referred to as BrainSynth. Let’s take a closer look at that capability.
Unpacking BrainSynth And Synthesized Data
The clever approach being undertaken consists of using generative AI to produce synthesized data, generating usable 3D brain MRIs. Vitally, the synthesized MRIs need to be anatomically plausible. Pushing toward plausibility is a tough problem to solve. It is one thing to generate an MRI, but doing so while reflecting human anatomical realism entails knotty issues.
In a research paper co-authored by Dr. Pohl that is entitled “Metadata-Conditioned Generative Models To Synthesize Anatomically-Plausible 3D Brain MRIs” by Wei Peng, Tomas Bosschieter, Jiahong Ouyang, Robert Paul, Edith V Sullivan, Adolf Pfefferbaum, Ehsan Adeli, Qingyu Zhao, and Kilian M Pohl, Medical Image Analysis, August 2024, these salient points are made (excerpts):
- “Recent advances in generative models have paved the way for enhanced generation of natural and medical images, including synthetic brain MRIs.”
- “To generate high-quality T1-weighted MRIs relevant for neuroscience discovery, we present a two-stage Diffusion Probabilistic Model (called BrainSynth) to synthesize high-resolution MRIs conditionally-dependent on metadata (such as age and sex).”
- “We then propose a novel procedure to assess the quality of BrainSynth according to how well its synthetic MRIs capture macrostructural properties of brain regions and how accurately they encode the effects of age and sex.”
- “Results indicate that more than half of the brain regions in our synthetic MRIs are anatomically plausible, i.e., the effect size between real and synthetic MRIs is small relative to biological factors such as age and sex. Moreover, the anatomical plausibility varies across cortical regions according to their geometric complexity.”
- “These results indicate that our model accurately captures the brain’s anatomical information and thus could enrich the data of underrepresented samples in a study.”
The metadata aspect of this study is especially noteworthy. As noted above, BrainSynth seeks to encode the effects of factors such as age and sex. Incorporating biological factors into the generation process substantially aids usability and contributes to the anatomical plausibility goals.
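The details of BrainSynth's diffusion model are in the paper and its code repository; purely to illustrate the general idea of metadata conditioning, here is my own toy sketch (not the authors' code). Metadata such as age and sex get embedded into a vector that steers each denoising step, so that the generated volume depends on the specified biological factors:

```python
import numpy as np

def embed_metadata(age, sex, dim=8):
    """Toy embedding: normalize age, one-hot encode sex, pad to a fixed width.
    A real conditional diffusion model learns this embedding instead."""
    vec = np.zeros(dim)
    vec[0] = age / 100.0                 # crude age normalization
    vec[1] = 1.0 if sex == "F" else 0.0
    vec[2] = 1.0 if sex == "M" else 0.0
    return vec

def denoise_step(x, cond, rng):
    """One illustrative 'denoising' update: nudge the noisy volume x
    toward a condition-dependent target. This stands in for the learned
    score network of an actual diffusion probabilistic model."""
    target = cond.mean()                 # toy stand-in for a learned prediction
    return 0.9 * x + 0.1 * target + 0.01 * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
cond = embed_metadata(age=65, sex="F")
x = rng.standard_normal((4, 4, 4))       # a tiny stand-in for a 3D volume
for _ in range(50):
    x = denoise_step(x, cond, rng)
print(round(float(x.mean()), 2))
```

The takeaway is structural, not numerical: because the conditioning vector enters every denoising step, changing the metadata changes the generated output, which is what lets a model like BrainSynth produce MRIs consistent with a requested age and sex.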
For those of you interested in the AI underpinnings of BrainSynth, you might consider taking a look at the GitHub site for the project, at the link here.
Double-Checking Of Synthesized Data
A significant part of any AI effort to generate synthetic data needs to involve double-checking of the generated data, which is notably identified in the BrainSynth study. The researchers carefully compared real MRIs to the synthesized MRIs. Doing so helps in ascertaining whether the synthesis is on-target and sufficiently captures the keystones of the real-world phenomena.
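The BrainSynth paper frames plausibility in terms of effect sizes between real and synthetic MRIs. As a rough illustration of that kind of double-check (my own simplified sketch using Cohen's d on a single hypothetical regional measurement, not the authors' full procedure), a small standardized mean difference between the real and synthetic samples suggests the synthesis is capturing that region plausibly:

```python
import math

def cohens_d(sample_a, sample_b):
    """Cohen's d: standardized mean difference between two samples,
    computed with the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n_a
    mean_b = sum(sample_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (n_b - 1)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

# Hypothetical regional volumes (cm^3), invented for illustration only
real_volumes = [3.51, 3.62, 3.48, 3.55, 3.60, 3.58]
synthetic_volumes = [3.50, 3.57, 3.53, 3.61, 3.49, 3.56]

d = cohens_d(real_volumes, synthetic_volumes)
print(f"effect size d = {d:.3f}")  # small |d| is evidence of plausibility
```

In practice such comparisons would be run region by region across many scans, but the principle is the same: quantify the gap between real and synthetic before letting anyone rely on the synthetic data.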
The same holds for anyone opting to employ AI-generated data all told.
Part of the reason that AI-generated data gets a bad rap is that synthesized data is, at times, handed out to the world without an iota of double-checking. People might mistakenly rely on synthesized data as though it were real. Meanwhile, the synthesized data might contain inaccuracies, including dreaded AI hallucinations (see my assessment of so-called AI hallucinations at the link here).
I am an outspoken advocate for the double-checking of synthesized data and labeling the generated data as having been produced by AI.
Final Thoughts For Now
A catchphrase these days that is gaining traction is that we must try to mitigate the amount of “AI slop” that is being shared as though it is valid data. The more AI slop there is, the worse things will get for society overall. In turn, I predict that this is going to spur a flurry of new laws that attempt to rein in the AI slop, but those laws are likely to inadvertently overshoot and cause as many problems as they potentially solve.
As stated eloquently by Albert Einstein: “Only a life lived for others is a life worthwhile.” If you are going to use AI to generate synthetic data, please do so with others in mind. Aim to produce valid data, double-check the data, label it as synthetic, and only then release the data for others to rely upon.
You would presumably make Einstein proud of your valiant efforts.