Synthetic data for designers: what you need to know
A $2.1 billion market is emerging and unlocking new design roles, and it’s already underway.

Gartner estimates that by 2030, the use of synthetic data will completely surpass the use of real data in training artificial intelligence models. But what does this mean exactly?
Synthetic data is artificially generated data designed to mimic real-world data. It is created using algorithms and statistical models that replicate the patterns, characteristics, and relationships found in real-world data.
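To make "statistical models that replicate patterns and relationships" a bit more tangible, here is a minimal sketch in Python. It assumes a tiny, made-up table of (age, income) pairs, measures its averages, spread, and correlation, and then draws brand-new rows that follow the same shape. The data, the names REAL_ROWS, fit, and sample, and the choice of a simple bell-curve model are all assumptions for illustration; real synthetic data tools use far richer models.

```python
import math
import random
import statistics

# Hypothetical "real" records: (age, yearly income). Invented for illustration.
REAL_ROWS = [(23, 31000), (35, 52000), (41, 58000), (29, 40000),
             (52, 71000), (47, 66000), (31, 45000), (38, 50000)]

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fit(rows):
    """Summarize the real data: means, spreads, and how the columns relate."""
    xs, ys = zip(*rows)
    return {
        "mean_x": statistics.mean(xs), "std_x": statistics.stdev(xs),
        "mean_y": statistics.mean(ys), "std_y": statistics.stdev(ys),
        "corr": pearson(xs, ys),
    }

def sample(model, n, seed=0):
    """Draw n synthetic rows from a two-variable normal model of the data."""
    rng = random.Random(seed)
    r = model["corr"]
    leftover = math.sqrt(max(0.0, 1 - r * r))  # part of y not explained by x
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = model["mean_x"] + model["std_x"] * z1
        y = model["mean_y"] + model["std_y"] * (r * z1 + leftover * z2)
        rows.append((x, y))
    return rows

model = fit(REAL_ROWS)
synthetic_rows = sample(model, 5000, seed=1)
```

None of the 5,000 generated rows belongs to a real person, yet older synthetic "people" still tend to earn more, just as in the original table.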
Imagine: until now, algorithms have been primarily fueled by data produced by humans. But soon, they will be trained by other algorithms and Synthetic Data.
In this article, we go beyond the obvious and raise questions that foster critical and creative thinking. I believe this is the best way to create desirable futures.
In this text, we will discuss:
- Why Synthetic Data?
- It’s expected to be a big market
- It’s already happening
- What does it mean to designers?
Shall we?

Why Synthetic Data?
Well, we can begin by asking ourselves: why would anyone spend lots of time, effort and resources generating synthetic data when they can just use real data?
There are a few possible reasons:
- When your real data doesn’t exist yet (for instance, the system that you need to test is not deployed yet or it’s inaccessible);
- Some regulations or architectures prevent you from accessing this data (this is a common factor in healthcare and finance);
- There is data but not enough variants of it or the data set is simply not big enough;
- The data set is imbalanced.
According to Gartner, only 3% of IT and Data and Analytics leaders say their organization did not face any challenges with real-world data. Most of those who adopted AI-generated synthetic data say the main reasons were challenges with real-world data accessibility (60%), complexity (57%), or availability (51%).
It’s expected to be a big market
The Synthetic Data market may seem tiny now, only about $300 million in 2024, but it’s set to grow roughly sevenfold, reaching $2.1 billion in just five years, according to Gartner. They’re calling it an emerging technology because adoption by big companies is still really low.
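To put those figures in perspective, here is a quick back-of-the-envelope calculation. It assumes the two numbers quoted above ($300 million growing to $2.1 billion) and a window of exactly five years with steady compound growth. The function name is hypothetical, and the result is only what the quoted figures imply, not a Gartner number.

```python
def implied_annual_growth(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, end value, and period."""
    return (end_value / start_value) ** (1 / years) - 1

# Figures quoted in the text; the five-year window is taken at face value.
rate = implied_annual_growth(300e6, 2.1e9, 5)
print(f"Implied growth per year: {rate:.1%}")
```

Under those assumptions, the market would need to grow by somewhere between 47% and 48% every year, which is a steep pace to keep up for five years in a row.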
Many people are investing in this industry. This tech is still early in its journey: Gartner says it’s reached only about 1–5% of its potential market. That means over 95% of companies that could benefit from synthetic data still haven’t tapped into it.
Why pay attention to it?
Imagine a financial company trying to build a model to catch fraud. To really train this model, it needs tons of examples of fraudulent transactions. But here’s the challenge: actual fraud cases don’t pop up that often. Sure, the model can scan customer transactions all day, but it might only spot suspicious activity every now and then, which means it could take ages to get it working well.
That’s where synthetic data steps in.
Instead of waiting for enough real fraud cases to train on, you can generate synthetic ones that look just like the real examples. This way, the model learns faster and gets way better at spotting fraud without waiting around for real-life cases to appear.
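Here is a rough sketch of that idea in Python. It assumes a made-up set of transactions described by two numbers (amount and hour of day) and creates new fraud examples by blending random pairs of real ones. This is a simplification in the spirit of oversampling techniques such as SMOTE, not a faithful implementation of any of them, and the data and the function name synthesize_minority are invented for illustration.

```python
import random

def synthesize_minority(examples, n_new, seed=0):
    """Create n_new synthetic rows, each a blend of two randomly chosen real rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(examples, 2)   # pick two distinct real fraud cases
        t = rng.random()                 # how far to move from a toward b
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

# Hypothetical transactions: (amount in dollars, hour of day).
legit = [(25.0, 12), (40.0, 18), (12.5, 9), (60.0, 20), (33.0, 14),
         (18.0, 11), (75.0, 19), (22.0, 13), (48.0, 17), (30.0, 10)]
fraud = [(980.0, 3), (1200.0, 2), (870.0, 4)]

# Generate enough synthetic fraud cases to match the number of legitimate ones.
needed = len(legit) - len(fraud)
balanced_fraud = fraud + synthesize_minority(fraud, needed)
```

With only three real fraud cases, the model would mostly see legitimate transactions. After this step it sees ten of each, and every synthetic case sits somewhere between real ones, so it stays plausible rather than random.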
This new approach opens up incredible opportunities, as highlighted by Tobias Hann, CEO of Mostly AI, in his SXSW talk.
In the digital world, we encounter unconscious biases that affect Artificial Intelligence models. One example is facial recognition. Since most models are developed and trained with faces of predominantly white men, minority groups are often underrepresented in the data, making it difficult, for instance, to accurately recognize black women.
One answer? Producing more images representing these minorities to enrich the training of models. In this way, synthetic data emerges as a way to reduce bias, promoting a more equitable experience.
While real data often contains sensitive information, synthetic data might represent it without compromising privacy, allowing it to be used in research and analysis without exposing individuals.
It's already happening
At the end of the day, synthetic data unlocks opportunities for experimentation and innovation where real-world data is limited or unavailable. It’s not just some futuristic concept. It’s being put to work right now, solving real problems.

Privacy-compliant alternative
Synthetic data offers a privacy-compliant alternative to real data, and it cannot be easily reverse-engineered to extract sensitive information. Take The Ottawa Hospital in Canada.
They’re using synthetic data to analyze drug usage patterns and train machine learning models that predict a drug’s effectiveness for individual patients. It’s a workaround for strict regulations on using personal health information in ML training — and it’s working.

Dealing with high-stakes scenarios
It can also be used when we need to simulate extreme conditions. Waymo, the autonomous vehicle company, relies on real-world road data, but it also simulates high-stakes scenarios, like a pedestrian suddenly crossing or a car braking out of nowhere.
Synthetic data enables them to train their systems on situations too rare or risky to capture in real life.
Changing perspectives
Even creative campaigns are tapping into synthetic tech. Deepfakes, for instance, are synthetic media generated with deep learning techniques. Dove’s #DetoxYourFeed campaign, part of their Self-Esteem Project, used deepfake technology to deliver a powerful message about harmful beauty standards.
The result? A global impact, reaching over 82 million young people in 150 countries. Deepfake technology allowed Dove to deliver a genuine campaign message while keeping costs and complexity in check.
What does it mean to designers?
Synthetic data might be more than just a solution to compliance issues — it could serve as a tool for innovation, enabling industries to reimagine what’s possible while remaining secure and ethical.
New possibilities for Design and Data
In their work on Rethinking Design, Elisa Giaccardi and Roy Bendor invite designers to responsibly anticipate and guide this transformation by proactively imagining and manifesting alternative futures. They introduce a new glossary with 17 concepts for design in the age of AI, developed in collaboration with 40 researchers and the European-funded Innovative Training Network.
Among the many intriguing concepts and discussions, I would like to highlight three:
- Reflective Data: Instead of viewing AI systems as deterministic tools, this concept envisions designers acting as curators in their use of generative AI tools and large language models (LLMs). Errors and surprises are treated not as failures but as valuable indications, insights, or opportunities to democratically contest the outcomes of AI systems.
- Calibrated Trust: Unlike traditional notions of transparency, calibrated trust focuses on aligning human trust levels with the actual capabilities of AI systems. Design plays a pivotal role in implementing interventions to ensure that trust corresponds appropriately to what the system can genuinely deliver.
- Prototeams: This concept highlights a growing set of tools that enable speculative work in real-world contexts. Designers in Prototeams address concrete cases not only by prototyping novel propositions with practical applications across various domains but also by revealing the knowledge and skills required to realize those propositions.

New paradigms of UX Research
Apala Lahiri Chavan is the Chief Design at Human Factors International. She has been talking about the future of UX Research for a long time now. In a 2016 article, she talked about humanoid user researchers long before tools like ChatGPT were available.
Although we can’t yet do everything she imagined, AI is already shaping how research is conducted.
- AI can help us in desk research to gather data and better understand the context before creating the interview questionnaire.
- AI can help us anticipate common errors and pitfalls when designing user surveys, apply best practices and proactively look for biases, increasing our efficiency and productivity.
- AI can role-play and test the interview questions, to help researchers be better prepared for possible outcomes.
- AI can also provide high-level overviews, summaries, and easily accessible insights, after the research is done, helping to make sense of all information collected.
But with Synthetic Data, it seems that we can do more.

Beyond generating Synthetic Data, we are seeing some researchers testing Synthetic Users: AI-generated profiles that attempt to mimic a user group.
Synthetic users are clusters of behaviors generated by the combination and analysis of a large dataset from an LLM (Large Language Model). To simplify, we can imagine them as intelligent avatars that simulate how real users would interact.
For me, they are a tool that helps us build a narrative from a vast data set and create more interesting design deliverables.
Synthetic users act as digital explorers in desk research, or conversational personas in the data visualization after the research with real users. They can perform the role of virtual guides that provide valuable insights into user behavior, even in a simulated environment.
And as Maria Rosala and Kate Moran said, Synthetic Users are useful for desk research and generating hypotheses, but not for final decision-making.
As designers, we are investigators of the status quo. We are all about creating new things — objects, products, processes, futures. We should keep on changing, helping create realities we believe in.
••••••
Yes, the topic of AI is very sensitive, and we cannot afford to be naive about it. But how might we explore this new horizon with a critical and creative perspective?
••••••
References mentioned in this article
- Gartner, Is Synthetic Data the Future of AI?
- Tobias Hann, Why the Future of AI Might Lie in Synthetic Data
- Netflix, Coded Bias Documentary
- K2view, Webinar: Unlocking the Power of Synthetic Data Generation
- Nielsen Norman Group (Maria Rosala and Kate Moran), Synthetic Users: If, When, and How to Use Them
- Ottawa Hospital, Data in healthcare drives better care for patients
- Amelia Woodward, How Autonomous Vehicles use synthetic data
- Dove, Detox Your Feed campaign
- Apala Lahiri Chavan, The Future of User Research
- Elisa Giaccardi and Roy Bendor, Rethink Design
Other interesting references
- Frank Wilczek, A Beautiful Question: Finding Nature’s Deep Design
- Apala Lahiri Chavan, Innovative Solutions Book
- Marcelo Gleiser, The Island of Knowledge