Inside NewtonX’s verified synthetic research

June 25, 2026

Synthetic outputs inherit the quality of the data underneath them. Here’s how we build ours, and why the verified foundation changes the output.

By Jason Talwar, Principal VP, Methods & Innovation

Most synthetic research succeeds or fails on its foundation, long before the algorithm matters. Synthetic outputs inherit the quality of the data they’re trained on. Train a model on data from verified professionals and it reproduces how those professionals think. Feed it unverified panel data, which already diverges 20 to 40% from controls before any model touches it, and it reproduces those flaws with the same fidelity, fraud and inattention included.

We call that the Data Inheritance Problem, and it’s why most synthetic providers are building on shaky ground: They start from large language models and scraped public data. NewtonX starts from verified primary research. What follows is how we build on that foundation, and why it changes what comes out.

Synthetic research spans four methods: boosting, personas, synthetic modeling, and synthetic twins. Before getting into the build, the last two are worth a brief orientation. Synthetic modeling generates simulated responses using AI models trained on audience descriptions rather than a verified human dataset, which raises hallucination risk and limits the output to directional use. Synthetic twins simulate a specific individual from granular, individual-level behavioral data. In B2B, the result tends to be anecdotal rather than directionally useful at scale.

Here, we focus on boosting and personas, the two methods where the verified foundation does the most work.

What “verified” actually means

Verification can’t be a step you run after the data is already in. Fraud has moved upstream of detection, with 99.8% of standard survey attention checks now passable by AI agents. Catching bad respondents after the fact stopped working once the checks themselves could be defeated.

That’s why our verification happens before anyone enters the panel. Every professional is identity-verified against their corporate email and LinkedIn profile, then screened for the expertise a given study actually requires. Our network spans more than a billion professionals across 140+ industries, all held to the same standard.

The case for boosting

Synthetic boosting is the simplest of the methods and the clearest demonstration of why the foundation decides everything. It adds new rows to a real survey dataset, drawing on the response patterns already in the data to generate statistically equivalent responses for underrepresented groups. The mechanism is statistical imputation, so it extends a real dataset rather than inventing one. It can run on a base as small as 15 respondents and works best when synthetic rows make up no more than about 20% of the total dataset. It can stretch to 30%, but NewtonX’s testing puts the optimal ceiling at 20%. The principle holds at any size: The model can only extend a foundation that’s already there.
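The 20% ceiling is easier to apply as a row count. What follows is a minimal sketch for illustration, not NewtonX’s tooling: the function name `max_synthetic_rows` and the integer-percent interface are assumptions, and the code encodes only the two figures stated above (a 15-respondent minimum base and a cap on the synthetic share of the combined dataset). Capping the share at p% of the total means s / (n + s) ≤ p / 100, which rearranges to s ≤ n · p / (100 − p). At 20%, that works out to at most one synthetic row for every four real ones.

```python
MIN_BASE = 15          # smallest real base the article says boosting can run on
OPTIMAL_CEILING = 20   # synthetic share of the total dataset, in percent
STRETCH_CEILING = 30   # stated upper limit, in percent


def max_synthetic_rows(real_rows: int, ceiling_pct: int = OPTIMAL_CEILING) -> int:
    """Hypothetical helper: largest s with s / (real_rows + s) <= ceiling_pct / 100.

    Solving the inequality gives s <= real_rows * p / (100 - p). Integer floor
    division keeps the result exact, so the cap is never overshot by rounding.
    """
    if real_rows < MIN_BASE:
        raise ValueError(f"need at least {MIN_BASE} real respondents, got {real_rows}")
    if not 0 <= ceiling_pct < 100:
        raise ValueError("ceiling_pct must be in [0, 100)")
    return (real_rows * ceiling_pct) // (100 - ceiling_pct)


for n in (15, 50, 200):
    s = max_synthetic_rows(n)
    print(f"{n} real rows -> up to {s} synthetic ({s / (n + s):.1%} of total)")
```

With 15 real respondents, the 20% cap allows 3 synthetic rows (3 of 18 total), and the 30% stretch allows 6. Integer arithmetic is a deliberate choice here: it avoids floating-point rounding pushing a boundary case just over the ceiling.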

Boosting is a pure statistical model, not a learned one. It runs on the survey’s structure and the responses already collected, with no outside training data feeding it. When those responses come from real professionals, boosting holds 95 to 99.5% statistical equivalence with fresh human controls across more than 200 backtested studies. That figure measures fidelity to the source data, not absolute accuracy: The model reproduces whatever the real responses contain, errors included. The number is only ever as good as what sits beneath it.

How NewtonX builds a synthetic persona

Boosting extends a dataset. A persona, on the other hand, is a different build entirely.

It starts with a NewtonX foundation built from verified B2B professionals rather than a general-population model that only approximates them, so its behavioral and attitudinal signals reflect how those professionals actually think. That foundation can stand on its own as a marketplace persona, ready to query.

Clients can then layer in their own research to make it proprietary: past surveys, interview transcripts, segmentation studies, historical data. That extra layer makes the persona reflect one organization’s buyers specifically and keeps it locked to that organization.

This is harder in B2B than building the consumer personas most vendors sell. B2B buyers don’t leave the public trail that consumer models train on, so a persona that isn’t built on data from real professionals is guessing at the part that matters most.

The build is designed to be audited, not just trusted. Every answer a persona gives can be traced back to the research behind it, down to the specific study and date.
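One way to picture that audit trail is an answer record that carries its sources with it. This is a hypothetical data structure for illustration only: `SourceRef`, `PersonaAnswer`, and the study identifiers are invented names, not NewtonX’s schema, and the question and answer text are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical pointer to the primary research behind an answer."""
    study_id: str      # illustrative study identifier, not a real NewtonX ID
    fielded_on: date   # date the study was fielded


@dataclass
class PersonaAnswer:
    """A persona response that keeps its evidence attached."""
    question: str
    answer: str
    sources: list[SourceRef] = field(default_factory=list)

    def is_traceable(self) -> bool:
        # An answer with no attached study cannot be audited.
        return len(self.sources) > 0

    def audit_trail(self) -> list[str]:
        # Render each source as "study_id (YYYY-MM-DD)", newest first.
        ordered = sorted(self.sources, key=lambda s: s.fielded_on, reverse=True)
        return [f"{s.study_id} ({s.fielded_on.isoformat()})" for s in ordered]


ans = PersonaAnswer(
    question="(example question)",
    answer="(persona response text)",
    sources=[
        SourceRef("study-A", date(2025, 3, 14)),
        SourceRef("study-B", date(2025, 11, 2)),
    ],
)
print(ans.is_traceable())
print(ans.audit_trail())
```

The design point is that provenance travels with the answer instead of living in a separate log, so an answer with an empty `sources` list is immediately visible as unauditable.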

Where restraint matters

A persona is only as current as its training data. It can’t speak to a product that didn’t exist when it was trained, and for decisions that have to be statistically defensible, primary research is still the right call. A well-built persona respects that line and says so when the data runs thin. That restraint is the difference between a synthetic tool you can put in front of a board and one you can’t.
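The two limits described above, currency and depth of evidence, can be expressed as a simple gate. This is a sketch under stated assumptions: the function name, the idea of recording the date a topic first appeared, and the `min_studies` threshold are all hypothetical, and the gate does not attempt to judge when a decision needs the statistical defensibility that only primary research provides.

```python
from datetime import date


def persona_should_answer(
    topic_first_seen: date,
    training_cutoff: date,
    supporting_studies: int,
    min_studies: int = 1,  # hypothetical evidence threshold, for illustration
) -> tuple[bool, str]:
    """Decide whether a persona answers or says its data runs thin."""
    # A topic that appeared after training cannot be in the persona's data.
    if topic_first_seen > training_cutoff:
        return False, "This topic postdates the persona's training data; field primary research."
    # Too little underlying research to support a confident answer.
    if supporting_studies < min_studies:
        return False, "The underlying data is too thin to support a confident answer."
    return True, "Answer, with sources attached."


ok, message = persona_should_answer(
    topic_first_seen=date(2026, 5, 1),
    training_cutoff=date(2026, 1, 1),
    supporting_studies=3,
)
print(ok, message)
```

Returning the reason alongside the decision matters as much as the decision itself: the persona declines in words the user can act on instead of producing a confident-sounding guess.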

The data foundation is the argument

Boosting, personas, and whatever the category builds next all inherit the data beneath them. A verified foundation gives every layer above it integrity; the wrong foundation compounds its flaws all the way up. That’s why we start where we do: with the data, and with people whose backgrounds can actually be confirmed.

Start a research project with our team today.


