By Jason Talwar, Principal VP, Methods & Innovation
Most synthetic research succeeds or fails on its foundation, long before the algorithm matters. Synthetic outputs inherit the quality of the data the model is trained on. Train a model on verified professionals and it reproduces how those professionals think. Feed it unverified panel data, which already diverges 20 to 40% from controls before any model touches it, and it reproduces those flaws with the same fidelity, fraud and inattention included.
We call that the Data Inheritance Problem. It’s the reason most synthetic providers build on shaky ground: They start from large language models and scraped public data. NewtonX starts from verified primary research. What follows is how we build on that foundation, and why it changes what comes out.
Synthetic research covers four methods, and two of them are worth a brief orientation before getting into the build. Synthetic modeling generates simulated responses using AI models trained on audience descriptions rather than a verified human dataset, which raises hallucination risk and keeps the output directional only. Synthetic twins simulate a specific individual from granular, individual-level behavioral data. In B2B, that tends to be anecdotal rather than directionally useful at scale.
Here, we focus on boosting and personas, the two methods where the verified foundation does the most work.
Verification can’t be a step you run after the data is already in. Fraud has moved upstream of detection, with 99.8% of standard survey attention checks now passable by AI agents. Catching bad respondents after the fact stopped working once the checks themselves could be defeated.
That’s why our verification happens before anyone enters the panel. Every professional is identity-verified against corporate email and LinkedIn, then screened for the expertise a given study actually requires. Our network runs to more than a billion professionals across 140+ industries, all held to the same standard.
Synthetic boosting is the simplest form of the method and the clearest demonstration of why the foundation decides everything. It adds new rows to a real survey dataset, learning the response patterns already in the data to generate statistically equivalent responses for underrepresented groups. The mechanism is statistical imputation, so it extends a real dataset rather than inventing one. It can run on a base as small as 15 respondents, and works best filling no more than about 20% of the total dataset. It can stretch to 30%, but NewtonX’s testing shows 20% as the optimal ceiling. The principle holds at any size: The model can only extend a foundation that’s already there.
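The ceiling is stated as a share of the total dataset, so the number of rows that can be added depends on how many real responses exist. Below is a minimal sketch of that arithmetic, assuming "total dataset" means real plus synthetic rows. The function names and the integer-percent parameter are illustrative and not part of any NewtonX tooling.

```python
def max_synthetic_rows(real_rows: int, max_share_pct: int = 20) -> int:
    """Largest number of synthetic rows s such that
    s / (real_rows + s) <= max_share_pct / 100.

    Assumption: the share is measured against real + synthetic rows.
    """
    if real_rows < 0:
        raise ValueError("real_rows must be non-negative")
    if not 0 <= max_share_pct < 100:
        raise ValueError("max_share_pct must be in [0, 100)")
    # s / (n + s) <= p / 100   =>   s <= n * p / (100 - p)
    # Integer floor division keeps the result exact and never over the cap.
    return real_rows * max_share_pct // (100 - max_share_pct)


def synthetic_share_pct(real_rows: int, synthetic_rows: int) -> float:
    """Synthetic rows as a percentage of the combined dataset."""
    total = real_rows + synthetic_rows
    if total == 0:
        raise ValueError("dataset is empty")
    return 100.0 * synthetic_rows / total
```

Under that assumption, 400 real responses support at most 100 synthetic rows at the 20% ceiling, and a 15-respondent base supports 3.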
Boosting is a purely statistical model, not one trained on outside data. It runs on the survey’s structure and the responses already collected, with nothing else feeding it. When those responses come from real professionals, boosting holds 95 to 99.5% statistical equivalence with fresh human controls across more than 200 backtested studies. That figure measures fidelity to the source data, not absolute accuracy. The model reproduces whatever the real responses contain, errors included, with equal fidelity. The number is only ever as good as what sits beneath it.
Boosting extends a dataset. A persona, on the other hand, is a different build entirely.
It starts with NewtonX’s foundation of verified B2B professionals, whose behavioral and attitudinal signals reflect how those professionals actually think, rather than with a general-population model that approximates them. That foundation can stand on its own as a marketplace persona, ready to query.
Clients can then layer in their own research to make it proprietary: past surveys, interview transcripts, segmentation studies, historical data. That extra layer makes the persona reflect one organization’s buyers specifically and keeps it locked to that organization.
This is harder in B2B than it is for the consumer personas most vendors sell. B2B buyers don’t leave the public trail consumer models train on, so a persona that isn’t built on data from real professionals is guessing at the part that matters most.
The build is designed to be audited, not just trusted. Every answer a persona gives can be traced back to the research behind it, down to the specific study and date.
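One way to picture that audit trail is as a record attached to each answer. The sketch below is purely illustrative: the class names, fields, and helper function are hypothetical and do not describe NewtonX’s actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical pointer to the research behind an answer."""
    study_id: str      # identifier of the underlying study (assumed field)
    fielded_on: date   # date the study was fielded (assumed field)


@dataclass(frozen=True)
class PersonaAnswer:
    """Hypothetical persona response carrying its own provenance."""
    text: str
    sources: tuple[SourceRef, ...]

    def is_traceable(self) -> bool:
        # An answer with no sources cannot be audited.
        return len(self.sources) > 0


def audit_trail(answer: PersonaAnswer) -> list[str]:
    """List each supporting study with its date, oldest first."""
    ordered = sorted(answer.sources, key=lambda s: s.fielded_on)
    return [f"{s.study_id} ({s.fielded_on.isoformat()})" for s in ordered]
```

In a structure like this, an answer with an empty `sources` tuple would be flagged rather than shown, which is one possible way to enforce the audit requirement.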
A persona is only as current as its training data. It can’t speak to a product that didn’t exist when it was trained, and for decisions that have to be statistically defensible, primary research is still the right call. A well-built persona respects that line and says so when the data runs thin. That restraint is the difference between a synthetic tool you can put in front of a board and one you can’t.
Boosting, personas, and whatever the category builds next all inherit whatever sits underneath them. A verified foundation gives the layers above it integrity. The wrong one compounds its flaws all the way up. That’s why we start where we do: with the data, and with people whose backgrounds can actually be confirmed.