For years, the debate about data and market power has lived mostly in antitrust courts and academic journals. That changes when AI enters the picture. A new analysis from the European Commission’s Joint Research Centre puts a sharper edge on a question the banking and FinTech sectors can no longer treat as someone else’s problem: when does a competitor’s data advantage become a wall you cannot climb?
The answer, it turns out, depends on factors most banks and payments firms have not been tracking as competitive intelligence.
Bruno Carballa-Smichowski, a policy researcher at the EC’s Joint Research Centre, identifies five conditions that determine whether a dominant firm’s data holdings cross from advantage to barrier. Data has to be central enough to product quality that rivals cannot match performance without it. Network effects have to be strong enough to compound the lead. The data has to be difficult to reproduce independently. It has to retain value over time. And the returns from accumulating more of it have to favor the firm that already has the most.
Each of those conditions shows up in financial services. Credit scoring models improve with more transaction history. Fraud detection systems sharpen with more behavioral signals. Risk models built on years of customer data cannot be replicated in a year by a new entrant, however well funded.
What Carballa-Smichowski adds to that familiar picture is a distinction that has not yet made it from policy debates into actual enforcement decisions: the difference between scale and scope.
The Distinction Regulators Are Missing
Scale means more observations, more rows in a dataset. Scope means more variables about the same observations, more columns. The research consensus is that adding more rows yields diminishing returns. Each additional data point contributes less than the one before it. Google’s former chief economist, Hal Varian, built an influential argument on exactly that logic: machine learning hits saturation, so data hoarding does not create durable barriers.
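The diminishing-returns logic behind the scale argument can be illustrated with a stylized sketch. It uses the standard error of a sample mean, which shrinks with the square root of the number of rows, as a stand-in for model quality. That proxy, the noise level and the dataset sizes are all illustrative assumptions, not figures from the JRC analysis or from Varian's work.

```python
import math


def standard_error(sigma: float, n_rows: int) -> float:
    """Standard error of a sample mean: sigma / sqrt(n_rows)."""
    return sigma / math.sqrt(n_rows)


# Hypothetical dataset sizes, each step adding the same 1,000 rows.
ROW_COUNTS = [1_000, 2_000, 3_000, 4_000, 5_000]
SIGMA = 1.0  # assumed noise level, for illustration only

errors = [standard_error(SIGMA, n) for n in ROW_COUNTS]

# Error reduction bought by each additional block of 1,000 rows.
gains = [before - after for before, after in zip(errors, errors[1:])]

for n, gain in zip(ROW_COUNTS[1:], gains):
    print(f"rows up to {n:>5}: error falls by {gain:.5f}")
```

Each block of 1,000 rows buys a smaller error reduction than the block before it, which is the pattern the scale argument relies on.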
The scope question complicates that argument considerably. Carballa-Smichowski and his co-authors found, in a study using health and socioeconomic data, that combining more variable types about the same individuals produced increasing returns up to a point, after which returns diminished toward saturation. The pattern was S-shaped. The first few highly informative variables lifted predictive accuracy sharply. “Not all variables are equally informative to the models’ objective,” the paper notes, which is precisely why the curves are not flat.
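One way to picture an S-shaped return to scope is a logistic curve, sketched below. The logistic form and the parameters `MIDPOINT`, `STEEPNESS` and `CEILING` are hypothetical choices made for illustration. They are not the functional form or the estimates reported in the study.

```python
import math

# Hypothetical parameters, chosen only to produce an S-shaped curve.
MIDPOINT = 4.5   # number of variable types at which gains are steepest
STEEPNESS = 1.2  # how sharply the curve bends
CEILING = 1.0    # accuracy gain at full saturation (normalized)


def accuracy_gain(variable_types: int) -> float:
    """Stylized cumulative accuracy gain from combining variable types."""
    return CEILING / (1 + math.exp(-STEEPNESS * (variable_types - MIDPOINT)))


def marginal_gain(variable_types: int) -> float:
    """Extra gain from adding the k-th variable type to the first k - 1."""
    return accuracy_gain(variable_types) - accuracy_gain(variable_types - 1)


marginals = [marginal_gain(k) for k in range(1, 9)]

# Variable type whose addition yields the largest single gain.
peak_k = max(range(1, 9), key=lambda k: marginals[k - 1])

for k, gain in enumerate(marginals, start=1):
    print(f"variable type {k}: marginal gain {gain:.4f}")
print(f"largest marginal gain at variable type {peak_k}")
```

In this toy curve, each added variable type contributes more than the last until the midpoint, then less. A firm holding only one or two variable types sits on the flat lower portion of the curve.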
For banking, this is the operative insight. A payments firm that combines transaction data with location data, behavioral signals and credit history is not just adding rows. It is adding columns about the same customers, and the research suggests those combinations can generate accelerating returns before they plateau. That acceleration is where the competitive gap opens and where a new entrant, starting with no variables on any customer, faces the steepest climb.
AI Makes the Problem Structural
The arrival of foundation models has moved this from a theoretical concern to an operational one. The FTC has warned that the data required to pre-train a generative AI model from scratch may make entry into those markets structurally difficult. The OECD has flagged the feedback loop: better models attract more users, more users generate richer interaction data, richer data produces better models. For firms that both train and deploy AI models, that loop is self-reinforcing in ways that firms entering the market later cannot replicate by simply hiring more engineers.
In payments and banking, the parallel is direct. Institutions with long customer histories, cross-product visibility into spending and saving behavior, and proprietary fraud signal libraries are in a structurally different position from challengers. The challengers can access public data and buy third-party feeds, but the variables that most accurately predict behavior, the ones that sit at the high-value portion of the S-curve, are locked inside incumbent systems.
What Regulation Has Tried, and What It Has Not
Regulators have reached for two tools: forcing dominant firms to share data with rivals, and blocking dominant firms from combining datasets they control. The EU’s Digital Markets Act attempts both. The September 2025 U.S. v. Google ruling required a one-time licensing of Google’s search index to qualified competitors, a deliberate choice to avoid perpetual sharing that might reduce recipients’ incentive to build their own data assets.
That logic does not transfer cleanly to financial services, where transaction and behavioral data goes stale quickly and a one-time snapshot has limited value. The sector will need instruments calibrated to short data shelf lives, where continuous access to fresh signals is as important as the underlying architecture of the dataset.
Across jurisdictions, the regulatory picture is fragmented. Brazil, the U.K., Germany, Japan, South Korea and Australia are each running different experiments with different designations, different obligations and different enforcement teeth. None of them has fully solved the scope problem, and few have built their frameworks around the S-curve dynamics that the research now describes.
That gap between what research shows and what regulation addresses is where the banking and FinTech sectors should be paying attention. The rules are being written now. The firms with the most to lose or gain from how scope-driven data advantages get defined are precisely the ones with the richest customer datasets, and precisely the ones being watched.