AdC warns of competition risks regarding access to and use of data in generative AI
Press Release 22/2024
September 27, 2024
The AdC warns of competition risks regarding access to and use of data in generative AI. This is the first paper in a new Short Paper Series, which builds on the Issues Paper on generative AI published in November 2023 in light of recent developments in the sector.
AdC Short Paper Series
The AdC has launched a Short Paper Series to assess competition dynamics in generative AI markets. The series expands on the Issues Paper on Competition and Generative AI published in November 2023.
The first paper of the series focuses on recent developments regarding access to and use of data in generative AI, in particular the growing importance of data licensing agreements and their impact on competition.
Competition and Generative AI: Zooming in on Data
Data is a key input in generative AI models, alongside computing power and know-how. Recent developments in how AI developers access and use data may affect competition.
- There has been a shift from publicly available data to proprietary data, as intellectual property (IP) holders have begun demanding compensation for the use of their content. This may reinforce data-driven advantages and market concentration.
- Data licensing agreements seem to have become more prevalent. These are agreements between IP holders (such as publishers, stock image repositories or social networks) and generative AI developers. The AdC warns that competition risks arise if data agreements include exclusivity clauses. Such exclusivities can be especially harmful to competition, and may constitute an anticompetitive practice, where the data holder has a dominant position.
- Synthetic data and data pre-processing seem to be playing an increasingly important role in training efficient, high-performing generative AI models. Developers are making growing use of synthetic data, which can lower entry barriers and data acquisition costs; however, it has limitations, and AI developers with access to real-world data may still enjoy a competitive edge. Data pre-processing, by contrast, may exacerbate scale effects and market concentration, since it relies heavily on experimentation.
To mitigate risks to competition regarding access to and use of data in generative AI, it is key to streamline developers' access to data and so ensure a level playing field (e.g., by serving data through open APIs, offering pay-as-you-go pricing structures, or making public datasets easily available). Knowledge transmission channels, such as open-source models, may also mitigate the scale effects generated by experimentation.