Forthcoming IJIO article by Laura Abrardi, Carlo Cambini
@flaviopino.bsky.social "Data brokers competition, synergic datasets, and endogenous information value" doi.org/10.1016/j.ij...
@sciencedirect.bsky.social
The DBs' behavior has strong welfare implications: higher prices reduce entry, which in turn harms consumers. In other words, we show that competing DBs can cause significant harm without needing to collude, even in the absence of strong data synergies.
If modularity is high enough to ensure superadditivity, DBs coordinate their prices and fully extract firms' WTP, as in a nice paper by Gu,
@leonardomadio.bsky.social and @marcoreggiani.bsky.social
(Data Broker co-opetition). Coordination emerges even when data synergies are very low!
The intuition is that data brokers sell data to multiple competing firms. Choosing not to buy data then puts a firm at a disadvantage against its rivals, increasing its WTP. Anticipating this, DBs charge higher prices!
Instead, datasets are superadditive if the value of the combined dataset is higher than the sum of the individual ones. We find that supermodularity is not necessary for superadditivity!
Formally, we have two data brokers (DBs) that sell datasets of different accuracy (i.e., the probability of enabling first-degree price discrimination). Datasets are supermodular if the accuracy of the combined dataset is higher than the sum of the individual ones.
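The two notions can be separated with a tiny numeric sketch (hypothetical numbers for illustration, not taken from the paper's model):

```python
# Accuracy = probability a dataset enables first-degree price discrimination.
a1, a2 = 0.4, 0.3        # accuracies of the two individual datasets (assumed)
a_combined = 0.6         # accuracy of the merged dataset (assumed)

# Supermodularity compares ACCURACIES of combined vs. individual datasets:
supermodular = a_combined > a1 + a2        # 0.6 > 0.7 -> False

# Superadditivity compares VALUES (firms' willingness to pay).
# Competition among data buyers can inflate the value of holding both:
v1, v2 = 5.0, 2.0        # stand-alone values (the thread's 5 and 2 example)
v_combined = 10.0        # value of the bundle under competitive pressure (assumed)
superadditive = v_combined > v1 + v2       # 10.0 > 7.0 -> True

print(supermodular, superadditive)         # prints: False True
```

With these illustrative numbers the bundle is submodular in accuracy yet superadditive in value, which is exactly the wedge the thread describes between technological and economic additionality.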
shorturl.at/4nuab The answer to this conundrum: technological additionality and economic additionality differ once you take firm competition into account! We refer to the former as "modularity" and to the latter as "additionality".
Ladies and gentlemen... this is paper #5, published OA in IJIO (see thread)! Suppose you buy data from two sources, and the combined accuracy is lower than the sum of the individual accuracies. The first dataset increases your profits by 5, the second by 2, and yet you agree to pay 10 for both? What?!
Yeah that makes sense, so training data will be even more essential now
So, now that the AI stack has been de facto cut in half and cloud computing/chips seem less essential, the question that remains is: which data are more important for foundation models? Training data, or user generated data that may create reinforcement effects? Any insights from tech savvy people?
Hi, IO assistant professor focused on data markets theoretical models here!
Impressive work @jschneebacher.bsky.social
Thank you!