The data pipeline will cover three things: npm package information, GitHub repo information that enriches it, and curated content such as podcasts from the ecosystem. Note: the #unjs package system, for example, adds another layer of complexity that needs to be covered here too. I have that in mind.
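
To make the enrichment step a bit more concrete, here is a rough TypeScript sketch of how an npm package record might be joined with GitHub repo data. Everything in it is an assumption for illustration: the type names (`NpmPackageInfo`, `GitHubRepoInfo`, `EnrichedPackage`), their fields, and the helpers `parseGitHubSlug` and `enrich` are hypothetical and not the actual pipeline schema. It assumes the repo can be identified from the repository URL in the package metadata.

```typescript
// Hypothetical record shapes; field names are illustrative, not a fixed schema.
interface NpmPackageInfo {
  name: string;
  version: string;
  repositoryUrl?: string; // e.g. "git+https://github.com/<owner>/<repo>.git"
}

interface GitHubRepoInfo {
  owner: string;
  repo: string;
  stars: number;
}

interface EnrichedPackage extends NpmPackageInfo {
  github?: GitHubRepoInfo; // absent when no matching repo is known
}

// Extract owner and repo from a GitHub URL as it commonly appears in package metadata.
// Handles https, git+https, and git@github.com:owner/repo forms; returns null otherwise.
function parseGitHubSlug(
  url: string | undefined
): { owner: string; repo: string } | null {
  if (!url) return null;
  const match = url.match(
    /github\.com[/:]([^/]+)\/([^/#]+?)(?:\.git)?(?:[/#].*)?$/
  );
  return match ? { owner: match[1], repo: match[2] } : null;
}

// Join one package with already-fetched repo data, keyed by "owner/repo".
// Several packages may point at the same repo; each gets the same repo record.
function enrich(
  pkg: NpmPackageInfo,
  repos: Map<string, GitHubRepoInfo>
): EnrichedPackage {
  const slug = parseGitHubSlug(pkg.repositoryUrl);
  const github = slug ? repos.get(`${slug.owner}/${slug.repo}`) : undefined;
  return github ? { ...pkg, github } : { ...pkg };
}

// Usage with placeholder data (names and numbers are made up).
const knownRepos = new Map<string, GitHubRepoInfo>([
  ["example-org/example-repo", { owner: "example-org", repo: "example-repo", stars: 0 }],
]);

const enriched = enrich(
  {
    name: "example-pkg",
    version: "0.0.0",
    repositoryUrl: "git+https://github.com/example-org/example-repo.git",
  },
  knownRepos
);
console.log(enriched.github?.repo);
```

Keeping the join as a pure function over already-fetched data means the fetching side (registry and GitHub requests) can be swapped or cached separately. Curated content such as podcasts would likely need its own record type rather than being forced into this shape.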