Public life science data infrastructure is likely too brittle for the new wave of AI scientists, and may become a blocker as adoption broadens.
I finally started to overhaul datasets.bio / BiŌkeanós to allow agentic discovery of biomedical data “in the wild”, as deep research capabilities of even specialised systems leave me wanting.
The scrapers I use for biomedical data providers worked pretty much out of the box. Where they failed, it was mostly due to throttling and stricter scraping provisions, which is fantastic news: it is great to see the stability and security of these services being taken care of.
This, however, reminded me of their limits. As someone who has accidentally taken down several ELIXIR and NIH services in the past, I believe that with the advent of research agents and their capabilities (e.g. ToolUniverse), many of the services that fields such as drug discovery rely on today will either be overwhelmed or become a limit on the speed of discovery.
There are better ways to do this at scale - and this is how I’m thinking about Arachne.ai now. If you’re working on problems around biomedical data usage, research agents, or the slightly boring plumbing between the two, I’d be interested to hear what you’re seeing.