A week ago Anthropic published their research on using agents in biology. Setting the article's other claims firmly aside, it highlights how difficult it is to access the heterogeneous, multi-scale data of the biomedical domain in an agentic setting.
As I see it, the main problems are:
- understanding each dataset's exact definition, context, and intended use case
- access methods (filtering, API definitions) are often implicit and not necessarily up to par
- making the data interoperable is costly
- consuming public resources for large-scale ops is asking for a meltdown
Platform-level centralisation is the current standard. It sounds great and might cover what you need at the moment, but in the bigger picture:
- limits your use cases
- it compromises both specificity and generalisability
- you still have to know the exact context in order to leverage particular resources
- maintenance and change management become a considerable cost
- considerable effort is spent rebuilding the same things within and across organisations.
These problems apply to both external (buy / open source) and internal (build) tools. You can absolutely hybridise, but then you end up with the worst of both worlds: maintaining both ungoverned ad-hoc pieces and a toothless platform. I believe there is a better way forward: a system that orchestrates the creation of use-case-specific, auto-generated knowledge bases.
As usual with agents, it comes down to providing the right context and tools for a specific task. More specifically:
- knowing which datasets exist that could be used
- knowing whether anyone has already used them and, if so, how and for what
- knowing whether there are established tools or access patterns you can leverage
- being able to collate the data into your own data infrastructure, so you are not limited by external factors
- a unified, agent-ready access layer
If you are already running into these problems, I would be keen to talk. Arachne.ai was conceived to tackle exactly this sort of issue through rapid, context-aware, use-case-specific materialisations of biomedical knowledge bases.