Anthropic on issues with research data in biology

 

Anthropic's recent research post adds weight to Arachne.ai's use-case-specific materialisation approach for biomedical data in agentic contexts.

A week ago Anthropic published their research on using agents in biology. Setting the article's other claims firmly aside, it highlights how difficult it is to access the heterogeneous, multi-scale data of the biomedical domain in an agentic setting.

As I see it, the following seem to be problematic:

  • understanding the exact definition, context, and expected use case of each dataset
  • access methods (filtering, API definitions) are implicit and not necessarily up to par
  • making the data interoperable is costly
  • consuming public resources for large-scale ops is asking for a meltdown

Platform-level centralisation is the current standard. It sounds great and might cover what you need at the moment, but in the bigger picture it:

  • limits your use cases
  • compromises specificity and generalisability
  • still requires you to know the exact context in order to leverage particular resources
  • makes maintenance and change management a considerable cost
  • means considerable effort is spent on rebuilding within and across orgs

These hold for both external (buy / open source) and internal (build) tools. You can absolutely hybridise, but then you end up with the worst of both worlds: maintaining both ungoverned ad-hoc pieces and a toothless platform. I believe there is a better way forward - a system to orchestrate the creation of use-case-specific, auto-generated knowledge bases.

So, as usual when agents are mentioned, it should be about providing context and tools for a specific task. More specifically:

  • what datasets exist out there which could be used
  • has anyone already used them - and if so, how and for what
  • are there already established tools or access patterns you can leverage
  • being able to collate the data into your own data infrastructure so you are not limited by external factors
  • a unified agent-ready access layer
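
To make the list above a little more concrete, here is a minimal Python sketch of what such a task-specific context layer could look like. It is purely illustrative and is not how Arachne.ai is implemented: the class names, field names and the demo record are all hypothetical, and "materialising" is reduced to flipping a flag where a real system would copy data into your own infrastructure.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One catalogue record. All field names are hypothetical."""
    name: str
    description: str
    known_uses: list[str] = field(default_factory=list)       # who used it, and for what
    access_patterns: list[str] = field(default_factory=list)  # established tools or queries
    materialised: bool = False                                 # collated into local infrastructure?


class UseCaseCatalogue:
    """Toy agent-facing access layer over a set of dataset records."""

    def __init__(self, entries):
        self._entries = {e.name: e for e in entries}

    def find(self, keyword):
        """Return datasets whose description or known uses mention the keyword."""
        kw = keyword.lower()
        return [
            e for e in self._entries.values()
            if kw in e.description.lower()
            or any(kw in use.lower() for use in e.known_uses)
        ]

    def materialise(self, name):
        """Mark a dataset as available locally (a real system would copy data here)."""
        self._entries[name].materialised = True

    def context_for(self, keyword):
        """Assemble the context an agent would be handed for one task."""
        return [
            {
                "dataset": e.name,
                "known_uses": e.known_uses,
                "access_patterns": e.access_patterns,
                "local": e.materialised,
            }
            for e in self.find(keyword)
        ]


# Invented placeholder record, not a real resource.
catalogue = UseCaseCatalogue([
    DatasetEntry(
        name="demo_expression_set",
        description="Placeholder gene expression measurements",
        known_uses=["target prioritisation (placeholder)"],
        access_patterns=["bulk download", "filter by tissue"],
    ),
])
catalogue.materialise("demo_expression_set")
print(catalogue.context_for("expression"))
```

The point of the sketch is the shape of the answer: the agent receives, per task, which datasets are relevant, how they have been used, how to access them, and whether they are already available locally.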

If you are seeing these issues already, I would be keen to talk. Arachne.ai was conceived to tackle exactly these sorts of issues through rapid, context-aware, use-case-specific materialisations of biomedical knowledge bases.