April 26, 2026


Will AI become our Co-PI?

While robotics could optimize research workflows as indefatigable staff, the path to AI becoming a collaborative principal investigator lies in enhanced hypothesis generation. LLMs excel at identifying patterns in vast datasets that humans cannot process at comparable cost, given biological and energy constraints. Further, though precise estimates are elusive given the proprietary nature of most LLM training sets, there is consensus that the models have now read most text data available on the internet, including scientific papers and other paywalled content, prompting a need for synthetic data that is itself subject to limitations21,22. It stands to reason that LLMs, with access to this breadth of scientific data and the ability to integrate it, may be able to identify patterns and infer causality to guide human researchers.

In other words, research models may be capable of inferential ideation that is productive in a scientific context, leading to collaborative “AI Co-PIs” in which LLMs generate hypotheses and guide specific experiments in partnership with human PIs (Fig. 1). For example, consider the European physicists of the 19th and 20th centuries, whom many would regard as among the greatest minds in human history. Although Maxwell, Bohr, and Einstein had access only to the best empirical data that the technology of their time afforded, they were still able to infer the frameworks of electromagnetism, quantum theory, and relativity. Long after their deaths, contemporary physicists have retroactively validated their theories with empirical data produced by modern technology. Analogously, LLMs may not require empirical data beyond what is already accessible in the literature in order to produce guiding hypotheses of scientific value.

Fig. 1: AI collaborators for efficient aggregation and synthesis in biomedical discovery.

The ingenuity of history’s physicists likely relied on some combination of lived experience, breadth of knowledge, creativity, speed of processing, recall, intelligence quotient, and the other physiological and psychometric factors known to underpin human quality of thought. LLMs already outperform humans on some of these metrics; for example, recent work has shown that GPT-4 consistently demonstrated higher originality and elaboration than human participants on divergent thinking tasks—including the Alternative Uses Task, Consequences Task, and Divergent Associations Task—potentially suggesting greater creativity among models23. Other efforts have produced LLMs that iteratively refine hypotheses using a balance of exploration and exploitation strategies, functionally emulating human scientific reasoning, in which hypotheses are continuously tested and refined as new information arrives24. Taken together with evidence that LLMs employ human decision-making heuristics and show some alignment with human moral and causal judgments, it is reasonable to infer that AI hypothesis generation may mirror or even exceed human thinking25,26. Ultimately, the multilayer abstraction inherent in LLM architecture may allow for the interpolation of new patterns that are not mere regurgitations of training data, potentially presenting value to human researchers27.

Novel hypotheses generated by AI still warrant post hoc experimental validation. History reminds us that even our greatest minds revised their views; Einstein himself abandoned the cosmological constant he once defended28. Recognizing that human reasoning is likewise fallible strengthens the case for subjecting both human- and AI-generated hypotheses to the same empirical scrutiny. But by recognizing hidden relationships or correlations across disciplines, LLMs could guide the design of experiments that would otherwise remain unexplored, steering researchers toward new truths. Recent work has shown that LLMs tuned with domain-specific knowledge outperform human neuroscientists at predicting experimental outcomes in neuroscience, likely owing to the models’ ability to integrate broad context and subtle patterns from extensive scientific literature29. Though AI may lack human-specific context, as models improve so too should machine reasoning, perhaps enabling AI to make causal inferences of comparable quality to even the best of humanity’s thinkers. Ongoing efforts seek to replicate the thought processes of specific notable individuals by fine-tuning LLMs on the sum total of an individual’s works, though results remain mixed30.

Still, AI Co-PIs may not be universally generalizable research tools, as different fields of science warrant different approaches to innovation. Biology and physics differ fundamentally in the manner by which hypotheses, experiments, and results lead to new understanding. Physics is a “hard” science, in which experiments reveal progressively more about the fixed laws governing an unchanging universe. Biology, by contrast, is riddled with variability and heterogeneity from the cellular to the organismal to the societal level, making experimental control notoriously difficult. For example, the “antibody problem” refers to the paradox whereby even slight differences in experimental conditions—such as reagent lots, cell lines, or ambient conditions—can cause radically divergent antibody binding behaviors, making reproducibility an ongoing challenge.

A critic might argue that an AI Co-PI would be unable to make scientific progress under such variability in the biological sciences. However, AI-guided experiments and parallelized robotics might overcome variance through sheer scale and repetition. By scaling up the number and sophistication of experimental trials, AI-guided robotics can systematically probe these loose, variance-prone problems in biology until approximations become reliable truths. We already approximate this idea with high-throughput “omics” technologies, drug discovery, protein folding, and other contexts, illustrating how large-scale data generation can capture complexity that would overwhelm traditional human-led methods13,14,31. AI has also demonstrated the ability to anticipate circumstantial challenges and adapt its strategies in other highly variable contexts, including disaster management and fraud prevention32,33. Variability and heterogeneity may thus reinforce, rather than undermine, the value of in-silico approaches for AI-PI collaborations in biomedical discovery.
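The statistical intuition behind this argument (that repetition tames variance) can be sketched with a minimal simulation: averaging n noisy replicates shrinks the standard error of an estimate roughly as 1/sqrt(n). The assay model, effect size, and noise level below are hypothetical placeholders, not a description of any real experiment.

```python
import random
import statistics

# Illustrative only: model a noisy biological assay as a fixed true effect
# (e.g., an antibody binding signal) plus large run-to-run variability.
TRUE_EFFECT = 1.0   # hypothetical ground-truth signal
NOISE_SD = 5.0      # hypothetical between-replicate variability

random.seed(0)  # reproducible demonstration

def run_assay():
    """One simulated experimental replicate: truth plus Gaussian noise."""
    return random.gauss(TRUE_EFFECT, NOISE_SD)

def estimate(n_replicates):
    """Average n replicates; the standard error shrinks as 1/sqrt(n)."""
    results = [run_assay() for _ in range(n_replicates)]
    return statistics.mean(results)

# As a robotic platform scales up replicates, the averaged estimate
# converges toward the true effect despite high per-trial variance.
for n in (10, 1_000, 100_000):
    est = estimate(n)
    print(f"n={n:>7}: estimate={est:+.3f} (error={abs(est - TRUE_EFFECT):.3f})")
```

A single trial here is nearly uninformative (noise dwarfs the signal), yet the pooled estimate becomes reliable at scale, which is the crux of the argument for massively parallel, AI-guided experimentation.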
