Can Coding Agents Reproduce Findings in Computational Materials Science?
Researchers introduced AutoMat, a benchmark that assesses how well LLM-based coding agents can reproduce published findings in computational materials science. Their study shows that these agents struggle with complex scientific workflows: even the best-performing agent reproduced findings successfully only 54.1% of the time.
This item is worth keeping only if the source makes its practical relevance clear.
This record is extracted from a published AI Today issue and tied to the original source URL. Treat the source as the record of evidence for the summary.