Date: 22 Oct 2025
Time: 10:30 am - 11:30 am (HKT)
Venue: Lecture Theatre T6, Meng Wah Complex
Speaker: Prof. Xi YU
Institution: Tianjin University
 
Title:
Language-Driven R&D for Chemistry and Materials
 
Schedule:
Date: 22nd October, 2025 (Wednesday)
Time: 10:30 - 11:30 am (HKT)
 
Venue: Lecture Theatre T6, Meng Wah Complex
 
Speaker:
Prof. Xi YU
Tianjin University
 
Abstract:
Modern chemistry and materials science are undergoing a paradigm shift, moving beyond the traditional triad of “experiment–theory–computation” to embrace a dual-engine model driven by data and language. This talk proposes and implements a novel language-driven R&D framework in which large language models (LLMs) act as language organizers and reasoners, transforming unstructured and semi-structured data from literature, patents, and lab records into retrievable, interpretable, and actionable research knowledge.
 
We build lightweight, heterogeneous databases tailored to materials science and pair them with hybrid retrieval strategies. Through a Retrieval-Augmented Generation (RAG) architecture, we tightly couple general-purpose LLMs with domain-specific corpora, creating a unified semantic space that spans experimental procedures, observations, mechanisms, characterization data, modeling results, structured tables, and knowledge graphs. Users can therefore query domain knowledge as intuitively as they would consult a technical dictionary, and receive evidence-backed answers tailored to each query. On the application side, we have developed an intelligent expert system for thermally conductive composite materials, which is currently in industrial pilot testing. The system provides fine-grained suggestions, integrated summaries of the evidence, and open-ended recommendations.
 
We further introduce an Experience-Augmented Generation (EAG) methodology, which uses an iterative update–predict–validate loop to codify literature-derived knowledge as trainable “experience texts.” Taking the synthesis of boron nitride nanosheets (BNNS) as a case study, the system outputs scalable processing windows and post-treatment routes, and provides language-grounded rationales for the choice of materials and methods, such as the selection of grinding media. In other domains, such as molecular gels, we show how structured language representations, anchored in chemical principles and referenced evidence, significantly improve predictability. Importantly, EAG is trained much like a machine-learning model, from corpus curation and objective design through optimization and evaluation. The process can be characterized quantitatively with computational linguistics, both to study how the text evolves during EAG and to measure how the linguistic and organizational features of experience texts affect downstream performance.
 
In conclusion, this language-driven approach formalizes “language-centric knowledge production” in chemical and materials R&D. It seamlessly integrates retrieval, extraction, summarization, and reasoning within a single interface, reducing the friction between evidence and decision-making and supporting end-to-end innovation workflows, from formulation and process guidance to mechanistic hypotheses and property prediction.
- - ALL ARE WELCOME - -