Researchers from Fudan University, Shanghai Jiao Tong University, Wuhan University, The Hong Kong Polytechnic University, The Chinese University of Hong Kong, and Shanghai Artificial Intelligence Laboratory have jointly developed ChemLLM, a revolutionary Chemical LLM engineered to address a wide array of chemical tasks through fluent dialogue interaction.
ChemLLM, the inaugural language model explicitly tailored for chemistry, has surpassed established benchmarks, including outperforming GPT-3.5, on pivotal chemistry tasks such as molecule recognition, property description, and reaction prediction. It is built on top of InternLM-2.
Click here to check out the model on Hugging Face.
This pioneering model not only showcases exceptional adaptability across various chemical disciplines but also demonstrates proficiency in specialised NLP tasks within the domain.
Traditional language models have encountered challenges in effectively utilising structured chemical data, often leading to compromised coherence in dialogue. To overcome this hurdle, researchers have devised a template-based instruction construction method, seamlessly transforming structured chemical knowledge into a dialogue-friendly format for model training. This innovative approach ensures ChemLLM’s ability to maintain fluent dialogue while handling diverse chemical tasks with precision.
In addition to its core competency in chemistry tasks, ChemLLM demonstrates remarkable versatility in related mathematical and physical domains, despite its primary training on chemical-centric data.
Furthermore, the model exhibits proficiency in specialised NLP tasks within chemistry, such as literature translation and cheminformatic programming, highlighting its comprehensive utility within the domain.
While ChemLLM represents a substantial leap forward in text-based chemistry applications, it is imperative to acknowledge inherent limitations. Challenges include integrating molecular graph modalities crucial for understanding molecular structures and interactions.
Additionally, concerns regarding adherence to scientific ethics, particularly in generating responses under extreme conditions, necessitate ongoing refinement to enhance functionality and ethical governance.
For those eager to delve deeper into ChemLLM, codes, datasets, and model weights are publicly accessible for reference and utilisation, fostering collaboration and innovation within the scientific community.
Read: India vs China vs US in Open Source AI
The post Researchers from China Release ChemLLM for Chemistry appeared first on Analytics India Magazine.