| Project ID |
BITS-SRIP/9EEEB4/2026 |
| Project Title |
Project Jigyasa: Agentic Interface for AlphaFold |
| Project Description |
The bioinformatics industry is rapidly moving toward Autonomous Science, yet a critical gap remains: current AI models are "brains in a jar", intelligent but disconnected from real-world scientific data. Project Jigyasa addresses this industry-wide challenge by building a production-grade Agentic Interface for the AlphaFold ecosystem. You will move beyond standard chatbots to engineer a high-impact Research Tool. By implementing Tool Use protocols, you will enable an AI agent to navigate complex protein databases, validate its findings, and accelerate the initial stages of drug discovery. This project simulates the R&D workflows currently being adopted by top AI-for-Science labs.
Scope of Work: Engineering Modules
1. The Tool Layer (The Infrastructure):
Create a robust Python library that wraps the public APIs for UniProt (protein and gene metadata) and AlphaFold DB (3D structures).
Challenge: Design these tools for production reliability, so that the agent can handle API rate limits, missing data, and complex queries without crashing. This is a key requirement for autonomous systems.
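The reliability requirement above can be illustrated with a minimal sketch of a fetch helper that retries on rate limits and treats a missing record as an empty result. The function name `fetch_json`, the set of retryable status codes, and the backoff schedule are assumptions made for illustration; they are not part of the project specification or of either API.

```python
import json
import time
import urllib.error
import urllib.request

# Assumption: HTTP statuses worth retrying (rate limit and transient server errors).
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def fetch_json(url, max_retries=3, base_delay=1.0, timeout=10.0,
               opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch a URL and return parsed JSON.

    Returns None when the record does not exist (HTTP 404), so the agent
    receives an explicit "missing data" signal instead of an exception.
    Retries retryable statuses with exponential backoff, then re-raises.
    """
    for attempt in range(max_retries + 1):
        try:
            with opener(url, timeout=timeout) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return None  # missing entry: report it, do not crash
            if err.code not in RETRYABLE_STATUS or attempt == max_retries:
                raise  # non-retryable error, or retries exhausted
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            sleep(base_delay * 2 ** attempt)
```

The `opener` and `sleep` parameters exist only so the helper can be exercised without network access or real waiting. The exact endpoint paths for UniProt and AlphaFold DB should be taken from their official API documentation, and network-level failures (`urllib.error.URLError`) would need their own handling in a fuller version.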
2. The Reasoning Engine (The Logic):
Use an orchestration framework (like LangChain or LlamaIndex) to connect an LLM to your Tool Layer.
Implement a ReAct (Reason + Act) workflow where the agent plans its own research steps (e.g., “Retrieve structure -> Check confidence score -> Compare with gene variant”) to solve multi-step biological problems.
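The plan-act-observe cycle described above can be sketched without any framework. In this hedged illustration a scripted `policy` function stands in for the LLM, and the tool names, the `EXAMPLE_ID` accession, and the stubbed tool outputs are all hypothetical placeholders; an orchestration framework such as LangChain or LlamaIndex would replace this loop in the actual project.

```python
def run_react(question, policy, tools, max_steps=5):
    """Run a ReAct-style loop: the policy proposes (thought, action, input),
    the matching tool is executed, and its observation is added to the trace.
    """
    trace = []
    for _ in range(max_steps):
        thought, action, action_input = policy(question, trace)
        if action == "finish":
            trace.append(f"Thought: {thought}")
            return action_input, trace  # action_input carries the final answer
        tool = tools.get(action)
        observation = tool(action_input) if tool else f"unknown tool: {action}"
        trace.append(
            f"Thought: {thought} | Action: {action}[{action_input}] "
            f"| Observation: {observation}"
        )
    return "step limit reached", trace


# Hypothetical stub tools; real ones would call the Tool Layer.
TOOLS = {
    "retrieve_structure": lambda acc: f"structure fetched for {acc} (stub)",
    "check_confidence": lambda acc: f"confidence computed for {acc} (stub)",
}


def scripted_policy(question, trace):
    """Stand-in for an LLM: replays a fixed plan, one step per call."""
    plan = [
        ("I need the structure first.", "retrieve_structure", "EXAMPLE_ID"),
        ("Now I should check the confidence.", "check_confidence", "EXAMPLE_ID"),
        ("I have enough to answer.", "finish", "Structure and confidence retrieved."),
    ]
    return plan[len(trace)]


answer, trace = run_react("Is this structure reliable?", scripted_policy, TOOLS)
```

The `trace` list is the kind of record a dashboard could display as the agent's step-by-step reasoning, and `max_steps` guards against a policy that never finishes.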
3. Automated Analysis Module (The Processor):
Integrate BioPython to programmatically parse the fetched PDB files.
The Agent must automatically extract key metrics, in particular the pLDDT confidence score (stored per residue in the B-factor column of AlphaFold PDB files), and classify the protein structure as High Quality or Disordered, giving the researcher immediate, actionable insight.
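As a rough sketch of this analysis step: AlphaFold PDB files carry the per-residue pLDDT in the B-factor field of each ATOM record. To stay dependency-free, the example below reads that field by fixed column position rather than through BioPython; in the project itself, `Bio.PDB.PDBParser` and `Atom.get_bfactor()` expose the same value. The function names, the 70.0 cut-off on mean pLDDT, and the extra "No Data" label are assumptions for illustration, not values fixed by the project.

```python
def residue_plddt(pdb_text):
    """Return one pLDDT value per residue, read from C-alpha ATOM records.

    PDB fixed columns: atom name in 13-16, B-factor in 61-66 (1-based).
    """
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def classify_structure(scores, threshold=70.0):
    """Label a structure from its mean pLDDT.

    Assumption: mean >= threshold is "High Quality", otherwise "Disordered";
    an empty score list yields "No Data" so callers never divide by zero.
    """
    if not scores:
        return "No Data"
    mean_score = sum(scores) / len(scores)
    return "High Quality" if mean_score >= threshold else "Disordered"
```

A single mean is a coarse summary. A structure can be confidently folded overall yet contain disordered loops, so a fuller version might also report the fraction of residues below a chosen cut-off.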
4. Interactive Dashboard (The Interface):
Build a lightweight Streamlit web application.
The app should feature a Chat with Data interface where users see the Agent's Thought Process in real time, demonstrating how the AI navigates from a vague question to a verified scientific answer.
Expected Tangible Outcomes
A Bio-Agent Python Package: A modular set of tools allowing any LLM to query biological databases, usable for future research pipelines.
Streamlit Web Application: A functional demo where a user can ask questions about proteins and receive summarised answers along with downloadable 3D structure files.
Technical Report: A guide on Building Agentic Workflows for Scientific Data, highlighting the move from static search to autonomous discovery. |
| Project Discipline |
Computer Science, Bioinformatics, Computational Biology, Software Engineering |
| Faculty Name |
Ashutosh Bhatia |
| Department |
Department of Computer Science & Information Systems |