| Project ID |
BITS-SRIP/5D8883/2026 |
| Project Title |
Design and Implementation of a Semantic Similarity Detection System |
| Project Description |
The project involves designing and implementing a system that automatically detects semantic similarity and redundancy in collections of text documents. The system will use modern Natural Language Processing (NLP) techniques such as text embeddings and similarity metrics to identify related or overlapping text segments. In addition, a Large Language Model (LLM) component will be used as a secondary semantic verifier to provide a more reliable similarity decision and brief explanations of why two text segments are considered overlapping. Students will focus on data preprocessing, system design, similarity analysis, and performance evaluation using real-world text datasets.
Expected Outcomes:
1. Implement a Python-based system to detect semantic similarity and redundancy in text documents using embedding-based NLP methods.
2. Integrate an LLM as a secondary verifier to improve accuracy and generate brief explanations for similarity matches.
3. Design a modular and scalable pipeline that covers preprocessing, similarity search, and result reporting.
4. Evaluate and present results using tuned similarity thresholds, standard metrics, a simple UI, and concise technical documentation. |
| Project Discipline |
Electrical Engineering, Electronics and Communication Engineering, Computer Science, Data Science, or related interdisciplinary engineering programs. |
| Faculty Name |
Syed Mohammad Zafaruddin |
| Department |
Department of Electrical & Electronics Engineering |
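
The pipeline outlined in the project description (preprocessing, embedding, similarity scoring against a threshold, then a secondary verifier) could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's prescribed design: all function names are hypothetical, the term-count vectors are only a stand-in for learned text embeddings so that the example runs with the standard library alone, the 0.6 threshold is arbitrary, and the `verifier` argument is a placeholder for whatever LLM call the project eventually adopts.

```python
import math
import re
from collections import Counter
from itertools import combinations


def preprocess(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Stand-in embedding (assumption): a sparse term-count vector.

    A real system would call a sentence-embedding model here instead.
    """
    return Counter(preprocess(text))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_candidates(segments, threshold=0.6):
    """Return (i, j, score) for every pair scoring at or above the threshold.

    The default threshold is an arbitrary placeholder; it would be tuned
    on labelled data during evaluation.
    """
    vectors = [embed(s) for s in segments]
    pairs = []
    for i, j in combinations(range(len(segments)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            pairs.append((i, j, score))
    # Highest-scoring pairs first, so the verifier sees the strongest matches first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


def verify(pairs, segments, verifier):
    """Keep only the candidate pairs that the secondary verifier accepts.

    `verifier` is any callable taking two strings and returning a bool;
    in the project this role would be played by an LLM (assumption).
    """
    return [(i, j, s) for i, j, s in pairs if verifier(segments[i], segments[j])]


segments = [
    "The system detects redundant text in documents.",
    "The system detects redundant text in large documents.",
    "Bananas are rich in potassium.",
]
candidates = find_candidates(segments)
confirmed = verify(candidates, segments, verifier=lambda a, b: True)
```

Running the cheap similarity pass first and sending only the surviving pairs to the verifier keeps the number of LLM calls small, which is one plausible way to meet the modularity and scalability outcome. An all-pairs comparison grows quadratically with the number of segments, so a larger collection would likely need an approximate nearest-neighbour index in place of the loop shown here.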