Research

Academic research, publications, and experimental projects

PolySpeech-HS: Multilingual Non-Autoregressive Text-to-Speech Synthesis with Hidden-State Adapters

Speech Synthesis & Multilingual AI

Abstract

A non-autoregressive multilingual text-to-speech (TTS) synthesis framework designed to address the linguistic diversity and real-time deployment challenges of Indian languages. By pairing a unified encoder-decoder architecture with lightweight hidden-state adapters, PolySpeech-HS enables efficient cross-lingual generalization while preserving language-specific prosodic nuances. The system achieves state-of-the-art performance across six Indian languages, with a mean opinion score (MOS) of 4.30, a mel-cepstral distortion (MCD) of 4.7 dB, and a real-time factor (RTF) of 0.13.

IEEE Transactions on Audio, Speech and Language Processing
2025
Vellore Institute of Technology
TTS
Non-Autoregressive
Hidden-State Adapters
Multilingual AI
Indian Languages
AMO-HSA
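
The hidden-state adapter idea can be illustrated with a minimal sketch. The class name, dimensions, and the residual bottleneck design below are assumptions for illustration, not the paper's exact architecture: a small per-language bottleneck is applied to the shared encoder's hidden states and added back residually, so the shared backbone stays frozen while each language contributes only a few trainable parameters.

```python
import numpy as np

class HiddenStateAdapter:
    """Minimal bottleneck adapter: down-project, non-linearity,
    up-project, residual add. One instance per language keeps the
    shared encoder-decoder untouched."""

    def __init__(self, d_model: int, d_bottleneck: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        self.W_down = rng.normal(0.0, scale, size=(d_model, d_bottleneck))
        self.W_up = rng.normal(0.0, scale, size=(d_bottleneck, d_model))

    def __call__(self, h: np.ndarray) -> np.ndarray:
        # h: (seq_len, d_model) hidden states from the shared encoder
        z = np.maximum(h @ self.W_down, 0.0)   # ReLU bottleneck
        return h + z @ self.W_up               # residual connection

# One lightweight adapter per language; the backbone weights are shared.
adapters = {lang: HiddenStateAdapter(256, 32) for lang in ("hi", "ta", "bn")}
h = np.zeros((10, 256))                        # dummy encoder output
out = adapters["hi"](h)
```

Because the adapter is residual, it can start as a near-identity map and learn only the language-specific correction, which is what makes cross-lingual sharing cheap.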

A Novel Data-Centric Transformer Fine-Tuning: A Modular Framework for Rapid Domain Adaptation and Deployment

Large Language Models & Domain Adaptation

Abstract

This research demonstrates a data-centric, hardware-light workflow for fine-tuning transformers that sidesteps the drawbacks of costly LLM APIs. By automatically scraping high-signal web content and converting it into Q&A pairs, we fine-tune a GPT-2-Medium model (355M parameters) in ≈7 minutes on a single RTX 3060. The resulting assistant achieves 67.3% accuracy (+34% over the base model) with 1.4 s median latency at zero per-call cost.

IEEE Transactions on Computational Social Systems
2025
Vellore Institute of Technology
GPT-2
LoRA
8-bit Adam
Domain Adaptation
Next.js
Q&A Generation
Fine-tuning
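
The scrape-to-Q&A step of the workflow could look roughly like the following sketch. The paragraph-splitting heuristic, the templated question, and the prompt/completion JSONL format are illustrative assumptions, not the paper's actual pipeline:

```python
import json
import re

def paragraphs_to_qa(text: str, topic: str) -> list[dict]:
    """Turn scraped page text into prompt/completion pairs for
    supervised fine-tuning. A templated question per paragraph
    stands in for whatever Q&A generation the real pipeline uses."""
    pairs = []
    for i, para in enumerate(p.strip() for p in re.split(r"\n\s*\n", text)):
        if len(para) < 40:          # drop navigation/boilerplate fragments
            continue
        pairs.append({
            "prompt": f"Q: What does the {topic} documentation say (part {i + 1})?\nA:",
            "completion": " " + para,
        })
    return pairs

scraped = ("First useful paragraph about the domain, long enough to keep.\n\n"
           "ok\n\n"
           "Second useful paragraph with more high-signal content to keep.")
qa = paragraphs_to_qa(scraped, "billing API")
jsonl = "\n".join(json.dumps(p) for p in qa)   # fine-tuning corpus, one pair per line
```

A length filter like the one above is the cheapest form of the "high-signal" curation the abstract mentions; real pipelines would add deduplication and quality scoring on top.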

Fine-Tuning Mistral 22B: The First Large Language Model for Assamese Language Tasks

Low-Resource Language Processing

Abstract

This work presents the first fine-tuned Large Language Model specifically engineered for Assamese, a low-resource Indo-Aryan language spoken by approximately 15 million people. It introduces the AssamText-750K dataset and a custom Unicode mapping system built exclusively for Assamese, making it the only Assamese LLM backed by language-specific Unicode infrastructure. The model achieves a 20% average improvement across text-generation fluency, sentiment-analysis accuracy, and Assamese-to-English translation quality.

IEEE Transactions on Neural Networks and Learning Systems
2025
Vellore Institute of Technology
Mistral 22B
LoRA
Unicode Mapping
Assamese NLP
Low-Resource Languages
AssamText-750K
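
One concrete piece of Assamese-specific Unicode handling is worth illustrating: Assamese is written in the Bengali Unicode block (U+0980–U+09FF) but has its own letters ৰ (RA, U+09F0) and ৱ (WA, U+09F1), so text typed with a Bengali keyboard often carries the Bengali RA র (U+09B0) instead. The sketch below is a hedged illustration of what a normalisation layer in such a mapping system might do; it is not the actual AssamText-750K mapping:

```python
# Assamese shares the Bengali Unicode block (U+0980-U+09FF) but uses
# RA = U+09F0 (Bengali RA is U+09B0) and WA = U+09F1.
# This minimal table rewrites Bengali-typed RA onto the Assamese
# code point so the corpus uses one consistent representation.
ASSAMESE_NORMALISE = str.maketrans({
    "\u09b0": "\u09f0",   # Bengali RA -> Assamese RA
})

def to_assamese(text: str) -> str:
    """Normalise Bengali-script RA to the Assamese code point."""
    return text.translate(ASSAMESE_NORMALISE)

sample = "\u09b0\u09be"           # RA + AA vowel sign, typed as Bengali RA
normalised = to_assamese(sample)  # same syllable on the Assamese code point
```

Consistent code points matter for a low-resource LLM because a tokenizer otherwise learns two disjoint vocabularies for visually near-identical text, fragmenting an already small corpus.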