SG-English ASR
2025-02-15
Overview
Speech recognition models fine-tuned for Singaporean English accents, built on Whisper and Wav2Vec2.
Key Achievements
- Reduced Word Error Rate (WER) from 122.81% to 21.05%
- Improved Character Error Rate (CER) from 123.61% to 6.94%
- Decreased inference time by 58.5% (from 0.423s to 0.176s per sample)
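WER and CER are edit distances normalized by the reference length. The project's actual evaluation code is not shown here; a minimal pure-Python sketch of how these metrics can be computed:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (one-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,          # deletion
                dp[j - 1] + 1,      # insertion
                prev + (r != h),    # substitution (free if tokens match)
            )
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

print(round(wer("it is very hot lah", "it is very hot"), 2))  # one deletion out of five words → 0.2
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many extra insertions, which is how a baseline figure above 100% arises.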
Technical Approach
Model Architecture
Used two state-of-the-art models:
- Whisper Base - OpenAI's multilingual speech recognition model
- Wav2Vec2 - Facebook's self-supervised speech representation model
Fine-tuning Strategy
- Applied LoRA (Low-Rank Adaptation) for efficient fine-tuning
- Used Optuna for hyperparameter optimization
- Trained on Singaporean English dataset with diverse accents
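To see why LoRA makes fine-tuning efficient, compare trainable parameter counts for a single weight matrix. The dimensions and rank below are illustrative only (roughly Whisper Base's 512-dim projections), not the project's actual configuration:

```python
# Full fine-tuning updates the whole d_out x d_in matrix W.
# LoRA freezes W and trains only a rank-r update B @ A,
# where A is (r x d_in) and B is (d_out x r).
d_in, d_out = 512, 512  # illustrative layer shape
rank = 8                # illustrative LoRA rank

full_params = d_in * d_out
lora_params = rank * (d_in + d_out)

print(full_params, lora_params, lora_params / full_params)
# → 262144 8192 0.03125
```

Here LoRA trains about 3% of the layer's parameters; at inference the effective weight is W plus the scaled low-rank product, so no extra latency is added once the update is merged.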
Dataset
- Preprocessed audio files in FLAC format
- Focused on conversational and technical vocabulary
- Included domain-specific terms common in Singapore
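The preprocessing pipeline itself is not shown in the source; as an illustration, the step that pairs FLAC files with their transcripts into a training manifest might look like this (all file and field names are hypothetical):

```python
import json
import tempfile
from pathlib import Path

def build_manifest(audio_dir: Path, transcripts: dict) -> list:
    """Pair each FLAC file with its transcript; skip files without one."""
    manifest = []
    for flac in sorted(audio_dir.glob("*.flac")):
        text = transcripts.get(flac.stem)
        if text is not None:
            manifest.append({"audio": str(flac), "text": text})
    return manifest

# Tiny demo with placeholder files (no real audio needed for the pairing step).
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    for name in ("utt1", "utt2"):
        (root / f"{name}.flac").touch()
    manifest = build_manifest(root, {"utt1": "can lah", "utt2": "on the MRT"})
    print(json.dumps(manifest, indent=2))
```

A manifest like this keeps audio paths and reference text together, so the same file can drive both fine-tuning and WER/CER evaluation.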
Results Analysis
The fine-tuned model showed significant improvement in handling:
- Singaporean accent variations
- Code-switching between English and local languages
- Domain-specific terminology
Future Work
- Expand dataset to include more diverse accents
- Implement real-time transcription
- Optimize for mobile deployment