Abstract
This paper presents a real-time, bidirectional American Sign Language (ASL) communication system that translates between ASL gestures and spoken English. The system integrates computer vision and deep learning to recognize hand signs and uses a Unity-based avatar to render spoken input as animated ASL gestures. Designed for accessibility and low-cost deployment, it runs on consumer-grade hardware (a standard webcam, a microphone, and a mid-range laptop) without requiring specialized equipment such as gloves or depth sensors. A convolutional neural network (CNN) trained on a curated ASL alphabet dataset achieves 92% accuracy in letter recognition, with average response latency below 300 milliseconds. Spoken language is transcribed using the Google Web Speech API and visualized in near real time. The system supports adaptive retraining through a user feedback loop, enabling personalization. Emphasis is placed on inclusive design, practical usability, and potential deployment in VR/AR environments. The paper details the system architecture, methodology, dataset, evaluation metrics, and broader implications. The result is a real-time, low-cost foundation for scalable and inclusive communication, with current support for alphabet-level gesture recognition and phrase-based ASL avatar responses.