Overview
A research project exploring real-time American Sign Language recognition displayed on AR glasses. The system interprets hand gestures captured by a camera and overlays translated text onto the wearer's field of view, enabling seamless communication for the Deaf and Hard-of-Hearing community.
Role
Lead Researcher & Developer: designed the gesture recognition pipeline, trained the ML model, and built the AR overlay interface.
Tech Stack
Why This Project
Last semester, I took a speech class where one of my classmates was Deaf. He was engaged, but every interaction in that room depended on a real-time interpreter. Without one, communication broke down, not because he had nothing to say, but because the infrastructure wasn't there for him to say it independently. That hit close to home.
The Problem
of DHH individuals surveyed reported communication barriers directly impacting their career decisions and healthcare outcomes (2025 Nagish survey, 300+ respondents)
Communication between Deaf and Hard-of-Hearing (DHH) individuals and hearing individuals remains deeply fractured. Existing ASL recognition systems tend to be desktop-bound, require complex multi-camera setups, or depend on sensor gloves, none of which are portable, unobtrusive, or practical for everyday use.
Can a low-cost, wearable device recognize ASL in real time and display translations directly in the user's field of view, without requiring the signer to wear anything on their hands?
The Research and Approach
This project sat at the intersection of hardware engineering, machine learning, and human-centered design. I owned every layer of the stack, from optics to model training to physical enclosure.



Understanding the Landscape
Before building, I surveyed the existing work. Academic projects like SignGlass (UIST 2025) used multiple Raspberry Pi Zeros and three cameras to capture both manual and non-manual ASL markers. Glove-based systems like TransASL relied on IMU sensors strapped to the signer's hands. Both showed promise in recognition accuracy but traded away wearability and social acceptability, two things that matter enormously if you want DHH individuals to actually use the device in daily life.
My design goal was different: a single, glasses-form-factor device where the camera watches the person signing toward the wearer, and translations appear on a heads-up display. No gloves, no tethered laptops, no multi-camera rigs.
Data & Model
Off-the-shelf ASL datasets didn't match the perspective I needed, so I built my own. I captured images of my hands forming each letter of the ASL alphabet, creating a custom dataset shot from the device's camera angle. The model was trained using Python, OpenCV, and TensorFlow, a straightforward image classification pipeline where the model learned to recognize each letter directly from the visual input. No landmark extraction or keypoint data was involved; the model works entirely from what the camera sees, which keeps the recognition pipeline simple and the training process accessible.
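As a rough illustration of how a custom per-letter image set like this could be organized before training, the sketch below indexes a folder of images into labeled training and validation lists. The folder layout (`root/<LETTER>/*.jpg`), the 20% validation holdout, and the names `index_dataset` and `LABEL_OF` are all assumptions made for the example; the project's actual dataset layout and split are not described here.

```python
import random
import string
from pathlib import Path

# One class per letter; the integer label is the letter's position in A-Z.
LETTERS = list(string.ascii_uppercase)
LABEL_OF = {letter: i for i, letter in enumerate(LETTERS)}


def index_dataset(root: Path, val_fraction: float = 0.2, seed: int = 0):
    """Return (train, val) lists of (image_path, label) pairs.

    Assumes a hypothetical layout of root/<LETTER>/<image>.jpg.
    The split is done per letter so every class appears in both lists.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for letter in LETTERS:
        folder = root / letter
        if not folder.is_dir():
            continue  # skip letters that have no images yet
        images = sorted(folder.glob("*.jpg"))
        rng.shuffle(images)
        n_val = int(len(images) * val_fraction)
        val += [(path, LABEL_OF[letter]) for path in images[:n_val]]
        train += [(path, LABEL_OF[letter]) for path in images[n_val:]]
    return train, val
```

The resulting path and label pairs could then be loaded with OpenCV and fed to a TensorFlow image classifier; holding out a slice of each letter is one simple way to check that the model generalizes beyond the exact frames it was trained on.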



Vision Pipeline
The real-time pipeline is built on Python, OpenCV, and TensorFlow, all running on a Raspberry Pi 5 paired with a Camera Module v3. The camera captures the user's hand gestures, each frame is preprocessed with OpenCV, and the result is passed to the trained TensorFlow model for letter classification. Once the model identifies the ASL letter, the result is sent to the 0.96-inch OLED display, which projects it through the optical system into the wearer's field of view. The entire loop (capture, classify, display) runs continuously on-device, with no cloud dependency or external processing.
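The shape of that loop can be sketched in a few lines. This is a minimal outline, not the project's code: `run_pipeline` and its four stage parameters are hypothetical names, and the stages are passed in as plain callables so the sketch runs without a camera, a trained model, or an OLED attached. On the device, the stages would wrap the camera read, the OpenCV preprocessing, the TensorFlow prediction, and the display driver.

```python
from typing import Any, Callable, Iterable, List


def run_pipeline(
    frames: Iterable[Any],
    preprocess: Callable[[Any], Any],
    classify: Callable[[Any], str],
    display: Callable[[str], None],
) -> None:
    """Run capture -> classify -> display once per incoming frame."""
    for frame in frames:                 # capture: next frame from the camera
        model_input = preprocess(frame)  # e.g. resize and normalize
        letter = classify(model_input)   # model output mapped to a letter
        display(letter)                  # push the letter to the display


# Stand-in stages so the sketch runs on any machine.
fake_frames = ["frame-A", "frame-B", "frame-C"]
shown: List[str] = []

run_pipeline(
    frames=fake_frames,
    preprocess=lambda frame: frame.lower(),
    classify=lambda model_input: model_input[-1].upper(),
    display=shown.append,
)
print(shown)  # ['A', 'B', 'C']
```

Keeping the stages separate like this makes it possible to test the classification step on saved images before any hardware is connected.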



Hardware & Optics
The optical system was the most physics-intensive part of the build. The challenge: the human eye cannot comfortably focus on objects closer than about 25 cm, but the OLED display sits just a few centimeters from the eye inside the glasses frame. Without correction, the text would be an unreadable blur.
To solve this, I designed an optical path using a 45-degree mirror, a plano-convex lens with a 100mm focal length, and a reflector/combiner. The mirror redirects the OLED's output toward the lens, which magnifies and focuses the text so it appears as a virtual image at a readable distance. The reflector/combiner lets the wearer see both the projected text and the real world simultaneously, creating the actual AR effect.


With the OLED positioned ~61.5mm from the lens (f = 100mm), inside the focal length, the thin-lens equation places a magnified virtual image ~159.7mm from the lens, comfortably within the eye's focal range.
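The figures above can be checked with the thin-lens equation, 1/f = 1/d_o + 1/d_i. The short sketch below treats the lens as an ideal thin lens and uses the real-is-positive sign convention; both are simplifying assumptions, and `thin_lens_image` is an illustrative name rather than part of the project code.

```python
def thin_lens_image(object_mm: float, focal_mm: float) -> tuple[float, float]:
    """Return (image distance, lateral magnification) for an ideal thin lens.

    Solves 1/f = 1/d_o + 1/d_i for d_i. With the real-is-positive convention,
    a negative image distance means a virtual image on the same side of the
    lens as the object, which is the case when the object sits inside f.
    """
    if object_mm == focal_mm:
        raise ValueError("object at the focal point: image forms at infinity")
    image_mm = 1.0 / (1.0 / focal_mm - 1.0 / object_mm)
    magnification = -image_mm / object_mm
    return image_mm, magnification


image_mm, magnification = thin_lens_image(object_mm=61.5, focal_mm=100.0)
print(f"image distance: {image_mm:.1f} mm")    # image distance: -159.7 mm
print(f"magnification:  {magnification:.2f}x")  # magnification:  2.60x
```

The negative sign confirms the image is virtual and upright, and the magnification of roughly 2.6 explains why text on a 0.96-inch display remains legible through the lens.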
Getting these numbers right on paper was one thing. Physically aligning the mirror angle, lens position, and display brightness so the projection was actually sharp and legible took significant hands-on iteration.
The entire assembly (Raspberry Pi 5, Camera Module v3, OLED display, mirror, lens, and combiner) was housed in a custom CAD-modeled enclosure designed to mount on a glasses-style frame.
Design Thinking Throughout
Even though this was a research prototype, I made design decisions with real-world wearability in mind. The glasses had to feel like something a person would actually put on in a conversation, not a lab apparatus bolted to their face.
Would someone actually wear this in public? A technically perfect system that nobody wants to put on doesn't solve anything.


Technical Architecture
What Makes This Different
Most ASL recognition research optimizes for accuracy on benchmark datasets. This project optimized for wearability and real-world usability, a fundamentally different design target.
Challenges & Learnings

Impact & Future Directions
This project demonstrated that real-time, wearable recognition of the ASL alphabet is achievable at low cost with off-the-shelf components. It is a proof of concept, not a finished product.
Future directions
Tools & Technologies