
AR Smart Glasses for Real-Time ASL Recognition & Translation

Overview

A research project exploring real-time American Sign Language (ASL) recognition with results displayed on AR glasses. The system interprets hand gestures captured by a glasses-mounted camera and overlays the translated text onto the wearer's field of view, with the aim of easing communication with the Deaf and Hard-of-Hearing (DHH) community.

Role

Lead Researcher & Developer: designed the gesture recognition pipeline, trained the ML model, and built the AR overlay interface.

Tech Stack

Python, TensorFlow, OpenCV, Raspberry Pi 5, CAD Modeling

Why This Project

Last semester, I took a speech class where one of my classmates was Deaf. He was engaged, but every interaction in that room depended on a real-time interpreter. Without one, communication broke down, not because he had nothing to say, but because the infrastructure wasn't there for him to say it independently. That hit close to home.

The Problem

62–66% of DHH individuals surveyed reported communication barriers directly impacting their career decisions and healthcare outcomes (2025 Nagish survey, 300+ respondents).

Communication between Deaf and Hard-of-Hearing (DHH) individuals and hearing individuals remains deeply fractured. Existing ASL recognition systems tend to be desktop-bound, require complex multi-camera setups, or depend on sensor gloves, none of which are portable, unobtrusive, or practical for everyday use.

Can a low-cost, wearable device recognize ASL in real time and display translations directly in the user's field of view, without requiring the signer to wear anything on their hands?

The Research and Approach

This project sat at the intersection of hardware engineering, machine learning, and human-centered design. I owned every layer of the stack, from optics to model training to physical enclosure.

Understanding the Landscape

Before building, I surveyed the existing work. Academic projects like SignGlass (UIST 2025) used multiple Raspberry Pi Zeros and three cameras to capture both manual and non-manual ASL markers. Glove-based systems like TransASL relied on IMU sensors strapped to the signer's hands. Both showed promise in recognition accuracy but traded away wearability and social acceptability, two things that matter enormously if you want DHH individuals to actually use the device in daily life.

My design goal was different: a single, glasses-form-factor device where the camera watches the person signing toward the wearer, and translations appear on a heads-up display. No gloves, no tethered laptops, no multi-camera rigs.

Data & Model

Off-the-shelf ASL datasets didn't match the perspective I needed, so I built my own. I captured images of my hands forming each letter of the ASL alphabet, creating a custom dataset shot from the device's camera angle. The model was trained with Python, OpenCV, and TensorFlow as a straightforward image classification pipeline: it learned to recognize each letter directly from the visual input. No landmark extraction or keypoint data was involved. The model works entirely from what the camera sees, which keeps the recognition pipeline simple and the training process accessible.

Vision Pipeline

The real-time pipeline is built on Python, OpenCV, and TensorFlow, all running on a Raspberry Pi 5 paired with a Camera Module v3. The camera captures the signer's hand gestures; each frame is preprocessed with OpenCV and then passed to the trained TensorFlow model for letter classification. Once the model identifies the ASL letter, the result is sent to the 0.96-inch OLED display, whose output is projected through the optical system into the wearer's field of view. The entire loop (capture, classify, display) runs continuously on-device, with no cloud dependency or external processing.
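The capture, classify, display loop described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: `run_pipeline`, `top_letter`, and `fake_classify` are hypothetical names, the assumption that the model emits 26 scores ordered A to Z is mine, and the camera, OpenCV preprocessing, TensorFlow inference, and OLED drawing are replaced by stand-in callables so the sketch runs on any machine.

```python
import string
from typing import Callable, Iterable, List, Sequence

# One label per class index. Assumption: the model outputs 26 scores ordered A-Z.
LETTERS = string.ascii_uppercase


def top_letter(scores: Sequence[float]) -> str:
    """Return the letter whose class score is highest (argmax)."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return LETTERS[best_index]


def run_pipeline(
    frames: Iterable[object],
    preprocess: Callable[[object], object],
    classify: Callable[[object], Sequence[float]],
    display: Callable[[str], None],
) -> List[str]:
    """Run capture -> preprocess -> classify -> display, one frame at a time."""
    shown = []
    for frame in frames:
        model_input = preprocess(frame)  # on the device: OpenCV resizing etc.
        scores = classify(model_input)   # on the device: TensorFlow inference
        letter = top_letter(scores)
        display(letter)                  # on the device: draw on the OLED
        shown.append(letter)
    return shown


# Stand-in classifier (hypothetical) so the sketch runs without a model.
def fake_classify(frame_id: int) -> List[float]:
    scores = [0.0] * len(LETTERS)
    scores[frame_id % len(LETTERS)] = 1.0
    return scores


result = run_pipeline(
    frames=[0, 1, 2],          # stand-in for camera frames
    preprocess=lambda f: f,    # identity stand-in for preprocessing
    classify=fake_classify,
    display=print,             # stand-in for the OLED
)
```

Passing the stages in as callables keeps the loop itself independent of the hardware, so each stage can be swapped or timed separately.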

Hardware & Optics

The optical system was the most physics-intensive part of the build. The challenge: the human eye cannot comfortably focus on objects closer than about 25 cm, but the OLED display sits just a few centimeters from the eye inside the glasses frame. Without correction, the text would be an unreadable blur.

To solve this, I designed an optical path using a 45-degree mirror, a plano-convex lens with a 100 mm focal length, and a reflector/combiner. The mirror redirects the OLED's output toward the lens, which magnifies and focuses the text so it appears as a virtual image at a readable distance. The reflector/combiner lets the wearer see both the projected text and the real world simultaneously, creating the actual AR effect.

1/f = 1/o + 1/i

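Here f is the focal length, o the distance from the display to the lens, and i the image distance. With the OLED positioned ~61.5 mm from the lens (f = 100 mm), the equation gives i ≈ -159.7 mm. The negative sign marks a virtual image on the display side of the lens, about 159.7 mm from it, which moves the text much farther from the eye than the physical display and into a range the eye can focus on.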
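The thin-lens numbers above can be checked with a short calculation. This is a sketch under the sign convention in which a negative image distance means a virtual image on the same side of the lens as the object; the function names are illustrative, and the magnification step (m = -i/o) is a standard thin-lens result that the original write-up does not state.

```python
def image_distance_mm(focal_length_mm: float, object_distance_mm: float) -> float:
    """Solve 1/f = 1/o + 1/i for i.

    A negative result means a virtual image on the object's side of the lens.
    """
    if object_distance_mm == focal_length_mm:
        raise ValueError("object at the focal point: image forms at infinity")
    return 1.0 / (1.0 / focal_length_mm - 1.0 / object_distance_mm)


def magnification(object_distance_mm: float, image_dist_mm: float) -> float:
    """Lateral magnification m = -i / o (positive means an upright image)."""
    return -image_dist_mm / object_distance_mm


f_mm, o_mm = 100.0, 61.5            # values quoted in the text
i_mm = image_distance_mm(f_mm, o_mm)
m = magnification(o_mm, i_mm)

print(f"image distance: {i_mm:.1f} mm")  # negative => virtual image
print(f"magnification: {m:.2f}x")
```

With the quoted values this gives an image distance of about -159.7 mm and a magnification of roughly 2.6x, so the text appears upright and enlarged. Because the display sits inside the focal length (o < f), the image is always virtual. Moving the display closer to the focal point pushes the virtual image farther away.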

Getting these numbers right on paper was one thing. Physically aligning the mirror angle, lens position, and display brightness so the projection was actually sharp and legible took significant hands-on iteration.

The entire assembly (Raspberry Pi 5, Camera Module v3, OLED display, mirror, lens, and combiner) was housed in a custom CAD-modeled enclosure designed to mount on a glasses-style frame.

Design Thinking Throughout

Even though this was a research prototype, I made design decisions with real-world wearability in mind. The glasses had to feel like something a person would actually put on in a conversation, not a lab apparatus bolted to their face.

Single front-facing camera: Simpler, lighter, less conspicuous than a multi-camera array.

0.96-inch OLED display: Kept the frame compact rather than using a larger, more obtrusive screen.

Reflector/combiner optics: Preserves the wearer's view of the real world so they stay present in the conversation rather than staring into a screen.

Iterated CAD enclosure: Multiple revisions to balance housing all components while keeping the form factor as close to normal glasses as the hardware allowed.

Would someone actually wear this in public? A technically perfect system that nobody wants to put on doesn't solve anything.

Technical Architecture

01 Input: Pi Camera v3 (front-facing, glasses-mounted)

02 Processing: Raspberry Pi 5 (on-body, wearable)

03 Preprocessing: OpenCV (frame capture and image processing)

04 Classification: TensorFlow (custom model trained on ASL alphabet images)

05 Output: 0.96" OLED (projected through mirror, lens, and combiner)

Enclosure: Custom CAD-modeled wearable housing (all components mounted on a glasses-style frame)

What Makes This Different

Most ASL recognition research optimizes for accuracy on benchmark datasets. This project optimized for wearability and real-world usability, a fundamentally different design target.

No instrumentation on the signer: The camera watches the other person's hands, so the person signing doesn't need to wear sensors or gloves. This preserves the natural flow of ASL communication.

Heads-up display translation: Translations appear in the wearer's field of view through a real optical system, not on a phone screen or desktop monitor. The wearer stays present in the conversation.

Single-camera, single-board simplicity: Unlike multi-camera or multi-Pi setups, this uses one camera and one Raspberry Pi, keeping cost low and the form factor small.

Full optical pipeline: Not a screen strapped to a frame, but a proper mirror, lens, and combiner system that creates a focused virtual image at a comfortable viewing distance.

End-to-end ownership: From dataset creation to model training to optical calibration to physical enclosure design, a full-stack research effort.

Challenges & Learnings

Inference speed on edge hardware: TensorFlow models that run comfortably on a laptop can choke on a Raspberry Pi. I had to strip the model down, reduce input resolution, and profile every stage of the pipeline to find bottlenecks.

Optical calibration was more art than science: The thin-lens equation gives you a starting point, but real-world variables (lens imperfections, display brightness, ambient light) meant a lot of hands-on iteration to get a usable image.

Dataset bias was a constant concern: A small, self-collected dataset means the model works well for the hands and signing styles it has seen, but generalization remains an open problem. Expanding the dataset with more diverse signers is the clearest path to improving robustness.
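The stage-by-stage profiling mentioned under inference speed can be approached with a small timing helper. The sketch below is one possible way to do it, not the tooling actually used in the project: `StageTimer` is a hypothetical class, the stage names are assumptions based on the pipeline description, and `time.sleep` calls stand in for the real capture, preprocessing, and inference work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)  # stage name -> total seconds
        self.counts = defaultdict(int)    # stage name -> number of runs

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_ms(self, name):
        """Average time per run of a stage, in milliseconds."""
        return 1000.0 * self.totals[name] / self.counts[name]

    def slowest(self):
        """Name of the stage with the largest accumulated time."""
        return max(self.totals, key=self.totals.get)


timer = StageTimer()
for _ in range(3):                  # three simulated frames
    with timer.stage("capture"):
        time.sleep(0.001)           # stand-in for grabbing a frame
    with timer.stage("preprocess"):
        time.sleep(0.002)           # stand-in for OpenCV preprocessing
    with timer.stage("classify"):
        time.sleep(0.030)           # stand-in for model inference

for name in timer.totals:
    print(f"{name}: {timer.mean_ms(name):.1f} ms/frame")
print("bottleneck:", timer.slowest())
```

Wrapping each stage in a context manager leaves the pipeline code unchanged apart from the `with` lines, which makes it easy to compare timings before and after shrinking the model or reducing the input resolution.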

Impact & Future Directions

This project demonstrated that real-time, wearable recognition of the ASL alphabet is achievable at low cost with off-the-shelf components. It is a proof of concept, not a finished product: recognition is limited to individual letters and to a small, self-collected dataset.

Future directions

- Find a smaller focusing lens and OLED screen to reduce weight and improve wearability

- Expand beyond the alphabet to include words and dynamic signs

- Explore alternative detection methods like hand landmark data for potentially faster inference

- Integrate non-manual markers like facial expressions, critical for ASL grammar

- Involve DHH users in co-design sessions to validate the form factor and interaction model

Tools & Technologies

Python, TensorFlow, OpenCV, Raspberry Pi 5, Camera Module v3, 0.96" OLED Display, 45° Mirror, Plano-Convex Lens (100 mm), Reflector/Combiner, CAD Modeling, Thin-Lens Optical Design
