Badger Vision

Rahul Hathwar - 2025-01-01 - Prize Winner at Hack Midwest 2024 - AI-Powered Facial Recognition for the Visually Impaired

Project Overview

A prize winner at Hack Midwest 2024, Badger Vision is a web application that uses Zoom's real-time video transmission to help identify faces for the visually impaired. By combining computer vision, machine learning, and accessible interface design, it provides real-time audio descriptions of the people in the user's field of view.

Tech Stack:

  • React + TypeScript (frontend)
  • Python (computer vision backend)
  • Zoom Video SDK
  • Pinata (IPFS storage)
  • Face recognition ML models
  • Text-to-speech synthesis

Achievement:

  • šŸ† Prize Winner at Hack Midwest 2024
  • Built in 36 hours
  • Working prototype with live video demo
  • Addresses critical accessibility need

The Problem

Social Isolation for the Visually Impaired

Visually impaired individuals face significant challenges in social settings:

  • Cannot identify who is speaking or entering a room
  • Miss nonverbal cues like facial expressions
  • Difficulty navigating crowded spaces
  • Social anxiety from not recognizing familiar faces
  • Professional challenges in meetings and networking

Our Solution: Real-time facial recognition with audio feedback, allowing users to "see" who is around them through sound.


Technical Architecture

System Flow

┌─────────────┐      ┌──────────────┐      ┌────────────────┐
│  Zoom Video │ ───> │  React App   │ ───> │ Python Backend │
│     SDK     │      │  (Frontend)  │      │  (CV/ML)       │
└─────────────┘      └──────────────┘      └────────────────┘
                            │                       │
                            │                       │
                            v                       v
                    ┌──────────────┐        ┌───────────────┐
                    │ Text-to-     │        │  Face         │
                    │ Speech       │        │  Recognition  │
                    └──────────────┘        │  Models       │
                                            └───────────────┘
                                                    │
                                                    v
                                            ┌───────────────┐
                                            │  Pinata/IPFS  │
                                            │  (Storage)    │
                                            └───────────────┘
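
To make the flow concrete, here is a minimal Python sketch of one pass through the loop above: a captured frame is sent to the backend for analysis, and the names that come back are spoken aloud. The endpoint URL is a placeholder and pyttsx3 stands in for the text-to-speech component; the actual hackathon code may differ.

```python
# One simplified pass through the system flow: frame -> backend -> speech.
import base64

import pyttsx3
import requests

BACKEND_URL = "http://localhost:5000/analyze-frame"  # placeholder endpoint


def describe_frame(jpeg_bytes: bytes) -> None:
    # Send the captured frame to the Python CV backend as base64 JSON.
    payload = {"frame": base64.b64encode(jpeg_bytes).decode("ascii")}
    names = requests.post(BACKEND_URL, json=payload).json()["faces"]

    # Turn the recognition result into spoken audio feedback.
    message = ("I can see " + ", ".join(names)) if names else "No faces detected"
    engine = pyttsx3.init()
    engine.say(message)
    engine.runAndWait()
```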

Component Breakdown

Frontend (React + TypeScript):

  • User interface for setup and configuration
  • Zoom Video SDK integration
  • Frame capture and transmission
  • Audio feedback controls
  • Face database management UI

Backend (Python):

  • Face detection (face_recognition library)
  • Face encoding and matching (see the sketch after this list)
  • API endpoints for frame analysis
  • Database queries
  • Performance optimization
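
As a rough illustration of the detection, encoding, and matching steps above, here is a minimal sketch using the face_recognition library named in the stack. The file names are placeholders, and the real backend is organized differently:

```python
# Minimal face detection/encoding/matching with the face_recognition library.
import face_recognition

# Encode a known face once; the result is a 128-dimensional vector.
known_image = face_recognition.load_image_file("alice.jpg")  # placeholder file
known_encoding = face_recognition.face_encodings(known_image)[0]

# Detect and encode every face in an incoming frame.
frame = face_recognition.load_image_file("frame.jpg")  # placeholder file
frame_encodings = face_recognition.face_encodings(frame)

# Compare each detected face against the known encoding; smaller distance
# means a closer match (the library's default tolerance is 0.6).
for encoding in frame_encodings:
    is_match = face_recognition.compare_faces([known_encoding], encoding)[0]
    distance = face_recognition.face_distance([known_encoding], encoding)[0]
    print(f"match={is_match}, distance={distance:.2f}")
```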

Storage (Pinata/IPFS):

  • Face photo storage (sketched below)
  • Face encoding metadata
  • User preferences
  • Usage analytics
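
For the storage layer, here is a hedged sketch of pinning a face photo through Pinata's pinFileToIPFS endpoint; the returned content hash (CID) serves as a stable reference to the photo. The JWT and file path are placeholders:

```python
# Pin a face photo to IPFS via Pinata's HTTP pinning API.
import requests

PINATA_JWT = "YOUR_PINATA_JWT"  # placeholder credential


def pin_face_photo(path: str) -> str:
    """Upload a photo to Pinata and return its IPFS content hash (CID)."""
    with open(path, "rb") as f:
        response = requests.post(
            "https://api.pinata.cloud/pinning/pinFileToIPFS",
            headers={"Authorization": f"Bearer {PINATA_JWT}"},
            files={"file": f},
        )
    response.raise_for_status()
    return response.json()["IpfsHash"]


# Example usage: cid = pin_face_photo("alice.jpg")
```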

Hackathon Development

36-Hour Timeline

Hour 0-4: Research & Planning

  • Identified accessibility gap
  • Researched face recognition libraries
  • Explored Zoom SDK capabilities
  • Defined MVP feature set
  • Set up development environment

Hour 4-16: Core Implementation

  • Implemented face recognition system
  • Integrated Zoom Video SDK
  • Built React UI components
  • Created API endpoints (see the sketch after this list)
  • Tested basic face detection
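
The frame-analysis endpoint from this phase might have looked roughly like the following Flask sketch: a route that accepts a base64-encoded frame, runs it through face_recognition, and returns the matched names. The route name, JSON fields, and in-memory face database are illustrative assumptions:

```python
# An illustrative frame-analysis endpoint (not the actual hackathon code).
import base64
import io

import face_recognition
import numpy as np
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)

# Known faces loaded at startup: parallel lists of names and 128-dim encodings.
KNOWN_NAMES: list[str] = []
KNOWN_ENCODINGS: list[np.ndarray] = []


@app.route("/analyze-frame", methods=["POST"])
def analyze_frame():
    # Expect a JSON body like {"frame": "<base64-encoded JPEG>"}.
    image_bytes = base64.b64decode(request.get_json()["frame"])
    frame = np.array(Image.open(io.BytesIO(image_bytes)).convert("RGB"))

    # Detect every face in the frame, then compute its encoding.
    locations = face_recognition.face_locations(frame)
    encodings = face_recognition.face_encodings(frame, locations)

    # Match each detected face against the known database.
    names = []
    for encoding in encodings:
        matches = face_recognition.compare_faces(KNOWN_ENCODINGS, encoding)
        names.append(KNOWN_NAMES[matches.index(True)] if True in matches else "unknown")

    return jsonify({"faces": names})
```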

Hour 16-28: Integration & Features

  • Connected frontend to backend
  • Implemented audio feedback
  • Added Pinata/IPFS storage
  • Optimized performance
  • Tested across browsers

Hour 28-36: Polish & Demo

  • UI refinements
  • Edge case handling
  • Demo preparation
  • Presentation creation
  • Live testing with judges

Future Enhancements

Technical Improvements:

  1. Mobile app: Native iOS/Android versions
  2. Wearable integration: Smart glasses with camera
  3. Emotion detection: Identify facial expressions
  4. Gesture recognition: Detect waving, pointing
  5. Object detection: Identify obstacles, not just people
  6. Multi-language: Support for international users

Feature Additions:

  1. Social context: Remember where you met people
  2. Conversation history: Recall previous interactions
  3. Group dynamics: Understand social groupings
  4. Accessibility modes: Different levels of detail
  5. Offline mode: Pre-loaded faces for common locations

Conclusion

Badger Vision demonstrates that emerging technologies can be harnessed to solve fundamental accessibility challenges. By combining computer vision, real-time video streaming, and thoughtful interface design, we created a tool that could transform daily life for visually impaired individuals.

This project showcases:

  • Cross-disciplinary skills: ML, web development, accessibility design
  • Rapid prototyping: 36 hours to a working product
  • Social awareness: Engineering for inclusivity
  • Technical integration: Multiple complex systems working together
  • Competitive success: Prize at major hackathon

More importantly, it reinforces that technology should serve human needs. The best innovations don't just showcase technical prowess; they make life meaningfully better for real people. Badger Vision aspires to do both.

Copyright © Rahul Hathwar. All Rights Reserved.