Eye_paratus
Building spatial augmented reality for explainable robot vision
ROLE
Lead Prototyper (Hardware, Integration)
TEAM
Matte Lim (Harvard)
TIMELINE
8 weeks (Oct–Dec 2023)
TOOLS
Overview
Eye-paratus is a novel system that externalizes a robot’s internal vision processes by using Spatial Augmented Reality (SAR) to project real-time object-detection overlays directly onto physical objects.
By combining an RGB-D camera or webcam, a video projector, and a low-cost computing platform, Eye-paratus makes the “black box” of robot vision transparent and socially accessible, allowing multiple users to see exactly what the robot perceives without wearing any special equipment.
Problem Finding
As robots become more ubiquitous—moving beyond industrial settings into homes, hospitals, and public spaces—their sophisticated sensor suites (e.g., cameras, LiDAR, microphones) often go unnoticed by humans, creating a “passive communication gap” in Human–Robot Interaction (HRI).
While robots continuously record and classify their surroundings, people are rarely aware of what robots see or how they interpret their environment, leading to mistrust and unpredictability in collaborative scenarios. Traditional approaches to robot transparency often rely on anthropomorphic cues (e.g., digital eyes or head motions), which can be ambiguous, culturally biased, or misleading; they may improve predictability of behavior but do not meaningfully convey complex internal states.
Eye-paratus addresses this gap by offering a direct, image-based means of showing real-time vision outputs, ensuring humans stay “in the loop” and fostering trust through clarity rather than anthropomorphism.
Concept
At the heart of Eye-paratus is the decision to externalize “sight”—specifically, object detection—through projection rather than relying on head-mounted displays or human-like gaze cues.
Instead of abstract symbols or artificial eyes, Eye-paratus projects bounding-box overlays and labels directly onto real-world objects, leveraging the intuitive power of images to communicate exactly what the robot “sees.” This avoids misleading users into attributing agency or sentience to the machine.
By mapping the robot’s detection outputs onto physical surfaces—rather than animating a symbolic gaze—the system removes ambiguity around referential intent, enabling users to focus on the shared environment instead of toggling attention between robot and object.
Eye-paratus explores the interplay between human and machine perception by combining light projection and algorithmic processing to externalize machine vision into the physical world.
Hardware
LiDAR, Depth Camera & Motor Control
Camera + Motor
Focus Motor
Projector Auto-Focus
Rotation
LiDAR Tracking Nearest Object
Real-time Updated Target Rotation
Software
The software stack runs on Python and uses OpenCV to preprocess camera frames before feeding them into MediaPipe’s object-detection model (trained on COCO to recognize 80 object classes). Each frame yields detected objects with (x, y) coordinates and class labels. A real-time calibration routine computes a projective transform that compensates for parallax between the camera and the projector, ensuring that projected bounding boxes align with real-world objects.
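A minimal sketch of what that calibration-and-overlay loop could look like in Python with OpenCV. The four calibration correspondences, the 1280×720 projector resolution, and the detect_objects() helper (standing in for the MediaPipe detector call) are illustrative assumptions, not the project’s actual code.

```python
import cv2
import numpy as np

# Four reference points seen by the camera and their known positions in the
# projector image, collected during the calibration routine (values assumed).
camera_pts = np.float32([[102, 84], [548, 91], [561, 402], [95, 396]])
projector_pts = np.float32([[0, 0], [1280, 0], [1280, 720], [0, 720]])

# Projective transform compensating for camera/projector parallax.
H = cv2.getPerspectiveTransform(camera_pts, projector_pts)

def to_projector(x, y):
    """Map a camera-space point into projector space."""
    pt = np.float32([[[x, y]]])
    px, py = cv2.perspectiveTransform(pt, H)[0, 0]
    return int(px), int(py)

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # detect_objects() is a hypothetical wrapper around the MediaPipe
    # object-detection call; assumed to yield (label, x, y, w, h) in camera pixels.
    overlay = np.zeros((720, 1280, 3), dtype=np.uint8)
    for label, x, y, w, h in detect_objects(frame):
        p1 = to_projector(x, y)
        p2 = to_projector(x + w, y + h)
        cv2.rectangle(overlay, p1, p2, (0, 0, 255), 3)   # red bounding box
        cv2.putText(overlay, label, p1, cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.imshow("projector", overlay)  # full-screen window routed to the projector
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
```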
A series of experiments explored the interactions one might have with the projector: How does the machine see? Where does it see? How does it communicate what it is seeing to us in an intuitive way?
Eye-paratus relies on MediaPipe’s object-detection model to recognize 80 distinct object classes. By analyzing camera input frame by frame, it defines what it “sees”: only information about these “seen” objects is saved, and the rest is thrown out. Conversely, whatever it fails to detect, it does not “see.”
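As a rough illustration of that frame-by-frame filtering (the detection record fields and the 0.5 score threshold are assumed, not taken from the project):

```python
SCORE_THRESHOLD = 0.5  # assumed confidence cutoff

def seen_this_frame(detections):
    """Keep only objects the model detected; everything else is discarded."""
    seen = []
    for det in detections:
        if det.score < SCORE_THRESHOLD:
            continue  # too uncertain: effectively invisible to the system
        seen.append({"label": det.label, "box": (det.x, det.y, det.w, det.h)})
    # The machine's record of what it "saw"; unrecognized objects leave no trace.
    return seen
```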
Projecting a red bounding box to highlight detected objects
As it projects its perception, the illuminated frames signify visibility, while the areas the projector leaves unlit, especially in the absence of other artificial light, become enveloped in “artificial darkness”. This darkness becomes a site of invisibility, a space where objects and elements outside the machine’s programmed recognition remain unseen and unacknowledged.
Red box replaced with white outlines to further refine the visual overlay
Green dot represents “target of sight” for the machine
New Affordances for HRI
- Digital Finger: Instead of requiring users to interact with the projected surface, Eye-paratus uses projection purely to convey information. Projected cues can function like a “digital finger” or arm, directing attention, offering wayfinding guidance, or acting as an extra manipulator. This capability extends the robot’s communicative and interactive reach beyond its physical form.
- Real-Time Correction: By projecting gaze information directly onto the interaction site, Eye-paratus creates a faster feedback loop: humans can immediately verify or correct what the robot sees without diverting attention. This real-time validation is critical for tasks requiring precision and mutual understanding.