Eye-paratus (2023)

Building Spatial Augmented Reality for Explainable Robot Vision

For: MEDIUMS Proseminar
Project by: Kevin Tang
Special Thanks: Matte Lim



Introduction:

In very recent history, the surge of large digital image datasets has fueled the advancement of deep-learning algorithms in computer vision. The field has so far focused on teaching computers to process, analyze, and understand digital images in order to extract usable information about the real world. However, the social and cultural aspects of the machine’s ability to see the physical world remain largely unexplored.



Concept: 

Eye-paratus addresses the disconnect between human and machine perception, focusing on how difficult it is for us to intuitively “see” what a machine sees through its camera and algorithms. It explores the potential of projection technology as a new way to navigate this socio-technical landscape.

A spatial merging of perceptions between human and machine.

Eye-paratus explores the interaction between human and machine perception by combining light projection and algorithmic processing to externalize machine vision into the physical world.



a) What the Machine Sees
(OpenCV Object Detection)


b) Externalising What the Machine Sees
(Projection Mapping onto Real Objects)







CAD + Prototype: 


a) three stepper motors controlling rotation, tilt, and focus
b) three gear-based movements
c) one lidar and one depth camera




Lidar, Depth Camera & Motor Control:


Camera + Motor
Focus Motor
Projector Auto-Focus
Rotation
1. Lidar Tracking Nearest Object

2. Real-time Updated Target Rotation
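
The two rotation steps above can be illustrated with a minimal sketch: pick the nearest valid lidar return, then convert the shortest turn toward it into stepper-motor steps. The scan format, the `STEPS_PER_REV` and `GEAR_RATIO` constants, and all function names are assumptions made for illustration; the source does not describe the actual hardware interface or gearing.

```python
import math

# Hypothetical hardware constants; the source does not state them.
STEPS_PER_REV = 200   # full steps per motor revolution (assumed)
GEAR_RATIO = 4.0      # motor revolutions per head revolution (assumed)


def nearest_bearing(scan):
    """Return (angle_deg, distance_m) of the closest valid lidar return.

    `scan` is an iterable of (angle_deg, distance_m) pairs. Returns with a
    non-positive or non-finite distance are treated as invalid and skipped.
    Returns None when no valid return exists.
    """
    valid = [(a, d) for a, d in scan if d > 0 and math.isfinite(d)]
    if not valid:
        return None
    return min(valid, key=lambda pair: pair[1])


def shortest_turn(current_deg, target_deg):
    """Signed rotation in degrees, in [-180, 180), from current to target."""
    return (target_deg - current_deg + 180.0) % 360.0 - 180.0


def steps_to_target(current_deg, target_deg):
    """Convert the shortest turn into a signed number of motor steps."""
    turn = shortest_turn(current_deg, target_deg)
    return round(turn / 360.0 * STEPS_PER_REV * GEAR_RATIO)
```

Calling `nearest_bearing` on every new scan and feeding its angle to `steps_to_target` would keep the target rotation updated in real time; wrapping the turn into [-180, 180) avoids rotating the long way around when the target crosses the 0/360 degree boundary.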


Human-Projector Interaction:

A series of experiments was done to explore the interaction one might have with the projector. How does the machine see? Where does it see? How does it communicate what it is seeing to us in an intuitive manner?




Object Detection + Projection Mapping (OpenCV):

Eye-paratus uses MediaPipe’s object detection model to recognize 80 distinct categories of objects. By analyzing camera input frame by frame, it defines what it “sees”: only information about these “seen” objects is saved, and the rest is discarded. Conversely, what it fails to detect, it doesn’t “see”.
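
As a rough sketch of this filtering step, the snippet below keeps only detections above a confidence threshold and drops everything else. The `Detection` type, the `seen_objects` function, and the 0.5 threshold are hypothetical stand-ins for illustration; they are not the detector's actual output format or the project's actual settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical stand-in for one detector result in a single frame."""
    label: str
    score: float   # confidence in [0, 1]
    box: tuple     # (x, y, width, height) in pixels


def seen_objects(detections, min_score=0.5):
    """Keep what the machine 'sees'; everything below the threshold is dropped."""
    return [d for d in detections if d.score >= min_score]


frame_detections = [
    Detection("cup", 0.82, (10, 20, 50, 40)),
    Detection("chair", 0.31, (0, 0, 5, 5)),
]
seen = seen_objects(frame_detections)
```

Here only the cup survives the filter; the low-confidence chair is treated as unseen and never reaches the projection stage.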

As it projects its perception, the illuminated frames signify visibility, while the unilluminated parts of the projection and the space not lit by the machine, especially in the absence of artificial light, become enveloped in “artificial darkness”. This darkness becomes a site of invisibility, a space where objects and elements outside the machine’s programmed recognition remain unseen and unacknowledged.


1. Projecting red bounding box to highlight detected objects
2. Removing white background to create a more immersive object highlight

3. Red box is replaced with white outlines to further enhance the visual overlay


4. Green dot represents “target of sight” for the machine
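
The final state of the steps above (black background, white outlines, green dot) can be sketched as a frame that would be sent to the projector. This is an illustration using NumPy only, not the project's actual rendering code; `render_projection`, its parameters, and the pixel coordinates are assumptions, and the mapping from camera coordinates to projector coordinates is left out entirely.

```python
import numpy as np

WHITE = (255, 255, 255)
GREEN = (0, 255, 0)  # pure green is the same in RGB and BGR channel order


def render_projection(height, width, boxes, target=None, thickness=2, dot_radius=4):
    """Build a black frame with white box outlines and an optional green dot.

    `boxes` holds (x, y, w, h) tuples in pixels; `target` is an (x, y) point.
    Black pixels emit no light, so only the outlined objects are illuminated.
    """
    frame = np.zeros((height, width, 3), dtype=np.uint8)
    t = thickness
    for x, y, w, h in boxes:
        # Clip the box to the frame so slicing never wraps around.
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + w, width), min(y + h, height)
        if x0 >= x1 or y0 >= y1:
            continue  # box lies entirely outside the frame
        frame[y0:min(y0 + t, y1), x0:x1] = WHITE  # top edge
        frame[max(y1 - t, y0):y1, x0:x1] = WHITE  # bottom edge
        frame[y0:y1, x0:min(x0 + t, x1)] = WHITE  # left edge
        frame[y0:y1, max(x1 - t, x0):x1] = WHITE  # right edge
    if target is not None:
        # Filled disc marking the hypothetical "target of sight".
        yy, xx = np.ogrid[:height, :width]
        disc = (xx - target[0]) ** 2 + (yy - target[1]) ** 2 <= dot_radius ** 2
        frame[disc] = GREEN
    return frame


frame = render_projection(100, 200, [(10, 20, 50, 40)], target=(35, 40))
```

Drawing each outline as four edge strips, rather than filling a box and blanking its interior, means overlapping boxes do not erase one another's outlines.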