Entropy Compass

Latent space navigation via diffusion-based image generation
and integrated prompt steering


2024
For: Harvard GSD 6365 Enactive Design
Role: Fullstack Web Developer
Partner: Una Liu (Harvard)






Research Questions


How can we expose hidden steps of the generative process?

What interface paradigms best support latent space manipulation?

How does gamification impact user engagement?



Overview


Generative AI models like Stable Diffusion offer unprecedented creative potential, yet their complex latent spaces remain opaque and difficult for users to understand and navigate. This research introduces Entropy Compass, an interface design that gamifies the exploration of diffusion model latent spaces through bidirectional noise manipulation and interactive prompt steering. By exposing the typically hidden intermediate diffusion steps and enabling users to dynamically add or remove noise through intuitive drag-based interactions, our system transforms the black-box image generation process into a playful, natural, and exploratory experience.

We propose a novel interaction paradigm with two complementary interfaces: a slingshot-style 2D canvas, where directional drags generate semantically related prompts, and a drag-based 1D canvas that allows granular noise and embedding modifications. Leveraging CLIP embeddings, natural language processing, and generative AI, our system produces contextually relevant prompt lists and enables users to interact meaningfully with image generation at multiple steps of the diffusion process.




AUTOMATIC1111

Motivation


Current AI image generation interfaces are unintuitive and lack meaningful creative workflows





Proof-of-concept Experiments

In the process of adding noise and denoising, the concept of entropy becomes essential: it captures the uncertainty and randomness introduced into the system. In the forward process, entropy increases as images are progressively transformed into noise, maximizing the randomness of pixels and removing identifiable structure. Conversely, the reverse process decreases entropy as the model iteratively reconstructs the data into something humans can perceive and understand, reintroducing meaningful patterns and structures to synthesize coherent outputs from the prompt.
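
For intuition, the forward step can be written in closed form. The sketch below assumes the standard DDPM formulation with a linear beta schedule (Stable Diffusion itself operates on latents with a scaled-linear schedule), so it is an illustration rather than the project's exact configuration:

    import torch

    # Linear beta schedule from the original DDPM paper (an assumption here;
    # Stable Diffusion v1.5 uses a scaled-linear schedule over latents).
    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    def add_noise(x0: torch.Tensor, t: int) -> torch.Tensor:
        """Forward process q(x_t | x_0): entropy grows with t as image
        structure is progressively replaced by Gaussian noise."""
        eps = torch.randn_like(x0)
        a_bar = alphas_cumprod[t]
        return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps

The reverse process runs the same schedule backwards: at each step the model predicts the noise component and removes a little of it, lowering entropy until a coherent image remains.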


  • Renoising and denoising
  • With the same noisy image, prompt steering influences the end result a great deal
  • Denoising with/without a prompt
  • Pixel modification
  • Noisy mashup
  • Concentrated renoising




Interaction Design: Gamified Canvas


Our approach specifically targets the challenge of making abstract computational concepts tangible and manipulable, creating an interaction paradigm in which users can intuitively understand how small changes in input create distinctive yet meaningful variations in output. This not only enhances user engagement but also promotes a deeper, more experiential understanding of the underlying technological mechanisms.



Drag (1D Canvas)

  • Clearly demonstrates the concept of “forward” and “backward”
  • Easy to understand

Slingshot (2D Canvas)

  • Prompt steering requires no typing; AI suggestions are embedded
  • Allows for branching and seeing unexpected correlations




Interaction Design: Prompting on Drag

Entropy Compass is founded on a theoretical framework that reimagines diffusion processes as manipulable entropy systems, bridging the mathematical foundations of diffusion models with intuitive user interactions. At its core, the system transforms the typically opaque process of latent space navigation into a tangible, visually memorable, and interactive experience through two fundamental principles that guide its design and implementation.


The first principle is bidirectional control. Mirroring the forward and reverse processes of Stable Diffusion, the interface lets users dynamically manipulate noise levels within the generative process and visually grasp noising and denoising. This allows granular control over image evolution, giving users the ability to both introduce and reduce entropy at various stages of generation. By making this traditionally hidden aspect of diffusion models directly manipulable, users gain unprecedented control over the creative process.
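
One way to realize this bidirectional control with off-the-shelf tooling is the img2img pipeline in Hugging Face diffusers, where the strength parameter determines how far an image is pushed back toward noise before being denoised again under a new prompt. The snippet below is an illustrative sketch along these lines, not the project's exact backend code:

    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    image = Image.open("input.png").convert("RGB")

    # strength is the amount of entropy reintroduced: 0.0 keeps the image
    # untouched, 1.0 renoises it almost completely before denoising.
    result = pipe(
        prompt="a watercolor landscape",  # illustrative prompt
        image=image,
        strength=0.55,
        guidance_scale=7.5,
    ).images[0]

In this framing, dragging backward on the canvas maps naturally to a higher strength (more entropy added), while dragging forward maps to continued denoising from the current state.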



1D Canvas

Generates semantically related prompts through left/right and downward drag operations while controlling noise levels through drag distance

2D Canvas

Generates semantically related prompts through directional drag operations while simultaneously controlling noise levels through drag distance



Interaction Design: Continuous Prompting
The second principle, semantic mapping, establishes a clear relationship between directional interactions and semantic changes in the latent space. This mapping creates an intuitive correlation between simple physical user actions and their effects on generated content, making the abstract concepts behind latent space navigation more concrete and understandable.
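
A minimal sketch of such a mapping is given below; the angle binning, prompt list, and normalization are hypothetical choices for illustration, not the deployed steering algorithm:

    import math

    def map_drag(dx: float, dy: float, prompts: list[str],
                 max_dist: float = 300.0) -> tuple[str, float]:
        """Map a drag vector to (steering prompt, noise strength).

        Direction selects one of the semantically related prompts arranged
        around the compass; distance sets how much noise to reintroduce.
        """
        angle = math.atan2(dy, dx) % (2 * math.pi)
        index = int(angle / (2 * math.pi) * len(prompts)) % len(prompts)
        dist = min(math.hypot(dx, dy), max_dist)
        return prompts[index], dist / max_dist  # strength in 0.0 .. 1.0

    # Example: eight AI-suggested prompt variants around the 2D canvas.
    prompts = ["forest", "city", "ocean", "desert",
               "mountain", "meadow", "glacier", "canyon"]
    prompt, strength = map_drag(120.0, -45.0, prompts)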





Frontend Implementation: Mapped Data Structure



The integration between frontend and backend components is orchestrated through a carefully designed API layer that maintains low latency while handling the complex data structures required for our unified canvas system. This architecture enables the seamless transition between 2D and 1D views while preserving all relevant parameters and state information, supporting the fluid creative exploration process that is central to our system's design philosophy.

Example of the data doc structure
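
The original document schema is not reproduced here, but a hypothetical canvas-node record along the following lines conveys the kind of state each generation carries (all field names are assumptions):

    # Hypothetical canvas-node document; field names are illustrative,
    # not the project's actual schema.
    node_doc = {
        "id": "node_042",
        "canvas": "2d",              # "2d" slingshot view or "1d" drag view
        "parent_id": "node_017",     # preserves branching history
        "prompt": "a watercolor landscape, autumn",
        "noise_strength": 0.55,      # entropy added before re-denoising
        "seed": 1234,
        "image_url": "/generated/node_042.png",
        "children": [],              # filled in as the user branches further
    }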




Backend AI Pipeline



The backend system is built on several key components, with Stable Diffusion v1.5 serving as the core generative model. This foundation is augmented by the Hugging Face image-captioning model vit-gpt2-image-captioning, which produces a robust semantic caption of the uploaded image and of every generated image. The caption is then passed to GPT-4o via the OpenAI API to generate a JSON list of the words most similar to the caption. With this list, a custom prompt steering algorithm maps the prompts onto both the one-dimensional and two-dimensional canvases and leverages word embeddings to generate contextually relevant variations, allowing intuitive exploration of the semantic space through user interaction.
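
A condensed sketch of this pipeline is shown below; the model identifiers are the ones named above, while the prompt wording and glue code are illustrative assumptions:

    import json
    from transformers import pipeline
    from openai import OpenAI

    # Step 1: caption the uploaded (or newly generated) image.
    captioner = pipeline("image-to-text",
                         model="nlpconnect/vit-gpt2-image-captioning")
    caption = captioner("input.png")[0]["generated_text"]

    # Step 2: ask GPT-4o for semantically related words, returned as JSON.
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": f'Given the caption "{caption}", return a JSON object '
                       '{"words": [...]} listing the 8 most similar words.',
        }],
    )
    related_words = json.loads(response.choices[0].message.content)["words"]

    # Step 3: the prompt steering algorithm places related_words onto the
    # 1D and 2D canvases (see the drag-mapping sketch earlier).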





Examples from Users






