Entropy Compass
Latent space navigation via diffusion-based image generation
and integrated prompt steering
2024
For: Harvard GSD 6365 Enactive Design
Role: Fullstack Web Developer
Partner: Una Liu (Harvard)
Research Questions
How can we expose hidden steps of the generative process?
What interface paradigms best support latent space manipulation?
How does gamification impact user engagement?
Overview
Generative AI models like Stable Diffusion offer unprecedented creative potential, yet their complex latent spaces remain opaque and difficult for users to understand and navigate. This research introduces Entropy Compass, an interface design that gamifies the exploration of diffusion model latent spaces through bidirectional noise manipulation and interactive prompt steering. By exposing the typically hidden intermediate diffusion steps and enabling users to dynamically add or remove noise through intuitive drag-based interactions, our system transforms the black-box image generation process into a playful, natural, and exploratory experience.
We propose a novel interaction paradigm with two complementary interfaces: a slingshot-style 2D canvas, where directional drags generate semantically related prompts, and a drag-based 1D canvas, which allows granular noise and embedding modifications. Leveraging CLIP embeddings, natural language processing, and generative AI, our system generates contextually relevant prompt lists and lets users meaningfully interact with image generation at multiple steps of the diffusion process.
Motivation
Current AI image generation interfaces are unintuitive and lack meaningful creative workflows
Proof-of-concept Experiments
In the process of adding noise and denoising, the concept of entropy becomes essential: it captures the uncertainty and randomness introduced into the system. In the forward process, entropy increases as images are progressively transformed into noise, maximizing the randomness of pixels and removing identifiable structures. Conversely, the reverse process decreases entropy as the model iteratively reconstructs the data into something humans can perceive and understand, reintroducing meaningful patterns and structures to synthesize coherent outputs based on the prompt.
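As a minimal sketch of this intuition, the forward process can be reproduced with the scheduler API in Hugging Face's diffusers library; the model id, tensor shapes, and timesteps below are illustrative assumptions rather than our exact configuration.

```python
import torch
from diffusers import DDPMScheduler

# Load the noise schedule that Stable Diffusion v1.5 was trained with.
scheduler = DDPMScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="scheduler"
)

latents = torch.randn(1, 4, 64, 64)  # stand-in for an encoded image latent
noise = torch.randn_like(latents)

# Forward process: the later the timestep, the more noise is mixed in,
# i.e. the higher the entropy of the latent.
for t in (100, 500, 999):
    noisy = scheduler.add_noise(latents, noise, torch.tensor([t]))
    signal_ratio = scheduler.alphas_cumprod[t].sqrt().item()
    print(f"t={t}: remaining signal ratio = {signal_ratio:.3f}")
```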
Adding Renoise and Denoising
Denoising with/without Prompt: with the same noisy image, prompt steering influences the end result a great deal (see the sketch after this list).
Pixel Modification
Noisy Mashup
Concentrated Renoising
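A minimal sketch of the prompt-steering experiment, assuming the img2img pipeline from diffusers as a stand-in for our custom renoising code; the model id, prompts, and strength value are illustrative.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("input.png").convert("RGB").resize((512, 512))

# strength=0.6 renoises the image partway; the same seed means both runs
# denoise from the exact same noisy latent, isolating the prompt's effect.
for prompt in ("a watercolor painting of a harbor",
               "a charcoal drawing of a harbor"):
    generator = torch.Generator("cuda").manual_seed(42)
    result = pipe(prompt=prompt, image=init_image, strength=0.6,
                  generator=generator).images[0]
    result.save(f"{prompt.replace(' ', '_')}.png")
```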
Interaction Design: Gamified Canvas
Our approach specifically targets the challenge of making abstract computational concepts tangible and manipulable, creating an interaction paradigm where users can intuitively understand how small changes in input create distinctive yet meaningful variations in output. This not only enhances user engagement but also promotes a deeper, more experiential understanding of the underlying technological mechanisms.
Drag (1D Canvas)
- Clearly demonstrates the concept of “forward” and “backward”
- Easy to understand
Slingshot (2D Canvas)
- Prompt steering requires no typing; AI suggestions are embedded
- Allows for branching and seeing unexpected correlations
Interaction Design: Prompting on Drag
Entropy Compass is founded on a theoretical framework that reimagines diffusion processes as manipulable entropy systems, bridging the mathematical foundations of diffusion models with intuitive user interactions. At its core, the system transforms the typically opaque process of latent space navigation into a tangible, visually memorable, and interactive experience through two fundamental principles that guide its design and implementation.
The first principle is bidirectional control. Mirroring the forward and reverse processes in Stable Diffusion, the interface enables users to dynamically manipulate noise levels within the generative process and to visually understand both noising and denoising. This approach allows granular control over image evolution, giving users the ability to both introduce and reduce entropy at various stages of generation. By making this traditionally hidden aspect of diffusion models directly manipulable, users gain unprecedented control over the creative process.
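One way to surface those hidden stages is the step-end callback hook available in recent versions of diffusers, sketched below; the model id, prompt, and step count are assumptions, not our production configuration.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

intermediate_latents = []

# Capture the latent after every denoising step so the UI can render
# the normally hidden middle of the generative process.
def capture(pipeline, step, timestep, callback_kwargs):
    intermediate_latents.append(callback_kwargs["latents"].detach().clone())
    return callback_kwargs

image = pipe(
    "a lighthouse at dusk",
    num_inference_steps=30,
    callback_on_step_end=capture,
).images[0]

print(f"Captured {len(intermediate_latents)} intermediate latents")
```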
1D Canvas
Generate semantically related prompts through left/right and downward drag operations while controlling noise levels through drag distance.
2D Canvas
Generate semantically related prompts through directional drag operations while simultaneously controlling noise levels through drag distance.
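The mapping from a drag gesture to generation parameters can be as simple as the hypothetical helper below: the drag angle selects a prompt from the steering list and the drag distance sets the noise level. The function name, thresholds, and normalization are our illustrative assumptions.

```python
import math

def drag_to_params(dx: float, dy: float, prompts: list[str],
                   max_drag: float = 300.0) -> tuple[str, float]:
    """Map a drag vector (in canvas pixels) to a steering prompt and noise level.

    Hypothetical helper: the angle picks one of the suggested prompts
    arranged around the node; the drag distance sets how much noise
    (entropy) to inject before re-denoising.
    """
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = int(angle / (2 * math.pi) * len(prompts)) % len(prompts)
    distance = min(math.hypot(dx, dy), max_drag)
    noise_level = distance / max_drag  # 0.0 = no renoising, 1.0 = full noise
    return prompts[sector], noise_level

# Example: a rightward 150 px drag picks the first suggestion at half noise.
prompt, noise = drag_to_params(150, 0, ["oceanic", "mechanical", "botanical", "urban"])
```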
Frontend Implementation: Mapped Data Structure
The integration between frontend and backend components is orchestrated through a carefully designed API layer that maintains low latency while handling the complex data structures required for our unified canvas system. This architecture enables seamless transitions between the 2D and 1D views while preserving all relevant parameters and state information, supporting the fluid creative exploration that is central to our system's design philosophy.
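A plausible shape for that mapped structure, keyed by node id so that either canvas can look up the same state, is sketched below; the field names are assumptions for illustration, not our exact schema.

```python
from dataclasses import dataclass

@dataclass
class CanvasNode:
    """One generated image and the state both canvases share."""
    node_id: str
    prompt: str
    noise_level: float            # 0.0 (clean) to 1.0 (pure noise)
    image_url: str
    parent_id: str | None = None  # enables branching on the 2D canvas
    position: tuple[float, float] = (0.0, 0.0)  # 2D canvas coordinates
    timeline_index: int = 0                     # 1D canvas ordering

# The canvas map: both views read and write the same dictionary,
# so switching between the 2D and 1D views preserves all state.
canvas: dict[str, CanvasNode] = {}
```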
Backend AI Pipeline
The backend system is built upon several key components, with Stable Diffusion v1.5 serving as the core generative model. This foundation is augmented by an image-captioning stage using the Hugging Face vision model vit-gpt2-image-captioning, which produces a robust semantic caption of the uploaded image and of every generated image. The caption is then passed to GPT-4o via the OpenAI API to generate a JSON list of the words most similar to the caption. With this list, a custom prompt steering algorithm maps the prompts onto both the one-dimensional and two-dimensional canvases, leveraging word embeddings to generate contextually relevant variations and allowing intuitive exploration of the semantic space through user interactions.
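A condensed sketch of that pipeline, assuming the transformers image-to-text task for captioning (via the nlpconnect checkpoint of vit-gpt2-image-captioning) and the official OpenAI Python client; the prompt wording and word count are illustrative.

```python
import json
from transformers import pipeline
from openai import OpenAI

captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def steering_words(image_path: str, n: int = 8) -> list[str]:
    # Step 1: caption the uploaded or generated image.
    caption = captioner(image_path)[0]["generated_text"]

    # Step 2: ask GPT-4o for semantically similar words, returned as JSON.
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": f'Return JSON {{"words": [...]}} with the {n} words '
                       f'most semantically similar to this caption: "{caption}"',
        }],
    )
    return json.loads(response.choices[0].message.content)["words"]
```

The returned word list is what the prompt steering algorithm distributes across the canvas directions, so each drag direction corresponds to one suggested variation.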