Entropy Compass

Latent space navigation via diffusion-based image generation
and integrated prompt steering


2024
For: Harvard GSD 6365 Enactive Design
Role: Fullstack Web Developer
Partner: Una Liu (Harvard)






Research Questions


How can we expose hidden steps of the generative process?

What interface paradigms best support latent space manipulation?

How does gamification impact user engagement?



Overview


Generative AI models like Stable Diffusion offer unprecedented creative potential, yet their complex latent spaces remain opaque and difficult for users to understand and navigate. This research introduces Entropy Compass, an interface design that gamifies the exploration of diffusion model latent spaces through bidirectional noise manipulation and interactive prompt steering. By exposing the typically hidden intermediate diffusion steps and enabling users to dynamically add or remove noise through intuitive drag-based interactions, our system transforms the black-box image generation process into a playful, natural, and exploratory experience.

We propose a novel interaction paradigm with two complementary interfaces: a slingshot-style 2D canvas, where directional drags generate semantically related prompts, and a drag-based 1D canvas that allows granular noise and embedding modifications. Leveraging CLIP embeddings, natural language processing, and generative AI, our system produces contextually relevant prompt lists and enables users to interact meaningfully with image generation at multiple steps of the diffusion process.




AUTOMATIC1111

Motivation


Current AI image generation interfaces are unintuitive and lack meaningful creative workflows





Proof-of-concept Experiments

In the process of adding noise and denoising, the concept of entropy becomes essential: it captures the uncertainty and randomness introduced into the system. In the forward process, entropy increases as images are progressively transformed into noise, maximizing the randomness of pixels and removing identifiable structure. Conversely, the reverse process decreases entropy as the model iteratively reconstructs the data into something humans can perceive and understand, reintroducing meaningful patterns and structures to synthesize coherent outputs from the prompt.
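
For intuition, the forward step can be written in closed form. The sketch below assumes the standard DDPM formulation with a linear beta schedule (Stable Diffusion itself operates on latents with a scaled-linear schedule), so it is an illustration rather than the project's exact configuration:

    import torch

    # Linear beta schedule from the original DDPM paper (an assumption here;
    # Stable Diffusion v1.5 uses a scaled-linear schedule over latents).
    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    def add_noise(x0: torch.Tensor, t: int) -> torch.Tensor:
        """Forward process q(x_t | x_0): entropy grows with t as image
        structure is progressively replaced by Gaussian noise."""
        eps = torch.randn_like(x0)
        a_bar = alphas_cumprod[t]
        return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps

The reverse process runs the same schedule backwards: at each step the model predicts the noise component and removes a little of it, lowering entropy until a coherent image remains.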


  • Renoising and denoising
  • With the same noisy image, prompt steering influences the end result a great deal
  • Denoising with/without a prompt
  • Pixel modification
  • Noisy mashup
  • Concentrated renoising




Interaction Design: Gamified Canvas


Our approach specifically targets the challenge of making abstract computational concepts tangible and manipulable, creating an interaction paradigm in which users can intuitively understand how small changes in input create distinctive yet meaningful variations in output. This not only enhances user engagement but also promotes a deeper, more experiential understanding of the underlying technological mechanisms.



Drag (1D Canvas)

  • Clearly demonstrates the concept of “forward” and “backward”
  • Easy to understand

Slingshot (2D Canvas)

  • Prompt steering requires no typing; AI suggestions are embedded
  • Allows for branching and seeing unexpected correlations




Interaction Design: Prompting on Drag

Entropy Compass is founded on a theoretical framework that reimagines diffusion processes as manipulable entropy systems, bridging the mathematical foundations of diffusion models with intuitive user interactions. At its core, the system transforms the typically opaque process of latent space navigation into a tangible, visually memorable, and interactive experience through two fundamental principles that guide its design and implementation.


The first principle is bidirectional control. Mirroring the forward and reverse processes of Stable Diffusion, the interface lets users dynamically manipulate noise levels within the generative process and visually grasp noising and denoising. This allows granular control over image evolution, giving users the ability to both introduce and reduce entropy at various stages of generation. By making this traditionally hidden aspect of diffusion models directly manipulable, users gain unprecedented control over the creative process.
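
One way to realize this bidirectional control with off-the-shelf tooling is the img2img pipeline in Hugging Face diffusers, where the strength parameter determines how far an image is pushed back toward noise before being denoised again under a new prompt. The snippet below is an illustrative sketch along these lines, not the project's exact backend code:

    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    image = Image.open("input.png").convert("RGB")

    # strength is the amount of entropy reintroduced: 0.0 keeps the image
    # untouched, 1.0 renoises it almost completely before denoising.
    result = pipe(
        prompt="a watercolor landscape",  # illustrative prompt
        image=image,
        strength=0.55,
        guidance_scale=7.5,
    ).images[0]

In this framing, dragging backward on the canvas maps naturally to a higher strength (more entropy added), while dragging forward maps to continued denoising from the current state.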



1D Canvas

Generates semantically related prompts through left/right and downward drag operations while controlling noise levels through drag distance

2D Canvas

Generates semantically related prompts through directional drag operations while simultaneously controlling noise levels through drag distance



Interaction Design: Continuous Prompting
The second principle, semantic mapping, establishes a clear relationship between directional interactions and semantic changes in the latent space. This mapping creates an intuitive correlation between simple physical user actions and their effects on generated content, making the abstract concepts behind latent space navigation more concrete and understandable.
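
A minimal sketch of such a mapping is given below; the angle binning, prompt list, and normalization are hypothetical choices for illustration, not the deployed steering algorithm:

    import math

    def map_drag(dx: float, dy: float, prompts: list[str],
                 max_dist: float = 300.0) -> tuple[str, float]:
        """Map a drag vector to (steering prompt, noise strength).

        Direction selects one of the semantically related prompts arranged
        around the compass; distance sets how much noise to reintroduce.
        """
        angle = math.atan2(dy, dx) % (2 * math.pi)
        index = int(angle / (2 * math.pi) * len(prompts)) % len(prompts)
        dist = min(math.hypot(dx, dy), max_dist)
        return prompts[index], dist / max_dist  # strength in 0.0 .. 1.0

    # Example: eight AI-suggested prompt variants around the 2D canvas.
    prompts = ["forest", "city", "ocean", "desert",
               "mountain", "meadow", "glacier", "canyon"]
    prompt, strength = map_drag(120.0, -45.0, prompts)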





Frontend Implementation: Mapped Data Structure



The integration between frontend and backend components is orchestrated through a carefully designed API layer that maintains low latency while handling the complex data structures required for our unified canvas system. This architecture enables the seamless transition between 2D and 1D views while preserving all relevant parameters and state information, supporting the fluid creative exploration process that is central to our system's design philosophy.

Example of the data doc structure
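
The original document schema is not reproduced here, but a hypothetical canvas-node record along the following lines conveys the kind of state each generation carries (all field names are assumptions):

    # Hypothetical canvas-node document; field names are illustrative,
    # not the project's actual schema.
    node_doc = {
        "id": "node_042",
        "canvas": "2d",              # "2d" slingshot view or "1d" drag view
        "parent_id": "node_017",     # preserves branching history
        "prompt": "a watercolor landscape, autumn",
        "noise_strength": 0.55,      # entropy added before re-denoising
        "seed": 1234,
        "image_url": "/generated/node_042.png",
        "children": [],              # filled in as the user branches further
    }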




Backend AI Pipeline



The backend system is built on several key components, with Stable Diffusion v1.5 serving as the core generative model. This foundation is augmented by the Hugging Face image-captioning model vit-gpt2-image-captioning, which produces a robust semantic caption of the uploaded image and of every generated image. The caption is then passed to GPT-4o via the OpenAI API to generate a JSON list of the words most similar to the caption. With this list, a custom prompt steering algorithm maps the prompts onto both the one-dimensional and two-dimensional canvases and leverages word embeddings to generate contextually relevant variations, allowing intuitive exploration of the semantic space through user interaction.
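
A condensed sketch of this pipeline is shown below; the model identifiers are the ones named above, while the prompt wording and glue code are illustrative assumptions:

    import json
    from transformers import pipeline
    from openai import OpenAI

    # Step 1: caption the uploaded (or newly generated) image.
    captioner = pipeline("image-to-text",
                         model="nlpconnect/vit-gpt2-image-captioning")
    caption = captioner("input.png")[0]["generated_text"]

    # Step 2: ask GPT-4o for semantically related words, returned as JSON.
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": f'Given the caption "{caption}", return a JSON object '
                       '{"words": [...]} listing the 8 most similar words.',
        }],
    )
    related_words = json.loads(response.choices[0].message.content)["words"]

    # Step 3: the prompt steering algorithm places related_words onto the
    # 1D and 2D canvases (see the drag-mapping sketch earlier).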





Examples from Users






