Many conversations are ongoing around image-generating machine learning “AI” such as Midjourney and Stable Diffusion. Many of those conversations raise valid viewpoints and objections, and are worth careful deliberation. One of the more popular arguments I’ve seen lately centers around need. “We don’t need AI to make art,” one popular tweet opines. “We need AI to write emails and clean the house and deliver the groceries so humans can make more art.”
Sure, AI doing those latter things will help us. But there’s a problem we should be careful around when we declare what is “needed” and “not needed.” People have different needs and different capabilities, and if we’re gonna start trimming it all for others with a needs brush as broad as a highway, we’re already excluding certain populations and their needs from the conversation. From my perspective as an autistic adult, image-generating ML, if approached and used responsibly, can be a powerful tool for both expressing and exploring emotional states and other feelings that are sometimes directly inaccessible.
There are a few caveats to start with:
- I am not a mental health professional. I’m just a person who engages with as much new technology as I can, mindfully, in order to find the sharp edges or to ease difficult spots in my experience. This is not expert advice.
- My use is centered around a local, non-cloud Stable Diffusion instance on a personal computer that I control. The data involved in this practice can be very sensitive and I would not do it with a cloud-based setup like Midjourney, or any other setup that I do not control myself.
As an autistic adult I go through various experiences of neurodivergence depending on circumstance, privilege, and luck. I’m a successful cybersecurity professional and yet also face periods of extreme overwhelm, depression, anxiety, and more. Depression and anxiety are both “liars”: they provide false cognitive narratives that serve only to tighten their own grip. And autism can add a further, more complex veil over things. The complex interplay between it all creates an environment in which there are times I cannot identify my own emotions or their root, nor do I feel like I can express them effectively.
Which is frustrating as hell.
A while ago I added a Post-it note to my desk to refer to often: “If you’re frustrated, step back. Things go worse otherwise.” And it’s painfully true: if I feel I’m not communicating effectively I get intensely frustrated at myself, and negative emotions escalate, and I try harder, which leads to more frustration, etc. The closed-loop feedback mechanism of it all is infuriating. After more than forty years navigating all this you’d think I’d be better at it, and yet…
There are many mechanisms to intervene in this kind of cycle, express feelings, and interrogate their root, from ancient Buddhist and other mind techniques to modern-day psychotherapy; many are helpful. But it’s always worth adding to the toolchest, and that’s one of the contexts in which I engaged with Stable Diffusion (SD) image-generating machine learning/AI.
I use a Windows-friendly version of SD that comes with a prepackaged graphical user interface so that I don’t have to use the command line. It makes the system more accessible, which is key if I want to avoid further friction amidst frustration or other emotions. There are some system requirements to mind, including having a half-decent NVIDIA graphics card (though there are some other setups that use Apple M1 MacBooks instead). And to reiterate the above: the point of using SD over something like Midjourney is retaining control of the data you create. In this process, that data is the content of the prompts the ML uses to create images.
In experimenting I’ve found that SD can help me express emotions that feel otherwise unexpressible; that feel stuck in my throat, or my chest, or deeper. And in expressing them I can explore those emotions in ways that I was perhaps not cognitively prepared to – which is wordy-as-hell, so let me simplify the process a little:
- I enter the prompt into the SD interface, tweak a few settings, and tell SD to create a few test images.
- A few minutes later I review the test images and ask myself: are the produced images identifiably close to what I’m feeling? What words do I need to change to see on the screen something nearer to what’s stuck behind my eyes?
- I refine and re-submit the prompt to SD and create a new set of images. Once I feel I’m on the right track I’ll often “set it and forget it”: I tweak n_iter to 100 to produce one hundred iterations off a single prompt. Later I’ll come back and review the whole set at once to, hopefully, find something satisfying.
- Often I’ll find in forming or refining the prompt and then reviewing the image sets that elements of what I’m feeling become more obvious, more accessible on a conscious level. There’s a good, qualitative feedback loop in this generative (creative!) process – it not only generates images, but approached mindfully, generates insight within me about myself.
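The “set it and forget it” step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular SD interface works: `run_batch`, `generate`, and `fake_generate` are hypothetical names, the backend is assumed to take a prompt and a seed and return image bytes, and only `n_iter` comes from the process described above. Everything is written to a local folder, in keeping with the point about retaining control of the data.

```python
from pathlib import Path
from typing import Callable


def run_batch(
    prompt: str,
    generate: Callable[[str, int], bytes],
    out_dir: Path,
    n_iter: int = 100,
    base_seed: int = 0,
) -> list[Path]:
    """Produce n_iter images from one prompt and save them locally for later review."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Keep the exact wording next to the images, since the wording is part of the exercise.
    (out_dir / "prompt.txt").write_text(prompt, encoding="utf-8")

    saved = []
    for i in range(n_iter):
        # Assumption: the backend varies its output with the seed.
        seed = base_seed + i
        image_bytes = generate(prompt, seed)
        path = out_dir / f"{i:04d}_seed{seed}.png"
        path.write_bytes(image_bytes)
        saved.append(path)
    return saved


def fake_generate(prompt: str, seed: int) -> bytes:
    """Hypothetical stand-in for a local SD backend; returns dummy bytes, not a real image."""
    return f"{seed}:{prompt}".encode("utf-8")
```

Calling `run_batch("a trellis made of light", fake_generate, Path("runs/trellis"), n_iter=3)` would write three placeholder files plus `prompt.txt`. Swapping `fake_generate` for a function that calls a real local model is the part this sketch deliberately leaves open.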
I’ll provide a few recent examples to illustrate.
Prompt: detailed realistic photograph of a trellis made of light over a popular dark walkway at night, trellis leaking photons on passersby, with a hint of peril, cinematic, science fiction, futuristic
Process: I worked on this prompt amidst no small amount of uncertainty and overwhelm at my path and my current place in it; this much was obvious going into it, given words like “walkway” and “peril.” The idea of being bombarded by light while walking the path held a clear connection to the overwhelm I was experiencing.
Discovery: What I didn’t expect to find, and yet what was undeniable after I chose the representative images out of a hundred, was an accompanying sense of awe. It’s not something that had been given words, but it felt accurate as I interrogated my feelings and found that, along with the overwhelm and the uncertainty and the sensory assault, I am still walking my path with a sense of proper awe at where I am and where I’m headed. And that awe can help motivate and propel me further. So this exercise helped me both express and explore, and it added something else to my experiential toolchest to work with directly.
Prompt: infrared image of a mass of chaotically violently swirling gases with a single section in the center struggling to stay ordered and stable, high dynamic range, cyberpunk
Process: Obviously feeling overwhelmed and blown about when working with this prompt, without any proper mooring. And yet while the struggle was evident I could identify even within myself a certain core working to stay grounded and not lose what atomic progress I have made over the years. And it helped with a certain process of identification: if anything I am the core, not the ephemeral gases, nor the violent winds.
Initial prompt: person protected by armor made from steel spikes turned inward, being overwhelmed by hostile electromagnetic waves from all directions
Final refined prompt: person protected by futuristic battle armor made from carbon spikes turned inward, being overwhelmed by hostile electromagnetic waves from all directions, cinematic, science fiction, combat, action
Process: Specifically amidst a period of sensory overwhelm, and also some deep and serious criticism over self-inflicted processes I set up in order to seemingly protect myself. What was interesting, as I moved through several refinements of the prompt, was the move from armored-but-passive to a much more active scenario. Combat and action added, because (I only recognized after) I wanted the image generated to reflect that I was in active combat, still very much in the fight despite feeling hopeless and overwhelmed trying to fend off vaporous, ungraspable opposition. It may not all be flowers and rainbows, but that move from passive to active is a hell of an improvement.
Last Example, and a few parting thoughts
Prompt: single hopeful plant growing from a seed that grew through a crack in an inhospitable hostile desolate unfinished concrete basement, plant growing snaking upward toward a single source of light, dark but hopeful, solarpunk, growth, negative exposure
There are many valid criticisms about Stable Diffusion and related generative machine-learning systems. My intent is not to handwave mystically and make that all disappear, but to propose that things like SD can be a new tool in a toolbox for expression and exploration of one’s self, especially when expression and exploration can be frustrating and cognitively hazardous.
As a layperson, but also an autistic, a depressive, a person with anxiety, a swirling mass of complexities seeking a solid core, I can see its utility, and a path towards even less verbal folks among us perhaps finding new ways to make their feelings known both inward and outward. The process isn’t linear; it’s a chaotic overlap of feedback loops to navigate and learn from.
So when we discuss generative machine learning I would prefer us all to keep a more open mind, and be careful of declaring who needs what, and how, and why; needs present themselves in different ways, and so do people. Whether it’s an image-generating AI helping a frustrated geek understand his feelings better, or a somewhat wonky, troubling, confidently wrong chat system that, further advanced, may help ease societal loneliness, there is potential. We need to explore it carefully, be mindful of impact, and elevate the experience and needs of marginalized peoples very aggressively; but damn it, we need to explore it, not shut it down.