Thoughts
Midjourney is an AI tool that helps you create visual imagery through text prompts. Ask it for an image of a field of roses with cows grazing, and you'll soon have yourself a beautiful farm. It's similar to ChatGPT, in that written statements provoke artificially generated new things. But what's different is that Midjourney is housed in Discord, which is a mostly public chat tool. When you use Midjourney, everyone else can see what you're doing. And that's one of the best product decisions I've ever seen.
If you've never used Midjourney (or Discord), take a side trip over, give it a shot, and then come on back.
It's questionable whether typing prompts into a command line "counts" as being creative, but there's no doubt that the environment of Midjourney feels creative. It's fast-paced, exciting, colorful, experiential, and unpredictable. It reminds me of how, in his book Interface Culture, author Steven Johnson described his first experience playing the original Sonic the Hedgehog:
"For all the kinesis, the hapless Sonic addict had little control over the onscreen character's actions; there were really only two options—jump and go faster—and pretty much any combination of those two would produce something interesting on the screen.”
Midjourney is the same way: throw something, anything, at the prompt, and you’ll get something interesting. But the feeling of being in a creative environment comes less from what you do than from trying desperately to keep up with what everyone else is doing. Creative and highly visual stimuli are flying back and forth at record speeds. And once you realize you can contribute to the show, it suddenly occurs to you that what's flying by has only been partially created by a computer. The rest is coming from people, making high-fidelity ideas come to life through the lowest-fidelity of interfaces.
As Johnson continues:
“The lack of control wasn't perceived as a drawback because the whole point of the game—what made it such a phenomenal success—lay in the sheer exhilaration of moving, and moving fast. You didn't so much play Sonic as ride it."
The Midjourney experience is a team sport. It's a chain of creativity. During one session, over the course of ten minutes, I watched someone grinding out an image of Luigi, from Mario Bros., sitting on a park bench in the rain. They wrote the prompt; we waited 20 seconds or so; they made creative direction decisions on what to change; and then they rewrote the prompt.
Interspersed was a person who asked Midjourney to "/imagine Joe Biden in handcuffs." (Witness the future of fake news in action.)
So as I was waiting and watching, I typed, "/imagine Joe Biden, sitting with Luigi on a park bench in the rain," and I made this:
And the chain kept going. Someone else moved Biden and Luigi from the park bench to a beach resort; someone else added background animals; and so on. After about twenty minutes, I left, and I’m pretty sure the chain continued well after I wasn’t part of it anymore.
I think the person cranking out Luigi iterations had some sort of defined need for the image, and was spending serious time and thought making it just the way they wanted. Our Joe Biden in Handcuffs designer was probably screwing around, and I was certainly just playing. But we were all on the same creative team. And what makes the entire process so fascinating is that Midjourney is on the team too, as a player. It's a good player when it comes to popular figures, like Joe Biden. It's a shit player when it has little real-life material to draw from, like Luigi. And so just like we "make mistakes" in our prompts, Midjourney "makes mistakes" in its results. In a world where we often feel like it's us versus the computer, it's refreshing to have Midjourney on our side.
A prompt is a suggestion, not a directive. The trigger to ask Midjourney to make something is "/imagine," which is very different from "/render." We're giving the tool permission to explore, and so it does. It's very good at giving us a predictable exploration when it has a large set of data to work with, and when we're clear and crisp with our instructions. But when we're looser in imagination-directing, and when the tool has little to go on, it makes wonderfully broken ideas. My Luigis above look Luigi-ish, but that's about it; my favorite of the four looks more like a Nintendo version of the Jack in the Box mascot:
These misses are wonderful, and what's more wonderful is that everyone sees them. The joke's on the tool, but kindly: if Midjourney were in the studio with me sitting around the table drawing, I can imagine it laughing as hard as I would be.
That's really how I think of it, too. Many of us have had late-night, alcohol-fueled sketching sessions that turn into a wonderful mess: the right people, at the right time, making things together. I imagine Midjourney right there alongside us, drinking a beer and drawing something ridiculous. The fun is poked at the idea, not at the designer.
When we run creative workshops with our clients, we go out of our way to create environments where people can make things in front of their colleagues without experiencing the very common shame of "I can't draw." This is one of the hardest parts of creative facilitation, and it's often just downright impossible to get some people to express themselves visually in front of other people. But your creation in Midjourney is out there for all of the world to see, and there's nothing remotely embarrassing about it. In fact, the creations that are made can be really bad, and that seems to be just fine. When the tool helps us make things that we like, we feel ownership over them, as if we made them. But when we prompt it and it makes things we don't like, we view it as a third party: we didn't make that; this other person did.
Compare this to a working brainstorm, where two people sitting next to each other are drawing the same type of thing. The first draws with confidence; the second barely makes a mark on the paper. They sneak glances at their neighbors, and we see the common signs of shame about making: slumped shoulders, aimless doodling, and ultimately, a declaration that "I can't do this." If it were socially acceptable, they would walk out of the meeting. Sometimes they do.
In fact, one can leave Midjourney's public chat rooms and enter a private one, and some do. But many don't even realize that's a choice, and sticking around in public has no impact on their confidence in making. Being creative in Discord just isn't embarrassing, even when the outcome is trash. My Luigi looked like a pumpkin. In "real life," I might have been reluctant to share that with my team, lest they ridicule me for drawing Nintendo characters poorly. In Midjourney, I share it freely and proudly.
When I was about eight years old, I discovered bulletin board systems, which were mini internets that you reached via your telephone line. The technology was extremely rudimentary by today's standards, and very, very slow. Downloading a single image would take upwards of 10 minutes. And images were in a format known as "interlaced" .gif files: images that paint in slowly, where every 8th horizontal line of the image shows up, and then every 4th, and then every 2nd, until the whole image appears. Of course a young kid would never try to look at pornography, but if they did, imagine the anticipation... just to discover that the image was a picture of an apple or a dog!
The slow emergence of the images was a deliberate choice by the designers of the image format. They realized that something was better than nothing, and found a way around a technical limitation. Most of us would consider this to be inefficient, and if our internet connection took a hit like that, we'd be on the phone with tech support in a second. But there's something pretty magical about anticipation, even when it leads to disappointment, and particularly if the disappointment leads to trying again. There's a sense of having earned something: literally, a delayed dopamine kick, combined with the thrill of gambling with your time.
The speed of Midjourney, and the speed of many of the other in-vogue AI tools, is heavily constrained by the cost of computational power. When you type into ChatGPT, your answer comes back slowly, and when you ask Midjourney for an image, it paints in. Each "paint" adds more refinement to the image, and the creation slowly comes into focus. It provokes exactly the same feeling as those images of my childhood loading slowly: anticipation, sometimes leading to let-down, and often leading to surprise and delight. The time-to-creation is not long, perhaps 10 seconds. It's not enough time to answer email or get a cup of coffee. It's just long enough to sit there and watch and wait. You aren't just watching your own image; you're watching all of the other creations slowly show up, too. The public waiting is a fundamental part of the experience.
Of course, many will view the slow speed as a problem that needs to be fixed, and the speed will become faster as computing power becomes cheaper. We're already seeing Midjourney-like image generation embedded in Photoshop, which provides visual creations almost instantly, and in the context of a file. The tool generates variants, and you can click-click-click through them to select the one you want. Adding a boat to a picture of a river? No problem, click and it's there.
The computer-driven part of the process is no longer collaborative. Now, it's a utility, just like the magic wand tool or paint-bucket. We expect utilities to work quickly, efficiently, predictably, and often, privately. I asked for a boat, and I want a boat, not a weird assemblage of boats and things that are boat-ish. And I don't have the time to watch the rest of the world make boats, because I have deadlines and deliverables.
But for right now, we're in a very brief moment of using a beautiful technology, right before it shifts from the space of collaborative, artistic expression to operational efficiency. The shift is inevitable, and it’s probably very valuable for our productivity. But it’s going to be emotionally disappointing and somewhat boring. And that's fine; I like that airplanes don't fall out of the sky because of unpredictability, and if I still used Photoshop, I could imagine benefiting from the utility of somewhat mindlessly putting a boat in water, if my client needed a boat in water. But when we see the Wright Brothers’ first flights in the early 1900s, or early pre-Photoshop ANSI images, we realize the true power of humanity is our romantic curiosity. That curiosity is strong when technology is new. It dissipates quickly when technology becomes pervasive and “optimized.”
Midjourney's designers embedded the tool in a public forum instead of in personal, isolated chat sessions like ChatGPT. As a result, this AI generative process feels at once thrilling, empowering, and hilarious. Unfortunately, most of these qualities will go away as the technology matures. The thrilling-ness is a byproduct of slow rendering, and is probably considered a defect, not a feature. The empowering-ness is a result of making things in front of other people, but that will fade as the experience becomes embedded in products like Photoshop or Word. And the hilarity that comes from a poorly trained tool will disappear as the machine gets "better."
We have a very short window for really absorbing the raw human ingenuity behind a tool like Midjourney. Quick, go make Joe-Biden-Luigi-Jack-In-The-Box-Whatevers and be a part of the artistry and of this moment in time. Get in before we inevitably, and unfortunately, trade the magic for efficiency.