The terrifying, creative, and weird world of generative AI
Insights from TED, #deepTomCruise, and my journey into MidJourney
So, as some of you may know, I’ve been thinking/ruminating/freaking out a lot about the recent advancements in artificial intelligence and machine learning and the impact they’re going to have on, well, pretty much everything.
This can get pretty existential pretty fast, so instead of curling into a ball and feeling overwhelmed by it all (which is what I have been doing, and it’s not really helping), I’m trying a new strategy: I’m going to focus my freak-outs (or at least my writing about my freak-outs) on one subject at a time. And in said writing, I’m going to focus on one terrifying application of the day’s subject, and one potentially cool application. And, in the case of today’s post, I’m also going to touch on what playing around with these tools myself has made me wonder about the minds of other human beings. Sound good? Let’s get started.
Today’s subject pick: Generative artificial intelligence (aka “generative AI”)
In the words of Wikipedia (thanks, Wikipedia writers, for defining things so I don’t have to!) generative AI is “a type of artificial intelligence system capable of generating text, images, or other media in response to prompts. Generative models learn the patterns and structure of the input data, and then generate new content that is similar to the training data but with some degree of novelty.” (If that second sentence doesn’t make sense, keep reading; it will soon.)
Today’s terrifying application: Deep fakes and the erosion of trust
For those of you who haven’t heard the term, it’s a mash-up of “deep learning” and “fake,” and refers to eerily realistic images, videos, and audio that are, in fact, fakes generated by artificial intelligence. As an example, check out the videos put out by #deepTomCruise, which feature an incredibly realistic, artificially generated “Tom Cruise.”
Perhaps you can already see why this is freaking terrifying: we’re quickly reaching a point where we won’t be able to trust that anything we see or hear online is real. And, for that matter, we’ll soon all have access to these tools (in fact, we already have access to some of them, such as the image generator Dall-E and chatbots like ChatGPT) and thus be able to create deep fakes ourselves. In other words, we’ll be able to make videos, recordings, and photos of other people “doing” or “saying” whatever we want. Imagine what you could do if you were trying to generate a scandal against a political opponent, or blackmail an enemy, or psychologically manipulate someone, or get revenge against an ex. Imagine what someone could do to you.
Another (in this case relatively harmless) example: a YouTube video of Aloe Blacc—or, rather, an avatar of Aloe Blacc—singing his song “Wake Me Up” in his voice, in a combination of languages that he does not speak, including Mandarin Chinese.
And yes, I know we’ve been having arguments about what’s “true” for quite some time now, but these new technologies are taking it to a new level, and they’re improving at an exponential rate. As one example, when I was at TED, the co-founder of Metaphysic (the company that created the Tom Cruise deep fakes) did a live demonstration of a filter that enabled him to take the voice and appearance of Chris Anderson (the head of TED) and superimpose them onto his own voice and image, so that if you looked at the video display of what was happening on stage, you saw two Chris Andersons. And if the guy from Metaphysic talked, his voice sounded like Chris Anderson’s.
Was it perfect? No. But remember: it was being done in real time. And to give you a sense of how fast this technology is developing, Metaphysic was founded in 2021, and one of the company’s other founders told me that TED was the first time they had ever tried to transform someone’s voice live, in real time—and that they had developed that ability within the past 48 hours. (For more on the TED demonstration/conversation itself, check out this summary from VentureBeat.)
Bottom line: deep fakes and forgeries are quickly becoming near-flawless, and I’m deeply concerned about what’s going to happen to us—individually and as a society—when we lose what’s left of our ability to determine what’s real.
Ready for the positive spin?
Potentially Cool Application: Accelerating and Enabling Creativity
In the meantime (by which I mean before the erosion of trust caused by deep fakes destroys democracy—oh shoot, I’m focusing on the terrifying part again!), I’m also curious about the possible creative applications of tools like ChatGPT and image generators such as Dall-E and MidJourney.
For example, I’m in the process of trying to brainstorm/communicate ideas for a possible new paperback cover for The Power of Fun—but I have basically no graphic design skills of my own. (And I’m also currently sick and confined to a room, which probably explains why this post is so long to begin with.)
As a result of all of these factors, I spent an hour or so playing around with one of these image generators, called MidJourney. For people who haven’t played around with it yet, MidJourney works like this: you type in a text description of the image you are imagining, and the algorithm spits out four visual interpretations of your words. It does this nearly immediately, and while sometimes the images are clearly artificial, sometimes they are creepily realistic. For example, here’s a screen grab of the interface of MidJourney and some images that it created in response to a user’s prompt:
Once MidJourney has generated the four images, you can then select the images you like the most and have the algorithm use them to create new images (which you can then ask it to further refine, and on and on and on, till you get to something you like).
It doesn’t take too much thought to see how tools like this could be used for very bad purposes. (The first time I experienced an image generator like this was when Tristan Harris, co-founder and president of the Center for Humane Technology, showed me what happened when he typed something along the lines of “photograph of bomb in Ukrainian city” into an image generator called Dall-E and four photographs immediately popped up—all of them totally fake, but also completely convincing—showing explosions in Ukrainian cities.)
But, you know, these tools can be used for creative purposes, too — like trying to generate ideas for book covers. So I started typing in /imagine (that’s the command you use in MidJourney to ask it to create something), followed by a description like “book cover with rainbow starburst. jewel-toned colors. Title is in white, modern font, with the word ‘Fun’ emphasized. The cover should make people feel happy and energetic.”
It promptly began to spit things out like this:
Maybe not perfect (and I have no idea what language it thinks I’m writing in), but still: whoa. I’m hoping to use some of the images that MidJourney helped me create to communicate some of my ideas to my publishing team.
And this brings us to the last section of today’s post:
What playing around with these tools myself has made me wonder about the minds of other human beings
As you can see in the first screenshot I posted, the current version of MidJourney takes place in a public chat room, where LOTS of people are interacting with the image-generating bot at the same time.
As a result, you can see everyone else’s requests, and your requested images get mixed in with everybody else’s images.
For a while, I was ignoring the other stuff in the chat and just scanning the feed for rainbow book covers. (My requests were pretty easy to spot.) But then I started to notice some of the other images (and image requests) that were floating by. Like, for example, this:
Yes, you read that right: someone had logged in, possibly even paid a subscription fee, in order to generate an image of “big battle ram with diamonds drinking wine.”
It was at this point that I began to get curious about who, exactly, was in this room with me. (And, for that matter, what the purpose of this image — which, by the way, this person requested MULTIPLE iterations of — could possibly be.)
But that was far from the only weird one. Here is a selection of the images that were being generated while I worked on my book cover (and note that I’m not leaving out lots of “normal” ones that were being generated . . . this is a pretty representative selection of the images that were being requested at the same time as mine).
/imagine: a photograph of Blake Lively bullriding a giant cat, realistic, highly detailed
Okay . . . odd, but not totally crazy? Maybe? Then came this one.
/imagine: comedy & tragedy masks on fire sitting on film with mental asylum in the background. 2:3 with Hollywood hills, insane detail
Well, that’s not weird enough! So let’s try . . .
/imagine: Demon skull, small horns, sharp teeth, made of pure silver, handcrafted, wicked, sinister, extreme detail, Dark Fantasy
Apparently the person who requested the demon skulls was thinking the same thing that any other totally normal user would be thinking, because they immediately followed the skull request by asking MidJourney to /imagine a “necromancer, sorceress, vampire, stunning, beautiful woman, exotic blue eyes, aggressive appearance, strong body, long black hair, demonic black robe with hood, magical demonic jewelry, attractive body, she is holding a demon’s skull in her palm, gloomy sky, Dark Fantasy.”
And MidJourney delivered!
AM I THE ONLY ONE ASKING SERIOUS QUESTIONS ABOUT HUMAN MINDS RIGHT NOW?
The more I scrolled, the more I felt an appreciation for and kinship with this next person, who—amidst all the skulls and the sexy necromancers—kept requesting architectural photos of a house.
I mean, granted, the image that was generated doesn’t include many of the details from the description (and is more Frank Lloyd Wright’s Fallingwater than Frank Gehry) . . . but still: given the choice between this one and four more “sinister skulls,” I’ll take it.
In Conclusion. . . .
If you were ever wondering what might happen if you sent me to TED, confined me to a room for ten days, and gave me time to think about and play around with generative AI, well, there you go!
To scrolling less and living more (and not freaking out),
Catherine Price
Founder of Screen/Life Balance and author of books including The Power of Fun: How to Feel Alive Again and How to Break Up with Your Phone