BY KIM BELLARD
I can’t believe I somehow missed it when OpenAI introduced DALL-E in January 2021, a neural network that could “generate images from text descriptions,” so I’m sure not going to miss it now that OpenAI has unveiled DALL-E 2. As they describe it, “DALL-E 2 is a new AI system that can create realistic images and art from a description in natural language.” The name, by the way, is a playful combination of the animated robot WALL-E and the idiosyncratic artist Salvador Dalí.
This is not your father’s AI. If you think it’s just about art, think again. If you think it doesn’t matter for healthcare, well, you’ve been warned.
Here are further descriptions of what OpenAI is claiming:
“DALL·E 2 can create original, realistic images and art from a text description. It can combine concepts, attributes, and styles.
DALL·E 2 can make realistic edits to existing images from a natural language caption. It can add and remove elements while taking shadows, reflections, and textures into account.
DALL·E 2 can take an image and create different variations of it inspired by the original.”
I’ll leave it to others to explain exactly how it does all that, aside from saying it uses a process called diffusion, “which starts with a pattern of random dots and gradually alters that pattern towards an image when it recognizes specific aspects of that image.” The end result is that, relative to DALL-E, DALL-E 2 “generates more realistic and accurate images with 4x greater resolution.”
Devin Coldewey, writing in TechCrunch, marvels:
It’s hard to overstate the quality of these images compared with other generators I’ve seen. Although there are almost always the kinds of “tells” you expect from AI-generated imagery, they’re less obvious and the rest of the image is way better than the best generated by others.
OK, it’s true that DALL-E isn’t coming up with the ideas for art on its own, but it is creating never-seen-before images, like a koala bear dunking or Mona Lisa with a mohawk. If that’s not AI being creative, it’s close.
Sam Altman, OpenAI’s CEO, wrote a blog post with several interesting thoughts about DALL-E 2. He starts out by saying: “For me, it’s the most delightful thing to play with we’ve created so far. I find it to be creativity-enhancing, helpful for many different situations, and fun in a way I haven’t felt from technology in a while.” I’m a big believer in Steven Johnson’s maxim that the future is where people are having the most fun, so that really hit home for me.
Mr. Altman outlines six things he believes are noteworthy about DALL-E 2:
“1. This is another example of what I think is going to be a new computer interface trend: you say what you want in natural language or with contextual clues, and the computer does it.
2. It sure does seem to ‘understand’ concepts at many levels and how they relate to each other in sophisticated ways.
3. Although I firmly believe AI will create lots of new jobs, and make many existing jobs much better by doing the boring bits well, I think it’s important to be honest that it’s increasingly going to make some jobs not very relevant (like technology frequently does)
4. A decade ago, the conventional wisdom was that AI would first impact physical labor, and then cognitive labor, and then maybe someday it could do creative work. It now looks like it’s going to go in the opposite order.
5. It’s an example of a world in which good ideas are the limit for what we can do, not specific skills.
6. Although the upsides are great, the model is powerful enough that it’s easy to imagine the downsides.”
On that last point, OpenAI restricts what images DALL-E has been trained on, watermarks each image it generates, reviews all images generated, and restricts the use of real individuals’ faces. They recognize the potential for abuse. Oren Etzioni, chief executive of the Allen Institute for AI, warned The New York Times: “There is already disinformation online, but the worry is that this [will] scale disinformation to new levels.”
Mr. Altman indicated that there might be a product launch this summer, with broader access, but Mira Murati, OpenAI’s head of research, was firm: “This is not a product. The idea is to understand capabilities and limitations and give us the opportunity to build in mitigation.”
OpenAI algorithms researcher Prafulla Dhariwal told Fast Company: “Vision and language are both key parts of human intelligence; building models like DALL-E 2 connects these two domains. It’s a very important step for us as we try to teach machines to perceive the world the way humans do, and then eventually develop general intelligence.”
As OpenAI’s video says, “DALL-E helps humans understand how advanced AI systems see and understand our world.”
I don’t have any artistic skill whatsoever, but, as Mr. Altman suggested, we’re building towards “a world in which good ideas are the limit for what we can do, not specific skills.” In that world, as Mr. Altman also suggested, AI may do creative and cognitive work before physical labor. We’ve already met Ai-Da, an AI-driven “robot artist,” and we’re going to see other examples of creative AI.
OpenAI already has OpenAI Codex, an “AI system that can convert natural language to code.” There are AI tools that can write, including one powered by OpenAI, and ones that can compose music.
And, of course, Google has a host of AI initiatives specifically oriented towards health.
Healthcare in general, and the practice of medicine in particular, has long been seen as a uniquely human endeavor. Its practitioners claim it is a blend of art and science, not easily reducible to computer code. Even as healthcare finally acknowledges that AI is good at, say, recognizing radiology images, it insists that AI is still a long way from diagnosing patients in all their complexity, much less advising or comforting them.
Perhaps we should ask DALL-E 2 to draw them a picture of what that might look like.
Kim is a former e-marketing exec at a major Blues plan, editor of the late & lamented Tincture.io, and now a regular THCB contributor.