Words to images: Blurring boundaries between real memories and AI creations

Images of Donald Trump being arrested. Pope Francis wearing a stylish white puffer jacket. Elon Musk’s robot wife. A supposed explosion near the Pentagon. Computer engineers working on a highway. An ageing Ryan Gosling. All of these images floated around social media at different points in time. None were real, yet they looked every bit as convincing as real photos. The world just isn’t prepared for the level of realism that artificial intelligence (AI) generated images are now capable of.

It is more important than ever to distinguish between an image and a photo. “Photo of Donald Trump being arrested” – that is all it took, typed into the Midjourney AI image generator. And voila. Humans tend to accept a visual message more easily.

“We know from research that the human brain processes visual information about 60,000 times faster than text, making visual tools a critical way people search, create and gain understanding,” said Yusuf Mehdi, who is Corporate Vice President and Consumer Chief Marketing Officer at Microsoft.

This, in an era when AI tools are still evolving. They don’t always get facial expressions, skin textures, or features such as teeth and even limbs right in one go, but the overall results look so realistic that these anomalies are difficult to spot at first glance. The teeth just didn’t look right in the generated images of Donald Trump’s arrest.

In fact, for now, this is the only way to tell whether an image is a real photo or a deepfake created using AI: a dose of scepticism, and a closer look at the accuracy of features. These will often give away the truth. But expect that to become even more difficult across tools as improvements roll out.

AI generated images: As good as actual photos

The brilliance of AI stumped the judges at a photography competition. At the Sony World Photography Awards 2023 in March, the World Photography Organisation selected Berlin-based Boris Eldagsen’s submission as the winner of the creative category. It was an award he later turned down, for a very specific reason.

The monochrome image of two women embracing, with distinct 1950s family portrait vibes, wasn’t really a photo. Until the creator spilled the beans, no one realised that Pseudomnesia: The Electrician was the handiwork of Dall-E 2, an AI image generator. “AI is not photography,” he says.

Adobe Firefly. Midjourney. Dall-E. Stable Diffusion. Freeway. Bing Image Creator. Picsart. Nightcafe AI. Craiyon. Starry AI. Jasper. Photosonic. These are just some of the names you may have come across: online tools that put AI at your artistic and literary disposal. Some are free to use, while others require a paid subscription.

The scope is widening. Adobe has added its Firefly generative AI to the popular Photoshop app, though Firefly started out as a standalone tool launched a few weeks earlier. The core feature to emerge from this is called “Generative Fill”, which lets users modify and edit images using text prompts.

“These prompts can be used to add content, remove or replace parts of an image and extend the edges of an image,” says Pam Clark, Vice President at Adobe. The company says Firefly’s first model is trained on multiple datasets, including Adobe Stock images and licensed content.

You may well be left wondering later: is this photo really a memory, or something that has been thought up and altered by AI?

AI images have been around longer than you realise

You may feel AI generated images are new, but they aren’t. The history of AI-generated art can be traced back to the early days of computer graphics.

As far back as the 1960s, computers were used to create simple patterns and shapes. One such example is “Matrix Multiplications”, a portfolio of twelve images created by the German mathematician and computer scientist Frieder Nake in 1967.

Then came the next step: programs that used complex rules to create art on their own, such as AARON, created by artist Harold Cohen in 1973. At some point, as we all know, advanced software such as Adobe’s Photoshop took over.

How have AI tools become so good?

There will inevitably be a debate about the potential misuse of AI, all the more so because of the ever-improving realism of what AI now generates. It is only natural to wonder: how is AI able to create images with such a high level of visual authenticity?

For years, an adversarial training process was the key to realistic image generation. Like other AI systems, image generators are built on neural networks trained on large datasets. In a Generative Adversarial Network (GAN), two neural networks are trained together, as opponents.

The first, a generator network, creates images; in text-to-image versions, it works from a text input, much like the descriptor you would type into today’s tools such as Microsoft’s Bing Image Creator, Dall-E 2 or Midjourney, or into a search engine. The second, a discriminator network, compares the generated images with real ones from the training data and tries to identify the fakes and the inaccuracies that give them away.

The challenge for the generator, therefore, is to create images so realistic that the discriminator cannot flag them. This tussle for superiority between the two networks is why GAN-generated images kept getting more realistic, as generator networks improved their skills. The most popular text-to-image tools today, including Dall-E 2 and Stable Diffusion, have largely moved on to a different approach called diffusion models, which learn to turn random noise into a detailed image step by step, guided by the text prompt.
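To make the generator-versus-discriminator tussle concrete, here is a minimal toy sketch in Python. It is not how any of the tools named above is implemented. It assumes one-dimensional “images” (single numbers) drawn from a made-up target distribution, a generator that is just a straight line `x = a * z + b` applied to random noise `z`, and a discriminator that is a single logistic unit. The function names (`train_toy_gan`, `sigmoid`) and all settings are hypothetical choices for illustration.

```python
import math
import random


def sigmoid(s: float) -> float:
    """Numerically stable logistic function: the discriminator's 'probability real'."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 2000, batch: int = 64, lr: float = 0.01,
                  real_mean: float = 4.0, real_std: float = 1.25, seed: int = 0):
    """Hypothetical 1-D GAN: alternate discriminator and generator updates."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator G(z) = a * z + b, starts far from the real data
    w, c = 0.1, 0.0  # discriminator D(x) = sigmoid(w * x + c)

    for _ in range(steps):
        # 1) Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        gw = gc = 0.0
        for _ in range(batch):
            x_real = rng.gauss(real_mean, real_std)
            x_fake = a * rng.gauss(0.0, 1.0) + b
            d_real = sigmoid(w * x_real + c)
            d_fake = sigmoid(w * x_fake + c)
            gw += -(1.0 - d_real) * x_real + d_fake * x_fake
            gc += -(1.0 - d_real) + d_fake
        w -= lr * gw / batch
        c -= lr * gc / batch

        # 2) Generator step: minimise -log D(G(z)), i.e. try to fool the updated D.
        ga = gb = 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            d_fake = sigmoid(w * (a * z + b) + c)
            ga += -(1.0 - d_fake) * w * z  # chain rule: dG/da = z
            gb += -(1.0 - d_fake) * w      # chain rule: dG/db = 1
        a -= lr * ga / batch
        b -= lr * gb / batch

    return a, b, w, c


if __name__ == "__main__":
    a, b, w, c = train_toy_gan()
    print(f"generator: x = {a:.2f} * z + {b:.2f}; discriminator weight {w:.2f}")
```

Each loop first nudges the discriminator to score real samples higher and fakes lower, then nudges the generator so its fakes score higher against that updated discriminator. A single linear unit like this can only tell samples apart by their average, so the toy generator may circle around the target rather than settle on it; real GANs replace both straight lines with deep convolutional networks, but the alternating structure is the same.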

Here is an example of the scale involved. To train the Stable Diffusion model, developer Stability AI reportedly used as many as 4,000 Nvidia A100 GPUs (graphics processing units) and a variant of the LAION-5B dataset. The results are super-creative images of places, people and cartoon characters.
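Stable Diffusion, like Dall-E 2, is a diffusion model rather than a GAN: during training, images are gradually corrupted with random noise, and a neural network learns to predict that noise so it can later run the process in reverse, turning pure noise into an image guided by the text prompt. The Python sketch below shows only the forward, noise-adding half on a tiny list of pixel values. The linear noise schedule (1,000 steps, rising from 0.0001 to 0.02) follows the 2020 DDPM paper by Ho et al. and is an assumption here, not Stable Diffusion’s exact setting; the function names and pixel values are hypothetical.

```python
import math
import random


def linear_betas(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Per-step noise amounts, rising linearly (assumed DDPM-style schedule)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t: how much signal survives."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0, t, alpha_bar, rng):
    """Jump straight to step t: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal * x + noise * e for x, e in zip(x0, eps)]
    # During training, a network would learn to predict eps from (x_t, t, prompt).
    return x_t, eps


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bar = cumulative_alphas(linear_betas())
    pixels = [0.9, -0.3, 0.5, -1.0]  # hypothetical pixel values scaled to [-1, 1]
    for t in (0, 250, 999):
        x_t, _ = add_noise(pixels, t, alpha_bar, rng)
        print(t, round(alpha_bar[t], 5), [round(v, 2) for v in x_t])
```

As `t` grows, `alpha_bar` shrinks toward zero, so the original pixels fade and the output becomes almost pure noise. Generating an image means training a network to undo these steps one at a time.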

If you want to merge a new style into an existing image, Neural Style Transfer (NST) algorithms can do that: they recreate the existing image in the new style. Text-to-image tools can achieve a similar effect from a prompt alone. For instance, the prompt “Marilyn Monroe in style of The Weeping Woman by Pablo Picasso” gave us merged but slightly different results on Dall-E and Microsoft Bing Image Creator, despite the same underlying tech. Adobe Firefly’s results were bravely experimental, while some of Stable Diffusion’s images came very close to realism.
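In the classic Neural Style Transfer formulation (Gatys et al., 2015), “style” is captured by Gram matrices: correlations between the feature maps that a pretrained image-recognition network produces for an image. The plain-Python sketch below computes that style loss plus a simple content loss. It assumes the feature maps have already been extracted (the tiny lists are made up), and the normalisation and the weights `alpha` and `beta` are illustrative choices rather than the paper’s exact constants.

```python
def gram_matrix(features):
    """features: C channels, each a flattened list of N activations.
    Entry (i, j) measures how strongly channels i and j fire together."""
    n = len(features[0])
    return [[sum(fi * fj for fi, fj in zip(row_i, row_j)) / n for row_j in features]
            for row_i in features]


def style_loss(generated, style):
    """Mean squared difference between the two images' Gram matrices."""
    g, s = gram_matrix(generated), gram_matrix(style)
    c = len(g)
    return sum((g[i][j] - s[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)


def content_loss(generated, content):
    """Mean squared difference between raw feature activations."""
    diffs = [(x - y) ** 2 for gr, cr in zip(generated, content) for x, y in zip(gr, cr)]
    return sum(diffs) / len(diffs)


def total_loss(generated, content, style, alpha=1.0, beta=1000.0):
    # NST adjusts the generated image's pixels to minimise this weighted sum.
    return alpha * content_loss(generated, content) + beta * style_loss(generated, style)


if __name__ == "__main__":
    # Hypothetical 2-channel, 3-position feature maps for each image.
    content = [[1.0, 0.0, 2.0], [0.5, 1.5, 0.0]]
    style = [[0.2, 0.9, 0.4], [1.1, 0.3, 0.8]]
    generated = [[0.9, 0.2, 1.8], [0.6, 1.2, 0.3]]
    print(total_loss(generated, content, style))
```

The content term keeps the picture recognisable (Marilyn Monroe), while the style term pulls its textures and colour patterns toward the reference painting; the ratio of `alpha` to `beta` decides which wins.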

The latest update to Midjourney’s algorithms, called V5.1, brings enhanced coherence, sharper image quality and fewer text artifacts. OpenAI is testing an experimental model for Dall-E 2 with an emphasis on photorealistic faces and sharper images. There’s more to come, but it wasn’t easy to get here.

The milestones in the development of AI image generators, including GANs, put this progress in perspective.

In 2015, Deep Convolutional Generative Adversarial Networks (DCGAN) showed the way, producing higher-quality images of objects, animals and people. In 2018, Nvidia introduced StyleGAN, a generational leap for GANs that delivered more realistic faces and finer control over attributes such as age and gender.

In 2020, OpenAI introduced GPT-3, a language model capable of generating text from textual prompts. Then, in 2021, Dall-E arrived on the scene: a version of GPT-3 trained to generate highly detailed and creative images from natural language prompts.

Modern AI image tools: Unique flavours and strengths

Since the generated images of Trump and the Pope did the rounds, Midjourney has halted free trials. David Holz, CEO and founder of Midjourney, cited “extraordinary demand and trial abuse” in a post on Discord. Users often created multiple accounts to generate free images, taking advantage of a liberal trial policy. The basic plan is now priced at $10 per month (around ₹820).

Microsoft joined the AI battles with its free-to-use Bing Image Creator, which is based on OpenAI’s hugely popular Dall-E. In September last year, OpenAI put Dall-E’s usage at 1.5 million active users generating more than 2 million images every day. There have been no official numbers since, but they are likely to have grown.

Each of these tools and platforms has unique elements.

Dall-E, for instance, responds to lens and aperture settings specified in the prompt, which have a huge bearing on how the final image looks. Midjourney is available via a Discord server. Bing Image Creator is embedded into the Bing chatbot. Nightcafe and Photosonic let you choose from a wide range of art styles.

There’s more to come: Your thoughts…

Next stop: recreating your thoughts as images generated by AI. Systems neuroscientists Yu Takagi and Shinji Nishimoto of Osaka University in Japan say they have created a new AI model that can do just that. The model reportedly reconstructs images from brain activity with around 80% accuracy, and combines written as well as visual descriptions of images in the AI process that reproduces thoughts.

The researchers used London-based Stability AI’s Stable Diffusion algorithm, a tool that, as we know, is incredibly powerful at text-to-image generation.
