Algorithms, glitches and reality: a visual cultural analysis
The website thispersondoesnotexist.com (launched in February 2019 by Phil Wang) uses artificial intelligence (AI) to generate human-looking faces. Using a machine learning technique called a generative adversarial network (GAN), the algorithm learns from a dataset of images and generates completely new and original faces, with endless variations in features such as pose, face shape and hairstyle, among other elements. Soon after its launch, the website went viral, attracting critics and followers alike, as it showcases what is possible with current technology. The AI-generated images produced by this website are the focus of this visual and cultural analysis.
Why should we study AI-generated images? Are they worthy of scrutiny outside the field of Computer Science? I argue that the moment they become accessible to virtually anyone around the world, they become a legitimate subject of study. Furthermore, images are cultural products, and as such, they are worthy of study. Obviously, not all images are the same or have the same importance, but we know from living in an image-driven society that they often matter in more than one way. According to Wang, the website was created to “(...) call attention to A.I.'s ever-increasing power to present as real images that are completely artificial” (Paez, 2019). In a society where pictures and images are the standard surrogates for proof, this website demonstrates that this technology (which automates work that once required painstaking labor on the part of imaging experts) could be both revolutionary and dangerous. In order to unpack the cultural implications of these images, I will use Prown’s methodology, starting with a single image retrieved from the website and then incorporating observations from a set of 60 additional images taken from the same site.
On an elemental level, all of these images are made of pixels; what makes their existence possible, however, is the algorithm, which generates a new image every time a user accesses the website. Each image is unique and accessible only once, because as soon as the browser is refreshed, a new one emerges. Even pressing the Back button in the browser will generate a new image. In that sense, these images are ephemeral unless saved to a computer’s hard drive, which can be done by pressing the Save button in the lower right corner of the screen.
In spite of the aforementioned inconsistencies, the faces on this website are so realistic that it’s easy to forget they are computer-generated. Many sport accessories of all sorts, from jewelry to sunglasses and somewhat ambiguous head pieces (Fig 2). Some show indications of garments such as shawls and t-shirts, while others even mimic fabric textures and patterns such as velvet, satin, lace and florals, with varying degrees of success. In most images one is able to identify highlights and shadows consistent with our real-life experience of taking pictures under different light sources (e.g. natural sunlight vs flash). This is particularly interesting since it leads the viewer to make distinctions such as indoor and outdoor pictures, in spite of knowing that no light source, camera, photographer or subject was involved in the making of these images (Fig 3). Furthermore, the background and colors in some images suggest scenes and contexts, such as a winter scene or a family gathering. It is only upon closer inspection that one starts to notice inconsistencies: glitches. And while the majority of these glitches do not necessarily make the face look less human or less real (except for a few examples), they do remind us of the images' intrinsic technological origin.
Interacting with the algorithm
There appears to be more than one way to interact with the algorithm. The first way is through the website, which promptly greets the user with an image. For what seems to be a long time (four seconds), the user is forced to pause and engage with the image, as the page doesn’t have a title or navigation menu. For a brief moment the interface consists only of a gray background and the image covering most of the screen. In the absence of buttons and UI elements, the user has no other choice but to stare at the image. The image, in turn, greets and confronts. The faces are almost always friendly and familiar; some are even attractive, captivating the viewer and, at the same time, prompting many questions. After four seconds, a window appears in the lower right corner of the screen (Fig 4). It is a stark gray box devoid of any styling, text-heavy and cluttered. The only colors used are black for the text and blue for hyperlinks. Although this was clearly designed to provide information without being distracting, the information it provides is obscure, as it doesn’t clearly explain the origins of the image. The first line of the box reads “Imagined by a GAN (generative adversarial network)”. Here, the use of the word imagine stands out. For anyone who isn’t familiar with this jargon (GAN, StyleGAN2, Nvidia, AI), all this information is meaningless and says very little about the image. The fifth line reads: “Don’t panic. Learn how it works”. This assumes discomfort on the part of the user, anticipating that people usually find the experience confusing. Although the gray box offers many choices, clicking on any of those links would interrupt the user experience, as all links open in new windows. There is also the option to see Another and to Save, each offering a new form of interaction.
Here, to click Another is to re-engage in the experience, feeding back into the loop; whereas to Save the image is to capture it, to contain its ephemerality, and to some degree, materialize the algorithm. But this containment destroys its very essence: it becomes just one image among many others on one’s computer. Without its original context it is stripped of meaning; the link between the image and the algorithm (its origin) is now untraceable.
The algorithm itself is faceless; however, the images it imagines are highly recognizable. Many of them look like familiar faces we’ve seen before, or are about to encounter on our next trip to the grocery store. As I observe the new image on my screen, I can’t help but wonder whether there is actually someone out there in the non-virtual world whose face closely resembles the one imagined by the machine. I’m intrigued. I click Another. A very different, and equally compelling, image appears. I start asking more questions, trying to guess their age, their name, their story. But then I realize: these are non-humans resembling humans. There is no story, there is no name, there is no age. Nevertheless, I am caught in a loop of instant gratification, one image after another. I keep clicking, and it becomes an automatic process. A new face with a familiar expression emerges, sterile, yet unthreatening. I am interacting with a face-generating machine, and the interaction quickly turns into a game in which I look for cues to determine the authenticity of the image presented; I look for glitches. The more glitchy the image, the less authentic. In this case authentic = human while inauthentic = computer-generated. As I hunt for glitches, I progressively become attuned to the smaller details and observe what the texture of the skin looks like; I notice inconsistencies and elements that seem out of place, like a woman wearing just one earring, or a person who appears to be wearing a jacket on only one shoulder. I click Another and look for another glitch, but I can’t find any. As it turns out, some of these images have no glitches at all. My ideas of authenticity and reality are suddenly challenged.
As a designer I ask myself: how can I interact with these images? I can start by building a proto-taxonomy to classify them. As I begin to sort, categorize and group these images I realize I’m approaching this task with a cultural mindset in which I look for binaries; for example, clustering female and male faces, then noticing those that look very androgynous and creating a new category for them. Thinking about age, one can likewise categorize by children's faces, teens' faces, adult faces, etc. The way in which I try to make sense of these images reveals a lot about how I approach the world, and the biases and preconceptions that I carry with me. Even though it might be tempting to assume these images are empty and unexpressive because they don’t have a story, they end up telling us more about ourselves than we might have imagined. Since these are not real human faces, does this mean they are exempt from the laws, rules and regulations that images of (real) humans are subject to? Should computer-generated images of children, for example, be treated in the same way as images of real children? What ethical and moral concerns regarding identity and representation apply to these computer-generated images? As I ponder these questions, I realize that, although it might be too soon to tell how these images alter the creative process, they are already making me think differently about how I use images in my work and the various levels at which designers engage with images.
Seeking, embracing, and avoiding glitches
The glitches found in these images usually manifest as irregularities in the background and skin texture, interruptions of patterns, amorphous blobs of color and texture, and overly asymmetrical faces, among others (Fig 5). For the computer scientist who constantly strives to refine these technologies, the goal is to achieve a more convincing and realistic image, so these glitches are undesirable. Conversely, for the lay viewer who is confronted for the first time by hyperrealistic, computer-generated images, glitches are desirable, as they allow her to verify the authenticity of the image presented, i.e. to discern human (real) faces from non-human (fake) faces. In this context, the absence of glitches conceals the origin of the image; thus, the glitchier the image, the more transparency there is about its origin.
While one definition of glitch is related to a “minor malfunction” (Merriam-Webster, 2019), another definition is related to authenticity, as in “a false or spurious electronic signal” (idem). In a study conducted by Lehmuskallio and colleagues in 2019, the researchers set out to test whether imaging experts (professional photographers and photo editors) would be able to determine if the images presented in the study were real photographs or computer-generated. As participants tried to determine the authenticity of said images, those they deemed too perfect often raised suspicion:
“In contrast to the ways in which our research subjects spoke of photographs, computer-generated images were discussed particularly via opposition to any claims to authenticity. Whereas photographs were described with terms such as ‘authentic’, ‘natural’, ‘true’ and ‘trustworthy’, computer-generated images were considered to be artificial, unnatural, made, depicting a parallel reality and too perfect.” (Lehmuskallio et al., 2019)
Here, the absence of glitch is suspicious; its presence is desirable, almost reassuring, in a way creating an aversion to perfection: the less convincing these images appear, the more we trust them. In light of the ever-increasing sophistication of these images and their respective technologies, we must find new ways to tell truth from fiction, so we look for flaws. In this scenario, glitches elicit an amount of scrutiny that we might not employ when looking at another “normal” image. Is hyperawareness how we will fight forgery and fallacy?
On the ever-changing role of the glitch
Ever since we first incorporated technology into our lives, we have learned to co-exist with glitches. Thus, the significance of the glitch in our culture has evolved along with our technologies, as we have historically manipulated glitches for various purposes. For decades, glitch artists have extensively explored the expressive capabilities of glitches. In some glitch art, the artist’s role is to set up situations in which errors manifest (Barker, 2007), while in others, the artist purposefully misuses technology to produce them. In his work, glitch artist Zach Nader purposefully misuses tools in Adobe Photoshop (such as the Content Aware tool, which is powered by AI) to create exaggerated, distorted and chaotic effects.
In other cases, glitches have been introduced to make a technology appear more authentic (human) and relatable. One example is Google Duplex, which is an “AI system for accomplishing real-world tasks over the phone on behalf of humans” (Leviathan et al., 2018), such as calling to make a reservation at a restaurant. A peculiarity of this system is that, during its conversations with humans, Duplex incorporates speech disfluencies (speech glitches such as “ahs”, “umms” and “mm-hmm”) to make the voice assistant more relatable and, arguably, improve interaction. Because of these human-like traits, the recipients of the calls are less likely to suspect they are talking to a robot. While glitches have been sought and embraced in these cases, in other situations they have been avoided and feared, as exemplified in the article “Computing glitch may have doomed Mars lander” (Gibney, 2016), whose title carries a negative connotation and implies a significant loss (financial, technological and otherwise) caused by a glitch.
What are the cultural implications of AI-generated images?
It is clear that, as stated by Wang, there is a need to raise awareness of what is possible with these images, including their positive applications as well as the dangers of their misuse. The main concern is that this website relies on the same kind of technology as deepfakes: computer-generated images, video or audio superimposed on existing material, often used to pull hoaxes in which a person appears to do or say things they never did or said. So far, deepfake videos have appeared in pornography and satire, but it’s almost certain they’ll be used for other purposes beyond entertainment and commentary. Because these techniques are so new and increasingly sophisticated, people are having trouble discerning truth from fiction; we are therefore morally compelled to ask how these algorithms can be used for social good.
These images also have consequences for our visual cultural landscape, not only because they are based on the human physique, but also because they are based on human cultural products. In this sense, all of these faces are a direct representation of us: physically, culturally and intellectually. One can interpret them as computer-generated renditions of what our world looks like right now. Judging by the hairstyles, jewelry and accessories, we can assert that these images look decidedly contemporary; one could reasonably place them anywhere in this decade. As we modify our bodies and appearances through methods like extreme makeup and plastic surgery, images of overly contoured, overly stretched faces are becoming more commonplace in our media landscape (particularly in celebrity culture). Will new computer-generated images incorporate this post-human aesthetic? And can these computer-generated images be regarded as cultural evidence of our times?
In our visual experience, faces are linked to identities, and identities are infinitely complex. In this context one would be tempted to question the diversity of these images and their potential consequences for issues of identity and representation. Obviously, the answer to the diversity issue has to do with the type of images the algorithm was trained on, and whether the developers/researchers took this into consideration. But do these images have to be diverse? These images reflect the technology-driven culture in which they were born, the culture of the humans involved in the algorithm’s creation, and that of the people whose faces the algorithm was trained on. Thus, the range of people represented in them reflects not only what is relevant in our society, but who is relevant in our society. What real function these images will have in the near future is yet to be determined, but we should evaluate them for their current function, which is that of reflecting our ideals and values; thus, whose faces are depicted in them matters.
References
Barker, T. (2007, October). Error, the Unforeseen, and the Emergent: The Error and Interactive Media Art. M/C Journal, 10(5). Retrieved from http://journal.media-culture.org.au/0710/03-barker.php
Gibney, E. (2016). Computing glitch may have doomed Mars lander. Nature, 538(7626), 435–436.
glitch. (2019). In Merriam-Webster.com. Retrieved Dec 15, 2019, from https://www.merriam-webster.com/dictionary/glitch
Krapp, P. (2011). Noise channels: Glitch and error in digital culture (Electronic Mediations, v. 37). Minneapolis: University of Minnesota Press.
Lehmuskallio, A. et al. (2019). Photorealistic computer-generated images are difficult to distinguish from digital photographs: a case study with professional photographers and photo-editors. Visual Communication, 18(4), 427–451. https://doi.org/10.1177/1470357218759809
Leviathan, Y. et al. (2018, May 8). Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone. Google AI Blog. Retrieved from https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html
Nader, Z. (n.d.). [Website]. https://zachnader.com/
Paez, D. (2019, Feb 21). This Person Does Not Exist Creator Reveals His Site's Creepy Origin Story. Inverse. Retrieved from https://www.inverse.com/article/53414-this-person-does-not-exist-creator-interview
Wang, P. (2019). This Person Does Not Exist [Website]. Retrieved from https://thispersondoesnotexist.com/