Throughout the later part of my childhood, through my teens and all the way up until now, I’ve been exposed to different kinds of interactive media: programmable environments designed to keep me entertained, and sometimes to educate me. From Game Boys and PlayStations, through getting acquainted with the internet, to our current state of seamless integration with the digital sphere. Most of my generation, some who came before us and all those who came after have had some kind of relationship with this technology, and it has shaped our upbringing in many ways. For some, it has facilitated integration by fostering online communities that connect them with their peers. Others it has alienated, by deeming them unemployable or by denying them the privilege of accessing such technologies. But in the end, we’re all children of programming. All those hours catching Pokémon, all those emails, texts and comments on social media must have had an effect on us. And the fact that we invested so much time in those interactions speaks to how meaningful we perceive them to be. Winning a Pokémon battle, even against a computer, will always be something I associate with a feeling of accomplishment, when in reality all I did was interact with something that had been pre-programmed to act a certain way. The same goes for receiving a lot of attention on social media. Am I truly that interesting, or is the algorithm simply rewarding my time investment by showing my content to more people? Thoughts like these have made me question the meaning we derive from those interactions, as well as the effect the future of intelligent technology will have on our forms of expression. This led to the research question: “What is the value of conversing with artificial systems?”
As I went down this wormhole of a question, the field of artificial intelligence (AI) and machine learning quickly became a central piece of the puzzle. In fact, when I asked the class to rephrase my research question, 5 out of 9 people used the term ‘AI’ in their versions. Thus, I will start this research paper by asking what AI actually is, besides a popular buzzword. Then I’ll proceed to discuss the process that led to my project ‘HOPE’. This process touches on topics such as the demystification of AI, the definition of meaning, embodied cognition, and the creation and ownership of art. Further on, I will connect this project to the idea of design as toolmaking and propose a shift to hybrid forms of creation as I reflect on the future of the design practice.
Screenshot showing Google image results for 'AI' on 6 Jan 2021
What is and what isn’t considered artificial intelligence can be very fuzzy. The term was first coined by John McCarthy in 1956, as he assembled a team of experts from different fields to discuss the idea of “thinking machines” (Marr, “The Key Definitions of AI”, 2018). Nowadays, most definitions describe AI as the simulation of intelligent behaviour, usually referencing intelligent human behaviour (“Artificial Intelligence”, Merriam-Webster Dictionary). Imitating human behaviour isn’t necessarily a feature of all AI systems, and whether or not this feature is present leads to two terms that popular media often misuses: strong AI and weak AI. Put simply, a so-called ‘strong’ AI is a model built with the purpose of thinking like a human, while a ‘weak’ AI is built to solve a problem without necessarily figuring out how to think like a human (Eliot, 2020). In reality, most modern approaches fall somewhere in between, emulating human behaviour as a means to achieve a specific objective, but not as the end goal. If it serves as any consolation for the AI apocalyptists, this means that most known research being carried out nowadays focuses on optimising commercial profit, not on creating world-threatening sentient beings.
Machine learning is yet another term that is heard a lot today. It’s described as ‘the leading edge’ of artificial intelligence (Marr, “What Is Machine Learning”, 2020) and has taken center stage in the R&D departments of many tech giants, such as Google and Amazon. Essentially, machine learning (often referred to as ‘ML’) tries to push computing beyond logical yes/no algorithms by teaching it how to interpret data, classify it based on experience and learn from its failures and successes. Sound familiar? It should, as this closely resembles the way intelligent beings learn. We observe the environment around us, make predictions based on our previous experiences and update our assumptions based on whether they were right or wrong.
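To make this observe-predict-update loop concrete, here is a minimal sketch of it in JavaScript (the language the tools behind HOPE are written in): a single artificial ‘neuron’ observes labelled examples, makes a yes/no prediction and nudges its internal weights whenever it guesses wrong. The numbers and examples are made up purely for illustration; real models chain thousands of such units.

```javascript
// A single artificial neuron learning to tell two kinds of input apart.
let weights = [0, 0];          // what the neuron "believes" so far
let bias = 0;
const learningRate = 0.1;

// Predict: weigh the observation against past experience.
function predict(input) {
  const sum = input[0] * weights[0] + input[1] * weights[1] + bias;
  return sum > 0 ? 1 : 0;      // yes/no guess
}

// Update: nudge the beliefs whenever the guess was wrong.
function train(input, target) {
  const error = target - predict(input);
  weights[0] += error * input[0] * learningRate;
  weights[1] += error * input[1] * learningRate;
  bias += error * learningRate;
}

// Observe labelled examples over and over until the guesses improve.
const examples = [
  { input: [2, 1], label: 1 },
  { input: [-1, -2], label: 0 },
];
for (let epoch = 0; epoch < 100; epoch++) {
  for (const ex of examples) train(ex.input, ex.label);
}

console.log(predict([3, 2]));   // → 1
console.log(predict([-2, -1])); // → 0
```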
Press button to interact.
Demonstration of the MobileNet Image Classifier, a machine learning model trained on the ImageNet dataset.
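For reference, a demo like the one above can be wired up in just a few lines with ml5.js, the library used throughout this project. The sketch below is a simplified example: the image path is a hypothetical stand-in, and callback details may differ slightly between ml5.js versions.

```javascript
// Browser image classification with ml5.js and a pretrained MobileNet
// model (trained on the ImageNet dataset), using p5.js for display.
let classifier;
let img;

function preload() {
  // Load the pretrained MobileNet model.
  classifier = ml5.imageClassifier('MobileNet');
  img = loadImage('images/bird.jpg'); // hypothetical example image
}

function setup() {
  createCanvas(400, 400);
  image(img, 0, 0, width, height);
  classifier.classify(img, gotResult);
}

// ml5 returns a ranked list of labels with confidence scores.
function gotResult(error, results) {
  if (error) {
    console.error(error);
    return;
  }
  // e.g. results[0] = { label: 'robin', confidence: 0.87 }
  text(`${results[0].label} (${nf(results[0].confidence, 0, 2)})`, 10, height - 10);
}
```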
HOPE (which stands for Heuristic Oscillation Prediction Engine) is my attempt to converse with an artificial system while trying to understand how all this technological voodoo works. On the surface, HOPE is also a synthesizer that responds to what you play to it (her?). However, I’d like to put that aside for now (I’ll come back to it later) and focus on why it’s actually not voodoo, and why that’s very important to know.
Topics such as AI and ML are shrouded in mystery. Typically, one sees the input and the output, but not what goes on inside the black box. It certainly looks like magic, and even if deep down we know it’s not, the thought that it’s too complex to investigate makes us settle for the magic explanation. Once one starts to ask questions, scary terms come up: linear regression, neural networks, convolutional layers… These are all complicated computational processes at play in these systems. However, there’s a whole world of other processes that don’t get talked about nearly as often. Much more human processes.
At the start of any kind of intelligent system, there’s data. In order to learn about our environment, we must sense it. Data, often seen in a processed and unreadable form, is not given at the snap of a finger. It’s collected in large quantities, organised and labeled in ways a computer can make sense of. As Philipp Schmitt explains in ‘Humans of AI’, this is a very human and laborious process. A machine can’t really sense its environment like we do. All it can do is cross-reference the data it’s fed with the data it has learned from until it finds the most similarities. It also can’t know what to make of this conclusion on its own, which takes us to the final part of this intelligent system: action. What does it do with its discovery? Well, that’s another entirely human decision. It blindly serves its master. In some cases, it decides to turn left or right. In others, it assigns a label to an image, or a color to a pixel. Point being, both the beginning and the end of an AI system are completely within human control.
Press buttons to interact.
This demonstration highlights the limitations of machine learning by having it try to classify a person. The data this algorithm is trained on focuses primarily on animals and objects and doesn't include people.
What about the middle part? What exactly goes on inside the ‘brain’ of such a system? How does it ‘interpret’ the data? Let’s focus on the classic example of an image classifier, since that’s what HOPE is, and it paints a very approachable picture of how an ML model works. Image classification is the task of describing what is seen in an image, usually by assigning a name, an attribute or a category to it (orange/apple, happy/sad, person/animal). The ‘brain’ behind this task is, much like a real brain, based on a network of agents that act just like neurons: they perform very specific tasks and communicate their results to the other neurons in the network. They do this in multiple steps and iterations until the pixels from an image are transformed into a label (Yiu, 2019). Think of a bunch of layers (or walls) of neurons, each layer performing a different task – detecting edges, detecting colors, compressing data, etc. This is what’s known as an artificial neural network (NN). Put simply, a NN boils an image down to its essence in order to classify it (not ‘understand’ it!).
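The sketch below reduces this idea to its bare minimum: one layer of ‘neurons’ computes weighted sums over a handful of pixel values, a softmax step turns the scores into probabilities, and the highest probability becomes the label. The weights and labels here are invented for illustration; a real network learns millions of weights across many layers.

```javascript
// Toy forward pass of a neural network: pixels in, label out.
const labels = ['orange', 'apple'];

// Hypothetical learned weights: one row of weights per label.
const weights = [
  [0.9, 0.1, 0.8],  // "orange" responds to these pixel patterns
  [0.2, 0.7, 0.1],  // "apple" responds to others
];

// One layer: each "neuron" computes a weighted sum of its inputs.
function layer(pixels) {
  return weights.map(row =>
    row.reduce((sum, w, i) => sum + w * pixels[i], 0)
  );
}

// Softmax turns raw scores into probabilities that sum to 1.
function softmax(scores) {
  const exps = scores.map(s => Math.exp(s));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

function classify(pixels) {
  const probs = softmax(layer(pixels));
  const best = probs.indexOf(Math.max(...probs));
  return { label: labels[best], confidence: probs[best] };
}

// A tiny 3-"pixel" image, flattened into a list of brightness values.
console.log(classify([1.0, 0.2, 0.9]));
// → { label: 'orange', confidence: ~0.77 }
```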
Overview of the data processing in HOPE, illustrating how a basic image classification algorithm works.
Now that the myth behind ‘intelligent’ systems is hopefully debunked (at least at a very basic level), let’s talk about how they can achieve transparency. But first of all, why should they? I can think of a few reasons: transparency leads to better understanding, which prevents fear and misinterpretation. A transparent system is fair, as it can be questioned and examined. And a transparent system is accessible and educational, allowing new talent to enter the field, ultimately pushing it forward. The first step towards transparency in AI would be making the raw data and its subsequent categorisations known and publicly accessible. An example of this is ImageNet, one of the most popular training sets for ML. Despite having many instances of questionable labeling and categorisation (Crawford and Paglen 2019), ImageNet is open to the public, allowing these issues to be discussed. How the model is deployed (how it acts on its conclusions) should also become known and be justifiable. The programmers themselves should become known too, so that someone is accountable for the actions of a program when these are put into question. Lastly, the data processing needs to be explained and justified as well. The internal layers of a neural net, often regarded as the ‘black box’ or ‘hidden layers’ (Shiffman, 2017), should be made visible so that they can be reviewed by peers. This won’t only increase their effectiveness but will also contribute to the creative commons and stop companies from hiding behind layers of complexity (O’Neil 2017: 29).
From the point of view of someone working with AI for the first time, the way to achieve transparency is to opt for the less traveled path and do everything as much ‘from scratch’ as possible. Pretrained models do exist and can save a lot of time, however they pose two problems. For one, they’re full of superfluous junk, which is a slightly pejorative term for stuff that you don’t need and that will render your project unnecessarily complex. Secondly, and as a direct consequence of the first point, this limits the amount of control one has over their own model, abdicating power to someone else’s ‘magic’ AI. Learning how to program HOPE from scratch was a lengthy task, but it enabled me to have a much deeper understanding of how it learns and interacts. It allowed me to experience the laborious process of creating and labeling a dataset, and to establish a clear distinction between a ‘tool’ and a ‘technology’. By crafting HOPE, I created a tool: a process that makes use of many underlying technologies to achieve a very specific goal. These technologies are the result of intense human labour, knowledge gathered over many years, and it is my opinion that this human aspect should be taken into consideration when carrying out the role of a digital craftsperson.
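As an illustration of what ‘from scratch’ meant in practice, the sketch below builds a classifier from one’s own labelled data using ml5.js’s neuralNetwork API, rather than loading a pretrained model. The samples here are hypothetical stand-ins; in HOPE’s case, every sample had to be recorded, processed and named by hand.

```javascript
// Training a small classifier on self-collected, self-labelled data.
const nn = ml5.neuralNetwork({
  task: 'classification',
  debug: true, // visualise the training process in the browser
});

// The laborious, very human part: collecting and labelling every sample.
const samples = [
  { input: [0.9, 0.1], label: 'bright' },
  { input: [0.1, 0.8], label: 'dark' },
  // ...hundreds more, gathered and named by hand...
];
for (const s of samples) {
  nn.addData(s.input, { label: s.label });
}

nn.normalizeData();
nn.train({ epochs: 32 }, () => {
  console.log('training finished');
  // Ask the freshly trained model about input it has never seen.
  nn.classify([0.8, 0.2], (error, results) => {
    if (error) return console.error(error);
    console.log(results[0].label, results[0].confidence);
  });
});
```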
What does it mean to talk to a non-living, intelligent system? Now that we’ve established what that is, and how it doesn’t actually ‘understand’ things in the same sense we do, I will focus on its application in art and design: how it can be repurposed to favour creativity over functionality, and what the added value, if there is any, of hybrid human-machine forms of expression might be.
HOPE in action: a young man interacts with the AI.
As part of my research for this project, I spoke to Su Saka (see appendix), a psychology student interested in human behaviour and human interactions, about what a conversation can be, and the idea of giving meaning to interactions. During our conversation, we established that a conversation has to be an interactive process and needs to involve some kind of information transfer between the participating parties. Whereas most daily human-to-human interactions involve language, there are many exceptions to the rule – instances where information is shared via non-verbal communication, for example following someone’s gaze and interpreting their body language. As Su pointed out to me, some forms of conversation are completely stripped of verbal language. When two people who don’t speak a common language interact, they’re able to convey meaning by using gestures, noises and expressions. Or my favourite example: when two musicians who are accustomed to playing together improvise in harmony, understanding each other’s cues without talking. However, there is another key feature of a conversation that needs to be considered. Usually, all the parties involved in a conversation are conscious beings. Su argues that, when interacting with a pre-programmed system, one is simply getting stimuli from outside and integrating them within one’s own consciousness. Still, she reckons that there is meaning attached to such interactions. Feelings of accomplishment, frustration, etcetera, are fairly common. However, the same feelings are present in very introspective interactions, such as solving a puzzle or riding a bicycle. The presence of meaning and language are therefore not the defining characteristics of a conversation. Conscious participants might be, but consciousness itself is a blurry concept. Panpsychism, for example, is a view on consciousness suggesting that every information-processing system can be thought of as conscious in itself. So anything that takes part in some kind of information transfer could be considered conscious. This essay won’t attempt to define consciousness or dive any further down that tangent. Nevertheless, what can be drawn from the first part of our talk is that a conversation is a very broad notion and there are indeed many situations where one can meaningfully interact with an artificial system. This leads us to the following questions: what meaning is embedded in a system when designing it? What other factors contribute to the creation of meaning, and how many of those factors are human? Is there a real contribution of the non-living to the production of meaning?
Later in our interview, Su and I agreed that some views and biases of the creators bleed into any tool that’s designed by humans. She thinks we often forget that there are humans behind every technology we use, so while the embedded biases do not consciously affect our reactions, they do influence us unconsciously. Creating technology that’s truly neutral, completely rid of human bias, is a complex and paradoxical task that’s being taken on (and badly failed at) by tech giants (O’Neil, 2017: 22-34). These biases often lie in the data itself, not in the programming, which makes them very hard to tackle. As a designer, I reckon there’s a fine line between letting something stand on its own and smothering it with one’s ideals and personal preferences. This line is especially important when working collaboratively or representing groups of people that one is not a part of. I believe it is in many respects a matter of using the right tools, the right visual language and having a clear and solid theory. Yet it is also a matter of personal sensibility and tact, making the fine line even thinner.
Patient undergoing treatment in a psychedelic psychotherapy room at Johns Hopkins. Photo credit: Matthew W. Johnson, CC BY-SA 3.0
Besides the agents in a conversation, there are other forces contributing to the creation of meaning. These are set and setting, in other words the context in which an exchange of information takes place. I first read about the power of set and setting when reading about shamanic practices and psychedelic therapy. Apparently, a person’s environment can dramatically shift their experience with powerful drugs, from very negative to very positive. It’s such a big factor that hospitals have carefully designed rooms with comfortable furniture, art and spiritual imagery just for this kind of therapy (Pollan 2019: 331–32). When I asked Su about it, she introduced me to the idea of embodied cognition. Cognition becomes embodied when features of an agent's body deeply influence cognitive processes (Wilson, Foglia 2015). In simpler terms, it states that cognition doesn’t happen just by itself, but within an environment or a context, such as the position of an agent’s body in a room and their sensory experience. What this means for someone in the field of interaction design is that the context in which a project is inserted strongly affects how users experience it. Think of my ML synthesizer, HOPE. If it exists as an interactive tool in the browser, it can be easily shared and played with. However, there is an argument to be made about the quality of those interactions compared to the following scenario: imagine that it’s placed on a stage instead, in the middle of a busy exhibition. There are people watching, lights pointing at it and loud musical instruments one can use to interact with it. This setting puts the whole experience in a more emotional context, making it arguably more significant than the first scenario. For a clear example of the power of set and setting in the arts, consider Marina Abramović’s 2010 performance ‘The Artist Is Present’, where she invited spectators to sit across from her in the middle of an exhibition, locking eyes with her for an extended period of time. Her notorious performance brought many participants to tears, as it created an extremely emotionally charged setting.
This brings us to our last important question about meaning. Given that there is a series of external forces affecting how an interaction is perceived, as we established (programmed bias, opinionated design processes and the context in which the said interaction occurs), how can an artificial intelligence significantly contribute to an experience? Su makes the argument that, AI or not, an interaction with a non-living system is still just an extension of one’s cognition. No matter how advanced, an AI is still environment-specific, she says. Its adaptability and perception are limited to a finite set of scenarios. Without reciprocal perception, any meaning derived from an interaction with a machine is still just an internal process of the living agent. These processes are, of course, influenced by everything mentioned above, as well as by what the system produces: its output. This is where I believe an artificial intelligence is infinitely more interesting than simple logical algorithms. By training a neural network with my own data and feeding it input that it has never seen before, I’m able to turn over some of the power to the computing brain. There’s a certain level of unpredictability (as well as some level of control) in how an ML program produces output. Yet the output isn’t random at all. It’s carefully chosen by a complex simulated intelligence, making for a series of patterns and associations that wouldn’t be present otherwise. From my point of view, that’s where different meanings and interpretations are suggested to the ones interacting with it. Suggested meaning that is partially beyond human control – partially being the very important keyword.
In the case of HOPE, the process of listening and playing back is just as described above: partially beyond my control, but not random. Choosing to create something that makes music, or sound – a distinction that this project also investigates – was my way of tackling embedded biases. Unlike words or text, music is a subjective language. There are undoubtedly some universal signifiers in music, such as minor and major chords, loud and soft noises, but most of those signs fall under the unpredictability threshold. They are chosen by HOPE itself, beyond my comprehension. Music is also a universal language to converse in. I wanted to create conversations, and musical improvisation seemed like an adequate vehicle to have them in. This partial unpredictability makes an interesting case for creative AIs. I firmly believe in the potential of playful tech – tech that can be easily accessed and tinkered with, serving a purpose that’s creative rather than functional – shifting the power from big tech companies and elite scientists to the hands of artists, creatives and critical thinkers. Unreliability may be undesirable in real-world applications where lives are at stake – self-driving cars, for example. However, in instances where it is used creatively and the stakes are not so high, this partial loss of control results in a truly hybrid creation. This enables new forms of personal and artistic expression and strives for a more harmonious relationship between nature and technology, an idea best described by Richard Brautigan’s poem ‘All Watched Over by Machines of Loving Grace’:
“I like to think (and
the sooner the better!)
of a cybernetic meadow
where mammals and computers
live together in mutually
programming harmony
like pure water
touching clear sky.”
Hybrid Intelligence (HI) is, according to most definitions, the combination of human and machine intelligence (The Hybrid Intelligence Centre, acc. 2021). It’s partly a critical response to current AI development, attempting to expand human intellect through AI instead of replacing it. In the context of this essay, I will refer to a broader notion of hybrid intelligences. By HI, I mean any combined living and non-living intelligence. I want to approach hybridity from a less human-centred philosophy, as I connect it to the future of art and design practices. I believe this future has a place for multi-species communication and will tackle problems that are beyond human.
The rise of hybrid forms of artistic creation will force us to reevaluate many aspects of knowledge production. For one, the idea of proprietary rights. These can be very simple in some cases but rather complex in others. For instance, when an artist makes a painting, most would argue that they own the piece. Upon closer examination, one could say that even though they made the painting, someone else made the brush and the canvas, and maybe someone else taught them how to paint. One could nitpick infinitely, but the bottom line is that some good judgement is usually required. In the example above, this judgement is straightforward. In the digital age, however, the boundaries are much blurrier. The rise of complex tools of digital production, proprietary software and massive-scale online publishing platforms has given way to complicated copyright laws and frequent conflicts of ownership, even in the professional sphere. Gone are the days when one could simply claim a painting that they made. In the digital age, one must purchase a hefty license to use each brush, pigment and canvas. Hybrid methods of creation will henceforth be increasingly relevant, as they blur the idea of ownership even further – to a point where no one can truly claim the result of their collaboration with another form of intelligence, even an artificial one. Neither can the developers of such an intelligence since, from the point it’s deployed, they only have partial control over the output.
Let’s take a closer look at HOPE, for clarity’s sake. I, the designer, programmed it using primarily ml5.js, an open-source library for machine learning on the web. While making it, I acquired practical knowledge through many online tutorials. During the design process, I have control over some of its properties: the data I train it with, how that data is processed, the types of sound it can play, the functions of the interactive interface and the context in which it is presented. The participant who chooses to interact with it has control over the sound they send in as input and the functions built into the interface, such as the option of toggling drums and piano or controlling the volume and tempo at which it plays back. HOPE itself has the task of cross-referencing the input with the data I made available to it. How it does this depends mostly on the type, number and specific combination of image-processing layers in its neural network. The outcome can be influenced by changing these properties, but not predicted – not by a human brain at least, since it deals with extremely large numbers changing in real time, the type of operations suited for a calculator.
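To illustrate this division of control, here is a heavily simplified sketch of the listen-classify-respond loop described above. It is not HOPE’s actual source code: playResponse() is a hypothetical stand-in for the human-designed mapping from labels to sounds, and in practice the classification would be throttled rather than run on every frame.

```javascript
// Simplified listen-classify-respond loop using p5.sound and ml5.js.
let mic, fft, nn;

function setup() {
  mic = new p5.AudioIn();      // p5.sound: microphone input
  mic.start();
  fft = new p5.FFT();          // p5.sound: frequency analysis
  fft.setInput(mic);

  // A classifier trained earlier on my own labelled sound spectra.
  nn = ml5.neuralNetwork({ task: 'classification' });
  nn.load('model/model.json', () => console.log('model loaded'));
}

function draw() {
  // Observe: reduce the incoming sound to a list of numbers.
  const spectrum = fft.analyze(); // ~1024 amplitude values

  // Interpret: cross-reference the input with the training data.
  nn.classify(Array.from(spectrum), (error, results) => {
    if (error) return;
    // Act: the human-designed part — map a label to a musical reply.
    playResponse(results[0].label); // hypothetical mapping function
  });
}
```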
Influencers of the HOPE project's output, illustrating the idea of hybrid creation.
This intricate matrix of dependencies and influences makes for an output that’s not determined (only influenced!) by any of the participating agents alone (the developer, the participant and the machine). Much like a conversation, hybrid creation produces collaborative results that can’t be claimed.
I believe hybridity can be beneficial, as it generates knowledge that can be accessed by anyone. Extending this idea beyond HOPE, a future where all the content we hybrid intelligences generate belongs to the public will enable many to become creatives, experimenters and critical thinkers. It’s just as important that the tools themselves, such as this synthesizer, are part of the public domain. Going even further and extending this idea beyond humans, a world where HI encompasses other living systems allows for new forms of interspecies communication, maybe resulting in better collective understanding. A true cybernetic forest.
Artificial intelligence and its many branches are not voodoo. They’re likely not the ultimate solution for a better future, nor the one thing that will end us all. They’re a tool that, just like any other, can be used in a plethora of different ways. Through this essay and my project ‘HOPE’, I investigated the potential of using such technologies creatively. I set out to find what meaning we could extract from interacting with these systems and found that meaning can be present in many shapes and forms. It’s not restricted to language or to any one of the senses; it’s unique to everyone and can be attached to anything, from complex living systems to lifeless objects. When interacting with an intelligent machine, there is a series of forces at play creating meaning, many of which are within human manipulation and some of which are not. Biased data, opinionated design processes and setting are all big influencers. They’re possible to control but hard to suppress. The way an AI reacts to new input, however, is hard to predict, as it’s a very data-heavy process, yet it’s far from random. This duality arises because the logic assigned to an AI is based solely on what it knows: the data it’s fed. A living being, on the other hand, has a much broader and more dynamic frame of reference, making it unable to access the same logic as its artificial peer. This creates a fun interaction where the living intelligence can detect meaningful patterns in the non-living intelligence’s output and react accordingly. This interaction is still influenced by human forces, of course. Nevertheless, I believe it is the closest one can get to a real conversation with an artificial system at this time.
I believe that this process is especially valuable when applied to artistic and personal expression, or other creative endeavors. When creating collaboratively, the participant and the computer become a hybrid intelligence. Firstly, this opens up new ways of expression. Secondly, hybrid forms of creation force us to rethink the whole process of knowledge production. They blur the concepts of ownership and proprietary rights, making for knowledge that’s easily and universally accessible.
We shouldn’t let ourselves be stopped by debates of techno-optimism versus negativism nor accept technology as it is. We should strive to investigate, reappropriate and speculate in order to find the right answers.
Abramović, Marina. “The Artist Is Present.” Performance, Museum of Modern Art, New York, 2010.
“Artificial Intelligence.” The Merriam-Webster Dictionary, www.merriam-webster.com/dictionary/artificial%20intelligence. Accessed 6 Dec. 2020.
Brautigan, Richard. “All Watched Over By Machines of Loving Grace.” Limited Edition, The Communication Company, 1967.
Crawford, Kate, and Trevor Paglen. “Excavating AI: The Politics of Training Sets for Machine Learning.” 19 Sept. 2019, excavating.ai.
Eliot, Lance. “Strong AI Versus Weak AI Is Completely Misunderstood, Including for AI Self-Driving Cars.” Forbes, 15 July 2020, www.forbes.com/sites/lanceeliot/2020/07/15/strong-ai-versus-weak-ai-is-completely-misunderstood-including-for-ai-self-driving-cars/?sh=5dddedf9227c.
“Hybrid Intelligence: Augmenting Human Intellect.” The Hybrid Intelligence Centre, www.hybrid-intelligence-centre.nl. Accessed 2 Jan. 2021.
Marr, Bernard. “The Key Definitions of Artificial Intelligence (AI) That Explain Its Importance.” Forbes, 14 Feb. 2018, www.forbes.com/sites/bernardmarr/2018/02/14/the-key-definitions-of-artificial-intelligence-ai-that-explain-its-importance/?sh=feedb514f5d8.
Marr, Bernard. “What Is Machine Learning - A Complete Beginner’s Guide.” Bernard Marr, 2020, www.bernardmarr.com/default.asp?contentID=1140.
O’Neil, Cathy. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown, 2017.
Pollan, Michael. “Chapter Six – The Trip Treatment: Psychedelics in Psychotherapy.” How to Change Your Mind: The New Science of Psychedelics, Penguin Random House, 2019
Shiffman, Daniel. “10.1: Introduction to Neural Networks - The Nature of Code.” YouTube, uploaded by The Coding Train, 26 June 2017, www.youtube.com/watch?v=XJ7HLz9VYz0&t=1s.
Schmitt, Philipp. “Humans of AI.” 2019, humans-of.ai/editorial.
Wilson, Robert A., and Lucia Foglia. “Embodied Cognition.” Stanford Encyclopedia of Philosophy, 8 Dec. 2015, plato.stanford.edu/entries/embodied-cognition.
Yiu, Tony. “Understanding Neural Networks - Towards Data Science.” Medium, 4 Aug. 2019, towardsdatascience.com/understanding-neural-networks-19020b758230.
ml5.js: Friendly Machine Learning for the Web. ml5js.org, github.com/ml5js/ml5-library.
p5.js: a JavaScript library for creative coding. p5js.org, github.com/processing/p5.js.
ImageNet image database. image-net.org.