13th Gwangju Biennale — Minds Rising Spirits Tuning


Matteo Pasquinelli and Vladan Joler

The Nooscope Manifested: Illuminating AI as Instrument of Knowledge Extractivism

Lens

  1. Some enlightenment on the project to mechanise reason.

 

The Nooscope is a cartography of the limits of artificial intelligence, intended as a provocation to both computer science and the humanities. Any map is a partial perspective, a way to provoke debate. Likewise, this map is a manifesto — of AI dissidents. Its main purpose is to challenge the mystifications of artificial intelligence: first, as a technical definition of intelligence and, second, as a political form that would be autonomous from society and the human. In the expression ‘artificial intelligence’ the adjective ‘artificial’ carries the myth of autonomous technology, hinting at caricatures of ‘alien minds’ that self-reproduce in silico but, actually, mystifying two other phenomena: the growing geopolitical autonomy of hi-tech companies amidst the crisis of nation states, and the invisibilisation of workers’ social autonomy worldwide. The modern project to mechanise human reasoning has mutated, in the 21st century, into a regime of knowledge extractivism and epistemic colonialism.[1] This is no surprise, since machine learning algorithms are the most powerful algorithms for compressing, and then capitalising, information.

The purpose of the Nooscope map is to secularize AI, to downgrade it from the ideological status of ‘intelligent machine’ to the more prosaic reality of an instrument. It would be more correct to frame machine learning as an instrument of knowledge magnification that helps to perceive features, patterns, and correlations through vast spaces of data beyond human reach (much as optical instruments have done throughout the history of science). In this sense, a machine learning system is a nooscope (from the ancient Greek skopein ‘to look at, to examine’ and noos ‘knowledge’).[2]

Borrowing this idea from Leibniz, the Nooscope diagram applies the analogy of optical media to the structure of all machine learning apparatuses. Discussing the power of his calculus ratiocinator and ‘characteristic numbers’ (the idea of designing a universal numerical language to codify and solve all the problems of human reasoning), Leibniz made an analogy with instruments of visual magnification such as the microscope and the telescope. He wrote:

Once the characteristic numbers are established for most concepts, mankind will then possess a new instrument which will enhance the capabilities of the mind to a far greater extent than optical instruments strengthen the eyes, and will supersede the microscope and telescope to the same extent that reason is superior to eyesight.[3]

Although the purpose of this text is not to reiterate the opposition between quantitative and qualitative cultures, Leibniz’s credo need not be followed. Controversies cannot be computed. Machine learning is not the ultimate form of intelligence.

Instruments of measurement and perception always come with inbuilt aberrations. In the same way that the lenses of microscopes and telescopes are never perfectly curvilinear and smooth, the logical lenses of machine learning embody faults and biases. To understand machine learning and register its impact on society is to study the degree to which social data are diffracted and distorted by these lenses. This is generally known as the debate on bias in AI, but the political implications of machine learning’s logical form run deeper. Machine learning is not bringing a new dark age but one of diffracted rationality, in which, as will be shown, an episteme of causation is replaced by one of automated correlations. More generally, AI is a new regime of truth, scientific proof, social normativity and rationality, which often takes the shape of a statistical hallucination. This diagram manifesto is another way to say that AI, the king of computation (patriarchal fantasy of mechanised knowledge, ‘master algorithm’ and alpha machines), is naked. Here, we are peeping into its black box.

 

On the invention of metaphors as an instrument of knowledge magnification.
Emanuele Tesauro, Il canocchiale aristotelico [The Aristotelian Telescope], frontispiece of the 1670 edition, Turin.

 

 

  2. The assembly line of machine learning: Data, Algorithm, Model.

 

The history of AI is a history of experiments, machine failures, academic controversies, and epic rivalries around military funding, popularly known as the ‘winters of AI.’[4] Although corporate AI today speaks the language of ‘black magic’ and ‘superhuman cognition’ to describe the powers of machine learning, current techniques are still at the stage of experiment.[5] AI is at the same stage as when the steam engine was invented, before the laws of thermodynamics necessary to explain and control its inner workings had been discovered. Similarly, today, there are efficient neural networks for image recognition, but there is no theory of learning to explain why they work so well and how they fail so badly. Like any invention, the paradigm of machine learning consolidated slowly, over the last half-century. No master algorithm appeared overnight; rather, there has been a gradual construction of a method of computation that still has to find a common language. Manuals of machine learning for students, for instance, do not yet share a common terminology. How, then, to sketch a critical grammar of machine learning that is concise and accessible, without playing into the paranoid game of defining General Intelligence?

Qua instrument of knowledge, machine learning can be said to be composed of an object to be observed (the training dataset), an instrument of observation (the learning algorithm) and a final representation (the statistical model). The assemblage of these three elements is proposed here as a spurious and baroque diagram of machine learning, extravagantly termed the Nooscope.[6] Keeping the analogy of optical media, the information flow of machine learning is like a light beam that is projected by the training data, compressed by the learning algorithm and diffracted by the lens of the statistical model towards the world.
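
To make the triad concrete, here is a minimal sketch in Python (using scikit-learn as an illustrative library, not a tool prescribed by the Nooscope project) in which the three elements appear as distinct objects:

```python
# A minimal sketch of the Nooscope's three elements: dataset, algorithm, model.
# The data and library choice are illustrative assumptions, not the authors' own tooling.
from sklearn.linear_model import LogisticRegression

# 1) Object to be observed: a tiny training dataset (inputs and labels).
X_train = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
y_train = [0, 1, 0, 1]

# 2) Instrument of observation: the learning algorithm.
algorithm = LogisticRegression()

# 3) Final representation: the statistical model, i.e. the parameters
#    the algorithm has adjusted in order to compress the training data.
model = algorithm.fit(X_train, y_train)
print(model.coef_, model.intercept_)   # the 'lens' through which new data are seen
print(model.predict([[0.85, 0.75]]))   # a new input diffracted through the model
```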

The Nooscope diagram aims to illustrate two sides of machine learning at the same time: how it works and how it fails — enumerating its main components, as well as the broad spectrum of errors, limitations, approximations, biases, faults, fallacies and vulnerabilities that are native to its paradigm.[7] This double operation stresses that AI is not a monolithic paradigm of rationality but a spurious architecture made of adaptive techniques. Besides, the limits of AI are not simply technical but an imbrication of human and technical ones. (In the Nooscope diagram the essential components of machine learning are represented at the centre, human biases and interventions on the left, and technical biases and limitations on the right.[8] Optical lenses symbolize biases and approximations, representing the compression and distortion of the information flow. The total bias of machine learning is represented by the central lens of the statistical model, through which the perception of the world is substantially diffracted.)

The limitations of AI are primarily perceived through the debate on bias, that is, the amplification of gender, race and class discrimination by algorithms. In machine learning it is necessary to distinguish, at least, between historical bias, dataset bias and algorithm bias, which occur at different stages of the information flow.[9] Historical bias (or world bias) is already apparent in society before technological intervention. Nonetheless, the naturalisation of such bias, that is the silent integration of inequality into an apparently neutral technology, is by itself harmful.[10] Ruha Benjamin has called it the New Jim Code: ‘the employment of new technologies that reflect and reproduce existing inequalities but that are promoted and perceived as more objective or progressive than the discriminatory systems of a previous era.’[11] Dataset bias is introduced through the preparation of training data by human operators. The most delicate part of the process is data labelling, in which old and conservative taxonomies can cause a distorted view of world cultures, misrepresenting social diversities and exacerbating social hierarchies, as will be seen in the case of ImageNet.

Algorithmic bias (also known as machine bias, statistical bias or model bias, to which the Nooscope diagram gives particular attention) is the further amplification of historical bias and dataset bias by machine learning algorithms. The problem of bias originates mostly from the fact that machine learning algorithms are among the most efficient algorithms for information compression, which engenders issues of information resolution, diffraction and loss.[12] Since ancient times, algorithms have been procedures of an economic nature, designed to achieve a result in the shortest number of steps while consuming the least amount of resources, in terms of space, time, energy and labour.[13] The arms race of AI companies is, still today, about finding the fastest and simplest algorithms with which to capitalise data. Eventually, within AI companies, information compression measures the rate of profit — but, on the side of society, it measures the rate of discrimination and the loss of cultural diversity.

While the social limitations of AI are popularly understood under the issue of bias, the common understanding of its technical limitations is known as the black box problem. The black box effect is an actual issue of deep neural networks (which filter information so much that their chain of reasoning cannot be reversed), but it has become a generic alibi for claiming that AI systems are not just inscrutable and opaque, but even ‘alien’ and out of control.[14] The black box effect is part of the nature of any experimental machine at an early stage of development (it has already been noticed that the functioning of the steam engine remained a mystery for a while even after it had been successfully tested). The real problem is the black box rhetoric itself, which shares ground with conspiracy theory sentiments according to which AI is an occult power that cannot be studied, known, or politically controlled.

 

 

  3. The training dataset: the social origins of the intelligence of machines.

 

Mass digitalisation, which expanded with the Internet in the 1990s and escalated with datacentres in the 2000s, has made vast resources of data available that, for the first time in history, are free and unregulated. A regime of knowledge extractivism (then known as Big Data) gradually employed efficient algorithms to extract ‘intelligence’ from these open sources of data, mainly for the purpose of predicting consumer behaviours and selling ads. The knowledge economy morphed into a novel form of capitalism, called cognitive capitalism and then surveillance capitalism by different authors.[15] It was the Internet information overflow, vast datacentres, faster microprocessors and algorithms for data compression that laid the groundwork for the rise of AI monopolies in the 21st century.

What kind of cultural and technical object is the dataset that constitutes the source of AI? The quality of training data is the most important factor affecting the quality of the so-called ‘intelligence’ that machine learning algorithms extract. There is an important perspective to take into account in order to understand AI as a nooscope. Data come first: they are the source of value and intelligence. Algorithms come second: they are the machines that compute such value and intelligence. However, training data are never raw, independent and unbiased.[16] The carving, formatting and editing of training datasets is a laborious and delicate undertaking, which is probably more significant for the final results than the technical parameters that control the learning algorithm. The act of selecting one data source rather than another is the profound mark of human intervention into the domain of ‘artificial’ minds.

Data

 

The training dataset is a cultural construct, not just a technical one: it usually comprises input data that are associated with ideal output data, such as pictures with their descriptions, also called labels or metadata.[17] The canonical example would be a museum collection and its archive, in which artworks are organised by metadata such as author, year, medium, etc. The semiotic gesture of assigning a name or a category to a picture, however, is never impartial: it leaves another deep human imprint on the final result of machine cognition. A training dataset for machine learning is usually composed through the following steps: 1) production: individual labour or phenomena that produce information; 2) capture: the encoding of information into a data format by an instrument; 3) formatting: the organisation of data into a metadata format and dataset; 4) labelling: the application of the categories of a given taxonomy to the data.
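
As a purely hypothetical illustration, a single record of such a dataset could be sketched as a small data structure in which each of the four steps leaves a trace; all field names and values below are invented for clarity, not taken from any actual dataset:

```python
# Hypothetical sketch of one labelled training-data record, following the
# four steps listed above. Field names and values are illustrative only.
record = {
    "source": "photo uploaded by an anonymous user",           # 1) production
    "capture": {"format": "JPEG", "resolution": [640, 480]},   # 2) capture
    "metadata": {"author": "unknown", "year": 2009,
                 "licence": "CC-BY"},                           # 3) formatting
    "label": "minivan",  # 4) labelling: a WordNet-style category applied by a crowdworker
}
```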

Machine intelligence is trained on vast datasets that are accumulated in ways that are neither technically neutral nor socially impartial. Raw data do not exist, as data are dependent on individual labour, personal data and social behaviours that accrue over long periods of time, across extended networks and controversial taxonomies.[18] The main training datasets for machine learning (MNIST, ImageNet, Labelled Faces in the Wild, etc.) originated in the corporations, universities and military agencies of the Global North. Taking a more careful look, one discovers a profound division of labour that innervates into the Global South via crowdsourcing platforms that are used to edit, label and validate data (such as Amazon Mechanical Turk, cynically termed ‘artificial artificial intelligence’).[19]

The parable of the ImageNet dataset exemplifies the troubles of many AI datasets. ImageNet is a training dataset for machine learning that has become the de facto benchmark for image recognition algorithms: indeed, the Deep Learning revolution started in 2012 when Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton won the annual ImageNet challenge with the convolutional neural network AlexNet.[20] ImageNet was initiated by computer scientist Fei-Fei Li back in 2006.[21] Fei-Fei Li had three intuitions in building a reliable dataset for image recognition. First, to download millions of free images from web services such as Flickr and Google. Second, to adopt the computational taxonomy WordNet for image labels.[22] Third, to outsource the work of labelling millions of images via the crowdsourcing platform Amazon Mechanical Turk. At the end of the day (and of the assembly line), anonymous workers from all over the planet were paid a few cents to label hundreds of pictures per minute according to the WordNet taxonomy: this ended up being the engineering of a controversial cultural construct. AI scholar Kate Crawford and artist Trevor Paglen have investigated and disclosed the sedimentation of racist and sexist categories in the ImageNet taxonomy: for instance, the implicit legitimisation of the category ‘failure, loser, nonstarter, unsuccessful person’ for a hundred pictures of people.[23]

The voracious data extractivism of AI has caused unforeseeable backlash on digital culture: in the early 2000s Lawrence Lessig could not predict that the large repository of online images covered by Creative Commons licenses would, a decade later, become an unregulated resource for face recognition surveillance technologies. In similar ways, personal data continue to disappear into privatised datasets for machine learning, unknowingly and without transparency. In 2019, for the first time, artist and AI researcher Adam Harvey disclosed the use of personal photos without consent in training datasets for face recognition, causing Harvard University, Duke University and Microsoft to withdraw their datasets in a major privacy infringement scandal.[24] Online training datasets trigger issues of data sovereignty, privacy and civil rights that traditional institutions are only slowly becoming aware of (the GDPR data privacy regulation passed by the European Parliament in May 2018 is an improvement compared to the absence of regulation in the United States). If 2012 was the year in which the Deep Learning revolution began, 2019 was definitely the year in which its sources were discovered to be vulnerable and corrupted.[25]

 

Combinatorial patterns and Kufic scripts, Topkapi scroll, ca. 1500, Iran.


 

 

 

  4. The evolution of AI as the automation of pattern recognition.

 

The need to demystify AI (at least from the technical point of view) is understood in the corporate world too. Head of Facebook AI and godfather of convolutional neural networks Yann LeCun repeats that current AI systems are not sophisticated versions of cognition, but rather of perception.[26] Similarly, the Nooscope diagram exposes the skeleton of the AI black box and shows that AI is not a thinking automaton but an algorithm that performs pattern recognition. The notion of pattern recognition contains issues that need to be elaborated. What is a pattern, anyway? Is a pattern uniquely a visual entity? What does it mean to read social behaviours as patterns? Is pattern recognition an exhaustive definition of intelligence? Most likely not. To clarify these issues, it would be good to undertake a brief archaeology of AI.

The archetype machine for pattern recognition is Frank Rosenblatt’s Perceptron. Invented in 1957 at Cornell Aeronautical Laboratory in Buffalo, New York, its name is a shorthand for ‘Perceiving and Recognizing Automaton.’[27] Given a visual matrix of 20×20 photoreceptors, the Perceptron can learn how to recognise simple letters. A visual pattern is recorded as a continuous impression on a network of artificial neurons, which fire up in concert in response to the repetition of similar images, activating one single output neuron: firing 1=true, if a given image is recognised, or 0=false, if a given image is not recognised.
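
The logic of the Perceptron can be rendered in a short Python and NumPy sketch, assuming 20×20 images flattened into 400 binary inputs; this illustrates the training rule, not a reconstruction of the 1957 machine:

```python
import numpy as np

# A minimal Rosenblatt-style perceptron on a 20x20 'retina' of photoreceptors.
def train_perceptron(images, labels, epochs=10, lr=0.1):
    weights = np.zeros(images.shape[1])   # one weight per photoreceptor
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(images, labels):
            output = 1 if np.dot(weights, x) + bias > 0 else 0  # firing 1=true, 0=false
            error = target - output
            weights += lr * error * x     # reinforce the connections that should have fired
            bias += lr * error
    return weights, bias

# usage: images is an (n, 400) array of flattened 20x20 binary patterns,
# labels is an (n,) array of 0/1 values for 'letter recognised or not'.
```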

The automation of perception, or better of the labour of perception as a visual montage of pixels along a computational assembly line, was originally the idea behind McCulloch and Pitts’s neural network.[28] Once the algorithm for visual pattern recognition survived the ‘winter of AI’ and proved to be efficient in the late 2000s, it was applied also to non-visual datasets, properly inaugurating the age of Deep Learning (the application of pattern recognition techniques to any kind of data, not just visual data). Today, in the case of self-driving cars, the patterns to recognise are objects in a road scenario. In the case of automatic translation, the patterns to recognise are the most common sequences of words across bilingual texts. Regardless of their complexity, from the numerical perspective of machine learning, notions such as image, movement, form, style and ethical decision can all be described as statistical distributions of a pattern. In this sense, pattern recognition has truly become a new cultural technique. For explanatory purposes, the Nooscope is described as a machine that operates in three modalities: training, classification, and prediction. In more intuitive terms, they can be termed: pattern extraction, pattern recognition and pattern generation.

Rosenblatt’s Perceptron was the first machine learning algorithm in the contemporary sense. At a time when ‘computer science’ had not yet been adopted as a definition, the field was called ‘computational geometry’ and specifically ‘connectionism’ by Rosenblatt himself, but the business of these neural networks was to calculate a statistical inference. What a neural network computes, in fact, is not an exact pattern but the statistical distribution of a pattern. Just scraping the surface of the anthropomorphic marketing of AI, one finds another technical and cultural object that needs examination: the statistical model. What is the statistical model in machine learning? How is it calculated? What is the relation between a statistical model and human cognition? These are crucial issues to clarify. In terms of the labour of demystification to be done (also to see some naïve questions evaporate), it would do good to reformulate the trite question ‘Can a machine think?’ into the theoretically sounder questions ‘Can a statistical model think?’, ‘Can a statistical model develop consciousness?’, et cetera.

 

 

  5. The learning algorithm: compressing the world into a statistical model.

 

The algorithms of AI are often evoked as alchemic formulas to distil ‘alien’ forms of intelligence. But what do the algorithms of machine learning really do? Few bother to check, including the followers of AGI (Artificial General Intelligence). Algorithm is the name of a process, of a machine performing a calculation. The product of such a machine is a statistical model (which it would be more correct to term an ‘algorithmic statistical model’). In the developer community, the term ‘algorithm’ is, in fact, increasingly replaced by ‘model.’ The terminological confusion arises from the fact that the statistical model does not exist separately from the algorithm: somehow, the model exists inside the algorithm in the form of memory distributed across its parameters. Also for this reason, it is basically impossible to visualise an algorithmic statistical model as is done with the simple mathematical functions of mathematics textbooks, but the challenge is worthwhile.

In machine learning, there are many algorithm architectures: the simple Perceptron, the deep neural network, the Support Vector Machine, the Bayesian network, the Markov chain, the autoencoder, the Boltzmann machine, etc. Each of these architectures has a different history (often, it bears repeating, rooted in the military agencies and corporations of the Global North). Artificial neural networks started as simple computing structures that evolved into complex ones, which are nowadays controlled by a few hyperparameters that express millions of parameters.[29] For instance, convolutional neural networks are described by a few hyperparameters (number of layers, number of neurons per layer, type of connection, behaviour of neurons, etc.) that project a complex topology of thousands of artificial neurons with millions of parameters in total. The algorithm starts as a blank slate and, during the process called training, or ‘learning from data,’ adjusts its parameters until it reaches a good representation of the input data. In image recognition, as already seen, the computation of millions of parameters has to resolve into a simple binary output: 1=true, a given image is recognised; or 0=false, a given image is not recognised.[30]
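
A hedged sketch of this relation between hyperparameters and parameters, using the Keras API as an example; the architecture below is invented for illustration and is not Inception v3 or any published model:

```python
import tensorflow as tf
from tensorflow.keras import layers

# A handful of hyperparameters (number of layers, neurons per layer, type of
# connection) that project millions of parameters. Illustrative toy network.
model = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # 1=true / 0=false output
])

# The few lines above expand into tens of millions of adjustable parameters.
print(model.count_params())
```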

 

Source: https://www.asimovinstitute.org/neural-network-zoo


 

 

To attempt an accessible explanation of the relation between algorithm and model, let’s have a look at the complex Inception v3 algorithm, a deep convolutional neural network for image recognition designed by Google and trained on the ImageNet dataset. Inception v3 is said to have 78% accuracy in identifying the label of a picture, but the performance of ‘machine intelligence’ in this case can also be measured by the proportion between the size of the training data and the size of the trained algorithm (or model). ImageNet contains 14 million images with associated labels and occupies circa 150 gigabytes of memory. Inception v3, which is meant to represent the information contained in ImageNet, occupies only 92 megabytes. The ratio of compression between training data and model partially describes also the ratio of compression and diffraction of the Nooscope’s lenses. A table from the Keras documentation compares these values (number of parameters, layer depth, file size and accuracy) for the main models of image recognition.[31] This is a brutalist but effective way to show the relation between model and data, and to show how the ‘intelligence’ of machine learning algorithms is measured and assessed in the developer community.
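
The compression ratio quoted above can be checked with back-of-the-envelope arithmetic; the figures are the approximate ones given in the text:

```python
# Rough compression ratio between the ImageNet training data and the
# Inception v3 model, using the approximate sizes quoted in the text.
dataset_size_gb = 150          # circa 150 GB of images and labels
model_size_mb = 92             # Inception v3 weights file

ratio = (dataset_size_gb * 1024) / model_size_mb
print(f"compression ratio ~ {ratio:.0f} : 1")  # roughly 1,700 times smaller than its source
```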

 


 

Model | Size | Top-1 Accuracy | Top-5 Accuracy | Parameters | Depth
Xception | 88 MB | 0.790 | 0.945 | 22,910,480 | 126
VGG16 | 528 MB | 0.713 | 0.901 | 138,357,544 | 23
VGG19 | 549 MB | 0.713 | 0.900 | 143,667,240 | 26
ResNet50 | 98 MB | 0.749 | 0.921 | 25,636,712 | –
ResNet101 | 171 MB | 0.764 | 0.928 | 44,707,176 | –
ResNet152 | 232 MB | 0.766 | 0.931 | 60,419,944 | –
ResNet50V2 | 98 MB | 0.760 | 0.930 | 25,613,800 | –
ResNet101V2 | 171 MB | 0.772 | 0.938 | 44,675,560 | –
ResNet152V2 | 232 MB | 0.780 | 0.942 | 60,380,648 | –
InceptionV3 | 92 MB | 0.779 | 0.937 | 23,851,784 | 159
InceptionResNetV2 | 215 MB | 0.803 | 0.953 | 55,873,736 | 572
MobileNet | 16 MB | 0.704 | 0.895 | 4,253,864 | 88
MobileNetV2 | 14 MB | 0.713 | 0.901 | 3,538,984 | 88
DenseNet121 | 33 MB | 0.750 | 0.923 | 8,062,504 | 121
DenseNet169 | 57 MB | 0.762 | 0.932 | 14,307,880 | 169
DenseNet201 | 80 MB | 0.773 | 0.936 | 20,242,984 | 201
NASNetMobile | 23 MB | 0.744 | 0.919 | 5,326,716 | –
NASNetLarge | 343 MB | 0.825 | 0.960 | 88,949,818 | –

Source: keras.io/applications

 

Statistical models have always influenced culture and politics. They did not just emerge with machine learning: machine learning is just a new way of automating the technique of statistical modelling. When Greta Thunberg warns ‘Listen to science’, what she really means, being a good student of mathematics, is ‘Listen to the statistical models of climate science.’ No statistical models, no climate science: no climate science, no climate activism. Climate science is indeed a good example with which to start understanding statistical models.[32] Climate change is calculated, first, by collecting a vast dataset of temperatures from the planet’s surface in a given period and, second, by applying a mathematical model that plots the curve of temperature variations in the past and projects the same pattern into the future. Climate models are historical artefacts that are tested and debated within the scientific community and, today, also beyond it.[33] Machine learning models, on the contrary, are opaque and inaccessible to community debate. Given the degree of myth-making and social bias around its mathematical constructs, AI has indeed inaugurated the age of statistical science fiction (and of this large statistical cinema the Nooscope is the projector).

 

  6. All models are wrong, but some are useful.

 

‘All models are wrong, but some are useful’ — the canonical dictum of the British statistician George Box has long encapsulated the logical limits of statistics and now of machine learning.[34] This maxim, however, is often used to legitimize the bias of corporate and state AI. Computer scientists argue that human cognition is about the capacity to abstract and approximate patterns: so what’s the problem with machines being approximate and doing exactly the same? Along the lines of this argument, it is rhetorically repeated that ‘the map is not the territory’, which sounds reasonable. What should be contested, however, is that AI is a heavily compressed and distorted map of the territory, and that this map, like many forms of automation, is not open to community negotiation. AI is a map of the territory without community access and community consent.

How does machine learning plot a statistical map of the world? Let’s consider a specific case: image recognition (the basic form of the labour of perception, which has been codified and automated as pattern recognition among other ‘economies of attention’).[35] Given an image to be classified, the algorithm detects the edges of an object as the statistical distribution of dark pixels surrounded by light ones (a typical visual pattern). The algorithm does not know what an image is, does not perceive an image as human cognition does; it only computes pixels, numerical values of brightness and proximity.[36] The algorithm is programmed to record only the dark edge of a profile (that is, to fit that desired pattern only) and not all the pixels across the image (which would result in overfitting, repeating the whole visual field). A statistical model is said to be trained successfully when it can elegantly fit only the important patterns of the training data and generalise those patterns also to new data ‘in the wild.’ If a model learns the training data too well, it recognises only exact matches of the original patterns and overlooks those with close similarity ‘in the wild.’ In this case, the model is overfitting, because it has meticulously learnt everything (including noise) and is not able to distinguish a pattern from its background. On the other hand, the model is underfitting when it is not able to detect meaningful patterns in the training data.[37] The notions of data overfitting, fitting and underfitting can be visualised on a Cartesian plane.
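
They can also be sketched in a few lines of code. Below, polynomial curve fitting in NumPy stands in for a learning algorithm; the data and polynomial degrees are arbitrary illustrations:

```python
import numpy as np

# Underfitting vs overfitting: the same noisy data approximated with too few
# and too many degrees of freedom.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)   # pattern + noise

underfit = np.polyfit(x, y, 1)    # degree 1: misses the meaningful pattern
fit      = np.polyfit(x, y, 3)    # degree 3: approximates the underlying curve
overfit  = np.polyfit(x, y, 15)   # degree 15: memorises the noise as well

x_new = np.linspace(0, 1, 5)      # new data 'in the wild'
print(np.polyval(fit, x_new))     # the balanced model generalises
print(np.polyval(overfit, x_new)) # the overfitted one oscillates wildly between points
```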

 

Approximation

 

The challenge around the accuracy of machine learning is about calibrating the equilibrium between data underfitting and overfitting, which is difficult to achieve because of different machine biases. In fact, machine learning is a term that, as much as ‘AI,’ anthropomorphises a piece of technology: machine learning learns nothing in the proper sense of the word, as a human does; it just maps a statistical distribution of numerical values and draws a mathematical function that hopefully approximates the forms of human comprehension. That said, machine learning can, for this very reason, cast a different light on those forms of human comprehension.

The statistical model of machine learning algorithms is also an approximation in the sense that it guesses the missing parts of the data graph: either through interpolation, which is the prediction of an output y within the known interval of the input x in the training dataset, or through extrapolation, which is the prediction of output y beyond the limits of x, often with high risks of inaccuracy. This is what ‘intelligence’ means today within machine intelligence: to extrapolate a non-linear function beyond known data boundaries. As Dan McQuillan aptly puts it: ‘There is no intelligence in Artificial Intelligence, nor does it really learn, even though its technical name is machine learning; it is simply mathematical minimisation.’[38]
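
The same toy model can illustrate the difference between interpolation and extrapolation; values and polynomial degree below are arbitrary:

```python
import numpy as np

# Interpolation vs extrapolation with one fitted curve: predictions inside the
# known interval of x are relatively safe, predictions beyond it degrade fast.
x = np.arange(0, 10)                          # the known interval of the input
y = 2 * x + 1 + np.random.normal(0, 0.5, 10)  # noisy observations
coeffs = np.polyfit(x, y, deg=3)              # a non-linear model of the data

print(np.polyval(coeffs, 5.5))    # interpolation: within the known interval
print(np.polyval(coeffs, 50.0))   # extrapolation: far outside it, high risk of nonsense
```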

It is important to remember that the ‘intelligence’ of machine learning is not driven by exact formulas of mathematical analysis but by algorithms of brute-force approximation. The shape of the correlation function between input x and output y is calculated algorithmically, step by step, through tiresome mechanical processes of gradual adjustment (such as gradient descent) that are equivalent to the differential calculus of Leibniz and Newton. Neural networks are said to be among the most efficient algorithms because these differential methods allow them to approximate the shape of any function given enough layers of neurons and abundant computing resources.[39] Brute-force gradual approximation of a function is the core feature of today’s AI, and only from this perspective can one understand its potentialities and limitations, particularly its escalating carbon footprint (training deep neural networks requires exorbitant amounts of energy because gradient descent and similar training algorithms operate on the basis of continuous infinitesimal adjustments).
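
A minimal sketch of gradient descent, fitting a straight line to a handful of points by continuous small adjustments; the learning rate and data are illustrative only:

```python
import numpy as np

# Brute-force approximation of a line y = a*x + b by gradient descent:
# mathematical minimisation through continuous infinitesimal adjustments.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])   # roughly y = 2x + 1

a, b = 0.0, 0.0                  # blank slate
lr = 0.01                        # learning rate: the size of each adjustment
for step in range(5000):
    y_hat = a * x + b
    error = y_hat - y
    a -= lr * (2 * error * x).mean()   # partial derivative of the mean squared error w.r.t. a
    b -= lr * (2 * error).mean()       # partial derivative w.r.t. b

print(a, b)   # converges towards the parameters that minimise the error
```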

 

 

  7. World to vector.

 

The notions of data fitting, overfitting, underfitting, interpolation and extrapolation can be easily visualised in two dimensions, but statistical models usually operate along multidimensional spaces of data. Before being analysed, data are encoded into a multi-dimensional vector space that is far from intuitive. What is a vector space and why is it multi-dimensional? Cardon, Cointet and Mazière describe the vectorialisation of data in this way:

A neural network requires the inputs of the calculator to take on the form of a vector. Therefore, the world must be coded in advance in the form of a purely digital vectorial representation. While certain objects such as images are naturally broken down into vectors, other objects need to be ‘embedded’ within a vectorial space before it is possible to calculate or classify them with neural networks. This is the case of text, which is the prototypical example. To input a word into a neural network, the Word2vec technique ‘embeds’ it into a vectorial space that measures its distance from the other words in the corpus. Words thus inherit a position within a space with several hundreds dimensions. The advantage of such a representation resides in the numerous operations offered by such a transformation. Two terms whose inferred positions are near one another in this space are equally similar semantically; these representations are said to be distributed: the vector of the concept ‘apartment’ [-0.2, 0.3, -4.2, 5.1…] will be similar to that of ‘house’ [-0.2, 0.3, -4.0, 5.1…]. […] While natural language processing was pioneering for ‘embedding’ words in a vectorial space, today we are witnessing a generalization of the embedding process which is progressively extending to all applications fields: networks are becoming simple points in a vectorial space with graph2vec, texts with paragraph2vec, films with movie2vec, meanings of words with sens2vec, molecular structures with mol2vec, etc. According to Yann LeCun, the goal of the designers of connectionist machines is to put the world in a vector (world2vec).[40]
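
The ‘apartment’/‘house’ example from the quotation can be rehearsed in a few lines, measuring similarity as the cosine between the two toy vectors quoted above (real word embeddings have hundreds of dimensions):

```python
import numpy as np

# The toy word vectors quoted in the passage above: semantic similarity
# measured as the cosine similarity between two positions in vector space.
apartment = np.array([-0.2, 0.3, -4.2, 5.1])
house     = np.array([-0.2, 0.3, -4.0, 5.1])

cosine = np.dot(apartment, house) / (np.linalg.norm(apartment) * np.linalg.norm(house))
print(cosine)   # close to 1.0: the two words occupy nearby positions in the vector space
```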

Such a multi-dimensional vector space is another reason why the logic of machine learning is difficult to grasp intuitively. It is the field of digital humanities, in particular, that is studying the technique of vectorialisation through which, invisibly, our collective knowledge is rendered and processed. William Gibson’s original definition of cyberspace most likely prophesied the coming of a vector space rather than of virtual reality: ‘A graphic representation of data abstracted from the banks of every computer in the human system. Unthinkable complexity. Lines of light ranged in the nonspace of the mind, clusters and constellations of data. Like city lights, receding.’[41]

 

Statua citofonica

Right: Vector space of seven words in three contexts.

 

 

Rather than abstract formulas applied to the world, machine learning resembles craftsmanship. The history of AI is a history of hacks and tricks rather than of mystical intuitions. One trick of information compression, called dimensionality reduction, for instance, is used to avoid the Curse of Dimensionality, that is, the exponential growth of the variety of features in the vector space. The dimensions of the categories that show low variance in the vector space (i.e. whose values fluctuate only a little) are aggregated to reduce calculation costs. Dimensionality reduction can be used to cluster word meanings (as in the model word2vec) but can also lead to category reduction, which clearly has a social impact when these statistical models represent cultural diversity. Dimensionality reduction can shrink cultural taxonomies and introduce bias: in this way, the statistical effect of machine learning on world diversity is normalisation, that is, the equalisation of anomalies to an average norm and the obliteration of unique identities.[42]
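
A hedged sketch of dimensionality reduction using Principal Component Analysis, one common technique among several; the synthetic data below are invented to show low-variance features being discarded:

```python
import numpy as np
from sklearn.decomposition import PCA

# Features with low variance are absorbed into fewer dimensions: this cuts
# calculation costs but also flattens the variety of the original data.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 50))         # 100 samples described by 50 features
data[:, 10:] *= 0.01                      # 40 of the features barely vary

pca = PCA(n_components=10)                # keep only the 10 most 'informative' dimensions
reduced = pca.fit_transform(data)
print(reduced.shape)                      # (100, 10): the rest of the variety is discarded
print(pca.explained_variance_ratio_.sum())
```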

 

Dimensions

 

 

  8. The society of classification and prediction bots.

 

Most of the contemporary applications of machine learning can be described according to the two modalities of classification and prediction, which gradually outline the contours of a society of control and statistical governance. Classification is known as pattern recognition, while prediction can also be defined as pattern generation. A new pattern is recognized or generated by interrogating the inner core of the statistical model.

Machine learning classification is used to recognise a sign, an object or a face and to assign it a category (label) according to a taxonomy, a cultural convention or previously unsupervised data. An input file (e.g. a face captured by a surveillance camera) is compared with the model to determine whether it falls within its statistical distribution or not. If so, it is assigned the corresponding output label. Since the times of the Perceptron, classification has been the originary application of neural networks: with Deep Learning, this technique is found ubiquitously in the face recognition classifiers that are deployed by state agencies and smartphone manufacturers alike.

Machine learning prediction is used to project future trends and behaviours according to past ones, that is, to complete a piece of information knowing only a portion of it. In the prediction modality, a small sample of input data (a primer) is used to predict the missing part of the information, following once again the statistical distribution of the model (this could be the part of a numerical graph oriented toward the future, or the missing part of an image or audio file that has to be completed). Incidentally, other modalities of machine learning exist: the statistical distribution of a model can be explored in a dynamic way, in a modality called latent space navigation and, in some recent design applications, pattern exploration.
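
The two modalities can be caricatured with a toy ‘statistical model’, here a one-dimensional Gaussian fitted to a handful of values: classification checks whether a new input falls within the learnt distribution, while prediction/generation samples new values from it. All numbers are illustrative:

```python
import numpy as np

# A toy statistical model: a one-dimensional Gaussian fitted to training data.
training_data = np.array([4.8, 5.1, 5.0, 4.9, 5.2, 5.1, 4.95])
mean, std = training_data.mean(), training_data.std()

# Classification: does a new input fall within the model's statistical distribution?
def classify(x, threshold=3.0):
    return abs(x - mean) / std < threshold   # True = recognised, False = anomaly

# Prediction / generation: sample new values from the same distribution,
# i.e. project the learnt pattern forward.
def generate(n=5, seed=0):
    return np.random.default_rng(seed).normal(mean, std, n)

print(classify(5.05), classify(9.0))
print(generate())
```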

Machine learning classification and prediction are becoming ubiquitous techniques that constitute new forms of surveillance and governance. Some apparatuses, such as self-driving vehicles and industrial robots, integrate both modalities. A self-driving vehicle is trained to recognise different objects on the road (people, cars, obstacles, signs) and to predict future actions on the basis of decisions that a human driver has taken in similar circumstances. Even if recognising an obstacle on a road seems to be a neutral gesture (it’s not), identifying a human being according to categories of gender, race and class, as state institutions are increasingly doing, is clearly the gesture of a new disciplinary regime. The hubris of automated classification has caused the revival of Lombrosian techniques that were thought to have been consigned to history, such as Automatic Gender Recognition (AGR), ‘a subfield of facial recognition that aims to algorithmically identify the gender of individuals from photographs or videos.’[43]

 

Modes

 

The predictive or generative modality of machine learning has recently had cultural implications: its use in the production of visual artefacts has been received by mass media as the idea that AI is ‘creative’ and can autonomously make art. In fact, an ‘artwork created by AI’ hides, in most cases, a human operator who has applied the generative modality of a neural network trained on a specific dataset. In this modality, the neural network is run backwards (moving from the smaller output layer toward the larger input layer) to generate new patterns after being trained to classify them (a process that usually runs from the larger input layer to the smaller output layer). The generative modality, however, has some useful applications: it can be used as a sort of reality-check to reveal what the model has learnt, i.e. to show how the model ‘sees the world.’ It can be applied to the model of a self-driving car, for instance, to check how the road scenario is projected.

A famous way to illustrate how a statistical model ‘sees the world’ is Google DeepDream. DeepDream is a convolutional neural network based on Inception (which is trained on the ImageNet dataset mentioned above) that was programmed by Alexander Mordvintsev to project hallucinatory landscapes by enhancing patterns in images via algorithmic pareidolia. Mordvintsev had the idea to ‘turn the network upside down’, that is, to take a classifier and turn it into a generator, using random noise or generic landscape images as input.[44] It was discovered that ‘neural networks that were trained to discriminate between different kinds of images have quite a bit of the information needed to generate images too.’ In DeepDream’s first experiments, bird feathers and dog eyes started to emerge everywhere, as dog breeds and bird species are vastly overrepresented in ImageNet. It was also discovered that the category ‘dumbbell’ had been learnt with a surreal human arm always attached to it: proof that, most likely, many other categories of ImageNet are misrepresented.
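
A compressed sketch of the DeepDream idea in TensorFlow: instead of adjusting the weights, the input image is adjusted by gradient ascent so as to maximise the activations of an intermediate layer of an ImageNet-trained network. The choice of layer, step size and number of iterations is arbitrary; this is not Mordvintsev’s original code:

```python
import tensorflow as tf

# 'Turning the network upside down': enhance whatever patterns an intermediate
# layer of an ImageNet-trained classifier detects in a random-noise canvas.
base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")
dream_model = tf.keras.Model(inputs=base.input,
                             outputs=base.get_layer("mixed3").output)

img = tf.Variable(tf.random.uniform((1, 299, 299, 3)))   # random noise as starting canvas

for _ in range(50):
    with tf.GradientTape() as tape:
        activations = dream_model(img)
        loss = tf.reduce_mean(activations)      # how strongly the learnt patterns 'fire'
    grads = tape.gradient(loss, img)
    grads /= tf.math.reduce_std(grads) + 1e-8   # normalise the adjustment
    img.assign_add(0.01 * grads)                # gradient ascent: amplify what the model sees
    img.assign(tf.clip_by_value(img, 0.0, 1.0))
```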

The two main modalities of classification and generation can be assembled into more complex architectures, such as Generative Adversarial Networks. In the GAN architecture, a neural network with the role of discriminator (a traditional classifier) has to recognise images produced by a neural network with the role of generator, in a reinforcement loop that trains the two statistical models simultaneously. Thanks to some converging properties of their respective statistical models, GANs have proved to be very good at generating highly realistic pictures, which has prompted their abuse in the rising industry of ‘deep fakes’.[45] In terms of regimes of truth, a similarly controversial application is the use of GANs to generate synthetic data in cancer research, in which neural networks trained on unbalanced datasets of cancer tissues started to hallucinate cancer where there was none.[46] In this case, ‘instead of discovering things, we are inventing things,’ Fabian Offert notices, ‘the space of discovery is identical to the space of knowledge that the GAN has already had. […] While we are thinking to see through a GAN, looking at something with the help of a GAN, we are actually seeing into a GAN. GAN vision is not augmented reality, it is virtual reality. GANs do blur discovery and invention.’[47] The GAN simulation of brain cancer is a tragic example of AI hubris as scientific hallucination.
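
A skeletal sketch of the generator/discriminator loop described above, written with TensorFlow; the network sizes, data shapes and optimiser settings are placeholder choices and have nothing to do with the medical-imaging models discussed here:

```python
import tensorflow as tf

# Two statistical models trained simultaneously: a discriminator that classifies
# real vs generated samples, and a generator that learns to fool it.
generator = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(28 * 28, activation="tanh"),
])
discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(28 * 28,)),
    tf.keras.layers.Dense(1),            # real/fake logit
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(real_images):             # real_images: a (batch, 784) tensor
    noise = tf.random.normal([tf.shape(real_images)[0], 16])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        # The discriminator learns to tell real from generated samples...
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
        # ...while the generator learns to make the discriminator misclassify them.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
```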

 

Tumor

Joseph Paul Cohen, Margaux Luck and Sina Honari. ‘Distribution Matching Losses Can
Hallucinate Features in Medical Image Translation’, 2018. Courtesy of the authors.

 

 

 

  9. Faults of a statistical instrument: the undetection of the new.

 

The normative power of AI in the 21st century has to be scrutinised through these epistemic gestures: what does it mean to frame collective knowledge as patterns, and what does it mean to draw vector spaces and statistical distributions of social behaviours? According to Foucault, in early modern France, statistical power was already about measuring social norms, discriminating between what was normal and what abnormal.[48] AI easily extends the ‘power of normalisation’ of modern institutions, among others the normative power of bureaucracy, medicine and statistics (originally, the numerical knowledge possessed by the state about its population), which now passes into the hands of global IT corporations.[49] The institutional norm has become a computational one: the classification of the subject, of bodies and behaviours, seems no longer to be an affair for public registers, but one for algorithms and datacentres. ‘Data-centric rationality’, Paula Duarte concludes, ‘should be understood as an expression of the coloniality of power.’[50]

A gap, a friction, a conflict, however, persists between machine learning statistical models and the human subject that is supposed to be measured and controlled. This logical gap between AI statistical models and society is usually debated as bias. It has been extensively demonstrated how face recognition misrepresents social minorities and how black neighbourhoods, for instance, are bypassed by AI-driven logistics and delivery services.[51] Gender, race and class discriminations are amplified by machine learning algorithms (Ruha Benjamin’s ‘New Jim Code’), but this is just the tip of a more profound problem. The logical and, at the same time, political limitation of AI is the difficulty its statistical models have, given their information compression, in recognising and predicting a new event. How does machine learning deal with a truly unique anomaly, an uncommon social behaviour, a disruptive innovation? The two modalities of machine learning, classification and prediction, display kinds of limitations that are not simply bias.

A logical limit of machine learning classification, or pattern recognition, is its inability to recognise a unique anomaly that appears for the first time, such as a new metaphor in poetry, a new joke in everyday conversation or an unusual obstacle (a pedestrian? a plastic bag?) in a road scenario. The ‘undetection’ of the new (something that has never ‘been seen’ by a model and therefore never classified before in a known category) is a particularly hazardous problem for self-driving cars, one which has already caused fatalities. Machine learning prediction, or pattern generation, shows similar faults in the guessing of future trends and behaviours. In statistical models, the temporal dimension is probably the key limitation. As a technique of information compression, machine learning merely automates the dictatorship of the past, of past taxonomies and behavioural patterns, over the present. This problem can be termed the regeneration of the old.

Interestingly, in machine learning the logical definition of a security issue also describes the logical limit of its creative potential. The problems in the prediction of the new are logically related to the problems in the generation of the new. The way a machine learning algorithm is asked to predict a trend on a time chart is identical to the way it is asked to generate a new artwork on the basis of learnt patterns. The trite question ‘Is AI creative?’ should be reformulated in technical terms: is machine learning able to create works that are not imitations of the past? Is machine learning able to extrapolate beyond the stylistic boundaries of its training data? The ‘creativity’ of machine learning is limited to the detection of old styles in the training data and subsequent random improvisation within such styles. In other words, machine learning can explore and improvise only within the logical boundaries that are set by the training data. For all these reasons, and given its degree of information compression, it would eventually be more accurate to term machine learning art statistical art.

 


Lewis Fry Richardson, Weather Prediction by Numerical Process, Cambridge University Press, 1922.

 

 

Another profound problem of machine learning is how the statistical correlation between two elements can be adopted to explain causation from one to the other. In statistics, it is commonly understood that correlation does not imply causation, meaning that a statistical coincidence alone is not sufficient to demonstrate causation. Superficially mining data, machine learning can construct arbitrary correlations that are then perceived as real.[52] Such a logical fallacy easily becomes a political one. According to Dan McQuillan, when machine learning is applied to society in this way, predictive correlations are transformed into a political apparatus of preemption and endorse the predictive policing algorithms of police forces worldwide.[53] Ultimately, machine learning obsessed with ‘curve fitting’ imposes a statistical culture and replaces the traditional episteme of causation (and political accountability) with one of correlations (driven by the blind automation of decision making).[54]

 

 

  10. Adversarial intelligence vs. artificial intelligence.

 

So far, the statistical diffractions and hallucinations of machine learning have been followed step by step through the multiple lenses of the Nooscope. At this point, however, the orientation of the instrument has to be reversed: scientific theories, as much as computational devices, are inclined to consolidate an abstract perspective, the scientific ‘view from nowhere’, which is often just the point of view of power. The obsessive study of AI can suck the scholar into an abyss of computation and into the illusion that the technical form alone illuminates the social one. As Paola Ricaurte remarks: ‘Data extractivism assumes that everything is a data source.’[55] It is time to realise that it is not the statistical model that constructs the subject but, rather, the subject that structures the statistical model. Internalist and externalist studies of AI have to blur: the subject of control makes the mathematics of control from within, not from without. To second what Guattari once said of machines in general, one may say that machine intelligence too is constituted of ‘hyper-developed and hyper-concentrated forms of certain aspects of human subjectivity.’[56]

Rather than only studying how a technical apparatus works, a critical inquiry also studies how it breaks, how subjects rebel against its normative control, how workers sabotage its workings. A sound way to understand the limits of AI is to look at hacking practices. Hacking is an important method of knowledge production, a crucial epistemic probe into the obscurity of AI (the relation between AI and hackers, though, is not as antagonistic as it may appear: it often resolves into a loop of mutual learning, of evaluation and reinforcement). For instance, new systems of face recognition based on machine learning have triggered new forms of counter-surveillance activism. Through techniques of face obfuscation, individuals decide to become unintelligible to artificial intelligence, to become themselves black boxes with respect to AI. Interestingly, the traditional techniques of obfuscation against surveillance acquire, in the age of AI, a mathematical dimension. AI artist and researcher Adam Harvey, for instance, has invented a camouflage textile called HyperFace that fools computer vision algorithms into seeing multiple human faces where there are none.[57] What is a face for a human, and what is it for a computer vision algorithm? The glitches of HyperFace show what a human face looks like to a machine: this gap between human perception and statistical perception helps to introduce, at this point, the growing field of adversarial attacks.

 

Adam Harvey, HyperFace pattern, 2016.


 

 

Adversarial attacks exploit blind spots and weak regions in a statistical model, usually in order to fool a classifier and make the machine perceive something that is not there. In object recognition, an adversarial example can be a doctored image of a turtle, for instance, that looks innocuous to the human eye but gets misclassified by a neural network as a rifle.[58] Adversarial examples can be realised as 3D objects and even as stickers for road signs that can misguide self-driving cars (which may read a speed limit of 120 km/h where the sign actually says 50 km/h).[59] Adversarial examples can be designed by a human who knows what a machine has never seen, by reverse-engineering the statistical model, or by polluting the training dataset. The technique of data poisoning targets the training dataset by introducing ad-hoc doctored data to alter the accuracy of the statistical model and even to create a backdoor that can be exploited by specific adversarial examples. Conversely, data poisoning can also be used to protect privacy, by entering anonymised or random information into the dataset.
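
One of the simplest adversarial attacks, the Fast Gradient Sign Method described by Goodfellow et al., can be sketched as follows; the model is assumed to be any trained image classifier returning logits, and epsilon controls how imperceptible the perturbation remains to the human eye:

```python
import tensorflow as tf

# Fast Gradient Sign Method sketch: nudge every pixel in the direction that
# increases the classifier's loss, producing a near-invisible perturbation.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def adversarial_example(model, image, true_label, epsilon=0.01):
    image = tf.convert_to_tensor(image)
    with tf.GradientTape() as tape:
        tape.watch(image)                      # track the input, not the weights
        prediction = model(image)
        loss = loss_fn(true_label, prediction)
    gradient = tape.gradient(loss, image)
    perturbation = epsilon * tf.sign(gradient) # the adversarial 'noise'
    return tf.clip_by_value(image + perturbation, 0.0, 1.0)
```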

As Ian Goodfellow et al. have remarked, adversarial attacks point to a general mathematical vulnerability that seems common to all statistical models: ‘An intriguing aspect of adversarial examples is that an example generated for one model is often misclassified by other models, even when they have different architectures or were trained on disjoint training sets.’[60] Adversarial attacks remind us of the discrepancy between human and machine perception, and that the logical limit of machine learning is also a political one. It is clear that the logical and ontological boundary of machine learning is the unruly subject or anomalous event that escapes classification and control. The subject of algorithmic control, nevertheless, can fire back. Adversarial attacks are a way to sabotage the assembly line of machine learning by inventing a virtual obstacle that can set the control apparatus out of joint. An adversarial example is the sabot in the age of AI.

 

 

  11. Labour in the age of AI.

 

The nature of the ‘input’ and ‘output’ of machine learning apparatuses has, eventually, to be clarified. AI is not just an issue of bias and information diffraction, but also a matter of labour. AI is not only a control apparatus, but a productive one. As already mentioned, an invisible workforce is involved in dataset composition, algorithm supervision, model evaluation and application. Pipelines of endless tasks innervate from the Global North into the Global South: crowdsourcing platforms of workers from Venezuela, Brazil and Italy are crucial to teaching German self-driving cars ‘how to see.’[61] Against the idea of alien intelligences at work, it must be stressed that in the whole computing process of AI the human worker has never left the loop (or better, the virtual assembly line). Mary Gray and Siddharth Suri have defined as ‘ghost work’ the labour that makes AI appear artificially autonomous:

Beyond some basic decisions, today’s artificial intelligence can’t function without humans in the loop. Whether it’s delivering a relevant newsfeed or carrying out a complicated texted-in pizza order, when the artificial intelligence (AI) trips up or can’t finish the job, thousands of businesses call on people to quietly complete the project. This new digital assembly line aggregates the collective input of distributed workers, ships pieces of projects rather than products, and operates across a host of economic sectors at all times of the day and night.

Automation is actually a myth: some authors have suggested replacing ‘automation’ with the more accurate term ‘heteromation’, as machines, including AI, constantly call for human help.[62]

Yet there is a more profound way in which labour constitutes AI. The source of machine learning may be called ‘input data’, ‘training dataset’ or just ‘data’, but, whatever the name, it is a representation of human skills, activities and behaviours, of social production at large. All training datasets are, implicitly, a diagram of a division of human labour that AI has to analyse and automate. Datasets for image recognition, for instance, record the visual labour (or labour of perception) that drivers, guards and supervisors usually perform during their tasks. Even scientific datasets are, obviously, the product of scientific labour, research and observation. The whole information flow of AI has to be understood as an apparatus to extract ‘analytical intelligence’ from the most diverse forms of labour and to transfer such intelligence into machines (obviously including, within the definition of labour, extended forms of social, cultural and scientific production).[63] In short, the origin of machine intelligence is the division of labour and its main purpose is the automation of labour.

Historians of computation have already stressed the early steps of machine intelligence in the 19th-century project to mechanise the division of mental labour, specifically the task of hand calculation.[64] The enterprise of computation has since been one of combined surveillance and discipline of labour, of optimal calculation of surplus-value and planning of collective behaviours.[65] Computation was established by, and still enforces, a regime of visibility and intelligibility, not just of logical reasoning. The genealogy of AI as an apparatus of power is confirmed today by its widespread employment in technologies of identification and prediction, yet the core anomaly to compute still remains the disorganisation of labour.

As a technology of automation, AI will have a tremendous impact on the job market. If AI maintains an error rate of 1% in image recognition, for instance, it means that roughly 99% of routine jobs based on visual tasks (e.g. airport security) can be replaced, legal restrictions and political opposition permitting. The impact of AI on labour is well synthesised (from the perspective of workers, finally) by a paper from the European Trade Union Institute, which highlights ‘seven essential dimensions that future regulation should address in order to protect workers: 1) safeguarding worker privacy and data protection; 2) addressing surveillance, tracking and monitoring; 3) making the purpose of AI algorithms transparent; 4) ensuring the exercise of the “right to explanation” regarding decisions made by algorithms or machine learning models; 5) preserving the security and safety of workers in human–machine interactions; 6) boosting workers’ autonomy in human–machine interactions; 7) enabling workers to become AI literate.’

Regarding this last point, the Nooscope diagram wishes to address the need for a novel Machinery Question in the age of machine learning, a sort of Intelligence Machinery Question, that is, a popular movement which, like the campaign during the Industrial Revolution, may claim more collective intelligence about machine intelligence, more public education about learning machines and their regime of knowledge extractivism.[66] Too often, the relation between artificial intelligence and the production of collective knowledge as a common good is left unmentioned in the background. Corporate AI, as a further stage of cognitive capitalism, is the capture and monopolisation of collective intelligence itself, its mechanisation with algorithmic techniques that render it gradually politically inert. The Nooscope’s modest purpose is to illuminate the invisible knowledge and labour that, as in the old Mechanical Turk trick, makes machine intelligence appear ideologically alive.

 

[Image: the Mechanical Turk]

 

 

Thanks to Wietske Maas, Claire Glanois, Fabian Offert, Ariana Dongus and … for their comments and ideas.

 

_____

 

[1] On the epistemic colonialism of AI see: Matteo Pasquinelli, 'Three Thousand Years of Algorithmic Rituals: The Emergence of AI from the Computation of Space', e-flux 101, June 2019.

[2] The digital humanities term a similar technique 'distant reading', which has gradually involved data analytics and machine learning in literary and art history. See: Franco Moretti, Distant Reading, London: Verso, 2013.

[3] Gottfried Wilhelm Leibniz, ‘Preface to the General Science’, 1677.

[4] For a detailed and concise history of machine learning see: Dominique Cardon, Jean-Philippe Cointet and Antoine Mazières, 'Neurons Spike Back. The Invention of Inductive Machines and the Artificial Intelligence Controversy', Réseaux 2018/5 (n. 211), pp. 173-220.

[5] Alexander Campolo and Kate Crawford, ‘Enchanted Determinism: Power without Control in Artificial Intelligence,’ Engaging Science, Technology, and Society vol. 6, 2020, 1-19.

[6] The use of the visual analogy also intends to record the fading distinction between image and logic, between representation and inference, in the current technical composition. The statistical models of machine learning are operative representations (in the sense of Farocki's 'operative images'). See: Aud Sissel Hoel, 'Operative Images: Inroads to a New Paradigm of Media Theory', in: Image – Action – Space: Situating the Screen in Visual Practice, Walter de Gruyter, 2018.

[7] For a systematic study of the logical limitations of machine learning see: Momin M. Malik, 'A hierarchy of limitations in machine learning', arXiv preprint, 2020. https://arxiv.org/abs/2002.05193

[8] Internalist and externalist studies of machine intelligence merge where technical and social faults meet.

[9] For a more detailed list of AI biases see: Harini Suresh and John Guttag, 'A framework for understanding unintended consequences of machine learning', arXiv preprint, 2019. arxiv.org/abs/1901.10002 See also: Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman and Aram Galstyan, 'A survey on bias and fairness in machine learning', arXiv preprint, 2019. arxiv.org/abs/1908.09635

[10] See: Virginia Eubanks, Automating Inequality, New York: St. Martin's Press, 2018. See also: Kate Crawford, 'The trouble with bias', keynote lecture, Conference on Neural Information Processing Systems, 2017.

[11] Ruha Benjamin, Race After Technology: Abolitionist Tools for the New Jim Code. Cambridge: Polity, 2019, 5.

[12] In fact, computer scientists are more at ease with defining machine learning as a technique of information compression rather than as superhuman cognition. They would argue that AI belongs to a subfield of signal processing, that is, data compression.

[13] Matteo Pasquinelli, The Eye of the Master, London: Verso, forthcoming.

[14] Projects such as Explainable Artificial Intelligence, Interpretable Deep Learning and Heatmapping, among others, have demonstrated that breaking into the 'black box' of machine learning is possible. Nevertheless, the full interpretability and explicability of machine learning statistical models remains a myth, too. See: Zachary C. Lipton, 'The Mythos of Model Interpretability', arXiv preprint, June 2016. https://arxiv.org/abs/1606.03490

[15] A. Corsani, P. Dieuaide, M. Lazzarato, J.M. Monnier, Y. Moulier-Boutang, B. Paulré and C. Vercellone, Le Capitalisme cognitif comme sortie de la crise du capitalisme industriel. Un programme de recherche, Paris: Laboratoire Isys Matisse, Maison des Sciences Economiques, 2004. See also: Shoshana Zuboff, The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power, London: Profile Books, 2019.

[16] Cp. Lisa Gitelman (ed.), Raw Data is an Oxymoron, Cambridge, MA, MIT Press, 2013.

[17] This is the case in so-called supervised learning; unsupervised and self-supervised learning also maintain forms of human intervention.

[18] As Foucault has already elucidated for the taxonomies of the modern age. See: Michel Foucault, The Order of Things, London: Routledge, 2005.

[19] Jeff Bezos introduced the term. See: Jason Pontin. ‘Artificial intelligence, with help from the humans.’ The New York Times, 25 March 2007.

[20] The rise of Deep Learning is marked by this paper: Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton, 'ImageNet Classification with Deep Convolutional Neural Networks', Communications of the ACM 60(6), 2017. The convolutional architecture, however, dates back to Yann LeCun's work in the late 1980s.

[21] For an accessible (yet uncritical) account of the ImageNet development see: Melanie Mitchell, Artificial Intelligence: A Guide for Thinking Humans, London: Penguin, 2019.

[22] WordNet is 'a lexical database of semantic relations between words' which was initiated by George Armitage Miller at Princeton University in 1985. It provides a strict tree-like structure of definitions.

[23] Kate Crawford and Trevor Paglen, 'Excavating AI: The Politics of Training Sets for Machine Learning', 19 September 2019. https://excavating.ai

[24] See: Adam Harvey's project MegaPixels (megapixels.cc). See also: Madhumita Murgia, 'Who's using your face? The ugly truth about facial recognition', Financial Times, 19 April 2019.

[25] In fact, bias can also be introduced into the training dataset by an act of data poisoning; see below.

[26] The distinction between perception and cognition (or the impossibility of such a distinction) cannot be fully expanded here. See: Matteo Pasquinelli, The Eye of the Master, London: Verso, forthcoming.

[27] Frank Rosenblatt, ‘The Perceptron: A Perceiving and Recognizing Automaton’. Report 85-460-1. Cornell Aeronautical Laboratory, Buffalo, NY, 1957.

[28] Warren McCulloch and Walter Pitts, 'How We Know Universals: The Perception of Auditory and Visual Forms', The Bulletin of Mathematical Biophysics 9(3), 1947, 127-147. The evolution of pattern recognition is part of the history of the mechanisation of the gaze and of the 'cinematic mode of production' that entered digital capitalism and its attention economy by other means. See: Jonathan Beller, The Cinematic Mode of Production: Attention Economy and the Society of the Spectacle, UPNE, 2012, 9.

[29] The values of a model that are learnt from data are called 'parameters', while the settings that are not learnt from data and are fixed manually are called 'hyperparameters'.
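
As a minimal illustrative sketch of this distinction (assuming the scikit-learn library, which the note itself does not mention):

```python
# Illustrative sketch only: hyperparameters vs. parameters in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# A small synthetic dataset standing in for 'training data'.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# C (inverse regularisation strength) is a hyperparameter: fixed by hand, not learnt.
model = LogisticRegression(C=1.0, max_iter=500)

# Fitting adjusts the parameters proper: coefficients and intercept learnt from the data.
model.fit(X, y)
print(model.coef_, model.intercept_)
```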

[30] This value can also be expressed as a value between 0 and 1, that is, as a percentage.

[31] https://keras.io/applications (Documentation for individual models)
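
As a minimal sketch of how the pretrained models documented at that address are typically loaded (assuming the tensorflow package, which the note does not prescribe):

```python
# Illustrative sketch only: load a pretrained convolutional model from keras.applications.
from tensorflow.keras.applications import MobileNetV2

# Download the architecture together with weights already trained on ImageNet.
model = MobileNetV2(weights="imagenet")

# Even a 'small' model carries millions of learned parameters (weights).
print(model.count_params())  # roughly 3.5 million for MobileNetV2
```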

[32] See: Paul Edwards, A Vast Machine: Computer Models, Climate Data, and The Politics of Global Warming. Cambridge, MA: MIT Press, 2010.

[33] See the Community Earth System Model (CESM), developed by the National Center for Atmospheric Research in Boulder, Colorado, since 1996. 'The Community Earth System Model is a fully coupled numerical simulation of the Earth system consisting of atmospheric, ocean, ice, land surface, carbon cycle, and other components. CESM includes a climate model providing state-of-art simulations of the Earth's past, present, and future.' http://www.cesm.ucar.edu

[34] George Box, ‘Robustness in the strategy of scientific model building’. Technical Report #1954, Mathematics Research Center, University of Wisconsin-Madison, May 1979.

[35] In late capitalism, ‘to look is to labor’, as Jonathan Beller reminds us. Jonathan Beller, The Cinematic Mode of Production: Attention Economy and the Society of the Spectacle. UPNE, 2012, 2.

[36] See the Gestalt controversy at the beginning of connectionism. Matteo Pasquinelli, The Eye of the Master. London: Verso, forthcoming.

[37] A third case may be given when a model learns a wrong pattern. If apophenia is the human tendency to perceive meaningful patterns in random data, machine apophenia happens when a statistical model records a pattern that is not there, that is, it reads noise as similar to an existing pattern.

[38] Dan McQuillan, ‘Manifesto on Algorithmic Humanitarianism’, presented at the symposium Reimagining Digital Humanitarianism, Goldsmiths, University of London, February 16, 2018.

[39] As proven by the so-called Universal Approximation Theorem.
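
In one standard formulation of the theorem (going back to Cybenko and Hornik), for any continuous function $f$ on a compact set $K \subset \mathbb{R}^n$, any sigmoidal (more generally, non-polynomial) activation $\sigma$ and any tolerance $\varepsilon > 0$, there exist a width $N$ and weights $v_i, w_i, b_i$ such that

$$\sup_{x \in K} \left| \, f(x) - \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon .$$

The theorem guarantees approximation in principle; it says nothing about how large $N$ must be, nor about whether training can actually find such weights.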

[40] Dominique Cardon, Jean-Philippe Cointet and Antoine Mazières, 'Neurons Spike Back. The Invention of Inductive Machines and the Artificial Intelligence Controversy', Réseaux 2018/5 (n. 211), pp. 173-220.

[41] William Gibson, Neuromancer, New York: Ace Books, 1984, 69.

[42] Samira Samadi, Uthaipon Tantipongpipat, Jamie H. Morgenstern, Mohit Singh, and Santosh Vempala, ‘The Price of Fair PCA: One Extra Dimension.’ Advances in Neural Information Processing Systems 31 (2018).

[43] Os Keyes, 'The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition', Proceedings of the ACM on Human-Computer Interaction, vol. 2, no. CSCW, article 88 (November 2018). https://doi.org/10.1145/3274357

[44] Alexander Mordvintsev, Christopher Olah and Mike Tyka, 'Inceptionism: Going Deeper into Neural Networks', Google Research blog, 17 June 2015. https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html

[45] Deep fakes are synthetic media in which a person in a video is replaced with someone else's facial features, often with the purpose of forging fake news.

[46] Joseph Paul Cohen, Margaux Luck, and Sina Honari. ‘Distribution matching losses can hallucinate features in medical image translation.’ International conference on medical image computing and computer-assisted intervention. Springer, Cham, 2018.

[47] Fabian Offert, talk at Transmediale festival, Neural Network Cultures panel, 1 February 2020.

[48] Michel Foucault, Abnormal: Lectures at the Collège de France 1974-1975. New York: Picador, 2004, 26.

[49] See: Matteo Pasquinelli, ‘Arcana Mathematica Imperii: The Evolution of Western Computational Norms’, in Maria Hlavajova et al. (eds.), Former West, Cambridge, MA, MIT Press, 2017, pp. 281-293.

[50] Paola Ricaurte, 'Data Epistemologies, The Coloniality of Power, and Resistance', Television & New Media, 7 March 2019.

[51] David Ingold and Spencer Soper, ‘Amazon Doesn’t Consider the Race of Its Customers. Should It?’, Bloomberg, April 21, 2016.

[52] On apophenia see also Matteo Pasquinelli, ‘Anomaly Detection: The Mathematization of the Abnormal in the Metadata Society’, paper presented at transmediale, 2015. Available at: https://www.academia.edu/10369819

[53] See: Dan McQuillan, ‘People’s Councils for Ethical Machine Learning’, Social Media and Society, 4 (2), 2018, 3.

[54] Even Judea Pearl, a pioneer of Bayesian networks, considers machine learning obsessed with 'curve fitting', namely recording correlations without providing explanations. See: Judea Pearl and Dana Mackenzie, The Book of Why: The New Science of Cause and Effect, New York: Basic Books, 2018.

[55] Paola Ricaurte, ‘Data Epistemologies, The Coloniality of Power, and Resistance.’ Television & New Media, 7 March 2019.

[56] Felix Guattari, Schizoanalytic Cartographies, London: Continuum, 2013, 2.

[57] Adam Harvey, https://ahprojects.com/hyperface/

[58] Anish Athalye et al. ‘Synthesizing Robust Adversarial Examples.’ arXiv:1707.07397 (2017).

[59] Nir Morgulis et al. ‘Fooling a Real Car with Adversarial Traffic Signs.’ arXiv:1907.00374 (2019).

[60] Ian Goodfellow, Jonathon Shlens and Christian Szegedy, 'Explaining and Harnessing Adversarial Examples', arXiv:1412.6572 (2014).

[61] Florian A. Schmidt, ‘Crowdsourced Production of AI Training Data: How Human Workers Teach Self-Driving Cars to See’, Düsseldorf: Hans-Böckler-Stiftung, 2019.

[62] Hamid Ekbia and Bonnie Nardi, Heteromation, and Other Stories of Computing and Capitalism. Cambridge, MA: MIT Press, 2017.

[63] For the idea of analytical intelligence see: Lorraine Daston, ‘Calculation and the Division of Labour 1750–1950’, Bulletin of the German Historical Institute 62 (Spring 2018), 9–30.

[64] Simon Schaffer, ‘Babbage’s Intelligence: Calculating Engines and the Factory System’, Critical Inquiry 21 (1994), 203–227. Lorraine Daston, ‘Enlightenment calculations’. Critical Inquiry 21 (1994), 182-202. Matthew L. Jones, Reckoning with Matter: Calculating Machines, Innovation, and Thinking about Thinking from Pascal to Babbage. Chicago: University of Chicago Press, 2016.

[65] Matteo Pasquinelli, ‘On the Origins of Marx’s General Intellect’. Radical Philosophy, 2.06, winter 2019.

[66] Maxine Berg, The Machinery Question and the Making of Political Economy. Cambridge: Cambridge University Press, 1980. In fact, even The Economist has recently warned about 'the return of the machinery question' in the age of AI. See: Tom Standage, 'The return of the machinery question', The Economist, 23 June 2016.