If we ever do receive a targeted message from another star – as opposed to picking up, say, leakage radiation – will we be able to decipher it? We can’t know in advance, but it’s a reasonable assumption that any civilization wanting to communicate will have strategies in place to ease the process. In today’s essay, Brian McConnell begins a discussion on SETI and interstellar messaging that will continue in coming weeks. The limits of our understanding are emphasized by the problem of qualia; in other words, how do different species express inner experience? But we begin with studies of other Earth species before moving on to data types and possible observables. A communication systems engineer and expert in translation technology, Brian is the author of The Alien Communication Handbook — So We Received A Signal, Now What?, recently published by Springer Nature under their Astronomer’s Bookshelf imprint, and available through Amazon, Springer and other booksellers.
by Brian McConnell
What do our attempts to understand animal communication have to say about our future efforts to understand an alien transmission or information-bearing artifact, should we discover one? We have long sought to communicate with “aliens” here on Earth. The process of deciphering animal communication has many similarities with the process of analyzing and comprehending an ET transmission, as well as important differences. Let’s look at the example of audio communication among animals, as this is analogous to a modulated electromagnetic transmission.
The general methodology used is to record as many samples of communication and behavior as possible. This is one of the chief difficulties in animal communication research, as the process of collecting recordings is quite labor intensive, and in the case of animals that roam over large territories it may be impossible to observe them in much of their environment. Animals that have a small territory where they can be observed continuously are ideal.
Once these observations are collected, the next step is to identify the basic elements of communication, analogous to phonemes in human speech or the letters of an alphabet. This is challenging because many animals communicate using sounds outside the range of human hearing, and employ sounds that are very different from human speech. The work typically involves studying time versus frequency plots (spectrograms) of audio recordings to understand the structure of different utterances, which is also very labor intensive. This is one area where AI or deep learning can help greatly, as AI systems can be designed to automate this step, though they require a large sample corpus to be effective.
Time vs. frequency plot of duck calls. Credit: Brian McConnell.
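To make the idea of a time versus frequency plot concrete, here is a minimal sketch of a short-time Fourier transform using NumPy. The frame length, hop size and the synthetic upward-sweeping “call” are illustrative assumptions, not parameters from any particular study; in practice one would likely use an existing routine such as scipy.signal.spectrogram.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_len=256, hop=128):
    """Return (times, freqs, power) for a time-vs-frequency plot.

    Minimal short-time Fourier transform: slice the recording into
    overlapping frames, taper each with a Hann window, and take the
    squared magnitude of each frame's FFT.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, frame_len // 2 + 1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate  # frame centres
    return times, freqs, power

# Hypothetical test signal: a tone sweeping from 1 kHz up to 3 kHz over 0.5 s.
sr = 16_000
t = np.arange(int(0.5 * sr)) / sr
call = np.sin(2 * np.pi * (1000 * t + 2000 * t ** 2))
times, freqs, power = spectrogram(call, sr)
peak = freqs[power.argmax(axis=1)]  # dominant frequency in each frame
```

The peak array traces the rising sweep frame by frame; plotting power against times and freqs gives the familiar spectrogram image in which individual utterances can be segmented.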
The next step, once the basic units of communication are known, is to use statistical methods to understand how frequently they are used, how often they occur in conjunction with each other, and how they are grouped together. Zipf’s Law is one example of a method that can be used to gauge the sophistication of a communication system. In human communication, we observe that the probability of a word being used is inversely proportional to its frequency rank: the most common word appears roughly twice as often as the second most common, three times as often as the third, and so on.
A log-log plot of the frequency of word use (y axis) versus word rank (x axis) from the text of Mary Shelley’s Frankenstein. Notice that the relationship is almost exactly 1/x. Image credit: Brian McConnell, The Alien Communication Handbook.
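A Zipf analysis like the one in the plot above can be sketched in a few lines of Python: count each unit, sort the counts from most to least common, and fit a straight line to log(frequency) versus log(rank). A slope near -1 is the 1/x relationship. The word-splitting rule and the synthetic test corpus are assumptions for illustration.

```python
import math
import re
from collections import Counter

def zipf_slope(text):
    """Least-squares slope of log(frequency) versus log(rank).

    Zipf's law predicts frequency proportional to 1/rank, i.e. a slope near -1.
    """
    words = re.findall(r"\w+", text.lower())
    counts = sorted(Counter(words).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic corpus in which word i appears about 1000/i times.
text = " ".join(f"w{i}" for i in range(1, 51) for _ in range(round(1000 / i)))
slope = zipf_slope(text)
```

For an animal corpus, the words would be replaced by whatever call units were identified in the previous step.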
Conditional probability is another target for study. This refers to the probability that a particular symbol or utterance will follow another. In English, for example, letters are not used with equal frequency, and some pairs or triplets of letters are encountered much more often than others. Even without knowing what an utterance or group of utterances means, it is possible to understand which are used most often, and are likely most important. It is also possible to quantify the sophistication of the communication system using methods like this.
A graph of the relative frequency of use of bigrams (2-letter combinations) in English text. You can see right away that some bigrams are used extensively while others very rarely occur. Credit: Peter Norvig.
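Conditional probability can be estimated directly from bigram counts. The sketch below computes P(next | current) for any sequence of symbols, along with the conditional entropy H(next | current) in bits, one simple way to quantify how predictable a symbol stream is. Treating entropy as the measure of “sophistication” here is an illustrative choice, not a claim about the specific methods used in the book.

```python
import math
from collections import Counter, defaultdict

def bigram_model(symbols):
    """Estimate P(next | current) from a sequence of symbols."""
    pair_counts = Counter(zip(symbols, symbols[1:]))
    first_counts = Counter(symbols[:-1])
    cond = defaultdict(dict)
    for (a, b), n in pair_counts.items():
        cond[a][b] = n / first_counts[a]
    return cond

def conditional_entropy(symbols):
    """H(next | current) in bits; lower means the next symbol is more predictable."""
    pair_counts = Counter(zip(symbols, symbols[1:]))
    first_counts = Counter(symbols[:-1])
    total = len(symbols) - 1
    h = 0.0
    for (a, b), n in pair_counts.items():
        h -= (n / total) * math.log2(n / first_counts[a])
    return h
```

Both functions work on any sequence of hashable units, so letters can be swapped for call types once those have been identified. A strictly alternating sequence such as "abababab" has a conditional entropy of zero, since each symbol fully determines the next.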
With this information in hand, it is now possible to start mapping utterances or groups of utterances to meanings. The best example of this to date is Con Slobodchikoff’s work with prairie dogs. Prairie dogs turned out to be ideal subjects because they live in colonies, known as towns, and so could be observed for extended periods of time in controlled experiments. Con and his team observed how their calls differed as various predators approached the town, and used a “solve for x” approach to work out which utterances had unique meanings.
Using this approach, in combination with audio analysis, Con and his team worked out that prairie dogs had unique “words” for humans, coyotes and dogs, as well as modifiers (adjectives) such as short, tall, fat, thin, square-shaped, oval-shaped and carrying a gun. They did this by monitoring how the prairie dogs’ chirps varied as different predators approached, or as team members walked through wearing shirts of different colors, and so on. They also found that the vocabulary of calls varied between towns, which suggested that the communication was not purely instinctual but had learned components (cultural transmission). While nobody would argue that prairie dogs communicate at a human level, their communication does appear to pass many of the tests for language.
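The “solve for x” idea can be illustrated with a simple co-occurrence contrast: for each call unit and each stimulus attribute, compare how often the unit is heard when the attribute is present versus absent. This is a generic sketch, not the analysis Slobodchikoff’s team actually performed, and the trial data below (call labels such as c1, attributes such as yellow_shirt) are entirely hypothetical.

```python
def association_scores(trials):
    """For each (call unit, stimulus attribute) pair, return
    P(unit heard | attribute present) - P(unit heard | attribute absent).

    trials: list of (attributes, units) pairs, each a set of labels.
    """
    attrs = set().union(*(a for a, _ in trials))
    units = set().union(*(u for _, u in trials))
    scores = {}
    for attr in attrs:
        with_attr = [u for a, u in trials if attr in a]
        without = [u for a, u in trials if attr not in a]
        for unit in units:
            p_with = sum(unit in u for u in with_attr) / len(with_attr) if with_attr else 0.0
            p_without = sum(unit in u for u in without) / len(without) if without else 0.0
            scores[(unit, attr)] = p_with - p_without
    return scores

# Hypothetical trials: what approached the town, and which call units were heard.
trials = [
    ({"human", "tall"}, {"c1", "c7"}),
    ({"human", "short"}, {"c1", "c4"}),
    ({"coyote"}, {"c2"}),
    ({"dog", "tall"}, {"c3", "c7"}),
    ({"human", "tall", "yellow_shirt"}, {"c1", "c7", "c9"}),
]
scores = association_scores(trials)
units = {u for _, us in trials for u in us}
best_for_human = max(units, key=lambda u: scores[(u, "human")])
```

A score near 1 means the unit appears almost exclusively with that attribute; in this toy data c1 pairs with “human” and c7 with “tall”, which is the kind of regularity controlled experiments are designed to expose.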
The challenge in understanding communication is that unless you can observe the communication and a direct response to something, it is very difficult to work out its meaning. One would presume that if prairie dogs communicate about predators, they also communicate about other, less obvious aspects of their environment that are more challenging to observe in controlled experiments. The problem is that this is akin to listening to a telephone conversation and trying to work out what is being said only by watching how one party responds.
Research with other species has been even more limited, mostly because of the twin difficulties of capturing a large corpus of recordings and making direct observations of behavior. Marine mammals are a case in point. While statistical analysis of whale and dolphin communication suggests a high degree of sophistication, we have not yet succeeded in mapping their calls to specific meanings. This should improve with greater automation and AI-based analysis. Indeed, Project CETI (Cetacean Translation Initiative) aims to use this approach to record a large corpus of whale codas and then apply machine learning techniques to better understand them.
That our success in understanding animal communication has been so limited may portend that we will have great difficulty in understanding an ET transmission, at least the parts that are akin to natural communication.
The success of our own communication relies upon the fact that we all have similar bodies and experiences around which we can build a shared vocabulary. We can’t assume that an intelligent alien species will have similar modes of perception or thought, and if they are AI-based, they will be truly alien.
On the other hand, a species that is capable of designing interstellar communication links will also need to understand information theory and communication systems. An interstellar communication link is essentially an extreme case of a wireless network. If the transmission is intended for us, and they are attempting to communicate or share information, they will be able to design the transmission to facilitate comprehension. That intent is key. This is where the analogy to animal communication breaks down.
An important aspect of a well-designed digital communication system is that it can interleave many different types of data or media. Photographs are an example of one media type we may be likely to encounter. A civilization that is capable of interstellar communication will, by definition, be astronomically literate. Astronomy itself is heavily dependent on photography. This isn’t to say that vision will be their primary sense or mode of communication, just that in order to be successful at astronomy, they will need to understand photography. One can imagine a species whose primary sense is echolocation but which has learned to translate images into a format it can understand, much as we have developed ultrasound technology to translate sound into images.
Digitized images are almost trivially easy to decode, as an image can be represented as an array of numbers. One need only guess the number of bits used per pixel, the bit order (least or most significant bit first), and one dimension of the array to successfully decode an image. If there are multiple color channels, there are a few additional parameters, but even then the parameter space is very small, and it will be possible to extract images if they are there. There are some additional encoding patterns to look for, such as bitplanes, which I discuss in more detail in the book, but even then the number of combinations to cycle through remains small.
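To show how small that parameter space is, here is a minimal Python sketch that turns a flat list of bits into rows of pixel values for a given bits-per-pixel, bit order and row width, and then enumerates every combination. It assumes a single grayscale channel with unsigned pixel values and ignores bitplanes and color channels; the ranges in candidate_decodings are arbitrary illustrative limits.

```python
from itertools import product

def decode_image(bits, bits_per_pixel, width, msb_first=True):
    """Interpret a flat list of 0/1 bits as rows of unsigned pixel values."""
    pixels = []
    usable = len(bits) - len(bits) % bits_per_pixel  # drop a trailing partial pixel
    for i in range(0, usable, bits_per_pixel):
        chunk = bits[i:i + bits_per_pixel]
        if not msb_first:
            chunk = chunk[::-1]
        value = 0
        for b in chunk:
            value = (value << 1) | b
        pixels.append(value)
    height = len(pixels) // width  # incomplete final row is discarded
    return [pixels[r * width:(r + 1) * width] for r in range(height)]

def candidate_decodings(bits, max_bpp=8, widths=range(8, 129)):
    """Enumerate bits per pixel x bit order x row width (grayscale only)."""
    for bpp, msb_first, width in product(range(1, max_bpp + 1), (True, False), widths):
        yield (bpp, msb_first, width), decode_image(bits, bpp, width, msb_first)
```

With the default limits this is only 8 × 2 × 121 = 1,936 candidate decodings, few enough to render as thumbnails and inspect by eye or score automatically.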
The sender can help us out even further by including images of astronomical objects, such as planets, stars and distant nebulae. The latter are especially interesting because they can be observed by both parties, and can be used to guide the receiver in fine calibrations, such as the color channels used, scaling factors (e.g. gamma correction), etc. Meanwhile, images of planets are easy to spot, even in a raw bitstream, as they usually consist of a roundish object against a mostly black background.
An example of a raw bitstream that includes an image of a planet amid what appears to be random or efficiently encoded data. All the viewer needs to do to extract the image is to work out one dimension of the array along with the number of bits per pixel. The degree to which a circular object is stretched into an ellipse also hints at the number of bits per pixel. Credit: Brian McConnell, The Alien Communication Handbook.
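Working out the array dimension can also be automated. In a real image, a pixel usually resembles the pixel directly below it, so when the pixels are wrapped at the correct width, the mean difference between each pixel and the one a row later drops sharply. The sketch below ranks candidate widths by that score; it assumes bits per pixel has already been chosen, and the disk test image is a synthetic stand-in for a planet.

```python
def rank_widths(pixels, widths):
    """Rank candidate row widths, best first.

    Score = mean absolute difference between each pixel and the pixel
    one row below it under that width (lower means more image-like).
    """
    scores = {}
    for w in widths:
        pairs = len(pixels) - w
        if pairs <= 0:
            continue
        diff = sum(abs(pixels[i] - pixels[i + w]) for i in range(pairs))
        scores[w] = diff / pairs
    return sorted(scores, key=scores.get)

# Synthetic "planet": a bright disk on a black 37 x 30 background, flattened.
W, H = 37, 30
pixels = [255 if (x - 18) ** 2 + (y - 15) ** 2 <= 64 else 0
          for y in range(H) for x in range(W)]
best = rank_widths(pixels, range(8, 60))[0]
```

Integer multiples of the true width can also score well, since they compare pixels two or more rows apart, so the smallest strong candidate is usually the one to try first.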
What is particularly interesting about images is that once you have worked out the basic encoding schemes in use, you can decode any image that uses that encoding scheme. Images can represent scenes ranging from microscopic to cosmic scales. The sender could include images of anything, from important landmarks or sites to abstract representations of scenes (a.k.a. art). Astute readers will notice that these are uncompressed images, and that the sender may wish to employ various compression schemes to maximize the information carrying capacity of the communication channel. Compressed images will be much harder to recognize, but even if a relatively small fraction of images are uncompressed, they will stand out against what appears to be random digits, as in the example bitstream above.
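One way to find uncompressed regions in an otherwise random-looking stream is a sliding-window entropy scan: well-compressed data tends toward 8 bits of entropy per byte, while an uncompressed image with a mostly black background scores much lower. The sketch below is an illustrative approach; the window and step sizes are arbitrary assumptions, and byte-level entropy presumes the data have already been grouped into 8-bit units.

```python
import math
from collections import Counter

def window_entropy(data: bytes, window=4096, step=1024):
    """Shannon entropy (bits per byte) of each window of a byte stream.

    Returns a list of (start_offset, entropy) pairs.
    """
    out = []
    for start in range(0, max(len(data) - window, 0) + 1, step):
        chunk = data[start:start + window]
        n = len(chunk)
        counts = Counter(chunk)
        h = -sum(c / n * math.log2(c / n) for c in counts.values())
        out.append((start, h))
    return out

# Test stream: high-entropy filler with a block of zero bytes (a "black" region) in the middle.
filler = bytes(i % 256 for i in range(4096))
data = filler + bytes(4096) + filler
results = window_entropy(data)
lowest_start = min(results, key=lambda sh: sh[1])[0]
```

Windows whose entropy dips well below the surrounding level are the natural places to start trying the image decodings described earlier.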
The sender can take this a step further by linking observables (images, audio samples) with numeric symbols to create a semantic network. You can think of a semantic network like an Internet of ideas, where each unique idea has a numeric address. What’s more, the address space (the maximum number of ideas that can be represented) can be extremely large. For example, a 64-bit address space has almost 2 × 10^19 unique addresses.
An example of a semantic network representing the relationship between different animals and their environment. The network is shown in English for readability, but the nodes and the operators that connect them could just as easily be based on a numeric address space.
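A semantic network with a numeric address space can be represented very simply: both ideas and relations are 64-bit addresses, and knowledge is a set of (subject, relation, object) triples. The sketch below is a hypothetical illustration; assigning random addresses and keeping English labels on the receiver side are assumptions made for readability.

```python
import random

ADDRESS_BITS = 64
ADDRESS_SPACE = 2 ** ADDRESS_BITS  # 18,446,744,073,709,551,616: almost 2 x 10^19

class SemanticNetwork:
    """Nodes and relations are both numeric addresses; edges are triples."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.labels = {}    # address -> human-readable label (for our convenience only)
        self.edges = set()  # (subject, relation, object) triples

    def new_symbol(self, label):
        addr = self._rng.getrandbits(ADDRESS_BITS)
        while addr in self.labels:  # avoid the (very unlikely) collision
            addr = self._rng.getrandbits(ADDRESS_BITS)
        self.labels[addr] = label
        return addr

    def link(self, subject, relation, obj):
        self.edges.add((subject, relation, obj))

    def related(self, subject, relation):
        """All objects linked to subject by the given relation."""
        return {o for s, r, o in self.edges if s == subject and r == relation}

net = SemanticNetwork()
lives_in = net.new_symbol("lives in")
fish = net.new_symbol("fish")
ocean = net.new_symbol("ocean")
net.link(fish, lives_in, ocean)
```

Nothing in the structure depends on the English labels, so the same network could be transmitted as nothing but numbers.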
The network doesn’t need to be especially sophisticated to enable the receiver to understand the relationships between symbols. In fact, the sender can employ a simple way of saying “This image contains the following things / symbols” by labeling them with one or more binary codes within the images themselves.
An example of an image that is labeled with four numeric codes representing properties within the image. Credit: Brian McConnell, The Alien Communication Handbook.
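One hypothetical way to embed labels like these is to reserve a strip of pixels and write each code as a row of on/off pixels. The sketch below stamps 16-bit codes along the top rows of a grayscale image and reads them back; the code width, pixel values and placement are assumptions for illustration, not the encoding used in the book.

```python
def stamp_labels(image, codes, bits=16, on=255):
    """Return a copy of image with each code written as one row of pixels
    along the top (pixel = on for a 1 bit, 0 for a 0 bit), MSB first."""
    out = [row[:] for row in image]
    for r, code in enumerate(codes):
        for c in range(bits):
            out[r][c] = on if (code >> (bits - 1 - c)) & 1 else 0
    return out

def read_labels(image, n_codes, bits=16, on=255):
    """Recover the first n_codes label rows written by stamp_labels."""
    return [sum(1 << (bits - 1 - c) for c in range(bits) if image[r][c] == on)
            for r in range(n_codes)]

img = [[0] * 20 for _ in range(10)]
codes = [5, 40000, 1, 65535]
labeled = stamp_labels(img, codes)
```

Once the receiver has recognized the label strip in a few images, every other labeled image adds new links between symbols and observables.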
Observables Versus Qualia
While this pattern can be used to build up a large vocabulary of symbols that can be linked to observables (images, audio samples, and image sequences), it will be difficult to describe qualia (internal experiences). How would you describe the concept of sweetness to someone who can’t experience a sweet taste? You could try linking the concept to a diagram of a sugar molecule, but would the receiver make the connection between sugar and sweetness? Emotional states such as fear and hunger may be similarly difficult to convey. How would you describe the concept of ennui?
Imagine an alien species whose nervous system is more decentralized, like an octopus’s. They might have a whole vocabulary around the concept of “brain lock”, where different sub-brains can’t reach agreement on something. Where would we even start with understanding concepts like this? It’s likely that while we might succeed in understanding descriptions of physical objects and processes (and that’s not nothing), we may be flummoxed by descriptions of internal experiences and thoughts. This is something we take for granted in human language, primarily because even with differences in language, we all share similar bodies and experiences around which we build our languages.
Yet all hope is not lost. Semantic networks allow a receiver to understand how unknown symbols are related to each other, even if they don’t understand their meaning directly. Let’s consider an example where the sender is defining a set of symbol codes we have no direct understanding of, but we have previously figured out the meaning of symbol codes that define set membership (∈), greater/lesser in degree (<>), and oppositeness.
Even without knowing the meaning of these new symbol codes, the receiver can see how they are related and can build a graph of this network. This graph in turn can guide the receiver in learning unknown symbols. If a symbol is linked to many others in the network, there may be multiple paths toward working out its meaning in relation to symbols that have been learned previously. Even if these symbols remain unknown, the receiver has a way of knowing what they don’t know, and can map their progress in understanding.
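Mapping progress through such a network can be sketched as a breadth-first search: starting from the symbols already understood, compute how many hops each unknown symbol is from known territory. Symbols one hop away are the natural next targets, and symbols that cannot be reached at all are, explicitly, what we don’t yet know. The numeric symbol codes and relation names below are hypothetical.

```python
from collections import defaultdict, deque

def learning_frontier(triples, known):
    """Hop distance from each symbol to the nearest already-known symbol.

    triples: (symbol, relation, symbol) edges whose relation codes are understood.
    Symbols with no path to any known symbol are absent from the result.
    """
    graph = defaultdict(set)
    for a, _, b in triples:
        graph[a].add(b)
        graph[b].add(a)
    dist = {s: 0 for s in known if s in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

# Hypothetical symbol codes: 900 is understood, the rest are new.
triples = [
    (901, "member_of", 900),
    (902, "member_of", 900),
    (901, "opposite", 902),
    (903, "greater_than", 902),
    (950, "opposite", 951),  # an island with no path to anything known
]
dist = learning_frontier(triples, known={900})
all_symbols = {s for a, _, b in triples for s in (a, b)}
unreachable = all_symbols - dist.keys()
```

Counting a symbol’s neighbors in the same graph gives a rough sense of how many independent paths might lead to its meaning.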
The implication for a SETI detection is that we may find it both easier and more difficult to understand what they are communicating than one might expect. Objects or processes that can be depicted numerically via images, audio or image sequences may allow a rich vocabulary to be built around them with relative ease, while communication about internal experiences, culture, etc. may remain partially understood at best.
Even partial comprehension based on observables will be a significant achievement, as it will enable the communication of a wide range of subjects. And as the examples above show, this can be done with static representations. An even more interesting scenario is one in which the transmission includes algorithms, such as functions from computer programs. Then it will be possible for the receiver to interact with them in real time, which opens up a whole other realm of possibilities for communication.
More on that in the next article…