If you were crafting a transmission to another civilization (and we recently discussed Alexander Zaitsev’s multiple messages of this kind), how would you put it together? I’m not speaking of what you might want to tell ETI about humanity, but rather how you could make the message decipherable. In the second of three essays on SETI subjects, Brian McConnell now looks at enclosing computer algorithms within the message, and the implications for comprehension. What kind of information could algorithms contain vs. static messages? Could a transmission contain programs complex enough to create a form of consciousness if activated by the receiver’s technologies? Brian is a communication systems engineer and expert in translation technology. His book The Alien Communication Handbook (Springer, 2021) is now available via Amazon, Springer and other booksellers.

by Brian S McConnell

In most depictions of SETI detection scenarios, the alien transmission is a static message, like the images on the Voyager Golden Record. But what if the message itself is composed of computer programs? What modes of communication might be possible? Why might an ETI prefer to include programs and how could they do so?

As we discussed in Communicating with Aliens: Observables Versus Qualia, an interstellar communication link is essentially an extreme version of a wireless network, one with the following characteristics:

  • Extreme latency due to the speed of light (more than eight years for round-trip communication with the nearest star system), and in the case of an inscribed matter probe, there may be no way to contact the sender (infinite latency).
  • Prolonged disruptions to line of sight communication (due to the source not always being in view of SETI facilities as the Earth rotates).
  • Duty cycle mismatch (it is extremely unlikely that the recipient will detect the transmission at its start and read it entirely in one pass).

Because of these factors, communication will work much better if the transmission is segmented so that parcels received out of order can be reassembled by the receiver, and so that those segments are encoded to enable the recipient to detect and correct errors without having to contact the transmitter and wait years for a response. This is known as forward error correction and is used throughout computing (to catch and fix disc read errors) and communication (to correct corrupted data from a noisy circuit).
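
The reassembly step can be sketched in a few lines of Python. This is a minimal illustration, assuming each segment carries a sequence number and that the sender repeats the message in a loop. The `(index, payload)` framing and the name `reassemble` are hypothetical choices made for this example, not something a real transmission is known to use.

```python
def reassemble(segments, total):
    """Rebuild a message from (index, payload) pairs received in any order.

    Returns the joined payload and the list of indices still missing.
    """
    received = {}
    for index, payload in segments:
        # Keep the first copy of each segment; later repeats are ignored.
        received.setdefault(index, payload)
    missing = [i for i in range(total) if i not in received]
    message = b"".join(received[i] for i in sorted(received))
    return message, missing


# A receiver that tunes in mid-transmission sees the loop start at segment 2.
stream = [(2, b"lo, "), (3, b"world"), (0, b"he"), (1, b"l"), (2, b"lo, ")]
message, missing = reassemble(stream, total=4)
print(message, missing)
```

Because segments are keyed by index, the receiver does not need to catch the start of the transmission. It keeps listening until the list of missing indices is empty.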

While there are simple error correction methods, such as N-modular redundancy (a majority vote code), these are not very robust and dramatically reduce the link’s information carrying capacity. Far more robust methods exist, such as the Reed-Solomon coding used for storage media and space communication. These methods can correct for prolonged errors and dropouts, and they can be tuned, by adding more redundancy, to compensate for whatever level of data loss the sender expects.
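
To make the trade-off concrete, here is a minimal sketch of the majority vote code with three copies of every bit. The function names are illustrative. Reed-Solomon coding is considerably more involved and is not shown here.

```python
def encode_repetition(bits, n=3):
    """Repeat every bit n times (n should be odd so a vote cannot tie)."""
    return [bit for bit in bits for _ in range(n)]


def decode_majority(coded, n=3):
    """Recover each bit by majority vote over its n copies."""
    decoded = []
    for start in range(0, len(coded), n):
        group = coded[start:start + n]
        decoded.append(int(sum(group) > n // 2))
    return decoded


message = [1, 0, 1, 1]
sent = encode_repetition(message)   # 12 bits on the wire for 4 bits of data
received = sent.copy()
received[1] ^= 1                    # one flipped bit in the first group
received[9] ^= 1                    # one flipped bit in the last group
recovered = decode_majority(received)
print(recovered == message)
```

The code survives one flipped bit per group of three, but two flips in the same group defeat it, and the link carries only a third of its raw capacity. That is the weakness the paragraph above describes.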

In addition to being unreliable, the communication link’s information carrying capacity will likely be limited compared to the amount of information the transmitter may wish to send. Because of this, it will be desirable to compress data, using lossless compression algorithms, and possibly lossy compression algorithms (similar to the way JPEG and MPEG encoders work). Astute readers will notice a built-in conflict here. Data that is compressed and encoded for error correction will look like a series of random numbers to the receiver. Without knowledge of how the encoding and compression algorithms work, something that would be nearly impossible to guess, the receiver will be unable to recover the original unencoded data.

The iconic Blue Marble photo taken by the Apollo 17 astronauts. Credit: NASA.

The value of image compression can be clearly shown by comparing the file size for this image in several different encodings. The source image is 3000×3002 pixels. The raw uncompressed image, with three color channels at 8 bits per pixel per channel, is 27 megabytes (216 megabits). If we apply a lossless compression algorithm, such as the PNG encoding, this is reduced to 12.9 megabytes (103 megabits), a 2.1:1 reduction. A lossy compression algorithm reduces this further, to 1.1 megabytes (8.8 megabits) for JPEG with quality set to 80, and to 0.408 megabytes (3.3 megabits) for JPEG with quality set to 25, which results in a 66:1 reduction.
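
The arithmetic behind these figures can be checked with a short script. The encoded file sizes are taken from the paragraph above rather than measured here, and a megabyte is assumed to mean one million bytes.

```python
WIDTH, HEIGHT = 3000, 3002
CHANNELS = 3            # red, green, blue
BYTES_PER_CHANNEL = 1   # 8 bits per pixel per channel

raw_bytes = WIDTH * HEIGHT * CHANNELS * BYTES_PER_CHANNEL  # 27,018,000 bytes
raw_mb = raw_bytes / 1e6

# Encoded sizes in megabytes, as quoted in the text (not measured here).
sizes_mb = {
    "PNG (lossless)": 12.9,
    "JPEG quality 80": 1.1,
    "JPEG quality 25": 0.408,
}

ratios = {name: raw_mb / size for name, size in sizes_mb.items()}
for name, size in sizes_mb.items():
    print(f"{name}: {size * 8:.1f} megabits, {ratios[name]:.1f}:1")
```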

Lossy compression algorithms enable impressive reductions in the amount of information needed to reconstruct an image, audio signal, or motion picture sequence, at the cost of some loss of information. If the sender is willing to tolerate some loss of detail, lossy compression will enable them to pack well over an order of magnitude more content into the same data channel. This isn’t to say they will use the same compression algorithms we do, although the underlying principles may be similar. They can also interleave compressed images, which will look like random noise to a naive viewer, with occasional uncompressed images, which will stand out, as we showed in Communicating with Aliens: Observables Versus Qualia.

So why not send programs that implement error correction and decompression algorithms? How could the sender teach us to recognize an alien programming language to implement them?

A programming language requires a small set of math and logic symbols, and is essentially a special case of a mathematical language. Let’s look at what we would need to define an interpreted language, call it ET BASIC if you like. An interpreted language is abstract, and is not tied to a specific type of hardware. Many of the most popular languages in use today, such as Python, are interpreted languages.

We’ll need the following symbols:

  • Delimiter symbols (something akin to open and close parentheses, to allow for the creation of nested or n-dimensional data structures)
  • Basic math operations (addition, subtraction, multiplication, division, modulo/remainder)
  • Comparison operations (is equal, is not equal, is greater than, is less than)
  • Branching operations (if condition A is true, do this, otherwise do that)
  • Read/write operations (to read or write data to/from virtual memory, aka variables, which can also be used to create input/output interfaces for the user to interact with)
  • A mechanism to define reusable functions

Each of these symbols can be taught using a “solve for x” pattern within a plaintext primer that can be interleaved with other parts of the transmission. Let’s look at an example.

1 ? 1 = 2
1 ? 2 = 3
2 ? 1 = 3
2 ? 2 = 4
1 ? 3 = 4
3 ? 1 = 4
4 ? 0 = 4
0 ? 4 = 4

We can see right away that the unknown symbol refers to addition. Similar patterns can be used to define symbols for the rest of the basic operations needed to create an extensible language.
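
A receiver could automate this kind of inference by testing candidate operations against every line of the primer. The sketch below is one hypothetical way to do so. The candidate list and the function name `consistent_operations` are assumptions made for illustration.

```python
import operator

# Candidate meanings for the unknown symbol (an illustrative, incomplete list).
CANDIDATES = {
    "addition": operator.add,
    "subtraction": operator.sub,
    "multiplication": operator.mul,
}

# The primer lines from the text, as (left, right, result) triples.
PRIMER = [(1, 1, 2), (1, 2, 3), (2, 1, 3), (2, 2, 4),
          (1, 3, 4), (3, 1, 4), (4, 0, 4), (0, 4, 4)]


def consistent_operations(examples, candidates=CANDIDATES):
    """Return the names of candidate operations that fit every example."""
    return [name for name, fn in candidates.items()
            if all(fn(a, b) == result for a, b, result in examples)]


matches = consistent_operations(PRIMER)
ambiguous = consistent_operations([(2, 2, 4)])
print(matches, ambiguous)
```

The single example 2 ? 2 = 4 fits both addition and multiplication, which is why a primer needs several varied examples per symbol.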

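The last of the building blocks, a mechanism to define reusable functions, is especially useful. The sine function, for example, is used in a wide variety of calculations, and can be approximated via basic math operations using the Taylor series shown below:

$$\sin x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, x^{2n+1}$$

And in expanded form as:

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$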

This can be written in Python as:
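
A minimal sketch of such a function follows, assuming the angle is given in radians and the series is cut off after a fixed number of terms. The `terms` parameter and its default of 10 are illustrative choices. Each term is built from the previous one using only multiplication and division.

```python
def sine(x, terms=10):
    """Approximate sin(x), x in radians, from the first `terms` Taylor terms."""
    term = x        # first term of the series: x^1 / 1!
    total = x
    for n in range(1, terms):
        # Each term equals the previous one times -x^2 / ((2n) * (2n + 1)).
        term = -term * x * x / ((2 * n) * (2 * n + 1))
        total = total + term
    return total


print(sine(0.5))
```

With a fixed number of terms the approximation is best for small angles, so a fuller version would first reduce the angle into a range near zero using the modulo operation.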

The sine() function we just defined can later be reused without repeating the lower level instructions used to calculate the sine of an angle. Notice that the series of calculations reduces to basic math and branching operations. In fact any program you use, whether it is a simple tic-tac-toe game or a complex simulation, reduces to a small lexicon of fundamental operations. This is one of the most useful aspects of computer programs. Once you know the basic operations, you can build an interpreter that can run programs that are arbitrarily complex, just as you can run a JPEG viewer without knowing a thing about how lossy image compression works.
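
To illustrate how small such an interpreter can be, here is a sketch that evaluates programs written as nested lists, with Python lists standing in for the delimiter symbols. The operator spellings ("+", "if", "set", "def") are hypothetical stand-ins for whatever symbols a sender might define.

```python
import operator

PRIMITIVES = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.truediv, "%": operator.mod,
    "=": operator.eq, "<": operator.lt, ">": operator.gt,
}


def evaluate(expr, env):
    """Evaluate a nested-list expression against a dict of variables."""
    if isinstance(expr, (int, float)):
        return expr                      # literal number
    if isinstance(expr, str):
        return env[expr]                 # variable read
    head, *args = expr
    if head == "if":                     # branching: only one arm is evaluated
        condition, then_arm, else_arm = args
        return evaluate(then_arm if evaluate(condition, env) else else_arm, env)
    if head == "set":                    # variable write
        name, value = args
        env[name] = evaluate(value, env)
        return env[name]
    if head == "def":                    # define a reusable function
        name, params, body = args
        env[name] = (params, body)
        return name
    values = [evaluate(arg, env) for arg in args]
    if head in PRIMITIVES:
        return PRIMITIVES[head](*values)
    params, body = env[head]             # call a user-defined function
    return evaluate(body, {**env, **dict(zip(params, values))})


env = {}
evaluate(["def", "fact", ["n"],
          ["if", ["<", "n", 2], 1, ["*", "n", ["fact", ["-", "n", 1]]]]], env)
result = evaluate(["fact", 5], env)
print(result)
```

The interpreter knows nothing about factorials. It only knows the primitives, yet it can run any function defined in terms of them, including recursive ones.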

In the same way, the transmitter could define an “unpack” function that accepts a block of encoded data from the transmission as input, and produces error corrected, decompressed data as output. This is similar to what low level functions do to read data off a storage device.
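
A rough sketch of what an “unpack” function might look like on our own hardware is shown below, using Python's standard zlib module. The block layout (compressed data followed by a 4-byte CRC-32) is an assumption made for illustration. Note that a checksum only detects damage, so this stands in for the error correction step without implementing a correcting code such as Reed-Solomon.

```python
import zlib


def pack(data):
    """Sender side: compress, then append a CRC-32 of the compressed bytes."""
    compressed = zlib.compress(data)
    return compressed + zlib.crc32(compressed).to_bytes(4, "big")


def unpack(block):
    """Receiver side: verify the checksum, then decompress.

    Returns the original bytes, or None if the block is damaged.
    """
    compressed, checksum = block[:-4], block[-4:]
    if zlib.crc32(compressed).to_bytes(4, "big") != checksum:
        return None                      # flag the segment as damaged
    return zlib.decompress(compressed)


original = b"2 + 2 = 4; " * 50           # verbose, repetitive plaintext
block = pack(original)
damaged = bytes([block[0] ^ 0x01]) + block[1:]   # flip one bit in transit
print(len(original), len(block), unpack(damaged))
```

The repetitive input shrinks to a small fraction of its original size, and the single flipped bit is caught before any attempt is made to decompress the block.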

Lossless compression will significantly increase the information carrying capacity of the channel, and also allow for raw, unencoded data to be very verbose and repetitive to facilitate comprehension. Lossy compression algorithms can be applied to some media types to achieve order of magnitude improvements, with the caveat that some information is lost during encoding. Meanwhile, deinterleaving and forward error correction algorithms can ensure that most information is received intact, or at least that damaged segments can be detected and flagged. The technical and economic arguments for including programs in a transmission are so strong that it would be surprising if at least part of a transmission were not algorithmic in nature.

There are many ways a programming language can be defined. I chose to use a Python-based example as it is easy for us to read. Perhaps the sender will be similarly inclined to define the language at a higher level like this, and will assume the receiver can work out how to implement each operation in their hardware. On the other hand, they might describe a computing system at a lower level, for example by defining operations in terms of logic gates, which would enable them to precisely define how basic operations will be carried out.

Besides their practical utility in building a reliable communication link, programs open up whole other realms of communication with the receiver. Most importantly, they can interact with the user in real time, thereby mooting the issue of delays due to the speed of light. Even compact and relatively simple programs can explain a lot.

Let’s imagine that ET wants to describe the dynamics of their solar system. An easy way to do this is with a numerical simulation. This type of program simulates the gravitational interactions of N objects by summing up the gravitational forces acting on each object, stepping forward an increment of time to forecast where they will be, and then repeating this process ad infinitum. The program itself might only be a few kilobytes or tens of kilobytes in length since it just repeats a simple set of calculations many times. Additional information is required to initialize the simulation, probably on the order of about 50 bytes or 400 bits per object, enough to encode position and velocity in three dimensions at 64 bit accuracy. Simulating the orbits of the 1,000 most significant objects in the solar system would require less than 100 kilobytes for the program and its starting conditions. Not bad.
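
A minimal sketch of such a simulation appears below. It assumes simulation units in which the gravitational constant is 1 and uses a leapfrog (kick, drift, kick) time step, which is one common choice among several. Each object also needs a mass, which is one more value per object beyond the position and velocity counted above.

```python
import math
import struct

G = 1.0  # gravitational constant in simulation units (an assumption)

# Six 64-bit floats (x, y, z, vx, vy, vz) per object: 48 bytes, or 384 bits.
STATE_BYTES = struct.calcsize("<6d")


def accelerations(masses, positions):
    """Sum the gravitational pull of every other body on each body."""
    acc = [[0.0, 0.0, 0.0] for _ in masses]
    for i in range(len(masses)):
        for j in range(len(masses)):
            if i == j:
                continue
            delta = [positions[j][k] - positions[i][k] for k in range(3)]
            dist = math.sqrt(sum(c * c for c in delta))
            for k in range(3):
                acc[i][k] += G * masses[j] * delta[k] / dist ** 3
    return acc


def step(masses, positions, velocities, dt):
    """Advance the system by one time increment (kick, drift, kick)."""
    acc = accelerations(masses, positions)
    for i in range(len(masses)):
        for k in range(3):
            velocities[i][k] += 0.5 * dt * acc[i][k]
            positions[i][k] += dt * velocities[i][k]
    acc = accelerations(masses, positions)
    for i in range(len(masses)):
        for k in range(3):
            velocities[i][k] += 0.5 * dt * acc[i][k]


# A star at the origin and a light planet on a near-circular orbit.
masses = [1.0, 1e-6]
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
velocities = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
for _ in range(1000):
    step(masses, positions, velocities, dt=0.001)
separation = math.dist(positions[0], positions[1])
print(STATE_BYTES, separation)
```

The whole program is a few dozen lines, and adding more objects only lengthens the lists of masses, positions, and velocities, which is why the starting conditions dominate the size of the transmission.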

This is just scratching the surface of what can be done with programs. Their degree of sophistication is really only limited by the creativity of the sender, who we can probably assume has a lot more experience with computing than we do. We are just now exploring new approaches to machine learning, and have already succeeded at creating narrow AIs that exceed human capabilities in specialized tasks. We don’t know yet if generally intelligent systems are possible to build, but an advanced civilization that has had eons to explore this might have hit on ways to build AIs that are better and more computationally efficient than our state of the art. If that’s the case, it’s possible the transmission itself may be a form of intelligence.

How would we go about parsing this type of information, and who would be involved? Unlike the signal detection effort, which is the province of a small number of astronomers and subject experts, the process of analyzing and comprehending the contents of the transmission will be open to anyone with an Internet connection and a hypothesis to test. One of the interesting things about programming languages is that many of the most popular languages were created by individuals, like Guido van Rossum, the creator of Python, or by small teams working within larger companies. The implication is that the most important contributions may come from people and small teams who are not involved in SETI at all.

For an example of a fully worked out system, Paul Fitzpatrick, then with the MIT CSAIL lab, created Cosmic OS, which details the ideas explored in this article and more. With Cosmic OS, he builds a Turing complete programming language that is based on just four basic symbols: 0 and 1, plus the equivalent of open and close parentheses.

There are risks and ethical considerations to ponder as well. In terms of risk, we may be able to run programs but not understand their inner workings or purpose. Already this is a problem with narrow AIs we have built. They learn from sets of examples instead of scripted instructions. Because of this they behave like black boxes. This poses a problem because an outside observer has no way of predicting how the AI will respond to different scenarios (one reason I don’t trust the autopilot on my Tesla car). In the case of a generally intelligent AI of extraterrestrial provenance, it goes without saying that we should be cautious about where we allow it to run.

There are ethical considerations as well. Suppose the transmission includes generally intelligent programs. Should they be considered a form of artificial life or consciousness? How would we know for sure? Should terminating their operation be considered the equivalent of murder, or something else? This idea may seem far-fetched, but it is worthwhile to think about issues like this before a detection event.