A New Look at Asteroid Distribution

We know that understanding Near-Earth Objects is vital not only for assessing future asteroid surveys and spacecraft missions, but also for tracking potential impactors on Earth. Projects like the Catalina Sky Survey and its now defunct southern hemisphere counterpart, the Siding Spring Survey, are all about asteroid and comet discovery, with a more specific goal of looking for objects posing a potential hazard to our planet. We lost the Siding Spring effort in 2013 due to funding problems, but the Catalina Sky Survey (CSS) is still in robust operation.

The survey draws data from a 1.5 meter telescope on the peak of Mt. Lemmon (Arizona) and a 68 centimeter instrument nearby at Mt. Bigelow. Now we have word that Mikael Granvik (University of Helsinki) and an international team of researchers have drawn on about 100,000 images acquired by the Catalina Sky Survey to study the properties of some 9000 NEOs detected in an eight-year period. The goal is to construct a model for the population of NEOs, giving us better insights into their origins and subsequent trajectories.

Most Near-Earth Objects are thought to originate in the main asteroid belt between the orbits of Mars and Jupiter. An individual asteroid’s orbit can drift over time as the asteroid absorbs sunlight and re-radiates the heat unevenly from its surface, an effect named after the Russian engineer Ivan Osipovich Yarkovsky, who noted that even this tiny force could have cumulative effects on an asteroid’s orbit. The Estonian astronomer Ernst J. Öpik would subsequently bring the effect to the attention of the larger astronomical community after its initial publication by Yarkovsky.

Asteroid orbits can eventually be affected by the gas giants Jupiter and Saturn, changing their trajectory to push them into the inner system. From this we get our population of NEOs, so classified, according to this University of Hawaii at Manoa news release, when their smallest distance from the Sun during an orbit is less than 1.3 times the average Earth-Sun distance.


Image: Artist’s impression. An asteroid’s orbit is altered as it passes close to Jupiter, Earth or Venus, such that its new orbit takes it near the Sun. The intense heat from the Sun causes the asteroid’s surface to expand and fracture, and some of the material breaks off. As the surface material disintegrates, it creates dust and pebbles that spread out along the asteroid’s orbit with time. If the orbit of the dust and pebbles ever intersects Earth, it can create a meteor shower. Credit: Karen Teramura, UH IfA.

Most NEOs have been thought to eventually end their lives by plunging into the Sun, but now we learn that they may not make it that far. Using calculations developed by Robert Jedicke (University of Hawaii), the Near-Earth Object population modeling project was able to compute the probabilities that asteroids on different orbits would have been detected by the Catalina Sky Survey. Using CSS data and theoretical orbit distributions of NEOs originating in different parts of the main belt, they developed a more detailed model of the NEO population than any previously available.

The model, however, predicted almost ten times more objects on orbits that approach the Sun within ten solar diameters than have been observed. The discrepancy can be resolved by assuming that a large number of NEOs are destroyed as they move closer to the Sun. The asteroids do not fall into the Sun but are broken up as they approach. Darker asteroids are destroyed farther from the Sun than brighter ones, implying a different internal structure and composition. Granvik sees this last point as the most significant result of the research:

“Perhaps the most intriguing outcome of this study is that it is now possible to test models of asteroid interiors simply by keeping track of their orbits and sizes. This is truly remarkable and was completely unexpected when we first started constructing the new NEO model.”

The work also helps to explain meteor streams, which should follow the paths of the asteroids or comets that shed them; yet most meteor streams have not been matched with known parent objects. The Granvik study concludes that the parent bodies of these meteor streams were destroyed when they approached too close to the Sun.

The paper is Granvik et al., “Super-catastrophic Disruption of Asteroids at Small Perihelion Distances,” Nature 530 (18 February 2016), 303-306 (abstract).


Alien Life or Chemistry? A New Approach

Working in the field has its limitations, as Alex Tolley reminds us in the essay that follows, but at least biologists have historically been on the same planet with their specimens. Today’s hottest news would be the discovery of life on another world, as we saw in the brief flurries over the Viking results in 1976 or the Martian meteorite ALH84001. We rely, of course, on remote testing and will increasingly count on computer routines that can make the fine distinctions needed to choose between biotic and abiotic reactions. A new technique recently put forward by Robert Hazen and James Cleaves holds great promise. Alex gives it a thorough examination including running tests of his own to point to the validity of the approach. One day using such methods on Mars or an ice giant moon may confirm that abiogenesis is not restricted to Earth, a finding that would have huge ramifications not just for our science but also our philosophy.

by Alex Tolley


Perseverance rover on Mars – composite image.

Cast your mind back to Darwin’s distant 5-year voyage on HMS Beagle. He could make only limited observations, record drawings and notes, and preserve his specimen collection for the return home to England.

Fifty years ago, a field biologist might not have had much more to work with. Hours from a field station or lab with field guides and kits to preserve specimens, with no way to communicate. As for computers to make repetitive calculations, fuggedaboutit.

Fast forward to the late 20th and early 21st centuries, and fieldwork is extending out to the planets in our solar system to search for life. Like Darwin’s voyage, the missions are distant and long. Unlike Darwin, samples have not yet been returned from any planets, only from asteroids and comets. Communication is slow, with delays more reminiscent of old field expeditions. But instead of humans, our robot probes are “going where no one has gone before,” and humans may not follow until much later. The greater the communication lag, the more problematic the model of central command controlling the periphery. Working around this delay demands more autonomy at the periphery to make local decisions.

The 2006 Astrobiology Field Laboratory Science Steering Group report recommended that the Mars rover be a field laboratory with more autonomy [17]. The current state of the art is the Perseverance rover taking samples in Jezero crater, a prime site for possible biosignatures. Its biosignature instrument, SHERLOC, uses Raman spectroscopy and luminescence to detect and identify organic molecules [6]. While organic molecules may have been detected [19], the data had to be transmitted to Earth for interpretation, leaving in place the lag between choosing each sample and analyzing it.

As our technology improves, can robots operating on planetary surfaces do more effective in situ analyses, so that they can work more quickly, like a human field scientist, in the search for extant or extinct life?

While we “know life when we see it,” we still struggle to define what life is, although for terrestrial life we have a sufficient set of characteristics, except for edge cases like viruses and some ambiguous early fossil material. However, some defining characteristics do not apply to dead or fossilized organisms and their traces. Fossil life does not metabolize, reproduce, or move, and molecules that are common to life no longer exist in their original form. Consider the “fossil microbes” in the Martian meteorite ALH84001 that caused such a sensation when announced but proved ambiguous.

Historically, for fossil life, we have relied on detecting biosignatures, such as 13C/12C ratios in minerals (photosynthetic carbon fixation prefers the lighter isotope), long-lasting biomolecules like lipids, homochirality of organic compounds, and disequilibria in atmospheric gases. Biomolecules can be ambiguous, as the amino acids detected in meteorites are most likely abiotic, something the Miller-Urey experiment demonstrated many decades ago.

Ideally, we would like a detection method that is simple, robust, and whose results can be interpreted locally without requiring analysis on Earth.

A new method for identifying whether samples containing organic material are likely biotic is the subject of a new paper from a collaboration led by Robert Hazen and James Cleaves. The team uses pyrolysis gas chromatography coupled to electron impact ionization mass spectrometry (Pyr-GC-EI-MS) to heat the sample (pyrolysis), separate the volatile components (gas chromatography), and determine their masses (mass spectrometry), and then analyzes the resulting data with machine learning to classify whether new samples contain organic material of biological origin. Their reported early results are very encouraging [10, 11, 12].

The elegance of Hazen et al.’s work has been to apply the Pyr-GC-EI-MS technique [3, 15, 18], which is not only available in the laboratory but is also designed for planetary rovers, meeting the need for local analysis. Their innovation has been to couple this process with computationally lightweight machine learning models to classify the samples, thereby bypassing the time lags associated with distant terrestrial interpretation. A rover could relatively rapidly take samples in an area, determine whether any might carry a biosignature based on a suite of detected compounds, and make decisions locally on how to proceed.

The resulting data of masses and retention times can be reduced and then classified with a pre-trained Random Forest [4], a suite of Decision Trees (see Figure 3), each built from samples of the mass feature set. With the currently tested samples this yields better than a 90% probability of correct classification. The reported experiment used 134 samples, 75 labeled as abiotic and 59 as biotic or of biotic origin. The data set spanned masses from 50 to 700, with several thousand scans over time, and was reduced to a manageable 8147 values by coarsening the mass and time ranges. The samples were then run against several machine learning methods, of which the Random Forest worked best.
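
As a concrete illustration of this step, here is a minimal Python sketch of training and scoring a Random Forest on a reduced feature matrix. This is not the authors’ code; the array shapes follow the numbers above, the data is a random placeholder, and names like `X` and `y` are hypothetical.

```python
# Minimal sketch of the classification step, not the authors' pipeline.
# X stands in for the reduced Pyr-GC-EI-MS features (134 samples x 8147 bins);
# y holds the labels (0 = abiotic, 1 = biotic/natural).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((134, 8147))            # placeholder for binned mass/retention-time intensities
y = np.array([0] * 75 + [1] * 59)      # 75 abiotic, 59 biotic or of biotic origin

forest = RandomForestClassifier(n_estimators=500, random_state=0)
scores = cross_val_score(forest, X, y, cv=5)    # cross-validated accuracy estimate
print(f"mean accuracy: {scores.mean():.2f}")    # ~0.5 on random data; >90% reported on the real set

forest.fit(X, y)                        # final fit, reused for feature ranking below
```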

To provide a visualization of which mass and time values were most instrumental in classifying the data, the 20 most informative data points were extracted and overlaid on the MS data as shown in Figure 1.

Figure 1 above shows 2 samples of data. One is the Allende meteorite, which is classified as a 4.5-billion-year-old abiotic sample. This is contrasted with one of the microbial samples. While the details of the meteorite sample were not provided, older analyses by others indicated that the surface was contaminated with terrestrial material, whilst the interior matrix included polycyclic aromatic hydrocarbons, a common material found in samples from space missions [7,8]. The bacterial sample, as expected, shows many more compounds after pyrolysis, as the organism is composed of a large variety of organic compounds including amino acids, nucleobases, lipids, and sugars, which will decompose with heating. A key point is that the discriminant features are not coincident with the most common masses in the samples, but rather lie among the rarer compounds, as indicated by their intensities. [The lower bound of the mass bins ensures that common pyrolysis products such as low-carbon-number compounds are excluded from the analysis and visualization. The data is normalized to the highest peak so that relative rather than absolute values are analyzed, removing the effect of sample amounts.] Most of the defining compounds are in the 140 – 200 mass range, which, if they were pure carbon, would correspond to compounds of 12-16 carbon atoms.

Figure 2 shows a 2-dimensional Principal Components Analysis (PCA) using the 20 most informative features to illustrate the separation of the sample types. The expanded box encompasses all the abiotic samples.
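
A Figure-2-style view can be sketched the same way: rank features by the forest’s importances, keep the top 20, and project with PCA. This continues the hypothetical `X`, `y`, and fitted `forest` from the sketch above and is illustrative only, not the authors’ plotting code.

```python
# Sketch of a Figure-2-style projection: PCA on the 20 most informative features.
# Continues the hypothetical X, y, and fitted forest from the previous sketch.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

top20 = np.argsort(forest.feature_importances_)[-20:]   # 20 highest-importance mass/time bins
coords = PCA(n_components=2).fit_transform(
    StandardScaler().fit_transform(X[:, top20]))

plt.scatter(coords[y == 0, 0], coords[y == 0, 1], label="abiotic")
plt.scatter(coords[y == 1, 0], coords[y == 1, 1], label="biotic/natural")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.legend()
plt.show()
```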

I note that even though the biotic and natural samples were given the same classification label, these samples separate quite clearly too, indicating that the natural samples appear almost distinct from the biotic samples. What is perhaps surprising is that biological materials like cedarwood (oils derived from the bark) cluster with the abiotic samples, and even cyanobacteria seem similar in this view. Notice that the dried oak leaf, clearly a degraded living material, is remarkably similar to a cysteine (amino acid) and glucose Maillard reaction (used in the searing of foods to create flavors). A number of the natural materials that were classified as of biological origin or containing material of biological origin, also cluster closely with the abiotic samples, such as Quisqueite and Asphaltum. The peat sample (labeled natural) is placed between the bulk of both biological and natural samples.

Why should this technique work to classify samples according to the type labels? It has been argued that living things are complex, yet composed of molecules that occupy a relatively small space of possible diversity. [Work by Lee Cronin’s group has looked at the way biology restricts the possible structures of organic molecules to create complex macromolecules from few subunits. For example, the amino acid glycine is both an important constituent of proteins, forming much of the structure of collagen, and central to several biosynthesis pathways, including the synthesis of porphyrins and thence heme in red blood corpuscles. Some macromolecules such as cellulose are formed entirely of D-glucose, as are many complex sugar macromolecules. Cronin calls his technique Assembly Theory [1].]

But larger molecules constructed of a small number of simpler molecules are not by themselves sufficient. Cellulose is a polymer of D-glucose molecules, but clearly we would not state that a sheet of wet paper was once living, or formed by natural processes. A minimal complexity is required. Life relies on a suite of molecules connected by metabolic pathways that exquisitely restrict the possible number of resulting molecules, however complex; proteins, for example, are constructed from just 20 of a vastly larger number of possible amino acids. At the heart of all life is the Krebs cycle, which autotrophs can run in the reverse, reductive direction as part of carbon fixation to build biomass, often glucose to build cellulose cell walls.

The Pyr-GC-EI-MS technique detects a wide range of organic molecules, but the machine learning algorithm uses a set of specific ones to detect the requisite complexity as well as the abiotic randomness. In other words, this is complementary to Cronin’s “Assembly Theory” of life.

I would note that the PCA uses just 20 variables to separate the abiotic and biotic/natural samples. This appears adequate for the majority of the sample set but may be fewer than the number of variables used in the Random Forest machine learning algorithm. [A single Decision Tree using my reduced data uses just 12 rules (masses and normalized frequencies), but its accuracy is far lower. The Random Forest, with each tree using different rules (masses and quantities), would be expected to use more features.]

How robust is this analysis?

The laboratory instrument generates a large amount of data for each sample: more than 650 mass readings repeated across some 6000 scans over the scan time. The data was reduced for testing, in this case to 8149 values. There were 134 samples, 59 classed as biotic or natural and 75 as abiotic. A Random Forest (a suite of Decision Trees) algorithm proved the best method to classify the samples, resulting in a 90+% correct classification of the sample types. The PCA visualization in Figure 2 is instructive as it shows how the samples were likely classified by the Random Forest model, and which samples were likely misclassified. The PCA used just 20 of the highest-scoring variables to separate the 2 classes of samples.

Generally, the Pyr-GC-EI-MS technique is considered robust with respect to masses extracted from different samples of the same material. The authors included replicates in the samples which should, ideally, be classified together in the same leaf in each Decision Tree in the Random Forest. That this is the case in this experiment is hinted by the few labels that point to 2 samples that are close together in the PCA shown in Figure 2, e.g. the cysteine-glucose Maillard reaction. That replicates are very similar is important as it indicates that the sample processing technique reliably produces the same output and therefore single samples are producing reliable mass and time signals with low noise. [In my experiment (see Appendix A) where K-means clustering was used, in most cases, the replicate pairs were collected together in the same cluster indicating that no special data treatment was needed to keep the replicates together.]

The pyrolysis of the samples transforms many of the compounds, often producing more species than the original material contained. For example, cellulose, composed purely of D-glucose, will pyrolyze into several different compounds [18]. The assumption is that pyrolysis will preserve the differences between the biotic and abiotic samples, especially for material that has already undergone heating, such as coal. Because the pyrolysis products in the mass range of 50 to 200 may no longer be the same as the original compounds, this technique can be applied to any sample containing organic material.

The robustness of the machine learning approach can be assessed by the distribution of the accuracy of the individual runs of the Random Forest. This is not indicated in the article. However, the high accuracy rate reported does suggest that the technique will report this level of accuracy consistently. What is not known is whether this existing trained model would continue to classify new samples accurately. This will also indicate the likely boundary conditions where this model works and whether retraining will be needed after the sample set is increased. This will be particularly important when assessing the nature of any confirmed extraterrestrial organic material that is materially different from that recovered from meteorites.

The robustness may be dependent on the labeling used to train the Random Forest model. The sample set labels RNA and DNA as abiotic because they were sourced from a laboratory supply, while the lower-complexity insect chitin exoskeleton was labeled biotic. But note that the chitin sample lies within the abiotic bounding box in Figure 2, as does the DNA sample.

Detecting life from samples that are fossils, degraded material, or parts of an organism like a skeletal structure, probably requires being able to look for both complexity and material that is composed of fewer, simpler subunits. In extremis, a sample with few organic molecules even after pyrolysis will likely not be complex enough to be identified as biotic (e.g. the meteorite samples), while a large range of organic molecules may be too varied and indicate abiotic production (e.g. Maillard reactions caused by heating). There will be intermediate cases, such as the chitinous exoskeleton of an insect that has relatively low molecular complexity but which the label defines as biotic.

What is important here is that while it might be instructive to know what the feature molecules are, and their likely pre-heated composition, the method does not rely on anything more than the mass and peak appearance time of the signal to classify the material.

Why does the Random Forest algorithm work well, exceeding the performance of a single Decision Tree or a 2-layer Perceptron [a component of neural networks used in binary classification tasks]? A single Decision Tree requires that the set of features have a strong common overlap for all samples in the class. The greater the overlap, the fewer rules are needed. However, a single Decision Tree model is brittle in the face of noise. The Random Forest overcomes this by using different subsets of the features to build each tree in the forest. With noisy data this builds robustness, as the predicted classification is based on a majority vote. (See Appendix A for a brief discussion on this.)
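
A toy comparison makes the point. This uses synthetic data generated by scikit-learn, not the paper’s spectra, with a few percent of labels deliberately flipped to stand in for noise.

```python
# Toy robustness comparison: a single tree vs. an ensemble vote on noisy data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 134 samples, 500 features of which only 20 are informative, 5% of labels flipped
X, y = make_classification(n_samples=134, n_features=500, n_informative=20,
                           flip_y=0.05, random_state=0)

for model in (DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(n_estimators=300, random_state=0)):
    acc = cross_val_score(model, X, y, cv=5).mean()
    # The forest's majority vote usually degrades more gracefully than the single tree.
    print(f"{type(model).__name__}: {acc:.2f}")
```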

Is this technique agnostic?

Now let me address the important issue of whether this approach is agnostic to different biologies, as this is the crux of whether the experimental results will detect not just life, but extraterrestrial life. Will this approach address the possibly very different biologies of life evolved from a different biogenesis?

Astrobiology, a subject with no examples, is currently theoretical. There is almost an industry trying to provide tests for alien life. Perhaps the most famous example is the use of the disequilibrium of atmospheric gases, proposed by James Lovelock. The idea is that life, especially autotrophs like plants on Earth, will create an imbalance in reactive gases such as oxygen and methane, keeping them far from chemical equilibrium. This idea has since been bracketed with constraints and additional gases, but the basic idea remains a principal approach for exoplanets where only atmospheric gas spectra can be measured.

Life is hypothesized to require a complex set of molecules, yet far fewer than a random set of all possible molecules, or, as Cronin has suggested, to reuse molecules to reduce the complexity of building large macromolecules. It is therefore possible that there could be fossil life, either terrestrial or extraterrestrial, with the same apparent complexity but largely non-overlapping molecules. The Random Forest could then build Decision Trees that select different sets of molecules yet make the same biotic classification, suggesting that this is an agnostic method. However, this has yet to be tested, as there are no extraterrestrial biotic samples to test. It may require such samples, if found and characterized as biotic, to be added to a new training set should they not be classified as biotic using the current model.

Because this experiment assumes that life is carbon-based, truly exotic life based on other key elements such as silicon would be unlikely, though not impossible, to detect, and only then if the volatile non-organic materials in a sample could be classified correctly.

The authors explain what “agnostic” means in their experiment:

Our Proposed Biosignature is Agnostic. An important finding of this study is that abiotic, living, and taphonomic suites of organic molecules display well-defined clusters in their high-dimensional space, as illustrated in Fig. 2. At the same time, large “volumes” of this attribute space are unpopulated by either abiotic suites or terrestrial life. This topology suggests the possibility that an alien biochemistry might be recognized by forming its own attribute cluster in a different region of Fig. 2—a cluster that reflects the essential role in selection for function in biotic systems, albeit with potentially very different suites of functional molecules. Abiotic systems tend to cluster in a very narrow region of this phase space, which could in principle allow for easy identification of anomalous signals that are dissimilar to abiotic geochemical systems or known terrestrial life.

What they are stating is that their approach will detect the signs of life both in extant organisms and in the decayed and fossilized remains of organisms, such as shales and fossil fuels like coal and oil. As the example PCA of Figure 2 shows, the abiotic samples are tightly clustered in a small space compared to the far greater space of the biotic and once-biotic samples. The authors’ Figure 1 shows that their chosen method finds fewer different molecules in the Allende meteorite than in a microbe. I note that the dried oak leaf that also sits within the abiotic cluster of the PCA visualization is possibly there because the bulk of the material is cellulose. Cellulose is made of chains of polymerized D-glucose, and while the pyrolysis of cellulose creates a wider assortment of organic compounds [18], this still limits the possible pyrolysis products.

This analysis is complementary to Cronin’s Assembly Theory which theorizes a reduced molecular space of life compared to the randomness and greater complexity of purely chemical and physical processes. This is because life constrains its biochemistry to enzyme-mediated reaction pathways. Assembly Theory [1] and other complexity theories of life [15] would be expected to reduce the molecular space compared to the possible arrangements of all the atoms in an organism.

The authors’ method is probably detecting the greater space of molecules from the required complexity of life compared to the simpler samples and reactions that were labeled as abiotic.

For any extraterrestrial “carbon units” that follow the same organizing principles, this method may well detect life from a separate abiogenesis, whether extant or fossilized. However, I would be cautious of this claim simply because no biotic extraterrestrial samples were used; we have none, only presumed abiotic samples such as the organic material inside meteorites, which should not be contaminated with terrestrial life.

The authors suggest that an alien biology using very different biological molecules might form its own discrete cluster and therefore be detectable. In principle this is true, but I am not sure that the Random Forest machine learning model would detect the attributes of this cluster without training examples to define the rules needed. Any such samples might simply expose brittleness in the model and either cause an error or be classified as a false positive for either a biotic or an abiotic sample. Ideally, an anomalous classification would prompt the response Asimov once associated with interesting discoveries: not “Eureka!” but “That’s funny…”. This might be particularly noticeable if the technique indicates that the sample is abiotic while a direct observation by microscope clearly shows wriggling microbes.

In summary, the method has yet to be tested against new, unknown samples to confirm whether it is both robust and agnostic for other carbon-based life.

The advantage of this technique for remote probes

While the instrument data would likely be sent to Earth regardless of local processing and any subsequent rover actions, the trained Random Forest model is computationally very lightweight and easy to run on the data. Inspection of the various Decision Trees in the Random Forest provides an explanation of which features best classify the samples. As larger sample sets become available, it is easy to update the model for analyzing samples in the lab or on a remote robotic instrument, in contrast to Artificial Neural Network (ANN) architectures, which are computationally intensive. Should a sample look like it could be alien life but produce an anomalous result (“That’s funny…”), the data can be analyzed on Earth, assigned a classification, and the Random Forest model retrained with the new data, either on Earth with the model then uploaded, or locally on the probe.
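
The retrain-on-Earth, run-on-the-probe loop described above might look something like the following sketch. The file name, arrays, and the choice of joblib for serialization are my assumptions, not anything specified in the paper.

```python
# Hypothetical update path: retrain with an enlarged sample set on Earth,
# serialize the small model artifact, and run only the cheap predict step remotely.
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X_new = rng.random((150, 8147))                       # placeholder enlarged training set
y_new = np.r_[np.zeros(80, int), np.ones(70, int)]    # placeholder labels after re-curation

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_new, y_new)
joblib.dump(forest, "biosig_forest.joblib")           # compact artifact to uplink

# On the probe: load once, then each new spectrum costs only a set of threshold tests.
onboard = joblib.load("biosig_forest.joblib")
spectrum = rng.random((1, 8147))                      # placeholder for a freshly reduced sample
print(onboard.predict(spectrum))                      # 0 = abiotic, 1 = biotic/natural
```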

Let me stress again that the instrumentation needed is already available for life-detection missions on robotic probes. The most recent is the Mars Organic Molecule Analyzer (MOMA) [9], one of the suite of instruments on the Rosalind Franklin rover, part of the delayed ExoMars mission now planned for a 2028 launch. MOMA will use both the Pyr-GC-EI-MS sample processing approach and a UV laser on the organic material extracted from 2-meter subsurface drill cores to characterize the material. I would speculate that it might make sense to calibrate the sample set on the MOMA instruments to determine whether the approach is as robust with that hardware as with the lab equipment used in this study. The sample set could be increased, run on the MOMA instruments, and finalized well before the launch date.

[If the Morningstar Mission to Venus does detect organic material in the temperate Venusian clouds, perhaps in 2025, this type of analysis using instruments flown on a subsequent balloon mission might offer the fastest way to determine if that material is from a life form before any later sample return.]

While this is an exciting, innovative approach to analyzing organic molecules and classifying them as biotic or abiotic, it is not the only approach and should be considered complementary. For example, terrestrial fossils may be completely mineralized, with their form indicating their origin. A low-complexity fragment of an insect’s exoskeleton would have a form indicative of biotic origin. The dried oak leaf in the experiment that clusters with the abiotic samples would leave an impression in sediment indicative of life, just as we see occasionally in coal seams. Impressions left by soft-bodied creatures that have completely decayed would not be detectable by this method even though their shape may be obviously that of an organism. [Although note that shape alone was insufficient for determining the nature of the “fossils” in the Martian meteorite ALH84001.]

Earlier, I mentioned that the cellulose of paper represents an example with low complexity compared to an organism. However, if a robot probe detected a fragment of paper buried in a Martian sediment, we would have little hesitation in identifying it as a technosignature. Similarly, a stone structure on Mars might have no organic material in its composition but clearly would be identified as an artifact built by intelligent beings.

Lastly, isotopic composition of elements can be indicative of origin when compared to the planetary background isotopic ratios. If we detected methane (CH4) with isotope ratios indicative of production by subsurface methanogens, that would be an important discovery, one that would be independent of this experimental approach.

Despite my caveats and cautions, local life detection, rather like the attempts with the 1976 Viking landers, may be particularly important now that the Mars Sample Return mission costs are ballooning and may result in a cancellation, stymying the return to Earth of the samples Perseverance is collecting [16]. One of the major benefits of training the Apollo astronauts in geology was that the astronauts could decide locally which rock samples were important to collect, rather than taking random samples and hoping the selection was informative. A mission to an icy moon would benefit from such local life detection if multiple attempts need to be made in fairly rapid succession, without the communication delays of having Earth do the analysis and decision-making, and where no sample return to Earth is likely. This innovative technique appears to be an important contribution to the search for extraterrestrial life in our system, and possibly even beyond if our probes capture samples from interstellar objects.

The paper is Cleaves et al., “A robust, agnostic molecular biosignature based on machine learning,” PNAS 120 (41) (September 25, 2023), e2307149120 (abstract).

———————————————————————

Appendix A. My experiment with the supplied data. [12]

Method

To test some of the feedback from the authors, I ran some simple machine-learning experiments on the data. Rather than reduce the data to the number of variables used in the paper, I used a simpler reduction, collapsing the scan-time dimension so that only single values per mass remained. I normalized each sample so that its largest mass value was set to 100, and all normalized floating-point numbers were reduced to integers; all resulting values of less than 1 were therefore set to 0. I used the classification labels as given. I also shuffled the class labels to confirm that this destroyed the information in the data. I used the Weka ML software package for running Decision Trees, Random Forests, and other ML methods [20].
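
A rough Python equivalent of this reduction step follows. It is one reading of the description above (the collapse here is a sum over scans), with a random array standing in for a real sample; the actual classification was run in Weka.

```python
# Rough equivalent of the data reduction described above (illustrative only).
import numpy as np

def collapse_sample(raw):
    """Collapse the scan-time dimension, normalize so the largest value is 100,
    and truncate to integers (anything below 1 becomes 0)."""
    per_mass = raw.sum(axis=0)                      # one value per mass channel
    scaled = 100.0 * per_mass / per_mass.max()      # largest mass value set to 100
    return scaled.astype(int)

raw = np.random.default_rng(3).random((6000, 650))  # stand-in for ~6000 scans x ~650 masses
features = collapse_sample(raw)                     # the reduced vector used for training

# The shuffled-label control simply permutes the class labels before training;
# accuracy should then fall to chance if the classifier was using real structure.
```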

Results and Discussion

Using the example I ran (Figure 3), it is clear that the presence of a molecule (or molecules) of mass 280 is sufficient to classify 14 of the 59 biological samples with no other rules needed; if that rule fails, a rule testing for the presence of a molecule of about half that mass adds a further 8 samples correctly classified as biological. However, it takes a further 6 rules to classify another 22 biological samples, and 7 rules to select 48 (1 sample was a false positive) of the 75 abiotic samples. The rules mostly relied on larger molecules to determine the classifications because these had the most discriminatory power, as suggested by the number of larger molecules among the 20 used in the PCA visualization. Of the 12 rules in my experiment, all but 3 used masses of 100 or greater, with 3 rules at 200 or greater. It should be noted that many rules simply needed the presence or absence (less than 1% of the peak frequency) of a molecule. The 2 largest biotic and abiotic leaves each required 7 rules, about half of which required some non-zero value. The biotic leaf with 22 samples had just 3 rules requiring peak values to be present, while the abiotic leaf with 49 classified samples had all 7 rules requiring no peak value or values below a threshold.

Figure 3. The model for a Decision Tree output for a reduced, collapsed set of data. It shows the rule tree of different normalized mass frequencies used to classify abiotic [A] and biotic/natural [B] samples as leaves. There are 134 samples; for training, all the samples were used, 75 classed as abiotic and 59 as biotic/natural. [The few misclassified samples were excluded for simplicity and clarity.] As all samples were used, there was no out-of-sample testing of the model.

The best classifier was the Random Forest, as found by the authors. This far exceeded a single Decision Tree, and it even exceeded a 2-layer Perceptron. The Random Forest managed to reach a little more than 80% correct classification, which fell to random with the shuffled data. While the results using this more heavily reduced data were less accurate than those of the paper, that is to be expected given the cruder data reduction.

To test whether the data contained sufficient information to separate the 2 classes by clustering alone, I ran a K-means clustering [14] to determine how the data separated (a minimal sketch of this step follows the list below).

1. The 2 clusters each contained about 60% of one class. So while the separation was poor, there was some separation using all the data. Shuffling the labels destroyed this association, as it did in the Decision Tree and Random Forest tests.

2. The replicate pairs almost invariably stayed in the same cluster together, confirming the robustness of the data.

3. The natural samples, i.e. those with a biogenic origin, like coal, tended mostly to cluster with the abiogenic samples, rather than the biotic ones.
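
The clustering check referred to above amounts to a few lines. This sketch uses a random stand-in for the reduced data and simply measures how well two K-means clusters line up with the class labels.

```python
# Minimal sketch of the K-means separation check (random stand-in data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
X = rng.random((134, 650))                          # placeholder reduced feature matrix
y = np.r_[np.zeros(75, int), np.ones(59, int)]      # abiotic vs. biotic/natural labels

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Cluster numbering is arbitrary, so score agreement either way around.
agreement = max(np.mean(clusters == y), np.mean(clusters != y))
print(f"cluster/label agreement: {agreement:.2f}")  # ~0.5 here; ~60% reported for the real data
```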

I would point out that the PCA in Figure 2 was interpreted to mean that the abiotic samples cluster tightly together. However, an alternative interpretation is that the abiotic and natural samples separate from the biotic ones if the dividing line is drawn diagonally, splitting the biotic samples from all the rest.

One labeling question I have concerns placing the commercially supplied DNA and RNA samples in the abiotic class. If we detected either as [degraded] samples on another world, we would almost certainly claim that we had detected life once the possibility of contamination was ruled out. Switching these labels made very little difference to my Random Forest classification overall, though it did switch more samples to the biotic class than the 2 relabeled samples alone would account for. It did make a difference for a simpler Decision Tree: it increased the correct classifications (92 to 97 of 134), mostly by reducing the misclassification of abiotic samples as biotic (23 to 16). The cost of this improvement was 2 extra nodes and 1 leaf in the Decision Tree.

The poor results of the 2-layer Perceptron indicate that the nested rules used in the Decision Trees are needed to classify the data. Perceptrons are 2-layer artificial neural networks (ANNs) that have an input and an output layer but no hidden layers. Perceptrons are known to fail the exclusive-OR (XOR) test, although the example Decision Tree in Figure 3 does not appear to require any such combination of variables. A multilayer neural net with at least 1 hidden layer would be needed to match the results of the Random Forest.
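
The XOR limitation is easy to demonstrate. This toy example is standard textbook material, not anything from the paper or my Weka runs.

```python
# A linear perceptron cannot fit XOR; a small decision tree (nested rules) can.
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                               # XOR labels

print(Perceptron(max_iter=1000).fit(X, y).score(X, y))   # at best 0.75: not linearly separable
print(DecisionTreeClassifier().fit(X, y).score(X, y))    # 1.0: two nested splits suffice
```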

In conclusion, my results show that even with a dimensionally reduced data set, the data contains enough information in total to allow a weak separation of the 2 classification labels, and that the Random Forest is the best classifier of the many available in the Weka ML software package.

References

1. Assembly Theory (AT) – A New Approach to Detecting Extraterrestrial Life Unrecognizable by Present Technologies www.centauri-dreams.org/2023/05/16/assembly-theory-at-a-new-approach-to-detecting-extraterrestrial-life-unrecognizable-by-present-technologies/

2. Venus Life Finder: Scooping Big Science
www.centauri-dreams.org/2022/06/03/venus-life-finder-scooping-big-science/

3. Pyrolysis–Gas Chromatography–Mass Spectrometry en.wikipedia.org/wiki/Pyrolysis%E2%80%93gas_chromatography%E2%80%93mass_spectrometry

4. Random Forest en.wikipedia.org/wiki/Random_forest accessed 10/05/2023/

5. PCA “Principal Component Analysis” en.wikipedia.org/wiki/Principal_component_analysis accessed 10/05/2023

6. SHERLOC “Scanning Habitable Environments with Raman and Luminescence for Organics and Chemicals” en.wikipedia.org/wiki/Scanning_Habitable_Environments_with_Raman_and_Luminescence_for_Organics_and_Chemicals accessed 10/06/2023

7. Han, J et al, Organic Analysis on the Pueblito de Allende Meteorite Nature 222, 364–365 (1969). doi.org/10.1038/222364a0

8. Zenobi, R et al, Spatially Resolved Organic Analysis of the Allende Meteorite. Science, 24 Nov 1989 Vol 246, Issue 4933 pp. 1026-1029 doi.org/10.1126/science.246.4933.1026

9. Goesmann, F et al The Mars Organic Molecule Analyzer (MOMA) Instrument: Characterization of Organic Material in Martian Sediments. Astrobiology. 2017 Jul 1; 17(6-7): 655–685.
Published online 2017 Jul 1. doi: 10.1089/ast.2016.1551

10. Cleaves, J et al, Hazen, R, A robust, agnostic molecular biosignature based on machine Learning, PNAS September 25, 2023, 120 (41) e2307149120
doi.org/10.1073/pnas.2307149120

11. __ Supporting information. www.pnas.org/action/downloadSupplement?doi=10.1073%2Fpnas.2307149120&file=pnas.2307149120.sapp.pdf

12. __ Mass Spectroscopy data: osf.io/ubgwt

13. Gold, T. The Deep Hot Biosphere: The Myth of Fossil Fuels. Springer Science and Business Media, 2001.

14. K-means clustering en.wikipedia.org/wiki/K-means_clustering

15. Chou, L et al Planetary Mass Spectrometry for Agnostic Life Detection in the Solar System Front. Astron. Space Sci., 07 October 2021 Sec. Astrobiology Volume 8 – 2021
doi.org/10.3389/fspas.2021.755100

16. “Nasa’s hunt for signs of life on Mars divides experts as mission costs rocket“ Web access 11/13/2023 www.theguardian.com/science/2023/nov/12/experts-split-over-nasa-mission-to-mars-costs-rocket

17. The Astrobiology Field Laboratory. September 26, 2006. Final report of the MEPAG Astrobiology Field Laboratory Science Steering Group (AFL-SSG). Web: mepag.jpl.nasa.gov/reports/AFL_SSG_WHITE_PAPER_v3.doc

18. Wang, Q., Song, H., Pan, S. et al. Initial pyrolysis mechanism and product formation of cellulose: An Experimental and Density functional theory(DFT) study. Sci Rep 10, 3626 (2020). https://doi.org/10.1038/s41598-020-60095-2

19. Sharma, S., Roppel, R.D., Murphy, A.E. et al. Diverse organic-mineral associations in Jezero crater, Mars. Nature 619, 724–732 (2023). https://doi.org/10.1038/s41586-023-06143-z

20. Weka 3: Machine Learning Software in Java https://www.cs.waikato.ac.nz/ml/weka/

Cometary Impacts: Looking for Life in the Right Places

If you had to choose, which planetary system would you gauge most likely to house a life-bearing planet: Proxima Centauri or TRAPPIST-1? The question is a bit loaded given that there are seven TRAPPIST-1 planets, hence a much higher chance for success there than in a system that (so far) has produced evidence for only two worlds. But there are other factors having to do with the delivery of prebiotic materials by comet, which is the subject of a new paper from Richard Anslow (Cambridge Institute of Astronomy). “It’s possible that the molecules that led to life on Earth came from comets,’’ Anslow reminds us, “so the same could be true for planets elsewhere in the galaxy.”

So let’s untangle this a bit. We don’t know whether comets are vital to the origin of life on Earth or any other world, and Anslow (working with Cambridge colleagues Amy Bonsor and Paul B. Rimmer) does not argue that they are. What their paper does is examine the environments most likely to be affected by cometary delivery of organics, which in turn could be useful as we begin to study exo-atmospheres for biosignatures. If we can narrow down the kinds of systems where cometary delivery is likely, then finding life signs preferentially in such systems, as opposed to systems without such mechanisms, would ultimately support the comet delivery model.

Image: Comets contain elements such as water, ammonia, methanol and carbon dioxide that could have supplied the raw materials that upon impact on early Earth would have yielded an abundant supply of energy to produce amino acids and jump start life. Credit: Lawrence Livermore National Laboratories.

The idea of delivering life-promoting materials by impacts is hardly new. We’ve learned from asteroid return samples like those from Ryugu and now Bennu that the inventory of prebiotic molecules is rich. We’ve also found intact amino acids in meteorite samples, showing that these materials can survive entry into the atmosphere. The authors point out that comets contain what they call ‘prebiotic feedstock molecules’ like hydrogen cyanide (HCN) along with basic amino acids, and some studies suggest that, compared with asteroids, comets have delivered two orders of magnitude more organic material than meteorites because of their high carbon content. But survival is the key, and that involves impact velocity upon arrival.

HCN is particularly useful to consider because its strong carbon-nitrogen bonds may make it more likely to survive the high temperatures of atmospheric entry. What Anslow and team call the ‘warm comet pond’ scenario demands a relatively soft landing. This is interesting, so let me quote the paper on it. A ‘soft landing’:

…excavates the impact point and forms a dirty pond from the cometary components. Climatic variations are thought to cause the episodic drying of these ponds, promoting the rapid polymerisation of constituent prebiotic molecules. It is thought this wet-dry cycling will effectively drive the required biogeochemical reactions crucial for RNA production on the early-Earth, and therefore play an important role in the initial emergence of life… Relatively high concentrations of prebiotic molecules are required for there to be sufficient polymerisation, and so this scenario still requires low-velocity impacts. Specific prebiotic molecules are more (or less) susceptible to thermal decomposition by virtue of their molecular structure, and so the inventory of molecules that can be effectively delivered to a planet is very sensitive to impact velocity.

The question that emerges, once we’ve examined the delivery of molecules like HCN, is what kind of stellar system is most likely to benefit from a cometary delivery mechanism? To address this, the authors construct an idealized planetary system with planets of equal mass that are equally spaced to study the minimum impact velocity that can emerge on the innermost habitable planet. They then use N-body simulations to model the necessary interactions between comets and planets in terms of their position and velocity through time. The snowline marks the boundary between rocky and volatile-rich materials in the disk, as below:

Image: This is Figure 1 from the paper. Caption: Schematic diagram of the idealised planetary system considered in this work with equally spaced planets (brown circles, semi-major axis ai ) scattering comets (small dark blue circles) from the snow-line. The blue region represents the volatile-rich region of the disc where comets occur, and the green region represents the habitable zone. Low velocity cometary impacts onto habitable planets will follow the lower arrows, which sketch the dynamically cold scattering between adjacent planets. The dynamically hot scattering as shown by the upper arrows, will result in high velocity impacts.

That equal spacing of planets is interesting. The authors call it a ‘peas in a pod’ system and note what other astronomers have observed, that “…individual exoplanet systems have much smaller dispersion in mass, radius, and orbital period in comparison to the system-to-system variation of the exoplanet population as a whole.” And indeed, tightly packed systems with equal and low-mass planets have been shown to be highly efficient at scattering comets into the inner system and hence into the habitable zone. Giant outer planets, it’s worth noting, may form in these tight systems, their effects helping to scatter comets inward, but they are not assumed in the authors’ model.

The impactor’s size and velocity tell the tale. What we’d like to see is a minimum impact velocity below 15 kilometers per second to ensure the survival of interesting prebiotic molecules like HCN. The simulations show that the impact velocity around stars like the Sun is reduced for lower mass planets, the effect being enhanced if there are planets in nearby orbits. Impact velocities drop even more for planets around low mass stars in tightly-packed systems (think TRAPPIST-1 again), where the comets tend to be delivered on low eccentricity orbits, a significant factor because impact speeds around low-mass stars are typically high. From the paper:

…the results of our N-body simulations demonstrate that the overall velocity distribution of impactors onto habitable planets is very sensitive to both the stellar-mass and planetary architecture, with the fraction of low-velocity impacts increasing significantly for planets around Solar-mass stars, and in tightly-packed systems. It will be these populations of exoplanets where cometary delivery of prebiotic molecules is most likely to be successful, with significant implications for the resulting prebiotic inventories due to the exponential decrease in survivability with impact velocity.
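
A back-of-envelope calculation, not the authors’ N-body treatment, shows why the encounter dynamics matter so much: the impact speed is roughly the quadrature sum of the planet’s escape velocity and the comet-planet encounter velocity, so only dynamically cold scattering keeps an Earth-mass target under the ~15 km/s survival threshold. The encounter speeds below are illustrative assumptions.

```python
# Rough impact-speed estimate: v_impact ~ sqrt(v_escape^2 + v_encounter^2).
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.97e24      # kg
R_EARTH = 6.371e6      # m

def impact_speed(m_planet, r_planet, v_encounter):
    v_esc = math.sqrt(2 * G * m_planet / r_planet)
    return math.sqrt(v_esc**2 + v_encounter**2)

# Illustrative encounter speeds for dynamically cold vs. hot scattering.
for v_enc in (3e3, 20e3):
    v = impact_speed(M_EARTH, R_EARTH, v_enc) / 1e3
    print(f"encounter {v_enc/1e3:.0f} km/s -> impact ~{v:.1f} km/s")
# ~11.6 km/s (below the 15 km/s threshold) vs. ~22.9 km/s (well above it)
```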

We learn from all this that we have to be attuned to the mass of the host star and the architecture of its planetary system to predict whether comets can effectively deliver prebiotic materials to worlds in the habitable zone. If this seems purely theoretical, consider that telescope time on future missions to study exoplanet atmospheres will be a precious commodity, and these factors may emerge as an important filter for observation. But we’ll also find out whether the correlations the authors have uncovered regarding lower mass, tightly packed planets are borne out by the presence of the biosignatures we are looking for. That may tell us whether comets are a significant factor for life’s emergence on distant worlds as well as our own.

The paper is Anslow, Bonsor & Rimmer. “Can comets deliver prebiotic molecules to rocky exoplanets?” Proceedings of the Royal Society A (2023). Full text. Thanks to my friend Antonio Tavani for the pointer to this work.

M-Dwarfs: The Asteroid Problem

I hadn’t intended to return to habitability around red dwarf stars quite this soon, but on Saturday I read a new paper from Anna Childs (Northwestern University) and Mario Livio (STScI), the gist of which is that a potential challenge to life on such worlds is the lack of stable asteroid belts. This would affect the ability to deliver asteroids to a planetary surface in the late stages of planet formation. I’m interested in this because it points to different planetary system architectures around M-dwarfs than we’re likely to find around other classes of star. What do observations show so far?

You’ll recall that last week we looked at M-dwarf planet habitability in the context of water delivery, again involving the question of early impacts. In that paper, Tadahiro Kimura and Masahiro Ikoma found a separate mechanism to produce the needed water enrichment, while Childs and Livio, working with Rebecca Martin (UNLV), ponder a different question. Their concern is that red dwarf planets would lack the kind of late impacts that produced a reducing atmosphere on Earth. On our planet, the iron cores of impactors would have reacted with water in the oceans, releasing hydrogen as the iron oxidized and making an atmosphere in which simple organic molecules could emerge.

If we do need this kind of impact to affect the atmosphere to produce life (and this is a big ‘if’), we have a problem with M-dwarfs, for delivering asteroids seems to require a giant planet outside the radius of the snowline to produce a stable asteroid belt.

Depending on the size of the M-dwarf, the snowline radius is found from roughly 0.2 to 1.2 AU, close enough that radial velocity surveys are likely to detect giant planets near but outside this distance. The transit method around such small stars is likewise productive, but we find no such giant planets in those M-dwarf systems where we currently have discovered probable habitable zone planets:

The Kepler detection limit is at orbital periods near 200 days due to the criterion that three transits need to be observed in order for a planet to be confirmed (Bryson et al. 2020). However, in the case of low signal-to-noise observations, two observed transits may suffice, which allows longer-period orbits to be detected. This was the case for Kepler-421 b, which has an orbital period of 704 days (Kipping et al. 2014). Furthermore, any undetected exterior giant planets would likely raise a detectable transit timing variation (TTV) signal on the inner planets (Agol et al. 2004). For these reasons, while the observations could be missing long-period giant planets, the lack of giant planets around low-mass stars that are not too far from the snow line is likely real.
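
For a sense of the distances involved in that snow-line figure, the snow line roughly tracks the square root of stellar luminosity. The 2.7 AU prefactor below is a commonly used Solar System calibration and the M-dwarf luminosities are illustrative, so treat this as a rough sketch rather than the Childs et al. calculation.

```python
# Rough snow-line scaling: r_snow ~ 2.7 AU * sqrt(L / L_sun); the prefactor is model-dependent.
import math

def snow_line_au(luminosity_solar):
    return 2.7 * math.sqrt(luminosity_solar)

for lum in (0.005, 0.02, 0.08):        # representative mid- to early-M-dwarf luminosities
    print(f"L = {lum} L_sun -> snow line ~ {snow_line_au(lum):.2f} AU")
# roughly 0.2-0.8 AU, in the same ballpark as the 0.2-1.2 AU range quoted above
```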

Image: A gas giant in orbit around a red dwarf star. How common is this scenario? We know that such planets can exist, but so far have never detected a gas giant outside the snowline around a system with a planet in the habitable zone. Credit: NASA, ESA and G. Bacon (STScI).

In the search for stable asteroid belts, what we are looking for is a giant planet beyond the snowline, with the asteroid belt inside its orbit, as well as an inner terrestrial system of planets. None of the currently observed planets in the habitable zone around M-dwarfs shows a giant planet in the right position to produce an asteroid belt. Which is not to say that such planets do not exist around M-dwarfs, but that we do not yet find any in systems where habitable zone planets occur. Let me quote the paper again:

By analyzing data from the Exoplanet Archive, we found that there are observed giant planets outside of the snow line radius around M dwarfs, and in fact the distribution peaks there. This, combined with observations of warm dust belts, suggests that asteroid belt formation may still be possible around M dwarfs. However, we found that in addition to a lower occurrence rate of giant planets around M dwarf stars, multiplanet systems that contain a giant planet are also less common around M dwarfs than around G-type stars. Lastly, we found a lack of hot and warm Jupiters around M dwarfs, relative to the K-, G-, and F-type stars, potentially indicating that giant planet formation and/or evolution does take separate pathways around M dwarfs.

Image: This is Figure 2 from the paper. Caption: Locations of the giant planets, r, normalized by the snow-line radius in the system, vs. the stellar mass. The point sizes in the top plot are proportional to the planet mass. Red dots indicate planets around M dwarf stars and blue dots indicate planets around FGK-type stars. The point sizes in the legend correspond to Jupiter-mass planets. The bottom plot shows normalized histograms of the giant planet locations for both single planet and multiplanet systems. The location of the snow line is marked by a black dashed vertical line. Credit: Childs et al.

The issues raised in this paper all point to how little we can say with confidence at this point. Are asteroid impacts really necessary for life to emerge? The question would quickly be resolved by finding biosignatures on an M-dwarf planet without a gas giant in the system, presuming no asteroid belt had formed by other methods. As one with a deep curiosity about M-dwarf planetary possibilities, I find this work intriguing because it points to different architectures around red dwarfs than other stars. It’s a difference we’ll explore as we begin to fill in the blanks by evaluating M-dwarf planets for early biosignature searches.

The paper is Childs et al., “Life on Exoplanets in the Habitable Zone of M Dwarfs?,” Astrophysical Journal Letters Vol. 937, No. 2 (4 October 2022), L42 (full text).


Hayabusa2: Multiple Paths for Analyzing an Asteroid

Ryugu is classified as a carbonaceous, or C-type asteroid, a class of objects thought to incorporate water-bearing minerals and organic compounds. Carbonaceous chondrites, the dark carbon-bearing meteorites found on Earth, are thought to originate in such asteroids, but it has been difficult if not impossible to determine the source of most individual meteorites.

Hence the significance of the Hayabusa2 mission. JAXA’s successful foray to Ryugu represents the first time we’ve been able to examine a sample of a C-type asteroid through direct collection at the site. Ralph Milliken is a planetary scientist at Brown University, where NASA maintains its Reflectance Experiment Laboratory (RELAB). The laboratory expects samples collected at Ryugu to arrive in short order. Milliken is interested in the history of water in the object:

“One of the things we’re trying to understand is the distribution of water in the early solar system, and how that water may have been delivered to Earth. Water-bearing asteroids are thought to have played a role in that, so by studying Ryugu up close and returning samples from it, we can better understand the abundance and history of water-bearing minerals on these kinds of asteroids.”

Milliken is also one of many co-authors on a new paper in Nature Astronomy looking at the thermal history of subsurface materials exposed on Ryugu. The paper examines data the spacecraft collected during its operations there, which can now be compared to the sample collection. We learn that the asteroid may not be as water-rich as originally thought, leading to various scenarios about how it might have lost its water. High on the list of possibilities is that this ‘rubble pile’ asteroid — essentially loose rock maintaining its shape because of gravity — dried after a collision or other disruption and subsequent reformation.

Image: Japan’s Hayabusa2 spacecraft snapped pictures of the asteroid Ryugu while flying alongside it two years ago. The spacecraft later returned rock samples from the asteroid to Earth. Credit: JAXA.

Bear in mind how Hayabusa2 proceeded with its sampling at Ryugu. During the 2019 rendezvous, the spacecraft fired a projectile into the asteroid’s surface that exposed the subsurface rock examined here. A near-infrared spectrometer was used to compare the water content of the surface with the material below, showing the two to be similar in water content. The authors see that as a clue that Ryugu’s parent body dried out, rather than the surface of Ryugu being dried out by the Sun, perhaps in a close solar pass earlier in its history.

In other words, heating by the Sun in one or more close solar passes would be likely to occur at the surface, without penetrating deep into the asteroid. What Hayabusa2’s spectrometer shows is that surface and sub-surface are both comparatively dry, which is an indication that it was the parent body of Ryugu, rather than an event happening to Ryugu itself, that produced this result.

Ahead for the Ryugu analysis is the need to study the size of the particles excavated from below the surface, which could play a role in how the spectrometer measurements are interpreted.

“The excavated material may have had a smaller grain size than what’s on the surface,” says Takahiro Hiroi, a senior research associate at Brown and another study co-author. “That grain size effect could make it appear darker and redder than its coarser counterpart on the surface. It’s hard to rule out that grain-size effect with remote sensing.”

The beauty of the successful sample return is that hypotheses about Ryugu’s past can now be evaluated through comparison of the remote sensing data and actual laboratory work. These are exciting times indeed for the scientists studying the extensive collection of asteroid debris Hayabusa2 brought back. We can expect significant papers on all this in 2021.

The paper is Kitazato et al., “Thermally altered subsurface material of asteroid (162173) Ryugu,” Nature Astronomy 4 January 2021 (abstract).
