Working in the field has its limitations, as Alex Tolley reminds us in the essay that follows, but at least biologists have historically been on the same planet with their specimens. Today’s hottest news would be the discovery of life on another world, as we saw in the brief flurries over the Viking results in 1976 or the Martian meteorite ALH84001. We rely, of course, on remote testing and will increasingly count on computer routines that can make the fine distinctions needed to choose between biotic and abiotic reactions. A new technique recently put forward by Robert Hazen and James Cleaves holds great promise. Alex gives it a thorough examination including running tests of his own to point to the validity of the approach. One day using such methods on Mars or an ice giant moon may confirm that abiogenesis is not restricted to Earth, a finding that would have huge ramifications not just for our science but also our philosophy.

by Alex Tolley

Perseverance rover on Mars – composite image.

Cast your mind back to Darwin’s distant 5-year voyage on HMS Beagle. He could make very limited observations, make drawings and notes, and preserve his specimen collection for his return home to England.

Fifty years ago, a field biologist might not have had much more to work with: hours from a field station or lab, with field guides and kits to preserve specimens, and with no way to communicate. As for computers to make repetitive calculations, fuggedaboutit.

Fast forward to the late 20th and early 21st centuries, and fieldwork is extending out to the planets in our solar system to search for life. Like Darwin’s voyage, the missions are distant and long. Unlike Darwin’s specimens, samples have not yet been returned from any planets, only asteroids and comets. Communication is slow, more on the order of field experiences. But instead of humans, our robot probes are “Going where no one has gone before” and humans may not go until much later. The greater the communication lag, the more problematic the central command to periphery control model. Coping with this delay demands more autonomy at the periphery to make local decisions.

The 2006 Astrobiology Field Laboratory Science Steering Group report recommended that the Mars rover be a field laboratory, with more autonomy [17]. The current state of the art is the Perseverance rover taking samples in the Jezero crater, a prime site for possible biosignatures. Its biosignature instrument, SHERLOC, uses Raman spectroscopy and luminescence to detect and identify organic molecules [6]. While organic molecules may have been detected [19], the data had to be transmitted to Earth for interpretation, maintaining the problem of lag times between each sample to be chosen and analyzed.

As our technology improves, can these robots operating on planetary surfaces do more effective in situ analyses in the search for extant or extinct life, so that they can operate more quickly, like a human field scientist?

While we “know life when we see it”, we still struggle to define what life is, although with terrestrial life we have sufficient characteristics except for edge cases like viruses and some ambiguous early fossil material. However, some defining characteristics do not apply to dead or fossilized organisms and their traces. Fossil life does not metabolize, reproduce, or move, and molecules that are common to life no longer exist in their original form. Consider the “fossil microbes” in the Martian meteorite ALH84001 that caused such a sensation when announced but proved ambiguous.

Historically, for fossil life, we have relied on detecting biosignatures, such as C13/C12 ratios in minerals (due to the preference of photosynthetic carbon fixation for the lighter isotope), long-lasting biomolecules like lipids, homochirality of organic compounds, and disequilibria in atmospheric gases. Biomolecules can be ambiguous, as the amino acids detected in meteorites are most likely abiotic, something the Miller-Urey experiment demonstrated many decades ago.

Ideally, we would like a detection method that is simple, robust, and whose results can be interpreted locally without requiring analysis on Earth.

A new method to try to identify the probable biotic nature of samples with organic material is the subject of a new paper from a collaboration under Prof. Robert Hazen and James Cleaves. The team not only uses an analytical method, pyrolysis gas chromatography coupled to electron impact ionization mass spectrometry (Pyr-GC-EI-MS), to heat the samples (pyrolysis), fractionate the volatile components (gas chromatography), and determine their mass (mass spectrometry), but also analyzes the data to classify whether the new samples contain organic material of biological origin. Their reported early results are very encouraging [10, 11, 12].

The elegance of Hazen et al’s work has been to apply the Pyr-GC-EI-MS technique [3, 15, 18] that is not only available in the laboratory, but is also designed for planetary rovers to meet the need for local analysis. Their innovation has been to couple this process with computationally lightweight machine learning models to classify the samples, thereby bypassing the time lags associated with distant terrestrial interpretation. A rover could relatively rapidly take samples in an area and determine whether any might have a biosignature based on a suite of different detected compounds and make decisions locally on how to proceed.

The resulting data of masses and extraction times can be reduced and then classified using the pre-trained Random Forest [4], a suite of Decision Trees (see Figure 3) that each use samples of the feature set of masses. With the currently tested samples, it provides a better than 90% probability of correct classification. The reported experiment used 134 samples, 75 labeled as abiotic and 59 as biotic or of biotic origin. The data set ranged in mass from 50 to 700 and covered several thousand scans over time. This data was reduced to a manageable size by reducing the mass and time ranges to 8147 values. The samples were then run against several machine learning methods, of which the Random Forest worked best.

To provide a visualization of which mass and time values were most instrumental in classifying the data, the 20 most informative data points were extracted and overlaid on the MS data as shown in Figure 1.

Figure 1 above shows 2 samples of data. One is the Allende meteorite which is classified as a 4.5 billion years old abiotic sample. This is contrasted with one of the microbial samples. While the details of the meteorite sample were not provided, older analyses by others indicated that the surface was contaminated with terrestrial material, whilst the interior matrix included polycyclic aromatic hydrocarbons, a common material found in samples from space missions [7,8]. The bacterial sample, as expected, shows many more compounds after pyrolysis, as the organism is composed of a large variety of organic compounds including amino acids, nucleobases, lipids, and sugars which will decompose with heating. A key point is that the discriminant features are not coincident with the most common masses in the samples, but rather in the rarer compounds as indicated by their intensities. [The lower bound mass bin ensures that common pyrolysis products such as low carbon number compounds will be excluded from the analysis and visualization. The data is normalized to the highest peak so that relative values rather than absolutes are analyzed to eliminate sample amounts.] Most of the defining compounds are in the 140 – 200 mass range, which would imply all-carbon compounds with 12-16 atoms.
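The carbon-count estimate at the end of the paragraph above is simple arithmetic, and a small sketch makes it explicit. It assumes a nominal carbon mass of 12 and, like the estimate in the text, ignores hydrogen and other atoms, so it gives an upper bound on carbon number rather than a real formula. The function name is my own for illustration.

```python
import math

CARBON_MASS = 12  # nominal mass of one carbon atom

def carbon_count_range(low_mass, high_mass):
    """Whole carbon counts whose all-carbon mass falls inside [low_mass, high_mass]."""
    smallest = math.ceil(low_mass / CARBON_MASS)
    largest = math.floor(high_mass / CARBON_MASS)
    return smallest, largest

# The 140 - 200 mass range quoted for most of the defining compounds.
print(carbon_count_range(140, 200))  # (12, 16)
```

Real pyrolysis products carry hydrogen and often oxygen or nitrogen, so the actual carbon counts would be somewhat lower than this bound.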

Figure 2 shows a 2-dimensional Principal Components Analysis (PCA) using the 20 most informative features that illustrate the separation of the sample types. The expanded box encompasses all the abiotic samples.
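For readers unfamiliar with PCA, the following is a minimal sketch of how a set of feature vectors can be projected onto 2 principal components. It is not the authors' code: it uses plain power iteration with deflation in place of a library routine, and every function name here is hypothetical. It assumes the data has at least 2 directions with non-zero variance.

```python
import math

def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Covariance matrix of rows that are already centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
            for i in range(d)]

def top_component(matrix, iterations=200):
    """Leading eigenvector and eigenvalue of a symmetric matrix (power iteration)."""
    d = len(matrix)
    v = [1.0] * d  # assumed start vector; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue

def pca_2d(rows):
    """Project rows onto their first 2 principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    v1, lam1 = top_component(cov)
    # Deflation: remove the first component, then find the next one.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)]
                for i in range(d)]
    v2, _ = top_component(deflated)
    points = [[sum(r[j] * v[j] for j in range(d)) for v in (v1, v2)]
              for r in centered]
    return points, (v1, v2)
```

Each sample becomes a pair of coordinates that can be plotted, which is the kind of view Figure 2 presents for the 20 most informative features.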

I note that even though the biotic and natural samples were given the same classification label, these samples separate quite clearly too, indicating that the natural samples appear almost distinct from the biotic samples. What is perhaps surprising is that biological materials like cedarwood (oils derived from the bark) cluster with the abiotic samples, and even cyanobacteria seem similar in this view. Notice that the dried oak leaf, clearly a degraded living material, is remarkably similar to a cysteine (amino acid) and glucose Maillard reaction (used in the searing of foods to create flavors). A number of the natural materials that were classified as of biological origin or containing material of biological origin, also cluster closely with the abiotic samples, such as Quisqueite and Asphaltum. The peat sample (labeled natural) is placed between the bulk of both biological and natural samples.

Why should this technique work to classify samples according to the type labels? It has been argued that living things are both complex and composed of molecules that occupy a relatively small space of possible diversity. [Work by Lee Cronin’s group has looked at the way biology restricts the possible structures of organic molecules to create complex macromolecules using few subunits. For example, the amino acid glycine is both important as a constituent of proteins, forming much of the structure of collagen, and is central to several biosynthesis pathways, that include the synthesis of porphyrins and thence to heme in red blood corpuscles. Some macromolecules such as cellulose are formed entirely of D-glucose, as are most complex sugar macromolecules. Cronin calls his technique Assembly Theory [1].]

But larger molecules constructed of a small number of simpler molecules alone are insufficient. Cellulose is a polymer of D-glucose molecules, but clearly, we would not state that a sheet of wet paper was once living, or formed by natural processes. A minimal complexity is required. Life relies on a suite of molecules connected by metabolic pathways that exquisitely restrict the possible number of resulting molecules, however complex, such as proteins that are constructed from just 20 of the possible much greater number of amino acids. At the heart of all life is the Krebs cycle which autotrophs use in the reverse direction to oxidation as part of carbon fixation to build biomass, often glucose to build cellulose cell walls.

The Pyr-GC-EI-MS technique detects a wide range of organic molecules, but the machine learning algorithm uses a set of specific ones to detect the requisite complexity as well as the abiotic randomness. In other words, this is complementary to Cronin’s “Assembly Theory” of life.

I would note that the PCA uses just 20 variables to separate the abiotic and biotic/natural samples. This appears adequate in the majority of the sample set but may be fewer than the variables used in the Random Forest machine learning algorithm. [A single Decision Tree using my reduced data uses just 12 rules (masses and normalized frequency), but the accuracy is far lower. The Random Forest, using different rules (masses and quantities), would be expected to use more features.]

How robust is this analysis?

The laboratory instrument generates a large amount of data for each sample, over 650 mass readings repeated over 6000 times over the scan time. For testing, the data was reduced, in this case to 8149 values. There were 134 samples, 59 were classed as biotic or natural, and 75 were abiotic samples. A Random Forest (a suite of Decision Trees) algorithm proved the best method to classify the samples. This resulted in a 90+% correct classification of the sample types. The PCA visualization in Figure 2 is instructive as it shows how the samples were likely classified by the Random Forest model, and which samples were likely misclassified. The PCA used just 20 of the highest-scoring variables to separate the 2 classes of samples.

Generally, the Pyr-GC-EI-MS technique is considered robust with respect to masses extracted from different samples of the same material. The authors included replicates in the samples which should, ideally, be classified together in the same leaf in each Decision Tree in the Random Forest. That this is the case in this experiment is hinted by the few labels that point to 2 samples that are close together in the PCA shown in Figure 2, e.g. the cysteine-glucose Maillard reaction. That replicates are very similar is important as it indicates that the sample processing technique reliably produces the same output and therefore single samples are producing reliable mass and time signals with low noise. [In my experiment (see Appendix A) where K-means clustering was used, in most cases, the replicate pairs were collected together in the same cluster indicating that no special data treatment was needed to keep the replicates together.]

The pyrolysis of the samples transforms many of the compounds, often with more species than the original. For example, cellulose composed purely of D-Glucose will pyrolyze into several different compounds [18]. The assumption is that pyrolysis will preserve the differences between the biotic and abiotic samples, especially for material that has already undergone heating, such as coal. As the pyrolysis products in the mass range of 50 to 200 may no longer be the same as the original compounds, this technique can be applied to any sample containing organic material.

The robustness of the machine learning approach can be assessed by the distribution of the accuracy of the individual runs of the Random Forest. This is not indicated in the article. However, the high accuracy rate reported does suggest that the technique will report this level of accuracy consistently. What is not known is whether this existing trained model would continue to classify new samples accurately. Testing with new samples would also indicate the likely boundary conditions where this model works and whether retraining will be needed after the sample set is increased. This will be particularly important when assessing the nature of any confirmed extraterrestrial organic material that is materially different from that recovered from meteorites.

The robustness may be dependent on the labeling to train the Random Forest model. The sample set labels RNA and DNA as abiotic because they were sourced from a laboratory supply, while the lower complexity insect chitin exoskeleton was labeled biotic. But note that the chitin sample is within the abiotic bounding box in Figure 2, as well as the DNA sample.

Detecting life from samples that are fossils, degraded material, or parts of an organism like a skeletal structure, probably requires being able to look for both complexity and material that is composed of fewer, simpler subunits. In extremis, a sample with few organic molecules even after pyrolysis will likely not be complex enough to be identified as biotic (e.g. the meteorite samples), while a large range of organic molecules may be too varied and indicate abiotic production (e.g. Maillard reactions caused by heating). There will be intermediate cases, such as the chitinous exoskeleton of an insect that has relatively low molecular complexity but which the label defines as biotic.

What is important here is that while it might be instructive to know what the feature molecules are, and their likely pre-heated composition, the method does not rely on anything more than the mass and peak appearance time of the signal to classify the material.

Why does the Random Forest algorithm work well, and outperform a single Decision Tree or a 2-layer Perceptron [a component of neural networks used in binary classification tasks]? A single Decision Tree requires that the set of features have a strong common overlap for all samples in the class. The greater the overlap, the fewer rules are needed. However, a single Decision Tree model is brittle in the face of noise. This is overcome with the Random Forest by using different subsets of the features to build each tree in the forest. With noisy data, this builds robustness as the predicted classification is based on a majority vote. (See Appendix A for a brief discussion on this.)
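The majority-vote idea can be sketched in a few lines. This is a deliberately simplified illustration, not the authors' model or the Weka implementation: each "tree" here is a single-rule stump rather than a full Decision Tree, and the function names, the 25-tree default, and the fixed random seed are all my own assumptions.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, labels, features):
    """One-rule tree: the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # this threshold does not split the sample
            left_label, right_label = majority(left), majority(right)
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no usable split: always answer with the majority label
        label = majority(labels)
        return (features[0], 0, label, label)
    return best[1:]

def train_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Each tree sees a bootstrap sample of rows and a random subset of features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]
        features = rng.sample(range(len(rows[0])), n_features)
        forest.append(train_stump([rows[i] for i in picks],
                                  [labels[i] for i in picks], features))
    return forest

def predict(forest, row):
    """Majority vote over the individual trees."""
    return majority([left if row[f] <= t else right for f, t, left, right in forest])
```

Because each tree is built from a different resampling of the rows and a different subset of the features, a tree misled by noise in one feature is usually outvoted by the others.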

Is this technique agnostic?

Now let me address the important issue of whether this approach is agnostic to different biologies, as this is the crux of whether the experimental results will detect not just life, but extraterrestrial life. Will this approach address the possibly very different biologies of life evolved from a different biogenesis?

Astrobiology, a subject with no examples, is currently theoretical. There is almost an industry trying to provide tests for alien life. Perhaps the most famous example is the use of the disequilibria of atmospheric gases, proposed by James Lovelock. The idea is that life, especially autotrophs like plants on Earth, will create an imbalance in reactive gases such as oxygen and methane that keeps them apart from their equilibrium. This idea has since been bracketed with constraints and additional gases, but the basic idea remains a principal approach for exoplanets where only atmospheric gas spectra can be measured.

As life is hypothesized to require a complex set of molecules, yet far fewer than a random set of all possible molecules, or as Cronin has suggested, reuse of molecules to reduce the complexity of building large macromolecules, it is possible that there could be fossil life, either terrestrial or extraterrestrial, that has the same apparent complexity, but largely non-overlapping molecules. The Random Forest could therefore build some Decision Trees that could select different sets of molecules to make the same biotic classification, suggesting that this is an agnostic method. However, this has yet to be tested as there are no extraterrestrial biotic samples to test. It may require such samples, if found and characterized as biotic, to be added to a new training set should they not be classified as biotic using the current model.

As this experiment assumes that life is carbon-based, truly exotic life based on other key elements such as silicon would be unlikely to be detected, although this is not impossible if volatile non-organic materials in a sample could be classified correctly.

The authors explain what agnostic in their experiment means:

Our Proposed Biosignature is Agnostic. An important finding of this study is that abiotic, living, and taphonomic suites of organic molecules display well-defined clusters in their high-dimensional space, as illustrated in Fig. 2. At the same time, large “volumes” of this attribute space are unpopulated by either abiotic suites or terrestrial life. This topology suggests the possibility that an alien biochemistry might be recognized by forming its own attribute cluster in a different region of Fig. 2—a cluster that reflects the essential role in selection for function in biotic systems, albeit with potentially very different suites of functional molecules. Abiotic systems tend to cluster in a very narrow region of this phase space, which could in principle allow for easy identification of anomalous signals that are dissimilar to abiotic geochemical systems or known terrestrial life.

What they are stating is that their approach will detect the signs of life in both extant organisms and the resulting decay of their remains when fossilized, such as shales and fossil fuels like coal and oil. As the example PCA of Figure 2 shows, the abiotic samples are tightly clustered in a small space compared to the far greater space of the biotic and once-biotic samples. The authors’ Figure 1 shows that their chosen method results in fewer different molecules found in the Allende meteorite compared to a microbe. I note that the dried oak leaf that is also within the abiotic cluster of the PCA visualization is possibly there because the bulk of the material is cellulose. Cellulose is made of chains of polymerized D-glucose, and while the pyrolysis of cellulose is a physical process that creates a wider assortment of organic compounds [18], this still limits the possible pyrolysis products.

This analysis is complementary to Cronin’s Assembly Theory which theorizes a reduced molecular space of life compared to the randomness and greater complexity of purely chemical and physical processes. This is because life constrains its biochemistry to enzyme-mediated reaction pathways. Assembly Theory [1] and other complexity theories of life [15] would be expected to reduce the molecular space compared to the possible arrangements of all the atoms in an organism.

The authors’ method is probably detecting the greater space of molecules from the required complexity of life compared to the simpler samples and reactions that were labeled as abiotic.

For any extraterrestrial “carbon units” that are theorized to follow organizing principles, this method may well detect extraterrestrial life, whether extant or fossilized, from a unique abiogenesis. However, I would be cautious of this claim simply because there were no biotic extraterrestrial samples used, because we have none, only presumed abiotic samples such as the organic material inside meteorites that should not be contaminated with terrestrial life.

The authors suggest that an alien biology using very different biological molecules might form their own discrete cluster and therefore be detectable. In principle, this is true, but I am not sure that the Random Forest machine learning model would detect the attributes of this cluster without training examples to define the rules needed. Any such samples might simply expose any brittleness in the model and either cause an error or be classified as a false positive for either a biotic or abiotic sample. Ideally, as Asimov once stated, the phrase most associated with interesting discoveries “is not ‘Eureka’ but ‘That’s funny . . .’”, might be associated with an anomalous classification. This might be particularly noticeable if the technique indicates that the sample is abiotic, while a direct observation by microscope clearly shows wriggling microbes.

In summary, the method has yet to be tested against new, unknown samples to confirm whether it is both robust and agnostic for other carbon-based life.

The advantage of this technique for remote probes

While the instrument data would likely be sent to Earth regardless of local processing and any subsequent rover actions, the trained Random Forest model is computationally very lightweight and easy to run on the data. Inspection of the various Decision Trees in the Random Forest allows an explanation for which features best classify the samples. As the Random Forest is updated by larger sample sets, it is easy to update the model to analyze samples in the lab or on a remote robotic instrument, in contrast to Artificial Neural Network architectures (ANN) that are computationally intensive. Should a sample look like it could be alien life but produce an anomalous result (“That’s funny…”), the data can be analyzed on Earth and then assigned a classification, and the Random Forest model retrained with the new data either on Earth, with the model then uploaded, or locally on the probe.

Let me stress again that the instrumentation needed is already available for life-detection missions on robotic probes. The most recent is the Mars Organic Molecule Analyzer (MOMA) [9] which is to be one of the suite of instruments on the Rosalind Franklin rover as part of the delayed ExoMars mission which is now planned for a 2028 launch. MOMA will use both the Pyr-GC-EI-MS sample processing approach, plus a UV laser on the organic material extracted from 2-meter subsurface drill cores to characterize the material. I would speculate that it might make sense to calibrate the sample set with the MOMA instruments to determine if the approach is as robust with this instrument as the lab equipment for this study. The sample set can be increased and run on the MOMA instruments and finalized well before the launch date.

[If the Morningstar Mission to Venus does detect organic material in the temperate Venusian clouds, perhaps in 2025, this type of analysis using instruments flown on a subsequent balloon mission might offer the fastest way to determine if that material is from a life form before any later sample return.]

While this is an exciting, innovative approach to analyzing organic molecules and classifying them as biotic or abiotic, it is not the only approach and should be considered complementary. For example, terrestrial fossils may be completely mineralized, with their form indicating origin. A low-complexity fragment of an insect’s exoskeleton would have a form indicative of biotic origin. The dried oak leaf in the experiment that clusters with the abiotic samples would leave an impression in the sediment indicative of life, just as we see occasionally in coal seams. Impressions left by soft-bodied creatures that have completely decayed would not be detectable by this method even though their shape may be obviously from an organism. [Although note that shape alone was insufficient for determining the nature of the “fossils” in the Martian meteorite, ALH84001.]

Earlier, I mentioned that the cellulose of paper represents an example with low complexity compared to an organism. However, if a robot probe detected a fragment of paper buried in a Martian sediment, we would have little hesitation in identifying it as a technosignature. Similarly, a stone structure on Mars might have no organic material in its composition but clearly would be identified as an artifact built by intelligent beings.

Lastly, isotopic composition of elements can be indicative of origin when compared to the planetary background isotopic ratios. If we detected methane (CH4) with isotope ratios indicative of production by subsurface methanogens, that would be an important discovery, one that would be independent of this experimental approach.

Despite my caveats and cautions, local life detection, rather like the attempts with the 1976 Viking landers, may be particularly important now that the Mars Sample Return mission costs are ballooning and may result in a cancellation, stymying the return to Earth of the samples Perseverance is collecting [16]. One of the major benefits of training the Apollo astronauts to understand the geology and identify important rock samples was the local decisions made by the astronauts over which rock samples to collect, rather than taking random samples and hoping the selection was informative. A mission to an icy moon would benefit from such local life detection efforts if multiple attempts need to be made in fairly rapid succession without requiring communication delays with Earth for analysis and decision-making and where no sample return to Earth was likely. This innovative technique appears to be an important contribution to the search for extraterrestrial life in our system, and possibly even beyond if our probes capture samples from interstellar objects.

The paper is Cleaves, J et al, Hazen, R, “A robust, agnostic molecular biosignature based on machine learning,” PNAS 120 (41) (September 25, 2023) e2307149120.


Appendix A. My experiment with the supplied data. [12]


To test some of the feedback from the authors, I ran some simple machine-learning experiments on the data. Rather than reduce the data to the number of variables in the paper, I used a simple data reduction by collapsing the scan data dimension so that only the single mass values remained. I normalized to the largest mass value in a sample that was set to 100 and all normalized floating point numbers were reduced to integers. All the resulting values of less than 1 were therefore set to 0. I used the classification labels as given. I also shuffled the class labels to test that the information in the data was lost with this operation. I used the Weka ML software package for running Decision Trees, Random Forests, and other ML methods [20].
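The data reduction described above can be sketched as follows. This is a reconstruction for illustration, not the script I ran: in particular, it assumes that "collapsing" the scan dimension means summing each mass channel over all scans, and the function names are hypothetical.

```python
def collapse_scans(sample):
    """Sum intensities over all scans, leaving one total per mass channel.

    `sample` is a list of scans; each scan is a list of intensities,
    one per mass channel. Summation is an assumption of this sketch.
    """
    n_masses = len(sample[0])
    return [sum(scan[m] for scan in sample) for m in range(n_masses)]

def normalize_to_peak(totals, peak=100):
    """Scale so the largest value becomes `peak`, then truncate to integers."""
    largest = max(totals)
    if largest == 0:
        return [0] * len(totals)
    # int() truncates, so anything below 1 after scaling becomes 0.
    return [int(peak * value / largest) for value in totals]

sample = [[0, 10, 1],   # scan 1: intensities for 3 mass channels
          [2, 30, 0]]   # scan 2
print(normalize_to_peak(collapse_scans(sample)))  # [5, 100, 2]
```

Normalizing to the largest peak removes the effect of sample amount, and the truncation is what turns very weak signals into a simple "absent".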

Results and Discussion

Using the example I ran [Figure 3], it is clear that the presence of a molecule[s] of mass 280 is sufficient to classify 14 of the 59 biological samples with no other rules needed, and if that rule fails, passing a rule with the presence of a molecule about ½ the mass of the first rule adds a further 8 samples correctly classified as biological. However, it takes a further 6 rules to classify another 22 biological samples, and 7 rules to select 48 (1 sample was a false positive) of the 75 abiotic samples. The rules mostly used larger molecules to determine the classifications because they had the most discriminatory power, as suggested by the number of the larger molecules of the 20 used in the PCA visualization. Of the 12 rules in my experiment, all but 3 used masses of 100 or greater, with 3 rules of 200 or greater. It should be noted that many rules simply needed the presence or absence (less than 1% of the peak frequency) of a molecule. The 2 largest biotic and abiotic leaves each required 7 rules, but about half required some non-zero value. The biotic leaf with 22 samples had just 3 rules with peak values that were present, while the abiotic leaf with 49 classified samples had all 7 rules with no peak value or values below a threshold.

Figure 3. The model for a Decision Tree output for a reduced collapsed set of data. It shows the rule tree of different mass normalized frequencies to classify abiotic [A], and biotic and natural [B], samples as leaves. There are 134 samples. For training, all the samples were used: 75 are classed abiotic, and 59 are biotic/natural. [The few misclassified samples were excluded for simplicity and clarity]. As all samples were used, there was no out-of-sample testing of the model.

The best classifier was the Random Forest, as found by the authors. This far exceeded a single Decision Tree. It even exceeded a 2-layer Perceptron. The Random Forest managed to reach a little more than 80% correct classification, which fell to random with the shuffled data. While the results using the more greatly reduced data were less accurate than those of the paper, this is to be expected given the data reduction method.
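The shuffled-label control works with any classifier. As a hedged illustration, the sketch below swaps in a very simple nearest-centroid classifier scored by leave-one-out testing, rather than the Weka Random Forest I actually used, and runs it on made-up, well-separated points rather than the mass spectrometry data. All names are hypothetical.

```python
import random

def loo_accuracy(rows, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        centroids = {}
        for cls in sorted(set(labels)):
            members = [r for j, r in enumerate(rows)
                       if j != i and labels[j] == cls]
            if members:
                centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
        # Assign the held-out row to the class with the closest centroid.
        guess = min(centroids,
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(row, centroids[c])))
        correct += guess == label
    return correct / len(rows)

def shuffled_accuracy(rows, labels, seed=0):
    """Same test after randomly reassigning the labels to the samples."""
    rng = random.Random(seed)
    shuffled = labels[:]
    rng.shuffle(shuffled)
    return loo_accuracy(rows, shuffled)

# Synthetic stand-in data: two clearly separated groups of 10 points.
rows = [[x, y] for x in range(2) for y in range(5)]
rows += [[x + 10, y + 10] for x in range(2) for y in range(5)]
labels = ["abiotic"] * 10 + ["biotic"] * 10

print(loo_accuracy(rows, labels), shuffled_accuracy(rows, labels))
```

If accuracy stays high after shuffling, the classifier is exploiting something other than the relationship between the data and the labels, so a collapse toward chance is the reassuring outcome.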

To test whether the data had sufficient information to separate the 2 classes simply by clustering, I ran a K-Means clustering [14] to determine how the data separated.

1. The 2 clusters were each comprised of about 60% of one class. Therefore while the separation was poor, there was some separation using all the data. Shuffling the labels destroyed any information in the samples as it did with the Decision Tree and Random Forest tests.

2. The replicate pairs almost invariably stayed in the same cluster together, confirming the robustness of the data.

3. The natural samples, i.e. those with a biogenic origin, like coal, tended mostly to cluster with the abiogenic samples, rather than the biotic ones.
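The clustering check above can be sketched with a plain K-means (Lloyd's algorithm) and a count of which labels end up in which cluster. This is an illustration rather than the Weka routine I used; seeding the centroids with the first k rows is an assumption made here to keep the example deterministic, and the function names are hypothetical.

```python
from collections import Counter

def kmeans(rows, k=2, iterations=20):
    """Lloyd's algorithm; returns the cluster index assigned to each row."""
    centroids = [list(r) for r in rows[:k]]  # assumed seeding: first k rows
    assignment = [0] * len(rows)
    for _ in range(iterations):
        # Assign every row to its nearest centroid.
        assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
            for row in rows
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, a in zip(rows, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def cluster_composition(assignment, labels, k=2):
    """For each cluster, its most common label and that label's share."""
    summary = {}
    for c in range(k):
        members = [lab for a, lab in zip(assignment, labels) if a == c]
        if members:
            label, count = Counter(members).most_common(1)[0]
            summary[c] = (label, count / len(members))
    return summary
```

A share near 1.0 in each cluster would mean the classes separate without any labels at all; a share near 0.6, as I found, means only a weak separation. The same assignment list can be used to check whether replicate pairs land in the same cluster.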

I would point out that the PCA in Figure 2 was interpreted to mean that abiotic samples clustered tightly together. However, an alternative interpretation is that the abiotic and natural samples separate from the biotic if a separation is drawn diagonally to separate the biotic samples from all the rest.

One labeling question I have concerns the placing of the commercially supplied DNA and RNA samples in the abiotic class. If we detected either as [degraded] samples on another world, we would almost certainly claim that we had detected life once the possibility of contamination was ruled out. Switching these labels made very little difference to my Random Forest classification overall, but it did switch more samples to be classified as biotic, in excess of the switch of the 2 samples to biotic labels. It did make a difference for a simpler Decision Tree. It increased the correct classifications (92 to 97 of 134), mostly reducing the misclassification of abiotic to biotic classes (23 to 16). The cost of this improvement was 2 extra nodes and 1 leaf in the Decision Tree.

The poor results of the 2-layer Perceptron indicate that the nested rules used in the Decision Trees are needed to classify the data. Perceptrons are 2-layer artificial neural networks (ANNs) that have an input and output layer, but no hidden neural layers. Perceptrons are known to fail the exclusive-OR test (XOR) although the example Decision Tree in Figure 3 does not require any variables to overcome this issue. A multilayer neural net with at least 1 hidden layer would be needed to match the results of the Random Forest.
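The XOR limitation is easy to demonstrate. The sketch below is a textbook perceptron with a step activation and the classic error-driven update, written for illustration only; the learning rate and epoch count are arbitrary choices of mine. It learns AND, which a single straight line can separate, but cannot get XOR fully right however long it trains.

```python
def output(weights, bias, inputs):
    """Step activation: 1 if the weighted sum is positive, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def train_perceptron(samples, epochs=50, lr=1.0):
    """Perceptron with input and output layers only (no hidden layer)."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - output(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

def accuracy(model, samples):
    """Fraction of samples the trained perceptron classifies correctly."""
    weights, bias = model
    hits = sum(output(weights, bias, inputs) == target
               for inputs, target in samples)
    return hits / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(accuracy(train_perceptron(AND), AND))  # 1.0
print(accuracy(train_perceptron(XOR), XOR))  # always below 1.0
```

A Decision Tree handles XOR with two nested rules, which is one way to see why nested rules can succeed where a single linear boundary fails.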

In conclusion, my results show that even with a dimensionally reduced data set, the data contains some information in total that allows a weak separation of the 2 classification labels and that the Random Forest is the best classifier of many that were available in the Weka ML software package.


1. Assembly Theory (AT) – A New Approach to Detecting Extraterrestrial Life Unrecognizable by Present Technologies

2. Venus Life Finder: Scooping Big Science

3. Pyrolysis – Gas Chromatography – Mass Spectroscopy

4. Random Forest accessed 10/05/2023

5. PCA “Principal Component Analysis” accessed 10/05/2023

6. SHERLOC “Scanning Habitable Environments with Raman and Luminescence for Organics and Chemicals“ accessed 10/06/2023

7. Han, J et al, Organic Analysis on the Pueblito de Allende Meteorite Nature 222, 364–365 (1969).

8. Zenobi, R et al, Spatially Resolved Organic Analysis of the Allende Meteorite. Science, 24 Nov 1989 Vol 246, Issue 4933 pp. 1026-1029

9. Goesmann, F et al The Mars Organic Molecule Analyzer (MOMA) Instrument: Characterization of Organic Material in Martian Sediments. Astrobiology. 2017 Jul 1; 17(6-7): 655–685.
Published online 2017 Jul 1. doi: 10.1089/ast.2016.1551

10. Cleaves, J et al, Hazen, R, A robust, agnostic molecular biosignature based on machine learning, PNAS September 25, 2023, 120 (41) e2307149120

11. __ Supporting information.

12. __ Mass Spectroscopy data:

13. Gold, T. The Deep Hot Biosphere: The Myth of Fossil Fuels. Springer Science and Business Media, 2001.

14. K-means clustering

15. Chou, L et al Planetary Mass Spectrometry for Agnostic Life Detection in the Solar System Front. Astron. Space Sci., 07 October 2021 Sec. Astrobiology Volume 8 – 2021

16. “Nasa’s hunt for signs of life on Mars divides experts as mission costs rocket“ Web access 11/13/2023

17. The Astrobiology Field Laboratory. September 26, 2006. Final report of the MEPAG Astrobiology Field Laboratory Science Steering Group (AFL-SSG). Web:

18. Wang, Q., Song, H., Pan, S. et al. Initial pyrolysis mechanism and product formation of cellulose: An Experimental and Density functional theory(DFT) study. Sci Rep 10, 3626 (2020).

19. Sharma, S., Roppel, R.D., Murphy, A.E. et al. Diverse organic-mineral associations in Jezero crater, Mars. Nature 619, 724–732 (2023).

20. Weka 3: Machine Learning Software in Java