Sunday, December 25, 2011

Bits of Behavior

I recently took up computer programming using Python and I've started on a new project - a simple animal movement model - which I'd like to integrate with ArcMap via the ArcPy package. It would be pretty cool to "create" some virtual creatures and watch them interact with each other and with their environment. With ArcPy, I can not only incorporate environmental information such as terrain, vegetation, resource location, climate, etc... (assuming I have access to the appropriate datasets), I can also dynamically update the map display as the model runs. In any case, that's the idea. It's a learning experience and, so far, it has been quite enlightening.

The first challenge was to create a Python class which could represent individual organisms. I named the class "bios" - the Greek word for "life". The class includes a number of parameters relevant to animal movement such as current (x, y) location, maximum move possible per iteration, vectors to other "bios" instances (other virtual critters, if you will), and many others. The real trick (at least, for me) was creating a class that could be called from an instance of itself to create new instances in addition to the original. Essentially, I needed instances of the bios class to be able to reproduce themselves. The approach I settled on uses the native Python dictionary data structure. The first bios instance is initialized as an object inside the dictionary (the instance's name serves as the key and the instance object itself as the value). I created a method for the bios class (appropriately named "reproduce") which creates new instances of the bios class inside the same Python dictionary as the original instance. This approach had some significant benefits: the dictionary of key:value (name:instance) pairs can be used to save pertinent information (such as location, age, health, etc.) to a geodatabase; new instances can be initialized as objects in the dictionary using data from a geodatabase; and the dictionary serves as an iterable collection of all bios instances currently in memory (handy for calculating things like the next move). For example, using a dictionary assigned to "i", the first instance could be initialized as follows in the Python prompt:

>>> i["bios1"] = bios()

Now I've got a dictionary containing my first instance of the bios class "bios1". To create 9 new, child instances in the dictionary I would do the following:

>>> i["bios1"].reproduce(9)

Now the dictionary "i" contains a total of ten bios instances. The nine new instances have been automatically assigned the keys "bios2", "bios3", "bios4", etc... using a counter which is concatenated to the base name "bios" as each new instance is created. Each new instance is also randomly assigned an x, y position within 1 unit of the parent instance (an arbitrary choice really). In this case, "bios1" was originally at 0,0 and the nine new instances were initialized randomly around this origin (see figure).

Plot of bios instance tracks over 100 calculated moves with a maximum move of 1 (arbitrary units).
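For anyone curious about the shape of the idea, here is a minimal sketch of a class whose instances add new instances to a shared dictionary. This is not the code behind the download link: the constructor arguments (`registry`, `location`), the module-level counter, and the way the random offset is drawn are all assumptions made purely for illustration.

```python
import itertools
import math
import random

# Shared counter used to name new instances "bios2", "bios3", ... (assumed scheme).
_counter = itertools.count(2)


class bios(object):
    """Illustrative stand-in for the bios class; not the original code."""

    def __init__(self, registry, location=(0.0, 0.0)):
        self.registry = registry    # the dictionary that holds every instance
        self.location = location    # current (x, y) position

    def reproduce(self, n):
        """Create n child instances in the same dictionary as the parent."""
        for _ in range(n):
            # Place each child at a random point within 1 unit of the parent.
            r = random.uniform(0.0, 1.0)
            theta = random.uniform(0.0, 2.0 * math.pi)
            child_location = (self.location[0] + r * math.cos(theta),
                              self.location[1] + r * math.sin(theta))
            key = "bios" + str(next(_counter))
            self.registry[key] = bios(self.registry, child_location)


i = {}
i["bios1"] = bios(i)        # the first instance, at the origin
i["bios1"].reproduce(9)     # nine children, keyed "bios2" through "bios10"
print(len(i))
```

In this sketch each instance carries a reference to the dictionary it lives in, which is one simple way to let "reproduce" write its children into the same container as the parent.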

Using other methods of the bios class I can then calculate a new position for each of the instances:

count = 0

while count < 99:
    for key in i:
        i[key].calc_vectors()  # calculates vectors from this instance to all other instances
        i[key].calc_move()  # calculates delta x, delta y based on the vectors (above)
        i[key].move(i[key].nextMove[0], i[key].nextMove[1])  # updates self.location
    count += 1  # advance the counter so the loop actually ends

(full code - just follow the link and select "Download" to get the goods)

The code block above glosses over a lot of the more tedious elements of the algorithm, but hopefully you get the idea (if you're not much for programming, I'm probably boring you to death). Presently, each instance moves away from other instances in an inverse distance squared fashion. That is, the closer one bios instance is to another, the more it will move away from that instance. This effect grows very quickly (as one over the distance squared) as the distance between instances decreases, resulting in a rapid separation of closely spaced instances. However, the rate of separation quickly falls as the distance between them grows. Calculating the next move for each instance was a matter of summing the influences of all the other instances. This isn't necessarily accurate for animal behavior, but the movement calculation is fairly easily modified. I just wanted to validate the approach before complicating things.
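To make the inverse distance squared idea concrete, here is a rough sketch of how such a move could be calculated. It is a simplified stand-in rather than the actual calc_vectors/calc_move code: the function name, its arguments, and the way the step is capped at the maximum move are assumptions on my part.

```python
import math


def repulsion_move(me, others, max_move=1.0):
    """Return (dx, dy) pushing the point `me` away from every point in `others`."""
    dx = dy = 0.0
    for ox, oy in others:
        rx, ry = me[0] - ox, me[1] - oy    # vector pointing away from the other point
        d = math.hypot(rx, ry)
        if d == 0.0:
            continue                        # coincident points give no direction
        # Unit vector (rx / d, ry / d) scaled by 1 / d**2.
        dx += rx / d ** 3
        dy += ry / d ** 3
    size = math.hypot(dx, dy)
    if size > max_move:                     # cap the step at the maximum move (assumed)
        dx, dy = dx * max_move / size, dy * max_move / size
    return dx, dy


print(repulsion_move((0.0, 0.0), [(2.0, 0.0)]))   # a neighbor 2 units away: gentle push
print(repulsion_move((0.0, 0.0), [(0.1, 0.0)]))   # a neighbor 0.1 units away: capped push
```

A neighbor 2 units away contributes a push of 1/4 unit, while one at 0.1 units would contribute 100 units before the cap, which is why closely spaced instances separate so quickly.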

As you can see, all the bios instances are initialized within 1 unit of 0, 0 but they quickly "repel" each other. Because the size of each move is proportional to 1 over the distance squared, even after 100 moves, all the instances are still within 15 units of the origin.

In the future, I'd like to create different classes (perhaps one for predators and one for prey), incorporate the influence of environmental factors into the model, and display the output in ArcMap. That will require me to work in angular units (latitude and longitude) rather than the standard Cartesian coordinates; hopefully that won't be a problem. Maybe I'll create a web map where people can create their own creatures, drop them into an existing environment and see how they do. Like I said at the start, this project is just beginning. I've learned a lot about Python but there's still a long way to go.
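One thing that changes with latitude and longitude is how distance is measured. As a sketch of what that might look like (not something the model does yet), the haversine formula gives the great-circle distance between two points on a sphere. The function name and the use of a spherical Earth with a mean radius of 6371 km are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
print(haversine_km(0.0, 0.0, 0.0, 1.0))
```

The same one-degree step in longitude covers less ground at higher latitudes, so a "maximum move" would have to be defined in ground distance rather than in degrees.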

Tuesday, December 6, 2011

NASA Discovery: Kepler-22b, an Earth-like Planet?

NASA recently announced the discovery of Kepler-22b, a potentially Earth-like planet about 600 light-years away, which is orbiting within the habitable zone of its central star. The habitable zone for a planet is the range around its star where stellar radiation is sufficient to maintain liquid water at the planet's surface, and is dependent both on the characteristics of the parent star and the planet itself (see below, courtesy of NASA).

Kepler-22b -- Comfortably Circling within the Habitable Zone (courtesy of NASA's Kepler mission homepage)

This marks the first such discovery. Previous observations of extra-solar planets within their habitable zones indicated that the bodies were far larger than the Earth and likely more akin to Jupiter or Saturn in their compositions.

Unfortunately, the Kepler space telescope provides information about the sizes of planets but not their masses. The combination of size and mass data would, of course, allow for an estimate of density which would shed some light on the material properties of Kepler-22b. An estimate of a planet's composition would at least allow us to make an educated guess about the raw materials available for complex chemical reactions. A relatively dense planet may be like the Earth, composed primarily of rock, whereas a less dense planet would be either a water world or, in the extreme case, a gas planet. Given the size of Kepler-22b, however, a purely gas composition seems unlikely. A relatively warm (irradiated) gas planet with modest mass (thus, modest gravity) will lose its gas molecules to space over time. Unless Kepler-22b is fairly young, it isn't likely to be composed of deep gases like Jupiter or Saturn.
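The size-plus-mass argument comes down to simple arithmetic: density scales as mass divided by radius cubed. Here is a small sketch, working in Earth units and using the Earth's mean density of about 5.51 g/cm^3. The function name is made up and the input values are purely illustrative; they are not measurements of Kepler-22b, whose mass is unknown.

```python
EARTH_DENSITY_G_CM3 = 5.51  # mean density of the Earth


def bulk_density(radius_earths, mass_earths):
    """Mean density in g/cm^3 for a planet given its radius and mass in Earth units."""
    return EARTH_DENSITY_G_CM3 * mass_earths / radius_earths ** 3


# Illustrative values only: two planets of twice Earth's radius.
print(bulk_density(2.0, 8.0))   # eight Earth masses: as dense as the Earth
print(bulk_density(2.0, 1.0))   # one Earth mass: less dense than water
```

For a fixed radius, the inferred density (and so the likely composition) depends entirely on the mass, which is exactly the piece of information that is missing.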

Kepler's mission is to identify and catalog planets worthy of future study - planets that may harbor life. The next step in this quest would be to gather spectral data about the planets that are likely to hold liquid water (based on estimates of composition), a key ingredient for all life as we know it. A spectral signature could provide precise information about the atmospheric and surface composition of a planet. Is the atmosphere oxidizing or reducing? What molecules does it contain? Could it provide protection from radiation, as our Earth's atmosphere does? Collecting such data is, of course, complicated by the fact that we are currently only able to "see" these planets as they transit in front of their respective central stars (another detection method, which measures the "wobble" of a central star, is currently effective for deducing the presence of planets in star systems within about 160 light years of Earth). Actually, what we observe is a slight dip in the intensity of the star as a planet passes between us and its sun, blocking out some of the star's light. The reduction is so slight, in fact, that intensity data from the star must be continuously collected over many hours for the effect to be statistically significant.
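To get a feel for just how slight that dip is, the fractional drop in brightness during a transit is roughly the ratio of the planet's disk area to the star's disk area, i.e. (planet radius / star radius) squared. The sketch below uses that simple geometric approximation (it ignores effects such as limb darkening), with commonly quoted radii for the Earth, Jupiter and the Sun; the function name is my own.

```python
EARTH_RADIUS_KM = 6371.0
JUPITER_RADIUS_KM = 69911.0
SUN_RADIUS_KM = 695700.0


def transit_depth(planet_radius_km, star_radius_km):
    """Approximate fractional dimming when the planet crosses the star's disk."""
    return (planet_radius_km / star_radius_km) ** 2


# Earth-sized and Jupiter-sized planets crossing a Sun-sized star.
print(transit_depth(EARTH_RADIUS_KM, SUN_RADIUS_KM))
print(transit_depth(JUPITER_RADIUS_KM, SUN_RADIUS_KM))
```

An Earth-sized planet dims a Sun-like star by less than one part in ten thousand, while a Jupiter-sized planet dims it by about one percent, which helps explain why the giant planets were found first.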

Even if we could see the planet itself, we would be observing it with a star background. In other words, we would be looking at the "dark" side of the planet completely surrounded by very intense light from the star, not ideal for generating a good spectral signature for the planet. Given that the intensity reduction resulting from planetary occlusion of starlight is itself barely detectable, it would presumably be even harder to detect light from a planet illuminated by its sun in a part of its orbit that provides us with a space background because only a fraction of the incident starlight would be reflected or re-emitted toward the Earth (the exception could be "glint" from a planetary ocean).  Alas, it seems unlikely that we will be collecting high-quality spectral data for exoplanets any time soon.

Of course, detecting extra-solar planets at all seemed impossible for a very long time before we did it. The idea that we humans can now observe other celestial bodies like our own, hundreds or even thousands of light-years away, is awe-inspiring. Unfortunately, our powers of observation vastly outstrip our powers of transportation - if only we could travel to the other-worldly places we observe. Eventually, no doubt, the challenges will be overcome.

In light of this new discovery, I may have to change the name of my blog. Perhaps On A Pale Blue Dot would be more appropriate.

Sunday, September 18, 2011

Musings on a spider web (and other evolving structures)

For a few days now, in the mornings, I've noticed a spider web built between my car door and the ground. Every day I drive to work destroying the web yet each morning it has been rebuilt. This prompted me to think about how spider webs are constructed. In turn, I began to think about the way in which spiders know how to build webs. Unlike humans, who generally have very attentive, mentoring parents (compared to arachnids at least), spiders aren't taught the ways of the world by their moms and dads (in fact, among certain species, mom may have eaten dad long before the little spiders hatch from their silken egg sacs). Nevertheless, spider webs are built with incredible precision and with remarkable similarity among members of the same species. In fact, the design of a spider's web can tell you to which species the builder belongs.

And spiders aren't alone. Think of a single queen ant. She begins her royal career by flying away from a home nest to some place that is not yet inhabited by her species. She then creates a very modest, often subterranean, home in which she lays her eggs. In time these hatch and release several classes of offspring which mature into workers, soldiers, etc. (for a far more thorough description check out E.O. Wilson). Each of the offspring performs specific functions and is guided by relatively simple rules yet, working together, they expand the nest, building sophisticated networks of chambers segregated by function: nurseries, food storage areas, ventilation systems, and water traps. Workers fan out over the surrounding landscape in search of resources. Guards stand ready at the nest's portals to the outside world. Wars are waged against competing species. In some species, crops of fungi are carefully cultivated by their symbiotic hosts as a source of food. And it all began with a single ant. One which was never taught anything. Does the whole of an ant kingdom lie within the genes of a single queen? Well, yes.

Complex physical structures (or, at the very least, the processes for making them) are indeed encoded in each organism's genes. That is, the behaviors which ultimately result in intricately constructed webs and nests are hard-coded into spider and ant DNA, respectively. Of course, there isn't a single web gene or nest gene. These structures emerge as a consequence of specific behaviors which are linked to genes. They are the cumulative result of many genes (and their protein products) working in concert, under the influence of the organism's natural habitat, to produce both the building materials (silk, in the case of spiders) and the necessary building behaviors. It is extraordinary to think that such a complex system can arise from a single individual ant (which was, at an earlier developmental stage, a single ant cell).

It's also strange to think of webs and nests (and the domiciles of many other social insects), all non-living physical structures, as being under the influence of evolution, yet they must be - so much so that the homes of closely related species often display unique construction patterns. When I think of evolution I often envision genetic sequences or anatomical adaptations, but less frequently consider the impact of evolution on the physical substrate of the living world. If we could play through the cyclical building of structures such as ant nests, spider webs and bee hives over evolutionary time, we would see the physical world transforming before our eyes under the influence of tiny, ever-adapting construction workers.

So tomorrow, when I sacrifice the spider's web on my daily trip to work, I might feel a new twinge of guilt, knowing that eons of evolution were required to make that particular pattern. Fortunately for the spider, it will never forget how to make a replacement.

Saturday, June 4, 2011

The Mountain

Time to catch up on the blog! This video reminded me of why I started this blog in the first place - to share the awe-inspiring majesty of nature. The rolling clouds at 0:45 are especially captivating. You have to listen to the music for the full experience. Enjoy! More to come.

Thursday, June 2, 2011

The Art and Science of Remote Sensing

Here's an RGB image I created by combining three monthly chlorophyll a composites generated from data collected by the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard NASA's Aqua satellite. Red = July '04; Green = Nov '04; Blue = Mar '05. Central America is visible in the top right corner. Artistic, no?

Wednesday, April 6, 2011

Landscape Genetics

After some additional research, I've discovered that 'landscape genetics' is the field that (most directly) addresses the interrelationship between genomes and their environments. I found a great article about this emerging discipline here. I really think the next ten years will see an explosion in this area. Exciting stuff!

Sunday, March 20, 2011

The Geonome: Putting Genetic Information in Its Place

Over the past five years, the cost of genetic sequencing has dropped by a factor of approximately 3000. Yep, that's right. Gathering raw sequence data cost about $1000 per megabase (1 million base pairs) in 2006. Today the cost is around 30 cents. And a number of new sequencing technologies promise to bring the price down even further. With costs falling rapidly, and demand for genomic information rising, we can expect a flood of genetic data in the near future. The hurdle of sequencing cost seems to have been overcome. The challenge of meaningful genomic analysis, however, remains. Once we have all of this genetic data, what should we do with it?

Now, I'm not a geneticist but I have worked on a few microbiology projects that required some sequencing work. These were small scale research projects using sequencers from 2004-2005. Nonetheless, I feel that I have a basic understanding of the gene sequencing process. It can be tedious, repetitive, and produces mountains of data for analysis. My current work as a space-based remote sensing analyst, which also produces mountains of data, is similar to gene sequencing in that it requires the collection and management of large datasets. It also requires heavy use of Geographic Information Systems (GIS) to manage that data. Hmmm...

Before pressing on, I'd like to take a moment to consider the most fundamental theory in biology - the theory of evolution. While various definitions of biological evolution exist, it is essentially the non-random selection of (more or less) random genetic variation. The criterion for selection is reproductive success. If a particular genetic variation leads to greater reproductive success, that genetic information becomes more prevalent in the gene pool. The factors that influence genetic variation and success are both biological and non-biological. The interaction of a species with other species (predator-prey relationships, competition for resources, parasitism, etc.), as well as with members of the same species (e.g., sexual selection), are examples of biological factors. Temperature, altitude (or depth), soil type, rainfall, terrain, etc. are all examples of non-biological factors. For a given genome, the combination of biological and non-biological factors which exert selective pressure can be taken as the genome's evolutionary environment (for individual genes, other genes within the same genome may be considered part of the evolutionary environment as well). Understanding the interactions between these various factors is essential to creating a robust, predictive evolutionary model.


Currently, genetic sequence datasets are publicly available via online databases such as GenBank and EMBL, but they rarely include geospatial information (to be fair, most currently sequenced organisms are already well characterized, and are either domesticated or otherwise not confined to specific geographic locations). While researchers can perform comparative analyses of the genomes within these databases, there is little, if any, contextual information about the genomes' environments. This isolation of genetic information fails to capture the most important element of evolutionary theory - the interactions between genes and their environments.

The ever-decreasing price of sequencing technology, coupled with ever-increasing portability and rising demand for genomic data, will inevitably lead to an increase in field sequencing efforts where geospatial information is not only available but extremely relevant. The resulting georeferenced genetic information could provide the basis for a large-scale synthesis of genetics, ecology, evolutionary biology, and geography. This data (genetic sequence + geolocation) will constitute the "geonome" of the sample area, and such data will quickly outpace the ability of researchers to keep up with sequencing projects relevant to their own work. This is where GIS can play an important role.

Geographic information systems are ideally suited for managing and analyzing large georeferenced datasets. Some also provide the capability to build custom models and extensions for particular analysis tasks. Imagine having the ability to enter a genome sequence (or a partial sequence) into a GIS (likely via a custom interface application) and find related sequences (along with relevant environmental data) elsewhere in the world. Or one could search for all the genetic sequence information collected from environments with a set of similar conditions and compare those sequences for similarities. A genomic GIS, rich with environmental information, could allow for unprecedented insights into the nature of genetic variability, and provide the context necessary for understanding evolution at a truly global level.*
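To give a flavor of the two kinds of query described above, here is a toy sketch in plain Python rather than in any real GIS. Everything in it is hypothetical: the record fields, the three invented sample records, and the function names exist only for illustration, and the similarity measure (difflib's SequenceMatcher from the standard library) is a crude text comparison, not a proper sequence alignment.

```python
from collections import namedtuple
from difflib import SequenceMatcher

# A georeferenced sequence record: sequence + location + environmental context.
Sample = namedtuple("Sample", "sample_id sequence lat lon temp_c rainfall_mm")

# Invented toy records, for illustration only.
SAMPLES = [
    Sample("s1", "ACGTACGTAC", 9.1, -79.7, 27.0, 2600.0),
    Sample("s2", "ACGTTCGTAC", 10.4, -84.0, 25.5, 3200.0),
    Sample("s3", "TTGACCATGA", 64.1, -21.9, 4.5, 800.0),
]


def by_environment(samples, temp_range, rain_range):
    """Return the samples collected under the given temperature and rainfall ranges."""
    return [s for s in samples
            if temp_range[0] <= s.temp_c <= temp_range[1]
            and rain_range[0] <= s.rainfall_mm <= rain_range[1]]


def most_similar(samples, query):
    """Return the sample whose sequence most resembles the query string."""
    return max(samples,
               key=lambda s: SequenceMatcher(None, query, s.sequence).ratio())


warm_wet = by_environment(SAMPLES, (20.0, 30.0), (2000.0, 4000.0))
print([s.sample_id for s in warm_wet])
match = most_similar(SAMPLES, "ACGTACGTAC")
print(match.sample_id, match.lat, match.lon)
```

A real system would swap the list for a spatial database and the text comparison for an alignment tool, but the two questions (what lives in environments like this one, and where else does a sequence like this one occur) stay the same.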

Although publications in biogeography often reference GIS modelling, I have yet to see a call for publicly available georeferenced genomic data (think GenBank with geographic information [and appropriate metadata] included). Such a repository is the starting point from which an appropriate GIS could be built (please, please let me know if such a thing exists).

This idea is certainly too complicated to fully explore in a single post, but I think it's worth pursuing. Individual genomes are important for understanding how organisms work, but The Geonome (the global geonome) could help us understand exactly how evolution works. And that's worth knowing.

* Because environmental factors are something that we can gain a lot of information about through remote sensing, understanding their influence at the genetic level could also inform our search for life beyond Earth.