Wednesday, May 5, 2010

Linux vs. Genome

A comparison of the network formed by a bacterium's genes with the one formed by the code of the Linux operating system has given insight into the fundamental differences between biological and computational programming.

The two networks have very dissimilar shapes, reflecting the different forces that drive each process. Biology is driven by random mutation and natural selection. Software is an act of intelligent design.

“One of the biggest problems of biological data is that you have no intuitions about it. It’s just a bunch of gobbledygook symbols. One way to get intuition is to map its structure onto something we know about,” said study co-author and Yale University informaticist Marc Gerstein. “Linux is evolving and changing. But unlike evolution in biology, we know exactly what’s going on.”

Several years ago, he refined a technique for turning gene-network “hairballs” — densely tangled depictions of gene interaction — into hierarchical maps. At the top of each map are what Gerstein calls master regulators, which steer the activity of many other genes. At the bottom are workhorses, which pump out protein code. In between are the middle managers, which do a bit of both.
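The article doesn't spell out Gerstein's layering algorithm, so the following is only a rough illustration of the three roles, not his published method. It assumes a regulatory network stored as a list of directed (regulator, target) pairs and sorts each gene by whether it regulates others, is regulated, or both. The gene names are hypothetical placeholders.

```python
from collections import Counter

def classify_roles(edges):
    """Assign each node a coarse role from (regulator, target) edges.

    Simplified stand-in, not Gerstein's published method:
      - "master regulator": regulates others but is not regulated itself
      - "workhorse": is regulated but regulates nothing
      - "middle manager": both regulates and is regulated
    """
    regulates = {src for src, _ in edges}
    regulated = {dst for _, dst in edges}
    roles = {}
    for node in regulates | regulated:
        if node in regulates and node in regulated:
            roles[node] = "middle manager"
        elif node in regulates:
            roles[node] = "master regulator"
        else:
            roles[node] = "workhorse"
    return roles

# Hypothetical toy network; names are placeholders, not real E. coli genes.
toy_edges = [
    ("A", "B"), ("A", "C"),
    ("B", "D"), ("B", "E"),
    ("C", "F"), ("C", "G"), ("C", "H"),
]

roles = classify_roles(toy_edges)
level_sizes = Counter(roles.values())
for level in ("master regulator", "middle manager", "workhorse"):
    print(f"{level}: {level_sizes[level]}")
```

Counting the nodes in each role gives one master regulator, two middle managers and five workhorses: a small pyramid of the kind the article describes for E. coli.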

Since then, Gerstein has compared the structure of gene networks between species, and contrasted biological networks with corporate and governmental structures. He hopes the contrasts will illuminate how network structure shapes genomic function.

In the latest study, published April 4 in the Proceedings of the National Academy of Sciences, he compared the genome of E. coli, a widely studied microbe, to Linux, the popular open source operating system. Though Gerstein hoped for insight into biological networks, the study also suggests strategies for social and technological engineers.

“If we don’t have designers fine-tuning things, and we have to deal with random changes, then what do we need to do in the control structure to make it robust?” said Gerstein.

E. coli’s network proved to have a pyramid-like shape, with a few master regulators, more middle managers, and many workhorses. In stark contrast, the Linux kernel call graph — the network of interactions between different pieces of program code — looks almost like an inverted pyramid. A great many top-level programs call on a few common subroutines.

Gene network structures start to resemble the Linux call graph as species become more complex, according to Sergei Maslov, a Brookhaven National Laboratory systems biologist not involved in the study. However, their pyramids never become as top-heavy as Linux's. There seems to be a natural limit to this progression. The new study suggests why.

“If you update a low-level function, then you need to update all the functions that use it. That’s doable if you’re an engineer. You just go through all the code. But it’s impossible in biology,” Maslov said.
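Maslov's point can be made concrete by counting how many functions sit upstream of a changed one. A minimal sketch, assuming a call graph stored as (caller, callee) pairs; the function names are hypothetical and are not actual Linux kernel symbols. A breadth-first search over the reversed edges collects every direct or indirect caller, i.e. everything an engineer might have to revisit.

```python
from collections import defaultdict, deque

def affected_callers(call_edges, changed):
    """Return every function that directly or indirectly calls `changed`."""
    # Reverse the edges so we can walk from a callee up to its callers.
    callers_of = defaultdict(list)
    for caller, callee in call_edges:
        callers_of[callee].append(caller)

    seen = set()
    queue = deque([changed])
    while queue:
        fn = queue.popleft()
        for caller in callers_of[fn]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    # In a graph with cycles, don't count the changed function itself.
    seen.discard(changed)
    return seen

# Hypothetical inverted-pyramid call graph: many higher-level routines
# share a few low-level helpers. Names are illustrative only.
calls = [
    ("open_file", "alloc"), ("read_file", "alloc"), ("send_packet", "alloc"),
    ("mount_fs", "open_file"), ("list_dir", "read_file"),
    ("open_file", "lock"), ("send_packet", "lock"),
]

print(sorted(affected_callers(calls, "alloc")))
print(sorted(affected_callers(calls, "list_dir")))
```

In this toy graph, changing the shared helper `alloc` touches every other routine that sits above it, while changing the top-level `list_dir` touches nothing else. An engineer can walk that list by hand; random mutation has no such mechanism.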

Indeed, when Gerstein's team tracked the evolution of the Linux kernel code since its original 1991 version, they found that its basic components had undergone extensive alteration. Their biological analogues, so-called evolutionarily conserved genes, are also used in a great many functions, but they have hardly changed at all. When a mutation arises in such a gene, evolution can't quickly update the rest of the genetic code to match.

Asked if human software engineers have outpaced natural evolution, Gerstein said the opposite was true. The computer model may be so extreme that it can’t be scaled to biological levels of complexity. “You can easily see why software systems might be fragile, and biological systems robust. Biological networks are built to adapt to random changes. They’re lessons on how to construct something that can change and evolve,” said Gerstein.

For now, the researchers have no plans to compare genomes to the most widely used operating system of all, Windows.

“That’s forbidden,” said study co-author and Stony Brook University biophysicist Koon-Kiu Yan. “Windows is not open source.”

Image: Network structures of E. coli genome and Linux./PNAS.
