Science@Berkeley Lab
February, 2007
Engineering the Fruit Fly Genome

Fruit flies are only a couple of millimeters long, and even close up they don't look much like people. So if you're a researcher who wants to learn something about genetics in human development and disease, why would you bother with Drosophila melanogaster?

Drosophila melanogaster is an invaluable model organism, partly because it shares so many genes with humans.

Although Drosophila is an insect whose genome has only about 14,000 genes, roughly half the human count, a remarkable number of these have very close counterparts in humans; some even occur in the same order in the fly's DNA as in our own. This, plus the organism's more than 100-year history in the lab, makes it one of the most important models for studying basic biology and disease.

To take full advantage of the opportunities offered by Drosophila, researchers need improved tools to manipulate the fly's genes with precision, allowing them to introduce mutations to break genes, control their activity, label their protein products, or introduce other inherited genetic changes.

"We now have the genome sequences of lots of different animals — worms, flies, fish, mice, chimps, humans," says Roger Hoskins of Berkeley Lab's Life Sciences Division. "Now we want improved technologies for introducing precise changes into the genomes of lab animals; we want efficient genome engineering. Methods for doing this are very advanced in bacteria and yeast. Good methods for worms, flies, and mice have also been around for a long time, and improvements have come along fairly regularly. But with whole genome sequences in hand, the goals are becoming more ambitious."

In the fruit fly, the technology and resources for two basic approaches for manipulating genes are fast improving. The first is a highly efficient method for inserting an engineered DNA sequence into the fly's chromosomes at many different random locations, often disrupting genes. The second is a complementary approach, less efficient but more precise, in which a particular gene of interest is manipulated in the lab and then inserted into a known location on a fly chromosome.

The science of gene disruption at random

The Drosophila Gene Disruption Project (GDP) is using the first approach to disrupt fruit fly genes on an unprecedented scale. The GDP was initiated by Gerald Rubin and Allan Spradling in the early 1990s as part of the Berkeley Drosophila Genome Project, based at Berkeley Lab and UC Berkeley and led by Rubin until 2003. (Rubin is now a vice president of the Howard Hughes Medical Institute and director of the HHMI Janelia Farm Research Campus in Virginia.) Over the years the GDP, now led by Hugo Bellen of the Baylor College of Medicine, with Hoskins and Allan Spradling of the Carnegie Institution as co-principal investigators, has tagged more than 60 percent of the fly's genes with insertions of engineered DNA fragments.

The first method for introducing recombinant DNA into the fly genome was invented by Rubin and Spradling in 1982; it revolutionized fly genetics. Key to the method was the use of a transposable element, or transposon, which is a segment of DNA that has the ability to "hop" around the genome, inserting itself in the chromosomal DNA.

"The first transposable element used in this way was one called the P element, which occurs naturally in fruit flies," says Hoskins. "Recently other transposons have been used in a similar fashion. Geneticists learned to turn the P element on and off, and then began using it as a mutagen to disrupt genes. P‑insertion mutations can have a range of effects on genes, from subtle interference to complete knockout and lethality."

In the Rubin and Spradling method, any DNA sequence can be loaded into a P element and moved into the fly genome — including combinations of gene sequences, like splice donors and acceptors, or enhancers that turn gene expression on in specific cells at specific times. Many useful tricks using the technique have been developed over the years. Researchers can use transposable elements with these features to search out genes, proteins, or regulatory elements such as enhancers. Moreover, a P element can be induced to hop back out of the DNA, often bringing some of the surrounding DNA sequences with it; such an "imprecise excision" can remove a gene completely.

As its name suggests, the Gene Disruption Project breaks genes. Through a process of genetic selection, DNA sequencing, and annotation, a growing number of genes has been tagged by insertion of a transposable element. Each strain containing a different insertion is used to establish a mating population, which is deposited in the Bloomington Stock Center at Indiana University. There it is available for public distribution, for unrestricted use in research labs.

Last year alone, more than 60,000 fly stocks from the GDP collection were sent from Bloomington to researchers around the world. The P-element insertions remain in place unless the researcher chooses to reactivate them, which allows researchers to employ a range of tricks to study each disrupted gene.

By using transposons to insert GFP sequences (green fluorescent protein sequences) in different genes, researchers can see where and when the proteins coded by those genes are expressed. From upper left back, CG17342 is expressed in cytoplasm and DOM in cell nuclei, while cathepsin K is expressed extracellularly. From upper right back, Picot is expressed in cell membranes, stwl in chromatin, and fs(2)ket in the nuclear membrane.

One of the tricks that can be pulled with transposable-element insertions is tagging a gene's protein product with a segment of green fluorescent protein, or GFP, a jellyfish protein that glows green when stimulated by light under a microscope. The inserted fluorescent segment often does not interfere with the normal localization of the protein, so it can be used to visualize where and when the protein is expressed (turned on) in a living animal.

For example, one protein tagged with GFP may appear in the cell membranes of ovary cells and another only in certain neurons — information that's a valuable clue to what an unknown protein's role might be. As recent papers from the GDP demonstrate, this technique of "protein trapping" will be useful in reaching the ultimate goal of an atlas that indicates where and when each of the fly's proteins is expressed.

Supplementing the P element

"The P-element approach to disrupting fly genes has been extremely successful," says Hoskins. "Basically it uses various engineered P elements to hop blindly around the genome. 'Did I hit a gene we hadn't hit before?' is what you want to know. If yes, then you save that one for the collection."

By 2004 the GDP had used P elements to tag well over 5,000 genes with insertions. In that year the drug company Exelixis published its own collection of more than 5,000 Drosophila gene disruptions, which it had made using a different transposable element called piggyBac, derived from a moth. While much of the two collections overlapped, there were enough distinct new genes hit to increase the total to over 50 percent of the approximately 14,000 genes in the fly genome.

P elements and piggyBac transposable elements work in slightly different ways, and while they move freely throughout the genome, their insertion sites are not really random. About half the time, P elements insert themselves in control regions near the beginning of a gene's coding region. PiggyBac, on the other hand, seeks out a specific short sequence of bases; it more often inserts itself in the coding regions of genes. While P elements are capable of imprecise excision — which is a good thing, when you want to completely knock out a gene — piggyBacs are not.

Not long ago the Drosophila Gene Disruption Project began using yet another transposable element called Minos, derived from a different species of fruit fly, with a distribution of insertion sites that is much closer to random. Minos inserts at a high rate in genes that were not hit by either P elements or piggyBacs. It also inserts at any location within a gene, or between genes. Moreover, like the P elements, Minos can excise imprecisely to make gene knockouts. Within the past year the GDP has switched completely to this new element, resulting in a big jump in the number of tagged genes in the insertion collection.

"We now think we'll be able to get insertions in about 90 percent of the Drosophila genes over the next four years," says Hoskins. "We also have a newly designed Minos with even broader functionality, and we're testing it now. The future looks good for continuing with the transposon-based approach. Because Minos hops in a broad range of hosts including mammalian tissue culture cells, the technology we're developing promises to have applications beyond flies."

Manipulating genes with precision

Since 2004 the number of fly genes tagged by the GDP has continued to increase rapidly, "but since the GDP approach is random at best, the approach to saturation will be asymptotic" — coming ever closer to saturation but never quite reaching it — "so there will be genes that we never hit," Hoskins says.
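The asymptotic behavior Hoskins describes can be illustrated with a deliberately simplified model. The sketch below assumes that every insertion lands in one of the fly's roughly 14,000 genes with equal probability, which real transposons do not do, since their insertion sites are not truly random. Under that assumption, the expected fraction of genes hit at least once after k insertions is 1 - (1 - 1/N)^k. The function name and the insertion counts are illustrative choices, not GDP figures.

```python
def expected_fraction_hit(num_genes: int, num_insertions: int) -> float:
    """Expected fraction of genes hit at least once.

    Assumes each insertion independently lands in one of `num_genes`
    genes with equal probability (a simplification, not GDP data).
    """
    # Probability that one particular gene is missed by every insertion.
    miss_probability = (1.0 - 1.0 / num_genes) ** num_insertions
    return 1.0 - miss_probability


GENE_COUNT = 14_000  # approximate fly gene count quoted in the article

if __name__ == "__main__":
    # Each doubling of effort buys a smaller gain in coverage.
    for insertions in (7_000, 14_000, 28_000, 56_000, 112_000):
        fraction = expected_fraction_hit(GENE_COUNT, insertions)
        print(f"{insertions:>7} insertions -> {fraction:.1%} of genes hit")
```

In this toy model the coverage climbs quickly at first and then flattens, never quite reaching 100 percent however many insertions are made, which is the motivation for a complementary, targeted method.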

"So we also need a flexible system to put a variety of different genes and other sequences, including very large DNA fragments, into the fly genome," says Hoskins. "Even better, we need a system in which 'cassettes' combining different sequences can be pulled out and replaced with quite a different set of functions, once they've been inserted in the genome. To do this, the Bellen lab has incorporated several new technologies into a new engineering tool called P[acman]."

P[acman] is a refinement and extension of Rubin and Spradling's P-element transgenesis method, Hoskins says. "The original method involves loading your favorite gene within the sequence of an engineered P element, and then microinjecting it into fly embryos. A marker gene, also within the P element, is used to track the presence of the P element when it inserts somewhere in a fly's chromosome and is subsequently transmitted to its progeny," he explains. "For example, the marker gene 'white' can turn the eyes of a mutant fly from white back to their normal red color. When a red-eyed fly appears alongside white-eyed siblings, you know that the engineered DNA, including your gene of interest, has been incorporated into that fly's genome."  

One limitation of P-element transgenesis, however, is that fragments of DNA only up to about 30,000 base pairs can be incorporated into the fly genome in this way. Another limitation is that the DNA fragment is inserted into a random location in the genome. The P[acman] technology represents considerable progress on both of these fronts.

From upper left: P[acman] combines custom components, including marker genes (indicated by red-orange boxes), in plasmids that can be maintained and manipulated in E. coli bacteria. A gap is opened and the desired large DNA fragment (yellow boxes) is introduced into the gap, which is then repaired. Initially each bacterium contains only one or two copies of the P[acman] plasmid, but these have been equipped with "inducible origin of replication" sequences (green boxes), among other components, which can be triggered to produce multiple copies in E. coli. The P[acman] DNA is then extracted from the bacteria and injected into the developing fruit fly embryo. (Omitted from this diagram are the additional steps needed to insert the DNA at a precise, predetermined location in the fly's chromosomes.) One commonly used marker gene will cause a red-eyed fly to appear among its mutant white-eyed siblings, signaling that the marker gene and the DNA fragment it accompanies have been incorporated into that fly's genome.

First, P[acman] makes it practical to insert large DNA fragments, well over 100,000 base pairs, into the fly genome. Second, it inserts these fragments at specific predetermined locations. Finally, using a process called "recombineering," P[acman] makes it easy to engineer the DNA sequence in any desired fashion — analogous to rearranging and editing text in a word processor — before putting it into the fly genome.

The researcher begins by using a method from recombinant DNA technology called "gap repair": gaps are opened in plasmids, which are circular loops of DNA maintained in Escherichia coli bacteria; the desired large fragment and other DNA sequences are assembled in the gap, which is then closed by DNA repair. Because plasmids carrying too many copies of a large DNA fragment become unstable in the bacterial host, at this stage the plasmids are present at only one or two copies in each bacterium, which also makes editing them through recombineering more efficient.

Since many copies of the inserted DNA fragment will be needed, the P[acman] plasmid has been equipped with a high-copy-number "origin of replication" sequence. Once the DNA fragment has been engineered to satisfaction, the replication sequences are triggered and a large number of P[acman] plasmids carrying the engineered fragment are made within the bacterial host. The plasmid DNA is then extracted from the bacteria and injected into the developing fruit fly embryos.

Along with the plasmid DNA, a messenger RNA coding for an integrase enzyme from the bacteriophage phiC31 (a bacterial virus) is injected into the fly embryo. The integrase guides the large DNA fragment to a predetermined site in the fly genome, in this way:

  1. Previously a recognition sequence called an attP site has been inserted into the fly's genome, carried there by a piggyBac transposon, and the exact locations of these attP sites on the chromosomes, in or between specific genes, have been mapped.

  2. The P[acman] DNA fragment has been equipped with a phiC31 recognition sequence called an attB site. The phiC31 integrase recognizes the attB site on the P[acman] and the attP site on the chromosome and catalyzes recombination between the two sites, resulting in precise insertion.
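The two steps above can be pictured with a toy string model. This is only a sketch: the tokens `<attP>`, `<attB>`, `<hybL>`, and `<hybR>` are placeholders rather than real DNA sequences, the gene and cargo names are invented, and the function `integrate` is hypothetical. It assumes, as is generally described for phiC31 integrase, that recombination between attP and attB leaves two hybrid sites flanking the inserted DNA.

```python
ATT_P = "<attP>"    # docking site already mapped in the fly chromosome
ATT_B = "<attB>"    # matching site carried on the P[acman] plasmid
HYBRID_L = "<hybL>"  # placeholder for the hybrid site left of the insert
HYBRID_R = "<hybR>"  # placeholder for the hybrid site right of the insert


def integrate(chromosome: str, plasmid: str) -> str:
    """Insert a circular plasmid into the chromosome at its single attP site."""
    if chromosome.count(ATT_P) != 1 or plasmid.count(ATT_B) != 1:
        raise ValueError("need exactly one attP in the chromosome and one attB in the plasmid")
    # Cut the circular plasmid at attB so that it becomes a linear payload.
    before_b, after_b = plasmid.split(ATT_B)
    payload = after_b + before_b
    # Open the chromosome at attP and splice the payload in between the arms.
    left_arm, right_arm = chromosome.split(ATT_P)
    return left_arm + HYBRID_L + payload + HYBRID_R + right_arm


if __name__ == "__main__":
    chromosome = "geneA|<attP>|geneB"
    for cargo in ("cargo_v1", "cargo_v2"):
        plasmid = f"<attB>|white+|{cargo}|"
        print(integrate(chromosome, plasmid))
```

Because both versions of the cargo land between the same two neighbors (`geneA` and `geneB` in this toy chromosome), any difference in their behavior can be attributed to the cargo itself rather than to where it happened to insert.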

"It's useful to put P[acman] into a precise site so you can compare what happens when different engineered versions of sequences are inserted at the same site," says Hoskins. "This allows you to put each sequence you want into the same genomic context, reducing the noise in the experiment — that is, eliminating the variability that can result from putting DNA at different places in the genome."

A variety of gene traps, protein traps, enhancer traps, and enhancer blockers are among the "tricks" genome engineers have performed using DNA sequences loaded into transposons like P elements and piggyBacs. P[acman] can perform all these tricks, plus other feats of genome engineering requiring much larger sequences, placing the modified DNA with precision at predetermined sites in the genome.

P[acman]'s versatility doesn't stop there. "These techniques are not species-specific," Hoskins says. "The integrase sites will work in many animals besides the fruit fly, and many transposons will work in other species, too. These additions to the toolkit for fly biology should be applicable in many other model organisms."

Hoskins emphasizes that all of the specific techniques combined in P[acman] were invented by others. The use of phiC31 attP and attB sites for introducing short DNA sequences into the fly genome was first developed at Stanford, the copy-number control used in the P[acman] plasmid was developed at the University of Wisconsin, and recombineering has been a standard tool in mouse genetics for a number of years. What the Bellen team, and most notably graduate student Koen Venken, has contributed is the combination of these techniques in a powerful and wide-ranging system for genome engineering.
