A 5-step guide to understanding gene synthesis
The ability to analyze gene expression patterns is required for understanding protein function, biological pathways, and cellular responses to external and internal stimuli. This article provides a brief overview of the five main steps of gene synthesis, from designing the sequence to preparing the synthetic DNA for downstream applications.
Step 1: Sequence Optimization and Oligo Design
After you’ve chosen your gene of interest, you must create the sequence that will be synthesized. Consider the end goal: codon optimization may be appropriate if your goal is to maximize heterologous protein expression levels, but it may not be suitable if your goal is to study endogenous gene expression regulation.
Make sure your intended reading frame is maintained throughout the entire coding region, particularly for constructs with multiple segments. Short flanking sequences are frequently added to facilitate later excision or recombination via restriction enzymes or similar tools. Importantly, check that your sequence does not contain unintended restriction enzyme recognition sites or other sequences that could disrupt your downstream workflow. Also watch for functional elements, such as cis-regulatory elements or RNA splice sites, that you may unintentionally introduce during codon optimization.
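The checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for dedicated design software: the function names are hypothetical, the three enzymes are only examples, and the sequence is assumed to be a plain coding sequence made up of A, C, G, and T.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Example recognition sites only; swap in the enzymes your own workflow uses.
EXAMPLE_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def frame_problems(cds):
    """List reading-frame problems found in a coding sequence."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # The last complete codon is allowed to be a stop; earlier ones are not.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"internal stop codon {codon} at codon {index + 1}")
    return problems


def find_sites(seq, sites=EXAMPLE_SITES):
    """Map each enzyme name to the 0-based positions where its site occurs."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        # Search both strands by also looking for the reverse complement.
        patterns = {site, reverse_complement(site)}
        positions = sorted(
            {i for p in patterns
             for i in range(len(seq) - len(p) + 1)
             if seq.startswith(p, i)}
        )
        if positions:
            hits[name] = positions
    return hits
```

For example, `find_sites("ATGGAATTCTAA")` reports an EcoRI site at position 3, while `frame_problems` returns an empty list for the same sequence because it is in frame and has no internal stop codon.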
Oligo Design: Once the sequence to be synthesized has been finalized, sequence analysis is required to determine the best way to divide the entire gene into fragments that will be synthesized and assembled. For large synthetic genes, you will typically divide the whole sequence into chunks of 500-1000 bp to be synthesized separately and assembled later. Numerous software programs are available to help with oligo design.
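As a rough sketch of the fragmentation step, the function below cuts a long sequence into overlapping chunks. The name `split_into_fragments`, the 1000-base maximum, and the 30-base overlap are all assumptions made for illustration. Real design tools also balance melting temperatures and avoid repeats at the junctions, which this sketch does not attempt.

```python
def split_into_fragments(seq, max_len=1000, overlap=30):
    """Split seq into (start, fragment) pairs of at most max_len bases.

    Consecutive fragments share `overlap` bases so they can be joined
    again during assembly. Both defaults are illustrative assumptions.
    """
    if max_len <= 0 or overlap < 0 or overlap >= max_len:
        raise ValueError("need 0 <= overlap < max_len")
    fragments = []
    start = 0
    step = max_len - overlap  # distance between fragment start positions
    while True:
        end = min(start + max_len, len(seq))
        fragments.append((start, seq[start:end]))
        if end == len(seq):
            break
        start += step
    return fragments


def reassemble(fragments, overlap=30):
    """Rebuild the full sequence from fragments made with the same overlap."""
    pieces = [fragments[0][1]] + [frag[overlap:] for _, frag in fragments[1:]]
    return "".join(pieces)
```

Note that the final fragment may come out shorter than the others. A production design would usually rebalance the cut points so that all fragments are of similar length.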
Step 2: Oligo Synthesis
Today, all DNA fabrication begins with sequentially adding nucleotide monomers to form short oligonucleotides using phosphoramidite chemistry. Oligo synthesis via phosphoramidite chemistry employs modified nucleotides known as phosphoramidites to ensure that nucleotides assemble correctly and to prevent the growing strand from participating in unwanted reactions during synthesis.
The phosphoramidite group is attached to the 3′ O to prevent unwanted branching and contains both a methylated phosphite and a protective di-isopropylamine. Phosphite is used because it reacts faster than phosphate. However, to avoid unwanted reactions until the oligonucleotide synthesis is complete, methyl groups are attached to the phosphite, and amino-protecting groups are added to the bases.
Step 3: Gene Assembly
Many methods for assembling oligos into complete genes or larger genome building blocks have been developed and proven effective. In vitro assembly methods based on polymerase or ligase are sufficient for relatively short sequences (up to 1 kb). For longer sequences, in vivo recombination-based methods may be preferable. In either case, a high-fidelity enzyme (such as a DNA polymerase or ligase) is required for proper assembly.
Step 4: Sequence Verification and Error Correction
Due to the inherent possibility of error in each step of gene synthesis, all synthetic sequences should be verified before use. Mutated sequences must be identified and either removed from the pool or corrected. Internal insertions, deletions, and premature termination are common in synthetic DNA sequences. Notably, only about 30% of any synthesized 100-mer is the desired sequence due to the accumulation of errors from phosphoramidite chemical synthesis. Inadequate oligo assembly can also result in heterogeneity in the final pool of synthetic gene products.
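The way small per-step errors accumulate can be illustrated with a simple calculation. The sketch below assumes that each nucleotide addition succeeds independently with the same probability, and that an oligo of n bases needs n - 1 additions because the first base is already attached to the solid support. Both are simplifying assumptions, and the efficiency values are illustrative rather than measured.

```python
def full_length_fraction(length, step_efficiency):
    """Estimate the fraction of oligos that are error-free and full length.

    Assumes each of the (length - 1) nucleotide additions succeeds
    independently with probability step_efficiency (a simplification).
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0.0 <= step_efficiency <= 1.0:
        raise ValueError("step_efficiency must be between 0 and 1")
    return step_efficiency ** (length - 1)


if __name__ == "__main__":
    # Illustrative efficiencies, not measured values.
    for efficiency in (0.995, 0.99, 0.988):
        fraction = full_length_fraction(100, efficiency)
        print(f"{efficiency:.3f} per step -> {fraction:.2f} correct 100-mers")
```

Under this model, a per-step efficiency of 99% leaves roughly 37% of 100-mers correct, and about 98.8% per step gives roughly the 30% figure quoted above. The same arithmetic shows why longer oligos quickly become impractical to synthesize directly.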
Cloning newly synthesized sequences into a plasmid vector can aid in verifying sequences. Using sequencing primers that bind to vector regions flanking the gene insert ensures that the ends of the synthesized gene insert are sequenced correctly.
Plasmid DNA can also be clonally amplified to generate a homogeneous pool of DNA with the correct sequence. If the correct sequence cannot be obtained and amplified from the pool of synthesized DNA, various error detection and correction methods are available and have been used successfully. Examples include extensive purification via electrophoresis, mass spectrometry, and other biochemical methods; mismatch-binding or mismatch-cleavage via prokaryotic endonucleases; selection of correct coding sequences via functional assays; and site-directed mutagenesis after sequencing.
Step 5: Preparing Synthetic DNA for Downstream Applications
Cloning: Synthetic genes must be cloned into appropriate vectors for most applications. These may include plasmid vectors for cell transfection or electroporation and viral vectors (such as adenovirus, retrovirus, or lentivirus) for cell or live animal transduction.
To facilitate cloning, synthetic genes may be designed with restriction enzyme sites, recombination arms, or other flanking sequences.
Recombination-based methods, as an alternative to introducing particular flanking sequences, begin with PCR extension of the gene insert to introduce 15 bases of sequence homology to the linearized vector, facilitating homologous recombination without adding unwanted bases.
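A minimal sketch of the primer design for this recombination-based approach is shown below. The function name `homology_primers` and the 20-base annealing region are assumptions for illustration, and the 15-base arm follows the figure given above. The sketch does not check melting temperature or secondary structure, which a real primer design would need to consider.

```python
def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def homology_primers(insert, vector_left, vector_right, arm=15, anneal=20):
    """Design primers that add vector homology arms to an insert by PCR.

    vector_left:  vector sequence ending at the insertion point (top strand)
    vector_right: vector sequence starting at the insertion point (top strand)
    arm:          bases of vector homology added to each end
    anneal:       bases of each primer that anneal to the insert (assumed value)
    """
    if len(vector_left) < arm or len(vector_right) < arm:
        raise ValueError("vector ends are shorter than the homology arm")
    if len(insert) < anneal:
        raise ValueError("insert is shorter than the annealing region")
    insert = insert.upper()
    left_arm = vector_left.upper()[-arm:]
    right_arm = vector_right.upper()[:arm]
    # Forward primer: vector homology at the 5' end, insert start at the 3' end.
    forward = left_arm + insert[:anneal]
    # Reverse primer: reverse complement of the insert end plus the right arm.
    reverse = reverse_complement(insert[-anneal:] + right_arm)
    return forward, reverse
```

The PCR product made with these primers would carry the left arm, the insert, and the right arm in that order, so each end matches 15 bases of the linearized vector.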
Propagation of Synthetic Genes: Propagating synthetic gene constructs can be difficult for various reasons. Some plasmid constructs with “low copy number” do not amplify well in commonly used bacterial propagation hosts. Because of the energetic burden on the host cell, long genes are frequently challenging to maintain and propagate.
Notably, some genes can cause their hosts’ physiology to change, resulting in abnormal culture temperature requirements or other conditions that can be difficult to identify and accommodate. A gene may be toxic or unstable in some host cells but not in others. Screening many cell lines to find the best host for propagation can be time-consuming, but it is sometimes necessary for complex sequences.
Bottom Line
The development of gene synthesis technology has transformed our understanding of how DNA functions as the blueprint for life and our ability to manipulate DNA for experimental, medical, and industrial purposes. While the capabilities and speed of gene synthesis have steadily increased, its cost has decreased from $10 per bp to $0.35 per bp over the last decade, coinciding with advances in DNA sequencing and chip-based bioassays and following Moore’s Law.