Several combinatorial methods have been developed to create focused or diverse chemical libraries with a wide range of linear or macrocyclic chemical molecules: peptides, non-peptide oligomers, peptidomimetics, small molecules, and natural product-like organic molecules. Each combinatorial approach has its own unique high-throughput screening and encoding strategy. In this article, we provide a brief overview of combinatorial chemistry in drug discovery, with emphasis on recently developed technologies for the design, synthesis, screening and decoding of combinatorial libraries. Examples of successful applications of combinatorial chemistry in hit discovery and lead optimization are given. The limitations and strengths of combinatorial chemistry are also briefly discussed. We are now in a better position to truly leverage the power of combinatorial technologies for the discovery and development of next-generation drugs.
Keywords: combinatorial chemistry, combinatorial library, drug discovery, high-throughput screening, computer-assisted drug design, one-bead one-compound library, DNA-encoded chemical library
Combinatorial chemistry involves the generation of a large array of structurally diverse compounds, called a chemical library, through systematic, repetitive and covalent linkage of various “building blocks”. Once prepared, the compounds in the chemical library can be screened, concurrently, for individual interactions with biological targets of interest. Positive compounds can then be identified, either directly (in position-addressable libraries) or via decoding (using genetic or chemical means).
The concept of combinatorial chemistry was developed in the mid-1980s, with Geysen’s multi-pin technology [1] and Houghten’s tea-bag technology [2] to synthesize hundreds of thousands of peptides on solid support in parallel. In 1991, Lam et al. [3] introduced the one-bead one-compound (OBOC) combinatorial peptide libraries and Houghten et al. [4] described the solution-phase mixtures of combinatorial peptide libraries. In 1992, Bunin and Ellman reported the first example of a small-molecule combinatorial library [5]. In addition to being displayed on microbeads, peptides and other synthetic compounds can be displayed on planar surfaces or solid supports, such as glass, to form planar microarrays [6]. In 1985, Smith described the phage-display peptide library method [7]. Similar to OBOC libraries, each M13 phage displays one unique peptide entity (five copies); i.e., one-phage one-peptide. Positive phages can then be isolated for amplification, re-panning, and eventually decoding by DNA sequencing. Unlike synthetic library methods, early biological libraries (phage-display, yeast-display, polysome-display peptide libraries) are restricted to the use of the 20 natural L-amino acids and simple cyclization with disulfide bonds. In the mid-2000s, Frankel et al. [8], Josephson et al. [9], and Murakami et al. [10] reported mRNA-display macrocyclic peptide libraries using unnatural and D-amino acids as building blocks. In 2009, Heinis et al. introduced the method of post-translational chemical modification of phage-displayed peptide libraries [11]. The latter approaches enable the generation of libraries of conformationally constrained peptides with greater chemical diversity and resistance to proteolysis, which are thus potentially more useful as drugs. Recent advances in DNA-encoded chemical libraries (DECLs) have allowed investigators to create and decode highly diverse small-molecule, peptide or macrocyclic libraries.
Combinatorial chemistry has been used for both drug lead discovery and optimization [12,13,14•]. Figure 1 summarizes the various combinatorial library methods, the nature of the library compounds involved and the screening methods available to each of the technologies. As shown in Figure 1 (orange boxes), most of the combinatorial library methods can generate hugely diverse chemical libraries (e.g. >1 million compounds). These include the phage-display, yeast-display, bacteria-display, mRNA-display, OBOC, DECL, and solution-phase mixture libraries. In addition to generating a huge number of compounds, these combinatorial library methods also allow rapid concurrent screening against specific drug targets (see below). The parallel synthesis library and synthetic planar microarray library methods (black boxes, Figure 1) are much lower throughput, and the resultant libraries far more focused, than the aforementioned methods. The planar microarray method has mostly been used as a tool for peptide research, although, in theory, other types of compounds can be chemically prepared in situ via automation. The highly focused parallel synthesis small-molecule libraries (hundreds to thousands of compounds), when developed in conjunction with computational chemistry, are particularly useful for optimization of drug leads (see below). The subject of combinatorial chemistry has been extensively documented and reviewed [14–16]; as such, this short review covers only recent advances in combinatorial library design, synthesis and high-throughput screening methods. Selected examples that utilize combinatorial library approaches for drug discovery will also be briefly discussed; however, nucleic acid-based combinatorial libraries (e.g. aptamer libraries [17]) will not be discussed here.
Figure 1. Overview of combinatorial technologies. The various combinatorial technologies are shown in orange (diverse and focused libraries) and black (focused small library), the nature of chemical compounds is shown in blue, and the two broad groups of screening assays are shown in green. Depicted within the red ovals are the screening assays and nature of library compounds pertaining to each technology. The question mark indicates that, in practice, the synthetic planar microarray is limited to peptides and simple oligomers.
As the fields of combinatorial chemistry and computational chemistry began to mature, it became clear that combining the two would lead to higher hit rates. It is more cost-effective to design and screen virtual chemical libraries in silico, such that subsets of the chemical space of likely hits can be defined, prior to the actual synthesis and screening of the libraries. Computer-assisted drug design, such as generation of virtual libraries, analogue docking and in silico screening, has now become standard procedure in drug discovery programs. Fragment-based drug design (FBDD) involves the experimental screening of libraries of small chemical fragments, via nuclear magnetic resonance (NMR) spectroscopy or other biophysical technologies such as surface plasmon resonance (SPR), for low-affinity hits (low mM to high μM), or in silico screening of virtual fragments if structural information on the target is available. Proper linkers are then used to connect the fragment hits while maintaining their relative positions in the sub-pockets. High-affinity ligands have been found with these approaches [18,19]. Vemurafenib is the first drug discovered via FBDD to gain FDA approval [20]. To enhance the probability of obtaining hits that are more drug-like, ADMET (absorption, distribution, metabolism, excretion and toxicity) filters have also been included in the algorithm for library design [21]. Examples of other library design methods include multi-objective optimization methods [22], the “adaptive” library approach with a simulated evolutionary process [23], and the multiple copy simultaneous search method, which uses active site mapping and a de novo structure-based design tool [24]. A rapid and simple Python-based method for target-focused combinatorial library design was recently developed by Li et al. [25]. This method uses flexible SMILES strings, concatenated in Python, to encode the structures of molecules and create the library at a rate of approximately 70,000 molecules per second. The authors used the hybrid 3D similarity calculation software SHAFTS to help refine the size of the libraries and improve hit rates. Although the aforementioned computational methods can be applied to both diverse and focused library design, they are particularly important for the development of focused libraries of limited diversity, so that the hit rate can be increased.
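The idea of building a virtual library by concatenating SMILES strings can be illustrated with a minimal sketch. This is not the implementation of Li et al. [25]; the scaffold, linker and tail fragments below, and the function name `enumerate_library`, are hypothetical choices made only for illustration, and the strings are not checked for chemical validity.

```python
from itertools import product

# Hypothetical building blocks written as SMILES fragments (illustrative only).
CORE_PREFIX = "c1ccc(cc1)"                  # phenyl ring carrying one substituent
LINKERS = ["C(=O)N", "S(=O)(=O)N", "CN"]    # amide, sulfonamide, aminomethyl
TAILS = ["C", "CC", "C1CC1", "c1ccccc1"]    # methyl, ethyl, cyclopropyl, phenyl


def enumerate_library(prefix, linkers, tails):
    """Yield one SMILES string per linker/tail pair by plain string concatenation."""
    for linker, tail in product(linkers, tails):
        yield prefix + linker + tail


library = list(enumerate_library(CORE_PREFIX, LINKERS, TAILS))

# The library size is the product of the building-block counts (3 x 4 here).
print(len(library))
for smiles in library[:3]:
    print(smiles)
```

Because the enumeration is pure string manipulation, it is fast, but it guarantees neither valid nor drug-like structures. In practice, the resulting strings would still need to be parsed and filtered with a cheminformatics toolkit before any similarity calculation or docking.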
Parallel synthesis of combinatorial libraries can be achieved manually or robotically, in solution or on solid support. The diversity of these libraries tends to be small (hundreds to a few thousand compounds), but the choice of coupling chemistry is not limiting, and each library compound can be purified via automated chromatography if needed. The intended structure of each library compound is known. In contrast, OBOC libraries are synthesized on microbeads using the split-pool synthesis strategy [3,4,26], resulting in greater diversity (thousands to millions) of bead-bound library compounds. However, these library compounds are non-addressable, and the positive beads isolated from screening must be decoded via a chemical or physical barcode, which can be constructed during library synthesis. Solution-phase positional scanning libraries can be prepared on solid support via split-pool synthesis, and later cleaved off the beads into a compound mixture in solution. Methods for the generation of biological peptide libraries such as phage-display, yeast-display, mRNA-display, and chemically modified phage-display libraries have been well described in the literature [14,27] and will not be discussed here. DECLs can be assembled via proximity ligation of DNA-tagged building blocks to form peptides, small molecules or macrocycles. The available coupling chemistries for DECLs, however, are more limited because they must be mild and compatible with the oligonucleotide tags. For reviews on the synthesis of chemical libraries, please refer to references [28–30] and the series “Comprehensive Survey of Combinatorial Library Synthesis” in the Journal of Combinatorial Chemistry (currently ACS Combinatorial Science). Here, we highlight several recently developed chemical approaches and technologies for the preparation of combinatorial libraries.
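The split-pool strategy used for OBOC libraries can be sketched as a short simulation. This is a simplified model under stated assumptions, not a protocol from the cited work: building blocks are represented by single letters, every coupling is assumed to go to completion, and the function name `split_pool` and its parameters are hypothetical.

```python
import random


def split_pool(n_beads, building_blocks, n_cycles, seed=0):
    """Simulate split-pool synthesis; return the sequence carried by each bead."""
    rng = random.Random(seed)
    beads = [""] * n_beads  # every bead starts with no building block attached
    n_vessels = len(building_blocks)
    for _ in range(n_cycles):
        # Split: shuffle the pooled beads and deal them out evenly to the vessels.
        rng.shuffle(beads)
        vessels = [beads[i::n_vessels] for i in range(n_vessels)]
        # Couple: each vessel adds its own single building block to all its beads.
        # Pool: the beads from all vessels are recombined into one list.
        beads = [
            sequence + block
            for block, vessel in zip(building_blocks, vessels)
            for sequence in vessel
        ]
    return beads


beads = split_pool(n_beads=1000, building_blocks=["A", "G", "L"], n_cycles=3)

# Each bead carries exactly one sequence; at most 3**3 = 27 distinct ones exist.
print(len(beads), "beads,", len(set(beads)), "distinct sequences")
```

Because a bead sits in only one vessel per cycle, it carries a single compound, while the number of possible compounds grows as the number of building blocks raised to the number of cycles. The simulation also shows why decoding is needed: the identity of the compound on a given bead is determined by its random path through the vessels and is not recorded anywhere unless a barcode is built during synthesis.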
Huang and Bode recently reported a “synthetic fermentation” method that does not require the use of organisms, enzymes or reagents to generate a combinatorial library of complex organic molecules “grown” from small building blocks in water [31••]. In this method, the authors adapted ketoacid ligation, which produces β-amino acid linkages. By adjusting the reaction conditions and the building blocks, the sequences, structures and compositions of the products can be modulated. The authors prepared a 6,000-member library from 23 simple building blocks and discovered a 1.0-μM inhibitor of hepatitis C virus NS3/4A protease.
Litovchick et al. developed a chemical ligation method for the construction of DECLs [32•]. The method relies on the ability of the Klenow fragment of DNA polymerase I to read through a DNA backbone containing triazole linkages, which are formed via click cycloaddition. The authors developed a strategy that allows for repetitive and specific installation of multiple oligonucleotide tags. This chemical ligation method represents an advance over previous DECL methods and could expand the scope and diversity of chemistry suitable for DECLs.
Many bioactive peptidic natural products contain macrocyclic structures. Suga and Bashiruddin recently published a review article [33] on the construction and screening of large libraries of natural product-like macrocyclic peptides using reconstituted translation systems in which designated codons are made vacant and then reassigned to unnatural amino acids. Ribosomal synthesis of macrocyclic peptides can be achieved with a custom-made in vitro translation system containing flexizymes, amino acids (natural and unnatural), and unnatural amino acids capable of crosslinking with other amino acids. Fasan et al. recently reported a novel and versatile method for generating side chain-to-tail cyclic peptide macrocycles from ribosomally derived polypeptides, either in vitro in a pH-triggered manner or directly in living bacterial cells [34••]. Unnatural amino acids bearing a 1,3-aminothiol (AmmF) or 1,2-aminothiol (MeaF) side chain are first ribosomally inserted into intein-containing precursor proteins (Figure 2). Spontaneous post-translational cyclization via a C-terminal ligation/ring contraction is then achieved through an intein-catalyzed intramolecular transthioesterification, followed by ring closure through an irreversible S,N-acyl transfer rearrangement. More recently, the Suga group reported a strategy for efficient post-translational modification of a library of ribosomally translated peptides by introducing exogenous free thiols, followed by ligation of carbohydrates to generate proteolytically stable thioglycopeptides [35].
Figure 2. Strategy for generating side chain-to-tail macrocyclic peptides in vitro in a pH-triggered manner or directly in living cells.