The drug design process requires that the binding site be known as accurately as possible. Second, protein residues within a suitable range of the probe clusters are identified, which could be used for functional site identification and comparison.
In both cases it is important to keep the predicted ligand binding site as small as possible without compromising accuracy. In particular, Laskowski et al. This trend is likely to be a geometric property of proteins, as the sizes of ligands are not likely to be related to protein volume.
We therefore measured how accurately our predicted sites mapped onto ligand coordinates, and used this measurement to provide a threshold for success.
It has been used in defining binding sites in many applications including docking Rarey et al. All the coordinates in the PDB entries were used.
This subset was used instead of all proteins described by Nissink et al. Coordinates of the ligand(s) were placed in a separate file. Residues covalently bound to the protein were retained in the file containing the protein coordinates. All solvent molecules, including phosphate, sulphate and metal ions, were discarded. Q-SiteFinder is not designed to detect the binding sites of small solvent molecules.
Groups of non-water atoms that have fewer than atoms are identified as potential ligands. LigandSeek also identifies residues that the user may not wish to define as a ligand, such as protein phosphotyrosine residues, cofactors such as Haem, peripherally bound carbohydrate residues and small solvent molecules such as SO 4.
In the online versions of Q-SiteFinder and Pocket-Finder, options are therefore provided to retain these residues along with protein atoms for binding site analysis or to discard selected residues. We created a dataset of 35 structurally distinct proteins in the unbound state which share structural similarity with 35 proteins in the ligand-bound dataset. The proteins were used rather than just the proteins of the GOLD set to yield enough pairs of homologues.
High-resolution structures were favoured where possible. The bound protein-ligand complexes were superimposed onto their unbound homologues. Ligands were then extracted for use with the unbound homologues. Both sets of proteins and ligands were analysed using Q-SiteFinder and the success rates were compared. The protein pairs used in the experiment are shown in Table 2. Q-SiteFinder uses several separate procedures to perform ligand binding site prediction (shown in Supplementary Figure 1).
First, ligand coordinates should be separated from the other atom coordinates using LigandSeek. Hydrogen atoms are then added to protein atoms by the method described by Jackson et al. The coordinates are rotated about the geometric centre to minimize the volume of the box enclosing the protein. This reduces the number of grid points requiring analysis. The same pre-processing steps are also performed when using Pocket-Finder.
The program Liggrid calculates the non-bonded interaction energy of a probe type with the protein at each position on a defined 3D grid, using the GRID force field parameters as described previously (Jackson). The probes with the most favourable binding energy are retained based on an interaction energy threshold.
The probe coordinates are saved in PDB format, and the coordinates are rotated back to match the original orientation of the protein.
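The probe retention step can be illustrated with a minimal sketch. It is not the Liggrid implementation: a generic 12-6 Lennard-Jones term stands in for the GRID force field, and the function names (`lj_energy`, `favourable_probes`) and the values of `epsilon`, `r_min` and `threshold` are assumptions chosen only for illustration.

```python
import math

def lj_energy(probe, atoms, epsilon=0.2, r_min=3.8):
    """Sum a 12-6 Lennard-Jones term over all protein atoms.

    epsilon and r_min are illustrative values, not GRID parameters.
    """
    total = 0.0
    for atom in atoms:
        r = math.dist(probe, atom)
        if r < 1e-6:
            # Probe sits on top of an atom: treat as infinitely unfavourable.
            return math.inf
        ratio = r_min / r
        total += epsilon * (ratio ** 12 - 2.0 * ratio ** 6)
    return total

def favourable_probes(grid_points, atoms, threshold):
    """Keep grid points whose energy is at or below the (negative) threshold."""
    kept = []
    for point in grid_points:
        energy = lj_energy(point, atoms)
        if energy <= threshold:
            kept.append((point, energy))
    return kept
```

A more negative threshold retains fewer probes, which is the trade-off between success rate and precision that an energy cut-off controls.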
Individual probe coordinates are then clustered according to their spatial proximity, and the total interaction energies of probes within each cluster are calculated. Probe clustering uses a variable known as the connection range, which determines the maximum distance between two probes that can be connected as part of the same cluster.
This value should be greater than the probe grid resolution used to generate the probe output file. The default used here is 1. This connects all adjacent sites but not those on the diagonals of the cube. The probe clusters are ranked according to their total interaction energies, with the most favourable being identified as the first predicted binding site. The speed of the overall process depends on protein size, but it is usually 10-15 s on the current server 1.
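The clustering and ranking steps can be sketched as single-linkage grouping under a connection range. This is an illustrative reconstruction rather than the authors' Clustering program; the function name `cluster_probes` and the example values are assumptions.

```python
import math

def cluster_probes(probes, connection_range):
    """Group probes by spatial proximity and rank clusters by total energy.

    probes: list of ((x, y, z), energy) tuples.
    Returns lists of probe indices, most favourable (most negative total) first.
    """
    unvisited = set(range(len(probes)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        members = [seed]
        stack = [seed]
        while stack:
            i = stack.pop()
            # Any unvisited probe within the connection range joins the cluster.
            near = [j for j in unvisited
                    if math.dist(probes[i][0], probes[j][0]) <= connection_range]
            for j in near:
                unvisited.remove(j)
                members.append(j)
                stack.append(j)
        clusters.append(sorted(members))
    # Rank by summed interaction energy; lower totals are more favourable.
    return sorted(clusters, key=lambda c: sum(probes[i][1] for i in c))
```

With probes on a 1.0 unit grid, a connection range slightly above 1.0 joins face-adjacent probes but not diagonal neighbours (which lie about 1.41 units apart), matching the behaviour described above.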
The Clustering program also calculates site volume, and can identify which protein atoms are within a defined range of cluster sites. It is also used in this capacity in Pocket-Finder discussed below. The parameters for estimation of site volume and identification of protein residues are different for Q-SiteFinder and Pocket-Finder.
Values of 5. For the volume calculation, a distance threshold was used to calculate the number of cubes of dimension 0. These values reflect the fact that probe sites identified with Q-SiteFinder approach the protein within van der Waals vdW contact, i. This was found to produce sites in both cases with approximately a single layer of protein atoms surrounding the probes and approximately the same site volume.
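Identifying the protein residues that line a predicted site reduces to a distance test between protein atoms and the probes of a cluster. The sketch below assumes a simple data layout (residue identifier plus coordinates); the function name `residues_near_site` and the example distance are hypothetical.

```python
import math

def residues_near_site(probes, protein_atoms, max_distance):
    """Return residue ids with at least one atom within max_distance of a probe.

    probes: list of (x, y, z) probe coordinates in one cluster.
    protein_atoms: list of (residue_id, (x, y, z)) tuples.
    """
    found = set()
    for residue_id, xyz in protein_atoms:
        if any(math.dist(xyz, p) <= max_distance for p in probes):
            found.add(residue_id)
    return sorted(found)
```

Because probe positions from an energy-based method sit closer to the protein surface than points from a purely geometric method, a different `max_distance` would be needed in each case to capture roughly a single layer of surrounding atoms.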
An interaction between the protein and probe sphere occurs if the centre of a protein atom is found inside the probe sphere. A pocket is identified if an interaction occurs followed by a period of no interaction, followed by another interaction. This is referred to as a protein-site-protein (PSP) event. The definition of the pocket is somewhat dependent on the angle of rotation of the protein relative to the axes.
This makes the identification of protein pockets much less dependent on the orientation of the protein on the 3D grid. Each grid point has seven scanning lines passing through it: in the x, y and z directions and along the four cubic diagonals. The counter at each grid point is initially set to zero. Every time a grid point is identified as being in a pocket in a PSP event, its counter is incremented by one.
Grid points can therefore register from zero (not part of a pocket) to seven (deeply buried in a cavity) PSP events. Grid points are only retained if they exceed a threshold number of PSP events. Pockets are defined by cubes of retained grid points with sides of length equal to the grid resolution. We use a grid resolution of 0. These values reduce the average volume of the first predicted site when compared with the parameters used by Hendlich et al. Pocket-Finder generates a probe output file that is compatible with the clustering method described above.
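The PSP counting scheme can be sketched on a small integer grid. This is a simplification under stated assumptions: protein occupancy is given as a set of grid cells rather than tested with a probe sphere, and the names `psp_counts`, `pocket_points` and `min_psp` are hypothetical. Whether the retention test is strict or inclusive is also an assumption here.

```python
from itertools import product

# Seven scanning directions: x, y, z and the four cubic diagonals.
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1)]

def hits_protein(occupied, shape, start, step):
    """Walk from start along step; True if a protein cell is met before the edge."""
    x, y, z = start
    while True:
        x, y, z = x + step[0], y + step[1], z + step[2]
        if not (0 <= x < shape[0] and 0 <= y < shape[1] and 0 <= z < shape[2]):
            return False
        if (x, y, z) in occupied:
            return True

def psp_counts(occupied, shape):
    """Count, for each empty cell, the scanning lines enclosed by protein on both sides."""
    counts = {}
    for point in product(range(shape[0]), range(shape[1]), range(shape[2])):
        if point in occupied:
            continue
        n = 0
        for d in DIRECTIONS:
            back = (-d[0], -d[1], -d[2])
            if (hits_protein(occupied, shape, point, d)
                    and hits_protein(occupied, shape, point, back)):
                n += 1
        counts[point] = n
    return counts

def pocket_points(occupied, shape, min_psp):
    """Retain empty cells registering at least min_psp PSP events."""
    return {p for p, n in psp_counts(occupied, shape).items() if n >= min_psp}
```

A cell inside a closed shell registers seven events, a cell between two parallel plates registers fewer, and a cell above an open surface registers none, which is why the count behaves as a burial measure.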
However, the sites produced by the Pocket-Finder program are ranked according to the number of probes in the site rather than by probe energy. PDBVolume gives an estimate of the protein volume.
The PDB file must first be pre-processed as described above. PDBVolume creates a 3D grid with resolution 0. If the probe overlaps with a protein atom, the grid point is marked as being occupied. The number of cubes with sides of length 0. A comparison between protein volume calculations carried out by Laskowski et al.
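A grid-counting volume estimate of this kind can be sketched as follows. This is not the PDBVolume source: the function name `grid_volume`, the `probe_radius` default and the atom radii in the usage are assumptions, and the grid resolution is left as a parameter.

```python
import math

def grid_volume(atoms, resolution, probe_radius=0.0):
    """Estimate volume by counting occupied grid points.

    atoms: list of (x, y, z, radius) tuples.
    A grid point is occupied if a probe centred on it overlaps any atom.
    Returns occupied_points * resolution**3.
    """
    pad = max(a[3] for a in atoms) + probe_radius
    lo = [min(a[k] for a in atoms) - pad for k in range(3)]
    hi = [max(a[k] for a in atoms) + pad for k in range(3)]
    counts = [int(math.floor((hi[k] - lo[k]) / resolution)) + 1 for k in range(3)]
    occupied = 0
    for i in range(counts[0]):
        x = lo[0] + i * resolution
        for j in range(counts[1]):
            y = lo[1] + j * resolution
            for k in range(counts[2]):
                z = lo[2] + k * resolution
                if any(math.dist((x, y, z), a[:3]) <= a[3] + probe_radius
                       for a in atoms):
                    occupied += 1
    return occupied * resolution ** 3
```

The estimate converges on the true volume as the resolution becomes finer, at the cost of a cubic increase in the number of grid points, which is consistent with using a finer grid for small molecules such as ligands.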
PDBVolume was also used to calculate ligand volume. Hydrogen atoms were added to the ligands and a higher grid resolution of 0. Q-SiteFinder analyses clusters of energetically favourable methyl binding sites to predict the ligand binding sites.
Three sets of results are presented here: development and calibration of the method, comparison with two pocket-detection algorithms and testing its ability to predict ligand binding sites on proteins in the unbound state.
We measure how well a predicted site maps onto the ligand coordinates, and define a successful prediction, using a precision threshold. We feel this is a very stringent measure of success, since (1) a significant number of probe sites must be within the range of the ligand and (2) simply predicting very large pockets that include the ligand binding sites will not be counted as a success.
However, such a prediction is of little utility for guiding docking studies, de novo drug design or functional site comparisons. If a ligand is successfully predicted in more than one site on a protein, it is counted as a success only in the higher ranking site, since these predicted sites can be considered to be part of the same binding site.
If more than one ligand is found in the same site, only the success with the highest precision is counted for this site. This affected only four cases: 1glp, 1glq, 1ukz and 2phh. If two ligands are successfully predicted in two different sites on a protein, these are counted separately. The results have been derived using the coordinates of structures corresponding to the GOLD docking test set described by Nissink et al.
Their actual coordinates were not used, since they contain only the binding site and surrounding atoms. The coordinates were taken in their entirety from the PDB entries Table 1 using all protein chains and not solely single subunits. This cut-off was used to generate the other results presented in this report. It is desirable to have both a high rate of success and a high precision of binding site prediction.
However, this varies between 0. This can be thought of as a burial threshold, and PSP values for each grid point vary from 0 (not a pocket) to 7 (deeply buried). Hendlich et al. Figure 2B also shows the relationship between site volume and precision. Smaller sites have a higher average precision. This is expected, since sites with high volumes will usually incorporate locations on the protein surface that are not part of the binding site.
Such grid points form part of a cavity, since they are bound on all sides by protein. This suggests that about one-third of the proteins in our dataset undergo a conformational change on binding that completely encloses the ligand.
Q-SiteFinder has a higher success rate in each of the top three predicted binding sites. It has three maltose sugar moieties which bind at the protein surface, and are in very shallow clefts. Large probe clusters are therefore not generated at these sites. However, the catalytic site of the protein is in a cleft, and binds to cyclodextrin Uitdehaag et al. The fourth predicted site identifies this binding site and is within 5.
This success was not identified during analysis because the coordinates of cyclodextrin are not present in the 1cdg structure. However, only one symmetrical unit (a dimer) is described by the PDB coordinates used in this study. The biologically relevant tetramer forms two thyroxine binding sites between two symmetrical units. When analysis was performed on the tetramer [coordinates taken from the PQS database (Henrick and Thornton)], the two binding sites were successfully identified by Q-SiteFinder in the first and third predicted sites.
Similarly, 3cla is a trimer formed from three symmetrical units. Only a single unit was described by the PDB coordinates. There was a fairly high degree of overlap in the detection of ligand binding sites by Q-SiteFinder and Pocket-Finder Fig. Pocket-Finder identified only 10 ligand binding sites that were not identified by Q-SiteFinder in the first predicted site. However, all 10 were identified by Q-SiteFinder in the second or third predicted sites. Q-SiteFinder identified 54 that were not identified by Pocket-Finder.
It is anticipated that Q-SiteFinder will be used to detect binding sites on proteins that are not bound to ligands. It is possible that ligand binding may cause a conformational change in the protein that biases the program to select a particular site. To test unbound conformations, 35 structurally distinct unbound proteins were compared with 35 homologous ligand-bound proteins as described in the Methods section.
The reduced success rate for the unbound conformation is caused by a number of factors. However, in the unbound conformation, the loop folds away from the binding site.
This alters the structure of the binding site, but it is still successfully identified by Q-SiteFinder in the fourth predicted site compared with the first predicted site in the bound conformation Fig. The main chain of the ligand binding site of the unbound form 1hsi is much more open.
This reduces the interaction in the binding site and, consequently, no large probe clusters are formed Fig. Figure 6A and B show the relationship between the predicted cleft volume of the first predicted binding site and the protein volume for Q-SiteFinder and Pocket-Finder.
The volumes of the sites predicted by Q-SiteFinder are only weakly dependent on protein volume Fig. This trend closely parallels the relationship between protein volume and the volume occupied by the ligand where there is little correlation between protein volume and ligand volume Fig. However, for the pocket detection algorithms, the size of the pocket is more closely related to protein volume; therefore, as protein volume increases, so does the average volume of the first predicted pocket.
Hence, Q-SiteFinder predicts sites with volumes that are most appropriate for the size definition of a ligand binding site. No significant difference was noted between the volumes of successful predictions and unsuccessful predictions for Q-SiteFinder in the first predicted site.
Bigger sites often encompass large areas that are not occupied by ligand atoms. In previous studies no precision threshold has been applied, the only criterion being that the ligand is found somewhere in the predicted pocket. However, this is at the cost of a significant increase in the volume of the cavity for Pocket-Finder Fig.
This implies that the method is relatively insensitive to changes in the precision threshold, unlike Pocket-Finder. Hence, Q-SiteFinder would appear to be more robust than Pocket-Finder, and better able to pinpoint the location of the ligand binding site. It can be concluded that ligands have a preference for regions of the protein that are more buried (Pocket-Finder) and better able to participate in van der Waals interactions with the protein (Q-SiteFinder). Precision is a useful measure of how well probes map onto ligand coordinates Fig.
The main disadvantage of precision is that a high score can be achieved if the probe cluster maps accurately onto only a part of the ligand.
In many cases, this is justified, since only a part of the ligand may be bound to the protein. However, in some cases, a high precision can be achieved even though a part of the ligand bound to the protein has not been identified by the probe cluster. Other studies have used different measures of success.
For example, Peters et al. This definition of success has two major problems. First, a very large predicted site (such as one that spreads across the whole surface of the protein) would be considered successful provided it incorporated at least seven protein atoms in contact with ligand atoms, even though such a site would be very imprecise. False positive protein residues are not taken into account.
Second, if fewer than seven protein atoms were in contact with the ligand, no prediction could be defined as a success even if all of the protein atoms in contact with the ligand were correctly identified. Protein and ligand atoms were defined to be in contact with each other if they were within a distance of the sum of the van der Waals radii plus 0. The main disadvantage of this method is that false positive protein residues are not taken into account.
They measured the success of their predictions by finding the maximum, minimum and average distances between ligand atoms and the nearest probe whose type matched the ligand atom in question.
The reported distances were low. However, this method for calculating success disregards all probes that bind further away from the ligand (false positives). Hence, good results could be reported even if the predicted site was very large (for example, covering the entire surface of the protein).
A method that gives a high precision is a suitable starting point for ligand docking studies, de novo drug design and functional site definition. Hence, we conclude that a precision-based threshold for success is suited to measuring the ability of a method to achieve this aim.
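The contrast between a precision-based criterion and criteria that ignore false positives can be illustrated with a short sketch. It assumes that precision is the fraction of probes in a predicted site lying within a cutoff distance of any ligand atom; this definition, the names `site_precision` and `is_success`, and the cutoff and threshold values in the usage are assumptions for illustration, not values taken from the text.

```python
import math

def site_precision(probes, ligand_atoms, cutoff):
    """Fraction of probes within cutoff of at least one ligand atom (assumed definition)."""
    if not probes:
        return 0.0
    near = sum(1 for p in probes
               if any(math.dist(p, a) <= cutoff for a in ligand_atoms))
    return near / len(probes)

def is_success(probes, ligand_atoms, cutoff, threshold):
    """A site counts as a success only if its precision reaches the threshold."""
    return site_precision(probes, ligand_atoms, cutoff) >= threshold

# Usage with hypothetical coordinates: the same three probes touch the ligand
# in both sites, but the sprawling site adds many probes far from it.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
compact = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (1.5, 1.0, 0.0), (8.0, 0.0, 0.0)]
sprawling = compact[:3] + [(20.0 + i, 0.0, 0.0) for i in range(17)]
```

Under these assumed values both sites contain the ligand, yet only the compact site passes a precision threshold, because every additional probe away from the ligand lowers the score.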
We have presented a method, Q-SiteFinder, for ligand binding site prediction that is based on determining energetically favourable binding sites on the surface of a protein. The method is better able to pinpoint the location of the ligand binding site than a comparable pocket detection algorithm (Pocket-Finder) on a dataset of proteins. One of the strengths of the method is its prediction of relatively small sites.
The sites have volumes roughly equivalent to ligand volumes irrespective of the overall size of the protein. This is in contrast to pocket detection, where predicted site volumes show a much greater tendency to increase with protein size.
This property would appear to be a result of using probe site binding energies with the appropriate energy cut-off rather than purely geometric criteria to determine favourable binding sites on proteins. The individual probe sites relate most closely to the favoured high-affinity binding sites on the protein surface. These favourable binding sites relate to locations where a putative ligand could bind and optimize its van der Waals interaction energy.
Such sites would be expected to correspond closely to a high-affinity ligand binding site. This is supported by the high level of success of the method. First, it would appear that this measure is general enough to be of predictive value for a broad range of proteins and ligands of different chemical composition. Furthermore, given the high level of success in unbound protein sites, it is also a property of binding sites that do not have a ligand already bound. Q-SiteFinder was shown to identify sites with high precision.
The advantage of this is that putative binding sites are identified as closely as possible to the actual binding site. It is important to keep the predicted ligand binding site as small as possible without compromising accuracy for a range of applications such as molecular docking, de novo drug design and structural identification and comparison of functional sites.
The dataset used in testing Q-SiteFinder with proteins in the unbound conformation. Examples of different levels of predicted binding site precision (for a definition of precision, see text). Probe centres are shown in black wireframe. (A) The success rates in the first predicted binding site and the average precision when different probe binding-energy cut-offs are used in Q-SiteFinder.
Complete failures i. Overlap in ligand binding site prediction in the first predicted site. Q-SiteFinder predicts 54 sites that were not predicted by Pocket-Finder and 41 sites are predicted by both methods.
Success rates of binding site prediction when Q-SiteFinder was used for 35 ligand-bound proteins and 35 unbound homologues. Backbone structures of homologous ligand-bound (mid-grey) and unbound (dark grey) proteins have been superimposed with their ligands (light grey).
(A) 1byb (mid-grey) and 1bya (dark grey). (B) 1ida (mid-grey) and 1hsi (dark grey). Class 1 enzymes are defined by Laskowski et al. Class 2 enzymes are defined to have the ligand in the second predicted site.