Summary of the results of docking a library to a target by Autodock-4, Autodock-Vina, eHiTS, GOLD or DOCK
Jan. 19, 2012.
The program Dockres scans the results of Autodock (Version 4), Autodock-Vina, eHiTS, GOLD or DOCK docking runs with a series of ligands. It gathers the top binders and displays a variety of statistics, both on the ligand set and on the top binding poses.
Input of the program
Besides the structure file for the target macromolecule (of the form macro.pdb* or, for GOLD, macro.mol2), Dockres assumes the availability of the following files (the notation macro stands for the name of the macromolecule file without the .pdbqs, .pdbqt, .pdb or .mol2 extension):
Format of the file macro_<sw>.dir:
mm.gpf
1 ligx.mm.dlg
2 ligy.mm.dlg
3 ligz.mm.dlg
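The listing format can be read with a few lines of Python. This is a minimal sketch, not part of Dockres: it assumes the layout suggested by the example (one leading file name followed by index / file name pairs), and the function name parse_dir_listing is hypothetical. It splits on whitespace, so it does not depend on how the entries are distributed over lines.

```python
def parse_dir_listing(text):
    """Split a macro_<sw>.dir listing into its header and ligand entries.

    Assumed layout (inferred from the example, not from the Dockres
    source): one leading file name, then alternating index / file name pairs.
    """
    tokens = text.split()
    if not tokens:
        raise ValueError("empty listing")
    header, rest = tokens[0], tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError("expected index / file name pairs after the header")
    # Pair each index with the file name that follows it.
    entries = [(int(rest[i]), rest[i + 1]) for i in range(0, len(rest), 2)]
    return header, entries


example = "mm.gpf 1 ligx.mm.dlg 2 ligy.mm.dlg 3 ligz.mm.dlg"
header, entries = parse_dir_listing(example)
print(header)
for index, name in entries:
    print(index, name)
```

A check of this kind can catch a malformed listing before a long Dockres run is started.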
In addition, Dockres needs
Dockres can be run either interactively from a terminal or in batch mode, with the run parameters specified as command-line options. When compiled with the parallel code included, it has to be run in batch mode.
In interactive mode it starts by asking for (possibly a subset of) the following information:
The result summary starts with printing on the terminal the list of the top-scoring poses, the number of poses in the top-score ranges, and a plot showing the distribution of the location of poses over the macromolecule's residues. The program then gives the user the option to
In batch mode the following information can be specified:
Ligand type number list:
H :H-C  = 1   C :>C<  =11   N :>N<  =21   O :C-O-H=31   O :C-O-C=41   O-:C-On =51
H :H-N  = 2   C :>C=  =12   N :-N<  =22   O :N-O-H=32   O :C-O-N=42   O-:P-On =52
H :H-O  = 3   C :C=-C =13   N :-N=  =23   O :O-O-H=33   O :C-O-O=43   O-:S-On =53
H :H-P  = 4   C :C=-N =14   N :-N-=C=24   O :P-O-H=34   O :C-O-P=44   O-:*On* =54
H :H-S  = 5   C :Carom=15   N :*N*  =25   O :S-O-H=35   O :C-O-S=45
H :H*   = 6   C :*C*  =16                 O :C=O  =36   O :*OX* =46
                                          O :P=O  =37
                                          O :S=O  =38                 P :P*   =58
                                                                      S :S*   =59
                                                                      **:*    =60
A possible batch run call can be
> dockres -mm hemoglobin -sw eHiTS -np 20 -ol 2 -ib 99
For the rest of the input that can be specified in interactive mode, defaults are used. Batch runs with a flexible macromolecule have not yet been implemented.
Batch runs use default values for several filtering and output options. To use a non-default option for which no command-line input is implemented, an interactive run is required that can be started from the checkpoint file. It will not be CPU intensive since the time-consuming data gathering has been completed already.
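When the same batch call has to be repeated for several targets, the command lines can be generated by a short script. The following Python sketch only assembles and prints the commands; it uses just the options that appear in the example call, with the values of -np, -ol and -ib copied from it unchanged. The helper name dockres_command and the second target name myoglobin are hypothetical.

```python
import shlex


def dockres_command(macro, software):
    """Build the argument list of a Dockres batch call.

    Only the options shown in the example call are used; the values of
    -np, -ol and -ib are copied from that example.
    """
    return ["dockres", "-mm", macro, "-sw", software,
            "-np", "20", "-ol", "2", "-ib", "99"]


# "myoglobin" is a made-up second target, used only for illustration.
for macro in ["hemoglobin", "myoglobin"]:
    print(shlex.join(dockres_command(macro, "eHiTS")))
```

Keeping the command as a list of arguments makes it straightforward to pass to subprocess.run later without extra quoting.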
Output of the program
Dockres will create the following files:
The file macro_<sw>.res will contain
A typical example of such output (for information on the number of rotatable bonds in a ligand) is
Distribution of number of rotatable bonds over 11 ligands
Average= 4.1818 S.D.= 1.6414
        .00  .00  .18  .18  .18  .36  .00  .00  .09  .00
      +----+----+----+----+----+----+----+----+----+----+
 1.00 |                                                 |
  .90 |                                                 |
  .80 |                                                 |
  .70 |                                                 |
  .60 |                                                 |
  .50 |                                                 |
  .40 |                        |****|                   |
  .30 |                        |****|                   |
  .20 |         |****|****|****|****|                   |
  .10 |         |****|****|****|****|         |****|    |
      +----+----+----+----+----+----+----+----+----+----+
         0    1    2    3    4    5    6    7    8    9
Here the X axis is the value of the property for which the distribution is
calculated;
the Y axis is the fraction of ligands having a particular value of the property;
the numbers on the top give the actual fractions.
In this example, the highest column is for 5. This means that 5 is the most
common number of rotatable bonds in this library.
The top of the column is at 0.4, meaning that between 30% and 40% of
the ligands have 5 rotatable bonds; the actual fraction, shown on top,
is 0.36, i.e., 36%.
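The numbers in this example can be reproduced with a short Python sketch. The data set below is not from Dockres: it is a list of 11 rotatable-bond counts reconstructed from the printed fractions (2/11 = .18, 4/11 = .36, 1/11 = .09). The printed S.D. matches the population standard deviation of this list. The rule used for the column height (a column reaches the first axis label at or above its fraction) is an assumption that agrees with the plot; the function names are hypothetical.

```python
import statistics

# Rotatable-bond counts for 11 ligands, reconstructed from the fractions
# printed above the example histogram (illustrative data, not Dockres output).
bonds = [2, 2, 3, 3, 4, 4, 5, 5, 5, 5, 8]


def fractions(values, n_bins=10):
    """Fraction of ligands at each integer value 0 .. n_bins-1."""
    return [values.count(v) / len(values) for v in range(n_bins)]


def bar_heights(values, n_bins=10, n_rows=10):
    """Rows filled per column, assuming a column reaches the first axis
    label at or above its fraction (0.36 is drawn up to the .40 row)."""
    total = len(values)
    # Integer ceiling of count * n_rows / total, without floating point.
    return [-(-values.count(v) * n_rows // total) for v in range(n_bins)]


mean = statistics.fmean(bonds)
sd = statistics.pstdev(bonds)  # population S.D., dividing by N
print(f"Average= {mean:.4f} S.D.= {sd:.4f}")
print(" ".join(f"{f:.2f}" for f in fractions(bonds)))
print(bar_heights(bonds))
```

The column for the value 5 gets four rows, which corresponds to the .40 label in the plot, while the single ligand with 8 rotatable bonds fills only the .10 row.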
[Plot of the number of ligands docked to each residue. X axis: residue number, shown in panels of 50 residues (here 51-100 and 101-150). Y axis: number of ligands, from 1 to 10.]
The height of the column is proportional to the number of ligands docked to
the residue represented by the X axis.
The residues to which a ligand is docked can be assigned either by using the
residue where the closest contact is or by using all residues that include
atoms on the contact list (
Largest count= 7
[Plot of docking (free) energy against residue number. X axis: residue number, shown in panels of 50 residues (here 51-100 and 101-150). Y axis: energy, from -4.40 at the top to -7.15 at the bottom. Each occupied residue/energy combination is marked with a digit or with M.]
Here again the X axis represents the residue number and the Y axis the docking
energy or free energy.
Whenever a character appears, it indicates that ligands were docked to
the corresponding residue with the corresponding docking (free) energy.
M (instead of a digit) marks the residue/energy combination
with the largest number of members (the value is given before the plots);
the digits 0 to 9 mark combinations with proportionally smaller numbers
of members.
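The mapping from member counts to plot characters can be sketched as follows. The exact scaling used by Dockres is not stated here, so the formula below is an assumption: the count is scaled to floor(10 * count / largest), and the combination with the largest count is shown as M. The function name cell_symbol is hypothetical.

```python
def cell_symbol(count, largest):
    """Plot character for one residue/energy combination.

    Assumed scaling (not taken from the Dockres source): the count is
    mapped to floor(10 * count / largest); the combination holding the
    largest count is shown as 'M' and empty combinations as a blank.
    """
    if count <= 0:
        return " "
    if count >= largest:
        return "M"
    # count < largest, so the scaled value is always between 0 and 9.
    return str(10 * count // largest)


largest = 7  # "Largest count= 7" in the example
for count in range(0, largest + 1):
    print(count, repr(cell_symbol(count, largest)))
```

Under this assumption, a largest count of 7 would turn counts of 1, 2, 3 and 5 into the digits 1, 2, 4 and 7.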
Compilation of the program
The program is written in Fortran 77. Its size is governed by the parameters (the number between the braces is the value set in the source code), established in the first line of the program
g77 -O4 -o dockres.exe dockres.f
The optional parallelization uses the MPI library. To compile Dockres with the parallel code included, first remove the 'C@DM' string from the source code:
> sed 's/C@DM//' dockres.f > dockres_mpi.f
> f77mpi -o dockres -O4 dockres_mpi.f
The name of the MPI-enabled compiler may be different on your system, and additional libraries may also need to be linked.
For parallelized runs, the parameter MAXMOL can be set to less than the total number of ligands; it should be just large enough to hold the data for Nmolec/NCPU ligands. In this case, however, the program stops after writing the checkpoint file, and a separate single-CPU run, compiled with the parameter MAXMOL set large enough to hold all ligands, should be used to print/save the results. This option is useful for distributed-memory systems where the majority of nodes have relatively small memory.
Note that if Fortran-90 is used for one compilation, then it should be used for the other as well, otherwise the binary files will be incompatible.
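The smallest usable MAXMOL for a parallel run can be estimated with a one-line calculation. This is a sketch under the assumption that the ligands are divided as evenly as possible among the CPUs, so that the busiest CPU holds the ceiling of Nmolec/NCPU ligands; how Dockres actually distributes the ligands is not described here. The function name and the example numbers are hypothetical.

```python
def min_maxmol(n_ligands, n_cpus):
    """Smallest MAXMOL that holds one CPU's share of the ligands,
    assuming an even split (ceiling of Nmolec / NCPU)."""
    if n_ligands < 0 or n_cpus < 1:
        raise ValueError("need n_ligands >= 0 and n_cpus >= 1")
    # Integer ceiling division.
    return -(-n_ligands // n_cpus)


# Hypothetical library of 100000 ligands on 64 CPUs.
print(min_maxmol(100000, 64))
```

The follow-up single-CPU run that prints the results still needs MAXMOL to be at least the full number of ligands.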