Description of the program Dockres:

Summary of the results of docking a library to a target by Autodock-4, Autodock-Vina, eHiTS, GOLD or DOCK

Mihaly Mezei

Department of Structural and Chemical Biology
Mount Sinai School of Medicine
New York, NY 10029

Mihaly.Mezei@mssm.edu

Jan. 19, 2012.

The program Dockres scans the results of Autodock (Version 4), Autodock-Vina, eHiTS, GOLD or DOCK docking runs with a series of ligands. It gathers the top binders and displays a variety of statistics, both on the ligand set and on the top binding poses.

Input of the program

Besides the structure file for the target macromolecule (of the form macro.pdb* or, for GOLD, macro.mol2), Dockres assumes the availability of the following files (macro stands for the name of the macromolecule file without the .pdbqs, .pdbqt, .pdb or .mol2 extension):

  • Except for DOCK, a file called macro_<sw>.dir listing the docking result files, where <sw> is a one-letter code for the screening software used. It can be created with the script getdir.csh (or by the user with a text editor) prior to running Dockres.

    Format of the file macro_<sw>.dir:

    For example, docking the ligands ligx.mol2, ligy.mol2 and ligz.mol2 to the macromolecule mm.pdbqt with Autodock-4 will result in the files ligx.mm.dlg, ligy.mm.dlg and ligz.mm.dlg. Thus the user has to prepare a file called mm.dir with the following content:
    mm.gpf
    1 ligx.mm.dlg
    2 ligy.mm.dlg
    3 ligz.mm.dlg
    
  • Note that for GOLD, Dockres assumes that each pose is in a separate file in the result directory and that the files are named gold_soln_<structure>#l_#p.mol2, where #l is the ligand number and #p is the pose number.
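As an illustration of the listing file layout shown above for Autodock-4, the following is a minimal sketch of a shell function that prints such a listing. It is not the getdir.csh script distributed with Dockres; the function name make_dir_listing and the assumption that all .dlg files sit in one directory are hypothetical. The first line simply mirrors the mm.gpf line of the example.

```shell
#!/bin/sh
# Hypothetical helper, not part of Dockres or getdir.csh.
# Prints a listing in the layout of the example above:
#   line 1: <macro>.gpf
#   then:   <index> <result file>   (one line per ligand)
make_dir_listing() {
    macro=$1        # macromolecule name without extension, e.g. mm
    dir=${2:-.}     # directory holding the .dlg files (assumed layout)
    printf '%s.gpf\n' "$macro"
    n=0
    for f in "$dir"/*."$macro".dlg; do
        [ -e "$f" ] || continue          # nothing matched the pattern
        n=$((n + 1))
        printf '%d %s\n' "$n" "${f##*/}" # strip the directory part
    done
}
```

A call such as make_dir_listing mm . prints the listing on standard output; redirect it to the listing file name expected by Dockres. The files are numbered in the order in which the shell expands the pattern (alphabetical).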

    In addition, Dockres needs

    Dockres can be run both interactively from a terminal or in batch mode, specifying the run parameters as command-line options. When compiled with the parallel code included it has to be run in batch mode.

    In interactive mode it starts by asking for (possibly a subset of) the following information:

    Once this information is given, the docking result files are read and the data is extracted from each. Besides the coordinates of the pose, the program extracts two 'scores': for Autodock, the energy and free energy estimates, and for eHiTS, the values labeled energy and score. This may take some time; for larger libraries the program periodically prints a progress report. Once the data is gathered, a checkpoint file is written and the result summary starts.

    The result summary starts with printing on the terminal the list of the top-scoring poses, the number of poses in the top-score ranges, and a plot showing the distribution of the location of poses over the macromolecule's residues. The program then gives the user the option to

    In batch mode the following information can be specified:

    A possible batch run call can be

    > dockres -mm hemoglobin -sw eHiTS -np 20 -ol 2 -ib 99

    For the rest of the input that can be specified in interactive mode, defaults are used. Batch runs with a flexible macromolecule have not yet been implemented.

    Batch runs use default values for several filtering and output options. To use a non-default value for an option that has no command-line equivalent, an interactive run is required; it can be started from the checkpoint file. Such a run is not CPU intensive, since the time-consuming data gathering has already been completed.

Output of the program

    Dockres will create the following files:

  • A file called macro_<sw>.res where all results will be printed. If it is already present, Dockres will write instead to macro_<sw>_N.res, where N is the smallest integer such that no file with that number exists.
  • A file called macro_<sw>.ckp containing all the information gathered, allowing the repeated extraction of data with different filtering criteria without having to repeat the time-consuming scan of the .dlg files.
  • If requested, PDB file(s) containing extracted ligand poses with the macromolecule
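The naming rule for the result file can be sketched as a small shell function. This is only an illustration of the rule as stated above, not code taken from Dockres: the function name next_res_name is hypothetical, and starting the numbering at 1 is an assumption, since the text only says "the smallest integer".

```shell
#!/bin/sh
# Hypothetical helper illustrating the result file naming rule.
# Assumption: N starts at 1.
next_res_name() {
    base=$1     # the macro_<sw> part of the name, with actual values filled in
    if [ ! -e "$base.res" ]; then
        printf '%s\n' "$base.res"
        return
    fi
    n=1
    # Advance N until a name is found that is not yet taken.
    while [ -e "${base}_$n.res" ]; do
        n=$((n + 1))
    done
    printf '%s\n' "${base}_$n.res"
}
```

This can be used before a run to predict which file will receive the results when earlier result files are kept in the same directory.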

    The file macro_<sw>.res will contain

Compilation of the program

    The program is written in Fortran 77. Its size is governed by the parameters (the number between the braces is the value set in the source code), established in the first line of the program

    It should be compiled at the highest optimization level for maximum speed. For example, using the g77 compiler the compilation can be executed by

    g77 -O4 -o dockres.exe dockres.f

    The optional parallelization uses the MPI library. To compile Dockres with the parallel code included, first remove the 'C@DM' string from the source code:

    > sed 's/C@DM//' dockres.f > dockres_mpi.f

    Then compile the resulting file with an MPI-enabled compiler:

    > f77mpi -o dockres -O4 dockres_mpi.f

    The name of the MPI-enabled compiler may be different on your system, and additional libraries may also have to be linked.

    For parallelized runs, the parameter MAXMOL can be set to less than the total number of ligands; it only needs to be large enough to hold the data for Nmolec/NCPU ligands. In this case, however, the program stops after writing the checkpoint file, and a separate single-CPU run, using an executable compiled with MAXMOL set large enough to hold all ligands, should be used to print/save the results. This option is useful for distributed-memory systems where the majority of nodes have relatively small memory.
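As a rough guide for choosing the reduced MAXMOL, Nmolec/NCPU can be rounded up to the next integer. The sketch below does this in shell arithmetic. The function name min_maxmol is hypothetical, and the calculation assumes the ligands are divided as evenly as possible among the processes, which the text above does not state explicitly; a safety margin may be advisable.

```shell
#!/bin/sh
# Hypothetical helper: smallest per-process capacity that holds
# Nmolec/NCPU ligands, rounded up.
# Assumption: ligands are split as evenly as possible among processes.
min_maxmol() {
    nmolec=$1   # total number of ligands in the library
    ncpu=$2     # number of MPI processes
    echo $(( (nmolec + ncpu - 1) / ncpu ))   # integer ceiling division
}

# Example: min_maxmol 100000 64   prints 1563
```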

    Note that if a Fortran 90 compiler is used for one compilation, it should be used for the other as well; otherwise the binary files will be incompatible.