Description of the programs and scripts written for the screening of library with AUTODOCK (4, or Vina) or eHiTS

Mihaly Mezei

Department of Structural and Chemical Biology,

Mount Sinai School of Medicine

New York, NY 10029

Mihaly.Mezei@mssm.edu

Oct. 29, 2011.

The screening of a library of ligands by Autodock (Version 4) or Autodock-Vina or eHiTS requires

The programs and scripts in this package help in all three steps. It is assumed that the ligands are available in either and the executables of Autodock or eHiTS suite of programs are available, and (for Autodock 4) Python 2.4 (or higher) is installed. Semiempirical partial charge calculations also require the availabity of the program Gaussian.

Furthermore, some of the programs in this package have to be compiled with a Fortran compiler (e.g., g77). To obtain an executable form the source file program.f type (replace g77 with your compiler's name, if different)

>g77 -O4 -o program program.f

I. Ligand preparation The programs and scripts used for preparation make sure that there is a single directory containing files with a single molecule each, with likely valid partial charges.

I.1. Prepare individual .mol2 or .pdb* files from a single file The program splitmol (written by D.A. Gschwend and adapted to xlf by M. Mezei) reads in a file containing several .mol2, .sdf or .pdb* structures and creates several files with a limited number (e.g., one) structures each. A sample run looks like this:

% splitmol
 Running splitmol DAG/MM v.2.16...        24-Oct-0

 This program supports command-line arguments.
 Type splitmol -h for a brief description.

 Use PDB-type or mol2 file (<P>/M/S)? m
 Enter name of the mol2 file: ../split_test.mol2
 Enter name to add to the molecule number: xxx
 Enter starting and ending molecule numbers to extract.
 (Enter 0 0 to do all, enter a negative number to use a list file.) 0 0
 Split this file into groups of how many molecules? 1
 Label by absolute or relative molecule # (<A>/R)? a
 Output file names will be of the form <number>.xxx.mol2
 All molecules in input will be extracted.
 Each molecule will be put into an individual file.
 Absolute molecule numbers will be used.

 Working...

 Execution completed.
 2 molecules written.
 2 file(s) created.
%

I.2. Filter a .mol2 ligand directory

The C-shell script filtermol2.csh filters files by the following criteria:

Running filtermol2.csh requires the executables prepmol2 and filtermol2.

A sample run looks like this:

% filtermol2.csh
Filtering a directory of .mol2 files - Version 02/25/2011
This script requires the following file to be present in this directory:
     The executable program prepmol2
     The executable program filtermol2

Name of the input directory containing the ligand .mol2 files=new_in
Name of the directory containing the filtered ligand .mol2 files=new_out
 Checking for filtermol2
 Checking for prepmol2
Minimum molecular weight to keep=100
Maximum molecular weight to keep=400
Minimum ligand H-bond donor-acceptor length to keep=1.5
Maximum ligand H-bond donor-acceptor length to keep=4.5
Current directory: /scratch/1/Users/mezeim01/docking/filter_test

Number of files found in new_in : 3
Trying file 00001.chbr.mol2
Finished file 00001.chbr.mol2
Trying file 00002.chbr.mol2
Finished file 00002.chbr.mol2
Trying file 00003.chbr.mol2
Finished file 00003.chbr.mol2
Number of files kept: 2
%
A file filtermol2.log will contain detailed information about the filtering of each ligand.

I.3. Get a list of Autodock 4 atom types used in a ligand library

The C-shell script get_typlist.csh reads all the ligand structures in a directory, extracts a list of Autodock 4 atom types and creates a file all.pdbqt with atoms having all the atomtypes found - this file will be needed for the screening run.

Running get_typlist.csh requires the executable get_typlist and the template file all0.pdbqt. It also uses Python and thus has to set the path to it. Current implementation runs on Faradis only,

A sample run looks like this:

 % get_typlist.csh 
Gather atom types used in this library - written by Mihaly Mezei
Version 10/22/2007 - Autodock 4
Running Thu Oct 25 11:23:52 EDT 2007

Running on faradis.mssm.edu
Host specific information (to be customized):
    MGLROOT = /Library/MGLTools/1.4.5
    pythonutil = /Library/MGLTools/1.4.5/MGLToolsPckgs/AutoDockTools/Utilities24
End host specific information

Name of the ligand  database file contaning the .mol2 files=../mol2_files_t
/private/var/automount/Users/mezei/autodock4/mol2_files_t
Number of files found in ../mol2_files_t : 3
 get_typlist: file 000007.pdbqt opened OK
 get_typlist: file all0.pdbqt opened
 Read  26  lines from 000007.pdbqt
 Read  24 lines from all0.pdbqt
 Types found so far= C  OA
 Types found so far= C  OA A  Cl NA HD
 get_typlist: file all0.pdbqt opened as new
 mol2 file No: 1
 get_typlist: file 000008.pdbqt opened OK
 get_typlist: file all.pdbqt opened
 Read  34  lines from 000008.pdbqt
 Read  24 lines from all.pdbqt
 Types found so far= C  OA A  Cl NA HD
 Types found so far= C  OA A  Cl NA HD N 
 get_typlist: file all.pdbqt opened as new
 mol2 file No: 2
 get_typlist: file 000009.pdbqt opened OK
 get_typlist: file all.pdbqt opened
 Read  21  lines from 000009.pdbqt
 Read  24 lines from all.pdbqt
 Types found so far= C  OA A  Cl NA HD N 
 mol2 file No: 3
 get_typlist: file 000011.pdbqt opened OK
 get_typlist: file all.pdbqt opened
 Read  23  lines from 000011.pdbqt
 Read  24 lines from all.pdbqt

Final all.pdbqt file (copied to /private/var/automount/Users/mezei/autodock4/alkynes):
REMARK  3 active torsions:
REMARK  status: ('A' for Active; 'I' for Inactive)
ROOT
ATOM      1  C1  <0> d           1.327   3.455  -0.398  0.00  0.00     0.036 C
ATOM      2  O2  <0> d           1.420   1.980  -0.001  0.00  0.00    -0.032 OA
ATOM      3  C1  <0> d           2.376   1.251  -0.914  0.00  0.00     0.031 A
ATOM      4 C12  <0> d           3.741   1.077  -0.220  0.00  0.00     0.035 Cl
ATOM      5  N5  <0> d           4.048   2.342   0.567  0.00  0.00     0.013 NA
ATOM      6  H19 <0> d           3.145   2.387   1.810  0.00  0.00     0.008 HD
ATOM      7  N9  <0> d           1.805   1.812   1.483  0.00  0.00     0.013 N
ATOM      8  C8  <0> d           0.592   2.429   2.198  0.00  0.00     0.027 C
ATOM      9  C9  <0> d          -0.610   1.949   1.340  0.00  0.00     0.006 C
ATOM     10  C10 <0> d          -0.017   1.420   0.010  0.00  0.00     0.240 C
ATOM     11  C13 <0> d           3.178   3.801   2.370  0.00  0.00     0.033 C
ATOM     12  C14 <0> d           4.601   4.036   2.906  0.00  0.00     0.070 C
ATOM     13  C15 <0> d           5.624   3.720   1.831  0.00  0.00    -0.059 C
ATOM     14  C16 <0> d           6.588   4.614   1.625  0.00  0.00    -0.092 C
ATOM     15  C17 <0> d           7.646   4.378   0.641  0.00  0.00     0.383 C
ATOM     16  O2  <0> d           8.375   5.279   0.269  0.00  0.00    -0.452 OA
ATOM     17  C18 <0> d           7.797   2.971   0.104  0.00  0.00     0.029 C
ATOM     18  C19 <0> d           6.408   2.416  -0.189  0.00  0.00     0.030 C
ATOM     19  C20 <0> d           5.503   2.449   1.031  0.00  0.00     0.172 C
ENDROOT
TORSDOF 0
Processed 23 .mol2 files
% 

I.4. Replace the .mol2 partial charges with AM1-calculated charges

The C-shell script gausscharge.csh runs Gaussian for all ligands in a directory, and replaces the partial charges with the result of the Mulliken population analysis of their AM1 wavefunction. Optionally, the conformation can be minimized with Gaussian. Running gausscharge.csh requires the C-shell script mol2togauss.csh and the executables prepmol2, gausstomol2, mol2togauss. mol2togauss prepares the input file for Gaussian and gausstomol2 extracts the charges from the Gaussian output file and replaces the values in the .mol2 file.

The programs mol2togauss and gausstomol2 can be run independently. The first command-line argument is the full name of the .mol2 file. mol2togauss takes the second argumnt as an indicator for optimization: it will set up input for an optimization run if the first character is 'y' or 'Y'. For example,
% mol2togauss test.mol2 yes
will create an input file test.mol2.g99 from the structure in test.mol2 asking for optimization.
% gausstomol2 test.mol2
will replace the charges and coordinates of test.mol2 by the charges and coordinates calculated by Gaussian (read from the Gaussian output file test.mol2.g99out) and write a new mol2 file called test.mol2.am1.

A sample run looks like this:

% gausscharge.csh
Name of the directory containing the ligand .mol2 files=hits_mol2
Name of the directory to put the converted .mol2 files=hits_mol2_am1
First molecule to use [1]=
Last molecule to use [1000000]=
Name of the Gaussian executable [gaussian]=
Is your Gaussian version 03 or higher? (y/n)? n
Do you want to remove all Gaussian files (y/n)? n
Number of CPUs to use [10]=4
Do you want to optimize the structure as well (y/n)? n
Number of CPUs to be used: 4
Current directory: /hosts/fulcrum/home/mezei/MOLMOD/autodock/morph
 
Converting ligands in directory hits_mol2 to directory hits_mol2_am1
Number of files found in hits_mol2 : 2
Trying file 5236332.1.mol2
Calling prepmol2 for ligand No 1 5236332.1.mol2
 Prepmol2: file 5236332.1.mol2 opened OK
 Prepmol2: file 5236332.1.mol2.new opened OK
 Number of lines=  87 number of atoms=  37 number of bonds= 39
 Number of substitutions= 0
 Number of atoms dropped=  0 number of bonds dropped= 0
 Charge sum= 0.00000
ls: No match.
Starting conversion for 5236332.1.mol2
 mol2togauss: file 5236332.1.mol2 opened OK
 mol2togauss: file 5236332.1.mol2.g99 opened OK
 MOLECULE 5236332.1.mol2
 Number of atoms=  37 Charge sum= 0.00000
User = mezei
Host = pepi
WARNING:
Reserve directory  /hosts/pepi/reserve/mezei is not found
/hosts/pepi/scr/mezei  will be used instead
more WARNING:
You do not even have a scr directory here, the current directory:
/hosts/fulcrum/homes/mezei/MOLMOD/autodock/morph  will be used instead

Input file: 5236332.1.mol2.g99
GAUSS_EXEDIR = /hosts/fulcrum/homes/molmod/src/g99
GAUSS_ARCDIR = /hosts/fulcrum/homes/mezei/MOLMOD/autodock/morph
GAUSS_SCRDIR = /hosts/fulcrum/homes/mezei/MOLMOD/autodock/morph
job 5236332.1.mol2 done
 File 5236332.1.mol2 opened OK
 File 5236332.1.mol2.am1 opened OK
 File 5236332.1.mol2.g99out opened OK
 Sum of rounded charges= -2.6822090E-07
 Adjusting the charge of atom            1 with -2.6822090E-07
 Finished writing to file 5236332.1.mol2.am1  37 atoms
Trying file 5464817.1.mol2
Calling prepmol2 for ligand No 2 5464817.1.mol2
 Prepmol2: file 5464817.1.mol2 opened OK
 Prepmol2: file 5464817.1.mol2.new opened OK
 Number of lines= 103 number of atoms=  45 number of bonds= 47
 Number of substitutions= 0
 Number of atoms dropped=  0 number of bonds dropped= 0
 Charge sum= 0.00000
ls: No match.
Starting conversion for 5464817.1.mol2
 mol2togauss: file 5464817.1.mol2 opened OK
 mol2togauss: file 5464817.1.mol2.g99 opened OK
 MOLECULE 5464817.1.mol2
 Number of atoms=  45 Charge sum= 0.00000
User = mezei
Host = pepi
WARNING:
Reserve directory  /hosts/pepi/reserve/mezei is not found
/hosts/pepi/scr/mezei  will be used instead
more WARNING:
You do not even have a scr directory here, the current directory:
/hosts/fulcrum/homes/mezei/MOLMOD/autodock/morph  will be used instead

Input file: 5464817.1.mol2.g99
GAUSS_EXEDIR = /hosts/fulcrum/homes/molmod/src/g99
GAUSS_ARCDIR = /hosts/fulcrum/homes/mezei/MOLMOD/autodock/morph
GAUSS_SCRDIR = /hosts/fulcrum/homes/mezei/MOLMOD/autodock/morph
job 5464817.1.mol2 done
 File 5464817.1.mol2 opened OK
 File 5464817.1.mol2.am1 opened OK
 File 5464817.1.mol2.g99out opened OK
 Sum of rounded charges= -4.0026009E-04
 Adjusting the charge of atom            1 with -4.0026009E-04
 Finished writing to file 5464817.1.mol2.am1  45 atoms
% 

I.5. Creating an eHiTS clip file from Autodock grid information

The program make_clipfile.f can read the description of an Autodock grid (easily obtainable with AutoDockTools) and create a clip file that eHiTS can read. make_clipfile can be run independently or can be called from fullscreen.csh, the script setting up a virtual screening.

An interactive run of make_clipfile looks like this:

 >make_clipfile.exe
 Name of the macromolecule (without the .pdb)=xxx
 X coordinate of the box center [ 0.00000 ]=11
 Y coordinate of the box center [ 0.00000 ]=22
 Z coordinate of the box center [ 0.00000 ]=33
 Number of gridpoints in the X direction [   0]=44
 Number of gridpoints in the Y direction [  44]=
 Number of gridpoints in the Z direction [  44]=
 Grid size [ 0.37500 ]=0.5
 Created clip file xxx_clip.pdb
>

Note that by default eHiTS extends the box thus defined by 10 Å in each direction to define the target atoms that are included in the docking calculations.

II. Runnig the screening

II.1. Submitting the job(s)

The screening script fullscreen.csh requires individual .mol2 or .pdbqt or .sdf files in a single directory. It also requires a sample file all.pdbqt for Autodock 4 containing all the atom types that are used in the library. These files are provided in the distribution - see also the script get_typlist.csh.

There are placeholders in the script to specify the path to the

If these are not changed in the script to the particular system the screening is to be run, the user will be prompted for them.

The script fullscreen.csh asks the user to specify

fullscreen.csh run results in the creaton of the directories needed for docking:

as well as a log file macro_<sw>.log. Here sw is a 1-character symbol specifying the software used for docking - see the documentation of the program Dockres; This is achieved by calling the script screenlist_loop_4.csh for Autodock 4 and Autodock Vina, and ehits.sh (part of the eHiTS dirstribution) for eHiTS. Launcher.sge on TACC/Ranger or TACC/Lonestar using the Launcher utility

screenlist_loop_4.csh runs on system with the SGE, PBS or LSF queuing system, on any shared-memory Unix/Linux system or on any Unix/Linux system using a single CPU. For other queing systems the script has to be extended.

The Autodock 4 runs use the directory macro_grid and the results are collected in the directory macro_dock - this way the working directory will never have a large number of files.

For Autodock 4, these C-shell scripts prepare the input for Autogrid, run it and for each ligand prepare the corresponding input file for Autodock and run it. They use the executable prepmol2 and a system-dependent script submit_*.csh that calls Autodock.

Autodock Vina does not need separate Autogrid run, just the preparation of the ligand input file(s) in ,pdbqt format. There are submit scripts for three queing systems:

Once the grid maps are ready (if needed), it submits the allowed number of docking jobs to run on different CPUs. It creates a directory runcount where each job creates a file when starting the run and deleting it when the run ends. The number of files in runcount can is also used to control the submission of subsequent jobs for runs on a shared memory system while runs on the PBS queing system or the Sun grid engine use the result of a qstat statement. Note, that the number of CPU's to be used can be modified during the run - see the instruction printed in the sample run below.

There is also an option to run Autogrid and the various scripts preparing the input for the docking runs (.pdbqt and .dpf files for each ligand) but not running Autodock. Using this option allows the perparation of a system that can be run on a different system (e.g. on a supercomputer) that has only Autodock installed, but not the script libraries.

This option is complemented by the option of skipping all preliminaries and assuming that all grid map (created by Autogrid) and docking input files are already present and just perform the docking runs.

A sample run looks like this:

 % fullscreen.csh
                Automated screening using Autodock
             Written by Mihaly Mezei - version 06/02/2010

Select docking software:
Autodock-4   : 4
Autodock-Vina: 4
eHiTS:         e 4
Docking with Autodock-4 selected
Will the grid have more than 128 points in any direction (y/n)? n
Host is unrecognized. Select your queing system from the list below
SUN grid engine:       g
PBS queuing system     p
TACC Launcher:         l
Condor pool:           c
Single CPU (no que):   1
Multiple CPU (no que): s
None of the above:     n
Your choice (g/p/l/c/1/s/n): g
Path to the Python executable (pythonsh)=/share/apps/MGLTools-1.5.4/bin/pythonsh
Path to the Python utilities (pythonutil)=/share/apps/MGLTools-1.5.4/MGLToolsPckgs/AutoDockTools/Utilities24
Path to the directory where the Autodock executables are=/share/apps/autodock4mezei/autodocksuite-4.0.1/bin/i86Linux2
Maximum number of CPU's to use allowed=200
Name of the que to submit the jobs=orte
NOTE: you can replace the placeholder NEWHOST in this script with your host
and make the assignments within the script (instead of entering it interactively)
Name of the macromolecule file (without the .pdbqs or .pdbqt or .pdb)=target
target.pdbqt found
Do you have flexible residues (y/n) [n]? n
This script requires the following files to be present in this directory:

     The executable program prepmol2
     The executable program checkresnum
     The sample structure file with all atom types all.pdbqt
     The awk script mod_gpf.awk
     The csh script add_dpf.csh
     The c-shell script screenlist_loop_4.csh
     The c-shell script dockit_gridengine.csh

Checking checkresnum
Checking prepmol2
Checking mod_gpf.awk
Checking mod_dpf.awk
Checking screenlist_loop_4.csh
Checking dockit_gridengine.csh
 Checking residue numbers in file target.pdbqt
 Residue number check - Version 09/23/08
Number of CPUs to use [20]=100
Do you want to adjust the number of CPUs automatically (y/n)? y
Minimum number of CPUs to use [30]=
NOTE: The automatic adjustment can be turned off via a file target_A.NEWNCPU
First molecule to use [1]=
Last molecule to use [99999999]=
Name of the directory containing the ligand files=../CHEMBR_am1_opt_mol2
Number of dockings per ligand (50 or more recommended) [50]=100
RMSD tolerance for clustering (in A) [1.0]=
rmstol set to 1.0
Maximum number of GA energy evaluations (0 torsions) [250000]=
Maximum number of GA energy evaluations (1-2 torsions) [500000]=
Maximum number of GA energy evaluations (3-5 torsions) [1000000]=
Maximum number of GA energy evaluations (6-10 torsions) [2000000]=
Maximum number of GA energy evaluations (11-  torsions) [3000000]=
Maximum number of GA generations [27000]=
Maximum number of GA populations [200]=
Ligand charge option
     Add Gasteiger charges : g
     Add Kollman   charges : k
     Keep input    charges : i [g]

X[Grid center] (in A) [0]=10
Y[Grid center] (in A) [0]=3
Z[Grid center] (in A) [0]=1
Grid spacing (in A) [0.375]=
gridspace set to 0.375
Number of gridpoints in the X direction (even number)=100
Number of gridpoints in the Y direction (even number) [100]=
Number of gridpoints in the Z direction (even number) [100]=
Do you want to ignore symmetry for ligand pose clustering (y/n) [n]?
Do you want to print all members of the clusters (y/n) [n]?
Do you want to drop files with names of the form H.M.T
       where only the M part differs (y/n) [n]?
Do you want to partition the number of dockings for files  of the form H.M.T
       where only the M part differs (y/n) [y]?
Minimum number of docking attempts per copy [20]=
Are the ligand files already in .pdbqt form (instead of .mol2) (y/n) [n]
Do you want to only run the setup but skip docking for now (y/n) [n]?
Do you want to remove all ligand .mol2, .pdbqt, .dpf  files (y/n) [n]? y

All ligand .mol2, .pdbqt, .dpf files will be removed
Docking ligands in directory ../CHEMBR_am1_opt_mol2 to macromolecule target.pdbqt
RMSD tolerance for clustering: 1.0 A
Files with names of the form H.M.T where only the M part differs will
share the number of docking attempts
Minimum number of docking attempts per copy= 20
Number of dockings/ligand: 100
Maximum number of GA energy evaluations (0 torsions)=250000
Maximum number of GA energy evaluations (1-2 torsions)=500000
Maximum number of GA energy evaluations (3-5 torsions)=1000000
Maximum number of GA energy evaluations (6-10 torsions)=2000000
Maximum number of GA energy evaluations (11- torsions)=3000000
Maximum number of GA generations: 27000
Maximum number of GA populations: 200
Grid center at < 10 , 3 , 1 >
Number of gridpoints in the x, y, and z direction: 100, 100, and 100
Grid spacing = 0.375 A
Gasteiger charges will be added to the ligand
Grids will be generated for atom types (based on all.pdbqt):
1 C
2 OA
3 A
4 N
5 HD
6 Cl
7 NA
8 SA
9 P
10 F
11 Br
12 S
13 I

Number of CPUs to be used: 100
The number of CPUs to use will be adjusted automatically every hour
Calculations will be logged on file target_4.log
The rm command (to delete a file) is aliased to
For proper functioning of this script, the alias will be removed
thus files will be removed without confirmation.
OK to submit the run (y/n)? y
[1] 23784
 %
In this example, the script did not recognize our host and prompted for the system dependent indormation. There is a template in the script to enter this data specific to a particular host - thiw way the user will not be prompted for that information.

The screening with eHiTS is also initiated with the fullscreen.csh script. A sample run looks like this:

 % fullscreen.csh
                Automated screening using Autodock
             Written by Mihaly Mezei - version 06/02/2010

Select docking software:
Autodock-4: 4
eHiTS:      e e
Docking with eHiTS selected
Host is unrecognized. Select your queing system from the list below
SUN grid engine:       g
PBS queuing system     p
TACC Launcher:         l
Condor pool:           c
Single CPU (no que):   1
Multiple CPU (no que): s
None of the above:     n
Your choice (g/p/l/c/1/s/n): g
Path to the directory where the script ehits.sh is=../eHiTS
Maximum number of CPU's to use allowed=200
Name of the que to submit the jobs=orte
NOTE: you can replace the placeholder NEWHOST in this script with your host
and make the assignments within the script (instead of entering it
interactively)
Name of the macromolecule file (without the .pdb)=target
target.pdb found
New directory target_ehits has been created

Checking checkresnum
 Checking residue numbers in file target.pdb
 Residue number check - Version 02/03/10
Number of CPUs to use [20]=100
Docking accuracy (1-6)[6]=
Do you want to create a clipfile from ATD box info (y/n) [y]?
X[Grid center] (in A) [0]=3
Y[Grid center] (in A) [0]=-1
Z[Grid center] (in A) [0]=
Grid size (in A) [0.375]=
Number of gridpoints in the X direction=88
Number of gridpoints in the Y direction [88]=90
Number of gridpoints in the Z direction [88]=70
Clip file target_clip.pdb has been successfully created
Name of the file (with FULL PATH) containing the ligand files=/scratch/1/Users/mezeim01/autodock4/FDA_2k_agg.mol2
Ligand files in file /scratch/1/Users/mezeim01/autodock4/FDA_2k_agg.mol2 will be used
Ligands are expected in mol2 format
Best pose for each ligand (that was docked successfully) will be in the file
/scratch/1/Users/mezeim01/autodock4/test/target/results.sdf
For analysis with Dockres, run splitmol to create individual .sdf files
Working directory for docking: /scratch/1/Users/mezeim01/autodock4/test/target_work
Clip file: target_clip.pdb
OK to submit the run (y/n)? y
[1] 23784
 %

II.2.Tracking the jobs

The scripts screenlist_loop_*.csh keep messages from the jobs running in the directory runcount. If a docking fails to complete, the file (woth extension run will be copied into the directory where the docking log files are kept. If the script screenlist_loop_*.csh aborts then the file corresponding to this job (that would be deleted when the job exits normally) may not be deleted. The script cleanruncount.csh checks for such files and offers to delete them.

II.3.Controlling the number of CPU's during the run

For Autodock runs using the Sun grid engine, the PBS queueing system or a shared-memory system the number of CPUs used by a job (i.e., the number of ligands docked simulatanously on different CPU's) can be controlled by creating a file in the directory running the job called macro_<sw>.NEWNCPU where <sw> is A or V for Autodock-4 or Autodock-Vina, resp. This file should have one line, containing

III. Post-processing

These functions include the sorting, filtering and extracting of the results, as well as some house cleaning.

III.1. Gather a list of docking log files

The script getdir.csh looks into the directory of docking log files and extracts the name of the grid parameter file and of all the logfiles. This information is written into a file macro_<sw>.dir and used by the program dockres. macro_<sw>.dir is a simple text file; its format is described in the documentation of the program Dockres;

A sample run looks like this:

% getdir_new.csh
Select docking software:
Autodock-4   : 4
Autodock-Vina: v
eHiTS:         e
GOLD:          g
Docking with Autodock-4 selected
Name of the macromolecule file (without the .pdbq*)=cbx_h
cbx_h.pdbqt found
Existing cbx_h_A.dir file is removed
Directory of .dlg files: cbx_h_dock - OK (y/n)? y
Number of files found in cbx_h_dock : 2114
Number of cbx_h .dlg files found in cbx_h_dock : 208
%
The run above read a directory that was prepared by the fullscreen.csh script. For directories that were obtained in a different manner the user has to specify the directory name and possibly the name and the location of the grid-parameter file (.gpf).

For runs with eHiTS there are two options: the single file results.sdf containing the top scoring ligand pose for each ligand can be split into single ligand pose files with the program splitmol or the script getdir.csh could be asked to gather a list of the .sdf files containing all the poses saved by eHiTS.

III.2. Sort, filter and extract docked poses

The program Dockres gathers the top binders and diplays a variety of statistics, both on the ligand set and on the top binding poses. It writes the result in a file with extension .res and, if requested, extracts docked poses into .pdb files. Since it can be run independently of these utilities, it has a separate documentation.

If Dockres extracted a PDB file with the target and a number of ligands, the program makepml.f can make a Pymol log script that defines each extracted ligand pose as a separate object, making it easy to show/hide them.

III.3. Combine results of different screening runs

The program compare_pose reads the .res files created by dockres from docking of the same ligand library to different conformations of the same macromolecule and for each ligand appearing on the top-scoring list gathered by dockres collects the binding free energies and the multiplicities of each hit.

Ligand ID's of the form nnnnnn_aa, nnnnnn_bb are assumed to be stereoisomers of the same molecule. If such ligands are found, the user is given the option of lumping stereoisomers together for calculating the averager free-energy estimates.

Running the program requires the specification of the names of the .res files. A sample run looks like this:

% compare_pose
 Name of the (next) .res file (w/o the .res)=pcafpm
 Name of the (next) .res file (w/o the .res)=pcafpl
 Name of the (next) .res file (w/o the .res)=

 Files read:
  1 pcafpm.res
  2 pcafpl.res
  FE   1  lig  FE   2  lig  FE  
 -10.6   10817 -12.4   10815
 -10.4    1865 -11.8    2029
 -10.1   11209 -11.5    2944
 -10.0    5784 -11.4   11995
 -10.0    2945 -11.3    7831
  -9.9    2945 -11.2    5782
  -9.9   10817 -11.2    1864
  -9.8    5784 -11.1    2944
  -9.7   11213 -11.1    2817
  -9.6    7833 -11.0    6629
  -9.6    1865 -10.8    1864
  -9.6   11997 -10.8    2944
  -9.6   11209 -10.8    7831
  -9.6   12316 -10.6    3793
  -9.6   10817 -10.6   10815
  -9.4   11213 -10.5    1864
  -9.4    9441 -10.4   11207
  -9.4    2411 -10.4    2029
  -9.3    2945 -10.4   11207
  -9.3    3795 -10.4   11995
 Ligand      5791272 #  10815 file#= 2 pose=  4 rank=  1
                              Contact res:   83(THR ) fe= -12.4 multiplicity=43
 Ligand      5791272 #  10815 file#= 2 pose=  6 rank= 15
                              Contact res:   42(GLU ) fe= -10.6 multiplicity=43
 Ligand      5791272 #  10815 = -11.5 # of hits=  2 Distribution:  0  2
                              # of poses=  86 Distribution=   0  86

 Ligand      5348595 #   5782 file#= 2 pose=  2 rank=  6
                              Contact res:   83(THR ) fe= -11.2 multiplicity=15
 Ligand      5348595 #   5782 = -11.2 # of hits=  1 Distribution:  0  1
                              # of poses= 101 Distribution=   0 101

 ..............
%
(output is truncated)

The ligands are sorted by the average free energy of all poses on all targets.

III.4. 'House-cleaning' of the docking log directory

When a directory contains a large number of files the ls command in the C shell is known to fail. To circumvent this, there are three scripts that clean, copy, compress and uncompress files in a directory, irrespective of the nuber of files there:

clean_dock_dir.csh and compressdir.csh accept the directory name as an argument, e.g.,
% compressdir x_dock
will process all files in the directory x_dock. If no argument is given, each script will prompt for the directory name to clean, compress or uncompress.