A Users' Guide for the Molecular Modeling Core Facility

of the Human Genomic Institute.

Mount Sinai School of Medicine, NYU.

Mihaly Mezei, Ph.D.

Department of Physiology & Biophysics

Annenberg, Rm 21-78A

(212) 241-2186 (Ext. 42186)

E-mail address: mezei@inka.mssm.edu

You are invited to send comments, suggestions and corrections

Aug. 24, 2001.

SECTIONS

CONTENTS

  • 6.  USAGE POLICIES

    Note: The documentation for programs marked with --> DOCU can be found online on the SGI server (fulcrum) in the directory /e5share/softlib/docu. A link is also provided. Note, however, that some of the documentation consists of more than one part, and the link will reach only one of them.

    1. INTRODUCTION

    The Molecular Modeling Core, one of the Core Facilities at Mount Sinai, located at the Department of Physiology & Biophysics, offers molecular modeling capabilities at several levels. This document gives a concise guide to these services and describes the access to each. For actual usage, the user is referred to the corresponding manuals (or to people already experienced in using them).

    Access is provided to several important databases, e.g., the Protein Data Bank (PDB) and Genbank.

    Sequence analysis and comparison tools allow the display, editing, transforming and analysis of protein and nucleic acid sequence data. This includes homology searches, multiple sequence alignments, the enumeration of the product of digestion by given enzymes and secondary structure prediction based on empirical rules.

    Analysis of molecular structure can be performed by quantum-mechanical methods to obtain, among others, the electronic structure, molecular properties and a description of electrostatic interactions. They also provide theoretical estimates of optimum molecular geometries, energies of reactions, and are useful in evaluating spectroscopic data.

    Macromolecular associations, e.g., protein-protein or protein-DNA interactions, can be studied with the aid of high-resolution graphics display systems: optimal interactions can be determined, and conformational searches can be performed. Specialized computational and graphical tools are also available for the analysis of structural and energetic properties. These enable studies of structure-function relations, e.g., from consequences of mutations.

    Molecular simulation methods provide useful tools for description of molecular mechanisms, and are also used in conjunction with x-ray and NMR experiments for macromolecular structure determination.

    All of the calculations can be performed either by using a sophisticated graphical interface that allows the user to build the system from preprogrammed building blocks, to import data from databases, or by logging on to one of the mainframe computers and executing the requisite programs. The resulting structures can be viewed in 2D or 3D renderings.

    The Molecular Modeling Core facility is based on a variety of computers, some of them oriented toward graphics, others toward number crunching. Different uses of the facility require different levels of computer literacy. The present document collects information on the hardware and the software that is specific to the facility. Extensive documentation is available for each individual software package, and the material contained herein does not intend to duplicate it. Rather, the purpose of this document is to provide the knowledge necessary to operate in our environment: to run, compile and write programs. To be able to do these, one has to be cognizant of the computers, software packages and the support available.

    A companion guide is also available, providing the general survival skills necessary to operate in a Unix or VMS environment. It covers the fundamentals of editing files, managing and navigating files, directories and file systems, using the network to access other computers, moving files between computers, and sending and receiving electronic mail.

    This guide is intended to be updated frequently as the hardware configuration changes or new software is added. Any comment, suggestion or correction would be greatly appreciated and will be incorporated expeditiously - contact Dr. Mezei by e-mail, telephone (X 42186) or in person (Annenberg 21-78A).

    Changes, additions, tips are announced periodically in short bulletins. Bulletins 1-44 are available in files

    /e5share/softlib/docu/bulletin/bulletinxx

    where xx is the year the bulletin was issued. Bulletins starting with #45 are in html and can be found in the directory /e5share/softlib/docu/bulletin with names bull_NN.html where NN is the bulletin number.

    This document is stored on the SGI server as /e5share/softlib/docu/guide.html and can be accessed from the URL http://inka.mssm.edu.

    2. COMPUTERS

    The computers, also called hosts when on a network, belonging to the facility or accessible to users of the facility are listed in the table below, with their physical location.

      Servers and special purpose computers

    Hostname             Hardware                       Location
    fulcrum.mssm.edu     SGI 2xR10000 Irix 6.5          (21-33)
    concave.mssm.edu     SGI 12xR10000 Irix 6.5         (21-33)
    gprotein.mssm.edu    SGI O200 2xR10000 Irix 6.5     (21-33)
    msvax.mssm.edu       DEC 3000 AXP Model 800S        (13- )

      Graphics workstations

    Hostname                    Hardware                   Location
    prion.mssm.edu              SGI R10000 Irix 6.5 GR     21-87
    tata.mssm.edu               SGI R4400 Irix 6.5 GR      21-60
    fermat.mssm.edu             SGI R5000 Irix 6.5 GR      21-77B
    fermi.mssm.edu              SGI R5000 Irix 6.5 GR      21-77B
    bonus.mssm.edu              SGI R4400 Irix 6.5 GR      21-83
    flexi.mssm.edu              SGI R5000 Irix 6.5 GR      21-64
    cadiz.physbio.mssm.edu      SGI Octane Irix 6.5 GR     21-77B
    valdez.physbio.mssm.edu     SGI Octane Irix 6.5 GR     21-77B
    cyclops.physbio.mssm.edu    SGI Octane Irix 6.5 GR     21-77B

      Parallel systems

    Hostname                      Hardware                       Location
    york.physbio.mssm.edu         SGI O200 4xR10000 Irix 6.5     (21-33)
    lexington.physbio.mssm.edu    SGI O200 4xR10000 Irix 6.5     (21-33)
    park.physbio.mssm.edu         SGI O200 4xR10000 Irix 6.5     (21-33)
    madison.physbio.mssm.edu      SGI O200 4xR10000 Irix 6.5     (21-33)
    lenox.physbio.mssm.edu        SGI O200 4xR10000 Irix 6.5     (21-33)
    douglass.physbio.mssm.edu     SGI O200 4xR10000 Irix 6.5     (21-33)
    cpw.physbio.mssm.edu          SGI O200 4xR10000 Irix 6.5     (21-33)
    columbus.physbio.mssm.edu     SGI O200 4xR10000 Irix 6.5     (21-33)
    amsterdam.physbio.mssm.edu    SGI O200 4xR10000 Irix 6.5     (21-33)
    westend.physbio.mssm.edu      SGI O200 4xR10000 Irix 6.5     (21-33)
    wall.physbio.mssm.edu         SGI O200 4xR10000 Irix 6.5     (21-33)
    canal.physbio.mssm.edu        SGI O200 4xR10000 Irix 6.5     (21-33)
    prince.physbio.mssm.edu       SGI O200 4xR10000 Irix 6.5     (21-33)
    spring.physbio.mssm.edu       SGI O200 4xR10000 Irix 6.5     (21-33)
    bleeker.physbio.mssm.edu      SGI O200 4xR10000 Irix 6.5     (21-33)
    eagle1.physbio.mssm.edu       Compaq Tru64 UNIX V5.0A        (21-33)
    eagle2.physbio.mssm.edu       Compaq Tru64 UNIX V5.0A        (21-33)
    eagle3.physbio.mssm.edu       Compaq Tru64 UNIX V5.0A        (21-33)
    eagle4.physbio.mssm.edu       Compaq Tru64 UNIX V5.0A        (21-33)
    eagle5.physbio.mssm.edu       Compaq Tru64 UNIX V5.0A        (21-33)
    batman.physbio.mssm.edu       Linux                          (21-33)

    Fulcrum is the file server for all the SGI's in the Annenberg building and atlas is the file server for the structural biology group (in the East Building). An alias of fulcrum is inka. inka.mssm.edu is the address of the mail and web server of the Department of Physiology & Biophysics. Unless otherwise stated, logging in to any of the systems will place the user into the same home directory, located on fulcrum or atlas (see Sec. 3.6 for details).

    The file servers also act as 'password servers'. This means that your password is the same on all machines served by fulcrum and atlas, respectively. Note that it is important to choose a nontrivial password (i.e., NOT your username). In particular, it should not be a dictionary word.

    Gprotein and concave are shared-memory multiprocessor machines, mostly for number crunching. Note that concave belongs to the Computational Core and gprotein is for the use of the Weinstein group. Cyclops, cadiz, valdez, prion, fermi and fermat are the newer, and tata and bonus the older, graphics workstations running the interactive molecular graphics programs. They are in a publicly accessible area, in a room that is accessible at all times. Msvax is at the Department of Biomathematical Sciences, running the statistical packages.

    The computers york, lexington, park, madison, lenox, douglass, cpw, columbus, amsterdam and westend (collectively referred to as the 'farm') belong to a consortium of NIH investigators, most of them in the Department of Physiology & Biophysics. They are primarily intended for parallel computations. People belonging to the research group of the consortium members get accounts automatically, others on a discretionary basis. Contact Dr. Mezei for access. The computers wall, canal, prince, spring and bleeker belong to the Department of Physiology & Biophysics - they are integrated with the farm and also use york as the server.

    The computers eagle1 - eagle5 use the Alpha architecture. Batman is a PC running Linux. Special care must be exercised to ensure the compatibility of binary data generated on the Alphas or on Linux systems with data generated on the SGI's - contact Dr. Mezei for details.

    Calculations requiring more power can be performed at one of the national supercomputer centers, like the Cornell Theory Center.

    3. BASIC OPERATIONS.

    3.1. Obtaining accounts.

    To use any of the computers in the list above it is necessary to obtain an account first. For the Silicon Graphics machines (except concave), contact the Physiology & Biophysics (P&BP) system manager, Mr. Benjamin Goldsteen at Ext. 41614, (E-mail address: ben@inka.mssm.edu). For an account on concave, and msvax, contact Mr. Kevin Kelliher, Ext. 40493, Rm. 13-36 (E-mail address: kelliher@msvax.mssm.edu). The national supercomputer centers require a (small) grant application that can be obtained from their web sites.

    3.2. Accessing the computers, logging in.

    Each of the computers listed above can be accessed through the Mount Sinai network. The workstations can also be used directly (in Rm. 21-77B), which is a necessity for the graphics applications. Access to the network is described in the companion guide.

    Once logged in, application programs can be run. In some instances, it is enough to start the program and everything will be handled by the program itself, e.g., it will create the necessary files and possibly ask you to provide the name of datafiles. In most of the cases, however, one has to prepare some input files before a program can be successfully executed. To be able to do this, some familiarity with the operating system running on the host is required: creating directories, copying, renaming and moving files. A brief introduction to the Unix and VAX/VMS operating systems and the basics of editing files is described in the companion guide.

    A growing number of applications use the so-called X-windows graphics interface. To run such applications, you first have to be logged on to an X-terminal. If that terminal is on the machine where you want to run the program, you are ready to run. If the application resides on a different system, you have to tell that system that you actually logged on from an X-terminal - see Sec. 2.2.4. of the companion guide.

    3.3. Creating files.

    To create a new file, one has to use an editor or transfer a file from elsewhere. Transferring files across the network can be done with ftp, as described in Sec. 2.3.2. of the companion guide. It is also possible to transfer files between microcomputers and mainframes or workstations, using Kermit (described in Sec. 4.3.1. of the companion guide). In the latter case, the file was prepared on the microcomputer (most likely with a wordprocessor) first. An introduction to editors is given in the companion guide. The table below gives the name of the editors available at our hosts.




    Hostname   vi    emacs   edt   edit
    SGI's      yes   yes           yes
    msvax            yes     yes

    3.4. Printing and viewing files.

    We have a locally developed script, written by Ben Goldsteen and called qprint, that is able to print files in a wide variety of formats. qprint detects the file format and prints the file on the output device requested. qprint recognizes text (i.e., ASCII) files (the simplest and the most transferable format), postscript files (.ps), Adobe acrobat (.pdf) files, and various graphics files: .rgb, .tiff, .jpeg and possibly .gif.

    The format of the command is

    qprint -to <printer name> [<format>] <filename>

    where <printer name> can be

    and <format> can be

    By default, images are printed at 300dpi. It is usually desirable to print images captured from the screen at 150 dpi. The option for that is:

    qprint -to claser -dpi 150 density.rgb

    To obtain a printout on the lineprinter with a header giving the date and the filename, type

    pr -f -l90 <filename> | <print command>

    where <print command> is the normal command sending the file <filename> to the lineprinter.

    We also have the public-domain program a2ps that can print a text formatted in a variety of ways (multi column, headers, line numbers, etc.). Type man a2ps to obtain a list of options. For example, a2ps myfile | qprint -to laser will give a nice 2-column printout, in landscape orientation, with copious labels specifying the origin of the file and page numbers.

    To view the contents of a text file, you can use either an editor or the appropriate command of your operating system that lists the contents of a file on the terminal (see the companion guide).

    Postscript (.ps) files can be viewed with the public-domain program ghostview (running on an X-terminal) or with SGI's DesktopTool xpsview. The command acrobat opens the Adobe Acrobat viewer on the SGI graphics workstations for viewing .pdf files.

    The following options are available for viewing, editing and/or generating printed images of a graphics file:

    To obtain a screen dump in RGB format that can be further manipulated (cropped, scaled, sharpened, etc.) by these utilities, follow the procedure described below.

    3.5. Space saving: compress, gzip and zip.

    On Unix systems there are two utilities that can significantly shorten most files without losing any information. Issuing the command compress <filename> or gzip <filename> will change the file <filename> into a shorter file called <filename>.Z or <filename>.gz, respectively. Wild-cards can also be used in <filename>. To reverse the process, use the commands uncompress <filename> and gzip -d <filename>, respectively. gzip comes from the GNU project of the Free Software Foundation and is steadily becoming the Unix standard, as there are no royalties to be paid for its use.
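
    The gzip round trip described above can be tried out with the following minimal sketch. It is written in Bourne-shell syntax; the scratch directory and the file names example.dat and original.dat are made up for illustration and are not part of the facility's setup.

```shell
#!/bin/sh
# Sketch: compress a file with gzip, restore it, and confirm nothing was lost.
# The scratch directory and the file names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a highly compressible test file: 1000 identical lines.
i=0
while [ "$i" -lt 1000 ]; do
    echo "ATOM      1  N   ALA A   1      11.104   6.134  -6.504"
    i=$((i + 1))
done > example.dat
cp example.dat original.dat            # keep a copy for the comparison

size_before=$(wc -c < example.dat | tr -d ' ')
gzip example.dat                       # replaces example.dat by example.dat.gz
size_after=$(wc -c < example.dat.gz | tr -d ' ')
echo "before: $size_before bytes, after: $size_after bytes"

gzip -d example.dat.gz                 # restores example.dat
if cmp -s example.dat original.dat; then
    roundtrip=ok
else
    roundtrip=failed
fi
echo "round trip: $roundtrip"
```

    The same pattern applies to compress and uncompress where they are installed; only the extension of the compressed file differs.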

    Files in network archives are often stored in compressed form. Note that there are several other compacting programs in widespread use, producing files with extensions such as .zip, .zoo, .lzh. Files with the .zip extension can be created or uncompressed by the zip and unzip commands on Concave, Fermi, and Fermat. The others can be uncompressed on a PC - contact Dr. Mezei for a copy of those decompression programs.

    3.6. Permanent file systems.

    There are several file systems on the various machines that contain users' home directories, scratch space, software and databases. These are generally available on many other systems (via NFS). The naming convention for the file systems containing user files is /hosts/<nodename>/<filesystem type>/ where <filesystem type> can be either

    Application software is currently found in several different file systems: Eventually, all software will be migrated to /usr/global

    3.7. Running number-crunching applications.

    Calculations that require more than a few minutes should be run in the background. Furthermore, if the machine has batch queues set up users should send their jobs (a script file with the commands to be executed) there and the operating system will execute them in turn (a few at a time).

    To run a job in the background, append & to the execution command, e.g., calculate < input.data >& output.data &. This command executes the program calculate using the file input.data as the standard input, writes both the results and the system error messages to the file output.data, and runs in the background. This has the additional advantage that when you log off, the job will continue to run.
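
    The example above uses the csh form >& for sending both the results and the error messages to one file. As a minimal sketch, the same idea in Bourne-shell syntax is shown below, where the equivalent redirection is > file 2>&1. The program calculate is replaced by a stand-in shell function (an assumption made only so that the sketch runs anywhere), and the wait command is used to block until the background job is done.

```shell
#!/bin/sh
# Sketch in Bourne-shell syntax; in csh the redirection would be ">& output.data".
# "calculate" below is a stand-in function, not a real program of the facility.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

calculate() {
    # Stand-in for a number-crunching program: sum the numbers read from stdin.
    total=0
    while read -r value; do
        total=$((total + value))
    done
    echo "sum = $total"
    echo "calculate: finished" >&2     # a message on standard error
}

printf '1\n2\n3\n' > input.data

# Standard input from input.data, results and error messages to output.data,
# and the trailing & puts the job in the background.
calculate < input.data > output.data 2>&1 &
job_pid=$!
echo "started background job $job_pid"

wait "$job_pid"                        # block until the background job is done
job_status=$?
cat output.data
```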

    To submit a script file <fn> to the queue named <que>, issue the command qsub -q <que> <fn>. The script file's protection has to include x, otherwise the submission will fail - type chmod a+x <fn> to include it. To check on the status of your job(s), issue the command qstat [<que>] (the <que> is optional). To delete a job from a queue, issue the command qdel -k <jobid>.
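
    As an illustration of the preparation step, the sketch below writes a small job script, makes it executable and tests it with a dry run. The queue name short and the file name myjob.sh are hypothetical, and the qsub command is only printed, not executed, so the sketch can be tried on a machine without batch queues.

```shell
#!/bin/sh
# Sketch: prepare a job script for submission to a batch queue.
# The queue name "short" and the file name myjob.sh are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write the script file holding the commands the queue should execute.
cat > myjob.sh <<'EOF'
#!/bin/sh
echo "job started"
echo "job finished"
EOF

chmod a+x myjob.sh                 # without the x protection the submission fails

if [ -x myjob.sh ]; then
    submit_cmd="qsub -q short myjob.sh"
    echo "ready to submit with: $submit_cmd"
fi

job_output=$(sh myjob.sh)          # dry run of the script before queueing it
echo "$job_output"
```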

    3.8. Writing programs

    The writing of new programs (or the modifying of existing ones) is governed by the rules of the programming language chosen and is not described here. A corresponding compiler has to be invoked to produce executable code. Note that while the source code is frequently transferable from machine to machine, the executable is NOT. Major programming languages are reasonably well standardized, but the compilers may have different names and options on different operating systems. Most compilers are able to optimize at various levels. For debugging purposes, there is the index-check option which, when set, causes the program to perform a run-time check on the array and string elements to see if they are within bounds. It adds to the execution time, but it is a very important debugging tool. Most of the compilers also have additional debugging facilities (e.g., dbx) that allow you to probe the status of your variables during the run (consult the manual!). The table below gives the minimum information necessary to compile a Fortran 77 program.






    Hostname   Compile      Optimize      Index check      Debugger
    SGI's      f77          -O2 or -O3    -C               -g
    Alpha's    f77          -O4 or -O5    -C               -g
    Linux      f77 or g77   -O4 or -O5    -C               -g
    Vaxen      FOR          (optimizes)   /CHECK=BOUNDS

    On the VAX, the FOR command has to be followed by a LINK command.

    3.8.1. Using the debugger

    Once an execution terminates with a core dump, the dbx debugger can be used to examine the status of the program. On the SGI's the command dbx <executable name> starts the debugger, yielding a dbx> prompt. On the Alphas you have to type dbx <executable name> core. Typing l results either in a few lines of code being printed, with the line where the program stopped marked, or in the message Source not available. In the latter case, you have to keep typing the command up until it prints the line where the program aborted. If this place is inside a subroutine, then further up commands will give you the listing of the place it was called from.

    Additionally, if the program was compiled with the -g compilation option then you can query your variables: the command p <variable name> will print the actual value of that variable at the time the program stopped. Note that the -g switch prevents optimization thus it should only be used in anticipation of the core dump.

    There is an important difference between the SGI and Alpha systems in their behavior with respect to floating point exceptions. On the SGI's the default is to continue after a floating point exception has occurred, while on the Alphas the default behavior results in a core dump. On the Alphas any of the compilation options -fpe1, -fpe2, -fpe3, -fpe4 lets the program continue (see the man pages for the difference between them). To abort on a floating-point exception on the SGI's you have to end the compilation command with -trapuv -lfpe and set the environmental variable TRAP_FPE. The command setenv TRAP_FPE "ALL=ABORT,trace(1)" will abort the program even when an underflow occurs. More selective control can be exercised, however, as in setenv TRAP_FPE "UNDERFL=ZERO;INT_OVERFL=ZERO;OVERFL=ABORT;INVALID=ABORT;DIVZERO=ABORT, trace(1)"

    Note also that the -C compilation option will abort the program (on the SGI's without producing a core dump) when an index boundary is exceeded. Exceeding an index boundary is quite often the prelude to floating-point exceptions.

    3.9. Running parallel programs

    There are several ways in which jobs can be run in parallel. They differ mainly in the manner in which the data communication/sharing between processors is done.

    3.9.1. Running programs using PVM

    Jobs using PVM (like Charmm) can be executed in the following way:

    IMPORTANT: NEVER do a kill -9 pid on the pid numbers of the job. Always stop the jobs by halting the daemon with pvm.

    3.9.2. Running programs using MPI

    To execute a parallel job using the MPI communication library, simply type

    mpirun -np <number of CPU's> <run command>

    where the <run command> is what you would type for running a single CPU job (e.g., charmm < job.inp >& job.out &).

    3.10. Tape and CD-ROM I/O

    There are several tape reading and writing devices available, as well as CD readers. There is a 1/2" cartridge I/O, an 8 mm tape I/O on concave, a 4 mm tape I/O on prion, fermi, bonus (unreliable - use it at your own risk) and tata. Msvax can also read TK-50 and TK-70 cartridges, as well as read and write CompacTape III - contact Mr. Kelliher.

    CD readers - contact the P&BP system manager for their use.

    To access the drives on concave, you need the computer room (21-33) key from the P&BP system manager. Make sure that you leave it locked once you are done.

    The tape devices are accessed by the tar command (see the companion guide).
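
    The basic tar operations (create, list, extract) can be sketched as follows. Since tape device names are specific to each machine and are not given here, an ordinary archive file called backup.tar stands in for the device; this file name and the directory names are made up for illustration.

```shell
#!/bin/sh
# Sketch: the tar operations used with a tape drive, shown with an ordinary
# archive file (backup.tar) standing in for the tape device name.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir project
echo "input deck" > project/job.inp
echo "results"    > project/job.out

tar cf backup.tar project              # c: create; f: archive file (or device)
listing=$(tar tf backup.tar)           # t: list the table of contents
echo "$listing"

mkdir restore
(cd restore && tar xf ../backup.tar)   # x: extract into the current directory
restored=$(cat restore/project/job.inp)
echo "restored content: $restored"
```

    Listing an archive with the t option right after writing it is a cheap way to check that it is readable.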

    To ensure compatibility of tapes written on the different drives, make sure that

    It is possible to use a drive without being logged on to the machine that has the tape drive you want (although you need an account there) - contact the P&BP system manager for the scripts to do it.

    Please note that magnetic media have an average shelf life of 5 years. Depending on exposure to the elements, it may be longer or shorter than the average. You should verify your old tapes at least once a year.

    3.11. SGI architectures.

    The SGI machines span several different hardware architectures

    and (sometimes different) operating systems (see Sec. 2.): They are only downward compatible (at best), i.e., newer architectures can run programs compiled for the older architectures, but not the reverse.

    Trying to run a newer executable on an older machine results in a message about invalid architecture. Running an older executable on a newer machine, while possible, results in reduced performance.

    IMPORTANT: It is the users' responsibility to make sure that they run the most efficient executable available, especially for runs taking hours or days to complete.

    3.12. File conversions

    3.12.1. WordPerfect to html, TeX, etc. converter

    On the SGI's the command wp2x can convert a WP 5.0 document to an html, tex, latex, troff, gml or script document by typing

    wp2x /e5share/local/lib/wp2x/ffff.cfg in.wp > out.ffff

    where in.wp is the name of the WordPerfect file, and ffff is the output format name (i.e., one of html, tex, latex, troff, gml, script) and out.ffff is the converted file.
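
    The substitution of the format name into the command can be sketched with a small shell function. The function name wp2x_command and the file name thesis.wp are made up for illustration; the function only assembles and prints the command line following the pattern given above, and it does not run wp2x itself.

```shell
#!/bin/sh
# Sketch: assemble the wp2x command line for a given output format.
# wp2x_command and thesis.wp are hypothetical names; nothing is converted here.
wp2x_command() {
    format=$1
    input=$2
    case "$format" in
        html|tex|latex|troff|gml|script) ;;      # the formats listed above
        *) echo "unsupported format: $format" >&2
           return 1 ;;
    esac
    base=${input%.wp}                            # strip the .wp extension
    echo "wp2x /e5share/local/lib/wp2x/$format.cfg $input > $base.$format"
}

cmd=$(wp2x_command latex thesis.wp)
echo "$cmd"
```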

    3.12.2. HPGL to Postscript converter

    To convert an HPGL (Hewlett-Packard) plot to a variety of graphics formats (including Postscript) logon to an SGI and type

    hp2xx -m <type> input.hp where <type> can be cad (CAD compatible), em (), epic (TeX macros), eps (encapsulated Postscript), hpgl (simplified HPGL), img (GEM's Image format), mf (Metafont source), pbm (Portable Bitmap), pcl (HP-PCL Level 3), pcx (Paintbrush format), rgrip (Uniplex RGRIP format), pre (Previewmode; no output), tex (line drawing with TeX/epic macros).

    3.12.3. Postscript to Encapsulated Postscript

    The command ps2eps <filename.ps> converts a regular Postscript file to encapsulated Postscript.

    3.12.4. Postscript to PDF

    The program Adobe Distiller, installed on the NT workstation, converts a Postscript file to Adobe Acrobat (.pdf) format.

    4. DATABASES

    4.1. Brookhaven Protein Database (PDB).

    The Brookhaven database contains molecular structures of macromolecules, most of them determined by X-ray diffraction experiments. Structures obtained by NMR are also present. Besides the structure, information about the sequence, position of crystal waters and the method used can also be found there.

    A local copy of the PDB is now kept on concave and updated weekly. The (compressed) structures are in the directory /global/pdb/data/structures/all/pdb /pdb. Files containing individual structures have names of the form pdbdABd.ent, where dABd is the four-character PDB identifier of the structure. The latest version of the PDB is available from Rutgers University via the WWW at URL http://www.rcsb.org. The site provides for searching the database and viewing the structures, as well as downloading them.
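
    Building the name of a structure file from a PDB identifier can be sketched as below. The function name pdb_file_name is made up for illustration, and the conversion of the identifier to lower case is an assumption based on the usual PDB distribution; check a directory listing of the local copy to confirm it. The identifier 1CRN is used only as an example.

```shell
#!/bin/sh
# Sketch: build the file name of a PDB entry from its four-character identifier.
# Assumption: identifiers appear in lower case in the file names.
# pdb_file_name is a hypothetical helper, not an installed command.
pdb_file_name() {
    id=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf 'pdb%s.ent\n' "$id"
}

name=$(pdb_file_name 1CRN)
echo "$name"
```

    Since the local files are stored compressed, the actual file will carry an additional extension added by the compression program.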

    There is also a WWW site called PDBsum, containing a graphical annotation of the PDB structures in a concise form (maintained by Dr. Janet Thornton's lab in London, UK), at http://www.biochem.ucl.ac.uk/bsm/pdbsum. There are also several databases that specialize in specific families of proteins. These can be conveniently accessed from the site http://www.blocks.fhcrc.org/~steveh/protein.html

    The sequences of the proteins in the PDB have been extracted in GCG (see Sec. 5.5.1.) format. They are stored on concave in /concave/pdb/seq.

    4.2. Nucleic Acid Database (NDB).

    The Nucleic Acid database is maintained at Rutgers University by the research group of Dr. Helen Berman. It is accessible at the URL ndbserver.rutgers.edu.

    4.3. Cambridge crystallographic database.

    The Cambridge database contains the crystal structures of smaller molecules. It is installed on Concave, together with software allowing access to it. To start the GUI interface type cq (make sure your DISPLAY is set!). To access other parts of the package, first type use csd, followed by the command of your choice.

    4.4. Genbank.

    Genbank contains protein and nucleic acid sequence information. Local copies of it can be accessed using the GCG (Wisconsin) package on concave. The very latest version can be accessed via e-mail: send the word HELP to the address retrieve@ncbi.nlm.nih.gov to obtain instructions. BLAST searches can also be sent by e-mail: send the word HELP to the address retrieve@ncbi.nlm.nih.gov to obtain instructions.

    4.5. EMBL.

    EMBL maintains an anonymous FTP site at ftp.embl-heidelberg.de and a WWW site at http://www.embl-heidelberg.de where users can obtain molecular biology software and can access the following databases:

    Files can be downloaded by ftp or requested by e-mail through their mail server. The mail server can be used from any site (instructions can be downloaded by ftp).

    4.6. Lipid conformation

    A snapshot from the lipid simulation of Venable and Pastor has been put in the directory /e5share/softlib/dbase/venable on the SGI server. Follow the instructions in the README.LIPID file found there (some files are compressed).

    4.7. Charmm parameter files

    Parameter files of older versions of Charmm are collected separately from the Charmm directories and deposited into the directory /e5share/softlib/prog/modeling/charmm/archive/data on the SGI server.

    5. PROGRAMS

    The Molecular Modeling Core has acquired an extensive software library that is outlined in this section. The information here describes only the main features of the programs and the access to them; the use of each program is described in the corresponding documentation that is available to the users. Besides the programs already acquired, information is kept on software available to us. This information includes a complete catalog of the programs available at the Quantum Chemistry Program Exchange (QCPE) and the program exchange maintained by the British Science Research Council at Daresbury (CCP5). Contact Dr. Mezei for details.

    5.1. Mathematics oriented programs.

    5.1.1. Mathematica

    Mathematica is a sophisticated program to perform symbolic mathematical operations. Type math to start it in the command line mode or mathematica to run it with its X-windows interface.

    5.1.2. Matlab

    Matlab is an interactive software package for scientific and engineering numeric computation, running on the graphics SGI's. It integrates numerical analysis, matrix computation, signal processing. It also has excellent graphics capabilities to plot graphs in one and two dimensions. Note, that a graphics terminal is needed to view graphics output. To run, type matlab. If you are on a graphics terminal type terminal and follow the directions. Type demo for a demonstration.

    5.1.3. Numerical Recipes.

    The subroutines described in the book Numerical Recipes (together with drivers that show how to incorporate them into a complete program) are available both in C and Fortran77. They are stored on the SGI server, in the directories /e5share/softlib/prog/math/nrecipes/f and /e5share/softlib/prog/math/nrecipes/c for the Fortran and C versions, respectively. In each of them there is a subdirectory routines and another called demos, containing the subroutines and the corresponding demos, respectively. The demos provide a driver and some tests.

    5.1.4. IMSL.

    The International Mathematical Subroutine Library is installed on concave. To use it, you have to execute first the command

    use imsl

    (you can do it in your .cshrc file) and add $LINK_FNL to the f77 compilation command (this will link the IMSL library to your compiled program).

    5.1.5. SAS.

    The general-purpose statistical package SAS is available on msvax. Read the files README.1ST and SASINFO.TXT on disk$public: for its use.

    5.2. General purpose visualization programs.

    5.2.1. Gnuplot.

    Gnuplot is an interactive plotting package available on the SGI's. From a file of x,y coordinates it is able to generate plots on a Mac or on an X-terminal (e.g., on an SGI) as well as a postscript file. It also generates 2D and 3D plots. See Dr. Mezei for the documentation. To execute type gnuplot.

    5.2.2. TeX.

    TeX is essentially a metalanguage (like postscript) that produces high-quality print of mathematical 'objects'. If you have a file written in this language, you can obtain hardcopy of it as follows: The filename should have the extension .tex. There are several variants, like LaTeX, REVTeX, AMSTeX.

    On the SGI's, type use tex, followed by pdftex <name> to obtain a PDF file <name>.pdf or by tex <name> or latex <name> for TeX and LaTeX, respectively, to obtain the device independent file <name>.dvi. To obtain the postscript file, type dvips <name>

    We also have a converter from WordPerfect 5.0 to LaTeX (see Sec. 3.12.1.) as well as a DOS program that converts (in a rudimentary fashion) from WordPerfect 5.0 to LaTeX - see Dr. Mezei if interested in it.

    5.3. Electronic structure calculation.

    We have currently installed the programs Gaussian, Gamess, and Molcas for ab-initio calculations, and the semiempirical packages Mopac and Amsol. The SGI versions of these programs are in the directory /e5share/softlib/prog/electron.

    5.3.1. Gaussian

    The current version is Gaussian-98. Gaussian-94, revision E2, is still available.

    On all SGI's (including the Farm) the command gaussian jobname will submit a Gaussian job using the file jobname.g9* as input, and jobname.g9*out as output. To run Gaussian-94 on concave or the farm, use gaussian94 jobname. This command assumes that there is a directory username (the same as your login id) on the reserve file system of the host running the job.

    The gaussian script sets a number of environment variables as follows:

    Host      GAUSS_EXEDIR               GAUSS_SCRDIR
    concave   /usr/local/software/g98    /hosts/<hostname>/reserve/<loginid>
    farm      /usr/share/gaussian/g98    /hosts/<hostname>/reserve/<loginid>

    On concave the variable LD_LIBRARY_PATH has to be set to GAUSS_EXEDIR also.
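
    For users who need to reproduce these settings in their own scripts, the table can be expressed as a small lookup. This is only a sketch: the function name and the host labels used as dictionary keys are illustrative, and the actual gaussian script may set further variables.

```python
# Directories as listed in this section; the keys are illustrative labels.
GAUSS_EXEDIR = {
    "concave": "/usr/local/software/g98",
    "farm": "/usr/share/gaussian/g98",
}


def gaussian_environment(host_class, hostname, loginid):
    """Return the variables the gaussian script is described as setting."""
    exedir = GAUSS_EXEDIR[host_class]
    env = {
        "GAUSS_EXEDIR": exedir,
        # Scratch directory on the reserve file system of the host running the job.
        "GAUSS_SCRDIR": f"/hosts/{hostname}/reserve/{loginid}",
    }
    if host_class == "concave":
        # On concave LD_LIBRARY_PATH has to point to GAUSS_EXEDIR as well.
        env["LD_LIBRARY_PATH"] = exedir
    return env
```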

    Runs requiring large scratch space must be submitted with -lf <space>mb added to the gaussian command where <space> is the number of megabytes of disk space guaranteed to be available during the run (see the df command to determine the amount of space available).

    Note that using more than one CPU (i.e., runs on the farm) requires the line %Nproc=N in the .g98 input file, where N is the number of processors requested.
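
    The two options just described can be sketched in Python as follows. The helper names are hypothetical, and placing the %Nproc line first in the input file is an assumption based on the usual position of such lines at the top of a Gaussian input; the sketch only builds text and does not submit anything.

```python
def gaussian_command(jobname, scratch_mb=None):
    """Build the gaussian command line, optionally reserving scratch space."""
    command = f"gaussian {jobname}"
    if scratch_mb is not None:
        command += f" -lf {scratch_mb}mb"
    return command


def with_nproc(input_text, nproc):
    """Return the input text with a single %Nproc=N line placed first."""
    # Drop any existing %Nproc line so that the request is not duplicated.
    kept = [line for line in input_text.splitlines()
            if not line.lower().startswith("%nproc")]
    return "\n".join([f"%Nproc={nproc}"] + kept) + "\n"
```

    For example, gaussian_command("water", 500) gives gaussian water -lf 500mb.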

    5.3.2. Molcas

    Molcas is a direct-CI program package from B. Roos's group in Lund, Sweden. It is installed on the SGI server in the directory /e5share/softlib/prog/electron/molcas. The executables are in the subdirectory bin and examples can be found in the subdirectory examples.

    5.3.3. Gamess

    The ab-initio package GAMESS (a descendant of HONDO) is available on the SGI server. To run it, type gamess <JOBNAME>. If the 1995 version is needed, add 95 to the command; the 1994 version can be accessed by adding 00. The input data should be in a file called <JOBNAME>.inp. It is also assumed that there is a directory username (the same as your login id) on the reserve file system of the host running the job.

    There is a directory tests in the gamess distribution directory (/e5share/softlib/prog/electron/gamess/v99/gamess) containing example input files (and described in the documentation).

    There are also graphics utilities available in conjunction with Gamess. The command mepmap <jobname> generates a density contour plot either as a PostScript file or displayed in an X-window. The directory /usr/local/software/gamess/graphics on concave contains additional utilities.

    5.3.4. Mopac ---> LINK TO DOCUMENTATION

    Mopac93 is in /e5share/softlib/prog/electron/mopac/v93. Execute the shell mopac in that directory. See also MOTECC (Sec.5.6.).

    5.3.5. Amsol ---> LINK TO DOCUMENTATION

    Amsol is an extension of the semiempirical program AMPAC to calculate aqueous solvation free energy by including the solvent reaction field in the Hamiltonian. It is on the SGI server. To run it, execute /e5share/softlib/prog/electron/amsol/v94/amsol.exe. Several test input and output files are in the same directory with extensions .dat and .out, respectively.

    5.3.6. Molden ---> LINK TO DOCUMENTATION

    Molden is a package for displaying molecular density. It is tuned to the ab initio packages GAMESS and GAUSSIAN. It can read all the information it needs from a GAMESS or GAUSSIAN output file. Type molden on any of the graphics workstations to start the program.

    5.4. Molecular modeling.

    Molecular modeling generally proceeds in three stages.

    In the first stage, the system under study has to be 'built' in the computer. This may be as simple as reading in coordinates from one of the crystallographic databases or clicking on the residue names, or as complex as sketching the molecules with the interactive graphics systems or building them from fragments already programmed into these programs. The program Simulaid has been written to facilitate various aspects of this step.

    The second stage requires a force field, i.e., a simple mathematical model for the intermolecular interactions. Energy minimization finds the minimum-energy structure nearest to the configuration generated in the first stage. Simulated annealing is an efficient (but not the only) way to search for several local minima in the hope of locating the global minimum. An equilibrium ensemble of configurations at a given temperature can be obtained by simulation (generally molecular dynamics, but Monte Carlo is also an option). Simulation is in general rather time-consuming.

    The third stage is the analysis of the results from the previous two. It can range from simply querying the generated structure for its geometrical parameters to running specialized programs for the determination of, say, DNA helix axis bend. Some of the 'standard' analysis programs are described below. Other possibilities include the animation of a molecular dynamics simulation history. Clearly, as the questions to be asked vary from system to system, additional programs will be needed for newly emerging questions.
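
    Why minimization alone is often not enough can be seen on a toy problem. The Python sketch below uses a made-up one-dimensional double-well 'energy' (not a real force field) and plain steepest descent; all names and parameters are illustrative. Started on either side of the barrier, the minimizer ends up in the nearest minimum, whether or not it is the global one.

```python
def energy(x):
    """Toy double-well 'force field' with two minima of unequal depth."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Analytical derivative of energy(x)."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def minimize(x, step=0.01, tol=1e-8, max_iter=100000):
    """Steepest descent: follow the negative gradient until it vanishes."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


right = minimize(0.5)    # ends in the shallower minimum at positive x
left = minimize(-0.5)    # ends in the deeper minimum at negative x
```

    Methods such as simulated annealing or simulation at finite temperature are used precisely because they can cross the barrier that traps the minimizer started at x = 0.5.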

    Full graphics molecular modeling programs that run interactively on the Silicon Graphics workstations are: InsightII/Discover, Quanta, Sybyl, Macromodel, Grasp, MSV, Rasmol, Molmol, Whatif, Moil-view, and Midas. These programs allow for creating and manipulating small molecules as well as biopolymers. They provide for energy calculation, minimization and molecular dynamics (including animation). Most of them also handle sequence information, including various secondary structure prediction schemes and homology searches. All of these programs also provide a variety of analysis options. Most of these programs cannot be run remotely.

    Molecular modeling programs that can be run independently include Charmm, Amber, Xplor, BatchMin, Moil, NAMD, Boss and MMC. The program Delphi is also available to calculate the electrostatic solvation energy of a molecule. Running these programs requires the preparation of data files and a script file containing the command that executes the program and identifies the input and output files. Each of these programs is reasonably well documented and examples are provided in the manuals. Help can also be obtained from Dr. Mezei. Additional examples are also available from people already using these programs. These files can be prepared and submitted remotely, i.e., from any of the terminals in the building, by dialing in from outside through the modems attached to the terminal server, or by logging on to the requisite host via Telnet from another computer.

    We also have several programs that have been written for a particular analysis task: hydrogen-bond matrix and linear distance plot (LDP) calculation, and Dials and Windows for biopolymer characterization based on the programs Curves and Pcurves. Most of these programs cannot be run remotely.

    5.4.1. Full graphics programs

    5.4.1.1. InsightII/Discover

    To start InsightII/Discover, log on to an SGI graphics workstation and type insightII. InsightII also provides an interface to the Discover program package to perform minimization and molecular dynamics.

    The structures generated by Insight can be converted to Charmm, Amber or Moil input with the program Intocham (see Sec. 5.4.3.11.).

    5.4.1.2. Quanta

    To use Quanta, type quanta. Quanta provides an interface to Charmm.

    5.4.1.3. Sybyl

    To start Sybyl type sybyl on any of the SGI graphics workstations. Sybyl also has a command-mode that allows you to run it from non-graphics terminals.

    5.4.1.4. Macromodel

    Macromodel is a molecular modeling package from Columbia University. It allows the use of several force fields. Minimization, molecular dynamics and Monte Carlo - molecular dynamics combinations can be performed, either interactively or in batch. To run it interactively (on the SGI graphics terminals) type mmod or macromodel. The program is in the directory /e5share/softlib/prog/modeling/macromodel/v5.0; the environment variable MMOD_ROOT should point here. Macromodel can also be run from a Mac running MacX. The executable of the batch version (called BatchMin) is bmin. The concave-specific version of BatchMin is /usr/local/software/bmin/bmin and on the Farm it is /usr/share/macromodel/bmin.

    The package contains a number of utility programs as well as the program xcluster that performs cluster analysis on distance matrices.

    5.4.1.5. Grasp ---> LINK TO DOCUMENTATION

    Grasp is a program for visualizing surface properties such as curvature and electric field. It emphasizes electrostatics and includes a simplified version of Delphi. To run it, type grasp. Clicking on the right-hand mouse button pops up the main menu. A brief tutorial is also available.

    5.4.1.6. MSV ---> LINK TO DOCUMENTATION

    MSV is a surface visualization program using the molecular surface computation package MSMS written by Michael Sanner at Scripps. It is faster than Grasp.

    5.4.1.7. MSMS ---> LINK TO DOCUMENTATION

    MSMS computes, for a given set of spheres S and a probe radius rp , the Reduced Surface and the analytical model of the Solvent Excluded Surface (SES). The SES can then be triangulated with a given vertex density. The surfaces calculated by MSMS can be visualized by the program MSV.

    5.4.1.8. Midas

    Midas is another molecular graphics program running on X-terminals to visualize proteins, nucleic acids and small molecules and produce publication-quality figures. It is used by Dock. Type midas_x on one of the SGI workstations to run it. Printed documentation is available.

    5.4.1.9. Moil-view    --> LINK TO DOCUMENTATION

    Moil-view was written by Carlos Simmerling. It was designed to work primarily with Moil and Amber, but Charmm and PDB input is also supported. Type Moil-view on any of the newer graphics workstations.

    5.4.1.10. Rasmol

    Rasmol is a public-domain molecular graphics program running on X-terminals to visualize proteins, nucleic acids and small molecules and produce publication quality figures. Type rasmol on an X-terminal logged on to gene to run it. Help is available at the command line.

    5.4.1.11. Molmol ---> DOCU

    Molmol is a full graphics program from the Wüthrich Laboratory in Zurich. Detailed information can be found at the URL http://www.mol.biol.ethz.ch/wuthrich/software/molmol/. Besides the documentation, there is an on-line tutorial.

    5.4.1.12. Whatif

    Whatif is a collection of modeling and sequence analysis programs with a graphics interface. It has been installed on the SGI's. It starts by typing whatif. There is a printed manual and there are tutorials.

    5.4.1.13. Molscript ---> LINK TO DOCUMENTATION

    Molscript, written by Per Kraulis, creates schematic or detailed molecular graphics images from molecular coordinates, mostly for protein structures. Its use requires a prior use global command. The command molauto pdb1crn.ent > molscr.inp followed by molscript -ps < molscr.inp > molscr.ps produces a Postscript file molscr.ps generated with the default options.

    5.4.2. Batch oriented molecular modeling programs

    5.4.2.1. Charmm ---> LINK TO DOCUMENTATION    V24 text    V26 HTML    DOCU (V27 text)

    Charmm is the most widely used modeling program at the Core and several versions of it are available. The executables are in /e5share/softlib/prog/modeling/charmm/execs for bonus and tata and in /usr/local/software/charmm/execs for concave and prion and in /usr/share/charmm on the farm. While executables in /e5share do run on concave, they run significantly slower, so make sure you are using the right directory. The various executables are listed below.

    See Sec. 3.9.1 for instructions on how to run parallel jobs under PVM.

    The toppar directories of the older versions have been placed into /e5share/softlib/prog/modeling/charmm/archive/data/toppar_22 and /e5share/softlib/prog/modeling/charmm/archive/data/toppar_23.

    5.4.2.2. Amber

    Version 4.0 of the Amber package is on the SGI server in the directory /e5share/softlib/prog/modeling/amber/v4.0N. Versions 4.1 and 5.0 (incorporating the newer analysis program Carnal) are on the SGI server in the directories /e5share/softlib/prog/modeling/amber/v4.1 and /e5share/softlib/prog/modeling/amber/v5.0, respectively, for runs on bonus and tata. For runs on concave or prion there is a single-processor version in /usr/local/software/amber and a multi-processor version in /usr/local/software/amberp. Amber 5.1 is also available on the farm in the directory /usr/share/amber.

    Versions starting with 4.1 contain an interactive interface to generate the Amber input files called Leap. To run the interface, first source the file leap/leapSetup.csh in the Amber 4.1 directory. On a vt100 terminal type tleap to start it. On an X-terminal type xleap (you have to execute the requisite setenv DISPLAY command as well).

    5.4.2.3. UHBD ---> LINK TO DOCUMENTATION    --> DOCU

    The command uhbd starts the University of Houston Brownian Dynamics program. Among others, it performs Brownian dynamics simulations and solves the Poisson-Boltzmann equation.

    5.4.2.4. Xplor

    Xplor is the molecular modeling package developed by Axel Brunger's group at Yale. Type xplor at any of the SGI's to run it. Tutorials are available in the subdirectories of the directory /e5share/softlib/prog/modeling/xplor/tutorial.

    5.4.2.5. Moil ---> LINK TO DOCUMENTATION

    Moil, the molecular dynamics package developed by Ron Elber's group, is installed on the SGI server. Its unique feature is Elber's Locally Enhanced Sampling (LES) technique designed to overcome the multiple minima problem. The latest version is in the directory /e5share/softlib/prog/modeling/moil/v96/sgi/moil.export. The executable programs are in the subdirectory exe. The documentation and a tutorial are only available in an earlier version: /e5share/softlib/prog/modeling/moil/v94 in the subdirectory moil.doc, examples are in the subdirectory of moil.tests6. An SGI viewing program, moil-view, is also available in the directory /e5share/softlib/prog/modeling/moil/v93/moil-view.0005. The documentation is in doc.0005 and the executable is exe/moil-view/exe.

    5.4.2.6. NAMD ---> LINK TO DOCUMENTATION

    NAMD is a fast molecular dynamics program from the Theoretical Biophysics Group at the University of Illinois and Beckman Institute.

    5.4.2.7. MMC ---> LINK TO DOCUMENTATION

    MMC is a Metropolis Monte Carlo program to model the solvation in the canonical or isobaric or grand-canonical ensemble or to model the transformation of one solute into another. Several novel free-energy methodologies are implemented. It can also perform the analysis of the simulation history by partitioning the solvents by their proximity to the various functional groups of the solute. The input for such analysis can also be a Charmm or Amber trajectory file. See Dr. Mezei for the latest version of the program.
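
    For readers unfamiliar with the method, the acceptance rule at the heart of a Metropolis Monte Carlo step in the canonical ensemble can be sketched in a few lines of Python. This is the textbook criterion, not code taken from MMC; the function and argument names are illustrative, and the optional argument u only exists to make the example reproducible.

```python
import math
import random


def metropolis_accept(delta_e, kt, u=None):
    """Accept a trial move with probability min(1, exp(-delta_e / kt)).

    delta_e is the energy change of the trial move and kt is the thermal
    energy (Boltzmann constant times temperature) in the same units.
    """
    if delta_e <= 0.0:
        return True          # downhill moves are always accepted
    if u is None:
        u = random.random()  # uniform random number in [0, 1)
    return u < math.exp(-delta_e / kt)
```

    Uphill moves are thus accepted occasionally, which is what lets the simulation sample an equilibrium ensemble instead of sliding into the nearest minimum.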

    5.4.2.8. Delphi ---> LINK TO DOCUMENTATION

    Delphi solves the linearized Poisson-Boltzmann equation for irregularly shaped charged systems immersed in a dielectric. Both the charges and the dielectric constants at the various locations of the system are specified by the user. The latest version of Delphi has been installed on the SGI server in /e5share/softlib/prog/modeling/delphi. The executable is in the subdirectory export/yas/qdiffyas2. An example parameter file is in export/pti.

    5.4.2.9. Maxwell ---> LINK TO DOCUMENTATION

    The program package Maxwell (QCPE #637) calculates electrostatic interactions of either a finite set or of a crystal of polarizable molecules. The molecular charge distributions are characterized by a multipolar expansion in the form of general directional derivatives, based on the formalism of Maxwell. The package can also obtain multipole moments of arbitrary order from a one-determinantal Gaussian wave function. For crystals the permanent multipole contribution is calculated with Campbell's generalization of the Ewald method to lattices of multipoles of arbitrary order. Calculation of energy contributions in the form of inverse distance power terms is also provided for. See Dr. Mezei for references or the programs.

    5.4.2.10. Qpack ---> LINK TO DOCUMENTATION

    The program package Qpack from UCSF (obtained for us by Dr. Osman) has been installed on the SGI server in the directory /e5share/softlib/prog/modeling/qpack. It calculates both a 'contact list' and a residue-based energy for a protein in a given conformation to quantify the packing efficiency.

    5.4.2.11. Polyrate ---> LINK TO DOCUMENTATION

    Polyrate is a program for calculation of reaction rates based on information about the intermolecular potential surface using transition state theory. Its use requires that you provide the energy calculation routines. The programs can be found on the SGI server in /e5share/softlib/prog/modeling/polyrate.

    5.4.2.12. A. Rashin's solvation program

    A program calculating the hydration enthalpy of polar or ionic molecules, based on a continuum solvation model of Alex Rashin (and written by him), is on the SGI server, in the directory /e5share/softlib/prog/misc/csolvation. The directory contains the documentation and sample scripts, as well as the executables and data files.

    5.4.2.13. Jumna and Ligand

    Jumna can build, manipulate and energy minimize fragments of DNA or RNA based on the helicoidal parameter set defined by Lavery and coworkers. A variety of constraints can be imposed. Input data can be prepared by the program Curves (see 5.4.3.3.) or the companion program Nchem. Ligand can energy minimize a molecule or a molecular complex. Jumna, Ligand and Nchem are installed on gene as

    /e5share/softlib/prog/modeling/curves_plus/jum8/Jum8_s,

    /e5share/softlib/prog/modeling/curves_plus/lig4/Lig4_i,

    /e5share/softlib/prog/modeling/curves_plus/nchem/Nchem_i,

    respectively. The commands jumna, ligand and nchem run them.

    5.4.2.14. Flexsrch ---> LINK TO DOCUMENTATION

    The program package Flexsrch, which performs automated docking of a ligand including the exploration of the conformational space (working in tandem with Amber 4), has been installed on the SGI server. It was written by Andrew Leach and is described in J. Mol. Biol., Vol. 235, p. 345 (1994). The executables and the data files are in the Amber 4.0 directory (see Sec. 5.4.2.2. above) in the subdirectory leach.dock.

    5.4.2.15. Congen

    Congen is written by R.E. Bruccoleri to perform an exhaustive conformational search. The program is a descendant of Charmm. It is in the directory /e5share/softlib/prog/misc/congen (with documentation and test), and the executable is called congen. The executable for the concave architecture is /usr/local/software/congen.

    5.4.2.16. Modeller ---> LINK TO DOCUMENTATION

    Modeller is a program for protein modeling with spatial restraints written by Andrej Sali. It is most frequently used for homology or comparative modeling. For details see their home page. Type modeller <command file> to run it. Test cases are in the directory /e5share/softlib/prog/modeling/modeller/modeller4/scripts on the SGI server. The command files have the extension .top. Printed documentation is available.

    5.4.2.17. Plotcorr ---> LINK TO DOCUMENTATION

    Plotcorr provides an interactive representation of predicted residue contacts from correlated mutations and other protein structure prediction data.

    The program takes as its only required input a multiple sequence alignment in .hssp format. These files are generated and used by the PHD server at the EMBL. Clustal can also be used to convert from other formats. As optional input, the program can read the secondary structure and accessibility from the file that the PHD server returns in response to an accessibility and secondary structure prediction query. To run it (on any SGI graphics workstation), type

    plotcorr <hssp_file> [<PHDfile>]

    5.4.2.18. Boss

    Boss is the Monte Carlo program developed by W.L. Jorgensen. It is on the SGI server in /e5share/softlib/prog/modeling/boss. The executable is called BOSS34 and a sample command file uboptcmd is also available there. The concave-specific executable is /usr/local/software/boss.

    5.4.3. Structure and simulation analysis programs

    5.4.3.1. ldp

    Calculates and plots a linear distance plot on the HP plotter. To run it, execute /e5share/softlib/prog/misc/alanexe/ldp interactively from any graphics terminal running IRIX 5.3 or lower and answer the questions.

    5.4.3.2. hb

    Calculates and plots hydrogen-bond distance matrix and linear distance plot (written by Alan Factor). The programs are on the SGI server in the directory /e5share/softlib/prog/misc/alanexe. The program runs on phage: hb2p. It has to be executed interactively to obtain the plots.

    5.4.3.3. Curves, Pcurves

    Curves calculates the helicoidal parameters of a nucleic acid as defined by Lavery and coworkers. Pcurves contains the corresponding generalization to polypeptide chains. Both are accessible through the program Dials and Windows. Curves 4.1 can also be run independently of Dials and Windows on the SGI server. Type curves to run it.

    5.4.3.4. Dials and Windows

    Provides a compact display of the output from Curves and Pcurves. To be able to run it (on the graphics workstations or on the SGI's), put /e5share/softlib/prog/modeling/dials/exe into the PATH statement of your .cshrc file. There are sample inputfiles and scripts in the directories /e5share/softlib/prog/modeling/dials/iris/examples. You can copy any of these files to your directory, modify the .com file as required and source it, followed by the command dials to run. The graphics output will appear as postscript files and - if run on an SGI workstation - on the screen.

    5.4.3.5. Brookhaven PDB analysis programs (Procheck)    --> LINK TO DOCUMENTATION

    The Brookhaven PDB has developed a number of Fortran programs facilitating the analysis of the structures. These programs are described in the PDB Newsletter - see Dr. Mezei. They can be found in /e5share/softlib/prog/modeling/procheck/EXE on the SGI server. The command procheck input.pdb resolution will run the whole suite and the command proplot input.pdb resolution runs the program pplot, where input.pdb is the input PDB filename and resolution is the resolution of the structure (in Angstroms).

    5.4.3.6. Mepsi ---> LINK TO DOCUMENTATION

    Mepsi, written by Ann Richards, calculates a molecular similarity index based on the comparison of the respective electrostatic potential maps. It is installed on the SGI server in the directory /e5share/softlib/prog/modeling/mepsi containing sample files needed to run the program. To execute it type mepsi.

    5.4.3.7. Simloc ---> LINK TO DOCUMENTATION

    Simloc, written by Dr. Mezei, takes as input the coordinates (in PDB, Charmm or Insight format) of two molecules (or two conformations of a molecule) and detects local similarities, i.e., generates substructures with low RMS. The substructures found depend on a threshold parameter RMSmax: in general, the smaller RMSmax is, the smaller the substructure RMS values will be. The substructure RMS values are not limited by RMSmax, though. To execute the program, type simloc on the SGI's and answer the quiz.
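
    The quantity being compared can be illustrated with a short Python sketch that computes the RMS deviation between two equally long lists of atomic coordinates. It is not Simloc's algorithm: it assumes the atoms are already paired and the structures already superimposed, and the function name is hypothetical.

```python
import math


def rms_deviation(coords_a, coords_b):
    """RMS distance between paired atoms, without any superposition step."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

    A substructure search would evaluate such a measure over many candidate subsets of atoms and keep those with low values.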

    5.4.3.8. DSSP ---> LINK TO DOCUMENTATION

    DSSP is the program used to create the IDITIS database. It reads in a protein structure in PDB format and analyzes it for structural motifs. Type (on the SGI's) dssp <input file name> <output file name> to run it.

    5.4.3.9. Gepol ---> LINK TO DOCUMENTATION

    Gepol is the program of Juan Luis Pascual-Ahuir for calculating molecular surfaces defined in three different ways. The latest version is in the directory /e5share/softlib/prog/modeling/gepol. The executable is gepol93. Type gepol to run it. The directory also contains older versions and examples.

    5.4.3.10. Simulaid ---> LINK TO DOCUMENTATION

    Simulaid, written by Dr. Mezei, is a collection of utilities helping the setup and analysis of a simulation. It can perform the following operations:

    Type simulaid to run the program and answer the quizzes.

    5.4.3.11. Intocham ---> LINK TO DOCUMENTATION

    Intocham converts a .car file generated by Insight into a complete (including topology files) input for either Charmm or Amber or Moil. It can also convert the Charmm structure back to Insight. To run it, type intocham on the SGI's and answer the quizzes. Contact Dr. Mezei if problems arise - there can be a number of ambiguities in the type conversions.

    5.4.3.12. Babel

    Babel is a coordinate file conversion program, able to interconvert among a number of formats used by popular modeling and ab initio programs. It runs only under IRIX 5.3 or lower (on bonus or tata). Type babel -m to be presented with the menu and answer the quiz.

    5.4.3.13. Ligplot, Hbplus ---> LINK TO DOCUMENTATION

    The program Ligplot (from J. Thornton's lab), which produces a nice schematic drawing of the environment of a ligand (based on a given structure) marking hydrogen bonds and hydrophobic contacts, has been installed on Gene. The hydrogen bonds are established by the program Hbplus, which is also installed and is called automatically by Ligplot. To run it, type

    source /e5share/softlib/prog/modeling/ligplot/cshrc

    followed by

    ligplot <PDB file name> startres endres chainID -h

    where the ligand stretches from residue startres and ends at residue number endres and its chain identifier is chainID. Adding the -h will have the program prompt you for a plot title. The resulting (Postscript) plot will be in a file called ligplot.ps.

    Hbplus can also be run independently and its parameters can also be changed.

    5.4.3.14. Dock

    Dock is a collection of programs that allows the search of a database to find matches with a target structure. It has been installed on the SGI server in /e5share/softlib/prog/modeling/dock. Examples are provided in the subdirectory examples and there is a file file.list describing the content of each file in the dock directory tree. Printed documentation is also available.

    5.4.3.15. Rutgers Nucleic Acid Analysis Programs

    This is a DNA analysis package, written in the laboratory of Wilma Olson at Rutgers by Marla S. Babcock. It has been documented in a paper (Babcock, M.S., Pednault, E.P.D, and Olson, W.K., "Nucleic Acid Structure Analysis: A Users Guide to a Collection of New Analysis Programs," Journal of Biomolecular Structure and Dynamics, Vol. 11, No. 3, pp 597-628, 1993.) Dr. Mezei has a copy. Type rna in any window to start the analysis program.

    5.4.3.16. Autodock ---> LINK TO DOCUMENTATION

    Autodock is a flexible docking program, written in the Laboratory of A. Olson at Scripps. For detailed information and answers to FAQ, go to the URL http://www.scripps.edu/pub/olson-web/doc/autodock/. To perform a docking, type autodock_setup. This will put into your path the various programs referred to in the manual. Examples can be found in the directory /e5share/softlib/prog/modeling/autodock/dist_3.0/examples/ .

    5.4.3.17. Voidoo ---> LINK TO DOCUMENTATION

    Voidoo is a program from the Uppsala Software Factory. It finds cavities in a molecule and calculates their volume. To run it, just type voidoo.

    5.5. Sequence analysis programs.

    In conjunction with the sequence databases (see Sec. 4.) we have programs available that perform the searches, retrievals, alignments and other specialized tasks. Besides the programs discussed below, the modeling packages Quanta, Insight and Sybyl also have extensive sequence analysis capabilities.

    5.5.1. GCG. ---> LINK TO DOCUMENTATION

    GCG is a package of several programs for sequence analysis running on concave. To run it, first add the following line to your .cshrc file (on the SGI's).

    use gcg

    Then, open an xterm window on one of the graphics workstations and type:

    5.5.2. Maligned, malform ---> LINK TO DOCUMENTATION

    In addition to the programs in GCG we have the programs Maligned to display and edit multiple sequence alignments, and Malform to produce a Postscript hard copy. To run Maligned, type

    @programdisk:[maligned]maligned on msvax.

    To run Malform, type

    Run programdisk:[maligned]malform_tv.exe on msvax.

    5.5.3. Clustal ---> LINK TO DOCUMENTATION

    Clustal is a multiple sequence alignment program. To run it, type clustal to start version 1.5b (April, 1997).

    5.5.4. Refine ---> LINK TO DOCUMENTATION

    Refine is a program written by Karel Konvicka to post-process alignments (especially for transmembrane helices) and make some format conversions. Type refine on the SGI's to run the program.

    5.5.5. Serratus

    Serratus (originally from Oxford Molecular) allows you to manipulate multiple sequence alignments (e.g., generated by Clustal) and has a non-redundant sequence database (dated 1992) that can be queried. Serratus is installed on msvax. To run Serratus on msvax, execute the commands

    oml (can be put in the LOGIN.COM file)

    and

    delphos for the database query

    or

    somap for the manipulation of multiple sequence alignments.

    5.5.6. Matchmaker

    Matchmaker (from Tripos) is installed on the SGI server. It makes structural predictions from protein sequence based on environmental homologies as opposed to simple sequence homologies. It is based on the work of J. Skolnick. There is documentation and a tutorial. Type matchmaker to run it on an SGI graphics terminal.

    5.5.7. Threader ---> LINK TO DOCUMENTATION

    Threader from J. Thornton's group is installed on endo. It will try to match a query sequence to its own database by a threading algorithm. To run it with a query sequence in the file test.seq (most sequence formats are supported), type

    threader -j -p test.seq results.out >& log

    5.5.8. Iditis

    Iditis allows you to query an annotated version of the PDB based on various structural motifs. It is installed on bonus. Type iditis to run it through the graphical interface (only from an X terminal). To run it in command line mode (from a simple terminal) type iditis -c . To run a search in the background type iditis -c -o -q <query file> -x &

    Examples can be found in the directory OML_IDITIS/EXA (typing iditis will define OML_IDITIS). A brief tutorial is also available.

    6. USAGE POLICIES

    6.1. Disk space policies

    Since the total disk space is limited on each machine, fair usage requires limits on the disk space used. These limits are different for the home directories (which are regularly backed up by the system) and for the scratch directories:

    Host              Home directory limit   Scratch space limit
    SGI's (fulcrum)   400 Mb                 -
    Farm (york)       400 Mb                 4,000 Mb
    concave           -                      4,000 Mb

    Note: the /home and /scratch filesystems are the ONLY places where users are allowed to run their jobs. All other filesystems are restricted for system use. In particular, do not use /tmp.

    The /scratch* filesystems are shared resources. They are also physical entities, so it is possible (and proven) that one person can fill a /scratch filesystem 100% full and this way kill or maim other people's jobs. Current levels of saturation are printed during login - you should pay attention to those messages. To prevent the /scratch* filesystems from filling up, the following policy was agreed upon:

    For special occasions users can use more than the limit. In those cases:

    1) The user must ask the system manager for permission.

    2) If there are no other requests made and the need is valid and there is enough space available, an agreement has to be made for how much and how long the user can use the given amount of extra-space.

    3) After the agreed time, the user has to reduce his or her disk usage back to the limit level. If this is not done, disk quotas will be established.
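
    Users who want to check their own usage against these limits before being asked to can do so with a few lines of Python. This is only a sketch: the function names are hypothetical, the 400 Mb default is the home-directory limit quoted in this section, and a megabyte is assumed here to mean 2**20 bytes.

```python
import os

HOME_LIMIT_MB = 400  # home-directory limit quoted in this section


def directory_size_mb(path):
    """Total size of the regular files under path, in megabytes (2**20 bytes)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):   # do not count symbolic links
                total += os.path.getsize(full)
    return total / 2**20


def over_limit(path, limit_mb=HOME_LIMIT_MB):
    """True if the files under path add up to more than limit_mb megabytes."""
    return directory_size_mb(path) > limit_mb
```

    For example, over_limit(os.path.expanduser("~")) would report whether the home directory exceeds the limit.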

    6.2. Job submission policies

    At the moment, concave has no queues installed. Instead, we are supposed to limit the number of jobs running to 24 (twice the number of CPU's). To ensure equitable access to this computational resource, members of the Dept. of Physiology & Biophysics who have access to the Farm are allowed to run one job only; other users are limited to two jobs per user. Exceptions can be granted by prior consultation with the system manager.

    Note that the queues (where installed) have time limits (see Sec. 3.7.). Jobs on machines without queues run until the next system shutdown (or crash).
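
    Since no queuing system enforces the limits on concave, each user has to apply them before starting a job. The rule can be written down as a small Python sketch; the function name and arguments are hypothetical, and exceptions granted by the system manager are not modeled.

```python
MAX_TOTAL_JOBS = 24  # twice the number of CPUs, as stated in this section


def may_submit(user_jobs, total_jobs, has_farm_access):
    """Apply the concave limits to a user who wants to start one more job.

    user_jobs       -- jobs this user is already running on concave
    total_jobs      -- jobs currently running on concave, all users
    has_farm_access -- True for department members with access to the Farm
    """
    per_user_limit = 1 if has_farm_access else 2
    return total_jobs < MAX_TOTAL_JOBS and user_jobs < per_user_limit
```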

    6.3. Rules for using the Farm.

    The computers of the 'avenues' are for the primary use of the Consortium of Investigators (COI) who wrote the proposal to fund the Farm. Major emphasis is placed on creating and running parallel applications. To help ensure an even load on the system, specific machines are assigned to specific members of the COI:

    The CPU's of the 'street' machines are assigned as follows:

    A list of 'legitimate' users of each machine is maintained in the file /scratch/priority. Anybody else submitting a job will automatically run with very low priority. This way idle CPU's can be utilized by anybody, but the 'legitimate' users can always resume running without having to wait for the 'outsider' job to finish.

    The command loadlist lists all jobs running on the various computers and loadcount lists just the number of running jobs.

    From time to time, users may request more than four CPU's. This should be arranged in advance with the other COI members.

    It is also suggested that the number of jobs on each machine be limited to four to ensure the efficient execution of the parallel jobs.

    6.4. General code of conduct.

    As a matter of principle, the Molecular Modeling Core would like to establish as few rules as possible. Achieving this goal requires responsible behavior on everybody's part.

    Responsible users

    1. familiarize themselves with the capacity of the various facilities and generally refrain from 'monopolizing' their resources;

    2. exit from programs with a limited number of licenses (e.g., insight, quanta, matlab) when not using them and properly log off from the graphics workstations when finished;

    3. back up to tapes (see Sec. 3.10.) files that are not likely to be used soon (the larger the files, the shorter the time period meant by 'soon'), compress files that are not needed immediately and heed periodic backup requests from the system managers that may appear among the login messages;

    4. let the system manager know if they have removed any manual or documentation from Rm 21-87, return them promptly and, in the meantime, keep them accessible to others;

    5. scrupulously observe the policies related to both the disk space usage and the job submissions as spelled out above in sections 6.1. and 6.2.

    Lack of responsible behavior (as defined above) is likely to result in grumbles of increasing intensity with each occurrence, mostly on the part of the system manager, but you may hear from your affected colleagues as well.

    Adherence to the disk use policies (as described above in Sec. 6.1.) is of prime importance since consuming excessive amounts of disk space may lead (and in fact, several times has led) to filling up disks and corrupting jobs running at that time.
