Tutorial: using Tleap topologies

While the protocol uses GROMACS to perform MD simulation. It can also use ambertools to build an amber topology, which can then be converted into GROMACS topology. This allows the use of force fields that are not available in GROMACS, the inclusion of ligands, non-standard amino acids, etc.

In this example, we are going to optimize a nanobody towards a protein that contains Zinc and coordinates it with amino acids that can’t be represented on a regular force-field. Hence, we’re going to need *Tleap* to build the topology and parmed to turn it into something GROMACS can work with. Both these tools come with ambertools, which comes with the protocol. For more info, check the Installation section.

p53-nanobody complex — Figure 1: snapshot of one optimized complex. **p53** is the yellow one on the left, with its loops colored red and blue, these loops have to be stabilized so it doesn’t loose its function; the zinc atom and its coordinating residues are on the bottom-left corner. The nanobody is the green one on the right, with its CDRs 1, 2 and 3 colored magenta, orange and gray, respectively.

As always, the name of our environment is locuaz, so we start by activating it.

mamba activate locuaz

Warning

GROMACS and Amber have different conventions when it comes to resSeq numbering. GROMACS goes for a strided progression, assigning a resSeq number of 1 to the first residue of each chain. Amber, on the other hand, expects all residues to have different resSeq numbers. locuaz supports both conventions, but be aware that setting use_tleap to True will change the resSeq numbering standard.

Setting up the system

The starting complex (p53-VHH) in this tutorial has been obtained using HADDOCK, followed by an equilibration. In this example, the 4-coordinated zinc metal center in p53 is described with Zinc AMBER force field ZAFF. Therefore, the topology is built with a *Tleap* script, which requires additional parameter files ZAFF.frcmod and ZAFF.prep.

The topology of the complex provided in this example has been prepared with the following Tleap script:

source oldff/leaprc.ff99SBildn
source leaprc.water.tip3p
addAtomTypes { { "ZN" "Zn" "sp3" } { "S2" "S" "sp3" } { "N1" "N" "sp3" } }
loadamberparams frcmod.ions1lm_126_tip3p
loadamberprep ZAFF.prep
loadamberparams ZAFF.frcmod
mol = loadpdb nb.pdb

bond mol.195.ZN mol.81.SG
bond mol.195.ZN mol.84.ND1
bond mol.195.ZN mol.143.SG
bond mol.195.ZN mol.147.SG
bond mol.217.SG mol.292.SG     ##adding the S-S bond

solvatebox mol TIP3PBOX 10.0
addions mol CL 0
addions mol NA 0

saveamberparm mol amb_nb.prmtop amb_nb.inpcrd
quit

The script can be run as:

tleap -f tleap

The topology was then converted into the GROMACS topology format using ParmEd, acpype is an alternative but we recommend staying with ParmEd since it’s the same locuaz uses internally.

Lastly, the system was minimized and equilibrated using the same mdp GROMACS input files we will be using during the protocol, min.mdp and nvt.mdp. For the NPT run, we ran for 10ns and saw no changes in the interface. Ideally you do not want to start with a complex that changes its interface too much, since this will change the affinity and the scoring of the mutations loose meaning.

Necessary files

As in tutorialsimple:Tutorial: running a simple optimization, we’re going to focus on the writing of the YAML config file. A more detailed explanation of the available options, can be found in the YAML configuration file. The materials for this tutorial are located in the examples/tleap_tutorial folder:

nb.pdb: the PDB file of the pre-equilibrated complex. As usual, target chains go first, also, remember that since we are using Tleap, residues should be numbered on a continuous progression.
tleap: Tleap dir with the script to build the topology of the system each time a mutation is performed. This script will be identical to the one above, with the exception of the solvatebox line, since the solvent is already there. Another thing to notice is the usage of addions. We keep this commands since Tleap will be responsible of keeping neutrality of the system and avoid using addions2 since we need it to replace water molecules each time it ads ions, to keep the N of the system constant. ZAFF.frcmod and ZAFF.prep (auxiliary Zn parameters)
config_tleap.yaml: the input file to run the protocol.
mdp directory: minimization, NVT and NPT GROMACS input files.

The configuration file

We will focus on the new options that didn’t show up on tutorialsimple:Tutorial: running a simple optimization.

paths

paths:
    gmxrc: /usr/local/gromacs/bin
    scorers: /home/pbarletta/labo/22/locuaz/rebin
    mutator: /home/pbarletta/labo/22/locuaz/rebin/dlpacker
    mdp: /home/pbarletta/labo/22/locuaz/daux/mdp
    input: [ /home/pbarletta/labo/22/locuaz/daux/oct_nb ]
    tleap: /home/pbarletta/labo/22/locuaz/daux/oct_nb/tleap
    work: /home/pbarletta/labo/22/locuaz/daux/work_dir

Tleap: the path to the *Tleap* scripts. It is mandatory if Tleap is used.

main

The running mode of the protocol is set to evolve, which is the default value, so it’s not actually necessary.

main:
    name: nb
    mode: evolve

protocol

In the protocol section, several important options concerning the protocol have to be specified.

protocol:
    epochs: 5
    branches: 2
    memory_size: 4
    failed_memory_size: 6
    memory_positions: [[2, 3, 4, 6, 7, 8], [], [], [] ]

memory_positions: this time we’re setting the memory ourselves, at least for the first run. This is used to fill a queue of size memory_size. At each epoch, the mutated position will be pushed into the queue and thus push out the oldest value. In this config whe are preventing the positions \(2, 3, 4, 6, 7\) and \(8\) from being mutated, only on the first epoch, since the other 3 slots are occupied by empty memories ([]). If these empty slots weren’t present, then [2, 3, 4, 6, 7, 8] would last another 3 epochs.

generation

Warning

Mutation Generators have been deprecated since version 0.7.0 and are slated for removal in 0.8.0. Use a Mutation Creator instead.

generation:
    generator: SPM4i
    probe_radius: 3

This time we are selecting any of the CDR residues in the interface with equal probability.

creation

creation:
    sites: 1
    sites_interfacing: true
    sites_interfacing_probe_radius: 3.0
    sites_probability: uniform
    aa_bins: ["CDEST", "AGIMLV", "PFWY", "RNQHK"]
    aa_bins_criteria: without
    aa_probability: uniform

This time we set both sites_probability and aa_probability to uniform. We don’t want to force the optimization too much and get trapped in a local minimum. So any of the binder’s mutating_resSeq thar are interfacing with the target can be chosen with equal probability and after that, and after a bin is chosen, each amino acid from the bin has equal probability of being chosen.

mutation

mutation:
    mutator: dlpr
    reconstruct_radius: 5

pruning

pruning:
    pruner: consensus
    consensus_threshold: 3

md

md:
    gmx_mdrun: gmx mdrun
    mdp_names:
        min_mdp: min.mdp
        nvt_mdp: nvt.mdp
        npt_mdp: npt.mdp
    ngpus: 2
    mpi_procs: 1
    omp_procs: 4
    pinoffsets: [0, 4]
    use_tleap: true
    box_type: octahedron

Notice we’re not setting the water model nor the force field, since we’re relaying on our Tleap script to take care of that.

pinoffsets: notice that we are using 4 OMP processors and 2 GPUs, hence, pinoffsets has a length of 2, one for each GPU run, and with a spacing of 4 threads.
use_tleap: True, this option is specified only if Tleap is used to build the topology.

target

target:
    chainID: [A]

binder

binder:
    chainID: [B]
    mutating_chainID: [B,B,B]
    mutating_resSeq: [[220,221,222,223,224,225,226,227],[248,249,250,251,252,253,254],[294, 295, 296, 297, 298, 299, 300]]
    mutating_resname: [[S,G,F,D,F,S,D,A],[R,S,G,L,A,T,S],[K,S,R,R,G,Q,G]]

scoring

scoring:
    functions: [ bluuesbmf, piepisa, evoef2, gmx_mmpbsa ]
    nthreads: 80
    mpi_procs: 2
    start: 50
    end: -1

2 new options show up with respect to the tutorialsimple:Tutorial: running a simple optimization

start: Useful if you want to skip a few frames before starting to score. 0-indexed.

end: Also 0-indexed. Defaults to -1, which means all remaining frames.

Running the protocol

There’s nothing new here with respect to the simple tutorial, we just run the protocol with our config file

mamba activate locuaz
python /home/user/locuaz/locuaz/protocol.py config_tleap.yaml

And as always, the protocol will create the working directory folder and inside of it, a folder for each branch:

directory structure of a branch folder — Figure 2: the look of any *branch* folder after it has been finished. *Tleap* related files are highlighted.