Frequently Asked Questions

1_ I used tleap to build my initial system but now my starting PDB has no chainID information and locuaz won’t take it. What can I do about it?

tleap is a bit of an old piece of software and it seems that back then chainID info wasn’t very important. If you write out a PDB file along with your prmtop and rst7 files when building your topology, you’ll see that the PDB has no chainID info and neither does the topology.

Fortunately, parmed —which will be included with your locuaz installation—, added some new flags to the .prmtop format, including RESIDUE_CHAINID, which will hold the information we want. In order to add it, along with some other extra info, we will ask parmed to get it from our initial PDB, which should have chainID info

This short parmed script will do it:

parm no_chainid_top.prmtop
addPDB start.pdb
outparm top.prmtop

no_chainid_top.prmtop is the topology we got out from tleap, with water, ions and all that. start.pdb is the PDB tleap started out from, without water, box or anything, but with the chainID information.

We then run this script:

parmed -i add_chainID.parmed

You can now check top.prmtop for parmed additions:

%COMMENT Residue chain ID (chainId) read from PDB file; DIMENSION(NRES)
A   A   A   A   A   A   A   A   A   A   A   A   A   A   B   B   B   B   B   B

The final step is to get a PDB out of this topology and the rst7 tleap gave us. We can, again, write a parmed script for this:

parm top.prmtop
trajin rst7_from_tleap.rst7
trajout pdb_with_chainID.pdb

Or, we can use another program that comes with ambertools, which was made for this specific purpose:

ambpdb -p top.prmtop -c rst7_from_tleap.rst7 > pdb_with_chainID.pdb

Now we can use pdb_with_chainID.pdb to start out our optimization, probably after giving it a better name.

2_ My complex is spreading apart. Can I use GROMACS pulling to keep them together?

Of course! This GROMACS feature is orthogonal to the optimization so locuaz doesn’t concern itself with it. What it will do is give GROMACS an index selection file (.ndx) with selections for the target and the binder. You can then reference target and binder as the pull groups in the pull code. This pull code goes in the mdp file you’ll use for the NPT run:

; Pull code
pull                        = yes
pull-pbc-ref-prev-step-com  = yes       ;
pull-ncoords                = 1         ; only one reaction coordinate
pull-ngroups                = 2         ; two groups defining one reaction coordinate
pull-group1-name            = target
pull-group1-pbcatom         = 4075      ; atom close to the center
pull-group2-name            = binder
pull-group2-pbcatom         = 7251      ; atom close to the center
pull-coord1-groups          = 1 2       ; target and binder define the reaction coordinate
pull-coord1-type            = umbrella  ; harmonic potential
pull-coord1-geometry        = distance  ;
pull-coord1-dim             = Y Y Y     ; pull along all cordinates
pull-coord1-start           = yes       ; define initial COM distance > 0
pull-coord1-rate            = 0.00      ; keep it fixed
pull-coord1-k               = 1         ; kJ mol^-1 nm^-2

Notice the target and binder as the pull groups 1 and 2, respectively. You can probably use this same sample file and just modify the atoms at pull-group1-pbcatom and pull-group2-pbcatom, and if necessary, the pull-coord1-k.

But do remember to lift this pulling and optimize your binder using free MD once you have a somewhat better binder than can stay with its target without any restraints. Try optimizing for 5 epochs and then lift the restrictions to keep optimizing without pulling.

3_ What is NUMA? How can I know the number of NUMA regions my machine has?

NUMA (Non-Unified Memory Access) it’s a design idea to deal with multiple CPU cores and their access to memory.

For us users, it’s important to be NUMA-aware, that is, understand that some cores have faster access to certain regions and slower access to other ones. The same applies to the relationship between CPUs and GPUs.

To know your NUMA layout, just run this command:

$ numactl  --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 256924 MB
node 0 free: 249438 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 257999 MB
node 1 free: 256694 MB
node distances:
node   0   1
  0:  10  11
  1:  11  10

This means we have 2 NUMA regions and when using MPS, we should set numa_regions: 2.

For more info, check this related blogpost.

4_ Why use both memory_positions and failed_memory_positions at the same time?

Because sometimes you want to avoid mutating back a position that failed to improve binding but don’t want to set a too big of a memory. In those cases, you can set a large failed_memory_size and a lower memory_size.

5_ What are the empty brackets in the memory_positions list?

Empty memory slots on input user memory are allowed. This allows the user to control for how many epochs will the non-empty memory be recalled. Place them after the desired positions:

memory_positions: [[2, 3, 4, 6, 7, 8], [], [], [] ]
memory_size: 4

This means that positions 2, 3, 4, 6, 7 and 8 won’t be mutated on the first epoch, but will be eligible for mutation right after.