Frequently Asked Questions
1_ I used tleap to build my initial system but now my starting PDB has no chainID information and locuaz won’t take it. What can I do about it?
tleap is a bit of an old piece of software, and it seems that back then chainID info wasn’t very important. If you write out a PDB file along with your prmtop and rst7 files when building your topology, you’ll see that the PDB has no chainID info and neither does the topology.
Fortunately, parmed —which will be included with your locuaz installation—,
added some new flags to the .prmtop
format, including RESIDUE_CHAINID
,
which will hold the information we want. In order to add it, along with some other
extra info, we will ask parmed to get it from our initial PDB, which should have
chainID info
This short parmed script will do it:
parm no_chainid_top.prmtop
addPDB start.pdb
outparm top.prmtop
go
no_chainid_top.prmtop
is the topology we got out from tleap, with water, ions
and all that. start.pdb
is the PDB tleap started out from, without water, box
or anything, but with the chainID information.
We then run this script:
parmed -i add_chainID.parmed
You can now check top.prmtop
for parmed additions:
%FLAG RESIDUE_CHAINID
%COMMENT Residue chain ID (chainId) read from PDB file; DIMENSION(NRES)
%FORMAT(20a4)
A A A A A A A A A A A A A A B B B B B B
The final step is to get a PDB out of this topology and the rst7 tleap gave us. We can, again, write a parmed script for this:
parm top.prmtop
trajin rst7_from_tleap.rst7
trajout pdb_with_chainID.pdb
Or, we can use another program that comes with ambertools, which was made for this specific purpose:
ambpdb -p top.prmtop -c rst7_from_tleap.rst7 > pdb_with_chainID.pdb
Now we can use pdb_with_chainID.pdb
to start out our optimization, probably
after giving it a better name.
2_ My complex is spreading apart. Can I use GROMACS pulling to keep them together?
Of course! This GROMACS feature is orthogonal to the optimization so locuaz
doesn’t concern itself with it.
What it will do is give GROMACS an index selection
file (.ndx
) with selections for the target and the binder. You can then
reference target and binder as the pull groups in the pull code.
This pull code goes in the mdp file you’ll use for the NPT run:
; Pull code
pull = yes
pull-pbc-ref-prev-step-com = yes ;
pull-ncoords = 1 ; only one reaction coordinate
pull-ngroups = 2 ; two groups defining one reaction coordinate
pull-group1-name = target
pull-group1-pbcatom = 4075 ; atom close to the center
pull-group2-name = binder
pull-group2-pbcatom = 7251 ; atom close to the center
pull-coord1-groups = 1 2 ; target and binder define the reaction coordinate
pull-coord1-type = umbrella ; harmonic potential
pull-coord1-geometry = distance ;
pull-coord1-dim = Y Y Y ; pull along all cordinates
pull-coord1-start = yes ; define initial COM distance > 0
pull-coord1-rate = 0.00 ; keep it fixed
pull-coord1-k = 1 ; kJ mol^-1 nm^-2
Notice the target and binder as the pull groups 1 and 2, respectively.
You can probably use this same sample file and just modify the atoms at
pull-group1-pbcatom
and pull-group2-pbcatom
, and if necessary, the
pull-coord1-k
.
But do remember to lift this pulling and optimize your binder using free MD once you have a somewhat better binder than can stay with its target without any restraints. Try optimizing for 5 epochs and then lift the restrictions to keep optimizing without pulling.
3_ What is NUMA? How can I know the number of NUMA regions my machine has?
NUMA (Non-Unified Memory Access) it’s a design idea to deal with multiple CPU cores and their access to memory.
For us users, it’s important to be NUMA-aware, that is, understand that some cores have faster access to certain regions and slower access to other ones. The same applies to the relationship between CPUs and GPUs.
To know your NUMA layout, just run this command:
$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 256924 MB
node 0 free: 249438 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 257999 MB
node 1 free: 256694 MB
node distances:
node 0 1
0: 10 11
1: 11 10
This means we have 2 NUMA regions and when using MPS, we should set
numa_regions: 2
.
For more info, check this related blogpost.
4_ Why use both memory_positions
and failed_memory_positions
at the same time?
Because sometimes you want to avoid mutating back a position that failed to improve
binding but don’t want to set a too big of a memory. In those cases, you can set
a large failed_memory_size
and a lower memory_size
.
5_ What are the empty brackets in the memory_positions
list?
Empty memory slots on input user memory are allowed. This allows the user to control for how many epochs will the non-empty memory be recalled. Place them after the desired positions:
memory_positions: [[2, 3, 4, 6, 7, 8], [], [], [] ]
memory_size: 4
This means that positions 2, 3, 4, 6, 7 and 8 won’t be mutated on the first epoch, but will be eligible for mutation right after.
6_ AssertionError: No valid branches in input.
If you’re trying to start an optimization and you’re getting this error, most
likely you’ve selected an already existing work
dir. You have to let locuaz
create a dir on its own:
paths:
work: /leonardo/home/userexternal/pbarlett/non_existent_folder
If the work
dir already exists, locuaz will try to restart an optimization
from it and then fail when it can’t recognize its contents.
7_ AssertionError: Too few segments in the input PDB. There should be at least 3 (target+binder+solvent).
If you’re trying to start an optimization and you’re getting this error, you
probably have a starting PDB without chainID info.
If you created your system with tleap
, check I used tleap to build my initial system
but now my starting PDB has no chainID information and locuaz won’t take it.
What can I do about it?