locuaz has several moving parts, each with its own role in the optimization process. The optimization procedure begins with the Mutation Generator proposing a new random mutation for the binder sequence. A Mutator carries out this mutation, producing a new set of complexes between the binder and the target. Subsequently, a Sampler runs a minimization, an NVT equilibration and an NPT simulation, so that the target-binder interactions can be assessed with the chosen Scorer(s). Finally, the Pruner applies a selection criterion using the binding scores in order to accept or reject the mutation.
The process is repeated iteratively to explore new sequences with potentially improved affinities towards their targets. This workflow is outlined in Figure 1.
Many complexes can be generated on each run of this workflow and each of them will have its own MD data, scores, etc. We will refer to each complex, plus its data, as a branch, while the set of branches from the same run of this workflow is called an epoch.
We will now review each of the blocks depicted in Figure 1.
Throughout this documentation, we will refer to the user configuration options as
config, and its
various options as
locuaz has to coordinate several external programs and be flexible enough to allow different protocols to be run, so some abstractions are needed. We will call these abstractions blocks.
Since there are many tools and each of them has a different naming syntax, things can get confusing at times. This is why locuaz puts a layer of abstraction over them and standardizes their names. All external programs and any files they depend on (like their binaries), are named in lowercase letters without any other symbols. So, for example, while gmx-mmpbsa may be named at times gmx_mmpbsa, gmx-MMPBSA, etc., we will always refer to it as gmxmmpbsa and its input script has to be named gmxmmpbsa and be located inside a folder called gmxmmpbsa.
Other programs, like the rosetta scorer, may need additional files. These are listed in their dedicated sections.
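As an illustration of the naming convention described above, the following minimal sketch normalizes the different spellings of a tool name. The function name `standardize_name` is hypothetical and not part of locuaz; keeping digits is an assumption based on names such as evoef2.

```python
def standardize_name(raw: str) -> str:
    """Lowercase the name and drop every character that is not a letter or digit.

    Hypothetical helper: it only illustrates the naming convention,
    it is not taken from the locuaz code base.
    """
    return "".join(ch for ch in raw.lower() if ch.isalnum())


# All these spellings collapse into the same standardized name.
for variant in ("gmx_mmpbsa", "gmx-MMPBSA", "gmx_MMPBSA"):
    print(variant, "->", standardize_name(variant))
```

Under this convention, the input script and the folder that holds it would both carry the standardized name.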
Mutation Generator (deprecated)
These blocks are the ones in charge of generating the new binders.
They have been deprecated since version
These are the currently available generators:
This is a Single Point Mutation generator. This means that it chooses a single position (from the user input
config["binder"]["mutating_resSeq"]), and all the mutations will be performed there.
To choose which amino acid will be used, it splits all amino acids (except cysteine, which is discarded) into the
following categories: negative, positive, hydrophobic and ring-containing.
Then, it chooses 1 from each group to generate as many mutations as the user asked for
SPM4 use this generator.
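The selection scheme described above can be sketched as follows. This is only an illustration: the membership of each category, the exclusion of the current residue and the function name `spm4_mutations` are assumptions, not the actual locuaz implementation.

```python
import random

# Assumed grouping of the 19 amino acids (cysteine is left out).
# The exact membership of each category is illustrative only.
CATEGORIES = {
    "negative": ["D", "E", "S", "T"],
    "positive": ["R", "N", "Q", "H", "K"],
    "hydrophobic": ["A", "G", "I", "M", "L", "V"],
    "ring": ["P", "F", "W", "Y"],
}


def spm4_mutations(resseq, old_aa, n, rng):
    """Return n mutations at one position, each drawn from a different category."""
    if not 1 <= n <= len(CATEGORIES):
        raise ValueError("n must be between 1 and the number of categories")
    mutations = []
    # Pick n distinct categories, then one amino acid from each of them.
    for group in rng.sample(list(CATEGORIES), n):
        # Assumption: the current residue is never proposed again.
        candidates = [aa for aa in CATEGORIES[group] if aa != old_aa]
        mutations.append((resseq, old_aa, rng.choice(candidates)))
    return mutations


print(spm4_mutations(resseq=42, old_aa="A", n=4, rng=random.Random(0)))
```

Every returned tuple shares the same position, which is what makes this a single point mutation generator.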
SPM4i is similar to
SPM4, but it adds an additional filter before choosing which position to mutate.
Given the positions selected by the user on
discard those that are not part of the target/binder interface.
To determine the interface, locuaz uses the freesasa library which uses a rolling-probe,
whose radius can be set using the
config["generation"]["probe_radius"] to any value ranging
4.0 (in angstroms). The bigger the radius, the more residues will be classified
as part of the interface; the default is
SPM4i to use this generator.
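The interface filter can be pictured with the sketch below. It assumes that per-residue solvent accessible surface areas (SASA) were already computed, for example with freesasa, for the binder alone and for the binder inside the complex, and that a residue counts as interface when it loses accessible surface upon binding. This criterion, the `min_buried` threshold and the function name are assumptions; the text above does not state how locuaz derives the interface from the SASA values.

```python
def interface_positions(candidates, sasa_free, sasa_complex, min_buried=0.0):
    """Keep the candidate positions that bury surface area when the complex forms.

    candidates   : residue numbers selected by the user
    sasa_free    : {resSeq: SASA of the residue in the isolated binder}
    sasa_complex : {resSeq: SASA of the residue in the complex}
    min_buried   : hypothetical threshold (same units as the SASA values)
    """
    return [
        res
        for res in candidates
        if sasa_free[res] - sasa_complex[res] > min_buried
    ]


# Made-up SASA values, for illustration only.
free = {10: 80.0, 11: 45.0, 12: 60.0}
bound = {10: 80.0, 11: 12.0, 12: 59.0}
print(interface_positions([10, 11, 12], free, bound, min_buried=5.0))
```

With these made-up numbers only residue 11 is kept, since it is the only one that buries more than the threshold.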
This generator is similar to SPM4i, but besides relying on freesasa, it is also based on free energy considerations.
The generator will read the decomp_gmxmmpbsa.csv output file from gmxmmpbsa and pick the
residue that contributes the least to the interaction with the target.
At the same time, this position has to also comply with the previous prerequisites,
that is, being part of the interface and one of the positions included in
You can also set the probe radius in this generator.
Don’t forget to include
gmxmmpbsa alongside your other scorers (in
and to include instructions in the gmxmmpbsa input file to perform the decompositions. The decomposition section
should look something like this:
    /
    &decomp
    idecomp=2, dec_verbose=0,
    print_res="within 4"
    /
Check Amber’s manual and gmx_MMPBSA docs for more info.
SPM4gmxmmpbsa use this generator.
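The position choice described above can be sketched as follows. The sketch assumes the per-residue decomposition energies were already parsed into a dictionary (the layout of decomp_gmxmmpbsa.csv is not reproduced here) and that a more negative energy means a more favorable contribution, so the residue with the highest value is the one contributing the least. The function name and data are hypothetical.

```python
def least_contributing(user_positions, interface, decomp_energy):
    """Pick the eligible residue whose contribution to binding is the least favorable.

    user_positions : residue numbers the user allowed to mutate
    interface      : residue numbers classified as interface
    decomp_energy  : {resSeq: per-residue binding energy contribution}
    """
    eligible = [
        res
        for res in user_positions
        if res in interface and res in decomp_energy
    ]
    if not eligible:
        raise ValueError("no position satisfies all the prerequisites")
    # Assumption: lower (more negative) energy means a stronger contribution.
    return max(eligible, key=lambda res: decomp_energy[res])


# Made-up energies, for illustration only.
energies = {10: -3.2, 11: -0.4, 12: 0.9, 13: -1.1}
print(least_contributing([10, 11, 13], {10, 11, 12}, energies))
```

Residue 12 has the least favorable energy here, but it is skipped because it is not among the positions allowed by the user, so residue 11 is chosen.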
The Mutation Creator replaces all Mutation Generators as the block in charge of taking a top branch from an epoch and creating the mutations that will give rise to the branches of the next epoch.
Mutation Generators were mono-blocks that the user could pick for the task. On the other hand, the Mutation Creator is unique, but highly configurable. The user can build their desired Mutation Creator out of the many options available.
The available options are split according to the 2 phases of the generation of a new mutation: the choosing of the site to be mutated and the choosing of the new amino acid (AA).
The user can choose how many positions to mutate, whether these must be in the interface, and the likelihood of each position being chosen. This likelihood can be uniform or guided by an mmpbsa method that chooses the position that’s contributing the least to the binding affinity.
Amino acid selection
Before selecting amino acids, the Mutation Creator selects a bin.
Given that it’s too computationally expensive to test all AAs at each position, and that there are similarities among them, the Mutation Creator offers the possibility of splitting the 20 AAs into bins of similar AAs that it will later choose from, for a more efficient sampling of the AA space. Once a bin is chosen, a specific AA has to be picked, and this is where their probabilities come into play.
At the beginning of the optimization process, a user may choose to split the 20 AAs into sub-groups of similar AAs (bins) in order to substitute the “wild-type” AAs with very different ones. The idea is that these substitutions give more information. Later on, once a good affinity has been achieved, swapping an alanine for an arginine may not be optimal; perhaps a valine would be better.
That’s the idea behind bins, to exclude AAs that are probably not optimal in order to explore the solution space more efficiently.
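The two-step choice (first a bin, then an amino acid inside it) can be sketched as below. The bins, the weights, the uniform choice of the bin and the function name `pick_amino_acid` are all illustrative assumptions, not the options that locuaz actually exposes.

```python
import random


def pick_amino_acid(bins, weights, rng):
    """Choose a bin uniformly, then an amino acid inside it according to its weight.

    bins    : list of lists, each holding the one-letter codes of similar AAs
    weights : {aa: relative probability}; AAs missing from it default to 1.0
    """
    chosen_bin = rng.choice(bins)
    aa_weights = [weights.get(aa, 1.0) for aa in chosen_bin]
    return rng.choices(chosen_bin, weights=aa_weights, k=1)[0]


# Hypothetical bins and weights, for illustration only.
example_bins = [["D", "E"], ["R", "K", "H"], ["A", "V", "L", "I"], ["F", "W", "Y"]]
example_weights = {"A": 0.5, "V": 2.0}
print(pick_amino_acid(example_bins, example_weights, random.Random(7)))
```

Coarse bins favor very different substitutions early on, while finer bins, or skewed weights, suit the later stages of the optimization.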
It’s important to note that the options from the Mutation Creator may have an impact on the number of branches generated.
constant_width=false, the number of sites requested will multiply the number
of new branches generated from each previous top branch. Eg: if
sites=1 and the number of top branches from epoch
i is 2, then 4 new branches will be generated in total for epoch i+1, 2
from each top branch from epoch i.
On the other hand, if the same conditions apply, but with
sites=2, 8 branches
will be generated in total, 4 from each branch, and 2 at each position.
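The arithmetic of the two examples above can be checked with a short sketch. It only covers the constant_width=false case, and the parameter name `per_site` (the number of new branches requested at each mutated position) is a hypothetical label used here for illustration.

```python
def new_branch_count(top_branches: int, sites: int, per_site: int) -> int:
    """Branches created for epoch i+1 when constant_width is false.

    Every top branch of epoch i is mutated at `sites` positions, and
    `per_site` new branches are generated at each of those positions.
    """
    return top_branches * sites * per_site


# The two cases discussed in the text, with 2 top branches in epoch i.
print(new_branch_count(top_branches=2, sites=1, per_site=2))
print(new_branch_count(top_branches=2, sites=2, per_site=2))
```

Since every new branch needs its own MD run and scoring, doubling the number of sites roughly doubles the cost of the next epoch.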
The mutators are the external tools that take the complex, perform the mutations generated by the mutation generator, repack the side-chain of the mutated residue and may repack the side-chains of neighboring residues as well. There’s no definitive best tool, so it’s up to the user to choose one after appropriate benchmarks are done.
Mutators based on DLPacker are the only ones that are built into the protocol and can be readily used, once the DLPacker weights are downloaded. Other Mutators, like the one based on EvoEF2, need an external binary that has to be downloaded. More tools can easily be added through the interface that the Mutator class offers. Check Mutators for a reference to the classes that abstract over these programs.
Whichever one you choose, set
config["paths"]["mutator"] to the directory where it’ll find the necessary files.
This mutator is based on DLPacker which is, according to our benchmarks, one of the best side-chain packers to use
after a mutation. It’s the default mutator and while it comes built-in with locuaz, it needs its weights, which
are too heavy to be bundled alongside the installation. Check Mutators for more info about this.
dlp use this mutator.
dlpr use this mutator and adjust the reconstruct radius with the
Check Post-installation or Mutators for more info about this.
evoef2 is one of the available scorers but, at heart, it’s a Potential Energy Function (PEF), so it can
also replace a residue with another one, and then reorient it by minimizing its PEF. To use it, clone the evoef2 repo,
rename it to
evoef2, compile it using the
build.sh script and rename the binary to
evoef2 use this mutator.
Molecular Dynamics (MD) of the complexes are carried out using the GROMACS simulation package,
so some of the options associated with this block are transparent wrappers around GROMACS command line options
which map to
-pinoffset, respectively. Other GROMACS options are hard-coded,
-pin on and the use of the GPU for all interactions but the bonded ones.
Naturally, the mdp inputs also need to be specified in
config['md']['mdp_names']['npt_mdp'], which correspond to
the minimization, NVT and NPT, respectively.
Another important one is
config['md']['ngpus'], which will determine the number of runs that can be carried out in parallel.
With respect to topologies, these can be built and updated iteratively either with GROMACS or Amber’s Tleap.
config['md']['gmx_mdrun'] allows setting the name of the binary that carries out the MD. Its default
value is usually the right one (
gmx mdrun), but users of some systems may realize that the sysadmins have
compiled the mdrun command with a different name; this is why we added this option.
When using GROMACS to build the topology,
can be configured. Notice that there are no options to set the box. locuaz does not run any
editconf commands, it
will always keep the box from the system.
While the engine is always GROMACS, the topology can be built through Amber as well by setting
config['paths']['tleap'] also needs to be set alongside,
so locuaz can copy the path with all the necessary files to rebuild the topology after each mutation.
These are abstractions over external programs that estimate the affinity between the target and the binder over each frame of the MD. gmxmmpbsa is the only one that comes built-in with locuaz and does not need an external binary, but it does need an input script. More info on all scorers can be found at Scorers.
After scoring the affinity, the chosen Pruner will decide if the mutation was successful or not. Pruners take the original complex(es) and the newly mutated ones and output the best of them for the next round of optimization. The exact criteria that decide which complex(es) are at the top depend on the chosen pruner. More info on this at Pruners.
When using just one scorer, the metropolis pruner can be used which, as its name suggests, uses the metropolis acceptance ratio to decide if the mutation is accepted or not.
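The Metropolis criterion mentioned above can be sketched in a few lines. This is the textbook acceptance rule, not necessarily the exact code of the locuaz pruner: it assumes that lower scores mean better affinity, and the `kT` parameter (in the same units as the score) and the function name are hypothetical.

```python
import math
import random


def metropolis_accept(old_score, new_score, kT, rng):
    """Textbook Metropolis rule: always accept improvements, sometimes accept the rest.

    Assumption: scores behave like energies, so lower means better affinity.
    """
    delta = new_score - old_score
    if delta <= 0:
        return True
    # A worse score is accepted with probability exp(-delta / kT).
    return rng.random() < math.exp(-delta / kT)


rng = random.Random(0)
print(metropolis_accept(old_score=-30.0, new_score=-34.5, kT=0.6, rng=rng))
```

A larger `kT` makes the search more permissive towards mutations that worsen the score, which helps to escape local optima at the cost of a slower convergence.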
If many scorers are used, the consensus pruner checks how many of them improved their scores on the mutated complex with respect to the previous one; if enough of them indicate an increase in affinity, the new complex is accepted. Check the locuaz.prunerconsensus module for more info and this reference for more details.
All these blocks can be configured, giving rise to many different protocols. Refer to Figure 2 for a graphical abstract of them and check the tutorials for some concrete examples.
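A consensus decision of this kind can be sketched as follows. The sketch assumes that, for every scorer, a lower value means a better affinity, and that `threshold` is the minimum number of scorers that must improve; both the parameter and the function name are hypothetical and may differ from the actual locuaz module.

```python
def consensus_accept(old_scores, new_scores, threshold):
    """Accept the mutated complex if at least `threshold` scorers improved.

    old_scores, new_scores : {scorer name: score}, sharing the same keys
    Assumption: a lower score means a better affinity for every scorer.
    """
    improved = sum(
        1 for name in old_scores if new_scores[name] < old_scores[name]
    )
    return improved >= threshold


# Made-up scores, for illustration only.
previous = {"gmxmmpbsa": -25.0, "evoef2": -40.0, "rosetta": -310.0}
mutated = {"gmxmmpbsa": -28.5, "evoef2": -39.0, "rosetta": -318.0}
print(consensus_accept(previous, mutated, threshold=2))
```

Two of the three made-up scorers improve here, so the mutation would be accepted with a threshold of 2 and rejected with a threshold of 3.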