YAML configuration file

locuaz behaviour is highly customizable: there are many moving parts and alternative ways of carrying out the same function. This reference therefore refers to the input configuration as config and explains each of its “sections”.

config sections

Given the high number of options, config has a hierarchical structure, where similar options are grouped in sections.

config['paths']

All paths go in this section. All of them are required, except for:

  1. config['paths']['input']: only used when starting a new protocol run, since on restarts the protocol reads the necessary files from the current epoch. The input PDB file has to be here and be named {config['main']['name']}.pdb

  2. config['paths']['tleap']: only necessary when using tleap to build topologies. The tleap script and any other auxiliary file (.frcmod or .lib files) should be here.

Some other things to highlight:

  1. config['paths']['gmxrc']: path to where the GMXRC is located, along with the GROMACS binary, usually called gmx

  2. config['paths']['scorers']: root directory where each scorer will have its own folder. Check Scorers for more info.

  3. config['paths']['mutator']: mutator binary and/or parameters have to be here and be named appropriately. Check Mutators for more info.

  4. config['paths']['work']: name of the working dir. If it’s an existing directory, the protocol will assume it’s restarting from a previous run; otherwise, it will start a new one.
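
As a rough illustration of the options above, a paths section could look like the following sketch. Every location is a hypothetical placeholder, and the comment on mdp is only my reading of the schema further down (where it is a required directory), since this section does not describe it.

```yaml
# Hypothetical locations; replace them with the directories on your machine.
paths:
  gmxrc: /usr/local/gromacs/bin         # directory holding GMXRC and the gmx binary
  scorers: /home/user/locuaz/scorers    # root directory, one folder per scorer
  mutator: /home/user/locuaz/mutator    # mutator binary and/or parameters
  mdp: /home/user/locuaz/mdps           # assumed: directory with the .mdp files
  input: [ /home/user/locuaz/start ]    # only read when starting a new run
  work: /home/user/locuaz/work          # an existing directory means a restart
  # tleap: /home/user/locuaz/tleap      # only when building topologies with tleap
```

Note that input is written as a list because the schema declares it with type list.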

config['main']

  1. config['main']['name']: system’s name.

  2. config['main']['mode']: set this to run if you only want to finish the MD run of the last epoch in your work dir, or to score if you only want to score it. In most scenarios, the default option, evolve, is what you want. This is also the only config option that may be overridden by a CLI option (e.g. --mode evolve).

  3. config['main']['prefix']: the prefix that files from the NPT run will get. Not very useful.

  4. config['main']['starting_epoch']: useful when starting a new protocol from a PDB, or set of PDBs, that was already optimized by a protocol run, e.g. when running an unrestrained optimization after a restrained one.
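
A minimal sketch of this section, using only the keys that also appear in the schema further down. The system name is a made-up placeholder, and prefix is left out because it is not part of that schema excerpt.

```yaml
main:
  name: my_complex      # hypothetical system name
  mode: evolve          # one of: evolve, run, score
  starting_epoch: 0     # the schema default; raise it when continuing from an optimized PDB
```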

config['protocol']

  1. config['protocol']['epochs']: .

  2. config['protocol']['branches']: .

  3. config['protocol']['prevent_fewer_branches']: .

  4. config['protocol']['memory_size']: .

  5. config['protocol']['memory_positions']: .

  6. config['protocol']['failed_memory_size']: .

  7. config['protocol']['failed_memory_positions']: .

  8. config['protocol']['memory_aminoacids']: not yet implemented.

config['generation'] (deprecated)

  1. config['generation']['generator']: check the currently available generators below.

  2. config['generation']['probe_radius']: some generators exclude positions that are not currently on the interface, which is calculated by the rolling probe method (freesasa); the bigger the probe, the more residues will be classified as being part of the interface.

config['creation']

Site selection options

  1. config['creation']['sites']: number of sites to mutate. Each new branch will still get 1 mutation, so increasing this number will increase the number of new branches, if constant_width=false.

  2. config['creation']['sites_interfacing']: only consider sites that are on the interface with the target.

  3. config['creation']['sites_interfacing_probe_radius']: a higher probe radius will increase the number of residues that are considered as being part of the interface.

  4. config['creation']['sites_probability']: set it to uniform if you want all positions to have the same chance of being chosen, or set it to mmpbsa if you’re already using the mmpbsa scorer and want to choose the sites that contribute the least to the interaction. Remember that to do this you need to add a section for residue decomposition to the gmxmmpbsa script file:

    /
    &decomp
    idecomp=2, dec_verbose=0,
    print_res="within 4"
    /
    

Check the gmxmmpbsa section for more info.
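
The site selection options above could be combined as in the sketch below. The values are illustrative only. Keeping sites_interfacing set to true alongside mmpbsa is my reading of the enforce_true rule in the schema, and the comment about the scorer assumes that "the mmpbsa scorer" corresponds to the gmxmmpbsa entry of config['scoring']['scorers'].

```yaml
creation:
  sites: 2                              # number of sites to mutate
  sites_interfacing: true               # only sites on the interface with the target
  sites_interfacing_probe_radius: 1.4   # schema default; larger values widen the interface
  sites_probability: mmpbsa             # assumes gmxmmpbsa is among the configured scorers
```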

Amino acid selection options

These options affect the probability of each amino acid being chosen to be placed at the already selected site.

  1. config['creation']['aa_bins']: list of strings where each element is a bin, represented as a string of consecutive one-letter coded amino acids. If you don’t want to group amino acids, just set it to 1 bin with all amino acids like this: [CDESTAGIMLVPFWYRNQHK]

  2. config['creation']['aa_bins_criteria']: set it to without so that amino acids are chosen from every bin except the one that contains the current (pre-mutation) amino acid. Setting it to within has the opposite effect: only amino acids from the same bin as the current one can be chosen.

  3. config['creation']['aa_probability']: it can be set to uniform, to ReisBarletta (to use the probabilities extracted from the Reis & Barletta et al. paper), or to custom (to set your own). In this last case, you’ll also have to set the following option.

  4. config['creation']['aa_probability_custom']: a 20-element dictionary with the probability assigned to each amino acid.
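
To make the amino acid options more concrete, here is a hedged sketch that uses the default bins from the schema together with a custom probability table. The flat 0.05 values are placeholders, and having them add up to 1 is my assumption rather than something stated in this reference.

```yaml
creation:
  aa_bins: [ "CDEST", "AGIMLV", "PFWY", "RNQHK" ]   # schema default bins
  aa_bins_criteria: without    # pick from any bin except the current residue's bin
  aa_probability: custom
  aa_probability_custom:       # one entry per amino acid, 20 in total
    C: 0.05
    D: 0.05
    E: 0.05
    S: 0.05
    T: 0.05
    A: 0.05
    G: 0.05
    I: 0.05
    M: 0.05
    L: 0.05
    V: 0.05
    P: 0.05
    F: 0.05
    W: 0.05
    Y: 0.05
    R: 0.05
    N: 0.05
    Q: 0.05
    H: 0.05
    K: 0.05
```

With aa_probability set to uniform or ReisBarletta, the aa_probability_custom block would be omitted, since the schema only allows it when aa_probability is custom.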

config['mutation']

  1. config['mutation']['mutator']: check the currently available mutators below.

  2. config['mutation']['reconstruct_radius']: when using the dlpr mutator, residues within this radius from the mutated residue will get their sidechains reoriented by DLPacker.

  3. config['mutation']['allowed_nonstandard_residues']: if there are non-protein residues on the optimized interface, they need to be taken into account when optimizing the sidechain of the newly mutated residue. Add their resnames here.
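
A possible mutation section for the dlpr mutator is sketched below. The residue name is a hypothetical example of a non-protein residue sitting on the interface; the radius is the schema default.

```yaml
mutation:
  mutator: dlpr                             # one of: evoef2, dlp, dlpr
  reconstruct_radius: 5.0                   # schema default; allowed range is 1.0 to 20.0
  allowed_nonstandard_residues: [ "HEM" ]   # hypothetical resname, 2 to 4 characters
```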

config['pruning']

  1. config['pruning']['pruner']: check the currently available pruners below.

  2. config['pruning']['remaining_branches']: you can set this value when the chosen pruner leaves a fixed number of branches after pruning.
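
As a small sketch, a pruning section built only from keys found in the schema further down might read as follows. The threshold value is an arbitrary pick inside the allowed 1 to 20 range, and remaining_branches is left out because it does not appear in that schema excerpt.

```yaml
pruning:
  pruner: consensus         # one of: consensus, metropolis, roundrobin
  consensus_threshold: 2    # only accepted when pruner is consensus; value is illustrative
```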

config['md']

config['target']

config['binder']

config['scoring']

config['statistics']

schema.yaml

Input configuration files are validated against the following schema, which also works as a reference you can check when in doubt, given its plain-English syntax. For example, you can check whether an option is mandatory (required), whether it expects a string, a number, etc. (type), and whether it has a default value (default).

paths:
  type: dict
  required: true
  contains_any_of: [ input, work ]
  schema:
    gmxrc:
      type: string
      required: true
      is_directory: true
    scorers:
      type: string
      required: true
      is_directory: true
    mutator:
      type: string
      required: true
      is_directory: true
    mdp:
      type: string
      required: true
      is_directory: true
    input:
      type: list
      minlength: 1
      maxlength: 12
      schema:
        type: string
        is_directory: true
      required: false
    work:
      type: string
      required: true
    tleap:
      type: string
      required: false
      is_directory: true

main:
  type: dict
  required: true
  schema:
    name:
      type: string
      required: true
    mode:
      type: string
      required: true
      default: "evolve"
      allowed: [ "evolve", "run", "score" ]
    starting_epoch:
      type: integer
      default: 0

protocol:
  type: dict
  required: false
  schema:
    epochs:
      type: integer
      default: 0
      min: 0
      max: 48
    new_branches:
      type: integer
      default: 1
      min: 1
      max: 19
    constant_width:
      type: boolean
      default: true
    prevent_fewer_branches:
      type: boolean
      default: true
    memory_size:
      type: integer
      required: false
      min: 0
      max: 12
      higher_than_length_of: memory_positions
    memory_positions:
      type: list
      required: false
      dependencies: memory_size
      minlength: 1
      maxlength: 12
      schema:
        type: list
        minlength: 0
        maxlength: 36
        schema:
          type: integer
          min: 1
          max: 99999
    failed_memory_size:
      type: integer
      required: false
      min: 0
      max: 12
      higher_than_length_of: failed_memory_positions
    failed_memory_positions:
      type: list
      required: false
      dependencies: memory_size
      minlength: 1
      maxlength: 12
      schema:
        type: list
        minlength: 0
        maxlength: 36
        schema:
          type: integer
          min: 1
          max: 99999

    memory_aminoacids:
      type: list
      required: false
      minlength: 1
      maxlength: 12
      schema:
        type: list
        minlength: 1
        maxlength: 19
        schema:
          type: string
          allowed: [ 'A', 'R', 'N', 'D', 'C', 'E', 'Q', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V' ]
          minlength: 1
          maxlength: 1

generation:
  type: dict
  required: false
  schema:
    generator:
      type: string
      default: SPM4i
      required: True
      allowed: [ SPM4, SPM4i, SPM4gmxmmpbsa ]
    probe_radius:
      type: float
      min: 0.1
      max: 4.0
      default: 1.4

creation:
  type: dict
  required: false
  default: {'sites': 1}
  schema:
    sites:
      type: integer
      min: 1
      max: 10
      default: 1
      required: True
    sites_interfacing:
      type: boolean
      default: True
    sites_interfacing_probe_radius:
      type: float
      min: 0.1
      max: 4.0
      default: 1.4
    sites_probability:
      type: string
      default: uniform
      required: True
      allowed: [ uniform, mmpbsa ]
      enforce_true: {mmpbsa: "sites_interfacing"}
    aa_bins:
      type: list
      minlength: 1
      maxlength: 20
      required: True
      default: ["CDEST", "AGIMLV", "PFWY", "RNQHK"]
      schema:
        type: string
        minlength: 1
        maxlength: 20
    aa_bins_criteria:
      type: string
      default: without
      required: True
      allowed: [ without, within ]
    aa_probability:
      type: string
      default: uniform
      required: True
      allowed: [ uniform, ReisBarletta, custom ]
    aa_probability_custom:
      type: dict
      required: False
      dependencies: { aa_probability: custom }
      schema:
        C:
          type: float
          required: True
          min: 0
          max: 1
        D:
          type: float
          required: True
          min: 0
          max: 1
        E:
          type: float
          required: True
          min: 0
          max: 1
        S:
          type: float
          required: True
          min: 0
          max: 1
        T:
          type: float
          required: True
          min: 0
          max: 1
        R:
          type: float
          required: True
          min: 0
          max: 1
        N:
          type: float
          required: True
          min: 0
          max: 1
        Q:
          type: float
          required: True
          min: 0
          max: 1
        H:
          type: float
          required: True
          min: 0
          max: 1
        K:
          type: float
          required: True
          min: 0
          max: 1
        A:
          type: float
          required: True
          min: 0
          max: 1
        G:
          type: float
          required: True
          min: 0
          max: 1
        I:
          type: float
          required: True
          min: 0
          max: 1
        M:
          type: float
          required: True
          min: 0
          max: 1
        L:
          type: float
          required: True
          min: 0
          max: 1
        V:
          type: float
          required: True
          min: 0
          max: 1
        P:
          type: float
          required: True
          min: 0
          max: 1
        F:
          type: float
          required: True
          min: 0
          max: 1
        W:
          type: float
          required: True
          min: 0
          max: 1
        Y:
          type: float
          required: True
          min: 0
          max: 1

mutation:
  type: dict
  required: true
  schema:
    mutator:
      type: string
      default: dlp
      required: True
      allowed: [ evoef2, dlp, dlpr ]
      crosscheck_radius: true
    reconstruct_radius:
      type: float
      min: 1.0
      max: 20.0
      default: 5.0
    allowed_nonstandard_residues:
      type: list
      minlength: 0
      maxlength: 12
      schema:
        type: string
        minlength: 2
        maxlength: 4
      default: [ ]
      warn_dependency_mutator: { mutator: dlpr}


pruning:
  type: dict
  required: true
  schema:
    prune:
      type: integer
      required: false
      min: 1
      max: 4
    pruner:
      type: string
      default: "consensus"
      required: true
      allowed: [ "consensus", "metropolis", "roundrobin" ]
    consensus_threshold:
      type: integer
      min: 1
      max: 20
      dependencies: { pruner: consensus }
    roundrobin_threshold:
      type: integer
      min: 1
      max: 20
      dependencies: { pruner: roundrobin }
    kT:
      type: float
      min: 0.1
      max: 5.0
      default: 0.593

md:
  type: dict
  required: true
  schema:
    gmx_mdrun:
      type: string
      required: false
      default: "gmx mdrun"
    mdp_names:
      type: dict
      required: true
      schema:
        min_mdp:
          type: string
          default: "min.mdp"
        nvt_mdp:
          type: string
          default: "nvt.mdp"
        npt_mdp:
          type: string
          default: "npt.mdp"
    mps:
      type: boolean
      default: false
      forbidden_if_true_mandatory_if_false: [ ngpus, mpi_procs, omp_procs, pinoffsets ]
    numa_regions:
      type: integer
      allowed: [ 1, 2, 4, 8 ]
      default: 4
    ngpus:
      type: integer
      min: 1
      max: 12
      required: false
      same_as_length_of: pinoffsets
    mpi_procs:
      type: integer
      min: 1
      max: 48
      required: false
    omp_procs:
      type: integer
      min: 1
      max: 48
      required: false
    pinoffsets:
      type: list
      minlength: 1
      maxlength: 12
      required: false
      schema:
        type: integer
    use_tleap:
      type: boolean
      default: false
    force_field:
      type: string
      allowed: [ "amber03", "amber94", "amber96", "amber99", "amber99sb-ildn",
                 "amber99sb", "amberGS", "charmm27", "gromos43a1", "gromos43a2",
                 "gromos45a3", "gromos53a5", "gromos53a6", "gromos54a7", "oplsaa" ]
      default: "amber99sb-ildn"
      required: false
    water_type:
      type: string
      default: "tip3p"
      required: false
      allowed: [ "tip3p", "tip4p", "tip4pew", "tip5p", "spc", "spce" ]
    box_type:
      type: string
      required: true
      default: "triclinic"
      allowed: [ "triclinic", "dodecahedron", "octahedron" ]
    maxwarn:
      type: integer
      min: 0
      max: 20
      default: 0
    npt_restraints:
      type: dict
      required: false
      schema:
        posres:
          type: float
          min: 1
          max: 10000
          default: 1000
        posres_water:
          type: float
          min: 1
          max: 10000
          default: 1000

target:
  type: dict
  required: true
  schema:
    chainID:
      type: list
      minlength: 1
      maxlength: 10
      required: true
      schema:
        type: string
        maxlength: 1

binder:
  type: dict
  required: true
  schema:
    chainID:
      type: list
      minlength: 1
      maxlength: 10
      required: true
      schema:
        type: string
        allowed: [ A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z ]
        required: true
    mutating_chainID:
      type: list
      minlength: 1
      maxlength: 10
      required: true
      schema:
        type: string
        allowed: [ A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z ]
        required: true
    mutating_resSeq:
      type: list
      minlength: 1
      maxlength: 10
      required: true
      same_length: mutating_chainID
      schema:
        type: list
        minlength: 1
        maxlength: 20
        required: true
        sorted: true
        unique: true
        schema:
          type: integer
          min: 1
          max: 99999
          required: true
    mutating_resname:
      type: list
      minlength: 1
      maxlength: 10
      required: true
      same_length: mutating_chainID
      schema:
        type: list
        minlength: 1
        maxlength: 20
        required: true
        schema:
          type: string
          allowed: [ A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y ]
          required: true

scoring:
  type: dict
  required: true
  schema:
    scorers:
      type: list
      minlength: 1
      maxlength: 20
      required: true
      unique_values: true
      schema:
        type: string
        allowed: [ "bach", "bluues", "bluuesbmf", "evoef2", "haddock", "piepisa", "pisa", "rosetta", "gmxmmpbsa", "autodockvina" ]
    nthreads:
      type: integer
      min: 1
      max: 256
      required: true
    mpi_procs:
      type: integer
      min: 1
      max: 256
      required: true
    start:
      type: integer
      default: 0
      min: 0
      max: 999999
      required: false
    end:
      type: integer
      default: -1
      min: -1
      max: 999999
      required: false
      scoring_end: true
    allowed_nonstandard_residues:
      type: list
      minlength: 0
      maxlength: 12
      schema:
        type: string
        minlength: 2
        maxlength: 4
      default: [ ]

statistics:
  type: dict
  required: false
  schema:
    interface:
      type: dict
      schema:
        run:
          type: boolean
        warn_above:
          type: float
          min: 0.0
          max: 999.0
          higher_than: warn_below
        warn_below:
          type: float
          min: 0.0
          max: 999.0
        warn_above_relative:
          type: float
          min: 1.0
          max: 10.0
        warn_below_relative:
          type: float
          min: 0.0
          max: 1.0
        warn_variance:
          type: float
          min: 0.0
          max: 99999.0
        warn_variance_relative:
          type: float
          min: 0.0
          max: 5.0
        nthreads:
          type: integer
          min: 1
          max: 24
          default: 1
    cmdistance:
      type: dict
      schema:
        run:
          type: boolean
        warn_above:
          type: float
          min: 0.0
          max: 999.0
          higher_than: warn_below
        warn_below:
          type: float
          min: 0.0
          max: 999.0
        warn_above_relative:
          type: float
          min: 1.0
          max: 10.0
        warn_below_relative:
          type: float
          min: 0.0
          max: 1.0
        warn_variance:
          type: float
          min: 0.0
          max: 99999.0