3. Overview of input and output files

Input/output files depend on the external code used for structure relaxation.

An important technical element of our philosophy is the multi-stage strategy for structure relaxation. Final structures and energies must be of high quality in order to drive the evolution correctly. However, most newly generated structures are far from a local minimum, and relaxing them at high quality from the start is extremely expensive. This cost can be offset by running the first stages of relaxation under cruder computational conditions; high-quality calculations are needed only at the last stages. For each candidate structure, the early stages can use cheaper settings (basis set, k-point sampling, pseudopotentials), a lower level of approximation (force fields vs. LDA vs. GGA), or even a different structure relaxation code (see Section 2.5 for a list of supported codes). We strongly suggest that you first optimize the cell shape and atomic positions at constant unit cell volume, and only then perform a full optimization of all structural variables. While optimizing at constant volume, you do not need to worry about Pulay stresses in plane-wave calculations, so a small basis set is acceptable; for variable-cell relaxation, however, you will need a high-quality basis set. For structure relaxation you can often get away with a small set of k-points, but do not forget to increase it sufficiently at the last stage(s) to get accurate energies. Use your (and our) wisdom, be a strategist, and remember that poor relaxation can ruin your results.
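The multi-stage idea can be sketched in a few lines of Python. This is only an illustration of the control flow, not how USPEX is implemented: the names Stage, relax_in_stages and toy_relax are hypothetical, and toy_relax stands in for a call to an external relaxation code.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One relaxation stage; `settings` is an opaque bag of options."""
    name: str
    settings: dict = field(default_factory=dict)


def relax_in_stages(structure, stages, relax):
    """Run `relax` once per stage, feeding each result into the next stage.

    `relax(structure, stage)` must return (relaxed_structure, energy).
    Returns the final structure and a list of (stage name, energy) pairs.
    """
    history = []
    for stage in stages:
        structure, energy = relax(structure, stage)
        history.append((stage.name, energy))
    return structure, history


def toy_relax(structure, stage):
    """Hypothetical stand-in for an external code: E = x^2, minimum at x = 0."""
    x = structure["x"] * (1.0 - stage.settings["accuracy"])
    return {"x": x}, x * x


stages = [
    Stage("crude, constant volume", {"accuracy": 0.5}),
    Stage("medium, full relaxation", {"accuracy": 0.8}),
    Stage("accurate, full relaxation", {"accuracy": 0.99}),
]
final, history = relax_in_stages({"x": 4.0}, stages, toy_relax)
for name, energy in history:
    print(f"{name}: E = {energy:.6f}")
```

Each stage starts from the output of the previous one, so the expensive last stage only has to refine a structure that is already close to its local minimum.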

3.1. Input files

Suppose that the directory where the calculations are performed is ~/StructurePrediction. This directory will contain:

  • file input.uspex, thoroughly described in Section 5.

  • Subdirectory ~/StructurePrediction/Specific/ with VASP, GULP, etc. executables, enumerated input files for structure relaxation (INCAR_1, INCAR_2, …), and pseudopotentials. You can alter these filenames (see Section 5.7).

  • Subdirectory ~/StructurePrediction/Seeds/ contains files with seed structures. Seed structures should be in VASP5 POSCAR format grouped into folders. Which folder will be used for which generation is specified in Section 5.5.14.

  • Files with molecule definitions (see Section 5.9).

  • Files with environment definitions (see Section 5.10).
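Before launching a run, the layout described above can be checked with a short script. The sketch below is not part of USPEX; check_layout is a hypothetical helper, and treating only input.uspex and Specific/ as mandatory (so that Seeds/ and the molecule and environment files are not checked) is an assumption made for this example.

```python
from pathlib import Path


def check_layout(root):
    """Return a list of problems found in a calculation directory.

    Assumption: only input.uspex and Specific/ are treated as mandatory.
    """
    root = Path(root)
    problems = []
    if not (root / "input.uspex").is_file():
        problems.append("missing input.uspex")
    if not (root / "Specific").is_dir():
        problems.append("missing Specific/ subdirectory")
    return problems
```

For example, check_layout(Path.home() / "StructurePrediction") returns an empty list when both items are present.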

3.1.1. Specific folder

Executables and enumerated input files for structure relaxation (using external codes such as VASP, GULP, …) should be put in the Specific/ subdirectory:

  • For VASP, put files INCAR_1, INCAR_2, …, defining how relaxation and energy calculations will be performed at each stage of relaxation (we recommend at least 3 stages), and the corresponding POTCAR_* files with pseudopotentials. For example, INCAR_1 and INCAR_2 perform very crude structure relaxation of both atomic positions and cell parameters, keeping the volume fixed; INCAR_3 performs full structure relaxation under constant pressure with medium precision; INCAR_4 performs very accurate calculations. Each higher-level structure relaxation starts from the results of a lower-level optimization and improves them. POTCAR files of all relevant elements should also be in the Specific folder, for instance POTCAR_O, POTCAR_C, etc.

  • For GULP, files goptions_1, goptions_2, … and ginput_1, ginput_2, … must be present. The former specify what kind of optimization is performed; the latter specify the details (interatomic potentials, pressure, temperature, number of relaxation iterations, etc.).

  • For Quantum Espresso, files qEspresso_options_1, qEspresso_options_2, … must be present. All files should be normal QE input files with all parameters except atom coordinates, cell parameters and \(k\)-points (these will be written by USPEX at the end of the file). We recommend performing a multi-step relaxation. For instance, qEspresso_options_1 does a crude structure relaxation of atomic positions with fixed cell parameters, qEspresso_options_2 does full structure relaxation under constant external pressure with medium precision, and qEspresso_options_3 does very accurate calculations.
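Because the stage files are enumerated, a gap in the numbering (say, INCAR_3 missing between INCAR_2 and INCAR_4) is easy to overlook. The following sketch lists the stage numbers present for a given prefix and reports gaps. The function names are hypothetical, and the naming rule "prefix, underscore, integer" is assumed from the default filenames given above.

```python
import re
from pathlib import Path


def stage_numbers(specific_dir, prefix):
    """Sorted stage numbers of files named <prefix>_<n> in specific_dir."""
    pattern = re.compile(re.escape(prefix) + r"_(\d+)")
    numbers = []
    for path in Path(specific_dir).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


def missing_stages(numbers):
    """Stage numbers absent from the range 1..max(numbers)."""
    present = set(numbers)
    return [n for n in range(1, max(numbers, default=0) + 1) if n not in present]
```

For example, stage_numbers("Specific", "INCAR") followed by missing_stages(...) gives an empty list when INCAR_1 through INCAR_N are all present.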

3.1.1.1. INCAR_* files in Specific/ folder for VASP

To use USPEX correctly, you should carefully edit the files in the Specific/ folder, which control the structure relaxation. We take VASP as an example of an external code:

  • Your final structures must be well relaxed and their energies precise. The point is that your energy ranking has to be correct (to check this, look at the E_series.pdf file in the output).

  • Your POTCAR files: To yield correct results, the cores of your pseudopotentials (or PAW potentials) should not overlap by more than 10–15%.

  • To have accurate relaxation at low cost, use the multistage relaxation with at least three stages of relaxation for each structure, i.e. at least three INCAR files (INCAR_1, INCAR_2, INCAR_3, …). We usually set 4–5 stages of relaxation.

  • Your initial structures will usually be very far from local minima. In such cases it helps to relax atoms and cell shape at constant volume first (ISIF = 4 in INCAR_1,2), then do a full relaxation (ISIF = 3 in INCAR_3,4), and finish with a very accurate single-point calculation (ISIF = 2 and NSW = 0 in INCAR_5).

    Exceptions: in fixed-cell predictions, and also in evolutionary metadynamics (except full relaxation), you must have ISIF = 2.

  • When your volume does not change, you can use the default plane wave cutoff. When you optimize the cell volume (ISIF = 3), you must increase it by 30–40%, otherwise you get a large Pulay stress. Your convergence criteria can also be loose in the beginning, but have to be tight in the end: e.g., EDIFF = 1e-2 and EDIFFG = 1e-1 in INCAR_1, gradually tightening to EDIFF = 1e-4 and EDIFFG = 1e-3 in INCAR_4. The maximum number of iterations (NSW) should be large enough to enable good relaxation, but not so large that computer time is wasted on poor configurations. The larger your system, the larger NSW should be.

  • Choosing an efficient relaxation algorithm can save a lot of time. In VASP, we recommend starting relaxation with conjugate gradients (IBRION = 2 and POTIM = 0.02) and, when the structure is closer to a local minimum, switching to IBRION = 1 and POTIM = 0.3.

  • Even if you study an insulating system, many configurations that you will sample are going to be metallic, so to have well-converged results, you must use “metallic” treatment — which works both for metals and insulators. We recommend the Methfessel-Paxton smearing scheme (ISMEAR = 1). For a clearly metallic system, use ISMEAR = 1 and SIGMA = 0.1–0.2. For a clearly insulating system, we recommend ISMEAR = 1 and SIGMA starting at 0.1 (INCAR_1) and decreasing to 0.05.

Here we provide an example of files for carbon with 16 atoms in the unit cell, with default ENCUT = 400 eV in POTCAR:

INCAR_1:
    PREC=LOW
    EDIFF=1e-2
    EDIFFG=1e-1
    NSW=65
    ISIF=4
    IBRION=2
    POTIM=0.02
    ISMEAR=1
    SIGMA=0.10
INCAR_2:
    PREC=NORMAL
    EDIFF=1e-3
    EDIFFG=1e-2
    NSW=55
    ISIF=4
    IBRION=1
    POTIM=0.30
    ISMEAR=1
    SIGMA=0.08
INCAR_3:
    PREC=NORMAL
    EDIFF=1e-3
    EDIFFG=1e-2
    ENCUT=520.0
    NSW=65
    ISIF=3
    IBRION=2
    POTIM=0.02
    ISMEAR=1
    SIGMA=0.07
INCAR_4:
    PREC=NORMAL
    EDIFF=1e-4
    EDIFFG=1e-3
    ENCUT=600.0
    NSW=55
    ISIF=3
    IBRION=1
    POTIM=0.30
    ISMEAR=1
    SIGMA=0.06
INCAR_5:
    PREC=NORMAL
    EDIFF=1e-4
    EDIFFG=1e-3
    ENCUT=600.0
    NSW=0
    ISIF=2
    IBRION=2
    POTIM=0.02
    ISMEAR=1
    SIGMA=0.05
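
Some of the guidelines in this section can be checked mechanically. The sketch below parses INCAR-style text and warns when there are fewer than three stages, when a full-relaxation stage (ISIF = 3) has a cutoff less than 30% above the default, or when EDIFF becomes looser from one stage to the next. It is a rough helper, not part of USPEX or VASP: parse_incar and check_stages are hypothetical names, and the parser assumes one TAG=value pair per line.

```python
def parse_incar(text):
    """Parse 'TAG = value' lines into a dict; '#' and '!' start comments."""
    tags = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].split("!", 1)[0]
        if "=" in line:
            key, value = line.split("=", 1)
            tags[key.strip().upper()] = value.strip()
    return tags


def check_stages(stages, default_encut):
    """Return warnings for a list of parsed INCAR dicts, ordered by stage."""
    warnings = []
    if len(stages) < 3:
        warnings.append("fewer than three relaxation stages")
    previous_ediff = None
    for number, tags in enumerate(stages, start=1):
        if tags.get("ISIF") == "3":
            # Without an explicit ENCUT the default cutoff is assumed.
            encut = float(tags.get("ENCUT", default_encut))
            if encut < 1.3 * default_encut - 1e-9:
                warnings.append(
                    f"INCAR_{number}: ENCUT={encut:g} is less than 30% above the default"
                )
        if "EDIFF" in tags:
            ediff = float(tags["EDIFF"])
            if previous_ediff is not None and ediff > previous_ediff:
                warnings.append(
                    f"INCAR_{number}: EDIFF is looser than in the previous stage"
                )
            previous_ediff = ediff
    return warnings


# Relevant tags of the five carbon INCAR files listed above.
carbon = [parse_incar(t) for t in (
    "EDIFF=1e-2\nISIF=4",
    "EDIFF=1e-3\nISIF=4",
    "EDIFF=1e-3\nENCUT=520.0\nISIF=3",
    "EDIFF=1e-4\nENCUT=600.0\nISIF=3",
    "EDIFF=1e-4\nENCUT=600.0\nISIF=2\nNSW=0",
)]
print(check_stages(carbon, default_encut=400.0))
```

With the default cutoff of 400 eV, the 30% threshold is 520 eV, which is exactly the value used in INCAR_3.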

3.2. Output files

These are stored in the folder results1 if this is a new calculation, and in results2, results3, … if the calculation has been restarted or run several times; there is a separate results* folder for each calculation.
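
When post-processing, it is often useful to locate the most recent results folder. A minimal sketch, assuming the folders are named results followed by an integer as described above (latest_results_dir is a hypothetical helper, not a USPEX function):

```python
import re
from pathlib import Path


def latest_results_dir(root):
    """Return the results<N> directory with the largest N, or None."""
    candidates = []
    for path in Path(root).iterdir():
        match = re.fullmatch(r"results(\d+)", path.name)
        if match and path.is_dir():
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

The numbers are compared as integers, so results10 is ranked after results9, which a plain alphabetical sort would get wrong.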

Caution

When looking at space groups in the file Individuals, keep in mind that USPEX often underdetermines space group symmetries, because of finite precision of structure relaxation and relatively tight space group determination tolerances. You should visualize the predicted structures. To get the true space group symmetry, either increase symmetry tolerances (but this can be dangerous), or re-relax your structure with increased precision.

The subdirectory contains the following files:

  • OUTPUT.txt – summarizes input variables, structures produced by USPEX, and their characteristics.

  • parameters.uspex – a copy of the input file used in this calculation, with some defaults explicitly written, for your reference.

  • Individuals – gives details of all produced structures (energies, unit cell volumes, space groups, variation operators that were used to produce the structures, \(k\)-points mesh used to compute the structures’ final energy, degrees of order, etc.).

  • BESTindividuals – gives this information for the best structures from each generation.

  • convex_hull — only for variable-composition calculations, this file gives all thermodynamically stable compositions, and their enthalpies (per atom).

  • gatheredPOSCARS — relaxed structures (in the VASP5 POSCAR format).

  • BESTgatheredPOSCARS — the same data for the best structure in each generation.

  • gatheredPOSCARS_unrelaxed — gives all structures produced by USPEX before relaxation.

  • enthalpies_complete.csv — gives the enthalpies for all structures in each stage of relaxation.

  • origin — shows which structures originated from which parents and through which variation operators.

  • goodStructures and extended_convex_hull (for fixed- and variable-composition calculations, respectively) report details of all the different structures in order of decreasing stability, starting from the most stable structure and ending with the least stable.

  • goodStructures_POSCARS and extended_convex_hull_POSCARS (for fixed- and variable-composition calculations, respectively) report the same structures (in the VASP5 POSCAR format) in order of decreasing stability, starting from the most stable structure and ending with the least stable.

  • *.uspex – auxiliary files which complement the POSCARS files if needed. They contain information about the molecular association of atoms as well as periodic boundary conditions.

  • graphical files (.svg) — for rapid visual assessment of the results:

    • Energy_vs_N.svg (Fitness_vs_N.svg) — energy (fitness) as a function of structure number;

    • Energy_vs_Volume.svg — energy as a function of volume;

    • Variation-Operators.svg – energy of the child vs. parent(s), with different operators marked in different colors (this graph allows one to assess the performance of different variation operators); it also shows the evolution of each operator’s strength.

    • E_series — correlation between energies from relaxation steps \(i\) and \(i+1\); helps to detect problems and improve structure relaxation.

    • For variable compositions there is an additional graph, extendedConvexHull.svg, which shows the enthalpy of formation as a function of composition.
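
The gatheredPOSCARS-type files listed above hold many structures in one file, so a first post-processing step is usually to split them. The sketch below does this for plain VASP5 POSCAR blocks. It is not a USPEX tool: split_poscars is a hypothetical helper, and it assumes that each block has a non-empty comment line, a line of element symbols followed by a line of atom counts, no "Selective dynamics" line, and no trailing velocity section.

```python
def split_poscars(text):
    """Split concatenated VASP5 POSCAR blocks into a list of strings.

    Assumed block layout (0-based line offsets within a block):
      0 comment, 1 scale, 2-4 lattice vectors, 5 element symbols,
      6 atom counts, 7 coordinate mode, 8.. one line per atom.
    """
    lines = text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip blank lines between blocks
            continue
        if i + 8 > len(lines):
            raise ValueError(f"truncated POSCAR header at line {i + 1}")
        counts = [int(token) for token in lines[i + 6].split()]
        end = i + 8 + sum(counts)
        if end > len(lines):
            raise ValueError(f"truncated atom list at line {i + 1}")
        blocks.append("\n".join(lines[i:end]))
        i = end
    return blocks
```

Each returned string can be written to its own POSCAR file or passed to a structure viewer, which helps when visualizing the predicted structures.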