8.8 How do I set up a calculation using a job submission script?

To set up a job submission script, users are expected to have basic knowledge of Python programming and of their job submission system.

There are two modes for job submission: local submission or remote submission, depending on whether you submit ab initio calculations to the local machine where you run USPEX, or to a remote supercomputer.

Step 1: Configuring files in the Submission/ folder

Case I: Local submission.

Set the following tag in the INPUT.txt file:

1   : whichCluster (0: no-job-script, 1: local submission, 2: remote submission)

Next, an SSH server must be running on your local machine: USPEX connects to it and runs the ab initio code via SSH.
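Because the connection is made automatically, it may be worth confirming beforehand that a non-interactive SSH login to your own machine works; a password prompt would presumably block the submission. The sketch below is a hypothetical helper (ssh_command and ssh_works are not part of USPEX). It relies only on the standard OpenSSH options BatchMode and ConnectTimeout and treats exit status 0 as success; the run parameter exists only so the function can be tested without a real SSH server.

```python
import subprocess
from typing import Callable, List

# Hypothetical helper, not part of USPEX: checks that ssh can log in
# without prompting, which an unattended job submitter needs.


def ssh_command(host: str) -> List[str]:
    # BatchMode=yes makes ssh fail instead of asking for a password;
    # ConnectTimeout stops the check from hanging on an unreachable host.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def ssh_works(host: str = "localhost",
              run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    try:
        result = run(ssh_command(host), capture_output=True)
    except FileNotFoundError:  # no ssh client installed
        return False
    return result.returncode == 0
```

Calling ssh_works() should return True; if it returns False, set up key-based login (for example with ssh-keygen and ssh-copy-id) and try again.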

Then, go to the directory Submission/, where you need to edit two files: submitJob_local.py and checkStatus_local.py.

Detailed instructions are given in these files. In short, you need to tell USPEX how to submit a job and how to check whether it has completed.
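Each script reports back by printing a single line of the form CALLBACK <value> (see the print('CALLBACK ' ...) lines at the end of the scripts below): the job ID from the submit script, and 1 (done) or 0 (not done) from the status script. Before starting USPEX, it can help to run the scripts by hand. The sketch below is a hypothetical test helper, not USPEX's own reader; it assumes the last CALLBACK line in the output is the one that matters.

```python
import subprocess
import sys
from typing import List, Optional


def parse_callback(stdout: str) -> Optional[str]:
    """Return the value after the last 'CALLBACK ' marker, or None if absent."""
    value = None
    for line in stdout.splitlines():
        if line.startswith("CALLBACK "):
            value = line[len("CALLBACK "):].strip()
    return value


def run_script(args: List[str]) -> Optional[str]:
    # Run a Submission/ script with the current Python interpreter and
    # return its CALLBACK value; check=True raises if the script crashes.
    result = subprocess.run([sys.executable] + args,
                            capture_output=True, text=True, check=True)
    return parse_callback(result.stdout)
```

For example, run_script(["Submission/checkStatus_local.py", "-j", "2455453"]) should return "1" or "0" once the status script is configured.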

In submitJob_local.py:

from subprocess import check_output
import re
import sys


def submitJob_local(index: int, commandExecutable: str) -> int:
    """
    This routine submits a job on the local machine.
    Edit it to match your own case.

    Step 1: prepare the job script required by your supercomputer
    Step 2: submit the job with a command such as sbatch, qsub, bsub, llsubmit, etc.
    Step 3: get the job ID from the screen message
    :param index: index of the structure
    :param commandExecutable: command executable for the current step of optimization
    :return: job ID
    """

    # Step 1
    myrun_content = ''
    myrun_content += '#!/bin/sh\n'
    myrun_content += '#SBATCH -o out\n'
    myrun_content += '#SBATCH -p cpu\n'
    myrun_content += '#SBATCH -J USPEX-' + str(index) + '\n'
    myrun_content += '#SBATCH -t 06:00:00\n'
    myrun_content += '#SBATCH -N 1\n'
    myrun_content += '#SBATCH -n 8\n'
    # 'cd ${PBS_O_WORKDIR}' is only needed under PBS; SLURM starts the job in the
    # directory where sbatch was called. Add an explicit 'cd' line if your cluster
    # needs a different path prefix (the remote example below uses /cephfs).
    myrun_content += 'mpirun vasp_std > log\n'
    # Alternatively, run the command USPEX passes for the current step:
    # myrun_content += commandExecutable + '\n'
    with open('myrun', 'w') as fp:
        fp.write(myrun_content)

    # Step 2
    # sbatch prints a message such as 'Submitted batch job 2350873'
    # (qsub under PBS prints e.g. '2350873.nano.cfn.bnl.local')
    output = check_output('sbatch myrun', shell=True).decode()

    # Step 3
    # Here we parse the job ID from the output of the previous command
    jobNumber = int(re.findall(r'\d+', output)[0])
    return jobNumber


if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', dest='index', type=int)
    parser.add_argument('-c', dest='commandExecutable', type=str)
    args = parser.parse_args()

    jobNumber = submitJob_local(index=args.index, commandExecutable=args.commandExecutable)
    print('CALLBACK ' + str(jobNumber))

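Step 3 of the submit script extracts the first run of digits from the scheduler's message. The sketch below does the same a little more defensively: it decodes the bytes returned by check_output rather than wrapping them in str() (which yields a string like "b'Submitted batch job 2350873\n'" and works only because "b" and the quotes contain no digits), and it raises a clear error if no ID is found. The name parse_job_id is hypothetical. Where supported, sbatch --parsable prints only the job ID (followed by ;cluster on multi-cluster setups), which makes the parsing trivial.

```python
import re
from typing import Union


def parse_job_id(output: Union[bytes, str]) -> int:
    """Extract the numeric job ID from a scheduler's submission message."""
    if isinstance(output, bytes):
        output = output.decode()
    # SLURM sbatch:            "Submitted batch job 2350873"
    # PBS qsub:                "2350873.nano.cfn.bnl.local"
    # sbatch --parsable:       "2350873" or "2350873;cluster"
    match = re.search(r"\d+", output)
    if match is None:
        raise ValueError("no job ID found in: " + repr(output))
    return int(match.group())
```

In submitJob_local, the last two lines of Step 3 could then read: return parse_job_id(check_output('sbatch myrun', shell=True)).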
In checkStatus_local.py:

import argparse
import glob
import os

from subprocess import check_output

__author__ = 'etikhonov'


def checkStatus_local(jobID : int) -> bool:
    """
    This function checks whether the submitted job is done.
    Edit it to match your own case.
    Step 1: the command to check the job by its ID
    Step 2: find keywords in the screen message to determine whether the job is done
    Below is just a sample of PBS qstat output:
    -------------------------------------------------------------------------------
    Job id                    Name             User            Time Use S Queue
    ------------------------- ---------------- --------------- -------- - -----
    2455453.nano              USPEX            qzhu            02:28:42 R cfn_gen04 
    -------------------------------------------------------------------------------
    If the job is still running, it is listed as above.

    If no keywords such as 'R cfn_gen04' or 'Q cfn_gen04' appear, the job is done.
    :param jobID: ID of the submitted job
    :return: doneOr, True if the job is done
    """

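    # Step 1
    # Use the status command of the scheduler that submitJob_local.py submits to
    # (qstat for PBS; squeue for SLURM, which also changes the keywords in Step 2).
    # '|| true' stops check_output from raising when qstat exits with an error
    # because it no longer knows the (finished) job.
    output = check_output('qstat {} || true'.format(jobID), shell=True).decode()
    # Step 2
    doneOr = True
    if ' R ' in output or ' Q ' in output:
        doneOr = False
    if doneOr:
        for file in glob.glob('USPEX*'):
            os.remove(file)  # remove the job's log files
    return doneOr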

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-j', dest='jobID', type=int)
    args = parser.parse_args()

    isDone = checkStatus_local(jobID=args.jobID)
    print('CALLBACK ' + str(int(isDone)))

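The keyword test in Step 2 has a gap: a job that is held (state H) or waiting (W) contains neither ' R ' nor ' Q ', so it would be reported as done. The sketch below, with hypothetical names pbs_job_state and pbs_job_done, instead reads the S (state) column of the row that belongs to the job, assuming the default qstat table layout shown in the docstring sample. Any of the common PBS/Torque active states counts as not done; a missing row, or Torque's C (completed), counts as done.

```python
from typing import Optional

# Common PBS/Torque state letters for jobs that have not finished yet:
# Q queued, R running, H held, W waiting, T in transit, S suspended,
# E exiting (output files may still be copied back).
ACTIVE_STATES = {"Q", "R", "H", "W", "T", "S", "E"}


def pbs_job_state(qstat_output: str, job_id: int) -> Optional[str]:
    """Return the S column for job_id from default qstat output, or None."""
    for line in qstat_output.splitlines():
        fields = line.split()
        # Data rows start with '<id>.<server>'; header and ruler lines do not.
        if fields and fields[0].split(".")[0] == str(job_id):
            # Assumed layout: Job id, Name, User, Time Use, S, Queue
            return fields[-2]
    return None


def pbs_job_done(qstat_output: str, job_id: int) -> bool:
    state = pbs_job_state(qstat_output, job_id)
    return state is None or state not in ACTIVE_STATES
```

In checkStatus_local, Step 2 would then reduce to doneOr = pbs_job_done(output, jobID).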
Case II: Remote submission.

Set the following tag in the INPUT.txt file:

2       : whichCluster (default 0, 1: local submission; 2: remote submission)

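Then, go to the directory Submission/, where you need to edit two files:
submitJob_remote.py and checkStatus_remote.py. These scripts log in to the remote machine with ssh and scp, so password-free (key-based) SSH access to it must be set up first.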

In submitJob_remote.py:

import argparse
import os
import re

from subprocess import check_output


def submitJob_remote(workingDir : str, index : int, commandExecutable : str) -> int:
    """
    This routine submits a job to a remote cluster.
    Edit it to match your own case.
    Step 1: prepare the job script required by your supercomputer
    Step 2: submit the job with a command such as sbatch, qsub, bsub, llsubmit, etc.
    Step 3: get the job ID from the screen message

    :param workingDir: working directory on the remote machine
    :param index: index of the structure
    :param commandExecutable: command executable for the current step of optimization
    :return: job ID
    """

    # Step 1
    # Specify the PATH to put your calculation folder
    Home = '/home/etikhonov' # 'pwd' of your home directory of your remote machine
    Address = 'rurik'  # your target server: ssh alias or username@address
    Path = Home + '/' + workingDir + '/CalcFold' + str(index) # Just keep it
    run_content = ''
    run_content += '#!/bin/sh\n'
    run_content += '#SBATCH -o out\n'
    run_content += '#SBATCH -p cpu\n'
    run_content += '#SBATCH -J USPEX-' + str(index) + '\n'
    run_content += '#SBATCH -t 06:00:00\n'
    run_content += '#SBATCH -N 1\n'
    run_content += '#SBATCH -n 8\n'
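    # The '/cephfs' prefix is specific to the author's cluster;
    # on most machines use 'cd ' + Path instead.
    run_content += 'cd /cephfs' + Path + '\n'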
    run_content += commandExecutable + '\n'

    with open('myrun', 'w') as fp:
        fp.write(run_content)

    # Create the remote directory
    # Change the ssh/scp commands if necessary.
    # (os.system reports failure through its return value and does not raise,
    # so no try/except is needed here.)
    os.system('ssh -i ~/.ssh/id_rsa ' + Address + ' mkdir -p ' + Path)

    # Copy calculation files
    # add private key -i ~/.ssh/id_rsa if necessary
    os.system('scp POSCAR   ' + Address + ':' + Path)
    os.system('scp INCAR    ' + Address + ':' + Path)
    os.system('scp POTCAR   ' + Address + ':' + Path)
    os.system('scp KPOINTS  ' + Address + ':' + Path)
    os.system('scp myrun ' + Address + ':' + Path)

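    # Step 2
    # Submit the job with the command that matches your scheduler
    # (sbatch for the SLURM script written above, qsub for PBS)
    output = check_output('ssh -i ~/.ssh/id_rsa ' + Address + ' sbatch ' + Path + '/myrun', shell=True).decode()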

    # Step 3
    # Here we parse job ID from the output of previous command
    jobNumber = int(re.findall(r'\d+', output)[0])
    return jobNumber


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', dest='index', type=int)
    parser.add_argument('-c', dest='commandExecutable', type=str)
    parser.add_argument('-f', dest='workingDir', type=str)
    args = parser.parse_args()

    jobNumber = submitJob_remote(workingDir=args.workingDir, index=args.index, commandExecutable=args.commandExecutable)
    print('CALLBACK ' + str(jobNumber))

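The remote submit script builds shell strings, runs them with os.system (whose failures go unnoticed), and copies each file with a separate scp call. The sketch below shows one alternative under stated assumptions: a SLURM scheduler whose sbatch supports --parsable, an address that is a single host or ssh alias (ports belong in ~/.ssh/config, since scp uses -P where ssh uses -p), and hypothetical function names. It passes argument lists to subprocess.run, stops with an error if mkdir or scp fails, copies all input files in one scp call, and runs sbatch from inside the calculation folder so that SLURM's default working directory and the '-o out' file land there.

```python
import subprocess
from typing import Callable, List, Sequence

# Files copied for each structure, as in submitJob_remote.py.
INPUT_FILES = ("POSCAR", "INCAR", "POTCAR", "KPOINTS", "myrun")


def remote_submit_commands(address: str, path: str,
                           files: Sequence[str] = INPUT_FILES) -> List[List[str]]:
    """Build the mkdir, copy and submit commands for one structure."""
    return [
        ["ssh", address, "mkdir", "-p", path],
        # One scp call copies every input file into the remote folder.
        ["scp", *files, address + ":" + path + "/"],
        # sbatch --parsable prints only "<jobid>" or "<jobid>;<cluster>".
        ["ssh", address, "cd " + path + " && sbatch --parsable myrun"],
    ]


def submit_remote(address: str, path: str,
                  run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> int:
    *setup, submit = remote_submit_commands(address, path)
    for cmd in setup:
        run(cmd, check=True)  # raises CalledProcessError if mkdir or scp fails
    result = run(submit, check=True, capture_output=True, text=True)
    return int(result.stdout.strip().split(";")[0])
```

The run parameter is only there so the command sequence can be tested without a cluster; in real use, submit_remote(Address, Path) would replace the copy and submit steps of submitJob_remote.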

In checkStatus_remote.py:

import argparse
import os

from subprocess import check_output

def checkStatus_remote(jobID : int, workingDir : str, index : int) -> bool:
    """
    This routine checks whether the submitted job is done.
    Edit it to match your own case.
    Step 1: specify the path of your calculation folder
    Step 2: check the job by its ID with the exact command for your scheduler
    :param jobID: ID of the submitted job
    :param index: index of the structure
    :param workingDir: working directory on the remote machine
    :return: True if the job is done
    """
    # Step 1
    Home = '/home/etikhonov'  # 'pwd' of your home directory of your remote machine
    Address = 'rurik'  # Your target supercomputer: username@address or ssh alias
    # example of address: user@somedomain.edu -p 2222
    Path = Home + '/' + workingDir + '/CalcFold' + str(index)  # just keep it

    # Step 2
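    # '|| true' runs on the remote side, so a finished job unknown to qstat does not
    # make check_output raise, while ssh connection errors still do.
    output = check_output('ssh ' + Address + ' "qstat ' + str(jobID) + ' || true"', shell=True).decode()
    # If you use a full address without an ssh alias, you must provide a valid ssh private key, as in:
    # output = check_output('ssh -i ~/.ssh/id_rsa ' + Address + ' "/usr/bin/qstat ' + str(jobID) + ' || true"', shell=True).decode()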

    if ' R ' not in output and ' Q ' not in output:
        doneOr = True
        # os.system('scp -i ~/.ssh/id_rsa ' + Address + ':' + Path + '/OUTCAR ./')  # OUTCAR is not needed by default
        os.system('scp ' + Address + ':' + Path + '/OSZICAR ./')  # For reading enthalpy/energy
        os.system('scp ' + Address + ':' + Path + '/CONTCAR ./')  # For reading structural info
        # Add '-i ~/.ssh/id_rsa' to these scp commands if your ssh setup requires it, as above.
    else:
        doneOr = False
    return doneOr


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-j', dest='jobID', type=int)
    parser.add_argument('-i', dest='index', type=int)
    parser.add_argument('-f', dest='workingDir', type=str)
    args = parser.parse_args()

    isDone = checkStatus_remote(jobID=args.jobID, workingDir=args.workingDir, index=args.index)
    print('CALLBACK ' + str(int(isDone)))
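
checkStatus_remote.py queries the queue with qstat, whereas the job script written by submitJob_remote.py carries SLURM #SBATCH directives. On a SLURM cluster, the sketch below is one way to do the equivalent check: squeue -h -j <ID> -o %T prints just the job's state, or nothing once the job has left the queue. The function names and the set of active states are assumptions (the list is not exhaustive). Because ssh exits with status 255 on its own errors, that case is raised rather than mistaken for a finished job; the scp lines that fetch OSZICAR and CONTCAR would still run when the check returns True.

```python
import subprocess
from typing import Callable

# SLURM states (squeue %T) for jobs that are still queued or running.
# Assumed subset; see the squeue documentation for the full list.
ACTIVE_SLURM_STATES = {"PENDING", "CONFIGURING", "RUNNING", "COMPLETING",
                       "SUSPENDED", "REQUEUED", "RESIZING"}


def slurm_job_done(squeue_output: str) -> bool:
    # Empty output means the job is no longer in the queue.
    return squeue_output.strip() not in ACTIVE_SLURM_STATES


def check_remote_slurm(address: str, job_id: int,
                       run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    result = run(["ssh", address, "squeue", "-h", "-j", str(job_id), "-o", "%T"],
                 capture_output=True, text=True)
    if result.returncode == 255:  # ssh itself failed (network, keys, ...)
        raise RuntimeError("ssh failed: " + result.stderr.strip())
    if result.returncode != 0:
        # squeue may reject job IDs it has already purged: treat as done.
        return True
    return slurm_job_done(result.stdout)
```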