Dear Jobsub users,
We have decided to break up the go-live of jobsub_lite into four phases, rather than the one-phase go-live we communicated to you on Jan. 3 (see the original message below). The four phases will be as follows:
- Phase 1:
  - On Jan. 18, 2023, between 8 a.m. and 2 p.m., jobsub_lite will be installed on all FIFE (FabrIc for Frontier Experiments) interactive nodes.
  - POMS (Production Operations Management System) users will default to using jobsub_lite.
- Phase 2:
  - On Feb. 1, 2023, jobsub_client version v_lite will be made current in UPS (Unix Product Support) so all users get jobsub_lite by default.
- Phase 3:
  - On March 1, 2023, we will stop jobs from being submitted to jobsub_server schedds (jobsub02, jobsub03) from jobsub_client. Jobs can still be managed from jobsub_client.
- Phase 4:
  - On April 5, 2023, we will decommission jobsub_server. All schedds will now be jobsub_lite schedds, and jobsub_client will no longer work.
The rest of the information provided in the original message remains the same.
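For scripts that need to know which phase applies on a given day, the schedule above can be sketched as a small shell helper. This is only an illustration: the function name jobsub_phase and the "phase 0" label for dates before go-live are our own inventions, and the helper works at whole-day granularity, so it ignores the 8 a.m. to 2 p.m. installation window on Jan. 18.

```shell
#!/bin/sh
# Hypothetical helper: print the rollout phase (0-4) in effect on a date
# given as YYYY-MM-DD. Phase 0 is a label invented here for "before go-live".
jobsub_phase() {
    # Strip the dashes so the date compares as a plain integer, e.g. 20230201.
    d=$(printf '%s' "$1" | tr -d '-')
    if   [ "$d" -lt 20230118 ]; then echo 0
    elif [ "$d" -lt 20230201 ]; then echo 1
    elif [ "$d" -lt 20230301 ]; then echo 2
    elif [ "$d" -lt 20230405 ]; then echo 3
    else echo 4
    fi
}

# Show the phase for today's date.
jobsub_phase "$(date +%Y-%m-%d)"
```

For example, jobsub_phase 2023-02-15 prints 2, since jobsub_client v_lite is current in UPS from Feb. 1 but submission to the jobsub_server schedds is not blocked until March 1.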
*** ORIGINAL MESSAGE ***
WHAT ARE WE DOING?
The Computational Science and Artificial Intelligence Directorate will be releasing jobsub_lite, the rewrite of jobsub_client/jobsub_server, to FIFE (FabrIc for Frontier Experiments) experiment interactive machines.
WHEN WILL THIS OCCUR?
Wednesday, Jan. 18, 2023, from 8 a.m. to 2 p.m. Central Time
WHAT IS THE IMPACT TO YOU?
There are two major impacts:
- jobsub_lite uses SciTokens (https://scitokens.org) to authorize users to submit jobs, write to storage, etc. The first time users submit jobs, they will have to authenticate with the token issuer (CILogon) using their Services account username and password. Detailed instructions are available here: https://fifewiki.fnal.gov/wiki/Getting_started_with_jobsub_lite#Authentication
- Users will no longer be required to source UPS-environment-building scripts or setup jobsub_client. By simply logging in to an experiment interactive node, they will be able to run all the normal jobsub commands (jobsub_submit, jobsub_q, etc.).
We have tried to make jobsub_lite as close to a drop-in replacement for jobsub_client as possible, though there will be some slight changes from the current jobsub_client. For a short time, you will be able to submit and manage jobs from both the old jobsub_client and the new jobsub_lite. However, jobs submitted with jobsub_lite will not be manageable from jobsub_client, and vice versa. If you explicitly set up an old version of jobsub_client from UPS (Unix Product Support), that is, a version that is no longer “current” after go-live, you will get jobsub_client. If you do nothing, you will get jobsub_lite.
When jobsub_lite is released, a new version of jobsub_client, v_lite, will be made current in UPS. This new version of jobsub_client will simply point to the jobsub_lite executables. This is being done so scripts that set up jobsub_client will automatically begin to use jobsub_lite. For the aforementioned short period of time when both jobsub_client and jobsub_lite are usable on all interactive nodes, older versions of jobsub_client can be set up by passing the old version to the setup command. We strongly discourage this, but we understand that there may be a few corner cases where using the old jobsub_client might be needed during the transition period.
WHAT DO YOU NEED TO DO?
- Please test your workflows with jobsub_lite by following the instructions below.
- If you are able, try to attend one of the training sessions the FIFE Group will be holding in January to learn more about jobsub_lite. See details below.
Several experiments’ offline coordinators/liaisons have already requested that jobsub_lite be installed on a single experiment interactive node so users can test. Please reach out to your offline coordinator or liaison to see if jobsub_lite is installed on an interactive node for your experiment, and if so, test your workflows with jobsub_lite. If your experiment does not have a node with jobsub_lite and you want to test, please contact your liaison or the FIFE group, and we will find a place for you to test.
jobsub_lite is already deployed on the test interactive nodes and will be deployed everywhere via RPM, with the jobsub_lite executables installed into users’ PATHs at login. To use jobsub_lite on a node where it is installed, simply log in to the node and run the various jobsub commands as before. You don’t need to run the “setup” command from UPS to use jobsub_lite. For example, this is what running jobsub_lite commands would look like on novagpvm03.fnal.gov:
$ kinit -f yourusername@FNAL.GOV
$ ssh novagpvm03.fnal.gov
… MOTD for novagpvm03
-bash-4.2$ which jobsub_submit
-bash-4.2$ jobsub_submit -G nova file:///usr/bin/sleep 300
Attempting kerberos auth with https://htvaultprod.fnal.gov:8200 … succeeded
Attempting to get token from https://htvaultprod.fnal.gov:8200 … succeeded
Storing vault token in /tmp/vt_u10610
Storing bearer token in /tmp/bt_token_nova_Analysis_10610
1 job(s) submitted to cluster 57107298.
Use job id firstname.lastname@example.org to retrieve output
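If you want to confirm that a node provides the jobsub commands at login, without any UPS setup, a quick check along the following lines may help. This is a minimal sketch: the helper name report_cmds is our own, and the two command names passed to it are simply the ones mentioned in this message. Where the executables actually resolve will depend on the node.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named command resolves on PATH.
report_cmds() {
    for cmd in "$@"; do
        # command -v prints where the shell would find the command, if anywhere.
        if resolved=$(command -v "$cmd" 2>/dev/null); then
            printf 'found: %s -> %s\n' "$cmd" "$resolved"
        else
            printf 'missing: %s\n' "$cmd"
        fi
    done
}

# Check the jobsub commands named in this message.
report_cmds jobsub_submit jobsub_q
```

A "missing" line on a freshly logged-in session suggests that jobsub_lite is not yet installed on that node.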
As mentioned above, the first time you do any grid operations using jobsub_lite, you will need to authenticate with our token issuer, CILogon. More information about authentication and submitting jobs using jobsub_lite can be found in this tutorial:
Training sessions for jobsub_lite
We plan to hold four more jobsub_lite training sessions in January 2023, one session per week. Two sessions will be held before Jan. 18 and two after Jan. 18. (The first training session was held during a FIFE meeting in December 2022.) Please stay tuned. As soon as we finalize the training dates, we will communicate them to users.
If you have any questions, please open a Service Desk ticket to be routed to the Distributed Computing Support Group, and we will be happy to answer them.