Additional go-live timeline changes for jobsub_lite

Dear Jobsub users,

After we posted the Jan. 9 message, we began to see more testing activity with jobsub_lite, along with constructive feedback and a few minor issues. To give those who want to test jobsub_lite more time, we have decided to postpone the previously announced timeline (see the Jan. 9 message below) by two weeks. The phased rollout remains the same, but the new timeline is as follows:

  • Phase 1:
    • On Feb. 1, 2023, between 8 a.m. and 2 p.m. Central Time, jobsub_lite will be installed on all FIFE (FabrIc for Frontier Experiments) interactive nodes.
    • POMS (Production Operations Management System) users will default to using jobsub_lite.
  • Phase 2:
    • On Feb. 15, 2023, jobsub_client version v_lite will be made current in UPS (Unix Product Support) so all users get jobsub_lite by default.
  • Phase 3:
    • On March 15, 2023, jobsub_client will no longer be able to submit jobs to the jobsub_server schedds (jobsub02, jobsub03). Existing jobs can still be managed from jobsub_client.
  • Phase 4:
    • On April 19, 2023, we will decommission jobsub_server. From then on, all schedds will be jobsub_lite schedds, and jobsub_client will no longer work.

The rest of the information provided in the message posted on Jan. 9 remains the same.

***Message sent on Jan. 9***

Dear Jobsub users,

We have decided to split the jobsub_lite go-live into four phases, rather than the single-phase go-live we announced on Jan. 3 (see the original message below). The four phases are as follows:

  • Phase 1:
    • On Jan. 18, 2023, between 8 a.m. and 2 p.m. Central Time, jobsub_lite will be installed on all FIFE (FabrIc for Frontier Experiments) interactive nodes.
    • POMS (Production Operations Management System) users will default to using jobsub_lite.
  • Phase 2:
    • On Feb. 1, 2023, jobsub_client version v_lite will be made current in UPS (Unix Product Support) so all users get jobsub_lite by default.
  • Phase 3:
    • On March 1, 2023, jobsub_client will no longer be able to submit jobs to the jobsub_server schedds (jobsub02, jobsub03). Existing jobs can still be managed from jobsub_client.
  • Phase 4:
    • On April 5, 2023, we will decommission jobsub_server. From then on, all schedds will be jobsub_lite schedds, and jobsub_client will no longer work.

The rest of the information provided in the original message remains the same.

***Original message sent on Jan. 3***

WHAT ARE WE DOING?

The Computational Science and Artificial Intelligence Directorate will be releasing jobsub_lite, the rewrite of jobsub_client/jobsub_server, to FIFE (FabrIc for Frontier Experiments) experiment interactive machines.

WHEN WILL THIS OCCUR?

Wednesday, Jan. 18, 2023, from 8 a.m. to 2 p.m. Central Time

WHAT IS THE IMPACT TO YOU?

There are two major impacts:

  1. jobsub_lite uses SciTokens (https://scitokens.org) to authorize users to submit jobs, write to storage, etc. The first time users submit jobs, they will have to authenticate with the token issuer (CILogon) using their Services account username and password. Detailed instructions are available here: https://fifewiki.fnal.gov/wiki/Getting_started_with_jobsub_lite#Authentication

  2. Users will no longer be required to source UPS environment-building scripts or set up jobsub_client. Simply logging in to an experiment interactive node will be enough to run all the normal jobsub commands (jobsub_submit, jobsub_q, etc.).

We have tried to make jobsub_lite as close to a drop-in replacement for jobsub_client as possible, though there will be some slight differences from the current jobsub_client. For a short time, you will be able to submit and manage jobs with both the old jobsub_client and the new jobsub_lite; however, jobs submitted with jobsub_lite will not be manageable from jobsub_client, and vice versa. If you explicitly set up an older version of jobsub_client from UPS (that is, any version other than the one made “current” at go-live), you will get jobsub_client; if you do nothing, you will get jobsub_lite.

When jobsub_lite is released, a new version of jobsub_client, v_lite, will be made current in UPS. This new version of jobsub_client will simply point to the jobsub_lite executables. This is being done so scripts that set up jobsub_client will automatically begin to use jobsub_lite. For the aforementioned short period of time when both jobsub_client and jobsub_lite are usable on all interactive nodes, older versions of jobsub_client can be set up by passing the old version to the setup command. We strongly discourage this, but we understand that there may be a few corner cases where using the old jobsub_client might be needed during the transition period.

WHAT DO YOU NEED TO DO?

  • Please test your workflows with jobsub_lite by following the instructions below.
  • If you can, please attend one of the training sessions the FIFE Group will hold in January to learn more about jobsub_lite (see details below).

Testing jobsub_lite 

Several experiments’ offline coordinators/liaisons have already requested that jobsub_lite be installed on a single experiment interactive node so users can test. Please check with your offline coordinator or liaison whether jobsub_lite is installed on an interactive node for your experiment, and if so, test your workflows with jobsub_lite there. If your experiment does not have a node with jobsub_lite and you want to test, please contact your liaison or the FIFE group, and we will work out a place for you to test.

jobsub_lite is deployed on the test interactive nodes and will be deployed everywhere via RPM, with the jobsub_lite executables added to users’ PATH at login. To use jobsub_lite on a node where it is installed, simply log in to the node and run the jobsub commands as before; you don’t need to run the UPS “setup” command. For example, this is what running jobsub_lite commands looks like on novagpvm03.fnal.gov:

$ kinit -f yourusername@FNAL.GOV
$ ssh novagpvm03.fnal.gov

… MOTD for novagpvm03

-bash-4.2$ which jobsub_submit
/opt/jobsub_lite/bin/jobsub_submit
-bash-4.2$ jobsub_submit -G nova file:///usr/bin/sleep 300
Attempting kerberos auth with https://htvaultprod.fnal.gov:8200 … succeeded
Attempting to get token from https://htvaultprod.fnal.gov:8200 … succeeded
Storing vault token in /tmp/vt_u10610
Storing bearer token in /tmp/bt_token_nova_Analysis_10610
Submitting job(s).
1 job(s) submitted to cluster 57107298.
Use job id 57107298.0@jobsub01.fnal.gov to retrieve output
-bash-4.2$
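
The `which jobsub_submit` step in the session above is a quick way to confirm which implementation you will be using. Below is a minimal shell sketch that wraps the same check. The helper name `jobsub_flavor` and the `JOBSUB_LITE_BIN` override are hypothetical conveniences, not part of jobsub_lite, and the default `/opt/jobsub_lite/bin` path is an assumption taken from the novagpvm03 example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which jobsub implementation "jobsub_submit"
# resolves to in the current PATH.
# Assumption: the jobsub_lite RPM installs into /opt/jobsub_lite/bin (as in
# the novagpvm03 session); set JOBSUB_LITE_BIN if your node differs.
jobsub_flavor() {
  local lite_bin="${JOBSUB_LITE_BIN:-/opt/jobsub_lite/bin}"
  local exe
  # "command -v" prints the path the shell would run, or fails if not found.
  if ! exe="$(command -v jobsub_submit)"; then
    echo "none"
    return 1
  fi
  case "$exe" in
    "$lite_bin"/*) echo "jobsub_lite" ;;
    # Any other location, e.g. a UPS-provided jobsub_client. Note that the
    # v_lite UPS version may still wrap jobsub_lite from a different path.
    *) echo "other: $exe" ;;
  esac
}
```

After sourcing this in your login shell, running `jobsub_flavor` on a node with the jobsub_lite RPM should print `jobsub_lite`, assuming the install path matches the example above. The check is deliberately coarse: it only looks at where `jobsub_submit` lives, not at what the executable does.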

As mentioned above, the first time you do any grid operations using jobsub_lite, you will need to authenticate with our token issuer, CILogon. More information about authentication and submitting jobs using jobsub_lite can be found in this tutorial:

https://fifewiki.fnal.gov/wiki/Getting_started_with_jobsub_lite

Training sessions for jobsub_lite 

We plan to hold four more jobsub_lite training sessions in January 2023, one per week: two before Jan. 18 and two after. (The first training session was held during a FIFE meeting in December 2022.) Please stay tuned; we will announce the training dates as soon as they are finalized.

If you have any questions about jobsub_lite, please open a Service Desk ticket to be routed to the Distributed Computing Support Group, and we will be happy to help.

If you have any questions about this message, contact the Service Desk:
https://servicedesk.fnal.gov
servicedesk@fnal.gov
(630) 840-2345