Team Lead for HPC Engineering (Hybrid Eligible)

Date: Aug 20, 2024

Location: Oak Ridge, TN, US, 37830

Company: Oak Ridge National Laboratory

Requisition ID: 13570

Due to the security clearance requirements of this position, we are unable to consider EAD and Green Card holders. Additionally, visa sponsorship of any kind is unavailable for this position (H-1B, F-1 (including exercising OPT), H-4, J-1, etc.).

Overview:    

We're hiring a Team Lead for HPC Engineering to grow and manage our team and to provide oversight of technical projects and operations in direct support of our customers and projects! This position resides in the Emerging Technologies and Computing (ETAC) group in the Research Computing Support (RCS) division of the Information Technology Services Directorate at Oak Ridge National Laboratory (ORNL).

What will you be doing?  

ETAC focuses on supporting researchers' needs in high-performance computing (HPC), data engineering and management, Infrastructure as a Service (IaaS), and new technologies. Staff charge their time directly to projects or programs rather than to a centralized budget. Our goal is to enable research while protecting ORNL assets, especially from cyber incidents.

Team Leads manage staff and assign them to projects across many directorates; RCS staff who support those directorates report directly to you. You are also expected to spend a small portion of your time on research projects. Regular meetings with customers (those providing funding) and directorate management are required to ensure IT needs are being met and to manage workload as projects ramp up and down.

Additionally, you'll plan and manage updates, upgrades, and operations and maintenance (O&M) services for HPC and artificial intelligence/machine learning (AI/ML) environments that provide research staff and their collaborators with access to critically important computational research platforms and state-of-the-art post-processing, modeling, simulation, and visualization capabilities. You are also responsible for scientific and research software, storage systems, connections to other research systems, and user support.

Major Duties/Responsibilities: 

Staff Management: 

  • Hire, develop, and coach staff to ensure project and team success in areas including, but not limited to, scheduling preventive maintenance; patching and upgrading research software and systems; planning and implementing hardware/software refreshes and upgrades; and ensuring systems are prepared for disaster recovery if necessary.
  • Define the responsibilities of the assigned organization and staff positions to accomplish business objectives.
  • Develop and lead training plans and budgets, and ensure assigned employees are aware of and follow ORNL and any other applicable policies, procedures, and regulations.
  • Manage staff performance, including creating annual performance and development plans and conducting year-end assessments.
  • Meet with staff regularly to provide technical expertise and guidance, ensure they have the resources and support they need, stay informed on current projects, and help remove roadblocks.
  • Oversee and approve time and attendance.
  • Provide reports as requested, such as major accomplishments or staffing per project.
  • Provide project management for projects ranging from informal to formal, ensuring on-time, on-scope, and on-budget delivery.

Leadership and Collaboration: 

  • Create and foster partnerships with HPC researchers at ORNL to encourage best-in-breed service delivery, while ensuring that compliance and governance control protocols are followed.
  • Leverage technical background to recommend, design, and develop HPC research products.
  • Review operating metrics and direct the resolution of operational and maintenance problems to prevent operational delays.
  • Develop and maintain estimates and plans covering labor hours, production costs, service levels, and maintenance, and provide guidance on the development of new concepts.
  • Develop business relationships with customers and directorate management, understanding their business and where IT could provide services.
  • Keep customers and directorate management informed of major IT initiatives, identifying how they might affect directorate operations.
  • Provide guidance in the development of processes, including personnel requirements, material needs, subcontract requirements, and equipment needs.
  • Work with researchers to develop requirements for future computational resources.

Team Leads in RCS contribute to the Division, Group, and Team’s business continuity by engaging in cross-training with other Team Leads and individual contributors. Cross-training activities could include any of the following:

  • Design, install, configure, secure, and maintain HPC systems, and related application software in support of project needs.
  • Advise in the selection and purchase of hardware and software systems.
  • Provide HPC architecture design.
  • Manage backup services.
  • Ensure HPC configuration management through the use of configuration management tools.
  • Ensure the secure and effective operation of HPC systems through compliance with ORNL procedures and IT Internal Operating Procedures.
  • Monitor for system issues.
  • Solve HPC system problems quickly and effectively.
  • Collaborate with other HPC systems engineers and vendors to resolve hardware and software issues.
  • Answer user calls related to primary project work.
  • Develop and document system and service diagrams, procedures, and software build/install notes.

Deliver ORNL’s mission by aligning behaviors, priorities, and interactions with our core values of Impact, Integrity, Teamwork, Safety, and Service. Promote diversity, equity, inclusion, and accessibility by encouraging a respectful workplace – in how we treat one another, work together, and measure success.  

Basic Qualifications: 

  • A BS in computer science, information technology, science, engineering, business, or a related field of study and eight (8) to twelve (12) years of professional computing experience are required for consideration. A combination of equivalent experience and demonstrated superior performance in the field may also be considered.
  • Eight (8) or more years of proven experience managing research IT, including HPC.
  • Experience working with HPC hardware, software, and monitoring, including demonstrated experience delivering systems with InfiniBand, Slurm, Lustre, RDMA, Weka, and related technologies central to this team's work.
  • Management experience, including training and mentoring groups and individuals in sophisticated technical environments.
  • Experience leading technical projects.
  • Experience running and leading projects, ensuring they are delivered on time, within budget, and to the highest quality standards.
  • Experience maintaining as-built system documentation/baseline.
  • Effective written and verbal communication skills.
  • Demonstrated ability to lead (projects, teams, etc.).
  • Experience working with third parties and vendors to provide software, hardware, and service resources.

Preferred Qualifications: 

  • Excellent interpersonal skills suitable for communication with customers and management.
  • Experience architecting advanced computational environments/systems.
  • AI/ML experience.
  • Experience establishing operational/production and quality control standards, budget and cost controls, and obtaining information regarding types, quantities, specifications, and delivery dates of products ordered.
  • Experience procuring advanced computational systems.
  • Experience leading a team in the installation, startup, testing, and commissioning of advanced computational systems.
  • Experience coordinating activities with other functions of the organization and suppliers to obtain optimum production and utilization of human resources, machines, and equipment.
  • Experience creating and monitoring detailed WBS and associated cost plans.
  • Experience creating and managing annual preventive maintenance plan/schedule and annual disaster recovery plan.
  • Experience creating monthly research/science activity reports and quarterly research status reports.

Special Requirements:  

  • Visa sponsorship is not available for this position.  
  • This position requires the ability to acquire and maintain a clearance from the Department of Energy. As such, this position is a Workplace Substance Abuse Program (WSAP) testing designated position. WSAP positions require passing a pre-placement drug test and participation in an ongoing random drug testing program.

Benefits at ORNL:   

ORNL offers competitive pay and benefits programs to attract and retain dedicated people. The laboratory offers many employee benefits, including medical and retirement plans and flexible work hours, to help you and your family stay happy and healthy. Employee amenities such as on-site fitness, banking, and cafeteria facilities are also provided for convenience.

Other benefits include the following: Prescription Drug Plan, Dental Plan, Vision Plan, 401(k) Retirement Plan, Contributory Pension Plan, Life Insurance, Disability Benefits, Generous Vacation and Holidays, Parental Leave, Legal Insurance with Identity Theft Protection, Employee Assistance Plan, Flexible Spending Accounts, Health Savings Accounts, Wellness Programs, Educational Assistance, Relocation Assistance, and Employee Discounts.

In addition, we offer a flexible work environment that supports both the organization and the employee. A hybrid/onsite working arrangement may be available with this position. 

Having difficulty using the online application system or need an accommodation to apply due to a disability? Please email: ORNLRecruiting@ornl.gov or call 1.866.963.9545.

This position will remain open for a minimum of 5 days, after which it will close when a qualified candidate is identified and/or hired.

We accept Word (.doc, .docx), Adobe (unsecured .pdf), Rich Text Format (.rtf), and HTML (.htm, .html) files up to 5 MB in size. Resumes from third-party vendors will not be accepted; these resumes will be deleted and the candidates submitted will not be considered for employment.

ORNL is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply.  UT-Battelle is an E-Verify employer.

Nearest Major Market: Knoxville