Team Lead for HPC Engineering (Hybrid Eligible)
Date: Nov 15, 2024
Location: Oak Ridge, TN, US, 37830
Company: Oak Ridge National Laboratory
Requisition ID: 13570
Due to the security clearance requirements of this position, we can only consider US citizens.
Overview:
We're hiring a Team Lead for HPC Engineering to focus on growing and managing our team and to provide oversight of technical projects and operations in direct support of our customers and projects! This position resides in the Emerging Technologies and Computing (ETAC) group in the Research Computing Support (RCS) division of the Information Technology Services Directorate at Oak Ridge National Laboratory (ORNL).
What will you be doing?
ETAC focuses on supporting researchers' needs in HPC, data engineering and management, Infrastructure as a Service, and new technologies. Staff charge directly to projects or programs rather than to a centralized budget. Our goal is to enable research while protecting ORNL assets, especially from cyber incidents. Team Leads manage staff and assign them to projects across many directorates; RCS staff who support those directorates report directly to you. You are also expected to spend a small amount of your time on research projects. Routine meetings with customers (those providing funding) and directorate management are needed to ensure IT needs are met and to manage workload as projects ramp up and down.
Additionally, you'll plan and manage updates, upgrades, and operations and maintenance (O&M) services for HPC/AI environments that give research staff and their collaborators access to critically important computational research platforms and state-of-the-art post-processing, modeling, simulation, and visualization capabilities. You will also be responsible for scientific and research SW, storage systems, connections to other research systems, and user support.
Major Duties/Responsibilities:
Staff Management:
- Hire, develop, and coach staff to ensure project and team success in areas including, but not limited to, scheduling preventive maintenance, patching and upgrading research SW and systems, planning and implementing HW/SW refreshes and upgrades, and ensuring systems are prepared for disaster recovery if necessary.
- Develop and lead training plans and budgets, and ensure assigned employees are aware of and follow ORNL and any other applicable policies, procedures, and regulations.
- Manage staff performance, including creating yearly performance and development plans and conducting year-end assessments.
- Meet with staff regularly to provide technical expertise and guidance, ensure they have the resources and support they need, stay informed on current projects, and help remove roadblocks.
- Provide project management for both informal and formal projects, ensuring on-time, on-scope, and on-budget delivery.
Leadership and Collaboration:
- Create and foster partnerships with HPC researchers at ORNL to encourage best-in-breed delivery of services while ensuring that compliance and governance control protocols are followed.
- Leverage technical background to recommend, design, and develop HPC research products.
- Review operating metrics and direct the resolution of operational and maintenance problems to prevent operational delays.
- Keep customers and directorate management informed of major IT initiatives, identifying how they might affect directorate operations.
- Provide guidance in the development of processes, including personnel requirements, material needs, subcontract requirements, and equipment needs.
Team Leads in RCS contribute to the business continuity of the Division, Group, and Team by cross-training with other Team Leads and individual contributors. Cross-training activities could include any of the following:
- Design, install, configure, secure, and maintain HPC systems and related application software in support of project needs.
- Provide HPC architecture design and advise on system selection.
- Ensure the secure and effective operation of HPC systems through compliance with ORNL procedures and IT Internal Operating Procedures.
- Collaborate with other HPC systems engineers and vendors to resolve hardware and software issues.
- Develop and document system and service diagrams, procedures, and software build/install notes.
Deliver ORNL’s mission by aligning behaviors, priorities, and interactions with our core values of Impact, Integrity, Teamwork, Safety, and Service. Promote diversity, equity, inclusion, and accessibility by encouraging a respectful workplace in how we treat one another, work together, and measure success.
Basic Qualifications:
- A BS in computer science, information technology, science, engineering, business, or a related field of study and eight (8) to twelve (12) years of professional computing experience are required for consideration. Equivalent experience combined with demonstrated superior performance in the field may also be considered.
- Seven (7) years of management experience, including mentoring groups and individuals in technical environments.
- Seven (7) years of experience working with HPC hardware, software, and monitoring (for example, InfiniBand, Slurm, Lustre, RDMA, Weka, and related technologies).
- Experience successfully running and leading projects to the highest quality standards.
- Experience maintaining as-built system documentation/baseline.
- Effective written and verbal communication skills.
- Experience working with third parties and vendors to provide SW, HW, and service resources.
Preferred Qualifications:
- Excellent communication skills
- Experience architecting advanced computational environments and systems.
- Experience leading a team in installation, startup, testing, and commissioning advanced computational systems.
- Experience establishing operational/production standards and budget and cost controls, and creating and monitoring detailed work breakdown structures (WBS) and associated cost plans.
Special Requirements:
- Visa sponsorship is not available for this position.
- This position requires the ability to acquire and maintain a clearance from the Department of Energy. As such, this position is designated for Workplace Substance Abuse Program (WSAP) testing. WSAP positions require passing a pre-placement drug test and participating in an ongoing random drug testing program.
Benefits at ORNL:
ORNL offers competitive pay and benefits programs to attract and retain dedicated people. The laboratory offers many employee benefits, including medical and retirement plans and flexible work hours, to help you and your family stay happy and healthy. Employee amenities such as on-site fitness, banking, and cafeteria facilities are also provided for convenience.
Other benefits include the following: Prescription Drug Plan, Dental Plan, Vision Plan, 401(k) Retirement Plan, Contributory Pension Plan, Life Insurance, Disability Benefits, Generous Vacation and Holidays, Parental Leave, Legal Insurance with Identity Theft Protection, Employee Assistance Plan, Flexible Spending Accounts, Health Savings Accounts, Wellness Programs, Educational Assistance, Relocation Assistance, and Employee Discounts.
In addition, we offer a flexible work environment that supports both the organization and the employee. A hybrid/onsite working arrangement may be available with this position.
Having difficulty using the online application system or need an accommodation to apply due to a disability? Please email: ORNLRecruiting@ornl.gov or call 1.866.963.9545.
This position will remain open for a minimum of 5 days, after which it will close once a qualified candidate is identified and/or hired.
We accept Word (.doc, .docx), Adobe (unsecured .pdf), Rich Text Format (.rtf), and HTML (.htm, .html) files up to 5MB in size. Resumes from third-party vendors will not be accepted; these resumes will be deleted, and the candidates submitted will not be considered for employment.
ORNL is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. UT-Battelle is an E-Verify employer.
Nearest Major Market: Knoxville