Research Scientist, HPC Workflows
Date: Mar 4, 2026
Location: Oak Ridge, TN, US, 37830
Company: Oak Ridge National Laboratory
Requisition Id 16060
Overview:
Oak Ridge National Laboratory (ORNL), home to some of the world’s most powerful supercomputers, is seeking a Research Scientist in HPC Workflows to design, orchestrate, and maintain computational workflows that enable reproducible, scalable science on leadership-class systems. You will collaborate with researchers across diverse domains to translate scientific objectives into robust pipelines, automate job orchestration and data movement, and optimize end-to-end workflow performance on large-scale Linux-based HPC environments and hybrid cloud/HPC platforms.
Job Duties and Responsibilities May Include:
- Conduct research on HPC workflows in support of the mission of the National Center for Computational Sciences.
- Workflow Design and Orchestration: Architect, implement, and maintain HPC workflows and pipelines that leverage job schedulers (e.g., SLURM, PBS) with job dependencies, arrays, and resource-aware templates. Establish reproducible execution patterns, including environment setup, module management, data staging, and cleanup.
- Scripting and Tooling: Develop command-line tools and automation in Python, Bash, and/or C/C++ to encapsulate workflow steps, manage configuration files (e.g., YAML/JSON), and implement robust logging, error handling, and checkpoint/retry strategies.
- Operational Reliability and Optimization: Diagnose job failures, mitigate bottlenecks, and improve throughput, latency, and resource utilization. Use scheduler and Linux tools (e.g., sacct, squeue, coreutils, ssh, tmux, top, iostat) to monitor, analyze, and tune workflows.
- Automation and Version Control: Implement CI/CD practices for workflow deployment, create templates and reusable libraries, and manage changes with Git. Automate environment provisioning and repeatable execution across systems and users.
- Collaboration and User Enablement: Consult with researchers to understand requirements, translate them into executable workflows, and provide documentation, training, and examples. Partner with operations teams to align workflows with policies and best practices.
- Observability and Reporting: Build simple status dashboards or reports for workflow health and progress. Aggregate job metrics, queue statistics, and resource usage to inform planning and continuous improvement.
- Security and Compliance: Apply basic cyber-security principles (e.g., SSH key hygiene, least privilege, firewall rules) to workflow design and operations. Handle credentials and secrets responsibly.
- Documentation and Support: Author clear, user-focused documentation and contribute to playbooks, runbooks, and knowledge bases. Participate in an on-call rotation for critical workflows as needed.
- Cloud and Hybrid HPC Integration: Design and operate workflows on public cloud platforms (AWS, Azure, or GCP) and in hybrid on-prem/cloud environments. Leverage cloud object storage (e.g., Amazon S3) for data staging and artifacts; implement parallel, secure data movement and lifecycle policies.
Basic Qualifications:
- Ph.D. in Computer Science, Computer Engineering, Computational Engineering, or a closely related field.
- At least 2 years of experience working with Linux-based systems, including familiarity with core utilities and tools such as coreutils, ssh, and tmux, as well as common system services.
- At least 1 year of programming experience in one or more of Python, C/C++, or Bash.
- Strong verbal and written communication skills, with the ability to collaborate across technical and scientific teams.
Preferred Qualifications:
- Demonstrated experience in leading scientific research and publishing in high-impact venues.
- Experience with HPC job schedulers such as SLURM or PBS.
- Familiarity with basic cyber-security principles (e.g., firewalls, network segmentation, secure configuration).
- Basic web development skills (e.g., HTML, CSS) for lightweight dashboards or documentation.
Security, Credentialing, and Eligibility Requirements:
For employment at Oak Ridge National Laboratory (ORNL), a Real ID compliant form of identification will be required. Additionally, ORNL is subject to Department of Energy (DOE) access restrictions. All employees must also be able to obtain and maintain a federal Personal Identity Verification (PIV) card as mandated by Homeland Security Presidential Directive 12 (HSPD-12) and DOE Order 473.1A, which requires a favorable post-employment background investigation.
To obtain this credential, new employees must successfully complete a Federal Tier 1 background investigation. This investigation includes a declaration of illegal drug activities, including use, supply, possession, or manufacture within the last year. This includes marijuana and cannabis derivatives, which remain illegal under federal law regardless of state laws.
For foreign national candidates:
If you have not resided in the U.S. for three consecutive years, you are not eligible for the PIV credential and instead will need to obtain a favorable Local Site Specific Only (LSSO) risk determination to maintain employment. Once you meet the three-year residency requirement, you will be required to obtain a PIV credential to maintain employment.
Remote/Hybrid Eligibility:
This position is eligible for hybrid (onsite + remote) or fully remote work within approved U.S. states, subject to business needs, access/security requirements, and approvals. We offer a flexible work environment that supports both the organization and the employee. A hybrid arrangement provides the flexibility to work periodically from home while reporting onsite to the Oak Ridge, Tennessee, location on a regular weekly basis.
About ORNL:
As a U.S. Department of Energy (DOE) Office of Science national laboratory, ORNL has an impressive 80-year legacy of addressing the nation’s most pressing challenges. Our team is made up of over 7,000 dedicated and innovative individuals! Our goal is to create an environment where a variety of perspectives and backgrounds are valued, ensuring ORNL is known as a top choice for employment. These principles are essential for supporting our broader mission to drive scientific breakthroughs and translate them into solutions for energy, environmental, and security challenges facing the nation.
Why Join Us:
- Work on the world’s most powerful supercomputers, including Frontier, the first system to achieve exascale performance.
- Enable breakthrough science in fields like fusion energy, climate modeling, AI, and national security.
- Collaborate with diverse teams of scientists, engineers, and technologists from across the DOE complex and academia.
- Grow your career in a mission-driven, innovation-focused environment with access to professional development and leadership opportunities.
- Enjoy life in East Tennessee, with a thriving research community, scenic outdoor recreation, and a high quality of life.
This position will remain open for a minimum of 5 days, after which it will close when a qualified candidate is identified and/or hired.
We accept Word (.doc, .docx), Adobe (unsecured .pdf), Rich Text Format (.rtf), and HTML (.htm, .html) files up to 5 MB in size. Resumes from third-party vendors will not be accepted; these resumes will be deleted, and the candidates submitted will not be considered for employment.
ORNL is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. UT-Battelle is an E-Verify employer.
Nearest Major Market: Knoxville