Postdoctoral Research Associate - AI Technologies for HPC Operational Data Analytics

Date: Sep 8, 2021

Location: Oak Ridge, TN, US, 37830

Company: Oak Ridge National Laboratory

Requisition Id 6352 

Overview: 

Oak Ridge National Laboratory is the largest US Department of Energy science and energy laboratory, conducting basic and applied research to deliver transformative solutions to compelling problems in energy and security.

 

We are seeking a Postdoctoral Research Associate who will support the Analytics and AI Methods at Scale (AAIMS) group in the National Center for Computational Sciences (NCCS) Division at Oak Ridge National Laboratory (ORNL) in the area of applied AI technologies for HPC operational data analytics. The AAIMS group develops and deploys emerging data science and AI methods for scientific user programs at scale, and enables advanced facility innovation through operational data analytics.

 

Toward the vision of AI-powered, self-driven facility operations that achieve operational efficiency, the candidate will contribute to leadership-computing facility data analytics by exploring cutting-edge AI methodologies that leverage the vast amounts of real-time operational data streamed from state-of-the-art leadership-class computing systems. NCCS operates one of the world’s fastest high-performance computing systems and will host the first exascale system in 2022.

 

 

We are a leader in computational and computer science, with signature strengths in high-performance computing and data analytics with applications in a large variety of science domains. NCCS provides state-of-the-art computational and data science infrastructure coupled with dedicated technical and scientific professionals tackling large-scale problems across a broad range of scientific domains for accelerating scientific discovery and engineering advances. NCCS hosts the Oak Ridge Leadership Computing Facility, one of DOE’s National User Facilities.

 

We are an inclusive environment that welcomes a diversity of creative scientists and engineers with an eagerness to learn and innovate.

 

Major Duties/Responsibilities: 

NCCS collects operational data at very high velocity and volume from various sources, such as the HPC facility, storage backends, networks, parallel file systems, and applications and jobs. This data will help drive data-centric optimizations on HPC production systems, as well as decision-making for future HPC system acquisitions. Prospective candidates are expected to leverage their expertise in data analytics, such as machine learning and data mining, to help shape the framework and process, conduct in-depth analysis, and make significant contributions to the end-to-end data intelligence initiative in the organization.

 

Major areas of activity include, but are not limited to:

  • Profiling and analysis of data-intensive machine learning application workloads at extreme scale
  • Control parameter optimization for various HPC facility operations such as cooling, data placement, and resource scheduling
  • Analysis and prediction of various failure events found in the many hardware and software components deployed at scale
  • Development of efficient algorithms and practical techniques to improve data efficiency on both existing and future systems
  • Authoring of peer-reviewed papers, technical papers, and reports

 

Basic Qualifications:

  • A PhD in Computer Science, Applied Mathematics, or a related field completed within the last 5 years

 

Preferred Qualifications:

  • Experience with machine learning and deep learning frameworks such as TensorFlow or PyTorch
  • Experience in time-series modeling and pattern recognition
  • Experience in designing and maintaining automated data pipelines & workflows
  • Experience with distributed data/computing tools such as Spark, Dask, MySQL, Kafka, and Elasticsearch
  • Experience working on HPC across multiple CPUs and GPUs
  • Experience visualizing and presenting data for stakeholders using tools such as Business Objects, D3, or matplotlib
  • Strong motivation to perform and publish leading-edge research
  • Excellent written and oral communication skills
  • Motivated self-starter with the ability to work independently and to participate creatively in collaborative teams across the laboratory 
  • Ability to function well in a fast-paced research environment, set priorities to accomplish multiple tasks within deadlines, and adapt to ever-changing needs

 

Applicants cannot have received their Ph.D. more than five years prior to the date of application and must complete all degree requirements before starting their appointment. The appointment length will be for up to 24 months with the potential for extension. Initial appointments and extensions are subject to performance and the availability of funding.

 

This position will remain open for a minimum of 5 days, after which it will close when a qualified candidate is identified and/or hired.

We accept Word (.doc, .docx), Adobe (unsecured .pdf), Rich Text Format (.rtf), and HTML (.htm, .html) files up to 5MB in size. Resumes from third-party vendors will not be accepted; these resumes will be deleted and the candidates submitted will not be considered for employment.


If you have trouble applying for a position, please email ORNLRecruiting@ornl.gov.


ORNL is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply.  UT-Battelle is an E-Verify employer.


Nearest Major Market: Knoxville