Postdoctoral Research Associate, Data Readiness

Date: Jan 22, 2026

Location: Oak Ridge, TN, US, 37830

Company: Oak Ridge National Laboratory

Requisition Id 15815 

Overview:

The Workflows and Ecosystem Services (WES) group under the Advanced Technology Section (ATS) of the National Center for Computational Sciences (NCCS) is seeking a postdoctoral research associate to advance the state of scientific AI by addressing cross-cutting challenges in data readiness for AI to enable scalable, reproducible AI workflows on leadership-class systems. This position focuses on researching, designing, and deploying innovative data pipelines and readiness frameworks to tackle obstacles such as data heterogeneity, scalability bottlenecks, privacy compliance, reproducibility, and interoperability across scientific domains. By improving data readiness processes, this role will amplify the potential of AI-driven discovery in areas such as high energy physics, fusion research, life sciences, and materials science. Furthermore, these efforts to enhance data readiness for AI workflows may play a significant role in contributing to the goals of the 2025 Genesis Mission, which seeks to accelerate scientific discovery through the integration of AI-enabled solutions. 

NCCS operates the Frontier exascale supercomputer and world-class HPC infrastructure, giving you access to resources that enable impactful, facility-scale innovation. If you're passionate about creating solutions that empower AI at scale, we encourage you to apply and help shape the future of scientific AI.

Focus Areas:

  • Cross-Domain Interoperability: Develop common readiness templates, standardized metadata models, and APIs to enable seamless integration across diverse scientific domains.
  • Scalability of Preprocessing Pipelines: Design and implement automated, parallel preprocessing workflows capable of handling multi-petabyte datasets efficiently while reducing throughput bottlenecks.
  • Data Scarcity and Quality Dynamics: Investigate methods for addressing sparse labels, non-standard metadata, and imbalanced datasets to improve AI training robustness across scientific domains.
  • Privacy and Compliance Integration: Develop privacy-preserving preprocessing pipelines that operate under stringent regulations (such as HIPAA, CUI, ITAR) while maintaining scalability and secure sharing mechanisms such as federated learning.
  • Provenance and Reproducibility Frameworks: Build systems that enable detailed provenance tracking, schema validation, and auditable workflows to ensure trustworthy and reproducible AI practices.
  • Heterogeneous Data Integration: Address challenges in reconciling experimental, simulation, and observational datasets with varying resolutions, data fidelity, and sampling rates.
  • Intelligent Sampling for Federated Learning: Investigate frameworks (such as SICKLE) for intelligently sampling cross-facility, extreme-scale data to enhance federated learning workflows with platforms like APPFL and OmniFed.

Major Duties and Responsibilities:

  • Conduct and publish original research focused on data readiness methodologies and frameworks for scalable AI applications across fluid dynamics, fusion, materials, life sciences, and other strategic domains.
  • Investigate novel approaches for balancing efficient I/O, interoperability, and scientific validity in AI-ready datasets.
  • Design, prototype, and optimize preprocessing pipelines using HPC resources, targeting scalable execution and automation.
  • Collaborate with domain scientists to integrate pipelines into end-to-end AI workflows specific to scientific domains.
  • Publish research outcomes in peer-reviewed journals and conference venues, setting benchmarks and proposing methodologies for cross-disciplinary readiness challenges.
  • Aid in the development and adoption of open standards for scientific dataset processing, including contributing to open-source tools.
  • Mentor interns, students, and peers in cross-domain data readiness approaches.
  • Present findings at technical workshops, scientific meetings, and outreach events to raise awareness of the importance of data readiness for scientific AI.

Basic Qualifications:

  • Ph.D. in Computer Science, Data Science, Computational Science, a scientific domain relevant to AI (e.g., physics, biology, chemistry, climate), or a closely related field, earned within the last 5 years or near completion.
  • Demonstrated expertise in data preprocessing pipelines, AI-ready dataset design, or scientific workflows in HPC environments.
  • Proven experience with modern machine learning frameworks (e.g., PyTorch, TensorFlow), scalable I/O solutions (e.g., HDF5, ADIOS2), and distributed computing tools relevant to data preparation.
  • Evidence of ability to conduct independent research and publish in peer-reviewed venues.

Preferred Qualifications:

  • Hands-on experience prototyping and scaling data pipelines in HPC environments (Frontier-scale or similar).
  • Strong familiarity with domain-specific formats such as NetCDF, CSV/Parquet, FASTA/mmCIF, or graph-based encodings in materials and molecular AI.
  • Familiarity with frameworks for automated and reproducible workflows.
  • Knowledge of governing regulations around privacy (e.g., HIPAA, ITAR), including secure enclave architectures and federated learning approaches.
  • Background in developing reproducible pipelines with validation, provenance tracking, and schema consistency checks.
  • Publications in relevant conferences (e.g., NeurIPS, SC, AAAI, or domain-specific venues like Fusion Science or Computational Materials).
  • Collaborative mindset in team environments and across disciplines.

 

Special Requirements: 

Postdocs: Applicants cannot have received their Ph.D. more than five years prior to the date of application and must complete all degree requirements before starting their appointment. The appointment length will be up to 24 months with the potential for extension. Initial appointments and extensions are subject to performance and availability of funding.

 

Letters of Recommendation: 

Please submit three letters of reference when applying for this position. You may upload these directly to your application or have them sent to Postdocrecruitment@ornl.gov with the position title and number referenced in the subject line.

This position will remain open for a minimum of 5 days, after which it will close when a qualified candidate is identified and/or hired.

We accept Word (.doc, .docx), Adobe (unsecured .pdf), Rich Text Format (.rtf), and HTML (.htm, .html) files up to 5 MB in size. Resumes from third-party vendors will not be accepted; these resumes will be deleted and the candidates submitted will not be considered for employment.


ORNL is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. UT-Battelle is an E-Verify employer.


Nearest Major Market: Knoxville