AI Research Computing Infrastructure Engineer

Company: Frederick National Laboratory for Cancer Research
Location: Frederick
Posted on: February 22, 2026

Job Description:

Job ID: req4426
Employee Type: exempt full-time
Division: Enterprise Information Technology
Facility: Frederick: Ft Detrick
Location: PO Box B, Frederick, MD 21702 USA

The Frederick National Laboratory is operated by Leidos Biomedical Research, Inc. The lab addresses some of the most urgent and intractable problems in the biomedical sciences in cancer and AIDS, drug development and first-in-human clinical trials, applications of nanotechnology in medicine, and rapid response to emerging threats of infectious diseases. Accountability, Compassion, Collaboration, Dedication, Integrity and Versatility; it's the FNL way.

PROGRAM DESCRIPTION

The mission of Enterprise Information Technology (EIT) is to develop an enterprise-level, consolidated information technology infrastructure that provides exceptional IT capabilities to the Frederick National Labs for Cancer Research (NCI-Frederick/FNLCR) in support of basic, translational, and clinical cancer and AIDS research.

The IT Operations Group (ITOG) is a part of Enterprise Information Technology (EIT) within Leidos Biomedical Research, Inc. ITOG is responsible for computational servers, storage servers, virtual machine infrastructure, and the FNLCR network. ITOG focuses on implementing enterprise IT best practices in the areas of computational services, storage, backup, and archiving; batch and application support; server consolidation and virtualization; network infrastructure; unification of voice, teleconferencing, and video communication technologies; and improved infrastructure for collocation of dedicated servers.

KEY ROLES/RESPONSIBILITIES:

The Research Computing Infrastructure Engineer will design, build, and operate next-generation high-performance computing (HPC) environments that support container-based workflows and GPU-accelerated research computing.
The position will play a key role in evaluating, implementing, and maintaining scalable and secure computing architectures for advanced data analysis, AI/ML model training, and simulation workloads. The engineer will collaborate closely with researchers, IT professionals, and external partners to translate scientific requirements into reliable, high-performance computing solutions.

Design and implement next-generation high-performance computing (HPC) environments that leverage container-driven workflows for GPU-accelerated research.
Build and maintain container orchestration systems for batch and distributed workloads.
Integrate containerized job workflows with existing HPC schedulers and storage systems.
Develop and maintain job templates for batch GPU training and multi-node distributed computing.
Automate deployment, configuration, and scaling through infrastructure-as-code and CI/CD practices.
Monitor, benchmark, and optimize system performance, reliability, and resource utilization.
Collaborate with researchers to containerize and optimize legacy workflows for scalable execution.
Lead evaluation of emerging tools (e.g., Prefect, Ray, Airflow, Dagster) for workflow orchestration and distributed computing.
Contribute to the development of tools and bridges between orchestration frameworks and traditional HPC environments.

BASIC QUALIFICATIONS

To be considered for this position, you must minimally meet the knowledge, skills, and abilities listed below:

Possession of Bachelor’s degree from an accredited college/university according to the Council for Higher Education Accreditation (CHEA) or four (4) years relevant experience in lieu of degree. Foreign degrees must be evaluated for U.S. equivalency.
In addition to the education requirement, a minimum of eight (8) years of related experience.
Strong Linux systems engineering and administration experience.
Hands-on experience with container orchestration tools such as Kubernetes, Nomad, Run:AI, etc.
Hands-on scripting/programming experience (Python, Bash, or Go) for automation, monitoring, and job orchestration.
Experience with infrastructure-as-code / automation tooling (Terraform, Ansible, Packer, or equivalent).
Familiarity with system performance analysis, monitoring, and tuning.
Comfortable with small-team environments and taking end-to-end ownership of compute infrastructure.
Ability to obtain and maintain a security clearance.

PREFERRED QUALIFICATIONS

Candidates with these desired skills will be given preferential consideration:

Experience with multi-node distributed ML frameworks (PyTorch DDP, Ray, Horovod, TensorFlow, etc.).
Familiarity with pipeline orchestration tools (Prefect, Airflow, Dagster, Kubeflow).
Understanding of resource management and scheduling concepts (queues, allocations, GPU device plugins, gang scheduling, multi-node coordination).
Understanding of storage integration with high-performance clusters (POSIX object storage, VAST or similar).
Familiarity with cloud GPU environments (AWS, GCP, Azure) and hybrid workflows.
Familiarity with workflow orchestration/pipeline tools (Argo, Kubeflow, Ray, MLFlow).
Good communication and documentation skills, including the ability to make complex infrastructure understandable to researchers and other engineers.

EXPECTED COMPETENCIES:

Expertise in Kubernetes, Nomad, or equivalent container orchestration systems for large-scale computing.
Deep knowledge of Linux systems administration, performance tuning, and automation.
Ability to translate research computing needs into scalable, reliable infrastructure designs.
Commitment to documentation, reproducibility, and open science principles.
Collaborative mindset and willingness to mentor peers in containerization and HPC best practices.

Commitment to Non-Discrimination

All qualified applicants will receive consideration for employment without regard to sex, race, ethnicity, color, age, national origin, citizenship, religion, physical or mental disability, medical condition, genetic information, pregnancy, family structure, marital status, ancestry, domestic partner status, sexual orientation, gender identity or expression, veteran or military status, or any other basis prohibited by law. Leidos will also consider for employment qualified applicants with criminal histories consistent with relevant laws.

Pay and Benefits

Pay and benefits are fundamental to any career decision. That's why we craft compensation packages that reflect the importance of the work we do for our customers. Employment benefits include competitive compensation, Health and Wellness programs, Income Protection, Paid Leave and Retirement.

123,800.00 - 207,125.00 USD

The posted pay range for this job is a general guideline and not a guarantee of compensation or salary. Additional factors considered in extending an offer include, but are not limited to, responsibilities of the job, education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data. The salary range posted is a full-time equivalent salary and will vary depending on scheduled hours for part-time positions.


