Site Reliability Engineer II - CTJ - Top Secret

Location: Reston
Posted on: June 23, 2025

Job Description:

Do you have a passion for high-scale services and working with some of Microsoft’s most critical customers? We’re looking for a Site Reliability Engineer II with the right mix of software development, online services experience, and passion for quality to envision, design, and deliver Office 365 government cloud service offerings.

Office 365 is at the center of Microsoft’s cloud first, devices first strategy as it brings together cloud versions of our most trusted communication and collaboration products like Exchange, SharePoint, and Teams with our cross-platform desktop suites and mobile apps. The Office 365 Enterprise Cloud team works with Microsoft’s largest enterprise and government customers to deliver features that meet their specific needs and enable cloud adoption. As you would expect, our customers have the highest expectations for feature quality, security, reliability, availability, and performance.

The Site Reliability Engineering (SRE) team provides leadership, direction, and accountability for application architecture, system design, and end-to-end implementation. As a Site Reliability Engineer, you will identify and deliver software improvements using your expertise in software development, complexity analysis, and scalable system design. Collaboration skills will be required to work closely with other engineering teams to ensure services and systems are highly stable and performant, meeting the expectations of our government customers and users.

At Microsoft, we can offer you great teams, exciting challenges, and a fun place to work. The work environment empowers you to have a positive impact on millions of end users.

The right candidate for this job (is):
Passionate about distributed systems and working with highly scalable services.
Enjoys new technological challenges and is motivated to solve them.
Excited about making better software and continuously improving the development, integration, and deployment processes.
Smart, highly motivated self-starter who thrives in a bottom-up, fast-paced, highly technical environment.
Effective collaborator, experienced in creating technical partnerships across teams.
Unwavering passion for meeting customer demands and delivering a dial tone service.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Qualifications:

Required/Minimum Qualifications:
Master's Degree in Computer Science, Information Technology, or related field OR
Bachelor's Degree in Computer Science, Information Technology, or related field AND 1 year of technical experience in software engineering, network engineering, or systems administration OR
4 years of technical experience in software engineering, network engineering, or systems administration

Other Requirements:

Security Clearance Requirements: Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role. These requirements include, but are not limited to, the following specialized security screenings: Candidates must have an active Top Secret clearance and be willing to upgrade to TS/SCI (with polygraph). This role will require candidates to maintain the TS/SCI (with polygraph) clearance. Failure to maintain or obtain the appropriate clearance and/or customer screening requirements may result in employment action up to and including termination.

Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Clearance Verification: This position requires successful verification of the stated security clearance to meet federal government customer requirements. You will be asked to provide clearance verification information prior to an offer of employment.

Citizenship & Citizenship Verification: This position requires verification of U.S. citizenship due to citizenship-based legal restrictions. Specifically, this position supports United States federal, state, and/or local government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law. To meet this legal requirement, citizenship will be verified via a valid passport, other approved documents, or a verified U.S. government clearance.

Preferred/Additional Qualifications:
Master's Degree in Computer Science, Information Technology, or related field AND 1 year of technical experience in software engineering, network engineering, or systems administration OR
Bachelor's Degree in Computer Science, Information Technology, or related field AND 2 years of technical experience in software engineering, network engineering, or systems administration OR
5 years of technical experience in software engineering, network engineering, or systems administration

Site Reliability Engineering IC3 - The typical base pay range for this role across the U.S. is USD $100,600 - $199,000 per year. There is a different range applicable to specific work locations within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $131,400 - $215,400 per year.

Microsoft will accept applications for the role until June 26, 2025.

Responsibilities:

Technical Knowledge and Domain-Specific Expertise:
Demonstrates expertise in distributed systems design, interactions between cloud technology layers and components, common dependencies at scale, and the code that defines infrastructures.
Can identify and recommend optimal configurations of cloud technology solutions and modify the code base that defines systems or cloud technologies to improve the reliability and operability of supported products with minimal guidance from other engineers.
Develops an understanding of the code, features, and operations of specific products at scale as required to contribute to incremental improvements in product availability, reliability, efficiency, observability, and/or performance; participates in on-boarding, code/design reviews, and regular meetings with the engineering teams that develop and/or manage those products.
Researches and maintains an awareness of industry trends, advances in distributed systems and cloud technologies, new tools, and/or processes for maintaining and improving product availability, reliability, efficiency, observability, and/or performance.
Contributes to the implementation of new solutions within their team by identifying ways they can be applied to solve persistent problems.

Contributions to Development and Design:
Leverages technical expertise in large-scale distributed systems and specific products, as well as objective insights drawn from analyses of production telemetry data, to suggest changes or add-ons to product features or code to improve the availability, reliability, efficiency, observability, and performance of product components or features supported by their team.
Develops and tests basic changes to optimize code and improve the observability, reliability, and operability of a defined range of platform, system, or product components or features with direction from other engineers.
Engages with product engineering teams by participating in code/design reviews, regular meetings, on-call rotations, and incident responses throughout product development and operations cycles; leverages technical expertise on underlying systems/platforms and insights drawn from engagements with product engineering teams and telemetry analyses to propose potential improvements in code base and designs across components and features of one or more products.

Driving Operational Excellence:
Independently develops code or scripts that automate the performance of repetitive and easily scalable operations processes (e.g., monitoring, alerting, deploying products and updates) across components and features of products operating at scale.
Leverages technical expertise and telemetry analysis across a range of components and/or features to identify patterns and opportunities to implement configuration and data changes for one or more platforms, systems, or products in production using code, tooling, and automation.
Identifies opportunities to leverage existing tools and automation to enable product engineering teams to increase the velocity at which they can reliably and safely implement changes in production; monitors the effects of changes across multiple components or features within a single platform or system.
Designs, develops, and maintains telemetry pipelines and monitoring tools that detail operations metrics (e.g., availability, reliability, performance, efficiency) of product components and features operating at scale.
Independently performs analyses using existing tools and/or models to identify insights and shares them with product engineering teams to directly contribute to improvements in product development and/or operations; monitors the impact of changes on operations metrics (e.g., Time-to-X).
Independently uses existing tools and/or models to troubleshoot problems or flaws affecting the availability, reliability, performance, and/or efficiency of components and features; proposes solutions that will resolve and prevent recurring issues and brings them to the attention of their Site Reliability Engineering (SRE) and/or product engineering teams.
Responds to incidents during regular on-call rotations by identifying the level of impact, troubleshooting issues, and deploying appropriate fixes to resolve root cause(s); alerts product teams and owners to major customer-impacting issues and escalates resolution of highly impactful issues affecting multiple components or features to other engineers or engineering teams as needed.
Shares details related to incidents and their resolution through post-mortem reports and during regular review meetings.
Develops alerts and instrumentation across components and features to monitor product capacity and resource demands and analyze telemetry data using existing capacity planning models; draws insights from analyses of capacity and resource data to optimize component and feature code to manage resources and capacity across a limited range of use conditions and system parameters.
Utilizes insights from performance and resource monitoring tools to identify whether there is a need to optimize the efficiency of component and feature code, or if changes to compute resources are required; models the predicted effect of changes to code and/or compute resources across components or features to document the efficacy of proposed solutions.
Shares insights and best practices that can be applied to improve development and operations of system, platform, or product components and features by participating in code/design reviews, incident drills and debriefs, and regular meetings, as well as interactions with more experienced SREs and members of product engineering teams.

Additional Duties:
Design, develop, and deliver the required software engineering to serve and protect O365 government clouds.
Own deployment, availability, reliability, performance, and customer escalation targets for sovereign environments.
Proactively identify and reduce issues through design, testing, and implementation of software-based solutions.
Collaborate with Engineering and Program Management partners to translate customer, business, and technical requirements into architectural designs and feature releases.
Drive efficiencies through software improvement and root cause analysis resulting in service delivery, maturity, and scalability.
Work within a highly skilled team of engineers to deliver revolutionary improvements to the cloud and scale them.

Other:
Embody our culture and values.

Keywords: Arlington, Site Reliability Engineer II - CTJ - Top Secret, IT / Software / Systems, Reston, Virginia