Senior Service Engineering Manager

Last updated 4 days ago
Location: Redmond, Washington
Job Type: Full Time

Microsoft is on a mission to empower every person and every organization on the planet to achieve more. Our culture is centered on embracing a growth mindset, a theme of inspiring excellence, and encouraging teams and leaders to bring their best each day. In doing so, we create life-changing innovations that impact billions of lives around the world. You can help us achieve our mission.

Cloud Operations + Innovation (CO+I) is the engine that powers Microsoft’s core cloud platforms and services that millions of people use every day. With more than 95% of Fortune 500 businesses on Azure, 180 million using Office 365, and millions using other services – all running on Microsoft's cloud infrastructure – CO+I builds and operates the foundation upon which Microsoft’s mission to empower every person and organization comes to life.

As a CO+I Incident Manager, you are central to our efforts to ensure our customers have the best possible service experience. In this role, you orchestrate global incident recovery with the goal of minimizing downtime and mitigating the impact of these events on our customers. You are responsible for triaging complex issues ranging from physical data center issues to server and network failures. You will drive the efforts of multiple Microsoft teams to bring about safe and rapid mitigation of incidents that impact customers on a global scale. This requires composure under pressure; broad technical, analytical, and problem-solving expertise; the ability to collaborate confidently with varied partners; and great written and spoken communication. Your work will have you interacting across the globe with Microsoft engineers from all disciplines, including Electrical, Mechanical, Network, Software, and Hardware.

Responsibilities

- “Customer Obsessed” – Have a mindset that is focused first on the customer and how to use technology to make their experience safe, reliable, performant, etc.
- Perform incident triage, including determining scope, urgency, and potential impact, identifying the specific vulnerability, and making recommendations that enable swift remediation.
- Orchestrate the advanced troubleshooting and mitigation efforts of multiple engineering teams engaged in service restoration at the datacenter(s), ensuring safe execution with minimal disruption to the customer and business.
- Drive deep-dive post-incident analysis of customer-impacting incidents, focusing on reducing the impact or likelihood of similar future events.
- Proactively communicate with Microsoft executive leadership, managers, engineering groups, and key stakeholders on active major incidents or crises.
- Participate in recovery implementation & testing exercises using scenario-based use cases to drive potential impact awareness and remediation.
- Identify opportunities and take ownership for automation and/or continuous improvement of the Incident Management process and best practices.
- Capture and record all incident timelines, data, and restoration efforts for handoff to the Problem Management and Forensic Engineering teams.
- Record, coordinate, and report on progress of ‘Repair Item’ output from Post Incident Reviews and RCA exercises.
- Provide feedback and drive improvements to current tools and processes, routing initiatives to the appropriate group for proactive design changes and implementation, or for business risk assessment of incident causal factors.
- Develop and deliver Post-Mortem reports for distribution to MS executive audience(s).
- Identify, explore, and ultimately drive cross-team efforts to proactively resolve issues that could impact our customers.

Qualifications

Required:

- 5-7 years of experience with incident management.
- Must be able to participate in an on-call rotation including weekends.
- Experience with incident/outage and crisis management.

Preferred:

- BS/BA in Electrical or Mechanical Engineering, Computer Science, Telecommunications, or equivalent education, or five (5) years of equivalent work experience.
- Demonstrated ability to think strategically and creatively using sound business judgment.
- Demonstrated quantitative skills to resolve ambiguous problems and drive to root cause.
- Direct experience with business continuity.
- Demonstrated ability to set priorities, pursue multiple threads at the same time, accurately reflect the current state, and drive towards the desired state.
- Ability to maintain calm during stressful situations; demonstrated leadership skills in fast-paced, highly dynamic situations.
- Strong collaboration skills: working across teams and organizations is necessary to be successful.
- Excellent written and verbal communication skills.
- Working knowledge and understanding of data center systems such as power, cooling, and networking is a plus.
- ITIL Foundations certification.
- PMP certification preferred; excellent project/program management skills with great attention to detail.
- Demonstrated technical excellence by applying engineering principles to solve complex problems.
- Proficiency in the use of Microsoft Power BI.

Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.