Senior DevOps / Site Reliability Engineer

Last updated 4 days ago
Location: New York, New York
Job Type: Full Time

Senior DevOps Engineer, PromoteIQ

Microsoft Advertising

PromoteIQ provides intelligent vendor marketing solutions for the next generation of e-commerce. Our platform helps retailers implement, automate, and scale their brand-funded digital vendor marketing programs. We sit at the intersection of marketing and e-commerce and have a singular mission of empowering retailers and brands to maximize their e-commerce performance.

PromoteIQ embodies a strong startup culture that values diversity, collaboration and craftsmanship - and above all else, results. Our bias towards execution balances critical thinking, root cause analysis and pragmatic problem solving. We expect a lot from one another and value our thoughtful and intellectually curious company culture.

PromoteIQ is headquartered in New York City and supports a global footprint of e-commerce retailers and brands. The company was acquired by Microsoft in August 2019 and continues to operate as an independent division within Microsoft Advertising. Learn more at https://www.promoteiq.com. This role is based in our SoHo/NYC office.

Microsoft Advertising is a worldwide Sales, Marketing and Services organization on the cutting edge of the digital advertising industry. Microsoft Advertising offers a compelling portfolio of advertising products, innovative solutions and the opportunity to engage with some of the brightest minds in the digital industry. Microsoft Advertising is the destination for experienced, collaborative, and passionate digital advertising professionals seeking a rewarding career and lifestyle.

Who We’re Looking For

At PromoteIQ, this Senior DevOps / Site Reliability Engineer will specialize in developing scalable methods for building, deploying, and supporting our cloud-agnostic enterprise services and systems. This is a highly collaborative role in which you will work closely with our Software Engineers to deploy and operate our solutions; automate and streamline our processes; build and maintain deployment tools; monitor IT operations; and troubleshoot and resolve issues in our dev, test, and production environments.

Responsibilities

  • Design and build infrastructure and systems that provide high levels of scalability, reliability, and performance for PromoteIQ’s stack, while balancing security, maintainability, and operational excellence
  • Interface across teams to codify and reliably test infrastructure changes using PromoteIQ’s software development lifecycle
  • Partner with Product and Dev teams to provide guidance and best practices around scalability, reliability, and performance of our production systems, infrastructure, and software
  • Work as a team on escalations, resolving critical issues that impact our highly available dev, test, and production systems
  • Work with a creative engineering team to continuously implement and improve reliable and speedy build environments for DEV & QA; provide timely build status updates; automate as much as possible to improve efficiency and quality
  • Promote innovation, implementation of cutting-edge technologies, outside-of-the-box thinking, teamwork, and self-organization
  • Work with GitHub Actions or other build tools in a CI/CD process to build and deploy to our cloud-agnostic environment
  • Ensure traceability, observability, and retrievability of system behavior
  • Build logging, monitoring, and alerting systems to identify bottlenecks and assist with debugging, analysis, and optimization in a cloud-agnostic environment
  • Improve operational efficiency through automation and deployment or development of new tools
  • Experiment with and recommend new technologies that simplify or improve PromoteIQ's stack
  • Craft solid and clearly explained designs, playbooks, and documentation for consumption by teammates and the larger engineering organization
  • Participate in an off-hours on-call rotation, and perform periodic off-hours work during maintenance windows

Qualifications

  • BS/MS/PhD degree in Computer Science or equivalent related experience and strong theoretical fundamentals (data structures, algorithms, time and space complexity, lock-free data structures, multi-threaded architectures, etc.)
  • 6+ years of experience in cloud SRE/infrastructure or a related field
  • 5+ years configuring and managing cloud infrastructure (AWS, GCP, Azure)
  • 2+ years working with cloud-agnostic configuration management frameworks (Ansible, Terraform, etc.)
  • 2+ years of experience with queueing systems such as Kafka, RabbitMQ, SQS, etc.
  • 2+ years working with containerization technologies (Docker, Kubernetes, etc.)
  • 1+ years of experience managing system observability tooling (Zabbix, CloudWatch, PagerDuty, Datadog, Azure Monitor, SignalFx, Grafana, etc.)
  • Understanding of SSH, VPN, TCP/IP, DNS, HTTP(S), network routing, and subnetting
  • Experience with an always-on and high-volume web server stack (Nginx, HAProxy, Squid, etc.)
  • Experience with Azure PaaS, Azure networking, and Azure Site Reliability solutions
  • Experience with AWS products including EC2, EBS, ELB, IAM, S3, Route 53, VPCs, Gateways, Lambda, etc.
  • Experience with Azure DevOps services such as DevOps, Pipelines, Test Plans, Artifacts, etc.
  • Experience with CI/CD pipelines using tools such as Jenkins, Travis, Azure DevOps, TeamCity, etc.
  • Knowledge of Linux architecture, security, administration, performance monitoring/tuning, troubleshooting, and production operations
  • Fluent in Python and shell scripting, with experience implementing automation and monitoring using shell scripts and related tools

#PromoteIQ #MicrosoftAdvertising

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.