Our Engineering Services team's mission is to build and operate VMware SaaS and on-prem solutions. These solutions enable customers to manage, govern, and secure their applications running in private and public clouds, including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform. We operate in a true DevOps environment with a very talented, globally distributed, and diverse team. Together we leverage distributed systems, contribute to open-source innovations, and develop and run VMware's Cross-Cloud Services in production with enterprise-grade reliability.
Primary Responsibilities:
• Take full responsibility for achieving the defined SLOs and SLAs.
• Collaborate with Product Management and Engineering leaders to ensure that operational and reliability requirements are clearly articulated and integrated into the product roadmap.
• Implement a continuous improvement process that includes root-cause analysis, solution identification and implementation, and an ongoing emphasis on auto-remediation.
• Proactively assess and manage risks, and identify mitigations, for ongoing operations and planned releases.
• Take direct responsibility for the SaaS Operations and Cloud Engineering functions, including cost management and identifying alternative approaches for cost reduction.
• Build best practices and present proposals for SRE strategy and direction.
• Clearly define scope, objectives, and approach, and develop a detailed plan for execution.
• Proactively define changes to system architecture to improve system performance and scale.
• Own all components of the production services, including capacity planning that keeps pace with the rate of customer acquisition.
• Oversee and manage 3+ globally distributed teams.
• Identify points of failure and work with the engineering team to build resiliency into our services as table stakes.
Expected Qualifications:
• Minimum of 8 years' experience in a senior leadership role in a 24/7/365 SaaS environment.
• AWS experience is a must; AWS certification is a plus.
• Deep knowledge of and hands-on experience with SaaS operations, including data centers, networking, databases, operating systems, monitoring tools, etc.
• Experience participating in and managing a 24x7 on-call environment using PagerDuty or similar tools, including conducting retrospectives and identifying auto-remediation options.
• Strong knowledge of industry trends and the latest features of AWS, SaaS, and cloud applications.
• Ability to resolve conflicts effectively.
• Strong leadership skills and follow-through; focused and detail-oriented.
• Direct experience building teams with a strong automation mindset.
• Excellent ability to communicate with folks across technical and non-technical functions.
• Experience managing Site Reliability Engineers and Software Development Engineers.
• Hands-on experience developing and executing a detailed change management process for highly available environments.
• Experience interacting directly with customers, with a strong commitment to customer success.