Thursday, September 22, 2016

Director Site Reliability Engineering Blackboard San Francisco

Job Description:
Site Reliability Engineering (SRE) is what you get when you treat operations as if it’s a software problem. At Blackboard, we truly believe this is the future of managing our services, which are used by millions of students every day.
We are looking for a leader who can help us convert our traditional operations team into a pure site reliability engineering team.
Responsibilities
Lead a team of System Engineers managing the day-to-day operations of our biggest product, Blackboard Learn.
Bring in best practices and lead the conversion of a traditional operations team to true site reliability engineers.
Develop and implement a plan of continual improvement.
Use data to measure and improve overall team productivity and product quality
Provide timely and transparent written status reports that highlight both successes and challenges.
Own end-to-end availability and build automation to prevent problem recurrence; eventually automate the response to all non-exceptional service conditions.
Manage on-call rotations across continents.
Define and own processes for the operations team.
Collaborate with other members of the Learn product development staff to develop a true sense of shared responsibility and energy for the entire product, from conception through deployment.
Recruit, mentor, organize, motivate and coach a team of software quality engineering professionals. Assign projects and arrange training to advance the career of each individual within the team. Define and monitor goals.
Identify and coach future leaders to take on additional personnel and technical responsibility
Minimum qualifications:
15+ years working in IT related areas
10+ years’ experience working with distributed systems and managing multi-function teams
Track record of hiring great engineers and building strong teams.
Track record of identifying productivity opportunities and taking the appropriate remedial action
Always looking to uncover the root cause of any issue
Experience with and solid understanding of web and app servers, Linux, Windows, Apache Tomcat and Nginx. Strong scripting in Chef, Perl, Python, etc.
Experience with JIRA bug tracking system
Strong communication skills with an ability to listen and be heard
Self-starter, collaborative team player
Experience troubleshooting and resolving application and/or system-related issues
Excellent communication skills with a track record of influencing senior management
BA/BS degree in Computer Science or related technical field, or equivalent practical experience.
Hands-on experience managing a team with processes in a security-certified environment (e.g., FedRAMP, MCLS, ISO, …)
Capable of technical deep-dives into code, networking, operating systems and storage, yet verbally and cognitively agile enough to hold your own in a strategy discussion with Blackboard’s executive team.