BlueJeans is looking for a manager for our Site Reliability Engineering (SRE) team. The SRE team engages with software development teams to improve the reliability and operability of the BlueJeans video conferencing platform, as well as visibility into its health. This position will report to the Director of Site Reliability Engineering.
We value clear communication, a focus on execution, close partnerships, and strong incident and people management experience as we drive optimization, seamless customer experiences, and lightning-fast performance across BlueJeans SaaS offerings. We are passionate about monitoring and improving performance and resource utilization, and about ensuring that our hosted environments can scale to meet the needs of our ever-growing customer base.
You will directly build a team and improve the experience and value of the BlueJeans brand as you solve customers’ issues and ensure that they enjoy a rock-solid meetings platform.
Some of the things you'll be doing include…
- Own the deployment process to staging and production environments.
- Manage JIRA ticket ingest, assignment, and metrics for the SRE team.
- Manage 24/7 on-call production support for all BlueJeans production environments.
- Work in collaboration with BlueJeans strategic accounts and customer success teams to troubleshoot customer-submitted issues.
- Manage the internal Root Cause Analysis (RCA) process and documentation.
- Work closely with Engineering teams to support strategic projects.
- Find scalability bottlenecks and areas for performance improvements.
- Deploy, maintain, and support multiple application environments (QA, Engineering, Staging, & Production).
- Automate current manual processes via Terraform, CI/CD pipelines, etc.
Some of the things we're looking for include…
- 3+ years of experience as a people manager in an Engineering or Operations capacity.
- Strong Linux administration skills with an emphasis on shell scripting.
- Hands-on technical experience supporting SaaS-based applications.
- Expertise with the AWS platform, including RDS, VPCs, ECS/EKS, and Lambda.
- Expertise with Terraform, Docker, Kubernetes (or other orchestration tools), and Jenkins.
- Experience with infrastructure monitoring platforms (Datadog, Nagios) and Application Performance Management (APM) systems (New Relic, AppDynamics).
- Experience with configuration management tools (Puppet, Chef, Ansible).
- Experience with CI/CD pipeline configuration, deployment, and support.