Site Reliability Engineering Manager
ECS Federal
This job is no longer accepting applications
ECS is seeking a Site Reliability Engineering Manager to work in our Fairfax, VA office. Please Note: This position is contingent upon contract award.
Job Description:
The Site Reliability Engineering Manager (SREM) will play a critical role in ensuring the reliability, availability, and performance of our production environments. You will work closely with development and operations teams to implement best practices in DevOps, automate infrastructure, and maintain scalable and resilient systems. The ideal candidate will be able to perform the following duties:
- Ensure System Reliability: Design, implement, and maintain systems that are resilient, highly available, and performant.
- Automate Infrastructure: Develop and manage infrastructure as code (IaC) using tools like Terraform, CloudFormation, etc.
- Monitor and Optimize: Set up comprehensive monitoring and logging systems using Prometheus, Grafana, ELK Stack, etc., and ensure the continuous performance of services.
- Incident Management: Respond to incidents, perform root cause analysis, and implement solutions to prevent recurrence.
- Collaborate with Development Teams: Work closely with software engineers to integrate reliability into the software development lifecycle.
- Continuous Improvement: Identify areas for improvement and drive initiatives to enhance system reliability, scalability, and efficiency.
- Security and Compliance: Ensure systems adhere to security policies and compliance requirements.
- Documentation and Training: Create and maintain detailed documentation and provide training to team members on reliability best practices.
Required Skills:
- Must be a US Citizen.
- Ability to successfully maintain a DHS Entry on Duty (EOD)/DHS Suitability.
- Experience: 8+ years of experience in site reliability engineering, DevOps, or a related field.
- Education: Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
Technical Skills:
- Proficiency in cloud platforms (AWS, GCP, Azure).
- Experience with infrastructure as code (Terraform, CloudFormation).
- Expertise in configuration management tools (Ansible, Puppet, Chef).
- Strong knowledge of containerization and orchestration (Docker, Kubernetes).
- Proficiency in scripting languages (Python, Bash, Go).
- Experience with CI/CD pipelines (Jenkins, GitLab CI, CircleCI).
- Comprehensive understanding of monitoring and logging tools (Prometheus, Grafana, ELK Stack).
Desired Skills:
- Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Professional Cloud DevOps Engineer).
- Experience with database management and optimization.
- Familiarity with microservices architecture and service mesh (Istio, Linkerd).
Soft Skills:
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration abilities.
- Ability to work independently and as part of a team.
- Attention to detail and a proactive approach to identifying and solving issues.
ECS is an equal opportunity employer and does not discriminate or allow discrimination on the basis of race, color, religion, sex, age, sexual orientation, gender identity or expression, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, status as a crime victim, disability, protected veteran status, or any other characteristic protected by law. ECS promotes affirmative action for minorities, women, disabled persons, and veterans.
ECS is a leading mid-sized provider of technology services to the United States Federal Government. We are focused on people, values and purpose. Every day, our 3800+ employees focus on providing their technical talent to support the Federal Agencies and Departments of the US Government to serve, protect and defend the American People.