About the company:
Embibe is a leading edtech platform powered by artificial intelligence and data science. We deliver personalised learning and predictable learning outcomes for each aspiring individual. We have built world-class products that empower the education ecosystem (students, teachers, educational institutes and parents) by enabling them to understand students' strengths and weaknesses down to the most granular concept level.
We are on a journey to enable a quarter of India's population to excel in education: 300 million learners across 1.5 million institutions, including vernacular users. In our quest to discover what matters in education, we have touched the lives of over 18 million students across different strata of society.
Each Embiber is passionate about making a meaningful impact on the way we learn, process and educate ourselves. We are highly transparent in our goals, communication, feedback and results. Aditi Avasthi, the founder and CEO, leads the company's vision of building the future of learning and education, personalised to each and available to all.
We have the largest data lake in education, with 500+ billion data signals, and multiple patents in progress. Some time back, we were recognised by Amazon as the best AI company in education.
We previously partnered with Reliance Jio to work towards our collective vision of democratising education and impacting a billion lives.
As a DevOps Lead at Embibe, you will build scalable infrastructure and storage solutions that enrich the user experience. You're the first line of defence, and you make sure Embibe continues to function in the face of adversity. We value reliability and automation. We measure and monitor everything, and have a culture of continuous reflection and improvement. We're looking for engineers who share our values, particularly those who have experience in building, maintaining and managing high-volume, low-latency, distributed infrastructure for internet and mobile users.
Key Responsibilities:
1. Define and execute a strategy to achieve 100% service availability
2. Measure and improve the team's performance by tracking response time to incidents and problems, overall monthly service availability and customer ticket reduction
3. Take responsibility for the health of infrastructure and applications
4. Own incidents end to end (detect, record, classify and close)
5. Oversee Root Cause Analysis and the Corrective Actions necessary to improve reliability
6. Perform monitoring system configuration changes to increase the effectiveness of monitoring
7. Drive or participate in technical design reviews and operational acceptance exercises for new and existing services
Technical Requirements:
1. More than 10 years of overall relevant experience
2. Minimum 4 years of experience managing a DevOps team of more than 7 people
3. In-depth Linux knowledge and a good understanding of the AWS cloud tech stack
4. Knowledge of load balancers and MySQL/NoSQL databases
5. Experience in one or more of the following languages: Shell, Python, PHP or Perl
6. Must have an understanding of building and managing large-scale systems and application architectures
7. Prior experience with configuration and maintenance of common applications and services such as Apache, MySQL, DHCP, SSH, DNS and VPN
8. Proficient in one or more of the following monitoring and logging tools: Prometheus, Sensu, Nagios, Ganglia, Cacti, collectd, Logstash, Graphite, Cepmon
9. Working knowledge of Linux, TCP/IP, and web services
10. Good knowledge of configuration management tools such as Ansible, Puppet and Chef
11. Experience with build and packaging tools such as Jenkins
12. Experience with Docker and Kubernetes is good to have