Site Reliability Engineer

CÔNG TY TNHH DIGITAL POWER MEDIA
SGGP Building, 15F, 436-438 Nguyen Thi Minh Khai, Ward 5

Job details

Salary: Up to $2,500
Location: Ho Chi Minh City
Level: Staff
Employment type: Full-time
Industry: Other

Job description

We are actively looking for a highly skilled Site Reliability Engineer to become a valuable member of our team. The ideal candidate will play a key role in engineering efforts, contributing from design to implementation, and addressing intricate technical challenges related to developer and engineering productivity and velocity.
In the position of Site Reliability Engineer, you will be responsible for crafting and implementing robust and scalable infrastructure and services utilized by our development team.
Responsibilities:
Keep the system running with zero downtime for millions of users.
Define and manage cloud infrastructure as code (IaC), improve the CI/CD pipeline, ensure the scalability and availability of the system, build the monitoring stack, and automate build, configuration, and deployment orchestration scripts...
We use a broad DevOps/infrastructure toolset such as Docker, Helm, GitHub Actions... and rely on multiple AWS services: EC2, ECS, S3, CloudFront, RDS, IAM, Route 53, CloudWatch...
Minimize incident impact by staying informed up front through monitoring, alerts, logs, and metrics, while keeping an eye on IT standards and security.
Work with the engineering team to make architectural decisions.
Support troubleshooting efforts during incidents, applying root cause analysis to prevent recurrences.

Benefits

Competitive Salary
A MacBook and a screen are provided
5 working days per week
Attractive benefits for team activities (team building, Happy Friday, Happy Hour...)
Comfortable workspace and friendly colleagues

Skill requirements

Your skills and experience:
At least 4 years' experience in an SRE position working with AWS technologies and services.
Experience with AWS: IAM, EC2, ALB, S3, ECS, CloudWatch, CloudFormation...
In-depth knowledge of Kubernetes, including its architecture, deployment, and management, with a focus on CI/CD for web applications.
Have an in-depth understanding of microservice architecture, API management, and distributed systems concepts.
Understanding of the AWS Well-Architected Framework for building and optimizing systems on AWS.
Experience with Terraform and Infrastructure as Code (IaC) principles.
Proficiency in implementing monitoring and alerting solutions (e.g., Grafana, Prometheus, ELK stack) to ensure optimal system performance and availability.
Strong understanding of command-line tools and distributed version control systems such as Git.
A willingness to share knowledge and the aspiration to constantly improve yourself and learn new things.
Familiarity with Linux/Unix administration and shell scripting.
Self-driven, proactive.
Experience with containerization and orchestration tools (Docker, Kubernetes).
Excellent problem-solving skills, with the ability to analyze and resolve complex technical issues efficiently.
The ability to work under pressure.
Nice to have:
Understanding of security best practices in web development.
Knowledge of best practices and IT operations for an always-up, always-available service.
AWS Certified Solutions Architect – Professional or AWS Certified DevOps Engineer – Professional certification is highly desirable.
Ability to design high-level and low-level network and system architecture and document it properly.