Staff System Engineer

About Us

Key Responsibilities:

  • Identify operational inefficiencies and automation opportunities within monitoring workflows and infrastructure.
  • Design and implement automated solutions for deployment, configuration, and scaling of monitoring tools using Infrastructure-as-Code (IaC) technologies such as Terraform, Ansible, Puppet, or similar.
  • Leverage REST APIs of platforms like Zabbix, SolarWinds, Prometheus, and Grafana to streamline and standardize monitoring setup and management.
  • Develop reusable automation assets—scripts, templates, and modules—to ensure consistent monitoring practices across diverse environments.
  • Integrate monitoring systems with alerting, ticketing, and reporting platforms to enable seamless incident management and visibility.
  • Establish tagging strategies and observability standards to ensure uniform data collection and traceability across services.
  • Support incident response by building automated diagnostics and enriching telemetry data for faster root cause analysis.
  • Collaborate cross-functionally with DevOps and SRE teams to align monitoring automation with CI/CD pipelines and operational goals.

Tech Skills:

Infrastructure as Code (IaC) & Automation

  • Terraform
  • Ansible
  • Puppet
  • Scripting languages: Python, Bash, PowerShell; remote execution over SSH

Monitoring & Observability Tools

  • Zabbix
  • SolarWinds
  • Prometheus
  • Grafana
  • Datadog, New Relic, or Dynatrace (as alternatives or complementary tools)

API Integration & Automation

  • Experience working with REST APIs for automation and integration
  • Familiarity with JSON, YAML, and HTTP methods (GET, POST, PUT, DELETE)

CI/CD & DevOps Tooling

  • Jenkins, GitLab CI, GitHub Actions, or similar
  • Docker and Kubernetes (for containerized environments)

Alerting & Incident Management Integration

  • ServiceNow, Jira, VictorOps, xMatters, or similar
  • Knowledge of event correlation and automated diagnostics

Cloud Platforms (optional)

  • AWS, Azure, or Google Cloud Platform
  • Cloud-native monitoring tools like CloudWatch, Azure Monitor, or Google Cloud Operations Suite

Preferred Qualifications:

Soft Skills & Operational Mindset

  • Strong problem-solving and gap analysis capabilities
  • Ability to identify low-hanging fruit for automation
  • Experience in cross-functional collaboration (DevOps, SRE, IT Ops)
  • Understanding of observability principles and tagging strategies

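Commitment to Equality

Coupang has always been committed to equality among its employees. Our unprecedented success would not have been possible without the efforts of our diverse global teams.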