Sysdig

Run Confidently with Secure DevOps

Remote Site Reliability Engineer (US)

Remote Location

United States

Job Type

Full-time

Compensation

$105k - $145k

Who We Are

Sysdig is the secure DevOps company, and we’re at the forefront of the container, Kubernetes, and cloud revolution. We are passionate, technical problem-solvers, continually innovating and delivering powerful solutions to confidently run cloud-native applications. Our consistent contributions to open source software projects reflect our commitment to the open cloud movement.

We value diversity and open dialog to spur ideas, working closely together to achieve goals. And we're a great place to work too — we were awarded the 2021 Bay Area Best Places to Work Award from San Francisco Business Times and the Silicon Valley Business Journal. We are looking for team members who share our commitment to customers and are willing to dig deeper, understand problems and deliver innovative solutions. Does this sound like the right place for you?

About this Role

As a Site Reliability Engineer, you’ll be responsible for the availability, performance, and resilience of the Sysdig platform in our largest on-premises customer environments. You will collaborate with high-performing infrastructure and engineering teams, both within Sysdig and in customer organizations, to help drive the scalability and stability of our platform.

What You Will Do

Participate in a globally distributed team of Site Reliability Engineers, supporting multiple Sysdig applications across our most critical on-premises customers.
Produce best-practice recommendations for on-premises customers to improve customer experiences.
Implement disaster recovery and reliability improvement initiatives, including performance tuning and infrastructure optimization.
Maintain and support the production environments and communicate directly with customer stakeholders.
Participate in an on-call rotation.

What You Need to Bring with You

Minimum of 5 years of industry experience, including prior experience in:
- Deploying Kubernetes workloads in a production environment
- Diagnosing and troubleshooting customer-facing production service outages
Working experience managing one of the following database clusters: Cassandra, Elasticsearch, Kafka/ZooKeeper, PostgreSQL. Managing includes installation, configuration, optimization, high availability improvement, failover, backup/restore, etc.

What We Look For

Strong sense of ownership
Strong desire to earn customer trust and obsess over customer success
Proven ability to work across, collaborate with, and negotiate with diverse, distributed, or remote teams
Proven ability to work under pressure
Strong desire to coach or share information with others
Knowledge of Helm, Terraform, Prometheus, and Grafana is preferred
Knowledge of Kubernetes Operators is a big plus

Key Technologies

Kubernetes, Golang, Python, Cassandra, Kafka, Elasticsearch, PostgreSQL, Terraform, Helm

Why work at Sysdig?

We’re a well-funded startup that already has a large enterprise customer base
We have a pragmatic, approachable culture, from the CEO down
We have an organizational focus on delivering value to customers
Our open source tools (https://sysdig.com/opensource/) are widely used and loved by technologists & developers

What you can expect from Sysdig: