Site Reliability Engineer (Windows)
Waterloo, ON, CA; Mississauga, ON, CA; Richmond Hill, ON, CA
OPENTEXT - THE INFORMATION COMPANY
As the Information Company, our mission at OpenText is to create software solutions and deliver services that redefine the future of digital. Be part of a winning team that leads the way in Enterprise Information Management.
The Opportunity:
The role of the Site Reliability Engineer is to build solutions that enhance the availability, performance and stability of OpenText services, and to automate away repetitive work. You'll also respond to pings, pages and alerts, investigating issues in our products that you can really sink your teeth into. You'll work in non-production and production environments on monitoring, data collection and configuration management, as well as disaster recovery planning, capacity engineering, reliability improvement initiatives and platform automation. Your mission will be to use cutting-edge technology to monitor and maintain the day-to-day operations of the entire production infrastructure for OpenText Discovery on our own cloud platform. The best person for this role is someone who has a collaborative spirit. In our world, it’s not about being a hero and having all the answers; it’s about sometimes saying "I don't know" and working on finding solutions rather than starting with an assumption. The team needs someone who can ask questions, learn from others and turn chaos into order.
This role would be a great fit for someone with creative and innovative problem-solving skills. You will develop and implement solutions that operate at scale. Our teams are empowered and expected to improve our products to truly deliver a reliable experience to customers.
You are great at:
- Attending to incidents according to Service Level Agreements.
- Taking ownership of and accountability for the incident resolution process, and resolving incidents in a quality and timely manner.
- Being the custodian of the application in production and staging, and acting as a technical liaison between multiple stakeholders to evaluate and maintain the application, identify issues and report them to the product/application owner.
- Establishing and maintaining a good relationship with team members, Product Development, Customer Service and Sales.
- Participating in training and information-sharing activities.
- Acting as backup for other team members when necessary.
- Working rotating shifts when required (2-3 week rotation).
- Participating in the on-call rotation, as the team provides 24x7x365 support.
What it takes
- The ability to understand and maintain scripts, with proficiency expected in PowerShell, shell, Bash, Perl or Python.
- Good working knowledge of Windows OS; Linux is nice to have.
- Past experience supporting .NET-based applications.
- Working knowledge of cloud infrastructure (IaaS) such as GCP and AWS.
- Experience with installing, configuring, and operating IIS, Tomcat and Apache
- Working and operational knowledge of Ansible, Terraform and GitOps.
- Operational experience with Kubernetes.
- Expertise in monitoring distributed systems and applications, and knowledge of monitoring tools such as New Relic, Dynatrace, Nagios, Zabbix or Prometheus.
- Strong understanding of ITIL principles; certification is a plus.
- Strong ability to diagnose & troubleshoot incidents & outages.
- Exposure to system & application-level telemetry for large, distributed cloud architectures
- Experience diagnosing and resolving problems in high-throughput web applications & network services
- Expert-level troubleshooting skills across different levels of the solution stack to resolve customer issues within prescribed SLAs.
- Ability to handle multiple tasks concurrently.
- Ability to lead, drive and implement highly scalable and complex solutions
- A strong understanding of security best practices.
- A proven record of being able to work independently and collaboratively.
More about our team
OpenText Site Reliability Engineering is a rapidly growing group within the organization. We are in the process of building our teams, tools and systems as part of OpenText's mission to build the best cloud services in the world.
We enable OpenText to go fast by providing real-time feedback on production systems. We work side by side with the product family and platform developers to maintain and improve services and performance. We live the company values with a strong customer focus and possess a healthy sense of urgency. We are a heavily data-driven team, using a variety of data collection, enrichment, analytics and visualization techniques to learn about our complex systems. We also live the 'Play, as a team' value by having a strong focus on sharing learning experiences from the front line with the development teams.
OpenText's efforts to build an inclusive work environment go beyond simply complying with applicable laws. Our Employment Equity and Diversity Policy provides direction on maintaining a working environment that is inclusive of everyone, regardless of culture, national origin, race, color, gender, gender identification, sexual orientation, family status, age, veteran status, disability, religion, or other basis protected by applicable laws. Should you require accommodations during the selection process, please contact accommodationrequests@opentext.com.