Dive into the pulse of cutting-edge solutions with Patrianna LTD!
Are you ready to enter the dynamic world of social gaming and be part of a rapidly expanding team? We’re looking for a talented Monitoring Specialist (Support) to join Patrianna LTD on a full-time basis.
What You Gain:
Dynamic Environment: Step into the heart of a super fast-growing social gaming company, where innovation and creativity thrive.
Global Impact: Be at the forefront of crafting a global social entertainment platform, with a primary focus on captivating the North American market.
Limitless Growth: Take your career to new heights with opportunities for advancement and personal development. Join us in the exhilarating journey of continuous growth.
Massive Reach: Contribute to the development of client web and mobile apps that engage with up to 150 million customers worldwide.
Commitment to Excellence: We’re dedicated to delivering high-quality code, ensuring predictable behavior in production, seamless scaling, and automation every step of the way.
We are looking for a skilled Monitoring Specialist to join our 24/7 SRE team. The ideal candidate will work non-business hours aligned with European time to ensure seamless operations and system reliability. This role focuses on monitoring and diagnostics across a multi-site production environment, primarily for Java-based applications on Google Cloud Platform (GCP). Leveraging modern monitoring tools, the SRE will proactively identify, analyze, and resolve issues, maintaining high service performance and reliability.
Key Responsibilities:
- Production Monitoring & Alerting
  - Oversee multi-site production environments using tools like Prometheus, Grafana, and Sentry to monitor application performance, database health, and event streams.
  - Continuously monitor performance metrics and set up alerts to identify potential issues before they impact system availability.
- Log Analysis & Diagnostics
  - Analyze logs across applications, databases, and event streaming services (Kafka) to detect irregularities and identify root causes.
  - Use tools like ELK and GCP-native monitoring solutions to maintain visibility and optimize system behavior.
- Database & Event Stream Monitoring
  - Monitor and tune performance for databases like PostgreSQL/AlloyDB and Spanner, focusing on query optimization, performance metrics, and troubleshooting.
  - Manage and monitor Kafka clusters, including consumer lag tracking and data pipeline health, to ensure continuous data processing.
- Error Tracking & Troubleshooting
  - Use Sentry and similar tools to track, document, and resolve errors, escalating issues to the engineering team when necessary.
  - Follow troubleshooting protocols and assist in root cause analysis to resolve incidents in a structured and efficient manner.
- Network & Security Insights
  - Use Cloudflare tools to monitor network performance and uphold security standards, with an emphasis on DDoS protection and latency optimization.
  - Work closely with the Engineering and DevOps teams to develop proactive monitoring and performance strategies.
Required Skills & Qualifications:
- Cloud Platform Expertise: Advanced knowledge of the Google Cloud Platform and associated services.
- Monitoring & APM Tools: Proficient with Prometheus, Grafana, Sentry, and ELK, plus familiarity with Kubernetes (K8s) and GCP-native monitoring solutions.
- Database Systems: Strong knowledge of PostgreSQL/AlloyDB and Spanner, especially for performance tuning, query optimization, and diagnostics.
- Event Streaming: Hands-on experience with Kafka, including the ability to monitor Kafka clusters, track consumer lag, and manage data pipeline reliability.
- Networking & Security: Familiarity with Cloudflare, DDoS protection strategies, and network performance monitoring.
- Problem-Solving Skills: Excellent analytical skills to troubleshoot complex, multi-layered cloud systems, perform root cause analysis, and address issues in a dynamic environment.
Nice-to-Have Skills:
- Scripting: Experience with Python or Bash for automation and scripting tasks.
Schedule Requirements:
- This role operates during non-business hours aligned with European time to provide continuous coverage and support for our production environments.
Вікторія