DevOps Engineer / Kubernetes Engineer
15 hours ago
Tothemoon is a user-centric, multiservice digital assets trading platform. At Tothemoon, we prioritize what matters most in finance: reliability. Whether it's buying, selling, exchanging, or investing in cryptocurrencies, you can trust us to protect your financial interests and propel you towards a prosperous future. Join a rapidly growing community of users who choose Tothemoon for their digital transactions.
We offer hands-on experience, challenging tasks, and opportunities for professional and career growth within a dynamic fintech project. We're looking for a specialist to test our product, including the mobile and web applications, as well as APIs and backend services.
Key Responsibilities
Production infrastructure operations and development (90%)
• Maintain and improve managed Kubernetes clusters (control plane, node pools, autoscaling, PDB, network policies).
• Support API and ML workloads.
• Set up monitoring, alerting, logging, backups, and disaster recovery procedures.
• Investigate and resolve incidents, including on-call participation.
R&D and automation (10%)
• Research, optimize, and automate the current infrastructure setup.
Tech Stack / Core of the Project
Orchestration: Kubernetes (multi-pool, autoscaling, GPU workloads)
GPU / ML: NVIDIA H100, NVIDIA stack (CUDA, drivers, nvidia-device-plugin), LLM inference
Requirements
Deep Kubernetes experience (3+ years):
• Designing and maintaining production clusters (preferably with autoscaling, PDB, network policies).
• Confident use of Deployments, StatefulSets, Ingress, RBAC, StorageClass, Helm/Kustomize.
• Experience integrating Kubernetes with cloud providers (EKS, GKE, AKS, etc.).
Strong Linux background:
• Understanding of kernel operations, networking stack, cgroups, and namespaces.
• Ability to diagnose performance issues (CPU, memory, IO, network).
GPU and high-load ML/LLM experience (a strong advantage):
• Deploying and managing GPU-based applications in Kubernetes.
• Basic knowledge of CUDA, NVIDIA drivers, and nvidia-device-plugin.
• Experience monitoring GPU utilization, memory, thermals, and errors.
Operational and integration experience:
• Integrating external services into Kubernetes (logging, monitoring, security, storage).
• Building monitoring and alerting aligned with SLO/SLA standards; end-to-end incident analysis.
• Writing runbooks and automating routine operations.
Why Join Us
A senior-level team and a friendly, collaborative environment open to innovation and experimentation.
Real technical challenges: high load, performance optimization, GPU infrastructure, and real-time workloads.
A product team, not outsourcing — your contribution directly impacts the company's core technology.
Opportunities for professional growth and development in AI, ML infrastructure, and blockchain computing.
Supportive culture and a comfortable, modern workspace.
Conditions
Format: On-site work in Almaty, Kulan Business Center.
Compensation: Competitive salary in USDT or fiat, plus paid vacation and sick leave.
Benefits: Comfortable office and free lunches.
Schedule: Full-time, flexible working hours.
-
Engineer
5 days ago
Almaty, Almaty, Kazakhstan · SANTO · Full-time · 600,000 ₸ - 1,200,000 ₸ per year
At least 1 year of experience in medicinal product registration; knowledge of English is required.