DevOps Engineer / k8s Engineer

15 hours ago


Almaty, Almaty, Kazakhstan Tothemoon Full-time 1 500 000 ₸ - 3 000 000 ₸ per year
About Tothemoon

Tothemoon is a user-centric, multiservice digital assets trading platform. At Tothemoon, we prioritize what matters most in finance: reliability. Whether it's buying, selling, exchanging, or investing in cryptocurrencies, you can trust us to protect your financial interests and propel you towards a prosperous future. Join a rapidly growing community of users who choose Tothemoon for their digital transactions.

We offer hands-on experience, challenging tasks, and opportunities for professional and career growth within a dynamic fintech project. We're looking for a specialist to test our product, including the mobile and web applications, as well as APIs and backend services.

Key Responsibilities
  • Production infrastructure operations and development (90%)

    • Maintain and improve managed Kubernetes clusters (control plane, node pools, autoscaling, PDB, network policies).

    • Support API and ML workloads.

    • Set up monitoring, alerting, logging, backups, and disaster recovery procedures.

    • Investigate and resolve incidents, including on-call participation.

  • R&D and automation (10%)

    • Research, optimize, and automate the current infrastructure setup.
     

    Tech Stack / Core of the Project
  • Orchestration: Kubernetes (multi-pool, autoscaling, GPU workloads)

  • GPU / ML: NVIDIA H100, NVIDIA stack (CUDA, drivers, nvidia-device-plugin), LLM inference

    Requirements
  • Deep Kubernetes experience (3+ years):

    • Designing and maintaining production clusters (preferably with autoscaling, PDB, network policies).

    • Confident use of Deployments, StatefulSets, Ingress, RBAC, StorageClass, Helm/Kustomize.

    • Experience integrating Kubernetes with cloud providers (EKS, GKE, AKS, etc.).

  • Strong Linux background:

    • Understanding of kernel operations, networking stack, cgroups, and namespaces.

    • Ability to diagnose performance issues (CPU, memory, IO, network).

  • GPU and high-load ML/LLM experience — a strong advantage:

    • Deploying and managing GPU-based applications in Kubernetes.

    • Basic knowledge of CUDA, NVIDIA drivers, and nvidia-device-plugin.

    • Experience monitoring GPU utilization, memory, thermals, and errors.

  • Operational and integration experience:

    • Integrating external services into Kubernetes (logging, monitoring, security, storage).

    • Building monitoring and alerting aligned with SLO/SLA targets; end-to-end incident analysis.

    • Writing runbooks and automating routine operations.
     

    Why Join Us
  • A senior-level team and a friendly, collaborative environment open to innovation and experimentation.

  • Real technical challenges: high load, performance optimization, GPU infrastructure, and real-time workloads.

  • A product team, not outsourcing — your contribution directly impacts the company's core technology.

  • Opportunities for professional growth and development in AI, ML infrastructure, and blockchain computing.

  • Supportive culture and a comfortable, modern workspace.

    Conditions
  • Format: On-site work in Almaty, Kulan Business Center.

  • Compensation: Competitive salary in USDT or fiat, plus paid vacation and sick leave.

  • Benefits: Comfortable office and free lunches.

  • Schedule: Full-time, flexible working hours.


  • Engineer

    5 days ago


    Almaty, Almaty, Kazakhstan SANTO Full-time 600 000 ₸ - 1 200 000 ₸ per year

    At least 1 year of experience in the registration of medicinal products; knowledge of English is required.