Sr. Software Engineer - K8s - GPU Orchestration - REMOTE Job at Living Talent, San Jose, CA

  • Living Talent
  • San Jose, CA

Job Description

GPU Orchestration
  • Startup
  • Company size: 30
  • Remote within North America
  • Compensation: Base Salary 250k + Equity

Key Responsibilities

  • Lead the design, architecture & development of K8s-based cloud infrastructure.
  • Use K8s controllers, operators & CRs to implement scalable, high-availability solutions.
  • Integrate Karpenter and/or other advanced tools for infrastructure optimization.
  • Architect MLOps Middleware integration (dynamic workload migration, resource disaggregation).
  • Build monitoring, logging & alerting systems.
  • Drive infrastructure cost optimization through FinOps best practices in K8s deployments.
  • Promote K8s best practices & mentor software engineers.
  • Collaborate across teams to drive K8s adoption in multi-cloud and hybrid environments.
  • Contribute to open-source projects in the Kubernetes community.

Qualifications

Kubernetes Expertise

  • Designing, deploying, and managing K8s clusters (AKS, EKS, GKE, OpenStack, etc.).
  • Hands-on experience with K8s core components and ecosystem tooling (Karpenter, Cluster Autoscaler, CNI, CSI, CRI, CRDs, operators).
  • 5+ years of Kubernetes infrastructure experience.
  • Contributions to open-source Kubernetes projects.
  • 10+ years of software engineering experience.
  • Proficiency in one or more of Go, Python, Bash, etc.
  • Excellent communication skills for both technical and non-technical stakeholders.
  • Bachelor’s or Master’s degree in Computer Science or related field (preferred).

Preferred Experience

  • GPU scheduling, container orchestration, HPC (high-performance computing) workloads.
  • Familiarity with multi-cloud & hybrid cloud deployments.
  • Experience with MLOps platforms (Kubeflow, TFX, etc.).
  • Experience with or knowledge of FinOps practices & cloud cost management.

Job Tags

Remote job
