We’re redefining how AI is built and deployed—making powerful technology accessible to everyone. Our lean, fast-moving team thrives on collaboration, efficiency, and creative problem-solving. We’re looking for driven, thoughtful individuals who bring strong work ethic and curiosity, and who want to help remove barriers and create meaningful impact. Join us to grow your career in an environment that supports both personal excellence and team success.
Interested? Submit your CV for consideration to Careers@umt.llc.
1. Machine Learning Engineer – Model Development & Deployment
Location: Palo Alto, CA
Type: Full-time
Level: Senior / Staff
Department: AI Engineering
About the role
We're looking for exceptional Machine Learning Engineers who excel at turning frontier foundation models into production-grade, customer-facing intelligence that serves millions of users daily.
You'll own the full lifecycle of custom ML solutions: from architectural design and heavy fine-tuning of state-of-the-art LLMs & multimodal models, through rigorous evaluation, to reliable deployment into mission-critical customer products.
This is hands-on work on systems that already power massive-scale inference — and you'll help make them even better.
Key responsibilities
- Design, fine-tune, and adapt frontier foundation models (LLMs, vision-language, multimodal) for domain-specific enterprise use cases
- Develop custom training pipelines using the latest distributed training techniques (DeepSpeed, FSDP, Megatron, etc.)
- Implement advanced evaluation frameworks, including automated red-teaming, safety alignment, and performance benchmarking
- Collaborate closely with product teams to integrate models into real-time customer-facing applications
- Optimize for cost, latency, and throughput across cloud and edge environments
- Contribute to internal research prototypes that later become client solutions
- Mentor junior engineers and help raise the technical bar across the team
Requirements
- 5+ years of hands-on experience training and fine-tuning large-scale ML models in production environments
- Deep expertise with modern frameworks: PyTorch (preferred), JAX, Hugging Face Transformers, vLLM, TGI, or similar
- Proven track record deploying models that serve 1M+ daily requests or process billions of inferences
- Strong software engineering fundamentals (clean code, CI/CD, testing, observability)
- Experience with at least one of: PEFT/LoRA/QLoRA, RLHF/DPO, distillation, or mixture-of-experts architectures
- Bachelor's or Master's in CS, ML, or related field (PhD a plus)
Nice-to-have
- Experience with multimodal (vision + language) fine-tuning
- Familiarity with production inference engines (TensorRT-LLM, ONNX, Triton)
- Contributions to open-source ML projects
2. MLOps / ML Systems Engineer – Production Reliability at Hyperscale
Location: Palo Alto, CA
Type: Full-time
Level: Senior / Staff
Department: AI Operations & Infrastructure
About the role
We operate some of the most demanding production ML systems in the industry — serving millions of users with computer vision and LLM-powered services that require near-perfect availability.
As an MLOps / ML Systems Engineer, you will own the reliability, scalability, and observability of these hyperscale AI backends. You will design, build, and operate the invisible infrastructure that keeps mission-critical models running 24/7/365 — even under extreme load and constant evolution.
Key responsibilities
- Design and operate production-grade MLOps platforms for continuous training, deployment, and monitoring of hundreds of models
- Maintain ultra-high availability (99.99%+) for real-time inference services handling millions of requests per minute
- Implement automated drift detection, model performance monitoring, and rapid rollback/retraining pipelines
- Manage hyperscale backend infrastructure (Kubernetes, Ray, multi-region cloud orchestration, GPU/TPU clusters)
- Build observability stacks that provide deep visibility into model behavior, latency, cost, and system health
- Lead incident response for production ML systems and drive blameless post-mortems
- Partner with engineering teams to productionize new model architectures with zero-downtime strategies
- Optimize inference economics at scale (cost per 1M tokens, GPU utilization >80%)
Requirements
- 5+ years building and operating production ML systems at scale (preferably 10M+ daily users or equivalent inference volume)
- Deep expertise with MLOps tools and patterns: Kubeflow, MLflow, Metaflow, Argo Workflows, Flyte, or similar
- Strong experience with cloud-native infrastructure: Kubernetes, Terraform, Prometheus/Grafana, ELK, or equivalent
- Hands-on knowledge of distributed ML serving (vLLM, TGI, KServe, Seldon, Ray Serve)
- Proven ability to debug complex production issues across model, serving, and infra layers
- Solid software engineering skills (Python; Go/Rust a plus) and comfort with on-call rotations
Nice-to-have
- Experience with large-scale GPU cluster management (Slurm, Kueue, or custom schedulers)
- Background in SRE for ML systems (chaos engineering for models, canary promotions, progressive delivery)
- Familiarity with regulated environments (SOC 2, HIPAA, GDPR)