We’re redefining how AI is built and deployed—making powerful technology accessible to everyone. Our lean, fast-moving team thrives on collaboration, efficiency, and creative problem-solving. We’re looking for driven, thoughtful individuals who bring strong work ethic and curiosity, and who want to help remove barriers and create meaningful impact. Join us to grow your career in an environment that supports both personal excellence and team success.
Interested? Submit your CV for consideration to Careers@umt.llc.
1. Machine Learning Engineer – Model Development & Deployment
Location: Palo Alto, CA
Type: Full-time
Level: Senior / Staff
Department: AI Engineering
About the role
We're looking for exceptional Machine Learning Engineers who excel at turning frontier foundation models into production-grade, customer-facing intelligence that serves millions of users daily.
You'll own the full lifecycle of custom ML solutions: from architectural design and heavy fine-tuning of state-of-the-art LLMs & multimodal models, through rigorous evaluation, to reliable deployment into mission-critical customer products.
This is hands-on work on systems that already power massive-scale inference — and you'll help make them even better.
Key responsibilities
- Design, fine-tune, and adapt frontier foundation models (LLMs, vision-language, multimodal) for domain-specific enterprise use cases
- Develop custom training pipelines using the latest distributed training techniques (DeepSpeed, FSDP, Megatron, etc.)
- Implement advanced evaluation frameworks, including automated red-teaming, safety alignment, and performance benchmarking
- Collaborate closely with product teams to integrate models into real-time customer-facing applications
- Optimize for cost, latency, and throughput across cloud and edge environments
- Contribute to internal research prototypes that later become client solutions
- Mentor junior engineers and help raise the technical bar across the team
Requirements
- 5+ years of hands-on experience training and fine-tuning large-scale ML models in production environments
- Deep expertise with modern frameworks: PyTorch (preferred), JAX, Hugging Face Transformers, vLLM, TGI, or similar
- Proven track record deploying models that serve 1M+ daily requests or process billions of inferences
- Strong software engineering fundamentals (clean code, CI/CD, testing, observability)
- Experience with at least one of: PEFT/LoRA/QLoRA, RLHF/DPO, distillation, or mixture-of-experts architectures
- Bachelor's or Master's in CS, ML, or related field (PhD a plus)
Nice-to-have
- Experience with multimodal (vision + language) fine-tuning
- Familiarity with production inference engines (TensorRT-LLM, ONNX, Triton)
- Contributions to open-source ML projects
2. MLOps / ML Systems Engineer – Production Reliability at Hyperscale
Location: Palo Alto, CA
Type: Full-time
Level: Senior / Staff
Department: AI Operations & Infrastructure
About the role
We operate some of the most demanding production ML systems in the industry — serving millions of users with computer vision and LLM-powered services that require near-perfect availability.
As an MLOps / ML Systems Engineer, you will own the reliability, scalability, and observability of these hyperscale AI backends. You will design, build, and operate the invisible infrastructure that keeps mission-critical models running 24/7/365 — even under extreme load and constant evolution.
Key responsibilities
- Design and operate production-grade MLOps platforms for continuous training, deployment, and monitoring of hundreds of models
- Maintain ultra-high availability (99.99%+) for real-time inference services handling millions of requests per minute
- Implement automated drift detection, model performance monitoring, and rapid rollback/retraining pipelines
- Manage hyperscale backend infrastructure (Kubernetes, Ray, multi-region cloud orchestration, GPU/TPU clusters)
- Build observability stacks that provide deep visibility into model behavior, latency, cost, and system health
- Lead incident response for production ML systems and drive blameless post-mortems
- Partner with engineering teams to productionize new model architectures with zero-downtime strategies
- Optimize inference economics at scale (cost per 1M tokens, GPU utilization >80%)
Requirements
- 5+ years building and operating production ML systems at scale (preferably 10M+ daily users or equivalent inference volume)
- Deep expertise with MLOps tools and patterns: Kubeflow, MLflow, Metaflow, Argo Workflows, Flyte, or similar
- Strong experience with cloud-native infrastructure: Kubernetes, Terraform, Prometheus/Grafana, ELK, or equivalent
- Hands-on knowledge of distributed ML serving (vLLM, TGI, KServe, Seldon, Ray Serve)
- Proven ability to debug complex production issues across model, serving, and infra layers
- Solid software engineering skills (Python; Go/Rust a plus) and comfort with on-call rotations
Nice-to-have
- Experience with large-scale GPU cluster management (Slurm, Kueue, or custom schedulers)
- Background in SRE for ML systems (chaos engineering for models, canary promotions, progressive delivery)
- Familiarity with regulated environments (SOC 2, HIPAA, GDPR)