Research Engineer, Benchmarking, Robotics, DeepMind
Minimum qualifications:
- Bachelor’s degree in Computer Science, Robotics, or equivalent practical experience.
- 2 years of experience with machine learning tools and algorithms, specifically deploying LLMs/VLMs and deep learning models.
- Experience in a technical role (software engineering, AI/ML engineering, or solutions architecture).
- Experience with Python and with modern AI-assisted development tools for accelerating prototyping.
Preferred qualifications:
- Experience with ROS/ROS2 or with on-device deployment constraints (e.g., Jetson, TPU).
- Experience managing large-scale multimodal datasets, time-series telemetry data, or building automated pipelines for hardware-in-the-loop testing.
- Familiarity with the operational realities of modern vision-language-action (VLA) models or behavior-cloning policies, and with their common pitfalls, such as task overfitting.
- A deep-seated interest in the future of embodied AI and a desire to build the testing bedrock for robotics development.
About the job
At Google, research-focused Software Engineers are embedded throughout the company, allowing them to set up large-scale tests and deploy promising ideas quickly and broadly. Ideas may come from internal projects as well as from collaborations with research programs at partner universities and technical institutes all over the world.
From creating experiments and prototyping implementations to designing new architectures, engineers work on real-world problems including artificial intelligence, data mining, natural language processing, hardware and software performance analysis, improving compilers for mobile platforms, as well as core search and much more. But you stay connected to your research roots as an active contributor to the wider research community by partnering with universities and publishing papers.
Our mission is to bring advanced AI into the physical realm by building generalist robots that perceive, reason, and act naturally alongside humans.
Artificial intelligence will be one of humanity’s most transformative inventions. At DeepMind, we are a pioneering AI lab with exceptional interdisciplinary teams focused on advancing AI development to solve complex global challenges and accelerate high-quality product innovation for billions of users. We use our technologies for widespread public benefit and scientific discovery, ensuring safety and ethics are always our highest priority.
US: $147,000 - $211,000 (USD) + bonus (15% target) + equity + benefits
Learn more about benefits at Google.
Responsibilities
- Design, implement, and maintain scalable, robust frameworks to enable large-scale evaluation of robot policies across offline open-loop testing and real-world hardware evaluations.
- Partner with researchers to design benchmark content that maximizes evaluation signal and stress-tests model capabilities.
- Build diagnostic and visualization tools that allow the team to easily root-cause policy failures and track performance regressions.
- Establish evaluation criteria for model releases and own the stability and benchmarking of models slated for critical demos.
- Innovate on how to make real-world hardware evaluation faster, more reproducible, and less reliant on manual human intervention.

