
The Fraunhofer Institute for Manufacturing Engineering and Automation IPA has developed a comprehensive benchmark for the standardized analysis of humanoid robots. For the first time, manufacturers and end users can now have the actual capabilities, safety and suitability of these robots objectively evaluated by a neutral body. The modular benchmark comprises six application-relevant criteria and is based on internationally recognized industry standards.
From media presence to realistic evaluation
Humanoid robots are omnipresent in the media and fascinate people with their human-like appearance. However, there is a huge gap between spectacular staging and actual capabilities. “For end users and manufacturers, it is essential to take a look behind the façade sometimes created by marketing agencies,” explains Simon Schmidt, Head of the Automated Systems business unit at Fraunhofer IPA. “The market is too volatile and non-transparent to be able to assess and reliably evaluate humanoids for their own applications.”
What is the benchmark?
The benchmark is a standardized service in which Fraunhofer IPA research teams guide humanoid robots through various challenges and scientifically evaluate the results. The basis for this was created thanks to funding from the Baden-Württemberg Ministry of Economics, Labor and Tourism as part of the AI Progress Center “Learning Systems and Cognitive Robotics”.
The modular structure of the benchmark lets manufacturers, end users and software providers test specifically those areas that are relevant to their application. Where possible, the benchmarking is based on established industry standards that have been internationally recognized for decades – for example ISO 14644 for cleanroom suitability, or ISO 10218 and ISO/TS 15066 for functional safety.
The benchmark is divided into six central areas:
1. Technologies and basic capabilities: Examination of installed sensors, AI models and gripper types, as well as tests on walking speed, gripping forces and manageable loads. Objective measured values are recorded using a 3D tracking system and force sensors.
2. Complex capabilities: Assessment of practical generic tasks such as walking up stairs, overcoming obstacles, movement and force accuracy and reaction speed. The tests are deliberately designed to be demanding in order to make future model generations comparable.
3. Cleanroom suitability: Evaluation of particle release in accordance with ISO 14644-14, outgassing behavior and cleanability – crucial for use in the semiconductor, pharmaceutical or food industries.
4. Functional safety: Central to human-robot collaboration. Stability on different surfaces, force limitation in the event of collisions, obstacle detection and system behavior in the event of failures are tested. Collision tests are carried out using the same force sensors as for collaborative industrial robots.
5. Cybersecurity: Four modules test vulnerability management, secure life cycle, network security and penetration resistance – a critical factor in view of increasing legal requirements.
6. Energy efficiency: Measurement of battery life and power consumption in various scenarios (standing, walking, walking with incline and load). The results enable realistic deployment planning and optimization of charging cycles.
Fraunhofer IPA applied the benchmark comprehensively for the first time using the Unitree G1 as an example. The technical basis was a Unitree G1 EDU-4 with Dex3-1 3-finger hands and firmware version 1.04 delivered in May 2025.
While the robot showed good self-stabilization and could be suitable for ISO class 5 cleanrooms, clear limitations also became apparent. Forces in excess of 500 newtons can occur during collisions – well above the pain thresholds permitted by the standard. In addition, the researchers identified a critical Bluetooth security gap in the software version available at the time of testing, which allowed attackers to take full remote control of the robot. This vulnerability has since been rectified. In terms of energy efficiency, the maximum operating times on one battery charge were 2 hours and 49 minutes when standing and 1 hour and 49 minutes in a typical scenario involving standing and walking.
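To illustrate how such runtime figures can feed into deployment planning, the following minimal sketch estimates how many battery charges would be needed to cover a shift. Only the two runtimes are taken from the results above. The 8-hour shift, the function name charges_needed and the simplification that charging or swapping time and battery aging are ignored are assumptions made for illustration, not part of the Fraunhofer IPA benchmark.

```python
import math
from datetime import timedelta

# Runtimes on one battery charge, as reported for the Unitree G1 above.
RUNTIME_STANDING = timedelta(hours=2, minutes=49)
RUNTIME_MIXED = timedelta(hours=1, minutes=49)  # standing and walking

# Assumption for illustration only: a single 8-hour shift.
SHIFT = timedelta(hours=8)


def charges_needed(shift: timedelta, runtime: timedelta) -> int:
    """Return the number of full battery charges needed to cover a shift.

    Hypothetical helper: ignores charging/swapping time and battery aging.
    """
    if runtime <= timedelta(0):
        raise ValueError("runtime must be positive")
    # Dividing two timedeltas yields a float ratio; round up to whole charges.
    return math.ceil(shift / runtime)


if __name__ == "__main__":
    print("Charges per shift (standing):", charges_needed(SHIFT, RUNTIME_STANDING))
    print("Charges per shift (mixed):   ", charges_needed(SHIFT, RUNTIME_MIXED))
```

Under these assumptions, the mixed scenario would require more charges per shift than pure standing, which is the kind of difference that matters when planning charging cycles.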
Relevance of the benchmark for companies
“Users can interpret the results directly and thus find the right humanoid for the right application,” emphasizes Werner Kraus, Head of Research at Fraunhofer IPA. The benchmark makes humanoids comparable not only with each other, but also with proven automation components. This is particularly important because:
- demographic change is forcing the use of automation in previously manual areas
- major investment decisions require a well-founded, objective basis for evaluation
- safety standards for humanoids are not expected until 2028 (ISO 25785-1)
- regulatory requirements for cybersecurity are increasing
- sensitive production environments require reliable data to prevent contamination
The benchmark provides transparency in a non-transparent market and enables companies to develop realistic expectations and minimize risks.
Fraunhofer IPA plans to test further humanoids and build up a comparative database. Manufacturers and users can now commission anything from individual benchmark modules to full-scale studies and benefit from the existing infrastructure and expertise.
– – – – – –
Further links
👉 www.ipa.fraunhofer.de
Photo: Fraunhofer IPA/ Rainer Bez