Event
Paper presents a multimodal LLM-based HRI framework that controls a Dobot Magician arm with speech and video
Key points
- The paper presents a multimodal human-robot interaction framework for controlling a Dobot Magician robotic arm.
- The system combines Florence-2 object detection, Llama 3.1 language understanding, Whisper speech recognition, and fuzzy logic for spoken object-manipulation commands.
- Experiments on consumer-grade hardware report 75 percent command-execution accuracy.
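One step implied by the points above is mapping a spoken object name onto what the camera currently sees. The following is a minimal sketch of that step only, not the paper's implementation. It assumes the speech and language stages have already produced a requested object name as plain text and that the vision stage has produced labeled bounding boxes. It uses fuzzy string matching from Python's standard library as a stand-in for the paper's fuzzy-logic component, whose rules are not described here. The names `Detection`, `resolve_target`, and `box_center`, and the 0.6 threshold, are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """One detected object (hypothetical structure, not a real model output type)."""
    label: str                       # object name reported by the vision stage
    box: Tuple[int, int, int, int]   # (x1, y1, x2, y2) in image pixels


def label_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1], used as a match degree."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_target(requested: str,
                   detections: List[Detection],
                   threshold: float = 0.6) -> Optional[Detection]:
    """Return the detection whose label best matches the requested name,
    or None when nothing reaches the (assumed) threshold."""
    best = max(detections,
               key=lambda d: label_similarity(requested, d.label),
               default=None)
    if best is None or label_similarity(requested, best.label) < threshold:
        return None
    return best


def box_center(box: Tuple[int, int, int, int]) -> Tuple[float, float]:
    """Pixel center of a bounding box, a natural pick point to hand downstream."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


# Example: a slightly mis-transcribed name still resolves to the right object.
detections = [
    Detection("red cube", (10, 20, 50, 60)),
    Detection("blue cylinder", (100, 40, 140, 120)),
]
target = resolve_target("red cub", detections)
missing = resolve_target("banana", detections)
```

Tolerating near matches matters because transcription errors and label wording differences are common; returning `None` for poor matches lets the caller ask the user to repeat the command instead of moving the arm to the wrong object.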
Company context
Dobot is an industrial automation company whose core strength is collaborative robotic arms (cobots) and which is expanding into humanoid platforms.
Context
- Company: Dobot
- Segment: Industrial
- Event type: Research Publication
- Geography: Shenzhen, China