Arxiv Robotics
The proliferation of Large Language Models (LLMs) has fueled a shift in robot learning from automation towards general embodied Artificial Intelligence (AI).
5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer
In this episode of AI Frontiers, we explore 13 groundbreaking arXiv papers from October 5, 2025, in the cs.
Abstract page for arXiv paper 2401.
The recent emergence of neural implicit representations
Reinforcement learning (RL), particularly its combination with deep neural networks referred to as deep RL (DRL), has shown tremendous promise across a wide range of applications,
View a PDF of the paper titled CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation, by Max Fu and 14 other authors
Vision foundation models trained on massive amounts of visual data have shown unprecedented reasoning and planning skills in open-world settings.
It offers a modular design to easily and efficiently create robotic environments with photo
23001: Vision-Language-Action in Robotics: A Survey of Datasets, Benchmarks, and Data Engines
Automatically Update Robotics Papers Daily using GitHub Actions (Update Every 8 Hours)
arXiv is a free distribution service and an open-access archive for nearly 2.
†: Corresponding Author.
We then
These models demonstrate exceptional natural language understanding
Robot behavior includes multiple skills and diverse subsystems.
Such
Robot systems for teleoperation commonly use a spring-like force pulling the follower robot towards the leader's position to track their movements.
Several interfaces
Abstract page for arXiv paper 2603.
From autonomous planning and learning
The increase in available computing power and the Deep Learning revolution have allowed the exploration of new topics and frontiers in Artificial Intelligence research.
This
Fetching Embodied AI Paper from ArXiv automatically - jiangranlv/robotics_arXiv_daily
Self-adaptive robotic systems operate autonomously in dynamic and uncertain environments, requiring robust real-time monitoring and adaptive behaviour.
This paper
Abstract page for arXiv paper 2510.
Comments: This paper has been accepted for publication in the Proceedings of the 2026 4th International Conference on Robotics, Control and Vision Engineering (RCVE 2026), 10-12 July,
The field of Robotics in arXiv roughly includes material in ACM Subject Class I.
AI); Computer Vision and Pattern Recognition (cs.
To address these
Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be
Robot learning holds tremendous promise to unlock the full potential of flexible, general, and dexterous robot systems, as well as to address some of the deepest questions in artificial
1 arXiv:submit/5704135 [cs.
19417: Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications.
Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended
This paper presents an experimental study regarding the use of OpenAI's ChatGPT for robotics applications.
Robotics and AI amplify human potentials, increase productivity
In the rapidly advancing field of robotics, dual-arm coordination and complex object manipulation are essential capabilities for developing advanced autonomous systems.
By aligning features from a
This complexity led to research into a wide range of methods for generating explanations about robot behavior.
We outline a strategy that combines design principles for prompt engineering
Daily selection of frontier Robotics papers: automatically fetches the latest robotics papers from arXiv, with AI-generated Chinese summaries and in-depth interpretation, so you can follow the field's developments in one place.
The human ability to learn, generalize, and control complex manipulation tasks through multi-modality feedback suggests a unique capability, which we refer to as dexterity intelligence.
For the first time, users can effortlessly generate physics- and task-aware robot
Large foundation models have shown strong open-world generalization to complex problems in vision and language, but similar levels of generalization have yet to be achieved in
The dawn of embodied intelligence has ushered in an unprecedented imperative for resilient, cognition-enabled multi-agent collaboration across next-generation ecosystems,
Recent progress in vision-language models (VLMs) has opened new possibilities for robot task planning, but these models often produce incorrect action sequences.
In domains from NLP to Computer Vision,
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific
Project Co-lead.
Adopting foundation models
This blog post's main content, for 2026-05-07, is from Arxiv.
However, the
II Related Work II-A LLM for Robotics The field of robotics research based on LLMs has made significant strides.
We present Gemini Robotics, an advanced Vision-Language
Abstract page for arXiv paper 2604.08519: AtomVLA: Scalable Post-Training for Robotic Manipulation via Predictive Latent World Models
Existing robot policies predominantly adopt the task-centric approach, requiring end-to-end task data collection.
Gemini Robotics-ER (Embodied Reasoning)
Arxiv is one of the most popular websites for academic paper preprints, and it receives a large number of new academic papers every day.
We propose a Real-Sim
As one of the most popular preprint websites for academic papers today, Arxiv sees a large number of new academic papers published every day. In this era of information explosion, many front-line researchers, in order to promptly and effectively follow the academic papers “relevant” to their own research, every day
Dive into the cutting-edge world of AI-driven robotics with this episode of AI Frontiers, synthesizing 24 fresh papers from October 30, 2025, in the cs.
This is made possible because Gemini Robotics builds on top of the Gemini Robotics-ER model, the second model we introduce in this work.
Abstract page for arXiv paper 2510.10903: Towards a Unified Understanding of Robot Manipulation: A Comprehensive Survey
robomimic is a framework for robot learning from demonstration.
03342: Gemini Robotics 1.
Acquiring large-scale, high-fidelity robot demonstration data remains a critical bottleneck for scaling Vision-Language-Action (VLA) models in dexterous manipulation.
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
Robotics: Authors and titles for recent submissions: Mon, 11 May 2026; Fri, 8 May 2026; Thu, 7 May 2026; Wed, 6 May 2026; Tue, 5 May 2026
Unlike traditional robotic
We present Orbit, a unified and modular framework for robot learning powered by NVIDIA Isaac Sim.
Although movement primitives have widespread application to a variety of fields, the goal of this survey is to inform practitioners on the use of these frameworks in the context of robotics.
A new field
Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas.
This results in limited generalization to new tasks and difficulties in
We introduce Future LAtent REpresentation Alignment (FLARE), a novel framework that integrates predictive latent world modeling into robot policy learning.
12202: OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics
The new sensor, introduced in a paper published on the arXiv preprint server, could improve the adaptability and responsiveness of robots, allowing them to tackle more complex
Abstract page for arXiv paper 2502.
To
In this paper, we provide a comprehensive overview of existing scene representation methods for robotics, covering traditional representations such as point clouds, voxels, signed
The recent successes of AI have captured the wildest imagination of both the scientific communities and the general public.
4 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance,
Paper Digest Team analyzes all papers published in this field in the past years, and presents up to 30 most
We demonstrate the system on a TIAGo++ robot and provide an evaluation on a real-world data set of human-robot interaction scenarios, achieving an 82.
Xinyang Gu*, Yen-Jen Wang*, Jianyu Chen†
*: Equal contribution.
†: Corresponding Author.
Detailed and realistic 3D environment representations have been a long-standing goal in the fields of computer vision and robotics.
RO category on arXiv.
RO); Artificial Intelligence (cs.
Action-conditioned video prediction models (often referred to as world models) have shown strong potential for robotics applications, but existing approaches are often slow and struggle to
Discuss, discover, and read arXiv papers.
CV); Human-Computer Interaction (cs.
The key to our
Video generative models are increasingly used as world models for robotics, where a model generates a future visual rollout conditioned on the current observation and task instruction,
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites - GT
In this work, we introduce RoboEngine, the first plug-and-play visual robot data augmentation toolkit.
retrieved from org, every morning
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
This is made possible by two key
The World Action Model (WAM) can jointly predict future world states and actions, exhibiting stronger physical manipulation capabilities compared with traditional models.
Large language models can encode a wealth of semantic knowledge about the world.
Explore trending papers, see recent activity and discussions, and follow authors of arXiv papers on alphaXiv.
A key challenge in applying
Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather
In this report, we introduce Xiaomi-Robotics-0, an advanced vision-language-action (VLA) model optimized for high performance and fast and smooth real-time execution.
It offers a broad set of demonstration datasets collected on robot
We introduce RoLA, a framework that transforms any in-the-wild image into an interactive, physics-enabled robotic environment.
This report introduces a new family of AI models purposefully designed for robotics and built upon the foundation of Gemini 2.
the list of latest papers retrieved from the org paper website, updated automatically and grouped into six broad areas: NLP, CV, ML, AI, IR, and MA. Note: the daily paper data comes from Arxiv.
HC)
This article offers an assessment of what AI for robotics has achieved since the 1990s and proposes a short- and medium-term research roadmap listing challenges and promises.
These papers explore
Awesome-Robotics-Manipulation. About: This repository curates research papers on robot manipulation, featuring a smaller collection of non-learning control
Video generation models are rapidly improving in their ability to synthesize human actions in novel contexts, holding the potential to serve as high-level planners for contextual robot control.
The website curates the latest papers in the field of robotics,
Unlike previous methods, RoLA operates directly
We present RoboGen, a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation.
21257: RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Recent advances in generalist robot manipulation leverage pre-trained Vision-Language Models (VLMs) and large-scale robot demonstrations to tackle diverse tasks in a zero-shot manner.
Recent developments in Large Language Models pre-trained on extensive corpora have shown significant success in various natural language processing tasks with minimal fine-tuning.
39% success rate over our benchmark data set,
Among its core challenges, robot manipulation stands out as a fundamental yet intricate problem, requiring the seamless integration of perception, planning, and control to enable interaction
Subjects: Robotics (cs.
Humanoid-Gym is an
With this control strategy, the tracking accuracy
RO] 2 Jul 2024 tasks), the robot should determine which tasks can be performed, which of its own skills to trigger to attempt them, and when it should rely on human
15469: RoCo Challenge at AAAI 2026: Benchmarking Robotic Collaborative Manipulation for Assembly Towards Industrial Automation
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic
In particular,
RO category, showcasing advancements in robotics.
RoboGen leverages the latest advancements in foundation
We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot.