Neural-NPT: A Reinforcement Learning Perspective to Dynamic Non-Prehensile Object Transportation

Abdullah Mustafa

AIST

Ryo Hanai

AIST

Ixchel G. Ramirez-Alpizar

AIST

Floris Erich

AIST

Ryoichi Nakajo

AIST

Yukiyasu Domae

AIST

Tetsuya Ogata

Waseda University

Scientific Reports

Corresponding Author: am-mustafa@aist.go.jp

Abstract

This work proposes Neural-NPT, a learning-based approach to dynamic non-prehensile object transportation (NPT) that enables rapid planning of fast-reaching and robust trajectories. While model-based approaches start from a strong dynamic grasping assumption and optimize trajectories under the resulting constraints, this optimization process is prone to failure, slow convergence, or conservative suboptimal solutions. To address these limitations, we propose a motion-planning neural policy learned via reinforcement learning. The task is formulated as a Markov decision process, with a carefully designed observation space, reward formulation, and an acceleration-based action space for smooth trajectory generation and sim-to-real transferability. Prior to training, the sim-to-real robot dynamics gap was minimized through system identification. We randomized object size, friction, mass, center of mass, and initial pose during training for improved generalization and robustness. By curating the randomization process, different policies are obtained, including an “Optimal” policy with full observability and a “Robust” policy under partial observability of object inertia and pose. Through policy roll-out, fast and robust trajectories are planned offline and successfully deployed in both simulated and physical environments. Both our “Optimal” and “Robust” policies achieved higher peak velocities and accelerations, and shorter reaching times, than the model-based baseline.
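To illustrate why an acceleration-based action space yields smooth trajectories, here is a minimal sketch (function names, limits, and time step are assumptions for illustration, not the paper's implementation): the policy emits an acceleration command each control step, which is double-integrated into velocity and position, so the velocity profile is continuous by construction.

```python
import numpy as np

def rollout_accel_actions(accels, dt=0.01, v_max=1.0, a_max=5.0):
    """Double-integrate per-step acceleration commands (hypothetical
    policy outputs) into a smooth 1-D velocity/position trajectory.

    accels : sequence of scalar acceleration actions [m/s^2]
    Returns (positions, velocities) arrays, one entry per step.
    """
    v, x = 0.0, 0.0
    xs, vs = [], []
    for a in accels:
        a = float(np.clip(a, -a_max, a_max))          # actuator limit
        v = float(np.clip(v + a * dt, -v_max, v_max)) # velocity limit
        x = x + v * dt                                # integrate position
        xs.append(x)
        vs.append(v)
    return np.array(xs), np.array(vs)

# Constant acceleration for 100 steps: velocity ramps linearly until it
# saturates at v_max, position grows monotonically -- no velocity jumps.
xs, vs = rollout_accel_actions([2.0] * 100)
```

Because the action is an acceleration rather than a position or velocity target, consecutive commands cannot introduce discontinuities in the executed velocity, which is what makes the rolled-out trajectories directly executable on hardware.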

Real-world Validation

------------------------------

Transport Anything

Randomly swapping objects during execution of the Optimal policy, which was optimized for the WoodBlock. The learned trajectory remains stable under arbitrary objects.

Optimal Policy [32/32]

Optimal Vs. Optimal-DG

Optimal is faster but may be less stable during long-duration operation.

--------------- (WoodBlock) ---------------

Optimal Policy — N.C.S. [32/32]
Optimal-DG Policy — N.C.S. [32/32]
Upright Baseline — N.C.S. [29/32]

--------------- (Chips) ---------------

Optimal Policy — N.C.S. [32/32]
Optimal-DG Policy — N.C.S. [32/32]
Upright Baseline — N.C.S. [31/32]

--------------- (CrackerBox) ---------------

Optimal Policy — N.C.S. [32/32]
Optimal-DG Policy — N.C.S. [32/32]
Upright Baseline — N.C.S. [28/32]

--------------- (BleachCleanser) ---------------

Optimal Policy — S.R. [32/32]
Optimal-DG Policy — S.R. [32/32]
Upright Baseline — S.R. [30/32]
Optimal Policy — N.C.S. [12/32]
Optimal-DG Policy — N.C.S. [22/32]
Upright Baseline — N.C.S. [30/32]

--------------- (TallBox-Center) ---------------

Optimal Policy — S.R. [30/32]
Optimal-DG Policy — S.R. [32/32]
Upright Baseline — S.R. [30/32]
Optimal Policy — N.C.S. [14/32]
Optimal-DG Policy — N.C.S. [32/32]
Upright Baseline — N.C.S. [24/32]

COM-Robust Vs. Optimal

For objects with inertia uncertainty, the COM-Robust policy is more stable, albeit slower.

--------------- (Power Drill) ---------------

COM-Robust Policy — S.R. [14/32]
Optimal Policy — S.R. [0/32]
Upright-Robust Baseline — S.R. [24/32]
COM-Robust Policy — N.C.S. [14/32]
Optimal Policy — N.C.S. [0/32]
Upright-Robust Baseline — N.C.S. [*/32]

--------------- (Pitcher) ---------------

COM-Robust Policy — S.R. [32/32]
Optimal Policy — S.R. [28/32]
Upright-Robust Baseline — S.R. [24/32]
COM-Robust Policy — N.C.S. [25/32]
Optimal Policy — N.C.S. [2/32]
Upright-Robust Baseline — N.C.S. [*/32]

--------------- (TallBox-Top) ---------------

COM-Robust Policy — S.R. [32/32]
Optimal Policy — S.R. [0/32]
Upright-Robust Baseline — S.R. [13/32]
COM-Robust Policy — N.C.S. [32/32]
Optimal Policy — N.C.S. [0/32]
Upright-Robust Baseline — N.C.S. [*/32]

Multi-Robust Vs. Optimal

For objects with pose uncertainty (i.e., multi-object transportation), the Multi-Robust policy is more stable, albeit slower.

--------------- (Noodles) ---------------

Multi-Robust Policy — S.R. [32/32]
Optimal Policy — S.R. [18/32]
Multi-Robust Policy — N.C.S. [32/32]
Optimal Policy — N.C.S. [5/32]

--------------- (Bottles) ---------------

Multi-Robust Policy — S.R. [32/32]
Optimal Policy — S.R. [0/32]
Multi-Robust Policy — N.C.S. [32/32]
Optimal Policy — N.C.S. [0/32]

Friction Coefficient Variations

We test the performance of the Optimal policy on two surfaces: rubber (μ=0.8) and paper (μ=0.4). Different trajectories are generated by varying the friction value given as input to the policy. It is always safer to assume low friction (μ=0.1): failures on the high-friction surface (rubber) are due to tilting, while failures on the low-friction surface (paper) are due to slippage.
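The "assume low friction" rule of thumb can be illustrated with the standard Coulomb slip condition (a simplified model covering slippage only, not tilting, and not the paper's actual planner): an object carried on a flat tray slips when the commanded horizontal acceleration exceeds μg. A planner that assumes friction μ_assumed therefore caps acceleration at μ_assumed·g, which is slip-safe on any surface whose true friction is at least the assumed value.

```python
G = 9.81  # gravitational acceleration [m/s^2]

def slips(accel, mu_actual, g=G):
    """Coulomb model: the object slides if the available friction force
    mu*m*g cannot supply the required lateral force m*accel."""
    return abs(accel) > mu_actual * g

def planned_peak_accel(mu_assumed, g=G):
    """Peak horizontal acceleration a slip-aware planner would command
    when it believes the surface friction coefficient is mu_assumed."""
    return mu_assumed * g

# A trajectory planned for mu=0.1 stays below the slip limit of both
# rubber (mu=0.8) and paper (mu=0.4) ...
assert not slips(planned_peak_accel(0.1), mu_actual=0.8)
assert not slips(planned_peak_accel(0.1), mu_actual=0.4)
# ... while one planned for mu=0.5 exceeds the paper surface's limit.
assert slips(planned_peak_accel(0.5), mu_actual=0.4)
```

This matches the trend in the table below: higher assumed friction buys faster motion but fails on paper first, while μ=0.1 succeeds on both surfaces.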

--------------- (μ=0.1) ---------------

Rubber — S.R. [32/32]
Paper — S.R. [32/32]
Rubber — N.C.S. [32/32]
Paper — N.C.S. [13/32]

--------------- (μ=0.2) ---------------

Rubber — S.R. [32/32]
Paper — S.R. [32/32]
Rubber — N.C.S. [32/32]
Paper — N.C.S. [4/32]

--------------- (μ=0.3) ---------------

Rubber — S.R. [32/32]
Paper — S.R. [25/32]
Rubber — N.C.S. [29/32]
Paper — N.C.S. [2/32]

--------------- (μ=0.4) ---------------

Rubber — S.R. [32/32]
Paper — S.R. [16/32]
Rubber — N.C.S. [4/32]
Paper — N.C.S. [1/32]

--------------- (μ=0.5) ---------------

Rubber — S.R. [25/32]
Paper — S.R. [11/32]
Rubber — N.C.S. [5/32]
Paper — N.C.S. [1/32]

--------------- (μ=1.0) ---------------

Rubber — S.R. [22/32]
Paper — S.R. [8/32]
Rubber — N.C.S. [5/32]
Paper — N.C.S. [1/32]

Extra Footage

NoodlesTower

Multiple stacked objects

Multi-Robust Policy — N.C.S. [8/32]

Safe Self-Collision in Upright-Robust Baseline PyBullet Simulation

Despite succeeding in simulation, many real trajectories failed under the Upright-Robust baseline. Because of differing contact models and approximated geometry, the collision caused only moderate sliding in simulation. On hardware, the object (a pitcher) did not slide; instead it tilted aggressively before falling.

Simulation trajectories

BibTeX citation

    @article{neuralNPT_sr2026,
      author  = {Abdullah Mustafa and Ryo Hanai and Ixchel G. Ramirez-Alpizar and Floris Erich and Ryoichi Nakajo and Yukiyasu Domae and Tetsuya Ogata},
      title   = {Neural-NPT: A Reinforcement Learning Perspective to Dynamic Non-Prehensile Object Transportation},
      journal = {Scientific Reports},
      year    = {2026},
    }