Neural-NPT: A Reinforcement Learning Perspective to Dynamic Non-Prehensile Object Transportation

Abdullah Mustafa

AIST

Ryo Hanai

AIST

Ixchel G. Ramirez-Alpizar

AIST

Floris Erich

AIST

Ryoichi Nakajo

AIST

Yukiyasu Domae

AIST

Tetsuya Ogata

Waseda University

Scientific Reports

Corresponding Author: am-mustafa@aist.go.jp

Abstract

This work proposes Neural-NPT, a learning-based approach to dynamic non-prehensile object transportation (NPT) that enables rapid planning of fast-reaching and robust trajectories. While model-based approaches start from a strong dynamic grasping assumption and optimize trajectories under the resulting constraints, this optimization process is prone to failure, slow convergence, or conservative suboptimal solutions. To address these limitations, we propose a motion-planning neural policy learned via reinforcement learning. The task is formulated as a Markov decision process, with a carefully designed observation space, reward formulation, and an acceleration-based action space for smooth trajectory generation and sim-to-real transferability. Prior to training, the sim-to-real robot dynamics gap was minimized through system identification. We randomized object size, friction, mass, center of mass, and initial pose during training for improved generalization and robustness. By curating the randomization process, different policies are obtained, including an “Optimal” policy with full observability and a “Robust” policy under partial observability of object inertia and pose. Through policy roll-out, fast and robust trajectories are planned offline and successfully deployed in both simulated and physical environments. Both our “Optimal” and “Robust” policies achieved higher peak velocities and accelerations, and shorter reaching times, than the model-based baseline.
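To illustrate why an acceleration-based action space yields smooth trajectories, here is a minimal sketch (function names, limits, and time step are assumptions for illustration, not the paper's implementation): the policy emits an acceleration command each control step, which is double-integrated into velocity and position, so the velocity profile is continuous by construction.

```python
import numpy as np

def rollout_accel_actions(accels, dt=0.01, v_max=1.0, a_max=5.0):
    """Double-integrate per-step acceleration commands (hypothetical
    policy outputs) into a smooth 1-D velocity/position trajectory.

    accels : sequence of scalar acceleration actions [m/s^2]
    Returns (positions, velocities) arrays, one entry per step.
    """
    v, x = 0.0, 0.0
    xs, vs = [], []
    for a in accels:
        a = float(np.clip(a, -a_max, a_max))          # actuator limit
        v = float(np.clip(v + a * dt, -v_max, v_max)) # velocity limit
        x = x + v * dt                                # integrate position
        xs.append(x)
        vs.append(v)
    return np.array(xs), np.array(vs)

# Constant acceleration for 100 steps: velocity ramps linearly until it
# saturates at v_max, position grows monotonically -- no velocity jumps.
xs, vs = rollout_accel_actions([2.0] * 100)
```

Because the action is an acceleration rather than a position or velocity target, consecutive commands cannot introduce discontinuities in the executed velocity, which is what makes the rolled-out trajectories directly executable on hardware.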

Real-world Validation

------------------------------

Transport Anything

Randomly swapping objects during execution of the Optimal policy, which was optimized for the WoodBlock. The learned trajectory remains stable under arbitrary objects.

Optimal Policy [32/32]

Optimal Vs. Optimal-DG

Optimal is faster but may be less stable during long-duration operation.

--------------- (WoodBlock) ---------------

Optimal Policy — N.C.S. [32/32]
Optimal-DG Policy — N.C.S. [32/32]
Upright Baseline — N.C.S. [29/32]

--------------- (Chips) ---------------

Optimal Policy — N.C.S. [32/32]
Optimal-DG Policy — N.C.S. [32/32]
Upright Baseline — N.C.S. [31/32]

--------------- (CrackerBox) ---------------

Optimal Policy — N.C.S. [32/32]
Optimal-DG Policy — N.C.S. [32/32]
Upright Baseline — N.C.S. [28/32]

--------------- (BleachCleanser) ---------------

Optimal Policy — S.R. [32/32]
Optimal-DG Policy — S.R. [32/32]
Upright Baseline — S.R. [30/32]
Optimal Policy — N.C.S. [12/32]
Optimal-DG Policy — N.C.S. [22/32]
Upright Baseline — N.C.S. [30/32]

--------------- (TallBox-Center) ---------------

Optimal Policy — S.R. [30/32]
Optimal-DG Policy — S.R. [32/32]
Upright Baseline — S.R. [30/32]
Optimal Policy — N.C.S. [14/32]
Optimal-DG Policy — N.C.S. [32/32]
Upright Baseline — N.C.S. [24/32]

COM-Robust Vs. Optimal

For objects with inertia uncertainty, the COM-Robust policy is more stable, albeit slower.

--------------- (Power Drill) ---------------

COM-Robust Policy — S.R. [14/32]
Optimal Policy — S.R. [0/32]
Upright-Robust Baseline — S.R. [24/32]
COM-Robust Policy — N.C.S. [14/32]
Optimal Policy — N.C.S. [0/32]
Upright-Robust Baseline — N.C.S. [*/32]

--------------- (Pitcher) ---------------

COM-Robust Policy — S.R. [32/32]
Optimal Policy — S.R. [28/32]
Upright-Robust Baseline — S.R. [24/32]
COM-Robust Policy — N.C.S. [25/32]
Optimal Policy — N.C.S. [2/32]
Upright-Robust Baseline — N.C.S. [*/32]

--------------- (TallBox-Top) ---------------

COM-Robust Policy — S.R. [32/32]
Optimal Policy — S.R. [0/32]
Upright-Robust Baseline — S.R. [13/32]
COM-Robust Policy — N.C.S. [32/32]
Optimal Policy — N.C.S. [0/32]
Upright-Robust Baseline — N.C.S. [*/32]

Multi-Robust Vs. Optimal

For objects with pose uncertainty (i.e., multi-object transportation), the Multi-Robust policy is more stable, albeit slower.

--------------- (Noodles) ---------------

Multi-Robust Policy — S.R. [32/32]
Optimal Policy — S.R. [18/32]
Multi-Robust Policy — N.C.S. [32/32]
Optimal Policy — N.C.S. [5/32]

--------------- (Bottles) ---------------

Multi-Robust Policy — S.R. [32/32]
Optimal Policy — S.R. [0/32]
Multi-Robust Policy — N.C.S. [32/32]
Optimal Policy — N.C.S. [0/32]

Friction Coefficient Variations

We test the performance of the Optimal policy on two surfaces: rubber (μ=0.8) and paper (μ=0.4). Different trajectories are generated by varying the friction value given as input to the policy. It is always safer to assume low friction (μ=0.1): failures on the high-friction surface (rubber) are due to tilting, while failures on the low-friction surface (paper) are due to slippage.
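The "assume low friction" rule of thumb can be illustrated with the standard Coulomb slip condition (a simplified model covering slippage only, not tilting, and not the paper's actual planner): an object carried on a flat tray slips when the commanded horizontal acceleration exceeds μg. A planner that assumes friction μ_assumed therefore caps acceleration at μ_assumed·g, which is slip-safe on any surface whose true friction is at least the assumed value.

```python
G = 9.81  # gravitational acceleration [m/s^2]

def slips(accel, mu_actual, g=G):
    """Coulomb model: the object slides if the available friction force
    mu*m*g cannot supply the required lateral force m*accel."""
    return abs(accel) > mu_actual * g

def planned_peak_accel(mu_assumed, g=G):
    """Peak horizontal acceleration a slip-aware planner would command
    when it believes the surface friction coefficient is mu_assumed."""
    return mu_assumed * g

# A trajectory planned for mu=0.1 stays below the slip limit of both
# rubber (mu=0.8) and paper (mu=0.4) ...
assert not slips(planned_peak_accel(0.1), mu_actual=0.8)
assert not slips(planned_peak_accel(0.1), mu_actual=0.4)
# ... while one planned for mu=0.5 exceeds the paper surface's limit.
assert slips(planned_peak_accel(0.5), mu_actual=0.4)
```

This matches the trend in the table below: higher assumed friction buys faster motion but fails on paper first, while μ=0.1 succeeds on both surfaces.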

--------------- (μ=0.1) ---------------

Rubber — S.R. [32/32]
Paper — S.R. [32/32]
Rubber — N.C.S. [32/32]
Paper — N.C.S. [13/32]

--------------- (μ=0.2) ---------------

Rubber — S.R. [32/32]
Paper — S.R. [32/32]
Rubber — N.C.S. [32/32]
Paper — N.C.S. [4/32]

--------------- (μ=0.3) ---------------

Rubber — S.R. [32/32]
Paper — S.R. [25/32]
Rubber — N.C.S. [29/32]
Paper — N.C.S. [2/32]

--------------- (μ=0.4) ---------------

Rubber — S.R. [32/32]
Paper — S.R. [16/32]
Rubber — N.C.S. [4/32]
Paper — N.C.S. [1/32]

--------------- (μ=0.5) ---------------

Rubber — S.R. [25/32]
Paper — S.R. [11/32]
Rubber — N.C.S. [5/32]
Paper — N.C.S. [1/32]

--------------- (μ=1.0) ---------------

Rubber — S.R. [22/32]
Paper — S.R. [8/32]
Rubber — N.C.S. [5/32]
Paper — N.C.S. [1/32]

Extra Footage

NoodlesTower

Multiple stacked objects

Multi-Robust Policy — N.C.S. [8/32]

Safe Self-Collision in Upright-Robust Baseline PyBullet Simulation

Despite succeeding in simulation, many real trajectories failed under the Upright-Robust baseline. Because of differing contact models and approximated geometry, the collision caused only moderate sliding in simulation. On hardware, the object (a pitcher) did not slide; instead it tilted aggressively before falling.

Simulation trajectories

BibTeX citation

    @article{neuralNPT_sr2026,
      author  = {Abdullah Mustafa and Ryo Hanai and Ixchel G. Ramirez-Alpizar and Floris Erich and Ryoichi Nakajo and Yukiyasu Domae and Tetsuya Ogata},
      title   = {Neural-NPT: A Reinforcement Learning Perspective to Dynamic Non-Prehensile Object Transportation},
      journal = {Scientific Reports},
      year    = {2026},
    }