We create the INSTRUCT-GRASP dataset based on the Cornell Grasping Dataset. It includes three components, Non-Instruction, Single-Object, and Multi-Object, covering 8 kinds of instructions. The dataset contains 1.8 million grasping samples: 250k unique language-image non-instruction samples and 1.56 million instruction-following samples. Of the instruction-following samples, 654k come from single-object scenes and 654k from multi-object scenes.
- Purpose: Existing grasping datasets lack language instructions and focus only on visual information.
- Total Size: Non-Instruction: 250k; Instruction-Following: 1.56M (654k for single-object, 654k for multi-object)
- Variety: 8 instruction types (Name, Shape, Color, Purpose, Position, Angle, Part, Strategy)
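
To make the sample layout concrete, the snippet below sketches what a single instruction-following entry might look like. This is a minimal illustration only: the field names, file path, and values are assumptions for readability, not the dataset's actual schema.

```python
# Hypothetical layout of one INSTRUCT-GRASP instruction-following sample.
# All keys and values here are illustrative assumptions.
sample = {
    "image": "cornell/pcd0100r.png",      # example image path from the Cornell Grasping Dataset
    "scene": "multi",                      # one of: "non" (no instruction), "single", "multi"
    "instruction_type": "Color",           # one of the 8 types: Name, Shape, Color, Purpose,
                                           # Position, Angle, Part, Strategy
    "instruction": "Grasp the red mug.",   # natural-language command (illustrative)
    "grasp": {                             # Cornell-style grasp rectangle label
        "center": [320.0, 240.0],          # (x, y) center in pixels
        "width": 55.0,                     # gripper opening width in pixels
        "height": 30.0,                    # rectangle height in pixels
        "angle": 0.35,                     # in-plane rotation in radians
    },
}
```

Non-instruction samples would simply omit the instruction fields and keep the image and grasp annotation.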