GPT-Tactile is a collection of scripts for visuo-tactile object exploration and classification using a DIGIT tactile sensor and the OpenAI API. The project is designed to:
- Allow GPT to interactively instruct a human to probe objects with a DIGIT sensor, collect tactile images, and iteratively refine object predictions.
- Evaluate both open-ended and multi-choice prompt strategies.
- Assess GPT's ability to classify objects from tactile data, both in static (batch) and interactive (exploratory) settings.
- Operating System: Linux (DIGIT sensors are supported on Linux only)
- Tested Environment: Ubuntu 22.04, Python 3.13.5
- Python Environment: Regular Python or Anaconda environment
- Packages: digit-interface, openai
git clone https://github.com/gemixin/gpt-tactile.git
cd gpt-tactile
- (Optional) Set up a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
- Install required packages:
python3 -m pip install -r requirements.txt
Create a new conda environment using the provided environment.yml:
conda env create -f environment.yml
conda activate gpt-tactile
You will need your own OpenAI API key, exported as an environment variable. See the official OpenAI documentation for details.
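As a minimal sketch, a script might check for the key at startup, assuming the standard OPENAI_API_KEY variable that the openai library reads by default (get_api_key is a hypothetical helper, not part of this repo):

```python
import os

def get_api_key() -> str:
    # Fail fast with a clear message if the key is missing.
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the scripts."
        )
    return key
```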
The script sends GPT a collection of six previously collected tactile images for five different objects (see static/images/). It does this for both prompt types (multi-choice and open-ended).
It stores the classification predictions in the static folder.
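The tactile images are presumably passed to GPT as base64-encoded image content in the chat payload; a rough sketch of that step, assuming the OpenAI vision-style message format (build_image_message is a hypothetical helper, not code from this repo):

```python
import base64

def encode_image(path: str) -> str:
    # Read an image file and return its base64 string.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def build_image_message(prompt_text: str, b64_png: str) -> dict:
    # One user turn containing the prompt plus an embedded tactile image.
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt_text},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64_png}"}},
        ],
    }
```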
python3 static_classification.py
An interactive script where GPT guides the user to move the DIGIT sensor and capture images, aiming to classify the object.
It stores the conversation history and captured tactile images in the simple_active folder.
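The guided-exploration loop can be sketched roughly as below; ask_model stands in for the actual OpenAI call, capture_image stands in for the human probing step, and the "FINAL:" stopping convention is an illustrative assumption, not the repo's actual protocol:

```python
def explore(ask_model, capture_image, max_steps: int = 10):
    # ask_model(history) -> assistant reply string
    # capture_image(instruction) -> tactile image (any representation)
    history = [{"role": "system",
                "content": "Guide the user to probe the object, then classify it."}]
    for _ in range(max_steps):
        instruction = ask_model(history)
        history.append({"role": "assistant", "content": instruction})
        if instruction.startswith("FINAL:"):
            # Model has committed to a classification; stop exploring.
            return instruction.removeprefix("FINAL:").strip(), history
        # Otherwise the human follows the instruction and captures an image.
        image = capture_image(instruction)
        history.append({"role": "user", "content": image})
    return None, history
```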
python3 simple_active_exploration.py
Similar to the simple active script, but with more explicit movement/rotation commands and axis-based control.
It stores the conversation history and captured tactile images in the active folder.
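For reference, grabbing frames from the sensor with the digit-interface package might look like the sketch below, assuming the Digit class's connect/get_frame/disconnect interface; the serial-number argument and the frame_path naming helper are illustrative, not this repo's conventions:

```python
def frame_path(object_name: str, step: int) -> str:
    # Hypothetical naming scheme for saved tactile frames.
    return f"{object_name}/frame_{step:03d}.png"

def capture_frames(serial: str, object_name: str, n: int = 6):
    # Deferred import so the helper above works without the
    # digit-interface package (and a connected sensor) present.
    from digit_interface import Digit

    d = Digit(serial)  # serial number printed on the sensor
    d.connect()
    frames = [(frame_path(object_name, i), d.get_frame()) for i in range(n)]
    d.disconnect()
    return frames
```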
python3 active_exploration.py
The prompts for each experiment type (static, active, simple active) can be found in their respective initial folders.
Results from my own experiments have been manually saved in the results folders.
As expected, current GPT models struggle to interpret tactile images. During active exploration, the model had difficulty understanding the spatial properties of objects and often produced incorrect classifications. Performance on static images was similarly poor. These results underscore the need for pre-training on tactile data and for the development of more capable models.
If you use DIGIT or this repo in your research, please cite:
DIGIT: A Novel Design for a Low-Cost Compact High-Resolution Tactile Sensor with Application to In-Hand Manipulation
Mike Lambeta, Po-Wei Chou, Stephen Tian, Brian Yang, Benjamin Maloon, Victoria Rose Most, Dave Stroud, Raymond Santos, Ahmad Byagowi, Gregg Kammerer, Dinesh Jayaraman, Roberto Calandra
IEEE Robotics and Automation Letters (RA-L), vol. 5, no. 3, pp. 3838–3845, 2020
https://doi.org/10.1109/LRA.2020.2977257