Step 1 — Add Tactile Modality

Extending the Policy Config

The tactile observation can be added as either the full pressure map (spatial, 64 values per frame) or the aggregated scalar (total_force_n). Start with the scalar — it is easier to normalize and sufficient for most manipulation tasks. Add the pressure map if your task requires spatial contact information (e.g., distinguishing edge vs. center contact).

```yaml
# lerobot/configs/policy/act_paxini.yaml
policy:
  name: act
dataset:
  repo_id: local/paxini-grasp-place
observation_features:
  - observation.state                     # joint positions (7,)
  - observation.images.wrist              # wrist camera (H, W, 3)
  - observation.tactile.total_force_n     # scalar force (1,)
  - observation.tactile.contact_area_mm2  # scalar area (1,)
  # Optional: full pressure map
  # - observation.tactile.pressure_map    # (8, 8) → flatten to (64,)
action_features:
  - action                                # target joint positions (7,)
```
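To make the scalar-vs-spatial choice concrete, here is a minimal NumPy sketch of the two feature shapes. The `tactile_features` helper is hypothetical (not part of the Paxini SDK or LeRobot); it only illustrates how an (8, 8) pressure map maps to either a (1,) aggregate or a (64,) flattened vector.

```python
import numpy as np

def tactile_features(pressure_map: np.ndarray, use_spatial: bool = False) -> np.ndarray:
    """Hypothetical helper: build a tactile feature vector from an (8, 8) map.

    Scalar path: aggregate the map to a single total-force value.
    Spatial path: flatten the full map to 64 values per frame.
    """
    if use_spatial:
        return pressure_map.reshape(-1)        # (64,) spatial contact layout
    return np.array([pressure_map.sum()])      # (1,) total force aggregate

frame = np.random.default_rng(0).uniform(0.0, 0.3, size=(8, 8))
print(tactile_features(frame).shape)                    # (1,)
print(tactile_features(frame, use_spatial=True).shape)  # (64,)
```

The spatial path preserves where contact happens (edge vs. center), at the cost of 64 extra input dimensions that all need normalization statistics.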
Step 2 — Compute Normalization Statistics

Normalization Statistics from Your Dataset

Always compute normalization statistics from your specific dataset — do not use hardcoded values. The Paxini SDK provides a utility for this:

```python
# Compute stats from your recorded dataset
from paxini.data import compute_dataset_stats

stats = compute_dataset_stats("./tactile_dataset/")
print(stats.to_yaml())

# Expected output (example values):
# observation.tactile.total_force_n:
#   mean: 2.84
#   std: 1.93
#   min: 0.0
#   max: 14.2
# observation.tactile.contact_area_mm2:
#   mean: 22.4
#   std: 18.6

# Save to config
stats.save("./act_paxini_stats.yaml")
```
Why normalization matters for tactile

Force values (0–20 N) are on a completely different scale from joint positions (typically ±π radians) and pixel values (0–255). Without normalization, the model will learn to ignore the tactile signal because its magnitude is swamped by the visual loss term during training.
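The scale mismatch and the z-score fix can be sketched in a few lines of NumPy (illustrative values, not real dataset statistics):

```python
import numpy as np

# Raw modalities on very different scales (illustrative values)
force_n = np.array([0.5, 2.8, 14.2])     # tactile force, 0-20 N
joints_rad = np.array([-3.1, 0.0, 3.1])  # joint positions, roughly +/- pi

def zscore(x: np.ndarray, mean: float, std: float) -> np.ndarray:
    """Standard z-score normalization: (x - mean) / std."""
    return (x - mean) / std

# After z-scoring with per-channel stats, every modality is roughly
# zero-mean and unit-scale, so no channel dominates the loss by
# raw magnitude alone.
force_norm = zscore(force_n, force_n.mean(), force_n.std())
joints_norm = zscore(joints_rad, joints_rad.mean(), joints_rad.std())
print(force_norm.mean())   # ~0.0
print(joints_norm.mean())  # ~0.0
```

This is the same transform the saved statistics file drives at training time: each channel is shifted and scaled by its own dataset mean and standard deviation.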
Step 3 — Training

Training Command

Launch ACT training with the tactile config. Training time: ~45 minutes on an 8GB GPU, ~3 hours on CPU.

```bash
# Install LeRobot if not already installed
pip install lerobot

# Train ACT with tactile modality
python lerobot/scripts/train.py \
  --config-name act_paxini \
  --dataset-path ./tactile_dataset/ \
  --stats-path ./act_paxini_stats.yaml \
  --output-dir ./checkpoints/act_tactile/ \
  --num-epochs 300 \
  --batch-size 8 \
  --seed 42

# Monitor training loss (in a separate terminal)
tensorboard --logdir ./checkpoints/act_tactile/logs/
```

Training is complete when the validation loss plateaus. For 50 episodes at 100 Hz, expect convergence around epoch 200–250. The tactile_obs_loss metric in TensorBoard should decrease steadily alongside the main action loss.
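If you want to detect the plateau programmatically rather than eyeballing TensorBoard, a simple patience-based check works. This is a standalone sketch, not a LeRobot feature:

```python
def has_plateaued(val_losses: list[float], patience: int = 10,
                  min_delta: float = 1e-3) -> bool:
    """Return True when the best validation loss has not improved by at
    least min_delta over the last `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta

# Synthetic curve: steady decrease for 40 epochs, then flat
losses = [1.0 / (1 + e) for e in range(40)] + [0.025] * 15
print(has_plateaued(losses, patience=10))       # True (flat tail)
print(has_plateaued(losses[:20], patience=10))  # False (still improving)
```

The same check applied to the tactile_obs_loss curve tells you whether the tactile branch specifically has stopped learning.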

Step 4 — Evaluate vs. Vision-Only Baseline

Evaluating Improvement Over Vision-Only

Train a second policy using only vision + proprioception (no tactile channels) on the same dataset. This is your baseline. Run evaluation over 20 rollouts for each policy on your target task:

```bash
# Train the vision-only baseline (same config, minus tactile keys)
python lerobot/scripts/train.py \
  --config-name act_vision_only \
  --dataset-path ./tactile_dataset/ \
  --output-dir ./checkpoints/act_baseline/

# Evaluate both policies — 20 rollouts each
python lerobot/scripts/eval.py \
  --checkpoint ./checkpoints/act_tactile/best.ckpt \
  --num-episodes 20 \
  --eval-name "tactile"

python lerobot/scripts/eval.py \
  --checkpoint ./checkpoints/act_baseline/best.ckpt \
  --num-episodes 20 \
  --eval-name "baseline"
```

Look for improvement in grasp stability metrics — specifically task success rate on slip-prone objects and average hold duration during the placement phase. Tactile-aware policies typically show the largest improvement on deformable and variably-weighted objects.
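One way to tabulate those two metrics is to log each rollout as a (success, hold_duration_s) pair and aggregate. The record format here is hypothetical; adapt it to whatever your eval script actually emits:

```python
from statistics import mean

def summarize(rollouts: list[tuple[bool, float]]) -> dict:
    """Aggregate per-rollout (success, hold_duration_s) records."""
    successes = [r[0] for r in rollouts]
    holds = [r[1] for r in rollouts if r[0]]  # hold time on successful grasps only
    return {
        "success_rate": mean(successes),
        "avg_hold_s": mean(holds) if holds else 0.0,
    }

# Illustrative results, not real measurements
tactile = [(True, 4.2), (True, 3.8), (False, 0.9), (True, 4.5)]
baseline = [(True, 2.1), (False, 0.4), (False, 0.7), (True, 2.6)]
print(summarize(tactile))   # success_rate: 0.75
print(summarize(baseline))  # success_rate: 0.5
```

Restricting avg_hold_s to successful rollouts keeps early drops from dragging the average down and conflating the two metrics.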

What to do if tactile does not improve performance

First, verify your dataset's contact events align with video (Unit 4 quality check). Second, confirm normalization statistics were computed correctly — incorrect normalization is the most common training failure mode for new modalities. Third, try adding the pressure_map as an observation instead of just the scalar — it provides richer spatial context.
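For the second check, a quick sanity test is to re-apply the saved statistics to the raw data and confirm the result is approximately zero-mean, unit-std. This sketch uses synthetic data and the example statistics from Step 2; the helper is not part of the Paxini SDK:

```python
import numpy as np

def stats_are_consistent(values, mean: float, std: float, tol: float = 0.1) -> bool:
    """Z-score the data with the saved stats; if the stats match the
    dataset, the result should be ~N(0, 1). A large residual mean or
    std means the stats are stale or from a different dataset."""
    z = (np.asarray(values) - mean) / std
    return abs(z.mean()) < tol and abs(z.std() - 1.0) < tol

rng = np.random.default_rng(7)
force = rng.normal(2.84, 1.93, size=5000)  # synthetic data matching Step 2 stats

print(stats_are_consistent(force, mean=2.84, std=1.93))  # True: stats match
print(stats_are_consistent(force, mean=0.0, std=1.0))    # False: stale stats
```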

Path Complete

You have gone from first sensor reading to a trained tactile-aware manipulation policy. You now have a complete pipeline — hardware, data, and policy — that you can apply to any contact-rich task.

Unit 5 Complete When...

Training completes without errors and produces a checkpoint. The tactile validation loss decreases and plateaus during training. You have run 20 evaluation rollouts for both the tactile policy and the vision-only baseline and recorded the success rates for comparison.