The World's Leading Managed Robot Data Collection Service

Expert operators. Professional teleoperation hardware. Your dataset format of choice. From 50-episode pilots to 10,000+ episode production campaigns — we collect the training data your robot policies need.

Leader-Follower Teleoperation · VR & Glove Collection · HDF5 / RLDS / LeRobot Delivery · Research-Grade QA

Why Robot Training Data Quality Matters

Most robot learning failures are data failures. Here are the three problems every ML team hits — and how SVRC solves each one.

Hard to collect at scale

Building a data collection station takes weeks. Recruiting and training operators takes longer. Most teams spend 60% of their project timeline on infrastructure instead of research. SVRC operates multi-station facilities with trained operators ready to start collecting within days of project kickoff.

Operator quality varies wildly

An untrained operator produces demonstrations with inconsistent strategies, failed grasps, and jerky trajectories. These demonstrations actively harm policy training. SVRC operators complete a qualification program covering approach consistency, grasp precision, and temporal smoothness before they touch your data.

Format inconsistency wastes months

Different labs store data in incompatible schemas. Timestamp conventions differ. Camera naming is inconsistent. Converting between formats introduces subtle bugs. SVRC delivers datasets in your exact target format — HDF5, RLDS, or LeRobot — validated against your training pipeline before handoff.

How It Works

Four steps from project kickoff to training-ready dataset delivery.

1. Scope & Task Design

We work with your team to define the task specification, success criteria, observation space, action space, and scene diversity requirements. You receive a detailed collection protocol document before any data is collected.

2. Hardware Setup

We configure the robot arm, teleoperation interface, cameras, and workspace fixtures for your task. Camera extrinsics are calibrated, synchronization is verified to <5 ms, and a test episode validates the full pipeline end-to-end.

3. Expert Operator Collection

Qualified operators collect demonstrations following the agreed protocol. Real-time quality monitoring flags failed episodes, inconsistent strategies, and synchronization errors. You receive daily progress reports with throughput and quality metrics.

4. QA & Delivery

Every episode passes our 10-point quality checklist: timestamp sync, episode structure, gripper state consistency, scene reset verification, and more. Validated data is exported to your target format and delivered via secure transfer or Hugging Face Hub.

Data Collection Methods

We select the teleoperation method that matches your task requirements. Here is how they compare.

Method | How It Works | Precision | Scale (demos/hr) | Cost/Episode | Best For
Leader-Follower Arms | Operator moves a lightweight leader arm; the follower replicates joint positions at 3-8 ms latency | Highest | 20-35 | $$ | Contact-rich manipulation, insertion, bimanual tasks
VR / Quest 3 | Hand-controller positions mapped to the end-effector via inverse kinematics | Good | 15-25 | $ | Pick-and-place, sorting, packing, gross manipulation
SpaceMouse / Keyboard | 6-DOF joystick controls end-effector velocity; keyboard triggers discrete actions | Moderate | 5-12 | $ | Prototyping, navigation, low-precision tasks
Haptic Gloves | Finger-joint tracking drives dexterous robot hands with force feedback | High (dexterous) | 8-15 | $$$ | In-hand manipulation, assembly, tool use
Kinesthetic Teaching | Operator physically guides the robot arm through the task in gravity-compensation mode | High | 10-18 | $$ | Simple tasks with compliant arms, quick data collection
Scripted Demos | Programmatic waypoint trajectories with randomized perturbations | Exact (deterministic) | 60-200+ | $ | Structured tasks, data augmentation, baseline generation
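As a rough illustration of how the throughput column translates into schedule, here is a back-of-envelope sizing sketch. The station count and hours-per-day figures are assumptions for the example, not SVRC operating parameters.

```python
def collection_days(target_episodes, demos_per_hour, stations=1, hours_per_day=6):
    """Estimate calendar days of pure collection time, rounded up to whole days."""
    per_day = demos_per_hour * stations * hours_per_day
    return -(-target_episodes // per_day)  # ceiling division without math.ceil

# 1,000 episodes on leader-follower arms at the table's low end (20 demos/hr),
# run across 2 stations for 6 collection hours per day:
days = collection_days(1000, demos_per_hour=20, stations=2)
# -> 5 days of collection (task design, setup, and QA not included)
```

The same arithmetic explains why scripted demos (60-200+ demos/hr) are attractive for structured tasks: an order-of-magnitude higher throughput at the same station count.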

Not sure which method fits your task? Talk to our team — we will recommend the right approach based on your task requirements and budget.

Output Formats

We deliver datasets in the format your training pipeline needs — no conversion headaches on your end.

HDF5

The gold standard for robot data. Native to ACT, ALOHA, and Diffusion Policy. Hierarchical episode structure with efficient random access and mature Python tooling via h5py.
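For reference, a minimal h5py round trip against a hypothetical ACT/ALOHA-style episode layout. The key names and array shapes here are illustrative, not a guaranteed SVRC delivery schema.

```python
import h5py
import numpy as np

T = 5  # timesteps in this toy episode

# driver="core" keeps the file in memory for the example; real episodes go to disk.
with h5py.File("episode_0.h5", "w", driver="core", backing_store=False) as f:
    obs = f.create_group("observations")
    obs.create_dataset("qpos", data=np.zeros((T, 7), dtype=np.float32))  # joint positions
    obs.create_dataset("images/cam_high", data=np.zeros((T, 48, 64, 3), dtype=np.uint8))
    f.create_dataset("action", data=np.zeros((T, 7), dtype=np.float32))
    f.attrs["task"] = "pick_and_place"  # episode-level metadata lives in attrs

    # Efficient random access: slicing reads only the requested timestep.
    qpos_t3 = f["observations/qpos"][3]
```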

RLDS / TFRecord

The format behind Open X-Embodiment and Octo. TensorFlow Datasets schema for cross-embodiment training. Streamable from cloud storage with efficient tf.data pipelines.

LeRobot / Parquet

Hugging Face ecosystem native. One-command upload to HF Hub with built-in visualization. Compact MP4 video storage. Growing community with 300+ public datasets.

Custom Formats

Need ROS bag, CSV, JSON-lines, or a proprietary schema? We write custom export adapters. You provide the target schema; we handle conversion and validation.
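As a sketch of what the output side of a custom adapter can look like, here is a minimal JSON-lines exporter over toy episode records. The field names are hypothetical, stand-ins for whatever schema the client specifies.

```python
import json

# Toy episode-level records; a real adapter would stream these from the source format.
episodes = [
    {"episode_id": 0, "task": "pick_and_place", "success": True, "num_steps": 142},
    {"episode_id": 1, "task": "pick_and_place", "success": False, "num_steps": 98},
]

def to_jsonl(records):
    """One JSON object per line -- the JSON-lines convention."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)

lines = to_jsonl(episodes).splitlines()
# Each line parses independently, so downstream consumers can stream the file.
```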

Read our detailed HDF5 vs RLDS vs LeRobot format comparison guide for technical deep dives on each format.

Pricing

Transparent pricing aligned to your project stage. Every tier includes task design, hardware setup, expert collection, QA, and delivery.

Pilot

50 – 200 episodes

From $2,500
  • Task design & protocol document
  • Single collection station
  • 1 qualified operator
  • 10-point QA on every episode
  • Delivery in 1 format (HDF5, RLDS, or LeRobot)
  • 1-2 week turnaround
Start a Pilot
Most Popular

Campaign

200 – 2,000 episodes

From $8,000
  • Everything in Pilot, plus:
  • Multi-station parallel collection
  • 2-4 dedicated operators
  • Weekly batch deliveries
  • Scene diversity management
  • Delivery in up to 2 formats
Start a Campaign

Enterprise

2,000+ episodes / custom scale

Custom
  • Everything in Campaign, plus:
  • Dedicated collection infrastructure
  • On-site or co-located deployment
  • SLA with uptime guarantees
  • Custom robot integration
  • All formats + Fearless Platform access
Contact Sales

Compatible Hardware

We operate and integrate with a wide range of robot arms. If your platform is ROS2-compatible, we can collect data on it.

OpenArm

Open-source, SVRC-designed

Franka FR3

Research-grade torque control

UR5e

Industrial collaborative arm

xArm 6/7

Cost-effective 6/7-DOF

Kinova Gen3

Lightweight research arm

Custom

Ship us your robot

See our full hardware catalog for specifications and availability.

Data Quality Standards

Every episode we deliver passes a 10-point quality assurance checklist. No exceptions.

  1. Synchronized timestamps — All sensor streams (cameras, joints, actions) aligned to <5 ms tolerance using hardware-triggered cameras and shared clock sources.
  2. Consistent episode structure — Every episode follows the same observation/action schema with identical array dimensions, data types, and key names.
  3. Operator qualification — Operators pass a proficiency test on the specific task before their episodes enter the production dataset.
  4. Task success verification — Each episode is reviewed for full task completion. Failed episodes are flagged and excluded from the primary dataset (available separately on request).
  5. Scene reset consistency — Object positions, lighting, and workspace state are reset to defined initial conditions between episodes. Randomization ranges are documented.
  6. Frame drop monitoring — Camera streams are checked for dropped frames. Episodes with >2% frame loss are re-collected.
  7. Gripper state consistency — Gripper open/close signals are validated against camera evidence. Phantom gripper events are corrected or flagged.
  8. Joint limit compliance — No episode contains joint positions outside the robot's safe operating range or near singularity configurations.
  9. Metadata completeness — Every episode includes task name, operator ID, timestamp, robot serial, camera config, and success label as structured metadata.
  10. Annotation standards — Language instructions, task phase labels, and keyframe annotations (when requested) follow the agreed annotation schema.
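Two of these checks are easy to sketch in code. The toy validators below mirror the thresholds quoted above (the <5 ms sync tolerance from point 1 and the 2% frame-loss ceiling from point 6); the function names and sample data are illustrative, not SVRC's internal tooling.

```python
SYNC_TOLERANCE_S = 0.005   # point 1: all streams aligned to <5 ms
MAX_FRAME_LOSS = 0.02      # point 6: episodes with >2% frame loss are re-collected

def max_sync_error(stream_a, stream_b):
    """Worst-case timestamp gap between two aligned sensor streams (seconds)."""
    return max(abs(a - b) for a, b in zip(stream_a, stream_b))

def frame_loss(received, expected):
    """Fraction of expected camera frames that never arrived."""
    return 1.0 - received / expected

# Toy 30 Hz camera timestamps vs. joint-state timestamps for four steps:
cam_ts = [0.000, 0.033, 0.067, 0.100]
joint_ts = [0.001, 0.034, 0.066, 0.102]

sync_ok = max_sync_error(cam_ts, joint_ts) < SYNC_TOLERANCE_S    # worst gap here is 2 ms
frames_ok = frame_loss(received=296, expected=300) <= MAX_FRAME_LOSS  # ~1.3% loss
```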

Who Uses SVRC Data Services

Research Labs

University and corporate research groups who need high-quality demonstration data for policy learning papers. We handle the tedious collection work so your researchers can focus on algorithms and experiments.

Startup Policy Training

Early-stage robotics companies building their first manipulation policies. Get from zero to a working policy in weeks instead of months by outsourcing data collection to operators who already know how to produce training-grade demonstrations.

Enterprise Deployment

Companies deploying robots in production who need ongoing data collection to improve policy performance, handle edge cases, and expand to new task variants. Our campaign and enterprise tiers support continuous data pipelines.

Academic Benchmarks

Research groups creating standardized benchmark datasets for the community. We provide the collection infrastructure and operator consistency needed for reproducible, high-quality benchmark datasets that other labs can build on.

Trusted by Leading Research Institutions

SVRC works with researchers and engineering teams at top universities and robotics companies to collect the demonstration data that powers state-of-the-art manipulation policies.

Stanford · UC Berkeley · MIT · CMU · Toyota Research

Frequently Asked Questions

What teleoperation hardware do you use?

We operate leader-follower arms (ALOHA-style WidowX/ViperX and OpenArm setups), Meta Quest 3 VR systems, 6-DOF SpaceMouse interfaces, and SenseGlove Nova 2 haptic gloves. We select the interface that best matches your task requirements for precision, throughput, and data quality. For bimanual tasks, we run dual leader-follower or dual VR configurations. See our bimanual teleoperation guide for details.

What formats do you deliver?

We deliver datasets in HDF5 (ACT/ALOHA compatible), RLDS/TFRecord (for Open X-Embodiment and Octo), LeRobot Parquet (Hugging Face Hub ready), or custom formats. You specify the format in your project brief, and we handle all conversion and validation. Read our format comparison guide for details on each.

How long does a data collection campaign take?

A pilot program (50-200 episodes) typically takes 1-2 weeks from kickoff to delivery, including task design and hardware setup. A standard campaign (200-2,000 episodes) takes 2-6 weeks depending on task complexity and scene diversity requirements. Enterprise-scale projects are scoped individually. Rush delivery is available for pilots at additional cost.

Can you collect data on my robot?

Yes. We work with OpenArm, Franka FR3, UR5e, xArm, Kinova Gen3, and most ROS2-compatible robot arms. If you ship us your robot or we can procure one, we integrate it into our collection infrastructure. Custom integrations typically take 3-5 business days. We also support mobile manipulators and bimanual configurations.

What is a typical cost per episode?

Cost per episode ranges from $8-$35 depending on task complexity, number of camera views, teleoperation method, and QA requirements. Simple tabletop pick-and-place tasks are at the lower end; contact-rich bimanual tasks with dexterous hands are at the higher end. Volume discounts apply for campaigns over 500 episodes. Contact us for a detailed quote based on your specific requirements.

Do you sign NDAs?

Yes. We sign mutual NDAs before any project discussion that involves proprietary tasks, robot configurations, or research goals. All data collected under contract is owned by the client. We do not retain copies, use client data for any other purpose, or include client data in public datasets. We also support custom data governance and security requirements for enterprise clients.

Ready to Start Your Data Collection Campaign?

Tell us about your task, robot, and timeline. We will scope a collection program and send you a detailed proposal within 48 hours.

Teleop Dataset Program

Build your dataset scope

Tell us the robot setup, modalities, volume, and license intent. We will return a structured lead plus a starter schema, capability matrix, and rough pricing band.