Choosing Wrist Cameras and the Mounting Angle for VLA Policies

June 9, 2026 · Prometheus Robotics

Ask anyone who has trained a manipulation policy what mattered most, and surprisingly often the answer is not the model — it is the camera. For vision-language-action (VLA) policies in particular, the wrist camera and the angle it is mounted at quietly decide whether the policy ever learns to grasp reliably. It is one of the cheapest things to get right and one of the most expensive things to get wrong, because a bad angle poisons every demonstration you collect.

This guide explains why the wrist view is so important for VLAs, how to choose the mounting angle in practice, and the lens, calibration, and consistency details that make the difference — using the Prometheus humanoid, whose wrist-camera angle is adjustable precisely for this reason.

Why VLAs lean on the wrist camera

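Manipulation policies use two complementary viewpoints: a scene (head) camera that views the workspace as a whole, and a wrist (eye-in-hand) camera that moves with the hand.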

The reason the wrist view matters so much for VLAs is the last few centimetres. As the hand approaches an object, a scene camera’s view of the contact point gets blocked by the arm and gripper. The wrist camera, moving with the hand, keeps the target in frame exactly when precision matters most. Policies trained with a good eye-in-hand view are markedly more robust to object position, because they can servo to what they see rather than memorizing absolute coordinates from a fixed camera.

The head camera tells the policy what and where; the wrist camera tells it how to close the last gap. VLAs use both — but it’s the wrist view that usually separates a 40% success rate from a 90% one on contact-rich tasks.

Choosing the mounting angle

The mounting angle is the tilt of the wrist camera relative to the gripper’s approach axis. There is no single correct number — it depends on your tasks — but the trade-off is consistent:

A practical method to pick it

  1. Pick the hardest grasp in your task set (small object, tight clearance).
  2. Teleoperate that grasp slowly while watching the live wrist feed. Adjust the Prometheus wrist angle until the fingertips and the object stay visible from the start of the approach all the way to contact.
  3. Check your other tasks at that angle — a tabletop pick and a shelf reach frame very differently. Find the angle that serves the whole task set, not just one grasp.
  4. Lock it, write it down, and don’t touch it again.
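Before teleoperating, a rough geometric check can narrow the range of angles worth trying. The sketch below is an illustration only, not part of the Prometheus SDK. It assumes a simplified 2D pinhole model in which the camera sits a fixed offset off the approach axis and is tilted toward it. All names and numbers in it (visible, angles_keeping_in_view, the 5 cm offset, the 60 degree vertical field of view) are hypothetical. It lists the candidate tilts that keep both the fingertips and the object in frame from the start of the approach to contact.

```python
import math


def visible(distance_m, offset_m, tilt_deg, vfov_deg):
    """Return True if a point on the approach axis is inside the vertical FOV.

    Assumed 2D model: the camera sits offset_m off the approach axis and its
    optical axis is tilted tilt_deg toward that axis (0 = parallel to it).
    distance_m is how far ahead of the camera the point lies along the axis.
    """
    angle_to_point_deg = math.degrees(math.atan2(offset_m, distance_m))
    return abs(angle_to_point_deg - tilt_deg) <= vfov_deg / 2.0


def angles_keeping_in_view(distances_m, offset_m, vfov_deg, candidates_deg):
    """Filter candidate tilts to those that keep every listed point in frame."""
    return [
        tilt
        for tilt in candidates_deg
        if all(visible(d, offset_m, tilt, vfov_deg) for d in distances_m)
    ]


# Hypothetical numbers: fingertips 10 cm ahead of the camera, object first
# seen 40 cm ahead, camera 5 cm off the approach axis, 60 degree vertical FOV.
# The object's bearing changes monotonically as the hand closes in, so
# checking the nearest and farthest distances covers the whole approach.
usable = angles_keeping_in_view(
    distances_m=[0.10, 0.40],
    offset_m=0.05,
    vfov_deg=60.0,
    candidates_deg=range(0, 61, 5),
)
print(usable)
```

A check like this only narrows the search. It does not model occlusion by the fingers or lens distortion, so the live wrist feed in step 2 remains the real test.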

Consistency beats perfection. A slightly suboptimal angle used for every demonstration trains a fine policy. A great angle that drifts between demos trains nothing — to the model, a changed wrist angle is a different camera, and the dataset becomes inconsistent. This is the single most common way teams quietly ruin a dataset. Fix the angle before episode one and keep it identical through collection, evaluation, and deployment.
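One way to catch drift early is to log the wrist angle with every episode and compare it against the first one before training. The following is a minimal sketch under assumed conventions: the episode_id and wrist_angle_deg field names and the 0.5 degree tolerance are hypothetical, not a Prometheus dataset format.

```python
def find_angle_drift(episodes, tolerance_deg=0.5):
    """Return the ids of episodes whose logged wrist angle differs from the
    first episode's angle by more than tolerance_deg."""
    if not episodes:
        return []
    reference_deg = episodes[0]["wrist_angle_deg"]
    return [
        ep["episode_id"]
        for ep in episodes
        if abs(ep["wrist_angle_deg"] - reference_deg) > tolerance_deg
    ]


# Hypothetical per-episode metadata; episode 2 was recorded after a bump.
episodes = [
    {"episode_id": 0, "wrist_angle_deg": 30.0},
    {"episode_id": 1, "wrist_angle_deg": 30.2},
    {"episode_id": 2, "wrist_angle_deg": 34.0},
]
drifted = find_angle_drift(episodes)
print(drifted)
```

Any episode this flags is worth re-recording or excluding rather than mixing into the training set.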

Lens, field of view, and image quality

Calibration and what the policy sees

For most imitation-learning setups you feed the wrist image directly and let the policy learn the geometry implicitly — you don’t need perfect extrinsics for a VLA to work. But it pays to know the camera’s pose relative to the gripper, for two reasons: it lets you reproduce the exact framing if hardware is swapped, and it lets you replay or augment data correctly. Record the mounting angle and extrinsics alongside the dataset so the setup is reproducible months later.
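As a rough illustration of what such a record could look like, the sketch below stores the angle and a 4x4 gripper-to-camera transform as JSON next to the dataset. The field names (wrist_angle_deg, T_gripper_camera), the choice of a pure rotation about the camera's x-axis, and the example offsets are all assumptions made for the example; a real setup would use the extrinsics from its own calibration.

```python
import json
import math


def build_wrist_camera_record(angle_deg, translation_m):
    """Build a JSON-serializable record of the wrist camera mounting.

    Assumption: the mounting angle is a pure rotation about the camera's
    x-axis, and translation_m is the camera origin in the gripper frame.
    """
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    tx, ty, tz = translation_m
    # Homogeneous transform: rotation in the top-left 3x3, translation in
    # the last column.
    transform = [
        [1.0, 0.0, 0.0, tx],
        [0.0, c, -s, ty],
        [0.0, s, c, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]
    return {"wrist_angle_deg": angle_deg, "T_gripper_camera": transform}


def save_record(path, record):
    """Write the record as a JSON sidecar file next to the dataset."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2)


# Hypothetical mounting: 30 degree tilt, camera 5 cm and 2 cm off the
# gripper origin along y and z.
record = build_wrist_camera_record(30.0, (0.0, 0.05, 0.02))
print(json.dumps(record, indent=2))
```

Keeping this file under version control with the dataset means a swapped camera can be restored to the same framing instead of being re-tuned by eye.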

One wrist camera or two?

A single wrist camera on the working hand covers most single-arm manipulation. For bimanual tasks — handovers, two-handed assembly — give each arm its own wrist camera so both grasps are observed; a VLA trained on both views coordinates the hands far better than one fed a single viewpoint. On Prometheus this maps onto the Type X manipulator options, which support wrist cameras (and five-finger hands) per arm.

How Prometheus is set up for this

The platform was built with this exact problem in mind:

Common mistakes

Where this fits

Camera placement is upstream of everything else: it shapes the data you collect, which shapes whatever policy you train on it. Once your wrist view is dialed in and locked, you’re ready to collect demonstrations and train — whether that’s ACT from scratch or a fine-tuned π0.7. Get the angle right first, and both work far better.

Run this on a real humanoid

Prometheus ships with the teleoperation pipeline, stereo + wrist cameras, URDF, simulator, and SDK you need to start collecting data on day one.