Introduction to Game Piece Detection (ML Vision)
AprilTags are placed at known field positions. Game pieces are not. Finding an orange ring on carpet, distinguishing it from a cone in a shadow, or tracking a ball through a crowd of robots requires a different approach — machine learning. This lesson introduces neural network object detection and shows how to wire its output into autonomous decision-making.
By the end of this lesson, you will:
- Explain the conceptual difference between rule-based detection (AprilTags) and learned detection (ML/neural networks)
- Describe what a bounding box, class label, and confidence score represent in an object detection result
- Outline the steps to train a custom model with Roboflow: collect images, annotate, train, export, deploy
- Load and use a PhotonVision ML pipeline to detect game pieces and read detection results in Java
- Convert a bounding box center pixel position to a horizontal angle (yaw) using camera calibration geometry
- Build a simple autonomous "aim at game piece" behavior using ML detection output
Why Rule-Based Detection Fails for Game Pieces
AprilTags are engineered for detection. Their black-and-white high-contrast pattern is specifically designed to be unambiguously detected under varied lighting and at distance. The detection algorithm is deterministic — the same tag always produces the same result from the same angle, because the tag is a known mathematical pattern.
Game pieces are not engineered for detection. An orange ring looks different depending on lighting (overhead fluorescent vs. direct sunlight from skylights), viewing angle (face-on vs. edge-on), distance (full detail vs. a few pixels), and whether it's partially behind a field element, another robot, or in shadow. A rule-based approach — "find orange pixels above a brightness threshold" — breaks under exactly these real-competition conditions. Shadows make orange look brown. Jerseys in the crowd have orange. The robot's own mechanism, if orange, triggers false detections.
Machine learning object detection — specifically neural network models trained on hundreds or thousands of labeled examples — learns the appearance of the game piece across all these variations. The model doesn't look for "orange pixels"; it has internalized what a ring looks like from experience with real images, and it produces a confidence-weighted answer for every region of the frame.
Rule-based detection (AprilTags):
- Works on engineered targets with known patterns
- Deterministic: same input, same output
- No training required
- Fails if the appearance varies (game pieces, natural objects)
- Returns exact pose with geometry (solvePnP)
- Zero tolerance for partial occlusion
Learned detection (ML / neural networks):
- Works on arbitrary visual objects: game pieces, field elements, robots
- Probabilistic: returns a confidence score, not certainty
- Requires training data (images + labels)
- Handles appearance variation, lighting changes, partial occlusion
- Returns a bounding box (pixel rectangle), not a 3D pose
- Fast enough for real-time at 30+ fps on coprocessors
What a Detection Result Contains
When a neural network detects an object in a camera frame, it returns three pieces of information for each detected instance:
- Bounding box: A rectangle in pixel coordinates that encloses the detected object. Typically expressed as center X, center Y, width, and height — all in pixels. The bounding box tells you where in the frame the object is, not where it is in 3D space.
- Class label: Which type of object was detected. A model trained to detect multiple types (e.g., "ring", "cone", "robot") returns which class each detected bounding box belongs to.
- Confidence score: A value from 0.0 to 1.0 indicating how confident the model is that this is a true detection. Low-confidence detections (below ~0.5) are more likely to be false positives and should be filtered out or treated cautiously.
This is fundamentally different from AprilTag detection. AprilTags give you a 3D pose (camera-relative position and rotation). ML detection gives you a 2D pixel region and a label. To get a robot-relative direction to the game piece, you must convert the bounding box center to an angle using camera geometry — covered in the interactive below.
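To make the three fields concrete, here is a minimal sketch of a detection result and a confidence filter in plain Java. The `Detection` record and `best` helper are hypothetical names used only for illustration; they are not PhotonVision types, and the pixel values are made up.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class DetectionFilter {
    /** One detection: box center and size in pixels, class label, confidence in [0, 1]. */
    public record Detection(double centerX, double centerY, double width, double height,
                            String label, double confidence) {}

    /** Returns the highest-confidence detection of the given class at or above the threshold. */
    public static Optional<Detection> best(List<Detection> detections, String label,
                                           double minConfidence) {
        return detections.stream()
            .filter(d -> d.label().equals(label))          // keep only the requested class
            .filter(d -> d.confidence() >= minConfidence)  // drop likely false positives
            .max(Comparator.comparingDouble(Detection::confidence));
    }

    public static void main(String[] args) {
        // Illustrative frame: two "ring" candidates and one "robot"
        List<Detection> frame = List.of(
            new Detection(300, 400, 80, 40, "ring", 0.91),
            new Detection(900, 380, 60, 30, "ring", 0.42),
            new Detection(640, 200, 120, 150, "robot", 0.88));
        System.out.println(best(frame, "ring", 0.5));
    }
}
```

Note that nothing in the record describes a 3D position: everything downstream has to be derived from the pixel rectangle.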
Bounding Box Explorer: From Pixels to Angles
Click the camera view on the left to place a game piece detection at different positions. The system computes the horizontal angle (yaw) to the object from camera geometry, and the approximate distance estimate from the bounding box height. Use this to understand how pixel position maps to drive commands.
The horizontal angle to the game piece (yaw) tells you which direction to turn the robot to face the piece. It does not tell you how far away the piece is. Distance from a bounding box is a rough approximation at best — it depends on knowing the object's physical size and assuming it's oriented normally. For driving toward a game piece, the yaw is the primary control input (rotate until yaw ≈ 0°), with distance estimated from box height or derived from the time it takes to reach the piece. For precise game piece position (needed for PathPlanner navigation), fuse the camera angle with your odometry pose to project an approximate field position.
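The pixel-to-angle mapping can be sketched with a pinhole camera model. This is a simplified illustration that assumes square pixels, no lens distortion, and an optical axis through the image center; the class and method names are hypothetical, and the 1280 px width, 70° FOV, and 0.10 m piece height are example values.

```java
public class PixelToAngle {
    /** Focal length in pixels from image width and horizontal FOV (pinhole model). */
    public static double focalLengthPx(double imageWidthPx, double hFovDeg) {
        return (imageWidthPx / 2.0) / Math.tan(Math.toRadians(hFovDeg / 2.0));
    }

    /** Yaw in degrees to a pixel column; positive means right of the image center. */
    public static double yawDeg(double centerXPx, double imageWidthPx, double hFovDeg) {
        double offsetPx = centerXPx - imageWidthPx / 2.0;
        return Math.toDegrees(Math.atan(offsetPx / focalLengthPx(imageWidthPx, hFovDeg)));
    }

    /** Rough range in meters from box height; assumes the piece's true height is known. */
    public static double distanceM(double objectHeightM, double boxHeightPx,
                                   double imageWidthPx, double hFovDeg) {
        return objectHeightM * focalLengthPx(imageWidthPx, hFovDeg) / boxHeightPx;
    }

    public static void main(String[] args) {
        // Box centered at x = 960 px in a 1280 px wide frame, 70 degree horizontal FOV
        System.out.printf("yaw = %.1f deg%n", yawDeg(960, 1280, 70));
        // 0.10 m tall piece whose box is 50 px tall
        System.out.printf("distance = %.2f m%n", distanceM(0.10, 50, 1280, 70));
    }
}
```

For the example box at x = 960, the arctangent form gives a yaw of roughly 19°, while scaling the pixel offset linearly by the FOV (320 / 1280 × 70°) would give 17.5°. The two agree near the image center and diverge toward the edges.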
The ML Detection Pipeline
From raw camera image to robot-usable data, game piece detection passes through five stages. Each stage is a potential source of latency, error, or configuration problems.
Same camera as your AprilTag pipeline — or a dedicated second camera. Game piece detection often benefits from a downward-angled camera (intake-side) rather than the forward-facing AprilTag camera. A camera at 30–45° downward angle from 0.4–0.6 m above the floor sees floor-level game pieces at 1–4 m range effectively.
The trained model processes each camera frame. For FRC use, YOLOv5 and YOLOv8 nano/small variants are typical — designed to run in real-time on limited compute. On a Raspberry Pi 4, a YOLO nano model runs at 20–30fps at 640×640 input resolution. Larger models are more accurate but slower — there's always a tradeoff between accuracy and inference speed.
A single object often triggers multiple overlapping bounding boxes from the neural network. Non-maximum suppression (NMS) resolves this: among boxes that overlap by more than a set threshold, it keeps the highest-confidence box and discards the rest, leaving one box per distinct object. The confidence threshold and NMS overlap threshold are configurable in PhotonVision's pipeline settings. A lower confidence threshold yields more detections but also more false positives.
PhotonVision publishes detection results to NetworkTables at the same topic as AprilTag detections — under the camera's pipeline results. Each target includes the bounding box, class name, and confidence score. Your robot code reads these using the same PhotonCamera.getLatestResult() API from Lesson 3, with different target data fields for ML detections versus AprilTag detections.
The bounding box center pixel coordinates are converted to a camera-relative angle using the camera's field of view. This angle becomes a drive target — rotate until the game piece is centered. Alternatively, project the angle forward with distance estimation to get an approximate field position for PathPlanner navigation (Lesson 9 on-the-fly paths).
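The suppression stage described above can be sketched as a greedy loop over boxes sorted by confidence. This is a generic illustration of non-maximum suppression using intersection-over-union (IoU) as the overlap measure, not PhotonVision's internal implementation; the `Box` record, method names, and thresholds are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NmsSketch {
    /** Axis-aligned box given by its corner pixel coordinates plus a confidence score. */
    public record Box(double x1, double y1, double x2, double y2, double confidence) {
        double area() { return Math.max(0, x2 - x1) * Math.max(0, y2 - y1); }
    }

    /** Intersection-over-union of two boxes: 0 = disjoint, 1 = identical. */
    public static double iou(Box a, Box b) {
        double iw = Math.min(a.x2(), b.x2()) - Math.max(a.x1(), b.x1());
        double ih = Math.min(a.y2(), b.y2()) - Math.max(a.y1(), b.y1());
        if (iw <= 0 || ih <= 0) return 0.0;
        double inter = iw * ih;
        return inter / (a.area() + b.area() - inter);
    }

    /** Drops low-confidence boxes, then keeps a box only if it does not overlap a kept one. */
    public static List<Box> suppress(List<Box> boxes, double minConfidence, double iouThreshold) {
        List<Box> candidates = new ArrayList<>(boxes);
        candidates.removeIf(b -> b.confidence() < minConfidence);
        candidates.sort(Comparator.comparingDouble(Box::confidence).reversed());
        List<Box> kept = new ArrayList<>();
        for (Box c : candidates) {
            boolean overlapsKept = kept.stream().anyMatch(k -> iou(k, c) > iouThreshold);
            if (!overlapsKept) kept.add(c);
        }
        return kept;
    }
}
```

Raising `iouThreshold` lets more near-duplicate boxes survive; lowering it risks collapsing two pieces that sit close together into a single detection.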
Training a Custom Model with Roboflow
Roboflow is the most accessible platform for FRC teams to collect, annotate, and train object detection models. Many FRC game piece models from the community are already available on Roboflow Universe — before building a custom model, check if a model for the current game's pieces already exists.
Capture 200–500 images of the game piece from your camera at various distances (0.5–4 m), angles, lighting conditions (shop, practice field, outdoor, dim), and with real competition distractors — other robots nearby, field structures in background, partially occluded pieces. Include images where the piece is NOT present (to teach the model what a true negative looks like). The model can only handle situations similar to its training data — images from your exact competition environment are more valuable than generic images.
In Roboflow's annotation tool, draw a tight bounding box around each game piece in every image. Assign the class label (e.g., "ring", "cone", "ball"). Consistency matters — a bounding box that includes too much background trains the model to associate irrelevant pixels with the object. For FRC pieces, aim for a box that is 5–15% larger than the physical piece on each side. Roboflow supports team annotation sharing — multiple annotators speed up large datasets.
Roboflow automatically applies augmentations (brightness variation, horizontal flip, rotation, crop) to artificially expand the training set. Train using the YOLOv8 nano or small architecture, which is small enough for coprocessor inference. Training on Roboflow's cloud GPU takes 10–30 minutes for a typical FRC dataset. After training, review the mAP (mean Average Precision) score; above 0.85 is typically competition-ready for a single-class detector.
Export the trained model from Roboflow (this lesson uses ONNX). The model format PhotonVision accepts depends on its version and on the coprocessor's accelerator hardware, so check the PhotonVision object detection documentation and convert the export if required. In PhotonVision, create an ML pipeline and upload the model file through the dashboard. Specify the class labels in the same order as your Roboflow label map. PhotonVision handles the inference, and your robot code reads results through the same PhotonLib API. Test by pointing the camera at a game piece and confirming green detection boxes appear in the stream.
The FRC community publishes trained models on Roboflow Universe (universe.roboflow.com) every season. Search for the current year's game piece names, such as "2025 Reefscape coral" or "2024 Crescendo note". A community model with 500+ annotated images trained on competition lighting is often better than a model you train yourself in a week, and it's available immediately. Teams like 6328 (AdvantageKit authors) and others publish their vision models publicly. Check before spending time building your own, and contribute back if you improve on what you find.
Reading ML Detections in Java
PhotonVision's ML pipeline publishes results using the same PhotonCamera API as AprilTags. The difference is in which fields of PhotonTrackedTarget you read. ML targets don't have a tag ID or solvePnP pose — they have bounding box coordinates and a confidence score instead.
```java
import edu.wpi.first.wpilibj.smartdashboard.SmartDashboard;
import edu.wpi.first.wpilibj2.command.SubsystemBase;
import java.util.Comparator;
import java.util.DoubleSummaryStatistics;
import java.util.Optional;
import org.photonvision.PhotonCamera;
import org.photonvision.targeting.PhotonPipelineResult;
import org.photonvision.targeting.PhotonTrackedTarget;

public class GamePieceSubsystem extends SubsystemBase {
  // Dedicated ML pipeline camera: same API, different pipeline type
  private final PhotonCamera m_mlCamera = new PhotonCamera("IntakeCamera");

  // Camera horizontal FOV, from calibration or camera spec sheet.
  // Used to derive the focal length in pixels for the distance estimate.
  private static final double CAMERA_H_FOV_DEG = 70.0;    // degrees
  private static final double CAMERA_WIDTH_PX = 1280.0;   // pixels
  private static final double GAME_PIECE_HEIGHT_M = 0.10; // game piece physical height
  private static final double MIN_CONFIDENCE = 0.5;       // reject detections below this

  private Optional<PhotonTrackedTarget> m_bestTarget = Optional.empty();
  private double m_targetYawDeg = 0;
  private double m_estimatedDist = 0;

  @Override
  public void periodic() {
    PhotonPipelineResult result = m_mlCamera.getLatestResult();
    if (!result.hasTargets()) {
      m_bestTarget = Optional.empty();
      return;
    }

    // Get the highest-confidence detection among all visible game pieces.
    // getDetectedObjectConfidence() is the accessor name in recent PhotonLib
    // releases; check the Javadoc for the version you have installed.
    m_bestTarget = result.getTargets().stream()
        .filter(t -> t.getDetectedObjectConfidence() > MIN_CONFIDENCE)
        .max(Comparator.comparingDouble(PhotonTrackedTarget::getDetectedObjectConfidence));

    m_bestTarget.ifPresent(target -> {
      // For ML targets, yaw (horizontal angle) is the primary useful output.
      // PhotonVision computes this from the bounding box center using the
      // camera's calibration. Positive yaw = target is to the right of center.
      m_targetYawDeg = target.getYaw();

      // Rough distance estimate from bounding box height.
      // D ≈ (object_height_meters × camera_focal_length_px) / bbox_height_px
      // This is an approximation: use only for speed/approach decisions.
      DoubleSummaryStatistics cornerYs = target.getDetectedCorners().stream()
          .mapToDouble(c -> c.y)
          .summaryStatistics();
      double bboxHeight =
          cornerYs.getCount() > 0 ? cornerYs.getMax() - cornerYs.getMin() : 0.0;
      double focalLengthPx =
          CAMERA_WIDTH_PX / (2 * Math.tan(Math.toRadians(CAMERA_H_FOV_DEG / 2)));
      if (bboxHeight > 0) { // avoid dividing by zero when no corners are reported
        m_estimatedDist = (GAME_PIECE_HEIGHT_M * focalLengthPx) / bboxHeight;
      }

      SmartDashboard.putNumber("GamePiece/Yaw", m_targetYawDeg);
      SmartDashboard.putNumber("GamePiece/Dist", m_estimatedDist);
      SmartDashboard.putNumber("GamePiece/Confidence", target.getDetectedObjectConfidence());
    });
  }

  public boolean hasGamePiece() { return m_bestTarget.isPresent(); }
  public double getTargetYaw() { return m_targetYawDeg; }
  public double getEstimatedDistance() { return m_estimatedDist; }
}
```
Using Game Piece Detection in Autonomous
The most common autonomous use of ML detection is a "drive toward nearest game piece" behavior. The yaw from the detection becomes the target for a rotation PID controller, and the distance estimate determines when to stop driving forward.
```java
import edu.wpi.first.math.MathUtil;
import edu.wpi.first.math.controller.PIDController;
import edu.wpi.first.math.kinematics.ChassisSpeeds;
import edu.wpi.first.wpilibj2.command.Command;

public class AimAtGamePieceCommand extends Command {
  private final DriveSubsystem m_drive;
  private final GamePieceSubsystem m_gamePiece;
  private final PIDController m_yawController;

  public AimAtGamePieceCommand(DriveSubsystem drive, GamePieceSubsystem gamePiece) {
    m_drive = drive;
    m_gamePiece = gamePiece;

    // Rotate to zero yaw (game piece centered in frame).
    // Tune kP so the robot rotates toward the piece without oscillating.
    m_yawController = new PIDController(0.04, 0, 0.002);
    m_yawController.setSetpoint(0.0);  // target: game piece centered
    m_yawController.setTolerance(2.0); // degrees of acceptable aim error

    addRequirements(m_drive);
  }

  @Override
  public void execute() {
    if (!m_gamePiece.hasGamePiece()) {
      // No detection: rotate slowly to search
      m_drive.driveRobotRelative(new ChassisSpeeds(0, 0, 0.5));
      return;
    }

    double yaw = m_gamePiece.getTargetYaw();
    double distance = m_gamePiece.getEstimatedDistance();

    // Rotation: PID on yaw, rotating until the game piece is centered.
    // Clamped to avoid excessive rotation speed.
    double rotation = MathUtil.clamp(m_yawController.calculate(yaw), -2.5, 2.5);

    // Forward drive: proportional to distance, slowing near the piece.
    // Only drive forward if the robot is roughly aimed at the piece.
    double forward = (Math.abs(yaw) < 15)
        ? MathUtil.clamp(distance * 0.5, 0, 2.5)
        : 0;

    m_drive.driveRobotRelative(new ChassisSpeeds(forward, 0, rotation));
  }

  @Override
  public boolean isFinished() {
    // Done when aimed accurately AND within intake range
    return m_gamePiece.hasGamePiece()
        && m_yawController.atSetpoint()
        && m_gamePiece.getEstimatedDistance() < 0.3;
  }

  @Override
  public void end(boolean interrupted) {
    m_drive.driveRobotRelative(new ChassisSpeeds());
  }
}
```
Once you have a detection yaw and distance estimate, you can project an approximate field-relative position for the game piece by combining the camera angle with the robot's current field pose and known camera geometry. This field position becomes the target for AutoBuilder.pathfindToPose() (Lesson 9 of Unit 9) — giving you a smooth, physics-constrained path to the detected piece rather than a simple rotate-and-drive behavior. The game piece position will drift as the robot moves and the camera updates, so generate a new pathfindToPose() command whenever the estimated position changes significantly, interrupting the previous one.
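The projection step can be sketched with plain trigonometry. This is a simplified illustration that assumes the camera sits at the robot's center and faces straight ahead, that heading is counterclockwise-positive in field coordinates, and that detection yaw is positive to the right (the convention used in this lesson). The `FieldPoint` record and `project` method are hypothetical names; robot code would typically use WPILib's `Pose2d` and `Translation2d` and include the camera's mounting offset.

```java
public class GamePieceProjection {
    /** A point on the field, in meters. */
    public record FieldPoint(double x, double y) {}

    /**
     * Projects a detection into field coordinates.
     * Heading is counterclockwise-positive; yaw is positive to the right,
     * so the field bearing to the piece is heading minus yaw.
     */
    public static FieldPoint project(double robotX, double robotY, double robotHeadingDeg,
                                     double targetYawDeg, double distanceM) {
        double bearingRad = Math.toRadians(robotHeadingDeg - targetYawDeg);
        return new FieldPoint(
            robotX + distanceM * Math.cos(bearingRad),
            robotY + distanceM * Math.sin(bearingRad));
    }

    public static void main(String[] args) {
        // Robot at (2, 3) facing +X, piece 10 degrees to the right, about 1.5 m away
        System.out.println(project(2.0, 3.0, 0.0, 10.0, 1.5));
    }
}
```

Because the distance input is only a rough estimate, the projected point is least reliable along the camera's line of sight and more trustworthy in bearing.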
🔌 System Check
ML pipelines have different failure modes from AprilTag pipelines. Check these before relying on game piece detection in matches:
- Model is loaded in PhotonVision and the correct pipeline is active. Open the dashboard and confirm the ML pipeline (not the AprilTag pipeline) is selected for the intake camera. The stream should show inference boxes over detected game pieces. If no boxes appear but the pipeline shows as "running," the model file may not have loaded correctly — redeploy from the dashboard's model management section.
- Confidence threshold is set appropriately. In PhotonVision's ML pipeline settings, check the confidence threshold (typically 0.45–0.65). Too low: false positives from the carpet, shadows, or other orange objects. Too high: misses pieces at distance or in shadow. Tune on your actual competition field if possible, or at minimum on carpet under fluorescent lighting.
- Detection works in competition lighting conditions. Competition venues have very different lighting from your shop. If possible, test the model before the competition at the venue during setup day. Common failure: model trained on shop images doesn't handle bright skylight overhead or direct LED competition lighting. Add competition-condition images to the training set if failures occur.
- Robot code falls back gracefully when no game piece is detected. Test the autonomous routine with no game piece visible. The robot should either search (rotate slowly), wait, or transition to an alternate path, rather than stopping or throwing an error. Check that hasGamePiece() returning false is handled in every code path that uses game piece detection.
- Detection doesn't false-positive on the robot's own mechanisms. If any part of your robot is the same color as the game piece (orange intake rollers for an orange ring game, yellow structural elements for a yellow ball game), position the camera so the robot's own body isn't in the bottom of the frame. A model that sees the intake as a "game piece" will never finish the pickup command.
Knowledge Check
1. An ML detection pipeline returns a bounding box with center at pixel (480, 360) in a 1280×720 frame with a 70° horizontal field of view. The pixel center is to the left of the image center (640). Approximately what horizontal yaw angle does this detection correspond to, and which direction should the robot rotate to aim at the game piece?
2. A team's ML game piece detector is producing false positives on an orange sponsor banner in the background behind the scoring structure. Their model was trained only on images of game pieces on the floor. What is the most effective fix?
3. An AimAtGamePieceCommand is running during autonomous. The game piece is detected at yaw = −25°. The command rotates the robot left. After 0.5 seconds, the yaw reading jumps to +30° for two frames, then returns to −22°. What is the most likely cause, and how should this be handled in the command?
Build a Game Piece Detection Pipeline
- Search Roboflow Universe (universe.roboflow.com) for a model trained on the current FRC season's game pieces. Download the best-available model in ONNX format. If no good model exists for the current season, use last season's game piece to learn the workflow. Import it into PhotonVision's ML pipeline and verify detections appear in the stream.
- Implement the GamePieceSubsystem from this lesson. Add the confidence filter (reject below 0.5). Log GamePiece/Yaw, GamePiece/Dist, and GamePiece/Confidence to SmartDashboard. Deploy and verify the values update when you hold a game piece in front of the camera at various distances and angles.
- Implement AimAtGamePieceCommand. Place a game piece 2 meters in front of the robot, offset 30° to the right. Command the robot to aim (in place, wheels elevated) and confirm the yaw error decreases to within ±3° in under 2 seconds. Tune the PID kP value until the response is fast without oscillation.
- Test the no-detection fallback: remove the game piece from the camera's view while the command is running. Confirm the robot performs the search behavior (slow rotation) rather than stopping or erroring. Restore the piece and confirm the command resumes aiming.
- Bonus: Project the game piece's approximate field position. Using your robot's current field pose (m_drive.getPose()), the camera's horizontal FOV, and the target yaw from detection, compute an estimated Translation2d for where the game piece is on the field. Pass this to AutoBuilder.pathfindToPose() to navigate to it rather than using rotate-and-drive. Log the projected position to Field2d and verify it moves realistically as you move the physical game piece.