Unit 10 · Lesson 1

Introduction to AprilTags

Odometry tracks where the robot thinks it is. AprilTags tell it where it actually is. This lesson covers what AprilTags are, what physical information a camera can extract from them, and why that information transforms autonomous performance from "roughly correct" to "competition-reliable."

By the end of this lesson, you will:

Explain what an AprilTag is, what makes it uniquely detectable by a camera, and how its ID is encoded
Describe the four pieces of information a camera can extract from a single AprilTag detection
Identify the AprilTag family used in FRC and explain why tag size and family choice affect detection range and accuracy
Trace the pose estimation pipeline: from camera image → corner detection → solvePnP → robot pose
Explain the conceptual difference between getting a tag's ID and getting a full robot pose from a tag
Identify the physical and environmental factors that limit AprilTag detection reliability in a competition environment

What an AprilTag Is

An AprilTag is a specific type of fiducial marker: a printed pattern designed to be uniquely and reliably detected by a camera. The name comes from the APRIL Robotics Laboratory at the University of Michigan, which developed the tags and their detection algorithm. The technology is used across robotics, augmented reality, and industrial automation, and FIRST adopted it for FRC starting in the 2023 season.

Unlike a QR code (which stores arbitrary data) or a color target (which requires specific lighting to distinguish), an AprilTag is optimized for one purpose: letting a camera answer two questions simultaneously — "which tag is this?" and "where is this tag relative to my camera?" The design decisions behind every aspect of the tag — the border, the bit matrix, the minimum cell size — flow directly from that dual goal.

Physically, an AprilTag is a printed black-and-white square. In FRC, they're printed at a standard size (6.5 inches for the 36h11 family used since the 2024 season; the 2023 game used 6-inch 16h5 tags) and mounted at known field positions. The field layout file published by FIRST before each season lists the exact 3D pose of every tag on the field. Your robot's code loads this layout, detects the tags with its camera, and uses the known physical positions to determine where the robot must be on the field to see the tags at those apparent angles and sizes.

AprilTag Anatomy: What Each Zone Does

Every AprilTag has a precise structure, and each zone (border, quiet zone, data bits, and corners) serves a specific role in the detection pipeline. The border is described here.

Border

The black border ring defines the tag's boundary for corner detection.

The solid black border surrounding the data bits is what allows the detection algorithm to find the tag in the first place. The camera pipeline looks for quadrilaterals — four-sided shapes — in the image that have the high-contrast black border. Once a quadrilateral candidate is found, the border's known size and shape let the algorithm determine whether it's an AprilTag or just any dark rectangle in the scene.

The border must be a minimum of one cell wide. Making it wider improves detection reliability at long range (more contrast area) but reduces the space available for data bits (lowering the maximum number of unique IDs).

What a Camera Can Extract from a Single Tag

When a camera detects an AprilTag successfully, the detection pipeline produces more than just an ID number. The four pieces of information below are what make vision-based localization possible.

🔢 Tag ID int (0–586 for 36h11; FRC field layouts number tags from 1)

The numeric identifier encoded in the tag's bit matrix. Combined with the field layout file, the ID tells you exactly where on the field this tag is mounted — its 3D position and orientation relative to the field coordinate origin.

📐 Tag Corners List<Point2d> — 4 pixel coordinates

The pixel coordinates of the tag's four corners in the camera image. These are the raw measurements from which everything else is computed. More accurate corner detection → more accurate pose estimate. Blurring, motion, and low resolution all degrade corner accuracy.

📏 Camera-to-Tag Transform Transform3d (translation + rotation)

The 3D rigid body transform from the camera's optical center to the tag's center. Includes the distance (Z), lateral offset (X), vertical offset (Y), and three rotation angles. This is the output of the solvePnP algorithm applied to the corner pixel coordinates and known tag physical size.

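📊 Pose Ambiguity double (0.0–1.0, lower = more confident)

A confidence score for the pose estimate. For a single square tag, solvePnP generally yields two geometrically plausible poses (the "pose ambiguity" problem). PhotonVision reports ambiguity as the ratio of the best solution's reprojection error to the alternate solution's: values near 0 mean one solution is clearly preferred, and values near 1 mean the two are nearly indistinguishable. High ambiguity (competing solutions) produces unreliable pose estimates. The detector's decision margin is a separate score (higher is better) that describes how cleanly the tag's bits were decoded.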
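The link between the tag corners and the camera-to-tag transform listed above is the pinhole projection model: solvePnP searches for the transform whose projected corners match the measured pixels. The sketch below shows only the forward direction (3D point to pixel), as a rough illustration. The intrinsics `FX`, `FY`, `CX`, and `CY` are made-up values, lens distortion is ignored, and the camera frame is assumed to be X right, Y down, Z forward.

```java
/** Sketch: the pinhole projection that solvePnP inverts (lens distortion ignored). */
final class PinholeSketch {
    // Assumed intrinsics for illustration; real values come from camera calibration.
    static final double FX = 900.0, FY = 900.0; // focal lengths in pixels
    static final double CX = 640.0, CY = 360.0; // principal point in pixels

    /** Projects a camera-frame point (X right, Y down, Z forward, meters) to pixels {u, v}. */
    static double[] project(double x, double y, double z) {
        if (z <= 0) {
            throw new IllegalArgumentException("point must be in front of the camera");
        }
        return new double[] {FX * x / z + CX, FY * y / z + CY};
    }

    public static void main(String[] args) {
        double half = 0.1651 / 2.0; // half the edge of a 6.5-inch tag, in meters
        double z = 2.0;             // tag centered 2 m straight ahead, facing the camera
        double[][] corners = {{-half, -half}, {half, -half}, {half, half}, {-half, half}};
        for (double[] c : corners) {
            double[] px = project(c[0], c[1], z);
            System.out.printf("(%.1f, %.1f)%n", px[0], px[1]);
        }
    }
}
```

Doubling `z` halves every corner's offset from the principal point, so a distant tag gives the solver fewer pixels of geometry to work with.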

💡 The camera-to-tag transform is not the robot pose — it's a step toward it

The detection pipeline gives you where the tag is relative to the camera. To get the robot's field-relative pose, you need to chain three transforms: (1) the tag's known field-relative pose from the field layout; (2) the camera-to-tag transform from detection, inverted to get tag-to-camera; (3) the robot-to-camera transform (the camera's mounting position on the robot, measured and configured offline), inverted to get camera-to-robot. This chain is: field → tag → camera → robot. PhotonVision's PhotonPoseEstimator (Lesson 4) handles this chain automatically, but understanding each step is what lets you debug when results are wrong.

The Pose Estimation Pipeline

From a raw camera frame to a robot field position, the pipeline passes through five distinct steps. Each one is a potential source of error — and each has different failure modes.

Image capture: camera acquires a frame

The camera produces a raw image — typically grayscale for AprilTag processing, at a resolution between 640×480 and 1280×720 in typical FRC use. Frame rate (how many images per second) and exposure time are the two most important camera settings for AprilTag detection. A camera that captures 30 fps with a 5 ms exposure handles a fast-moving FRC robot much better than one that captures 10 fps at 50 ms. Longer exposures cause motion blur, which smears the tag's corners and degrades detection accuracy.

Threshold and quadrilateral detection

The detection library (PhotonVision, WPILib's integrated detector, or Limelight's pipeline) converts the image to binary (black/white), then searches for four-sided dark regions — quadrilateral candidates. Any dark rectangular shape could be a candidate at this stage. The algorithm uses the border's minimum width and aspect ratio constraints to filter out non-tag quadrilaterals.

ID decoding: read the bit matrix

For each quadrilateral candidate, the algorithm samples the pixel brightness at the expected data bit positions (the inner grid after the border). Each sampled position is classified as black (0) or white (1), producing a binary number. This number is compared against the valid codes in the tag family's dictionary. A match (exact, or within the number of bit errors the detector is configured to correct) confirms the tag's ID; any other bit pattern rejects the candidate. The 36h11 family's minimum Hamming distance of 11 means up to 5 bit errors can in principle be corrected, although detectors are often configured to correct fewer in order to limit false positives.
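The matching step can be pictured as a nearest-codeword search. The following is a minimal illustration, not the detector's actual code: the three 36-bit codewords are invented for the example (they are not real 36h11 codes), and `maxCorrectedBits` is a hypothetical tuning parameter. Real detectors also try the four rotations of the bit grid, which is omitted here.

```java
import java.util.OptionalInt;

/** Sketch: match a sampled bit pattern against a tag dictionary by Hamming distance. */
final class TagDecodeSketch {
    // Hypothetical 36-bit codewords for illustration only; NOT real 36h11 codes.
    static final long[] CODES = {0x0F0F0F0F0L, 0xF0F0F0F0FL, 0x333333333L};

    /** Number of bit positions in which a and b differ. */
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    /** Returns the index of the closest codeword if it is within maxCorrectedBits, else empty. */
    static OptionalInt decode(long observed, int maxCorrectedBits) {
        int bestId = -1;
        int bestDistance = Integer.MAX_VALUE;
        for (int id = 0; id < CODES.length; id++) {
            int distance = hammingDistance(observed, CODES[id]);
            if (distance < bestDistance) {
                bestDistance = distance;
                bestId = id;
            }
        }
        return bestDistance <= maxCorrectedBits ? OptionalInt.of(bestId) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        long noisy = CODES[1] ^ 0b101L;            // codeword 1 with two bits flipped
        System.out.println(decode(noisy, 2));      // close enough to codeword 1 to be accepted
        System.out.println(decode(0L, 2));         // far from every codeword, so rejected
    }
}
```

Raising `maxCorrectedBits` accepts noisier reads but also makes it more likely that a random pattern in the scene lands close enough to some codeword to be accepted.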

solvePnP: compute camera-to-tag transform

"solvePnP" (Solve Perspective-n-Point) is the core geometry algorithm. Given the four corner pixel coordinates (2D image points) and the known physical size of the tag (which fixes the four 3D points of the tag's corners), solvePnP solves for the rotation and translation that would produce exactly those 2D observations from a camera at an unknown position. The result is the 3D rigid body transform from the camera's optical center to the tag center. Camera calibration data is required to run solvePnP: the calibration tells the algorithm how the camera's lens maps 3D world points to 2D pixel coordinates.

Transform chain: camera-to-tag → robot field pose

Chain the transforms: invert the camera-to-tag transform to get the camera pose relative to the tag, then apply it to the tag's known field-relative pose to get the camera's field pose, then apply the inverse of the known robot-to-camera transform to get the robot's field pose. The output is a Pose2d or Pose3d representing where the robot is on the field. This is the measurement that SwerveDrivePoseEstimator.addVisionMeasurement() (Lesson 6) consumes to correct odometry drift.
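The chain can be worked through by hand in a flattened, 2D form. This is a simplified sketch for intuition only: `Pose2` is a hypothetical helper type (not a WPILib class), all poses are planar, and the tag pose and mounting offset are made-up numbers.

```java
/** Sketch: the field -> tag -> camera -> robot chain, flattened to 2D for readability. */
final class TransformChainSketch {
    /** A planar pose or transform: translation (x, y) in meters, heading theta in radians. */
    record Pose2(double x, double y, double theta) {
        /** Composes this pose with a transform expressed in this pose's own frame. */
        Pose2 then(Pose2 t) {
            double c = Math.cos(theta), s = Math.sin(theta);
            double sum = theta + t.theta;
            return new Pose2(
                x + c * t.x - s * t.y,
                y + s * t.x + c * t.y,
                Math.atan2(Math.sin(sum), Math.cos(sum))); // wrap heading to [-pi, pi]
        }

        /** Returns the transform that undoes this one. */
        Pose2 inverse() {
            double c = Math.cos(theta), s = Math.sin(theta);
            return new Pose2(-(c * x + s * y), -(-s * x + c * y), -theta);
        }
    }

    /** robotPose = fieldToTag * inverse(cameraToTag) * inverse(robotToCamera). */
    static Pose2 robotPose(Pose2 fieldToTag, Pose2 cameraToTag, Pose2 robotToCamera) {
        Pose2 fieldToCamera = fieldToTag.then(cameraToTag.inverse());
        return fieldToCamera.then(robotToCamera.inverse());
    }

    public static void main(String[] args) {
        Pose2 fieldToTag = new Pose2(10.0, 4.0, Math.PI);  // made-up tag pose, facing -X
        Pose2 cameraToTag = new Pose2(2.0, 0.0, Math.PI);  // tag 2 m ahead, facing the camera
        Pose2 robotToCamera = new Pose2(0.3, 0.0, 0.0);    // camera 0.3 m ahead of robot center
        System.out.println(robotPose(fieldToTag, cameraToTag, robotToCamera));
    }
}
```

With these numbers the camera ends up 2 m in front of the tag at (8, 4) facing +X, and the robot center sits 0.3 m behind it at roughly (7.7, 4). Forgetting either inverse places the robot on the wrong side of the tag or the camera, which is a common symptom to look for when debugging.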

AprilTag Families and the FRC Standard

AprilTags come in several "families" — different grid sizes and encoding schemes that make different tradeoffs between detection range, unique ID count, and error correction. Understanding which family FRC uses and why prevents a class of configuration mistakes.

Family	Data grid	Unique IDs	Error correction	FRC use
16h5	4×4 data bits	30	Minimal (Hamming distance ≥ 5)	2023 season only
25h9	5×5 data bits	35	Moderate (Hamming distance ≥ 9)	Not used
36h11	6×6 data bits	587	Strong (Hamming distance ≥ 11)	✓ 2024 season onward

FRC has used 36h11 as its standard family since the 2024 season (the 2023 game used 16h5). The "36" is the number of data bits, arranged as a 6×6 grid; adding the one-cell black border on each side makes the tag 8×8 cells, and the surrounding white quiet zone brings the printed pattern to 10×10. The "h11" means a Hamming distance of at least 11 between any two valid codes, so even if up to 5 bits of the read pattern are corrupted, the closest valid code is still the correct one. This makes it robust to the partial occlusions, lighting variation, and motion blur typical of an FRC match environment.
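The arithmetic behind a family name can be checked in a few lines. This sketch assumes names of the form "<data bits>h<minimum Hamming distance>" with a square data grid, and uses the standard coding-theory bound that a minimum distance d guarantees correction of floor((d - 1) / 2) bit errors; the class and method names are illustrative only.

```java
/** Sketch: what a family name such as "36h11" implies. */
final class TagFamilySketch {
    /** Parses "<dataBits>h<minHammingDistance>" into {dataBits, minDistance}. */
    static int[] parse(String family) {
        String[] parts = family.split("h");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    /** Side length of the data grid in cells (assumes a square grid). */
    static int gridSide(int dataBits) {
        return (int) Math.round(Math.sqrt(dataBits));
    }

    /** Bit errors guaranteed correctable for a minimum distance d: floor((d - 1) / 2). */
    static int correctableBits(int minDistance) {
        return (minDistance - 1) / 2;
    }

    public static void main(String[] args) {
        for (String family : new String[] {"16h5", "25h9", "36h11"}) {
            int[] p = parse(family);
            int side = gridSide(p[0]);
            System.out.printf("%s: %dx%d data grid, corrects up to %d bit errors%n",
                family, side, side, correctableBits(p[1]));
        }
    }
}
```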

💡 Always verify you're configured for the correct family

PhotonVision, Limelight, and WPILib's AprilTagDetector must all be configured to use the same tag family as the tags on the field. If you configure your detector for 16h5 and the field uses 36h11, the detection will either fail entirely or produce false positives. Before every season, verify that your vision pipeline's tag family setting matches the family FIRST specifies for that year's field. Check the PhotonVision camera settings page and confirm the family dropdown shows "36h11" (or "tag36h11" depending on the UI version).

What Limits Detection in Competition

AprilTag detection is not equally reliable in all conditions. The following factors reduce detection quality, and understanding them informs both camera placement decisions (hardware) and measurement trust decisions (software, covered in Lesson 7).

Distance. As the robot moves farther from a tag, the tag subtends fewer pixels. Below approximately 10–15 pixels per tag cell, corner detection accuracy degrades significantly. At typical FRC competition distances (2–8 meters), a 6.5-inch tag fills between 10 and 80 pixels per side depending on camera resolution and focal length. Detection is reliable at 3–5 meters, increasingly unreliable beyond 7 meters.
Angle. A tag viewed at a steep angle (more than ~60° off-normal) appears highly foreshortened. The corner pixel coordinates are compressed along one axis, reducing solvePnP accuracy. A pose estimated from a tag viewed at 75° may have 5–10× more error than the same tag viewed straight-on.
Motion blur. At a camera exposure of 20 ms and a robot velocity of 3 m/s, the robot moves 6 cm relative to the tag during a single exposure, which smears the image by several pixels or more at competition distances. This blurs corner positions and reduces detection confidence. Short exposures (2–5 ms) reduce blur but require brighter illumination or a more sensitive camera.
Lighting. FRC fields have inconsistent lighting: venue ceiling lights, flashing LEDs from other robots, sunlight through venue skylights, and team-specific robot illumination. High contrast between the tag's black and white cells is required for binary thresholding to work correctly. Very low ambient light or strong directional lighting that washes out contrast both degrade detection.
Occlusion. If another robot, field element, or the robot's own mechanism partially covers the tag, the detection may fail or produce incorrect corner coordinates. Partial occlusion with fewer than 4 visible corners will fail entirely since solvePnP requires all 4.
Camera calibration quality. solvePnP uses the camera's intrinsic calibration (focal length, principal point, distortion coefficients) to map pixels to angles. A poorly calibrated camera produces systematically biased pose estimates — the tag appears closer or at a wrong angle regardless of detection quality. Calibration is covered in Lesson 3.
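The distance and motion-blur effects above can be estimated with the pinhole approximation: apparent size in pixels is roughly focal length (in pixels) times physical size divided by distance. The sketch below is a back-of-the-envelope estimate only. It assumes a hypothetical camera with a 1280-pixel-wide image and a 70° horizontal field of view, a tag viewed straight-on, and motion perpendicular to the camera axis; none of these values come from a specific camera.

```java
/** Sketch: how many pixels a tag spans, and how far it smears during one exposure. */
final class TagPixelSketch {
    static final double TAG_SIZE_METERS = 0.1651; // 6.5 inches

    /** Focal length in pixels from image width and horizontal field of view. */
    static double focalLengthPx(int imageWidthPx, double horizontalFovDeg) {
        return (imageWidthPx / 2.0) / Math.tan(Math.toRadians(horizontalFovDeg) / 2.0);
    }

    /** Approximate width in pixels of a tag viewed straight-on at the given distance. */
    static double tagWidthPx(double focalPx, double distanceMeters) {
        return focalPx * TAG_SIZE_METERS / distanceMeters;
    }

    /** Approximate smear in pixels for sideways motion during one exposure. */
    static double blurPx(double focalPx, double speedMps, double exposureSec, double distanceMeters) {
        return focalPx * (speedMps * exposureSec) / distanceMeters;
    }

    public static void main(String[] args) {
        double f = focalLengthPx(1280, 70.0); // assumed camera, see lead-in
        for (double d : new double[] {2.0, 4.0, 8.0}) {
            System.out.printf("%.0f m: tag ~%.0f px wide, blur ~%.1f px at 3 m/s, 20 ms%n",
                d, tagWidthPx(f, d), blurPx(f, 3.0, 0.020, d));
        }
    }
}
```

Both quantities scale with 1/distance, so the ratio of blur to tag width depends only on speed and exposure. Shortening the exposure is therefore the lever that helps at every range.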

🔍 What bad vision measurements do to your autonomous

A single corrupted AprilTag pose estimate injected into your pose estimator can teleport your robot's tracked position 2–3 meters instantaneously. PathPlanner or sensor-based commands then try to navigate from this phantom position — driving the robot to a completely wrong location or into a field element. This is not a hypothetical scenario. It happens regularly at competitions to teams that don't filter bad vision measurements. Unit 10, Lesson 8 covers the filtering strategies. The lesson here is that vision is powerful but requires trust management — you don't blindly accept every measurement just because the camera produced one.
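As a taste of what trust management can look like, here is one very simple sanity gate. It is an illustrative sketch, not the filtering approach taught later in the unit: the class name, both thresholds, and the choice of checks are assumptions made for the example.

```java
/** Sketch: a simple sanity gate for vision measurements (thresholds are placeholders). */
final class VisionGateSketch {
    static final double MAX_TAG_DISTANCE_METERS = 5.0; // assumed; tune for your camera
    static final double MAX_JUMP_METERS = 1.0;         // assumed; tune for your robot

    /**
     * Accepts a vision pose only if the tag was reasonably close and the pose
     * does not jump implausibly far from the current estimate.
     */
    static boolean accept(double estimateX, double estimateY,
                          double visionX, double visionY,
                          double tagDistanceMeters) {
        double jump = Math.hypot(visionX - estimateX, visionY - estimateY);
        return tagDistanceMeters <= MAX_TAG_DISTANCE_METERS && jump <= MAX_JUMP_METERS;
    }

    public static void main(String[] args) {
        System.out.println(accept(3.0, 2.0, 3.1, 2.05, 2.5)); // small correction, nearby tag
        System.out.println(accept(3.0, 2.0, 5.8, 2.0, 2.5));  // 2.8 m teleport
        System.out.println(accept(3.0, 2.0, 3.1, 2.05, 7.5)); // tag too far away
    }
}
```

A jump limit has a cost: if odometry has genuinely drifted by more than the limit (for example, after a hard collision), the gate will also reject the correct measurements that would fix it, so a real implementation needs a recovery path.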

A First Look at the Data Structures

You won't write vision integration code until Lessons 3–6, but understanding the WPILib data types involved now prevents conceptual confusion later. These classes will be referenced throughout the unit.

Key WPILib / PhotonVision types for AprilTag data

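import edu.wpi.first.apriltag.AprilTagFieldLayout;
import edu.wpi.first.apriltag.AprilTagFields;
import edu.wpi.first.math.geometry.Pose3d;
import edu.wpi.first.math.geometry.Transform3d;
import edu.wpi.first.math.geometry.Translation3d;
import java.util.Optional;
import org.photonvision.targeting.PhotonTrackedTarget;

// The field layout: loaded once, used throughout the match
// AprilTagFieldLayout stores the 3D Pose3d of every tag on the 2025 field
// (2025 has welded and AndyMark field variants; load the one your event uses)
AprilTagFieldLayout fieldLayout =
    AprilTagFields.k2025ReefscapeWelded.loadAprilTagLayoutField();

// Get a specific tag's field pose by ID
Optional<Pose3d> tagPose = fieldLayout.getTagPose(5);
// Returns Optional.empty() if that ID isn't in the layout

// A single AprilTag detection result (from PhotonVision, Lesson 3)
// pipelineResult is a PhotonPipelineResult read from the camera;
// check pipelineResult.hasTargets() first, since getBestTarget() can return null
PhotonTrackedTarget target = pipelineResult.getBestTarget();
int    tagId    = target.getFiducialId();          // which tag (field IDs start at 1)
Transform3d camToTag = target.getBestCameraToTarget(); // camera → tag transform
double ambiguity = target.getPoseAmbiguity();        // 0.0–1.0, lower = less ambiguous; -1 if unavailable

// Transform3d contains both translation and rotation
// Translation3d: distance in each axis (meters)
// Rotation3d: orientation as roll/pitch/yaw (radians)
Translation3d camToTagTranslation = camToTag.getTranslation();
double distanceMeters = camToTagTranslation.getNorm();  // total 3D distance to tag

// Chaining transforms manually (PhotonPoseEstimator does this for you in Lesson 4):
// robotPose = fieldTagPose × tagToCamera × cameraToRobot
// robotToCameraTransform is the Transform3d you measured for the camera mount
Pose3d fieldTagPose = tagPose.orElseThrow();
Pose3d cameraPose = fieldTagPose.transformBy(camToTag.inverse());
Pose3d robotPose = cameraPose.transformBy(robotToCameraTransform.inverse());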

💡 Use AprilTagFields, not hardcoded tag positions

WPILib's AprilTagFields enum provides official field layouts for each FRC season. Don't hardcode tag positions as constants in your code — they change every season, and FIRST sometimes releases mid-season corrections to the official layout. Call AprilTagFields.kCurrentYear.loadAprilTagLayoutField() (replacing kCurrentYear with the specific year constant) to load the authoritative layout. If your code has hardcoded tag poses from a previous season, it will produce systematically wrong robot pose estimates when the new season's tags are in different locations.

🔌 System Check

⚙️ Before You Rely on AprilTag Detection in Any Match

These are the physical and configuration prerequisites — the things that must be true before the detection pipeline can produce useful data. Software configuration (PhotonVision, calibration) is covered in Lessons 3–4; this list is the hardware and high-level setup:

Camera is physically mounted rigidly. Any flex or vibration in the camera mount shifts the camera-to-robot transform, invalidating your configured camera pose offset. The mount should be bolted through the bumper bracket or a rigid frame member, not zip-tied to a flexible plastic panel. Verify by pushing the camera by hand — it should not deflect.
Camera faces a tag-rich direction. Camera placement should maximize the number of field tags visible during the most important moments of autonomous. For most FRC games, this means mounting cameras facing the scoring structures, not the robot's back. Plan camera angles before build season begins, not after the robot is assembled.
Tag family in your vision pipeline matches the field. Open your vision software (PhotonVision, Limelight configuration, etc.) and confirm the tag family setting matches the current season's field standard (typically 36h11). Wrong family = no detections or false detections.
Field layout file matches the current season. Confirm your code loads AprilTagFields.kCurrentYear.loadAprilTagLayoutField() (with the correct year constant) and not a hardcoded layout from a previous season or a local file you modified without verifying against the official FIRST release.
Camera calibration has been performed. An uncalibrated camera will produce systematically biased pose estimates. Calibration (Lesson 3) must be done with the actual camera at the actual lens settings you'll use in competition. A calibration from a different camera or a different resolution/FOV setting on the same camera is not valid.

Knowledge Check

1. A camera detects an AprilTag with ID 7 and produces a camera-to-tag transform with a Z distance of 3.2 meters. The robot's code uses this to compute a robot field pose. Fifteen milliseconds later, the robot receives a second detection of the same tag with Z = 3.0 meters. Which piece of additional information is essential for converting either of these camera-to-tag transforms into a robot field pose?

A The robot's current gyro heading from the IMU
B The current battery voltage, because it affects camera frame rate
C Both the field-relative pose of tag 7 (from the field layout) and the camera's known position/orientation on the robot (the robot-to-camera transform)
D The tag's physical size in meters, which must be measured at the event

2. A team's robot is working well in their shop but at competition the vision system produces frequent false detections — the robot's pose estimator shows phantom jumps to incorrect positions. Their camera is configured for the "16h5" tag family. The field uses 36h11 tags. What is the most likely cause of the false detections?

A The 16h5 family has too many unique IDs, creating confusion between tags
B The 16h5 family has very low Hamming distance — arbitrary patterns in the competition environment (banners, field borders, robot graphics) match the small 4×4 bit patterns by coincidence, producing false detections that don't correspond to real tags
C The camera exposure setting needs to be reduced at the competition venue
D The field layout file needs to be updated because 16h5 uses different tag positions than 36h11

3. At 5 meters from a tag, a robot's camera produces a detection with pose ambiguity of 0.95 (very high: the two competing solutions are nearly equal). At 2 meters from the same tag, the detection has ambiguity of 0.04. Which distance should be trusted more for pose estimation, and why?

A 5 meters — the robot is farther away so the tag is more centered in the frame, reducing solvePnP error
B 2 meters — the low ambiguity score (0.04) means one geometric solution is strongly preferred over the other; high ambiguity (0.95) at 5 meters means the two solvePnP solutions are nearly indistinguishable, making the pose estimate unreliable regardless of how "clean" the detection appears
C 5 meters — larger distance means larger tag appears in the image, improving detection quality
D Both should be trusted equally — ambiguity only affects ID decoding, not pose estimation accuracy

💪 Practice Prompt

Explore the AprilTag Field Layout and Data Structures

In a new Java file, load the current season's field layout using AprilTagFields.kCurrentYear.loadAprilTagLayoutField(). Print the total number of tags on the field using fieldLayout.getTags().size(). Then iterate through all tags and print each tag's ID and its X, Y, Z position on the field. Which tags are on your alliance's side of the field for the current game?
Look up the current FRC game manual or WPILib documentation to find the physical size of the AprilTags used this season (in inches and meters). Create a constant in your VisionConstants class: TAG_SIZE_METERS. This value is required for solvePnP to compute accurate distance estimates.
Write a method getTagsVisibleFromAllianceWall(AprilTagFieldLayout layout, boolean isBlue) that returns a list of tag IDs that a robot might see when facing the scoring structure from the alliance wall. Use the field layout's known tag positions to determine which tags are within 6 meters of the alliance wall and roughly facing the field interior. This is the set of tags your vision system will most often detect during autonomous.
Draw a top-down sketch of the FRC field (any season) marking each AprilTag's position and ID number. For each tag, note: which direction does it face? What is the maximum angle at which your robot could see it from a typical scoring approach? At what distance does that approach path intersect the tag's visible range? This exercise bridges field geometry to detection reliability — and it's the exercise drive coaches should do before strategy meetings.
Bonus: Using WPILib's Pose3d and Transform3d classes, manually compute what the robot's field pose would be if: the camera is mounted 0.3m forward and 0.5m up from the robot center, facing forward at 0° yaw; the camera detects tag 7 at a camera-to-tag transform of (2.0m, 0.1m, 0.0m translation, 0° rotation); and tag 7's field pose is (14.0m, 5.5m, 0.57m, 180° yaw). Show your transform chain step-by-step in code comments.