Frequently Asked Questions (FAQ)
This page answers the most common questions about BenchBot.
Installation and Configuration
Which simulator should I choose to start?
Gazebo Classic is recommended for beginners:
- ✅ Stable and well-documented
- ✅ Less resource-intensive
- ✅ Large ROS 2 community
- ✅ Compatible with most SLAM packages
O3DE is recommended for:
- Realistic graphics
- Advanced physics
- Visual tests and demonstrations
How long does a typical benchmark take?
Complete Timeline:
- Preparation: 5-10 minutes (first time)
- Execution: 2-5 minutes per run
- Evaluation: 30 seconds - 2 minutes
- Analysis: 5-10 minutes (multi-run comparison)
Total for a simple run: ~5-7 minutes
What are the minimal dependencies?
Mandatory:
- Python 3.8+
- ROS 2 Humble
- A simulator (Gazebo or O3DE)
- At least one SLAM algorithm (cartographer, slam_toolbox, etc.)
Optional:
- Nav2 (for autonomous navigation)
- MkDocs (for local documentation)
How to configure my first benchmark?
Step 1: Create matrix.yaml
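A minimal sketch of what `matrix.yaml` could contain is shown below; the key names (`simulator`, `slam`, `run_duration`, `rosbag_topics`) are assumptions based on options mentioned elsewhere in this FAQ, so check them against your BenchBot configuration reference.

```yaml
# Hypothetical minimal matrix.yaml -- adjust key names to your schema
simulator: gazebo          # or: o3de
slam: slam_toolbox         # or: cartographer
run_duration: 120          # seconds per run
rosbag_topics:             # topics recorded for later evaluation
  - /map
  - /odom
  - /tf
```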
Step 2: Launch
```bash
python -m gui.main                                  # Graphical interface (GUI)
# OR
python -m runner.orchestrator --config matrix.yaml  # Command line (CLI)
```
Metrics and Results
What is the difference between IoU and SSIM?
| Metric | What it measures | Useful for |
|---|---|---|
| IoU | Global similarity (overlap) | Coverage precision |
| SSIM | Structural coherence (shapes) | Detail quality |
Example:
- IoU = 0.85: the SLAM map and the ground-truth (GT) map overlap by 85%
- SSIM = 0.90: Structures (walls, corridors) are well preserved
What is a good score?
| Metric | Excellent | Good | Acceptable | Bad |
|---|---|---|---|---|
| IoU | > 0.85 | 0.70-0.85 | 0.60-0.70 | < 0.60 |
| SSIM | > 0.90 | 0.80-0.90 | 0.70-0.80 | < 0.70 |
| ATE | < 0.10m | 0.10-0.20m | 0.20-0.30m | > 0.30m |
| Coverage | > 95% | 85-95% | 75-85% | < 75% |
Why are all my metrics at 0?
Possible Causes:
- Missing GT Map
  - Solution: Rerun; it will be generated automatically
- Empty SLAM Map
  - Verify: the `/map` topic is publishing data
  - Solution: Check the SLAM node logs
- Bad Alignment
  - Cause: Different map origins
  - Solution: Alignment is automatic; check the evaluation logs
How to interpret ATE (Absolute Trajectory Error)?
Definition: Average localization error of the robot compared to ground truth.
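BenchBot's exact computation may differ, but ATE is commonly reported as the root-mean-square translational error between estimated and ground-truth positions after trajectory alignment:

$$
\mathrm{ATE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert \hat{p}_i - p_i \rVert^2}
$$

where \(\hat{p}_i\) is the estimated position and \(p_i\) the ground-truth position at timestamp \(i\).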
Interpretation:
- 0.05m: Excellent precision (5cm)
- 0.15m: Good precision (15cm)
- 0.30m: Acceptable precision (30cm)
- > 0.50m: Localization problem
Note: ATE strongly depends on the environment and the sensor used.
Troubleshooting
My benchmark fails in WAIT_READY, what to do?
Symptoms: Timeout after 60s, state stuck at WAIT_READY
Possible Causes:
- Topic `/map` not published
- Insufficient `/odom` frequency
- Missing TF `map → base_link`

Solutions:
- Check the logs: `logs/RUN_XXX/orchestrator.log`
- Increase the probe timeout in the configuration (see the sketch below)
- Verify that SLAM is launched
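As an illustration of the timeout change, a probe override could look like the sketch below; the `probes` section and its key names are hypothetical, not the actual BenchBot schema.

```yaml
# Hypothetical probe settings -- key names are illustrative only
probes:
  map_topic:
    timeout: 120       # wait up to 120 s for /map instead of the 60 s default
  odom_topic:
    min_frequency: 10  # expected /odom rate in Hz
```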
The simulator does not launch
Gazebo:
O3DE:
Zombie processes persist after a crash
Symptom: Ports occupied, active gzserver processes
Solution:
```bash
# Clean all Gazebo processes
pkill -9 gzserver
pkill -9 gzclient

# Clean all ROS 2 processes
pkill -9 ros2
```
Prevention: The orchestrator uses process groups (os.setsid) to avoid this issue.
Evaluation fails with "No map data"
Causes:
- Empty or corrupt rosbag
- Topic `/map` not recorded
  - Check the `rosbag_topics` configuration in the YAML
- Run duration too short
  - SLAM didn't have time to publish a map
  - Solution: Increase `run_duration` to at least 60 s (see the sketch below)
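As a sketch, the two settings named above might appear in the YAML as follows; their exact placement in the file is an assumption.

```yaml
# Ensure /map is recorded and the run is long enough for SLAM to publish it
run_duration: 90        # seconds; keep this at 60 s or more
rosbag_topics:
  - /map                # required for map evaluation
  - /odom
  - /tf
```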
Advanced Features
How to use the Autotuner?
Minimal Configuration:
```yaml
autotuner:
  enabled: true
  algorithm: bayesian_optimization
  target_metric: iou
  max_iterations: 20
  parameters:
    - name: slam.resolution
      type: float
      range: [0.025, 0.1]
```
Launch:
Result: a `config_optimized.yaml` file with the best parameters
How to simulate a noisy sensor?
Configuration:
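The exact degradation schema is not reproduced here; as a hedged sketch, a noisy-LiDAR block could look like the following, with the `degradation` section and parameter names chosen purely for illustration.

```yaml
# Hypothetical sensor-degradation settings -- adapt names to your schema
degradation:
  lidar:
    noise_stddev: 0.05   # metres of Gaussian noise added to each range reading
    dropout_rate: 0.10   # fraction of beams randomly dropped
```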
Use Case: Test SLAM robustness against a low-cost sensor
How to compare multiple SLAMs?
Test Matrix:
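A sketch of a matrix sweeping three SLAM packages (matching the three runs mentioned below) is shown here; the `matrix`/`slam` keys and the third package are illustrative assumptions.

```yaml
# Hypothetical test matrix -- one run is generated per listed SLAM package
matrix:
  slam:
    - cartographer
    - slam_toolbox
    - rtabmap          # placeholder third entry
```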
Result: 3 automatic runs + comparative PDF report
How to test different degradation levels?
Example: Noise Calibration:
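A sweep over four noise levels (matching the four runs mentioned below) could be expressed roughly as follows; the dotted parameter path is an illustrative assumption, not the documented syntax.

```yaml
# Hypothetical noise sweep -- one run per listed value
matrix:
  degradation.lidar.noise_stddev: [0.0, 0.02, 0.05, 0.10]
```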
Result: 4 runs with performance vs noise graph
Workflow and Best Practices
What is the difference between GUI and CLI?
| Mode | Pros | Cons |
|---|---|---|
| GUI | Visual interface, real-time monitoring | Requires graphical display |
| CLI | Automation, CI/CD, headless | No real-time visualization |
Recommendation:
- GUI: Development, tests, demonstrations
- CLI: Production, CI/CD, massive benchmarks
How to organize my results?
Recommended Structure:
```
results/
├── runs/
│   ├── RUN_20260108_150000/       # One folder per run
│   │   ├── config_resolved.yaml
│   │   ├── rosbag2/
│   │   ├── metrics.json
│   │   └── logs/
│   └── ...
└── reports/
    ├── comparison_slam.pdf
    └── optimization_history.pdf
```
How many runs for a reliable benchmark?
Recommendations:
- Quick Test: 1 run (functional validation)
- Comparison: 3 runs per configuration (mean + standard deviation)
- Publication: 5-10 runs (robust statistics)
Note: Reproducibility is guaranteed by `config_resolved.yaml`
How to share my results?
Files to share:
- `config_resolved.yaml`: Exact configuration
- `metrics.json`: Numerical results
- `report.pdf`: Visual report
- (Optional) `rosbag2/`: Raw data (large)

Recommended Format: a `.tar.gz` archive of the `RUN_XXX` folder
Next Steps
- System Overview: Architecture overview
- Orchestrator Architecture: State machine and probes
- Tools: Infrastructure and advanced features
- Evaluation Logic: Detailed metrics