Environment
| Component | Version |
| --- | --- |
| TensorRT | 10.13.3 (CUDA 12.9) |
| ONNX Runtime | 1.22.0 |
| CUDA driver | 575.57.08 |
| GPU | Tesla T4 (sm75, Turing) |
| OS | Linux (Ubuntu) |
Description
When I delete a cached .engine file (1.7 GB) and rebuild it using the same .timing file (with trt_timing_cache_enable=True and trt_force_timing_cache=True), the rebuilt engine produces inference results that are not bitwise identical to those of the original engine.
I would expect that if the timing cache records which tactic was selected for every layer, rebuilding from the same cache should replay those choices and produce the same compiled engine, and therefore bitwise-identical outputs (I am using FP16 outputs).
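One way to check the "same compiled engine" part of this expectation separately from the outputs is to hash the original and rebuilt engine files. The following is a minimal sketch using only the Python standard library; the helper names `file_sha256` and `same_bytes` are hypothetical. It assumes a copy of the original .engine file was saved before deletion, and it says nothing about whether TensorRT is expected to serialize identical bytes across builds.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks.

    Chunked reading keeps memory use flat for multi-GB engine files.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """True if the two files have identical contents (by SHA-256)."""
    return file_sha256(path_a) == file_sha256(path_b)
```

If the two digests differ, the engines themselves differ on disk, which would at least rule out the runtime as the only source of the divergence.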
For a segmentation model measured across 100 cases, comparing original-engine run vs rebuild-from-same-timing file:
- 0/100 files are bitwise identical
- Mask probability scores differ for ~93% of masks
- Score delta: median ~7×10⁻⁵, mean ~3×10⁻³, max ~0.57
- ~30% of segmentation masks differ by boundary voxels (Dice still ≥ 0.96)
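For reference, the kind of comparison behind these numbers can be sketched as follows. This is not the exact script used for the measurements above; it is a minimal standard-library illustration that assumes each output is available as a raw little-endian FP16 byte buffer and each mask as a flat sequence of 0/1 values. The function names `decode_fp16`, `compare_fp16` and `dice` are hypothetical.

```python
import statistics
import struct


def decode_fp16(buf):
    """Decode a little-endian FP16 byte buffer into a tuple of floats."""
    count = len(buf) // 2
    return struct.unpack(f"<{count}e", buf[: count * 2])


def compare_fp16(buf_a, buf_b):
    """Compare two FP16 buffers bitwise and by absolute element delta."""
    if len(buf_a) != len(buf_b) or len(buf_a) < 2:
        raise ValueError("buffers must be non-empty and of equal length")
    a = decode_fp16(buf_a)
    b = decode_fp16(buf_b)
    deltas = [abs(x - y) for x, y in zip(a, b)]
    return {
        # Byte equality is stricter than value equality (e.g. +0.0 vs -0.0).
        "bitwise_identical": buf_a == buf_b,
        "median": statistics.median(deltas),
        "mean": statistics.fmean(deltas),
        "max": max(deltas),
    }


def dice(mask_a, mask_b):
    """Dice coefficient of two equally sized binary masks (flat sequences)."""
    intersection = sum(1 for x, y in zip(mask_a, mask_b) if x and y)
    total = sum(1 for x in mask_a if x) + sum(1 for y in mask_b if y)
    # Two empty masks are treated as a perfect match (an assumption).
    return 1.0 if total == 0 else 2.0 * intersection / total
```

Comparing raw bytes rather than decoded values keeps the "bitwise identical" check independent of any floating-point tolerance.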
For comparison, a smaller detection model (137 MB engine) does produce bitwise-identical inference results after the same delete-rebuild procedure on the same 100 cases.
Does this suggest that the issue is related to incomplete timing cache coverage, or to something else?
Thank you in advance for your response.
Provider options used
providers = [
    ("TensorrtExecutionProvider", {
        "trt_fp16_enable": True,
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": "<engine_dir>",
        "trt_timing_cache_enable": True,
        "trt_force_timing_cache": True,
        "trt_timing_cache_path": "<timing_dir>",
        "trt_builder_optimization_level": 3,
        "trt_max_workspace_size": 17179869184,  # 16 GB
    }),
    ("CUDAExecutionProvider", {...}),
]