# Effect of Caching .timing File on Engine Rebuild #4798

## Environment
| Component        | Version                        |
|------------------|--------------------------------|
| TensorRT         | 10.13.3 (CUDA 12.9)            |
| ONNX Runtime     | 1.22.0                         |
| CUDA driver      | 575.57.08                      |
| GPU              | Tesla T4 (sm75, Turing)        |
| OS               | Linux (Ubuntu)                 |

## Description

After I delete a cached `.engine` file (1.7 GB) and rebuild it using the **same** `.timing` file (with `trt_timing_cache_enable=True` and `trt_force_timing_cache=True`), the rebuilt engine produces inference results that are **not bitwise identical** to those of the original engine.

I would expect that if the timing cache records which tactic was selected for every layer, rebuilding from the same cache should replay those choices and produce the same compiled engine, and therefore bitwise-identical outputs (I am using FP16 outputs).

For a segmentation model evaluated on 100 cases, comparing the run on the original engine against the run on the engine rebuilt from the same `.timing` file:

- 0/100 files are bitwise identical
- Mask probability scores differ for ~93% of masks
- Score delta: median ~7×10⁻⁵, mean ~3×10⁻³, max ~0.57
- ~30% of segmentation masks differ in boundary voxels (Dice still ≥ 0.96)
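
For reference, figures of this kind can be computed along the following lines. This is a minimal sketch, not the exact script used for the numbers above: it assumes each FP16 score tensor was serialized to raw little-endian bytes (for example with `ndarray.tobytes()`), and that masks are flat sequences of 0/1. The helper names `decode_fp16`, `compare_scores`, and `dice` are hypothetical.

```python
import statistics
import struct


def decode_fp16(raw: bytes) -> list:
    """Decode a little-endian FP16 buffer into Python floats."""
    return list(struct.unpack(f"<{len(raw) // 2}e", raw))


def compare_scores(raw_a: bytes, raw_b: bytes) -> dict:
    """Summarize differences between two FP16 score buffers of equal size."""
    if len(raw_a) != len(raw_b):
        raise ValueError("buffers must have the same length")
    a, b = decode_fp16(raw_a), decode_fp16(raw_b)
    deltas = [abs(x - y) for x, y in zip(a, b)]
    return {
        # Byte-level equality is the strict "bitwise identical" criterion.
        "bitwise_identical": raw_a == raw_b,
        "fraction_differing": sum(d > 0 for d in deltas) / len(deltas),
        "median_delta": statistics.median(deltas),
        "mean_delta": statistics.fmean(deltas),
        "max_delta": max(deltas),
    }


def dice(mask_a, mask_b) -> float:
    """Dice coefficient of two flat binary masks (sequences of 0/1)."""
    intersection = sum(1 for x, y in zip(mask_a, mask_b) if x and y)
    total = sum(map(bool, mask_a)) + sum(map(bool, mask_b))
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

Comparing raw bytes rather than decoded values keeps the identity check strict: two buffers that decode to numerically equal values (such as `+0.0` and `-0.0`) still count as different.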

For comparison, a smaller detection model (137 MB engine) does produce bitwise-identical inference results after the same delete-rebuild procedure on the same 100 cases.
Does this suggest that the issue is related to timing cache coverage completeness, or to something else?
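
One way to probe the coverage question would be to hash the `.timing` and `.engine` files before and after the rebuild. The sketch below uses only the Python standard library; the helper names `sha256_of`, `snapshot`, and `changed_files` are hypothetical. It rests on an assumption I have not verified: that a `.timing` file whose digest changes across the rebuild indicates the builder added or updated entries, meaning the original cache did not cover every tactic query.

```python
import hashlib
from pathlib import Path


def sha256_of(path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(directory, pattern) -> dict:
    """Map file name -> digest for every file in `directory` matching `pattern`."""
    return {p.name: sha256_of(p) for p in sorted(Path(directory).glob(pattern))}


def changed_files(before: dict, after: dict) -> list:
    """Names of files that were added, removed, or modified between snapshots."""
    names = sorted(set(before) | set(after))
    return [n for n in names if before.get(n) != after.get(n)]
```

In use, one would call `snapshot("<timing_dir>", "*.timing")` before deleting the engine, trigger the rebuild, take a second snapshot, and pass both to `changed_files`. An empty result would mean the cache file was left byte-for-byte untouched.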

Thank you in advance for your response.

## Provider options used

```python
providers = [
    ('TensorrtExecutionProvider', {
        "trt_fp16_enable": True,
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": "<engine_dir>",
        "trt_timing_cache_enable": True,
        "trt_force_timing_cache": True,
        "trt_timing_cache_path": "<timing_dir>",
        "trt_builder_optimization_level": 3,
        "trt_max_workspace_size": 17179869184,  # 16 GB
    }),
    ("CUDAExecutionProvider", {...}),
]
```
