
VAD fails with cuDNN error when using GPU #2970

@peterATIn2Dialog

Describe the bug

When processing long audio files with SpeechBrain's VAD on GPU, inference fails with a cuDNN error once the RNN receives a sequence longer than roughly 50,000 timesteps:

"RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input."

The error message is misleading: the input tensors are in fact contiguous.

The error is raised in VAD.get_speech_prob_chunk() when the internal GRU layer receives a sequence longer than cuDNN appears to support. Specifically:

  • Short sequences (<50k timesteps) process fine
  • Long sequences (>50k timesteps) trigger the cuDNN error

Proposed solution:

A workaround is to split any sequence that exceeds this ~50k-timestep threshold into smaller chunks before it reaches the GRU.
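
To make the chunking idea concrete, here is a minimal, framework-free sketch. The names `chunk_bounds`, `run_in_chunks`, `step` and `MAX_TIMESTEPS` are hypothetical and not part of SpeechBrain or PyTorch, and the 50,000 value is only the approximate threshold I observed, not a documented cuDNN limit. It assumes the recurrent layer can accept and return a hidden state, so the state is carried from one chunk to the next.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Approximate threshold observed in this issue (assumption, not a documented limit).
MAX_TIMESTEPS = 50_000


def chunk_bounds(total_len: int, max_len: int = MAX_TIMESTEPS) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs that cover range(total_len) in order,
    with no chunk longer than max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [
        (start, min(start + max_len, total_len))
        for start in range(0, total_len, max_len)
    ]


def run_in_chunks(
    frames: Sequence[Any],
    step: Callable[[Sequence[Any], Optional[Any]], Tuple[List[Any], Any]],
    state: Optional[Any] = None,
    max_len: int = MAX_TIMESTEPS,
) -> Tuple[List[Any], Any]:
    """Feed `frames` to `step` in chunks of at most max_len timesteps.

    `step` is a hypothetical stand-in for the recurrent layer: it takes a
    chunk and the previous state, and returns (outputs, new_state).
    """
    outputs: List[Any] = []
    for start, end in chunk_bounds(len(frames), max_len):
        chunk_out, state = step(frames[start:end], state)
        outputs.extend(chunk_out)
    return outputs, state


# A 120k-timestep sequence is split into three chunks, each within the limit.
print(chunk_bounds(120_000))
```

Carrying the state across chunks should reproduce the unchunked result only for a unidirectional RNN. For a bidirectional layer the backward pass would restart at every chunk boundary, so outputs near the boundaries may differ slightly from a single full-length pass.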


Expected behaviour

I expected to be able to run VAD on GPU, but hit a runtime error instead:

"RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input."

To Reproduce

  from speechbrain.inference import VAD

  # Load VAD model on GPU
  vad = VAD.from_hparams(
      source="speechbrain/vad-crdnn-libriparty",
      run_opts={"device": "cuda"}
  )

  # Process a long audio file (>30 minutes)
  # This will fail during double_check_speech_segments when 
  # get_speech_prob_chunk processes segments >50k timesteps
  boundaries = vad.get_speech_segments("long_audio.wav")

Environment Details

No response

Relevant Log Output

Additional Context

No response
