[Inference Performance] Enable vLLM support

I have adapted FireredASR-LLM to run on the community version of [vLLM 0.10.1](https://github.com/PatchouliTIS/FireredASR-vLLM).
 
In local testing on a single H20 GPU, throughput reaches approximately 1200 QPM at a batch size of 16. I would appreciate your feedback on these results.
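
For easier comparison with other setups, here is a rough conversion of that figure into per-second rates. This is only a sketch: it assumes QPM means queries per minute and that batches of 16 run back to back with no overlap, which may not match how vLLM actually schedules requests. The helper name `qpm_to_rates` is made up for illustration.

```python
def qpm_to_rates(qpm: float, batch_size: int) -> dict:
    """Convert a queries-per-minute figure into per-second rates.

    Assumes batches are processed one after another (no overlap),
    which is a simplification of real continuous batching.
    """
    qps = qpm / 60.0                    # requests completed per second
    batches_per_s = qps / batch_size    # full batches completed per second
    s_per_batch = 1.0 / batches_per_s   # wall time per batch under the assumption above
    return {
        "qps": qps,
        "batches_per_s": batches_per_s,
        "s_per_batch": s_per_batch,
    }


# Figures reported above: ~1200 QPM at batch size 16.
rates = qpm_to_rates(1200, 16)
print(rates)
```

Under these assumptions, 1200 QPM works out to about 20 requests per second, or roughly 0.8 s per batch of 16.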

Additionally, is it possible to modify the FireredASR-LLM structure to better fit vLLM's loading workflow?