chat

<aside> 💡

Use the osc-llm chat command to measure model inference speed (tokens/s).

</aside>

| Model | Precision | Tokens/s | Command | Prompt | Hardware | Version |
| --- | --- | --- | --- | --- | --- | --- |
| llama-3 (meta-llama/Meta-Llama-3-8B-Instruct) | bf16 | Best: 53, worst: 49, average: 50 | `llm chat --checkpoint_dir meta-llama/Meta-Llama-3-8B-Instruct --compile true` | what do llama eat | GPU: 4090, CPU: AMD EPYC 7453 | 0.1.5 |
| llama-3 (meta-llama/Meta-Llama-3-8B-Instruct) | int8 | Best: 89, worst: 80, average: 85 | `llm quantize int8 xxx ; llm chat xxxxx --compile true` | what do llama eat | GPU: 4090, CPU: AMD EPYC 7453 | 0.1.5 |
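
To make the tokens/s figures easier to compare, here is a minimal sketch of how such numbers could be computed and summarized. The helper names `tokens_per_second` and `summarize` are hypothetical and are not part of osc-llm. The only data used are the two averages reported in the table above.

```python
from statistics import mean


def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Throughput of one generation run (hypothetical helper, not an osc-llm API)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_tokens / elapsed_seconds


def summarize(samples: list[float]) -> dict[str, float]:
    """Best, worst and average throughput over several runs."""
    if not samples:
        raise ValueError("need at least one sample")
    return {"best": max(samples), "worst": min(samples), "average": mean(samples)}


# Average tokens/s taken from the table above.
BF16_AVG = 50
INT8_AVG = 85

# Relative speedup of the int8 run over the bf16 run.
speedup = INT8_AVG / BF16_AVG
print(f"int8 vs bf16 average speedup: {speedup:.2f}x")  # prints "1.70x"
```

Going by the reported averages, the int8 run generates about 1.7 times as many tokens per second as the bf16 run on the same hardware.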