Cake: A Distributed LLM Solution
Official site / GitHub
Download the models
pip install modelscope
modelscope download --model llm-research/meta-llama-3-8b --local_dir /data/llm/llama3
modelscope download --model ZhipuAI/ChatGLM-6B --local_dir /data/llm/ChatGLM-6B
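A quick sanity check, assuming the --local_dir paths above: cake-cli reads config.json and the safetensors weights from the --model directory (as the log output further down shows), so both should be present after the download.
# Verify the download: the model directory should contain config.json,
# tokenizer files, and the *.safetensors weights (plus an index file for sharded models).
ls -lh /data/llm/llama3/config.json /data/llm/llama3/*.safetensors*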
Running Qwen2-0.5B (failed)
topology.yml
linux_server_1:
  host: '192.168.67.129:10128'
  description: 'NVIDIA Titan X Pascal (12GB)'
  layers:
    - 'model.layers.0-23'
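Here a single worker hosts all 24 of Qwen2-0.5B's transformer layers (0-23). To actually distribute the model, each worker is assigned its own contiguous slice of layers. A hypothetical two-node split (the second host, its description, and the exact layer split are illustrative, following the same topology format):
linux_server_1:
  host: '192.168.67.129:10128'
  description: 'NVIDIA Titan X Pascal (12GB)'
  layers:
    - 'model.layers.0-11'
linux_server_2:
  host: '192.168.67.130:10128'   # illustrative second worker
  description: 'second GPU node (hypothetical)'
  layers:
    - 'model.layers.12-23'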
Start the services: the worker first, then the API (master) node.
cake-cli --model /data/llm/Qwen2-0.5B \
  --mode worker \
  --name linux_server_1 \
  --topology /data/cake/topology.yml \
  --address 0.0.0.0:10128

cake-cli --model /data/llm/Qwen2-0.5B \
  --api 0.0.0.0:8080 \
  --topology /data/cake/topology.yml
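With --mode worker, cake-cli serves only the layer range that the topology assigns to its --name; with --api it runs as the master node instead, loading the embeddings and lm_head itself (visible in the log below) and exposing the HTTP API while delegating the transformer layers to the workers.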
Starting the API service fails:
[2024-09-03T10:23:29Z INFO ] [Master] dtype=F16 device=Cpu mem=6.4 MiB
[2024-09-03T10:23:29Z INFO ] loading topology from /data/cake/topology.yml
[2024-09-03T10:23:29Z INFO ] loading configuration from /data/llm/Qwen2-0.5B/config.json
[2024-09-03T10:23:29Z INFO ] loading tensors in /data/llm/Qwen2-0.5B/model.safetensors.index.json
[2024-09-03T10:23:29Z INFO ] loading tensors from model.safetensors ...
[2024-09-03T10:23:29Z INFO ] loading embeddings ...
[2024-09-03T10:23:31Z INFO ] loading lm_head ...
Error: cannot find tensor lm_head.weight
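This failure is consistent with Qwen2-0.5B tying its output projection to the input embeddings (tie_word_embeddings: true in its config.json): the checkpoint ships no standalone lm_head.weight tensor, while Cake's loader expects one.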
Running Meta-Llama-3-8B (succeeded)
topology-llama3.1-8b.yml
linux_server_1:
  host: '192.168.67.129:10128'
  description: 'NVIDIA Titan X Pascal (12GB)'
  layers:
    - 'model.layers.0-31'
cake-cli --model /data/llm/llama3 \
  --mode worker \
  --name linux_server_1 \
  --topology /data/cake/topology-llama3.1-8b.yml \
  --address 0.0.0.0:10128
cake-cli --model /data/llm/llama3 \
  --api 0.0.0.0:8080 \
  --topology /data/cake/topology-llama3.1-8b.yml
Test the API
curl http://192.168.67.129:8080/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful AI assistant."
      },
      {
        "role": "user",
        "content": "Why is the sky blue?"
      }
    ]
  }'
