Deploy Llama 3 70B with Ollama and get a large model of your own: a beginner's tutorial

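GUOSHIYING · April 30, 2024, 01:57

Deploying a local large model with Ollama

Prerequisites

1. A Linux GPU server
2. The CUDA driver is installed on the server and works
3. A proxy service is available, because the web UI downloads a model the first time it starts
4. Docker is installed on the system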
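
The prerequisites above can be checked before starting. This is a minimal sketch, assuming the NVIDIA driver puts nvidia-smi on PATH and that curl is present for the install step; the helper name missing_tools is made up for illustration.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# (missing_tools is a hypothetical helper, not part of Ollama or Docker.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools this guide relies on. Prints nothing when all of them are installed.
missing_tools nvidia-smi docker curl | sed 's/^/missing: /'
```

If nvidia-smi is reported missing, the driver from prerequisite 2 is probably not installed correctly. The proxy from prerequisite 3 is not covered by this check.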

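Installing Ollama

Run the following command to install automatically (if the installation does not complete, check your environment):
curl -fsSL https://ollama.com/install.sh | sh
Start Ollama and enable it at boot:
systemctl start ollama && systemctl enable ollama
Pull and run a model
Pull Llama 3 70B:
ollama pull llama3:70b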

Pull a model: ollama pull <model name>
Run a model: ollama run <model name>
List the models already pulled: ollama list
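
The three commands above can be combined so that a model is pulled only when it is not already present. This is a rough sketch, assuming that ollama list prints a header row followed by one row per model with the model name in the first column; has_model is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if the model name in $1 appears in the first column of
# `ollama list`-style text read from stdin. The header row is skipped.
# (has_model is a hypothetical helper; the column layout is an assumption.)
has_model() {
    awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Example use on the server:
#   ollama list | has_model llama3:70b || ollama pull llama3:70b
```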

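Installing the Ollama web UI

Install the front-end service with Docker

First pull the image. Because of network conditions in mainland China, pulling it directly is very slow, so we pull it through a proxy registry.
1. Pull the image through the proxy
docker pull ghcr.dockerproxy.com/open-webui/open-webui:main
2. Retag the proxied image under its original name
docker tag ghcr.dockerproxy.com/open-webui/open-webui:main ghcr.io/open-webui/open-webui:main
3. Remove the proxy tag
docker rmi ghcr.dockerproxy.com/open-webui/open-webui:main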
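
The pull, retag, and remove steps above follow one pattern that could be wrapped in a small function. This is only a sketch: mirror_ref and pull_via_mirror are hypothetical names, and it assumes the mirror host serves the same repository path as the original registry. Mirror hosts come and go, so the host is passed as a parameter.

```shell
#!/bin/sh
# Replace the registry host of an image reference with a mirror host.
# $1: original reference, e.g. ghcr.io/open-webui/open-webui:main
# $2: mirror host, e.g. ghcr.dockerproxy.com
mirror_ref() {
    # ${1#*/} drops everything up to and including the first slash.
    printf '%s/%s\n' "$2" "${1#*/}"
}

# Pull through the mirror, retag under the original name, drop the mirror tag.
pull_via_mirror() {
    proxied=$(mirror_ref "$1" "$2")
    docker pull "$proxied" &&
        docker tag "$proxied" "$1" &&
        docker rmi "$proxied"
}

# Example use:
#   pull_via_mirror ghcr.io/open-webui/open-webui:main ghcr.dockerproxy.com
```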

Starting the front-end service

1. First create a folder for data storage (you can choose a different location)
mkdir -p /opt/open-webui/data

2. Start the container once. For this run, pass in your own system proxy so that processes inside the container can reach the model site.
-e http_proxy='http://172.16.98.56:7890'
-e https_proxy='http://172.16.98.56:7890'
These two values are my local proxy address; substitute your own.

docker run -it \
  -p 8080:8080 \
  --restart unless-stopped \
  --name ollama-webui \
  -v /opt/open-webui/data:/app/backend/data \
  -e OLLAMA_API_BASE_URL=http://127.0.0.1:11434 \
  -e http_proxy='http://172.16.98.56:7890' \
  -e https_proxy='http://172.16.98.56:7890' \
  ghcr.io/open-webui/open-webui:main

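3. Stop and remove the container you just started
After running the command above you will see the container downloading files. Once the log reports a successful start and shows the port number, you can stop the container.
docker stop ollama-webui && docker rm -f ollama-webui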

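4. Start a new container
One thing to note here: my local Ollama API is only reachable on 127.0.0.1 because I have not changed its default configuration, so the container has to use --network=host. With host networking the container shares the host's network stack, so the UI is served on the host's port 8080 without a -p mapping.
docker run -d \
  --network=host \
  --restart unless-stopped \
  --name ollama-webui \
  -v /opt/open-webui/data:/app/backend/data \
  -e OLLAMA_API_BASE_URL=http://127.0.0.1:11434 \
  -e HF_ENDPOINT=https://hf-mirror.com/ \
  ghcr.io/open-webui/open-webui:main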
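
The container can take a while to start answering requests. A small retry loop is one way to wait for it instead of watching the log. This is a generic sketch; retry is a hypothetical helper, and the port 8080 in the example is the one used in this post.

```shell
#!/bin/sh
# Run a command until it succeeds, at most $1 times, sleeping $2 seconds
# between attempts. Returns 1 if every attempt fails.
# (retry is a hypothetical helper written for this example.)
retry() {
    tries=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        [ "$n" -ge "$tries" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example use: wait up to about a minute for the web UI to answer.
#   retry 30 2 curl -fsS -o /dev/null http://localhost:8080 && echo "web UI is up"
```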

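5. Access the service
Once the service is up, open
localhost:8080 to reach the web UI.
Note: there is no default password on first use; you have to register an account yourself.

After registering, you can log in normally.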

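6. Set the interface language to Chinese

7. Configure the Ollama API address
The connection address is: http://127.0.0.1:11434

Click refresh to test whether the server connection works. Once it connects, save and close the settings panel.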
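
If the refresh in the UI fails, the same address can be tested from the server's shell. This sketch assumes curl is installed and uses Ollama's /api/tags endpoint, which lists the locally available models; ollama_reachable is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if an Ollama server answers at the base URL given as $1.
# A trailing slash on the base URL is tolerated.
# (ollama_reachable is a hypothetical helper written for this example.)
ollama_reachable() {
    curl -fsS --max-time 5 "${1%/}/api/tags" >/dev/null 2>&1
}

if ollama_reachable http://127.0.0.1:11434; then
    echo "Ollama API is reachable"
else
    echo "Ollama API is not reachable; check 'systemctl status ollama'"
fi
```

If this succeeds on the host but the UI still cannot connect, the container is probably not running with host networking as in step 4.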

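8. Test with a model
Click the model selector in the middle of the page to list the models Ollama has downloaded.

9. Test runs
Testing with Qwen 32B

Testing with Llama 3 70B

10. Note
Choose a model that fits your GPU. If the model is too large, the GPU will not be able to run it.
Screenshot of resource usage while I run Llama 3 70B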
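
As a rough way to judge whether a model fits, the size of the weights alone is about parameters times bits per weight divided by 8. This is a back-of-the-envelope sketch, not a measurement: it uses integer arithmetic and ignores the context cache and other runtime overhead, so real usage is higher. weights_gb is a hypothetical helper name.

```shell
#!/bin/sh
# Approximate weight size in GB (decimal):
#   parameters in billions * bits per weight / 8 bits per byte
# (weights_gb is a hypothetical helper; integer division rounds down.)
weights_gb() {
    echo $(( $1 * $2 / 8 ))
}

weights_gb 70 4     # 70B at 4 bits per weight: prints 35
weights_gb 70 16    # 70B at 16 bits per weight: prints 140
```

Compare the result with the total memory reported by nvidia-smi, leaving headroom for the overhead that this estimate leaves out.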

version: '3.8'
services:
  ollama-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: ollama-webui
    ports:
      - "8080:8080"
    volumes:
      - /opt/open-webui/data:/app/backend/data
    environment:
      - OLLAMA_API_BASE_URL=http://192.168.31.178:11434
      - http_proxy=http://192.168.31.45:7890
      - https_proxy=http://192.168.31.45:7890
      - ENABLE_SIGNUP=False
    restart: unless-stopped

After you have registered your account, you can add

      - ENABLE_SIGNUP=False

to prevent further users from registering.

vicent · May 11, 2024, 06:01

That ad caught me off guard.

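Letter · May 11, 2024, 07:41

Llama 3 70B needs about 40 GB of VRAM just to load the model, even with 4-bit quantization. Is it the how-to I'm missing? No, what I'm missing is the GPU.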

neo · August 29, 2024, 06:22

From #develop:ai to #share

CRI4250 · September 1, 2024, 19:12

Thanks for sharing.

Woodentiger · September 1, 2024, 21:22

7B here. I was just wondering what kind of machine can run 70B.

bi_ke · October 18, 2024, 08:24

Hi, how do I set this up to use VRAM? Right now the CPU is maxed out and the GPU is barely used.

bi_ke · October 18, 2024, 09:31

All good now, I get it: it normally just holds that much memory, and the GPU and CPU are only used while it is answering.
