Information Engineer (Maxwell's Demon, Quantum Computers)

Running LLaMA v2, Meta's open LLM, on a MacBook (Apple M1/M2 silicon, using llama.cpp)

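Thanks to the work of many contributors, LLaMA v2 also runs on Apple Silicon (M1/M2, with GPU support), as shown below. Because llama.cpp can use the Apple Silicon GPU for inference, the speed is not bad either. Note that I only confirmed that inference works; it does not look like this setup can be used for training. Still, just being able to run the model is worth something!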

https://gist.github.com/gengwg/26592c1979a0bca8b65e2f819e31ab5c

First, git clone llama.cpp as follows.

$ git clone https://github.com/ggerganov/llama.cpp.git
$ cd llama.cpp

Build with the following option to enable GPU inference (Metal).

$ make clean
$ LLAMA_METAL=1 make

Now, for MacBooks without much memory, let's download a model that fits the available memory and run it.

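If your MacBook has 10 GB of memory or more, try the quantized 13B model below. If you have enough memory to load them, you can also look at the larger models published under huggingface.co/TheBloke. With a 64 GB MacBook or Mac Studio, you can even run the quantized 70B model, which is about 40 GB.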
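As a rough cross-check of these memory figures, the sketch below estimates the weight size of a Q4_0-quantized model. It assumes the Q4_0 layout stores each block of 32 weights as 16 bytes of 4-bit values plus one 2-byte fp16 scale (4.5 bits per weight), and it uses the nominal parameter counts (7B, 13B, 70B) rather than exact ones. The function name `q4_0_size_gb` is made up for this illustration.

```python
# Rough size estimate for Q4_0-quantized weights (a sketch, not an exact file size).
# Assumption: 32 weights per block = 16 bytes of 4-bit values + 2-byte fp16 scale,
# i.e. 18 bytes per 32 weights = 4.5 bits per weight.
Q4_0_BITS_PER_WEIGHT = 18 * 8 / 32  # 4.5


def q4_0_size_gb(n_params_billion: float) -> float:
    """Approximate weight size in GB (10^9 bytes) for a Q4_0 model."""
    n_params = n_params_billion * 1e9
    return n_params * Q4_0_BITS_PER_WEIGHT / 8 / 1e9


# Nominal parameter counts; the real models differ slightly.
for name, billions in [("7B", 7), ("13B", 13), ("70B", 70)]:
    print(f"{name}: ~{q4_0_size_gb(billions):.1f} GB")
```

Under these assumptions the estimates come out to roughly 3.9 GB, 7.3 GB, and 39.4 GB, which is consistent with the "about 40 GB" figure for the 70B model. Actual memory use at run time is higher, since the context and runtime buffers need room as well.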

For the 13B model, proceed as follows.

$ export MODEL=llama-2-13b-chat.Q4_0.gguf

# If wget is installed, download as follows; otherwise download the file directly from the URL below

$ wget "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf"

If your MacBook has 8 GB of memory or less, download the following model instead.

$ export MODEL=llama-2-7b-chat.Q4_0.gguf

$ wget "https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF/resolve/main/llama-2-7b-chat.Q4_0.gguf"
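The choice between the two files can be written down as a small helper. This is only a sketch of the rule of thumb in this post (10 GB or more: 13B, otherwise 7B); the function names `pick_model` and `download_url` are hypothetical, and the threshold is the post's suggestion, not a hard requirement.

```python
# Sketch of the rule of thumb above: 13B for 10 GB of RAM or more, otherwise 7B.
# Function names are hypothetical; file names and repos are the ones used in this post.
MODEL_REPOS = {
    "llama-2-13b-chat.Q4_0.gguf": "TheBloke/Llama-2-13B-chat-GGUF",
    "llama-2-7b-chat.Q4_0.gguf": "TheBloke/Llama-2-7B-chat-GGUF",
}


def pick_model(ram_gb: float) -> str:
    """Return the GGUF file name suggested for a machine with ram_gb of memory."""
    if ram_gb >= 10:
        return "llama-2-13b-chat.Q4_0.gguf"
    return "llama-2-7b-chat.Q4_0.gguf"


def download_url(model_file: str) -> str:
    """Build the Hugging Face download URL for one of the two files above."""
    return f"https://huggingface.co/{MODEL_REPOS[model_file]}/resolve/main/{model_file}"


for ram in (8, 16):
    model = pick_model(ram)
    print(f"{ram} GB -> {model}\n  {download_url(model)}")
```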

Then run it as follows. The example below uses the 7B model; change the model name to match the one you downloaded.

$ ./main -m ./llama-2-7b-chat.Q4_0.gguf -t 8 -n 128 -ngl 1 --prompt "could you generate python code for generating prime numbers?"

.... It really does generate code!
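If you want to launch the same command from a script, one possible approach is to build the argument list in Python. The sketch below uses only the flags from the command above: `-m` (model file), `-t` (threads), `-n` (number of tokens to generate), `-ngl` (number of layers offloaded to the GPU), and `--prompt`. The helper name `llama_cmd` is hypothetical, and the snippet only prints the command; running it assumes you are inside the llama.cpp directory with the `main` binary built.

```python
import shlex


def llama_cmd(model_path, prompt, threads=8, n_predict=128, gpu_layers=1):
    """Build the ./main argument list with the same flags as the command above."""
    return [
        "./main",
        "-m", model_path,
        "-t", str(threads),        # CPU threads
        "-n", str(n_predict),      # number of tokens to generate
        "-ngl", str(gpu_layers),   # layers offloaded to the GPU
        "--prompt", prompt,
    ]


cmd = llama_cmd(
    "./llama-2-7b-chat.Q4_0.gguf",
    "could you generate python code for generating prime numbers?",
)
print(shlex.join(cmd))

# To actually run it (from the llama.cpp directory):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains quotes or other special characters.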

Separately, if you need the Python side of llama.cpp, you can set it up with conda as follows.

$ conda create --name=llama2 python=3.11

$ conda activate llama2

$ pip install -r requirements.txt

That is a quick look at running LLaMA with llama.cpp on a MacBook (an M1/M2 here, but I expect it works in other macOS environments as well).


Posted by 작동미학

Running Stable Diffusion on Linux

Let's install and run Stable Diffusion, the best-known generative image model. Much has improved in the meantime, and there is now an open-source project with a nice web interface called Easy Diffusion.

Everything here assumes Ubuntu with an NVIDIA 4080 and Anaconda ( https://infoengineer.tistory.com/96 ).

Refer to the site below.

https://easydiffusion.github.io/docs/installation/

$ wget https://github.com/cmdr2/stable-diffusion-ui/releases/latest/download/Easy-Diffusion-Linux.zip

$ unzip Easy-Diffusion-Linux.zip

$ cd easy-diffusion

$ ./start.sh

/work/anaconda3/bin/curl
/usr/bin/tar
/work/anaconda3/bin/bzip2
Downloading micromamba from https://micro.mamba.pm/api/micromamba/linux-64/latest to /work2/easy-diffusion/installer_files/mamba/micromamba
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
100  4569    0  4569    0     0   1176      0 --:--:--  0:00:03 --:--:-- 4461k
  0 5096k    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0bin/micromamba
100 5096k  100 5096k    0     0   811k      0  0:00:06  0:00:06 --:--:-- 3183k
Micromamba version:
1.5.0
Empty environment created at prefix: /work/easy-diffusion/installer_files/env
Packages to install: conda python=3.8.5
conda-forge/linux-64                                30.0MB @  41.3MB/s  0.7s
conda-forge/noarch                                  12.2MB @   3.5MB/s  3.5s

....

The server starts automatically. Open the following address in a browser on the same Linux machine.

http://localhost:9000/
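If you are working without a desktop browser, a quick way to confirm that the server is answering is a small check with the Python standard library. This is only a sketch: the helper name `server_is_up` is hypothetical, and it treats any 2xx/3xx response from the address above as "up".

```python
import urllib.error
import urllib.request


def server_is_up(url: str = "http://localhost:9000/", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to url answers with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error status, and similar failures.
        return False


if __name__ == "__main__":
    print("Easy Diffusion reachable:", server_is_up())
```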

Enter a suitable English prompt and confirm that an image is generated.

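A 512*512 image was generated in a few seconds. You can experiment by switching between CPU and GPU mode, among other settings. With the 4080 GPU, images up to about 1280*768 can be generated, but 1920*1280 fails with an out-of-memory error.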
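To put those resolutions in perspective, the short calculation below compares their pixel counts with the 512*512 baseline. It is plain arithmetic, not a memory model: GPU memory use does not necessarily scale linearly with pixel count, so treat the ratios as a rough indication only. The helper name `pixel_ratio` is made up for this example.

```python
def pixel_ratio(width: int, height: int, base=(512, 512)) -> float:
    """How many times more pixels width*height has than the base resolution."""
    return (width * height) / (base[0] * base[1])


for w, h in [(512, 512), (1280, 768), (1920, 1280)]:
    print(f"{w}*{h}: {w * h:,} pixels, {pixel_ratio(w, h):.2f}x the 512*512 baseline")
```

The 1280*768 image has 3.75 times the pixels of the baseline, while 1920*1280 has more than nine times as many, which fits the observation that the larger size runs out of memory.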

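Besides using the default model this way, you can also download and use a custom model. Download a model file from a site such as civitai.com, place it in ./easy-diffusion/models/stable-diffusion/, and you can then generate images in that model's specialized style.

The Stable Diffusion model download site (civitai.com), with Models selected

Selecting Download fetches the custom model file, although it is fairly large

Change Model under Image Settings in the left menu and generate again to produce images based on the specialized model.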
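Placing the downloaded file can also be scripted. The sketch below copies a model file into the models/stable-diffusion directory mentioned above; the helper name `install_custom_model` and the example file name are hypothetical, and the install location `./easy-diffusion` is assumed to be where the zip was extracted.

```python
import shutil
from pathlib import Path


def install_custom_model(model_file, easy_diffusion_dir="./easy-diffusion") -> Path:
    """Copy a downloaded model file into Easy Diffusion's stable-diffusion model folder."""
    src = Path(model_file)
    target_dir = Path(easy_diffusion_dir) / "models" / "stable-diffusion"
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    dest = target_dir / src.name
    shutil.copy2(src, dest)
    return dest


# Example (hypothetical file name):
#   install_custom_model("~/Downloads/example-model.safetensors".replace("~", str(Path.home())))
```

After copying, the new file should appear in the Model dropdown under Image Settings; you may need to reload the page for it to show up.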

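Besides Easy Diffusion, there is another web interface, AUTOMATIC1111, which has become practically synonymous with Stable Diffusion. Its distinguishing feature is how quickly it adopts the latest features. Let's briefly look at how to install it as well.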

$ conda create -n automatic python=3.10

$ conda activate automatic

$ git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git

$ cd stable-diffusion-webui/

$ ./webui.sh

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye), Fedora 34+ and openSUSE Leap 15.4 or newer.
################################################################
....

# Model downloads and other setup steps run at this point

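Once everything has started, open http://127.0.0.1:7861/ and AUTOMATIC1111 will be running. Refer to the guides available online to make use of it. The img2img feature takes an input image and transforms it into another image. It also offers a variety of other features, such as Extras and plugins. For example, installing https://github.com/toriato/stable-diffusion-webui-wd14-tagger under Extensions enables the reverse direction: extracting tags (prompt text) from an image.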


Posted by 작동미학

Running SeamlessM4T, Meta's open translation model (based on fairseq2)

Because this model runs on a GPU, this post builds on the earlier https://infoengineer.tistory.com/96 and assumes the GPU setup described there is already complete.

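Meta has released a high-performing model that reads both text and speech and translates or transcribes them. After first installing fairseq2, Meta's sequence modeling toolkit, SeamlessM4T provides the five functions below across nearly every language we know. Korean is included! It can even do speech-to-speech translation. Here, let's start with the simplest one, T2TT (text-to-text translation).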

Speech-to-speech translation (S2ST)
Speech-to-text translation (S2TT)
Text-to-speech translation (T2ST)
Text-to-text translation (T2TT)
Automatic speech recognition (ASR)

Installation and use are very simple. Let's use Anaconda.

$ conda create -n fairseq2 python=3.10
$ conda activate fairseq2
$ pip install fairseq2
$ git clone https://github.com/facebookresearch/seamless_communication
$ cd seamless_communication/
$ pip install .
$ sudo apt install libsndfile1
$ conda install -y -c conda-forge libsndfile

$ m4t_predict "The wind blows always to the west, and the girl is waiting always on the beach" t2tt kor --src_lang eng

2023-08-31 23:49:20,161 INFO -- m4t_scripts.predict.predict: Running inference on the GPU in torch.float16.
Using the cached checkpoint of the model 'seamlessM4T_large'. Set `force=True` to download again.
Using the cached tokenizer of the model 'seamlessM4T_large'. Set `force=True` to download again.
Using the cached checkpoint of the model 'vocoder_36langs'. Set `force=True` to download again.
2023-08-31 23:49:26,134 INFO -- m4t_scripts.predict.predict: Translated text in kor: 바람은 항상 서쪽으로 불고, 소녀는 항상 해변에서 기다리고 있습니다

$ m4t_predict "We introduce SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. It substantially outperforms existing sentence embeddings such as LASER3 and LabSE on the xsim and xsim++ multilingual similarity search tasks." t2tt kor --src_lang eng

2023-09-01 00:02:20,217 INFO -- m4t_scripts.predict.predict: Running inference on the GPU in torch.float16.
Using the cached checkpoint of the model 'seamlessM4T_large'. Set `force=True` to download again.
Using the cached tokenizer of the model 'seamlessM4T_large'. Set `force=True` to download again.
Using the cached checkpoint of the model 'vocoder_36langs'. Set `force=True` to download again.
2023-09-01 00:02:26,722 INFO -- m4t_scripts.predict.predict: Translated text in kor: 우리는 SONAR를 소개합니다. 새로운 다국어 및 멀티모달 고정 크기의 문장 ⁇ 입 공간으로, 음성 및 텍스트 인코더 및 디코더의 전체 스위트를 갖추고 있습니다. 그것은 xsim 및 xsim++ 다국어 유사성 검색 작업에서 LASER3 및 LabSE와 같은 기존 문장 ⁇ 입을 크게 능가합니다.
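To translate one sentence into several languages, the same `m4t_predict` call can be generated in a loop. The sketch below only builds and prints the command lines, using the argument order from the commands above (text, task, target language, `--src_lang`). The helper name `m4t_commands` is hypothetical, and the target codes other than `kor` (`jpn`, `fra`) are assumed to be supported three-letter codes; check the model card before relying on them.

```python
import shlex


def m4t_commands(text, target_langs, src_lang="eng", task="t2tt"):
    """Build one m4t_predict argument list per target language."""
    return [
        ["m4t_predict", text, task, tgt, "--src_lang", src_lang]
        for tgt in target_langs
    ]


sentence = "The wind blows always to the west, and the girl is waiting always on the beach"
for cmd in m4t_commands(sentence, ["kor", "jpn", "fra"]):  # jpn/fra are assumed codes
    print(shlex.join(cmd))
    # To actually run each one: subprocess.run(cmd, check=True)
```

Each invocation is a separate process, so the model is presumably loaded again every time, as the repeated checkpoint messages in the logs above suggest.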

On an NVIDIA 4080 GPU each run takes about 7 seconds; excluding model loading time, the result comes back reasonably quickly.
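The timing can be checked against the log timestamps shown earlier. The sketch below parses the first and last timestamp of each run and prints the difference; the helper name `elapsed_seconds` is made up for this example, and the timestamps are copied from the two logs above.

```python
from datetime import datetime

# Timestamp layout used in the m4t_predict log lines, e.g. "2023-08-31 23:49:20,161"
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


runs = [
    ("2023-08-31 23:49:20,161", "2023-08-31 23:49:26,134"),  # short sentence
    ("2023-09-01 00:02:20,217", "2023-09-01 00:02:26,722"),  # SONAR abstract
]
for start, end in runs:
    print(f"{elapsed_seconds(start, end):.3f} s")
```

These intervals (about 6.0 s and 6.5 s) cover only the span between the first log line and the translated output, so process startup before the first log line is not included; that likely accounts for the rest of the roughly 7 seconds.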

There are two models; SeamlessM4T-Large is selected by default.

Model name	#params	Checkpoint	Metrics
SeamlessM4T-Large	2.3B	🤗 Model card - checkpoint	metrics
SeamlessM4T-Medium	1.2B	🤗 Model card - checkpoint	metrics

With a GPU, this gives us a good library that is not too slow. It is released under the Creative Commons Attribution-NonCommercial 4.0 International Public License, which allows free use, sharing, and adaptation with attribution but prohibits commercial use.


Posted by 작동미학
