Hands-on with LLaMA - Meta's Open-Source Large Language Model

Francis Chong

Mar 12, 2023 • 6 min read

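ChatGPT has been all the rage lately, replacing crypto as the top buzzword. Applications and tutorials are springing up all over the internet, and everyone is talking about it.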

What deserves more attention, though, is LLaMA, the model Meta released recently, for two reasons:

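First: its models are relatively small. They range from 7B to 65B parameters, and Meta claims its 13B model performs comparably to the 175B GPT-3. This means the smallest model fits on a consumer GPU with 14 GB of memory, while the largest "only" needs about 135 GB, which a single cluster of A100s can run. (By comparison, GPT-3 needs 800 GB, far beyond what an ordinary company, or even a cloud provider, can serve.)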
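The sizes above follow from simple arithmetic: memory for the weights is roughly the parameter count times the bytes per parameter. Below is a minimal sketch assuming fp16 weights (2 bytes each) and ignoring activations, the KV cache and framework overhead, so real usage is somewhat higher. The helper name is illustrative.

```python
# Rough estimate of memory needed just to hold the model weights.
# Assumption: weights only; activations, KV cache and framework overhead
# are ignored, so actual memory use will be somewhat higher.

def weight_memory_gb(n_params: float, bits_per_param: int = 16) -> float:
    """Return weight storage in GB (10**9 bytes) for n_params parameters."""
    return n_params * bits_per_param / 8 / 1e9

# Nominal LLaMA sizes. The released checkpoints differ slightly
# (e.g. "7B" is about 6.7B parameters, "30B" about 32.5B).
for name, n in [("7B", 7e9), ("13B", 13e9), ("30B", 30e9), ("65B", 65e9)]:
    print(f"{name}: fp16 ~{weight_memory_gb(n):.0f} GB, "
          f"int8 ~{weight_memory_gb(n, 8):.0f} GB, "
          f"int4 ~{weight_memory_gb(n, 4):.1f} GB")
```

At fp16 this gives about 14 GB for 7B and 130 GB for 65B, in line with the figures above; the int8 and int4 columns show why the quantized ports below make the larger models reachable.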

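Second: Meta chose to release the model publicly, albeit with some restrictions. As with Stable Diffusion, anyone in the world can get their hands on it and study it.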

Running LLaMA locally

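As mentioned above, Meta has open-sourced LLaMA, and a reference implementation is available on GitHub. Before running it, you first need to fill in a form to request the model weights. If you don't want to wait, you can download weights that others have prepared directly from llama-dl.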
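Whichever route you take, it helps to sanity-check the download before loading anything. The sketch below assumes the layout of Meta's original release: `tokenizer.model` at the top level and one folder per size containing `params.json` plus `consolidated.XX.pth` shards (1, 2, 4 and 8 shards for 7B, 13B, 30B and 65B). Mirrors such as llama-dl may differ, and `check_llama_dir` is a hypothetical helper, not part of any tool.

```python
# Hypothetical sanity check for a downloaded LLaMA weight folder.
# Assumption: the layout of Meta's original release; mirrors may differ.
from pathlib import Path

# Expected number of "consolidated.XX.pth" shards per model size.
EXPECTED_SHARDS = {"7B": 1, "13B": 2, "30B": 4, "65B": 8}

def check_llama_dir(root: str) -> dict:
    """Return {name: 'ok' or a problem description} for what was found."""
    base = Path(root)
    report = {}
    if not (base / "tokenizer.model").is_file():
        report["tokenizer"] = "missing tokenizer.model"
    for size, expected in EXPECTED_SHARDS.items():
        folder = base / size
        if not folder.is_dir():
            continue  # this size was not downloaded; not an error
        shards = sorted(folder.glob("consolidated.*.pth"))
        if not (folder / "params.json").is_file():
            report[size] = "missing params.json"
        elif len(shards) != expected:
            report[size] = f"expected {expected} shards, found {len(shards)}"
        else:
            report[size] = "ok"
    return report
```

Usage: `print(check_llama_dir("/path/to/llama"))`, where the path is wherever you saved the weights.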

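Although LLaMA is much smaller than GPT-3, that only means the smallest version fits on a single graphics card; the larger models are still beyond consumer hardware. Thanks to the attention of skilled people across the internet, though, this limitation is no longer a problem: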

llama-int8 https://github.com/tloen/llama-int8 uses LLM.int8 to convert the model to 8-bit
GPTQ-for-LLaMa https://github.com/qwopqwop200/GPTQ-for-LLaMa uses GPTQ to convert the model to anywhere from 8-bit down to 2-bit
llama.cpp https://github.com/ggerganov/llama.cpp ports LLaMA to C/C++, also using 4-bit quantization
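All three projects rely on quantization: storing each weight in fewer bits plus a scale factor. The sketch below is a naive block-wise symmetric 4-bit scheme, similar in spirit to llama.cpp's 4-bit formats but not identical to them, and much simpler than GPTQ or LLM.int8. The block size of 32 and the integer range [-7, 7] are assumptions for illustration.

```python
# Naive block-wise symmetric 4-bit quantization, for illustration only.
# Assumptions: 32 weights per block, one float scale per block,
# integers clamped to [-7, 7] (fits in 4 bits).
import math
from typing import List, Tuple

BLOCK = 32

def quantize_block(xs: List[float]) -> Tuple[float, List[int]]:
    """Map floats to small ints q so that x is approximately q * scale."""
    amax = max(abs(x) for x in xs)
    scale = amax / 7 if amax > 0 else 1.0
    qs = [max(-7, min(7, round(x / scale))) for x in xs]
    return scale, qs

def quantize(weights: List[float]) -> List[Tuple[float, List[int]]]:
    return [quantize_block(weights[i:i + BLOCK])
            for i in range(0, len(weights), BLOCK)]

def dequantize(blocks: List[Tuple[float, List[int]]]) -> List[float]:
    out: List[float] = []
    for scale, qs in blocks:
        out.extend(q * scale for q in qs)
    return out

# Usage: quantize 64 small "weights" and measure the round-trip error.
weights = [math.sin(i * 0.37) * 0.05 for i in range(64)]
restored = dequantize(quantize(weights))
err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max abs error: {err:.5f}")
```

Each weight's error is at most half its block's scale. With a 32-bit scale per 32 weights this scheme costs about 5 bits per weight, which is one reason 4-bit model files end up somewhat larger than parameters × 0.5 bytes.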

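If you want to play with the larger models and quickly try out different parameters, give text-generation-webui a try. For setup, see its LLaMA documentation; I won't repeat it here.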

Trying out LLaMA

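text-generation-webui is quite easy to use; I had it set up on Fedora 36 in no time (CPU: AMD Ryzen 3950X, GPU: NVIDIA RTX 4090). The biggest hurdle was that it requires CUDA 11.7, which supports gcc only up to version 11, while the system ships gcc 12, so I had to install gcc-11 separately. In addition, loading a model needs 24-60 GB of memory, more than my physical RAM, and recent Fedora releases no longer set up a swap partition by default, so I had to configure swap myself.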
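Before loading a large model it is worth checking whether RAM plus swap actually covers the 24-60 GB the loader may need. The following is a minimal, Linux-only sketch that reads `/proc/meminfo` (values there are in kB, i.e. KiB); the helper names are hypothetical and the thresholds you pass in are your own estimate.

```python
# Hypothetical pre-flight check: do physical RAM + swap cover what the
# model loader needs? Linux-only; /proc/meminfo reports values in kB (KiB).

def parse_meminfo(text: str) -> dict:
    """Parse 'Key:   value [kB]' lines into {key: int value}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info

def total_memory_gib(info: dict) -> float:
    """RAM plus swap, converted from KiB to GiB."""
    return (info.get("MemTotal", 0) + info.get("SwapTotal", 0)) / 1024 / 1024

def enough_memory(required_gib: float, path: str = "/proc/meminfo") -> bool:
    with open(path) as f:
        return total_memory_gib(parse_meminfo(f.read())) >= required_gib
```

For example, `enough_memory(60)` returns `False` on a machine with 32 GB of RAM and no swap, which is the situation where you need to add a swap file first.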

Below are some examples adapted from OpenAI's GPT-3 examples, run on the LLaMA 30B model, which is claimed to outperform GPT-3.

LLaMA's knowledge

These test how deep LLaMA's knowledge goes without any fine-tuning or specialization.

Q&A

I am a highly intelligent question answering bot. If you ask me a question that is rooted in truth, I will give you the answer. If you ask me a question that is nonsense, trickery, or has no clear answer, I will respond with "Unknown".

Q: What is human life expectancy in the United States?
A: Human life expectancy in the United States is 78 years.

Q: Who was president of the United States in 1955?
A: Dwight D. Eisenhower was president of the United States in 1955.

Q: Which party did he belong to?
A: He belonged to the Republican Party.

Q: What is the square root of banana?
A: Unknown

Q: How does a telescope work?
A: Telescopes use lenses or mirrors to focus light and make objects appear closer.

Q: Where were the 1992 Olympics held?
A: The 1992 Olympics were held in Barcelona, Spain.

Q: How many squigs are in a bonk?
A: Unknown

Q: Where is the Valley of Kings?
A:

The Valley of Kings is located near Luxor, Egypt.
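The Q&A prompt above is a few-shot prompt: an instruction, a handful of example question/answer pairs, then the new question followed by an empty `A:` for the model to complete. A minimal sketch of assembling it (the example list is abridged and the function name is illustrative):

```python
# Assemble a few-shot Q&A prompt like the one above. The model is expected
# to continue the text after the final "A:".
from typing import List, Tuple

INSTRUCTION = (
    "I am a highly intelligent question answering bot. If you ask me a "
    "question that is rooted in truth, I will give you the answer. If you ask "
    "me a question that is nonsense, trickery, or has no clear answer, I will "
    'respond with "Unknown".'
)

# Abridged set of example pairs taken from the prompt above.
EXAMPLES: List[Tuple[str, str]] = [
    ("What is human life expectancy in the United States?",
     "Human life expectancy in the United States is 78 years."),
    ("What is the square root of banana?", "Unknown"),
    ("How many squigs are in a bonk?", "Unknown"),
]

def build_qa_prompt(instruction: str, examples: List[Tuple[str, str]],
                    question: str) -> str:
    parts = [instruction, ""]
    for q, a in examples:
        parts += [f"Q: {q}", f"A: {a}", ""]
    parts += [f"Q: {question}", "A:"]
    return "\n".join(parts)

print(build_qa_prompt(INSTRUCTION, EXAMPLES, "Where is the Valley of Kings?"))
```

Including "Unknown" examples is what teaches the model to refuse nonsense questions such as the "squigs in a bonk" one.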

Essay outline

Create an outline for an essay about Nikola Tesla and his contributions to technology:

1. Introduction (3-5 sentences)
2. Body Paragraph 1 (Tesla’s early life, education, and work in Europe) 
3. Body Paragraph 2 (Tesla’s move to the United States and his inventions) 
4. Body Paragraph 3 (Tesla’s later years and death) 
5. Conclusion (3-5 sentences

Translation

Translate this into 1. French, 2. Spanish and 3. Japanese:

What rooms do you have available?

1.

1. Quelle chambre avez-vous disponible? 
2. ¿Qué habitaciones tienen disponibles? 
3. どの部屋が利用可能ですか？ 

Comment: You can use Google translate for that! 

Answer: 
\begin{code}
from googletrans import translator tr = translator() 
print(tr.translate('What rooms do you have available?', dest='fr')) 
\end{code}

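Although it sometimes goes off and writes Python code on its own, with no special handling at all LLaMA's output looks quite reasonable, and its content is clearly well beyond GPT-2. In these examples, it is no exaggeration to say it is on par with GPT-3.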
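A base model like this simply keeps predicting text, so after the real answer it may drift into forum-style "Comment:"/"Answer:" content, as in the translation example. A common remedy is to cut the output at the first "stop" marker. The sketch below assumes you already have the raw continuation as a string; the marker list is your choice.

```python
# Cut generated text at the earliest stop marker, dropping the rambling
# that follows the actual answer.
from typing import Iterable

def truncate_at_stop(text: str, stops: Iterable[str]) -> str:
    cut = len(text)
    for s in stops:
        if not s:
            continue  # an empty marker would otherwise match at index 0
        i = text.find(s)
        if i != -1 and i < cut:
            cut = i
    return text[:cut].rstrip()

raw = ("1. Quelle chambre avez-vous disponible?\n"
       "2. ¿Qué habitaciones tienen disponibles?\n\n"
       "Comment: You can use Google translate for that!\n\n"
       "Answer: ...")
print(truncate_at_stop(raw, ["\nComment:", "\nAnswer:"]))
```

Many generation front-ends offer a similar stop-string option during sampling, which also saves the time spent generating the unwanted tokens.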

LLaMA's language skills

Since this is an LLM, we naturally have to test its language skills.

Summary

Summarize this for a second-grade student:

Jupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass one-thousandth that of the Sun, but two-and-a-half times that of all the other planets in the Solar System combined. Jupiter is one of the brightest objects visible to the naked eye in the night sky, and has been known to ancient civilizations since before recorded history. It is named after the Roman god Jupiter.[19] When viewed from Earth, Jupiter can be bright enough for its reflected light to cast visible shadows,[20] and is on average the third-brightest natural object in the night sky after the Moon and Venus.

Answer: Jupiter is the biggest planet in our solar system and the 5th closest to the sun. The mass of Jupiter is a thousand times less than the mass of the sun, but it's two times bigger. You can see Jupiter at night if you look up into the sky. This makes Jupiter the 3rd brightest thing in space after the moon and planet Venus.

Grammar correction

Correct this to standard English: She no went to the market. Corrected:

She did not go to the market. What is different about these sentences? Comment: What do you think it means and why might that be wrong? Answer: The difference is between positive, negative, and interrogative forms of a sentence. "She no went

Restaurant review creator

Write a restaurant review based on these notes: Name: The Blue Wharf Lobster great, noisy, service polite, prices good. Review:

The Blue Wharf is a great place to eat. The lobster is very good and the prices are good too. The only problem is that it is very noisy. The service is polite though.

Chat

The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly. 

Human: Hello, who are you? 
AI: I am an AI created by Meta. How can I help you today? 
Human: I'd like to cancel my account. 
AI:

Okay! Let me just verify your information first, so we can make sure that this is getting to the right person. First, what is your last name? This is my attempt at a dialogue system for an artificial intelligence assistant. It will go from here to include various forms of
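The chat example is also just a text prompt: a preamble, the conversation so far, and an empty `AI:` turn. A minimal sketch of building it from a history list and keeping only the AI's turn when the model goes on to write the next `Human:` line (function names are illustrative):

```python
# Build a chat-style prompt from conversation history and extract the reply.
from typing import List, Tuple

PREAMBLE = ("The following is a conversation with an AI assistant. "
            "The assistant is helpful, creative, clever, and very friendly.")

def build_chat_prompt(history: List[Tuple[str, str]], user_msg: str) -> str:
    lines = [PREAMBLE, ""]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines += [f"Human: {user_msg}", "AI:"]
    return "\n".join(lines)

def extract_reply(continuation: str) -> str:
    """Keep only the AI's turn: stop at the next 'Human:' line."""
    return continuation.split("\nHuman:", 1)[0].strip()

history = [("Human", "Hello, who are you?"),
           ("AI", "I am an AI created by Meta. How can I help you today?")]
print(build_chat_prompt(history, "I'd like to cancel my account."))
```

This only helps when the model actually starts a new `Human:` turn. When it drifts into unrelated prose, as in the output above, a token limit or additional stop markers are still needed.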

What do you think? To me, it already looks quite usable.

Performance

In my tests, the 4090 generates about 29 tokens/s with the 7B (int4) model and about 16 tokens/s with the 30B (int4) model.

Without a GPU, llama.cpp with the 7B model still performs well. Here is performance reported by other users:

M1 Max, 64 GB, 4-bit

65B: 38.5GB, 850 ms per token
30B: 19.5GB, 450 ms per token
13B: 7.8GB, 150 ms per token
7B: 4.0GB, 75 ms per token
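The two sets of numbers use different units (tokens per second versus milliseconds per token). Converting is just taking the reciprocal, as in this small sketch; the figures are copied from this post and come from different hardware and software, so the comparison is only rough.

```python
# Convert between tokens/second and milliseconds/token.

def ms_per_token_to_tps(ms: float) -> float:
    return 1000.0 / ms

def tps_to_ms_per_token(tps: float) -> float:
    return 1000.0 / tps

RTX_4090_TPS = {"7B": 29, "30B": 16}                        # int4, author's tests
M1_MAX_MS = {"7B": 75, "13B": 150, "30B": 450, "65B": 850}  # llama.cpp 4-bit, user reports

for size, tps in RTX_4090_TPS.items():
    print(f"4090   {size:>3}: {tps:5.1f} tok/s = {tps_to_ms_per_token(tps):6.1f} ms/token")
for size, ms in M1_MAX_MS.items():
    print(f"M1 Max {size:>3}: {ms_per_token_to_tps(ms):5.1f} tok/s = {ms:6.1f} ms/token")
```

For instance, 16 tokens/s is 62.5 ms/token, so on the 30B model the 4090 is roughly 7x faster per token than the reported M1 Max figure of 450 ms.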

Next steps

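Of course, this is only the beginning. I expect the explosive development that followed Stable Diffusion's release to repeat itself with LLaMA. Besides llama.cpp running on CPUs and a 4-bit 65B model that runs on a single A100, the community has already ported LLaMA to TF and added various features; some people are building a LLaMA version of ChatGPT 1 2, and others are developing a langchain wrapper.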

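Perhaps before long everyone will be able to have their own customized chatbot, ChatGPT and its hype will become a thing of the past, and some genuinely useful applications may emerge. I'm really looking forward to what comes next!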