Fine-tuning the LLaMA model to speak English and Chinese

Apr 19, 2023

At Preemo, we've created a model that understands and produces both English and Chinese — by using a fast, parameter-efficient form of fine-tuning to enhance entirely open-source models and data.

The new model, which we call Alpaca LoRA 30B, can be instructed with English and Chinese, and performs well in both languages on code generation tasks and creative writing.

To produce this new model, we fine-tuned the 30B-parameter LLaMA model from Meta, which does not have multilingual capabilities, with 50k open-source training examples in English and Chinese generated by GPT-4. We used a parameter-efficient fine-tuning technique called LoRA.

Of course, we did this all using the Preemo platform.

In this post, we’ll explain:

  • Examples of what this new model can do
  • How we trained the new model
  • What you can take away from this result

Example output

Our model, Alpaca LoRA 30B, produces high-quality completions in both English and Chinese, for code generation and creative writing.

Here are examples of each.

Code generation

As a code generation task, we asked our model to write an algorithm.


Write a JavaScript function to calculate if 2023 is a prime number

Our model produced the following code, along with an explanation:

Here is a JavaScript function that checks if a given number is a prime number:

function isPrime(num) {
  if (num < 2) {
    return false;
  }
  for (let i = 2; i <= Math.sqrt(num); i++) {
    if (num % i === 0) {
      return false;
    }
  }
  return true;
}

console.log(isPrime(2023)); // true

This function checks if a number is a prime number by iterating through all the numbers from 2 to the square root of the given number. If the number is divisible by any of these numbers, then it is not a prime number. Otherwise, it is a prime number.

(One caveat: 2023 = 7 × 17 × 17, so it is not prime. The function itself is correct; the model's inline comment claiming isPrime(2023) returns true is wrong.)

For reference, here's output produced by Alpaca 7B, a well-known model and one of the first to be reproduced with LoRA fine-tuning (as alpaca-lora). It has fewer weights than our 30B LLaMA base, so the comparison is not 1:1; we use this example just to show the difference in the format of the output.

Alpaca 7B produced the following code:

function isPrime(n) {
  if (n <= 1) {
    return false;
  }
  for (let i = 2; i < n; i++) {
    if (n % i === 0) {
      return false;
    }
  }
  return true;
}

As you can see, the output from Alpaca 7B doesn’t include an explanation.

In the next section, we describe how we fine-tuned our model to produce an explanation.

Writing poetry in Chinese

As a creative writing task, we asked our model, with the instruction given in Chinese, to write a poem praising artificial intelligence.

[Image: the instruction, written in Chinese]

Our model produced the following poem:

[Image: the poem, in Chinese]
For convenience, here's an English translation of the instruction and output above:

Instruction:

Write a modern poem praising artificial intelligence

Output:

In Praise of Artificial Intelligence: A Modern Poem

Artificial intelligence drives the future,
Intelligent machines lead us forward.
Data analysis, machine learning,
Make the world faster.

Smart homes, smart transportation,
Smart healthcare, smart education,
Artificial intelligence anytime, anywhere,
Make our lives more comfortable.

Machine translation, speech recognition,
Autonomous driving, robot work,
Artificial intelligence has changed our lives,
Making the world a better place.

Let us welcome artificial intelligence,
Let it lead us forward,
Let us create the future together,
Making the world a better place.

We would have included a comparison here, but the ability to produce Chinese is precisely what's new about our model; the base LLaMA model and Alpaca 7B don't offer it.

How we did it

In general, a fine-tuning pipeline consists of an input model, fine-tuned on a new dataset, to produce an output model. Here are the details of our pipeline.

Input model

Creating an input model class requires static model weights as well as a model definition — also known as a model architecture. To create our input model class, which we call LLaMA LoRA 30B, we loaded the 30B weights from Meta’s LLaMA model into a LoRA-adapted model architecture that uses HuggingFace transformers and the bitsandbytes library.
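
As a rough sketch of that assembly in code: the hub id and LoRA hyperparameters below are assumptions, mirroring alpaca-lora's published defaults rather than our exact settings.

```python
# Sketch: building the LoRA-adapted input model with HuggingFace
# transformers, peft, and bitsandbytes. The hub id and the LoRA
# hyperparameters are assumptions (alpaca-lora's defaults), not
# necessarily Preemo's exact configuration.

LORA_HYPERPARAMS = {
    "r": 8,                                  # rank of each low-rank adapter
    "lora_alpha": 16,                        # adapter scaling factor
    "target_modules": ["q_proj", "v_proj"],  # attention projections to adapt
    "lora_dropout": 0.05,
}

def build_input_model(base_id="decapoda-research/llama-30b-hf"):
    """Load the 30B LLaMA weights in 8-bit and wrap them with LoRA adapters."""
    from transformers import AutoModelForCausalLM  # deferred: heavy imports
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        load_in_8bit=True,   # bitsandbytes int8 loading
        device_map="auto",
    )
    return get_peft_model(base, LoraConfig(task_type="CAUSAL_LM", **LORA_HYPERPARAMS))
```

With `load_in_8bit=True`, the frozen base weights are quantized to int8 via bitsandbytes, which is what makes fine-tuning a 30B model feasible on modest hardware.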


Dataset

The English and Chinese datasets we used for fine-tuning came from the paper Instruction Tuning with GPT-4. The paper's authors generated synthetic datasets in English and Chinese using GPT-4, then used the two datasets individually to fine-tune a model. We combined the two datasets into a single one.
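
Merging the two datasets is mechanically simple; here's a sketch. The helper name is ours, and the file layout (one JSON list of examples per language, as in the paper's GitHub repo) is an assumption.

```python
import json
import random

# Sketch: merging the English and Chinese GPT-4-generated datasets into one
# training set. Each file is assumed to hold a JSON list of
# {instruction, input, output} dicts, as in the paper's repo.

def combine_datasets(paths, seed=42):
    examples = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            examples.extend(json.load(f))
    random.Random(seed).shuffle(examples)  # interleave the two languages
    return examples
```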

As you can see in the datasets on GitHub, each example is formatted as an instruction, an input, and an output. Note that in the dataset, coding questions include explanations as part of the output. Here's an excerpt:

"instruction": "Extract the positive integers from an array?",
"input": "arr = [5,-2,8,-7,3,-1,0,5]",
"output": "To extract the positive integers from the given array, you can use the filter() function which will return the values that evaluate to True based on the condition provided in the lambda function. \\n\\nHere's the Python code snippet:\\n\\n```\\narr = [5,-2,8,-7,3,-1,0,5]\\n\\nresult = list(filter(lambda x: x > 0, arr))\\n\\nprint(result)\\n```\\n\\nOutput: [5, 8, 3, 5]\\n\\nThe filter function iterates through each element in the array and checks if the element is greater than 0, which means it's a positive integer. If it's true, the element is included in the result list. Next, we display the result list which contains only the positive integers."


"instruction": "分析给定的法律文件并解释关键点。",
"input": "",
"output": "非常抱歉,你没有提供法律文件,所以我无法为你进行分析。 请提供具体的法律文件,我将尽力为你解释其中的关键点。"

You can see the impact of these training examples on our fine-tuned model.
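
For concreteness, here's a sketch of how an example like this can be rendered into a training prompt, using the Alpaca-style template from alpaca-lora; whether our pipeline used this exact template is an assumption.

```python
# Sketch: turning a dataset example into a training prompt, following the
# Alpaca-style template used by alpaca-lora. Whether Preemo used this exact
# template is an assumption.

def format_example(instruction, input_text, output):
    if input_text:
        prompt = (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    else:
        prompt = (
            "Below is an instruction that describes a task. Write a response "
            "that appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            "### Response:\n"
        )
    return prompt + output  # the model is trained to continue into `output`

example = format_example(
    "Extract the positive integers from an array?",
    "arr = [5,-2,8,-7,3,-1,0,5]",
    "result = list(filter(lambda x: x > 0, arr))",
)
```

Because the English and Chinese examples share this one format, the model sees the same instruction-following structure in both languages during fine-tuning.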

Fine-tuning loop with LoRA

Our fine-tuning process leverages LoRA using the same adapter as alpaca-lora.

LoRA (Low-Rank Adaptation) is a more efficient fine-tuning technique. In traditional fine-tuning, the weights of the original model are unfrozen and updated. In LoRA, the original weights stay frozen; instead, a small set of new weights, called adapter weights, is added and trained.

To quantify the efficiency gain: Meta’s LLaMA model has 30B weights. Our resulting adapter was about 50k weights.
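
A toy example makes the arithmetic concrete. LoRA augments a frozen weight matrix W with a trainable low-rank product B·A; the dimensions below are invented purely for illustration.

```python
import numpy as np

# Toy illustration of LoRA's parameter savings. A frozen weight matrix W
# (d_out x d_in) is augmented with two small trainable matrices B (d_out x r)
# and A (r x d_in); the layer computes x @ (W + B @ A).T instead of x @ W.T.
# The dimensions here are made up for illustration, not LLaMA's.

rng = np.random.default_rng(0)
d_out, d_in, r = 512, 512, 8

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weights
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))               # trainable, zero init: the adapter
                                       # starts out as a no-op

def lora_forward(x):
    return x @ (W + B @ A).T

frozen_params = W.size                 # 512 * 512 = 262,144 stay frozen
adapter_params = A.size + B.size       # 2 * 8 * 512 = 8,192 are trained
```

Only A and B receive gradient updates, so the trainable fraction here is about 3% of the layer, and the ratio improves further as the base matrices grow.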

Output model

As explained above, we produced a set of static adapter weights. We call this Alpaca LoRA 30B.

To use this output model to do inference, all you need to do is load the original LLaMA weights along with these adapter weights into the same model architecture. Then you can start generating text completions.
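
In code, the inference setup might look like the following sketch; the hub id and adapter path are placeholders, not our actual artifact locations, and the generation settings are assumptions.

```python
# Sketch: loading the base LLaMA weights plus the Alpaca LoRA 30B adapter
# for inference with peft. The hub id and adapter path are placeholders.

def load_for_inference(
    base_id="decapoda-research/llama-30b-hf",
    adapter_path="./alpaca-lora-30b-adapter",
):
    from transformers import AutoModelForCausalLM, AutoTokenizer  # deferred
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id, load_in_8bit=True, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, adapter_path)  # attach adapters
    model.eval()
    return model, tokenizer

def complete(model, tokenizer, prompt, max_new_tokens=256):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Because the adapter is a separate artifact, the same 30B base weights can be shared across many fine-tunes, each swapping in its own small adapter file.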


What you can take away from this result

If you want to solve problems that cut across multiple languages, our work suggests a way to create a model that speaks the languages you need.

This quick experiment shows that you can produce a model with multilingual capabilities by taking a foundation model without them and fine-tuning it on a dataset spanning two languages.

Our experiment also suggests parameters can be shared across languages. An interesting question to explore is to what extent combining datasets from different languages can help further generalize a model.

These experiments are all possible today using open-source models and data — if you have the right infrastructure.

Using Preemo

At Preemo, we’re building the infrastructure to make running experiments like this easy.

We’re a team familiar with the ins and outs of fine-tuning, and would love to chat with folks we can help — whether you’re just getting started with LLMs or looking to put one into production.

Just send me a note:

About the author

Mark Huang is an ML scientist and co-founder of Preemo.
Bo Yang is a software engineer at Preemo.
Jess Lin is a writer, engineer, and musician.