How to Use Hugging Face AutoTrain to Fine-tune LLMs

Image by Editor

Introduction

In recent years, Large Language Models (LLMs) have changed how people work and are now used in many fields, such as education, marketing, and research. Given this potential, an LLM can be enhanced to solve our business problems better, which is why we perform LLM fine-tuning.

We might want to fine-tune our LLM for several reasons, including adapting it to specific domain use cases, improving accuracy, keeping data private and secure, and controlling model bias. With all these benefits, it’s essential to learn how to fine-tune an LLM before putting one in production.

One way to perform LLM fine-tuning automatically is with Hugging Face’s AutoTrain. HF AutoTrain is a no-code platform with a Python API for training state-of-the-art models on various tasks, including Computer Vision, Tabular, and NLP tasks. We can use AutoTrain even if we don’t understand much about the LLM fine-tuning process.

So, how does it work? Let’s explore further.

Getting Started with AutoTrain

Although HF AutoTrain is a no-code solution, we can also build on top of it using the Python API. We will explore the code route, as the no-code platform isn’t always stable for training. However, if you want to use the no-code platform, you can create an AutoTrain Space from the following page. The overall platform is shown in the image below.

Image by Author

To fine-tune an LLM with the Python API, we need to install the AutoTrain Python package, which you can do with the following command.

pip install -U autotrain-advanced

We will also use the Alpaca sample dataset from HuggingFace, which requires the datasets package to acquire.

pip install datasets

Then, use the following code to acquire the data we need.

from datasets import load_dataset

# Load the dataset
dataset = load_dataset("tatsu-lab/alpaca")
train = dataset['train']

Additionally, we will save the data in CSV format, as we will need it for fine-tuning.

train.to_csv('train.csv', index=False)

With the environment and the dataset ready, let’s try to use HuggingFace AutoTrain to fine-tune our LLM.

Fine-tuning Procedure and Evaluation

I will adapt the fine-tuning process from the AutoTrain example, which we can find here. To start the process, we put the data we want to use for fine-tuning in a folder called data.

Image by Author

For this tutorial, I sample only 100 rows so the training process is much swifter. Once the data is ready, we can use a Jupyter Notebook to fine-tune our model. Make sure the data contains a ‘text’ column, as AutoTrain reads from that column only.
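As a reference, here is a minimal sketch of how that sampling and the data folder setup could look; the sample size, random_state, and file name are arbitrary choices for illustration, not requirements of AutoTrain.

import os
import pandas as pd

# Create the data/ folder that AutoTrain will read from
os.makedirs('data', exist_ok=True)

# Sample 100 rows and keep only the 'text' column that AutoTrain expects
df = pd.read_csv('train.csv')
sampled = df[['text']].sample(n=100, random_state=42)
sampled.to_csv('data/train.csv', index=False)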

First, let’s run the AutoTrain setup using the following command.

!autotrain setup

Next, we provide the information required for AutoTrain to run. The following is the project name and the pre-trained model you want to fine-tune. You can only choose a model that is available on HuggingFace.

project_name = 'my_autotrain_llm'
model_name = 'tiiuae/falcon-7b'

Then we add the HF information, in case you want to push your model to the repository or use a private model.

push_to_hub = False
hf_token = "YOUR HF TOKEN"
repo_id = "username/repo_name"
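If you set push_to_hub to True, you can also log in to the Hugging Face Hub from the notebook beforehand; a minimal sketch, assuming the huggingface_hub package is installed. This is optional here, since the training command below also passes the token explicitly.

from huggingface_hub import login

# Authenticate with the token defined above so the trained model can be pushed
login(token=hf_token)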

Lastly, we initialize the model parameters in the variables below. You can change them as you like to see whether the result improves.

learning_rate = 2e-4
num_epochs = 4
batch_size = 1
block_size = 1024
trainer = "sft"
warmup_ratio = 0.1
weight_decay = 0.01
gradient_accumulation = 4
use_fp16 = True
use_peft = True
use_int4 = True
lora_r = 16
lora_alpha = 32
lora_dropout = 0.045
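One quick sanity check worth doing: the effective batch size the trainer sees is the per-device batch size multiplied by the gradient accumulation steps.

# With the values above: 1 x 4 = 4 samples per optimizer step
effective_batch_size = batch_size * gradient_accumulation
print(effective_batch_size)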

With all the information ready, we set up environment variables to hold everything we have defined.

import os

os.environ["PROJECT_NAME"] = project_name
os.environ["MODEL_NAME"] = model_name
os.environ["PUSH_TO_HUB"] = str(push_to_hub)
os.environ["HF_TOKEN"] = hf_token
os.environ["REPO_ID"] = repo_id
os.environ["LEARNING_RATE"] = str(learning_rate)
os.environ["NUM_EPOCHS"] = str(num_epochs)
os.environ["BATCH_SIZE"] = str(batch_size)
os.environ["BLOCK_SIZE"] = str(block_size)
os.environ["WARMUP_RATIO"] = str(warmup_ratio)
os.environ["WEIGHT_DECAY"] = str(weight_decay)
os.environ["GRADIENT_ACCUMULATION"] = str(gradient_accumulation)
os.environ["USE_FP16"] = str(use_fp16)
os.environ["USE_PEFT"] = str(use_peft)
os.environ["USE_INT4"] = str(use_int4)
os.environ["LORA_R"] = str(lora_r)
os.environ["LORA_ALPHA"] = str(lora_alpha)
os.environ["LORA_DROPOUT"] = str(lora_dropout)

To run AutoTrain in our notebook, we use the following command.

!autotrain llm \
  --train \
  --model ${MODEL_NAME} \
  --project-name ${PROJECT_NAME} \
  --data-path data/ \
  --text-column text \
  --lr ${LEARNING_RATE} \
  --batch-size ${BATCH_SIZE} \
  --epochs ${NUM_EPOCHS} \
  --block-size ${BLOCK_SIZE} \
  --warmup-ratio ${WARMUP_RATIO} \
  --lora-r ${LORA_R} \
  --lora-alpha ${LORA_ALPHA} \
  --lora-dropout ${LORA_DROPOUT} \
  --weight-decay ${WEIGHT_DECAY} \
  --gradient-accumulation ${GRADIENT_ACCUMULATION} \
  $( [[ "$USE_FP16" == "True" ]] && echo "--fp16" ) \
  $( [[ "$USE_PEFT" == "True" ]] && echo "--use-peft" ) \
  $( [[ "$USE_INT4" == "True" ]] && echo "--use-int4" ) \
  $( [[ "$PUSH_TO_HUB" == "True" ]] && echo "--push-to-hub --token ${HF_TOKEN} --repo-id ${REPO_ID}" )
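If any of these flags differ in your installed AutoTrain version, you can list the options the CLI actually accepts before training.

!autotrain llm --help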

If AutoTrain runs successfully, you should find the following folder in your directory, containing the model and tokenizer produced by AutoTrain.

Image by Author
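As a quick check from the notebook, you can list what AutoTrain wrote into the project folder; a minimal sketch (the exact file names depend on the AutoTrain version).

import os

# The project folder should contain the fine-tuned weights and tokenizer files
print(os.listdir(project_name))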

To test the model, we use the HuggingFace transformers package with the following code.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "my_autotrain_llm"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

Then, we can evaluate our model on an input similar to the training data. For example, we use "Health benefits of regular exercise" as the input.

input_text = "Health benefits of regular exercise"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids)
predicted_text = tokenizer.decode(output[0], skip_special_tokens=False)
print(predicted_text)


The result could certainly still be better, but at least it’s closer to the sample data we provided. We can play around with the pre-trained model and the parameters to improve the fine-tuning, and the generation settings at inference time also affect how the output looks, as sketched below.
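A minimal sketch of tweaking the decoding parameters during generation; the values below are arbitrary starting points, not recommendations from AutoTrain.

# Sample a longer completion instead of the short greedy default
output = model.generate(
    input_ids,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))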

Tips for Successful Fine-tuning

There are a few best practices you might want to follow to improve the fine-tuning process, including:

  1. Prepare a dataset whose quality matches the representative task,
  2. Study the pre-trained model we use,
  3. Use appropriate regularization techniques to avoid overfitting,
  4. Start with a smaller learning rate and gradually increase it,
  5. Use fewer training epochs, as LLMs usually learn new data quite fast,
  6. Don’t ignore the computational cost, as it grows with bigger data, parameters, and models,
  7. Make sure you follow ethical considerations regarding the data you use.

Conclusion

Fine-tuning a Large Language Model is beneficial to our business process, especially when we have specific requirements. With HuggingFace AutoTrain, we can speed up the training process and easily use an available pre-trained model for fine-tuning.

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing.
