Training Large Language Models¶
Added in version 2.8.8.
Oracle Cloud Infrastructure (OCI) Data Science Jobs (Jobs) provides fully managed infrastructure for training large language models at scale. This page shows an example of fine-tuning the Llama 2 model. For more details on the APIs, see Train PyTorch Models.
Figure: Distributed Training with OCI Data Science
You need to configure your networking and IAM policies. We recommend running the training on a private subnet. In this example, internet access is needed to download the source code and the pre-trained model.
The llama-recipes repository contains example code to fine-tune the Llama 2 model. The example fine-tuning script supports both full-parameter fine-tuning and Parameter-Efficient Fine-Tuning (PEFT). With ADS, you can start the training job by taking the source code directly from GitHub, with no code changes.
Access the Pre-Trained Model¶
To fine-tune the model, you will first need to access the pre-trained model. The pre-trained model can be obtained from Meta or HuggingFace. In this example, we will use an access token to download the pre-trained model from HuggingFace (by setting the HUGGING_FACE_HUB_TOKEN environment variable).
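For reference, here is a minimal sketch (not part of the job definition) showing how you might verify locally that your access token can download the gated Llama 2 weights. It assumes the huggingface_hub package is installed and that access to meta-llama/Llama-2-7b-hf has been granted to your account.

import os

from huggingface_hub import snapshot_download

# Make the access token available to huggingface_hub.
os.environ["HUGGING_FACE_HUB_TOKEN"] = "<access_token>"

# Download (or reuse the local cache of) the pre-trained model files.
local_dir = snapshot_download(repo_id="meta-llama/Llama-2-7b-hf")
print("Model files cached at:", local_dir)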
Fine-Tuning the Model¶
You can define the training job with the ADS Python API or YAML. Here are examples of fine-tuning the full parameters of the 7B model using FSDP, first with the Python API and then with the equivalent YAML.
from ads.jobs import Job, DataScienceJob, PyTorchDistributedRuntime

job = (
    Job(name="LLAMA2-Fine-Tuning")
    .with_infrastructure(
        DataScienceJob()
        .with_log_group_id("<log_group_ocid>")
        .with_log_id("<log_ocid>")
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.GPU.A10.2")
        .with_block_storage_size(256)
    )
    .with_runtime(
        PyTorchDistributedRuntime()
        # Specify the service conda environment by slug name.
        .with_service_conda("pytorch20_p39_gpu_v2")
        # Fetch the source code from GitHub and check out the specific commit.
        .with_git(
            url="https://github.com/facebookresearch/llama-recipes.git",
            commit="1aecd00924738239f8d86f342b36bacad180d2b3"
        )
        # Install additional pip packages before running the training command.
        .with_dependency(
            pip_pkg=" ".join([
                "--extra-index-url https://download.pytorch.org/whl/cu118 torch==2.1.0",
                "git+https://github.com/huggingface/peft.git@15a013af5ff5660b9377af24d3eee358213d72d4",
                "appdirs==1.4.4",
                "llama-recipes==0.0.1",
                "py7zr==0.20.6",
            ])
        )
        # Copy the local outputs to OCI object storage when the run finishes.
        .with_output("/home/datascience/outputs", "oci://bucket@namespace/outputs/$JOB_RUN_OCID")
        .with_command(" ".join([
            "torchrun examples/finetuning.py",
            "--enable_fsdp",
            "--pure_bf16",
            "--batch_size_training 1",
            "--model_name $MODEL_NAME",
            "--dist_checkpoint_root_folder /home/datascience/outputs",
            "--dist_checkpoint_folder fine-tuned"
        ]))
        # Run the training on 2 nodes.
        .with_replica(2)
        .with_environment_variable(
            MODEL_NAME="meta-llama/Llama-2-7b-hf",
            HUGGING_FACE_HUB_TOKEN="<access_token>",
            LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/opt/conda/lib",
        )
    )
)
kind: job
apiVersion: v1.0
spec:
  name: LLAMA2-Fine-Tuning
  infrastructure:
    kind: infrastructure
    spec:
      blockStorageSize: 256
      compartmentId: "<compartment_ocid>"
      logGroupId: "<log_group_id>"
      logId: "<log_id>"
      projectId: "<project_id>"
      subnetId: "<subnet_id>"
      shapeName: VM.GPU.A10.2
    type: dataScienceJob
  runtime:
    kind: runtime
    type: pyTorchDistributed
    spec:
      git:
        url: https://github.com/facebookresearch/llama-recipes.git
        commit: 1aecd00924738239f8d86f342b36bacad180d2b3
      command: >-
        torchrun llama_finetuning.py
        --enable_fsdp
        --pure_bf16
        --batch_size_training 1
        --model_name $MODEL_NAME
        --dist_checkpoint_root_folder /home/datascience/outputs
        --dist_checkpoint_folder fine-tuned
      replicas: 2
      conda:
        type: service
        slug: pytorch20_p39_gpu_v2
      dependencies:
        pipPackages: >-
          --extra-index-url https://download.pytorch.org/whl/cu118 torch==2.1.0
          git+https://github.com/huggingface/peft.git@15a013af5ff5660b9377af24d3eee358213d72d4
          llama-recipes==0.0.1
          appdirs==1.4.4
          py7zr==0.20.6
      outputDir: /home/datascience/outputs
      outputUri: oci://bucket@namespace/outputs/$JOB_RUN_OCID
      env:
        - name: MODEL_NAME
          value: meta-llama/Llama-2-7b-hf
        - name: HUGGING_FACE_HUB_TOKEN
          value: "<access_token>"
        - name: LD_LIBRARY_PATH
          value: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/opt/conda/lib
You can create the job and start the job run with API calls or with the ADS CLI.
To create and start running the job with the Python API:
# Create the job on OCI Data Science
job.create()
# Start a job run
run = job.run()
# Stream the job run outputs (from the first node)
run.watch()
# Alternatively, create and start the job run from the YAML file with the ADS CLI
ads opctl run -f your_job.yaml
The job run will:

1. Set up the PyTorch conda environment and install the additional dependencies.
2. Fetch the source code from GitHub and check out the specified commit.
3. Run the training script with the specified arguments, which includes downloading the model and dataset.
4. Save the outputs to OCI Object Storage once the training finishes.
Note that in the training command, there is no need to specify the number of nodes or the number of GPUs. ADS will automatically configure them based on the replica count and shape you specified.
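For illustration only (these numbers are derived from the example above, not from ADS output): the VM.GPU.A10.2 shape provides two NVIDIA A10 GPUs per node, so with two replicas the job launches four training processes in total.

replicas = 2          # .with_replica(2) in the Python API, replicas: 2 in the YAML
gpus_per_node = 2     # VM.GPU.A10.2 has 2 NVIDIA A10 GPUs per node
world_size = replicas * gpus_per_node
print(f"FSDP world size: {world_size}")  # 4 processes across 2 nodes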
The fine-tuning runs on the samsum dataset by default. You can also add your custom datasets.
Once the fine-tuning is finished, the checkpoints will be saved to the OCI Object Storage bucket as specified. You can load the FSDP checkpoints for inference.
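As one possible way to retrieve the saved checkpoints, the following sketch copies the job run outputs from Object Storage to a local directory. It assumes the ocifs package is installed and OCI API key authentication is configured; <job_run_ocid> is a placeholder matching the output URI used above.

from ocifs import OCIFileSystem

# Connect to OCI Object Storage using the API key configuration file.
fs = OCIFileSystem(config="~/.oci/config")

# Copy the saved outputs (including the FSDP checkpoints) to a local directory.
fs.get(
    "oci://bucket@namespace/outputs/<job_run_ocid>/",
    "./outputs/",
    recursive=True,
)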
The same training script also supports Parameter-Efficient Fine-Tuning (PEFT). You can change the command to the following for PEFT with LoRA. Note that for PEFT, the fine-tuned weights are stored in the location specified by --output_dir, while for full-parameter fine-tuning, the checkpoints are stored in the location specified by --dist_checkpoint_root_folder and --dist_checkpoint_folder.
torchrun llama_finetuning.py --enable_fsdp --use_peft --peft_method lora \
--pure_bf16 --batch_size_training 1 \
--model_name meta-llama/Llama-2-7b-hf --output_dir /home/datascience/outputs
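For illustration, here is a minimal sketch of loading the resulting LoRA adapter on top of the base model for inference. It assumes the transformers and peft packages are installed and that the adapter files from --output_dir have been copied to a local directory; "<local_output_dir>" is a placeholder.

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base pre-trained model and tokenizer.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Apply the fine-tuned LoRA adapter; "<local_output_dir>" is a local copy of --output_dir.
model = PeftModel.from_pretrained(base_model, "<local_output_dir>")
model.eval()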