Text classification is one of the most fundamental tasks in Natural Language Processing. This chapter explores how to classify text using both representation models (encoder-based) and generative models (decoder and encoder-decoder models). You’ll learn multiple approaches ranging from zero-shot classification to fine-tuned models, and understand when to use each technique.
{'text': ["the rock is destined to be the 21st century's new 'conan' and that he's going to make a splash even greater than arnold schwarzenegger, jean-claud van damme or steven segal.",
          "things really get weird, though not particularly scary: the movie is all portent and no content."],
 'label': [1, 0]}
The simplest approach is to use a model that’s already fine-tuned for sentiment analysis.
from transformers import pipeline

# Path to our HF model
model_path = "cardiffnlp/twitter-roberta-base-sentiment-latest"

# Load model into pipeline
pipe = pipeline(
    model=model_path,
    tokenizer=model_path,
    return_all_scores=True,
    device="cuda:0"
)
This model is based on RoBERTa and has been fine-tuned specifically for sentiment analysis on Twitter data. It can classify text into negative, neutral, and positive categories.
Run inference on the test set:
import numpy as np
from tqdm import tqdm
from transformers.pipelines.pt_utils import KeyDataset

# Run inference
y_pred = []
for output in tqdm(pipe(KeyDataset(data["test"], "text")), total=len(data["test"])):
    negative_score = output[0]["score"]
    positive_score = output[2]["score"]
    assignment = np.argmax([negative_score, positive_score])
    y_pred.append(assignment)
Evaluate performance:
from sklearn.metrics import classification_report

def evaluate_performance(y_true, y_pred):
    """Create and print the classification report"""
    performance = classification_report(
        y_true, y_pred,
        target_names=["Negative Review", "Positive Review"]
    )
    print(performance)

evaluate_performance(data["test"]["label"], y_pred)
Training a classifier on top of general-purpose text embeddings achieves 85% accuracy - better than the task-specific model!
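The classification report breaks performance down into precision, recall, and F1 per class. A toy example with invented labels, just to show the shape of the output:

```python
from sklearn.metrics import classification_report

# Toy ground truth and predictions: 0 = negative, 1 = positive
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1]

report = classification_report(
    y_true, y_pred,
    target_names=["Negative Review", "Positive Review"]
)
print(report)
```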
Alternative Approach: Instead of using a classifier, you can average the embeddings per class and use cosine similarity:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Average the embeddings of all documents in each target label
df = pd.DataFrame(np.hstack([train_embeddings, np.array(data["train"]["label"]).reshape(-1, 1)]))
averaged_target_embeddings = df.groupby(768).mean().values

# Find the best matching embeddings between evaluation documents and target embeddings
sim_matrix = cosine_similarity(test_embeddings, averaged_target_embeddings)
y_pred = np.argmax(sim_matrix, axis=1)

# Evaluate the model
evaluate_performance(data["test"]["label"], y_pred)
This achieves 84% accuracy without training any classifier!
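The averaging trick is a nearest-centroid classifier: one mean vector per class, then assign each document to the most similar centroid. A minimal sketch on invented 4-dimensional "embeddings" (real sentence embeddings would be 768-dimensional):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy embeddings: two negative (label 0) and two positive (label 1) training docs
train_embeddings = np.array([
    [1.0, 0.0, 0.1, 0.0],   # label 0
    [0.9, 0.1, 0.0, 0.0],   # label 0
    [0.0, 1.0, 0.0, 0.1],   # label 1
    [0.1, 0.9, 0.0, 0.0],   # label 1
])
labels = np.array([0, 0, 1, 1])

# Average the embeddings per label to get one centroid per class
centroids = np.vstack([train_embeddings[labels == c].mean(axis=0) for c in (0, 1)])

# Assign each test document to the most similar centroid
test_embeddings = np.array([[0.95, 0.05, 0.0, 0.0],
                            [0.05, 0.95, 0.1, 0.0]])
sim_matrix = cosine_similarity(test_embeddings, centroids)
y_pred = np.argmax(sim_matrix, axis=1)
print(y_pred)  # → [0 1]
```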
Zero-shot classification requires no labeled training data at all: you only provide a short description of each label.
# Create embeddings for our labels
label_embeddings = model.encode(["A negative review", "A positive review"])

# Find the best matching label for each document
sim_matrix = cosine_similarity(test_embeddings, label_embeddings)
y_pred = np.argmax(sim_matrix, axis=1)
evaluate_performance(data["test"]["label"], y_pred)
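Under the hood, the matching step is just cosine similarity between each document vector and each label vector. A numpy-only sketch with invented 2-dimensional vectors:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between every row of a and every row of b
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Invented label embeddings: "A negative review", "A positive review"
label_embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])

# Invented document embeddings
doc_embeddings = np.array([[0.2, 0.9], [0.8, 0.1]])

sim = cosine_sim(doc_embeddings, label_embeddings)
y_pred = np.argmax(sim, axis=1)
print(y_pred)  # → [1 0]
```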
Generative encoder-decoder models such as T5 can also classify text when the task is phrased as an instruction. Prepend the instruction to each document:

# Prepare our data
prompt = "Is the following sentence positive or negative? "
data = data.map(lambda example: {"t5": prompt + example["text"]})
Run inference:
# Run inference
y_pred = []
for output in tqdm(pipe(KeyDataset(data["test"], "t5")), total=len(data["test"])):
    text = output[0]["generated_text"]
    y_pred.append(0 if text == "negative" else 1)
evaluate_performance(data["test"]["label"], y_pred)
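Note that the exact-string check maps anything that is not literally "negative" (including "Negative" or an off-script answer) to the positive class. A slightly more defensive helper, as a sketch; the normalization rules here are an assumption, not from the chapter:

```python
def parse_sentiment(text):
    """Map generated text to 0 (negative) / 1 (positive).

    Defaults to positive, mirroring the loop above.
    """
    normalized = text.strip().lower()
    if normalized.startswith("negative"):
        return 0
    return 1

print(parse_sentiment("negative"))   # → 0
print(parse_sentiment("Negative."))  # → 0
print(parse_sentiment("positive"))   # → 1
```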
Large language models like ChatGPT can perform classification through conversational prompting.
import openai

# Create client
client = openai.OpenAI(api_key="YOUR_KEY_HERE")

def chatgpt_generation(prompt, document, model="gpt-3.5-turbo-0125"):
    """Generate an output based on a prompt and an input document."""
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt.replace("[DOCUMENT]", document)}
    ]
    chat_completion = client.chat.completions.create(
        messages=messages,
        model=model,
        temperature=0
    )
    return chat_completion.choices[0].message.content
Create a structured prompt:
# Define a prompt template as a base
prompt = """Predict whether the following document is a positive or negative movie review:

[DOCUMENT]

If it is positive return 1 and if it is negative return 0. Do not give any other answers.
"""

# Predict the target using GPT
document = "unpretentious, charming, quirky, original"
chatgpt_generation(prompt, document)  # Returns: '1'
Run on the entire test set (requires API credits):
predictions = [chatgpt_generation(prompt, doc) for doc in tqdm(data["test"]["text"])]

# Extract predictions
y_pred = [int(pred) for pred in predictions]

# Evaluate performance
evaluate_performance(data["test"]["label"], y_pred)
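One caveat: int(pred) raises a ValueError if the model replies with anything beyond a bare digit, despite the prompt's instructions. A defensive variant (a regex-based sketch, not the chapter's code) extracts the first 0 or 1 and falls back to a default:

```python
import re

def extract_label(prediction, default=0):
    """Pull the first 0 or 1 out of a model reply; fall back to a default."""
    match = re.search(r"[01]", prediction)
    return int(match.group()) if match else default

print(extract_label("1"))                            # → 1
print(extract_label("The review is 0 (negative)."))  # → 0
print(extract_label("positive"))                     # → 0 (no digit, default)
```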
In Chapter 5, we’ll explore Text Clustering and Topic Modeling, where you’ll learn to discover patterns and topics in unlabeled text collections.

Try the notebook yourself: