Compare Prompts#
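In this guide, we compare two prompts for the same summarization task: a bare prompt, and one upgraded with task-specific instructions and an example response.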
Environment setup#
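In this guide, we use the OpenAI API. The code below also imports pandas, datasets, langchain, and arthur_bench, so those packages need to be installed as well.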
pip install openai
export OPENAI_API_KEY="sk-..."
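A missing API key otherwise only surfaces once the first request is sent, so it can help to check for it up front. The following is a minimal sketch; `require_env` is a hypothetical helper name, not part of OpenAI, LangChain, or Arthur Bench.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for this guide; it only reads the process environment.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the guide.")
    return value
```

Calling `require_env("OPENAI_API_KEY")` at the top of a script fails fast with a readable message if the `export` step above was skipped.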
Data preparation#
We load a publicly available congressional bill summarization dataset from HuggingFace.
We also prepare one bill and its summary to include in a prompt as an example response.
import pandas as pd
from datasets import load_dataset
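# Load the BillSum dataset from the HuggingFace Hub
billsum = load_dataset("billsum")
# Sample 10 bills from the "ca_test" split as test cases (fixed seed for reproducibility)
billsum_df = pd.DataFrame(billsum["ca_test"]).sample(10, random_state=278487)
# Take one bill and its summary from the "test" split to use as the in-prompt example
example_bill = billsum["test"][6]["text"]
example_bill_summary = billsum["test"][6]["summary"]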
LLM response generation#
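We use two different prompt templates to generate responses: `prompt0_template` contains only the bill text, while `prompt1_template` adds summarization instructions and one example bill with its summary.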
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
gpt35 = ChatOpenAI(temperature=0.0, max_tokens=100)
prompt0_template = PromptTemplate(
    input_variables=["text"],
    template="""
Text: {text}
Summary:
"""
)
prompt1_template = PromptTemplate(
    input_variables=["text", "example_bill", "example_bill_summary"],
    template="""
You are an expert summarizer of legal text. A good summary
captures the most important information in the text and doesn't focus too much on small details.
Make sure to use your expert legal knowledge in summarizing.
===
Text: {example_bill}
Summary: {example_bill_summary}
===
Text: {text}
Summary:
"""
)
prompt0_chain = LLMChain(llm=gpt35, prompt=prompt0_template)
prompt1_chain = LLMChain(llm=gpt35, prompt=prompt1_template)
# Generate summaries from the first 3000 characters of each bill
prompt0_summaries = [prompt0_chain.run(bill[:3000]) for bill in billsum_df.text]
prompt1_summaries = [
    prompt1_chain(
        {
            "text": bill[:3000],
            "example_bill": example_bill,
            "example_bill_summary": example_bill_summary,
        }
    )["text"]
    for bill in billsum_df.text
]
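To see roughly what the model receives for the second prompt, the template can be filled by hand with plain `str.format`. This is a sketch under assumptions: the template below is an abbreviated copy with the instruction text shortened, `render_few_shot_prompt` is a hypothetical helper, and the example bill and summary are short stand-ins rather than BillSum records. It does not go through LangChain.

```python
MAX_CHARS = 3000  # same character cutoff as bill[:3000] in the guide

# Abbreviated copy of the few-shot template (instruction text shortened).
FEW_SHOT_TEMPLATE = """
You are an expert summarizer of legal text.
===
Text: {example_bill}
Summary: {example_bill_summary}
===
Text: {text}
Summary:
"""


def render_few_shot_prompt(text, example_bill, example_bill_summary, max_chars=MAX_CHARS):
    """Fill the template by hand, truncating only the bill being summarized."""
    return FEW_SHOT_TEMPLATE.format(
        text=text[:max_chars],
        example_bill=example_bill,
        example_bill_summary=example_bill_summary,
    )


preview = render_few_shot_prompt(
    text="SECTION 1. " + "x" * 5000,
    example_bill="An Act to rename a post office.",  # hypothetical stand-in
    example_bill_summary="Renames a post office.",   # hypothetical stand-in
)
print(len(preview))
```

As in the guide's code, only the bill being summarized is cut to 3000 characters; the example bill is inserted whole, so a long example can dominate the prompt length.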
Create test suite#
For this test suite, we use BERTScore to measure how closely the candidate summaries match the reference summaries. Comparing the two runs shows whether upgrading the prompt with task-specific detail and an example moves the candidates closer to the references.
from arthur_bench.run.testsuite import TestSuite
my_suite = TestSuite(
    "congressional_bills_to_reference",
    "bertscore",
    input_text_list=list(billsum_df.text),
    reference_output_list=list(billsum_df.summary)
)
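For intuition about the scoring method chosen above: BERTScore greedily matches each token in one text to its most similar token in the other, using cosine similarity between token embeddings, and reports precision, recall, and F1 over those matches. The sketch below shows only that matching step on tiny hand-written vectors. The vectors and the function names are illustrative assumptions; real BERTScore uses contextual embeddings from a BERT-family model, and how Arthur Bench configures its `bertscore` scorer is not covered here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def greedy_match_scores(candidate_vecs, reference_vecs):
    """BERTScore-style precision, recall, and F1 from per-token vectors."""
    # Precision: match each candidate token to its most similar reference token.
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    # Recall: match each reference token to its most similar candidate token.
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy 2-d "embeddings" (made up): the candidate covers one of two reference tokens.
candidate = [[1.0, 0.0]]
reference = [[1.0, 0.0], [0.0, 1.0]]
print(greedy_match_scores(candidate, reference))
```

In the toy case the candidate's single token has an exact match, so precision is high, while one reference token has no counterpart, which pulls recall down. A summary that omits content from the reference is penalized in the same way.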
Run test suite#
my_suite.run("prompt0_summaries", candidate_output_list=prompt0_summaries)
my_suite.run("prompt1_summaries", candidate_output_list=prompt1_summaries)
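Because both runs score the same ten test cases, they can be compared per example as well as on average. The sketch below assumes the per-example scores of each run are already available as two plain lists in the same order; how to extract them from a Bench run is left out, since that API is not shown in this guide. `compare_runs` is a hypothetical helper, and the numbers in the usage example are made-up placeholders, not results from this guide.

```python
from statistics import mean


def compare_runs(baseline_scores, upgraded_scores):
    """Summarize per-example score differences between two runs on the same inputs."""
    if len(baseline_scores) != len(upgraded_scores):
        raise ValueError("both runs must score the same test cases")
    deltas = [u - b for b, u in zip(baseline_scores, upgraded_scores)]
    return {
        "baseline_mean": mean(baseline_scores),
        "upgraded_mean": mean(upgraded_scores),
        "mean_delta": mean(deltas),
        # Number of test cases where the upgraded prompt scored strictly higher.
        "wins": sum(d > 0 for d in deltas),
    }


# Placeholder scores for illustration only (not measured values).
summary = compare_runs([0.5, 0.6], [0.7, 0.5])
print(summary)
```

Looking at the win count alongside the mean helps distinguish a consistent improvement from one driven by a single test case.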
View results#
Run bench from your command line to visualize the run results comparing the two prompts.