arthur_bench.scoring.summary_quality.SummaryQuality

class arthur_bench.scoring.summary_quality.SummaryQuality(llm: BaseChatModel | None = None, context_window: int = 4096, tokenizer: Encoding | None = None)

Comprehensive measure of summarization quality compared to a reference summary.

__init__(llm: BaseChatModel | None = None, context_window: int = 4096, tokenizer: Encoding | None = None)

Methods

__init__([llm, context_window, tokenizer])

arun(candidate_outputs[, reference_outputs, ...])

Async version of the run method.

arun_batch(candidate_batch[, ...])

Summary quality requires input_text_batch.

categories()

All possible values returned by the scorer if output type is categorical.

from_dict(config)

Load a scorer from a JSON configuration file.

is_categorical()

Whether the scorer's output is categorical rather than continuous.

name()

Get the name of this Scorer. Returns the Scorer name.

requires_reference()

True if the scorer requires reference output to compute a score, False otherwise.

run(candidate_outputs[, reference_outputs, ...])

Score a set of test cases.

run_batch(candidate_batch[, ...])

Summary quality requires input_text_batch.

to_dict([warn])

Provides a JSON-serializable representation of the scorer.

to_metadata()

type()

Indicates whether the scorer is built-in or custom.

validate_batch(candidate_batch[, ...])
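
Example

The batch methods above state that summary quality requires input_text_batch, and the scorer compares each candidate against a reference summary. The sketch below illustrates that contract only. It is not the arthur_bench implementation: SketchSummaryScorer, its judge callable (a stand-in for the LLM comparison), shorter_is_better, and the exact error messages are all hypothetical names introduced for illustration, and the parameter name reference_batch is an assumption.

```python
from typing import Callable, List, Optional


class SketchSummaryScorer:
    """Hypothetical stand-in for a reference-based summary scorer.

    Not the arthur_bench implementation; it only mirrors the documented
    requirement that input texts accompany candidates and references.
    """

    def __init__(self, judge: Callable[[str, str, str], float]):
        # judge(input_text, candidate, reference) -> score.
        # Assumption: this callable replaces the LLM-based comparison.
        self.judge = judge

    @staticmethod
    def requires_reference() -> bool:
        # A comparison against a reference summary needs reference outputs.
        return True

    def validate_batch(
        self,
        candidate_batch: List[str],
        reference_batch: Optional[List[str]] = None,
        input_text_batch: Optional[List[str]] = None,
    ) -> None:
        # Reject calls that omit the source texts or the references.
        if input_text_batch is None:
            raise ValueError("input_text_batch is required")
        if reference_batch is None:
            raise ValueError("reference_batch is required")
        if not (len(candidate_batch) == len(reference_batch) == len(input_text_batch)):
            raise ValueError("all batches must have the same length")

    def run_batch(
        self,
        candidate_batch: List[str],
        reference_batch: Optional[List[str]] = None,
        input_text_batch: Optional[List[str]] = None,
    ) -> List[float]:
        # Validate first, then score each (input, candidate, reference) triple.
        self.validate_batch(candidate_batch, reference_batch, input_text_batch)
        return [
            self.judge(text, candidate, reference)
            for text, candidate, reference in zip(
                input_text_batch, candidate_batch, reference_batch
            )
        ]


def shorter_is_better(input_text: str, candidate: str, reference: str) -> float:
    # Toy judge for demonstration only: prefers the shorter summary.
    return 1.0 if len(candidate) < len(reference) else 0.0
```

Injecting the judge keeps the sketch runnable without a model or API key. With a real scorer, the comparison would be delegated to the configured llm instead.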