arthur_bench.scoring.python_unit_testing.PythonUnitTesting

class arthur_bench.scoring.python_unit_testing.PythonUnitTesting(unit_test_dir: str | None = None, unit_tests: List[str] | None = None)

Wraps the HuggingFace code_eval metric.

Scores each candidate_output as a function against a pre-prepared unit test.

Note: any code that uses non-standard Python libraries (e.g. numpy) is considered to have an error.

https://huggingface.co/spaces/evaluate-metric/code_eval
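As a rough illustration of what scoring a candidate function against a pre-prepared unit test involves, the following is a minimal, standard-library-only sketch. It is not the arthur_bench or code_eval implementation: the helper name `passes_unit_test`, the 1.0/0.0 score values, and the timeout are assumptions made for this example. It concatenates the candidate code with the test, runs the result in a separate Python process, and treats a zero exit code as a pass.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_unit_test(candidate_code: str, unit_test: str, timeout: float = 5.0) -> float:
    """Hypothetical helper: return 1.0 if the candidate passes the test, else 0.0."""
    # Append the unit test to the candidate so the test can call the function it defines.
    program = candidate_code + "\n\n" + unit_test + "\n"
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate_test.py"
        script.write_text(program)
        try:
            # Run in a child process so a crash or failed assert cannot affect the caller.
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # A candidate that hangs counts as a failure.
            return 0.0
    # Any exception (failed assert, syntax error, missing import) gives a non-zero exit code.
    return 1.0 if result.returncode == 0 else 0.0


good_candidate = "def add(a, b):\n    return a + b"
bad_candidate = "def add(a, b):\n    return a - b"
unit_test = "assert add(2, 3) == 5"

scores = [passes_unit_test(c, unit_test) for c in (good_candidate, bad_candidate)]
print(scores)
```

Running generated code executes arbitrary instructions, so a sketch like this should only be used on trusted input or inside a sandbox. In this sketch, a candidate that imports a package missing from the environment fails in the same way as any other error.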

__init__(unit_test_dir: str | None = None, unit_tests: List[str] | None = None)

Methods

__init__([unit_test_dir, unit_tests])

arun(candidate_outputs[, reference_outputs, ...])

Async version of run method.

arun_batch(candidate_batch[, ...])

Async version of run_batch method.

categories()

All possible values returned by the scorer if output type is categorical.

from_dict(config)

Load a scorer from a JSON configuration file.

is_categorical()

Whether the scorer is continuous or categorical.

name()

Get the name of this Scorer. Returns the Scorer name.

requires_reference()

True if the scorer requires a reference output to compute a score, False otherwise.

run(candidate_outputs[, reference_outputs, ...])

Score a set of test cases.

run_batch(candidate_batch[, ...])

Score a batch of candidate generations.

to_dict([warn])

Provides a JSON-serializable representation of the scorer.

to_metadata()

type()

Supplies whether a scorer is built-in or custom.