arthur_bench.scoring.python_unit_testing.PythonUnitTesting
- class arthur_bench.scoring.python_unit_testing.PythonUnitTesting(unit_test_dir: str | None = None, unit_tests: List[str] | None = None)
Wraps the HuggingFace code_eval metric.
Scores each candidate output, treated as a function, against a pre-prepared unit test.
Note: any code that uses non-standard Python libraries (e.g. numpy) is considered to have an error.
https://huggingface.co/spaces/evaluate-metric/code_eval
- __init__(unit_test_dir: str | None = None, unit_tests: List[str] | None = None)
Methods
__init__([unit_test_dir, unit_tests])
arun(candidate_outputs[, reference_outputs, ...]): Async version of run method.
arun_batch(candidate_batch[, ...]): Async version of run_batch method.
All possible values returned by the scorer if output type is categorical.
from_dict(config): Load a scorer from a json configuration file.
Whether the scorer is continuous or categorical.
name(): Get the name of this Scorer.
True if scorer requires reference output to compute score, False otherwise
run(candidate_outputs[, reference_outputs, ...]): Score a set of test cases.
run_batch(candidate_batch[, ...]): Score a batch of candidate generations.
to_dict([warn]): Provides a json serializable representation of the scorer.
to_metadata()
type(): Supplies whether a scorer is built-in or custom.