Contributing

We welcome contributions and feedback from the community!

Creating a custom scorer

All scorers should inherit from the Scorer base class and provide a custom implementation of the run_batch method.

A scorer can leverage any combination of input texts, context texts, and reference texts to score candidate generations. All computed scores must be float values where a higher value indicates a better score. If you have a scorer that does not fit these constraints, please get in touch with the Arthur team.
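
As a rough illustration of these constraints, the sketch below defines a hypothetical `WordOverlapScorer` that returns one float per candidate, with higher values meaning a better match. The `Scorer` class here is a stand-in defined only so the snippet runs on its own; the real base class lives in arthur_bench, and the `run_batch` parameter names shown are assumptions that may not match its actual signature.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class Scorer(ABC):
    """Stand-in for the arthur_bench base class (signature is an assumption)."""

    @abstractmethod
    def run_batch(
        self,
        candidate_batch: List[str],
        reference_batch: Optional[List[str]] = None,
        input_text_batch: Optional[List[str]] = None,
        context_batch: Optional[List[str]] = None,
    ) -> List[float]:
        ...


class WordOverlapScorer(Scorer):
    """Hypothetical scorer: fraction of reference words found in the candidate."""

    def run_batch(
        self,
        candidate_batch: List[str],
        reference_batch: Optional[List[str]] = None,
        input_text_batch: Optional[List[str]] = None,
        context_batch: Optional[List[str]] = None,
    ) -> List[float]:
        if reference_batch is None:
            raise ValueError("WordOverlapScorer requires reference texts")
        scores: List[float] = []
        for candidate, reference in zip(candidate_batch, reference_batch):
            reference_words = set(reference.lower().split())
            candidate_words = set(candidate.lower().split())
            if not reference_words:
                # An empty reference gives nothing to match against.
                scores.append(0.0)
                continue
            overlap = len(reference_words & candidate_words)
            # Higher is better: 1.0 means every reference word was covered.
            scores.append(overlap / len(reference_words))
        return scores


scorer = WordOverlapScorer()
scores = scorer.run_batch(
    candidate_batch=["the cat sat", "a dog"],
    reference_batch=["the cat sat down", "the bird"],
)
```

This scorer only uses reference texts; a scorer that needs input or context texts would read the other optional arguments in the same way.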

Steps for adding a custom scorer:

1. Install bench from source, in development mode:
   ```
   pip install -e .
   ```
2. Add your Scorer implementation in a new file in arthur_bench/scoring. For scorers that require prompt templating, we use the LangChain library.
3. Register your scorer by adding it to the scorer enum in arthur_bench/models/models.py.
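
Registering a scorer amounts to adding one member to an enum. The sketch below shows the general shape using a hypothetical `ScorerName` enum; the actual enum name, its existing members, and the naming convention in arthur_bench/models/models.py are not shown in this guide, so treat every identifier here as an assumption.

```python
from enum import Enum


class ScorerName(str, Enum):
    """Hypothetical stand-in for the scorer enum in arthur_bench/models/models.py."""

    EXACT_MATCH = "exact_match"    # illustrative existing entry
    WORD_OVERLAP = "word_overlap"  # new entry for the custom scorer


# Looking a scorer up by its string value resolves to the new member.
selected = ScorerName("word_overlap")
```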

At this point, you should be able to create test suites with your new scorer and test your implementation locally.

Contributing your scorer:

1. Fork the bench repository and create a pull request from your fork. GitHub's guide to creating a pull request from a fork provides more in-depth instructions.
2. Your scorer docstring should use Sphinx format for compatibility with documentation.
3. Provide unit tests for the scorer in a separate file in the test directory.
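
The last two points might look roughly like the following: a Sphinx-style docstring using `:param:` and `:return:` fields, and a pytest-style test function. `ExactMatchScorer` is a hypothetical example written for this sketch; a real contribution would inherit from the Scorer base class and keep the test in its own file.

```python
from typing import List


class ExactMatchScorer:
    """Hypothetical scorer used only to illustrate docstrings and tests."""

    def run_batch(
        self, candidate_batch: List[str], reference_batch: List[str]
    ) -> List[float]:
        """
        Score each candidate as 1.0 if it equals its reference, else 0.0.

        :param candidate_batch: candidate generations to score
        :param reference_batch: reference texts, aligned with the candidates
        :return: one float score per candidate, where higher is better
        """
        return [
            1.0 if candidate.strip() == reference.strip() else 0.0
            for candidate, reference in zip(candidate_batch, reference_batch)
        ]


def test_run_batch_returns_one_float_per_candidate():
    # Checks both the values and the float-only constraint on scores.
    scorer = ExactMatchScorer()
    scores = scorer.run_batch(["yes", "no"], ["yes", "maybe"])
    assert scores == [1.0, 0.0]
    assert all(isinstance(score, float) for score in scores)
```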