Python scorer

You can configure a custom Python scorer by adding a scorer of type py-script to the scorers section of the configuration. The path key should point to the Python script.

"scorers": [
  {
    "type": "py-script",
    "path": "eval.py"
  }
]

In the script, define an evaluate function with the following signature:

Arguments
- output: dict with the key value (the output string) and the key metadata (a dict of metadata)
- inputs: dict of key-value pairs from the dataset sample
Returns
- List of results: each result is a dict with score (0 or 1), message (string) and name (string)

def evaluate(output, inputs):
    # ...
    return [
        {
            "score": 1,
            "message": "Reason for this score",
            "name": "name-for-this-scorer"
        }
    ]
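
As a concrete illustration of the signature above, here is a minimal sketch of a scorer that checks whether the model output contains an expected answer. The `expected` input key and the scorer name `contains-expected` are assumptions made for this example; substitute whichever fields your dataset samples actually carry.

```python
def evaluate(output, inputs):
    # Model output text; fall back to "" if the key is missing or None.
    value = output.get("value") or ""
    # "expected" is a hypothetical dataset field, used only for illustration.
    expected = str(inputs.get("expected", ""))

    # Case-insensitive substring check; an empty expected value never passes.
    passed = bool(expected) and expected.lower() in value.lower()
    if passed:
        message = "Output contains the expected answer"
    elif expected:
        message = f"Expected to find {expected!r} in the output"
    else:
        message = "No expected value found in inputs"

    return [
        {
            "score": 1 if passed else 0,
            "message": message,
            "name": "contains-expected",
        }
    ]
```

Because the return value is a list, a single script could report several named results for the same output if that suits your evaluation.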

Example

The HumanEval example uses this scorer.

Python Path

The Python script is executed on your machine using the python executable found in your PATH, which determines the Python version that is used.

The Python script can use any Python modules (built-in or third-party). If you are using third-party libraries or want to use a specific version of Python, override the Python path when running the CLI:

npx @empiricalrun/cli run --python-path PATH_TO_PYTHON_BINARY
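
To see which interpreter the default behavior would likely pick up, you can resolve `python` against PATH yourself. This is a small standard-library sketch; it assumes the lookup uses the executable name `python`, as described above, and does not reproduce the CLI's actual lookup logic. The helper name `find_python_on_path` is made up for this example.

```python
import shutil
import sys


def find_python_on_path(name="python"):
    # shutil.which returns the full path of the first match in PATH, or None.
    return shutil.which(name)


if __name__ == "__main__":
    print("python on PATH:", find_python_on_path())
    print("interpreter running this script:", sys.executable)
```

If the first path is not the interpreter that has your third-party libraries installed, the path of the interpreter that does (for example, the one inside a virtual environment) is the kind of value to pass to `--python-path`.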

Limitations

- The Python script must complete execution within 10 seconds
- async Python functions are not supported
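
If your scoring logic depends on async code, one possible workaround is to keep `evaluate` synchronous and drive the coroutine with `asyncio.run` inside it. This sketch assumes the limitation only concerns `evaluate` itself being declared `async`; that is not confirmed here, so verify it in your setup. The helper `_score_async`, the 8-second timeout, and the scorer name are all illustrative choices.

```python
import asyncio


async def _score_async(value):
    # Hypothetical async helper; stands in for any awaitable scoring logic.
    await asyncio.sleep(0)
    return 1 if value.strip() else 0


def evaluate(output, inputs):
    value = output.get("value") or ""
    try:
        # Stay under the 10-second limit; 8 seconds is an arbitrary margin.
        score = asyncio.run(asyncio.wait_for(_score_async(value), timeout=8))
        message = "Output is non-empty" if score else "Output is empty"
    except asyncio.TimeoutError:
        score, message = 0, "Scoring timed out"
    return [{"score": score, "message": message, "name": "non-empty-output"}]
```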
