Using a Python function as the entrypoint, you can define a custom model to test with Empirical. This method can also be used to test an application that does pre- or post-processing around the LLM call, or chains multiple LLM calls together.

Run configuration

In your config file, set type to py-script and specify the Python file path in the path field.

"runs": [
  {
    "type": "py-script",
    "path": "rag.py"
  }
]

You can additionally pass the following properties in the run configuration, as shown in the example below:

  • name: string - a custom name for your run
  • parameters: object - an object that passes values to the script to modify its behavior
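For example, a run configuration that names the run and passes a parameter to the script (the top_k parameter here is illustrative, not a built-in option):

"runs": [
  {
    "type": "py-script",
    "path": "rag.py",
    "name": "rag-top-3",
    "parameters": {
      "top_k": 3
    }
  }
]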

The Python file is expected to define a function called execute with the following signature:

  • Arguments
    • inputs: dict of key-value pairs with sample inputs
    • parameters: dict of key-value pairs with the run parameters
  • Returns: an output dict with
    • value (string): The response from the model/application
    • metadata (dict): Custom key-value pairs that are passed on to the scorer and web reporter
rag.py
def execute(inputs, parameters):
    # Run the model or application on the sample inputs
    # ...
    return {
        "value": output,
        "metadata": {
            "key": value
        }
    }

In a RAG application, metadata can be used to capture the retrieved context.
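For illustration, here is a minimal sketch of such an execute function, assuming the dataset samples have a question input; the hard-coded corpus and templated answer stand in for a real vector-store lookup and LLM call:

def execute(inputs, parameters):
    # Stand-in retrieval step; a real app would query a vector store
    corpus = {
        "What does Empirical do?": "Empirical is a test runner for LLM apps.",
    }
    context = corpus.get(inputs["question"], "")

    # Stand-in generation step; a real app would call an LLM here
    answer = f"Answer based on context: {context}"

    return {
        "value": answer,
        # Surface the retrieved context so scorers and the web reporter can inspect it
        "metadata": {"context": context},
    }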

Example

The RAG example uses this model provider to test a RAG application.