JudgesClient

class martian_apart_hack_sdk.backend_clients.judges.JudgesClient(httpx, config)[source]

The client for the Martian Judges API. Use the JudgesClient to create, update, retrieve, and list judges, and to evaluate LLM responses with them.

Normally, you don’t need to create a JudgesClient directly. Instead, use the MartianClient.judges property to access the JudgesClient.

Parameters:
  • httpx (httpx.Client) – The HTTP client to use for the API.

  • config (utils.ClientConfig) – The configuration for the API.
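
In typical usage you don't construct this class yourself; a minimal sketch of obtaining it through MartianClient (using the api_url and api_key values shown in the Example Usage below):

from martian_apart_hack_sdk import MartianClient

client = MartianClient(
    api_url="https://api.martian.com",
    api_key="your-api-key"
)
judges_client = client.judges  # a JudgesClient instance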

create_judge(judge_id, judge_spec, description=None)[source]

Create a judge.

Parameters:
  • judge_id (str) – An arbitrary identifier (chosen by you) for the judge. You’ll need to use this identifier to reference the judge in other API calls.

  • judge_spec (Union[judge_specs.JudgeSpec, Dict[str, Any]]) – The specification for the judge.

  • description (Optional[str], optional) – The description of the judge, for your own reference.

Returns:

The newly created judge resource.

Return type:

judge_resource.Judge

Raises:
  • ResourceAlreadyExistsError – If a judge with the given ID already exists.

  • httpx.HTTPError – If the request fails.

  • httpx.TimeoutException – If the request times out.
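
A minimal sketch of create_judge, continuing from the client above and reusing the RubricJudgeSpec shown in the Example Usage below (judge_spec may also be passed as a plain dict matching the spec schema):

from martian_apart_hack_sdk import judge_specs

judge_spec = judge_specs.RubricJudgeSpec(
    rubric="Rate the response on a scale of 1-10",
    min_score=1,
    max_score=10
)
# Raises ResourceAlreadyExistsError if a judge with this ID already exists.
judge = client.judges.create_judge(
    judge_id="my-judge",
    judge_spec=judge_spec,
    description="A judge that rates responses on a scale of 1-10"
)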

update_judge(judge_id, judge_spec)[source]

Update a judge.

Parameters:
  • judge_id (str) – The ID of the judge to update.

  • judge_spec (judge_specs.JudgeSpec) – The new specification for the judge.

Returns:

The new version of the judge.

Judge updates are non-destructive. The updated judge will have an incremented version number. You can use this version number to reference the judge in other API calls. You can also access previous versions of the judge by passing the previous version number to the get method.

Return type:

judge_resource.Judge

Raises:
  • ResourceNotFoundError – If the judge with the given ID does not exist.

  • httpx.HTTPError – If the request fails.

  • httpx.TimeoutException – If the request times out.
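
A minimal sketch of a non-destructive update, continuing from the judge created above:

new_spec = judge_specs.RubricJudgeSpec(
    rubric="Rate the response on a scale of 1-5",
    min_score=1,
    max_score=5
)
# The update creates a new version; earlier versions remain retrievable
# via get("my-judge", version=<previous version number>).
updated_judge = client.judges.update_judge("my-judge", new_spec)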

list()[source]

List all judges in your organization.

Returns:

A list of all judges.

Return type:

list[judge_resource.Judge]

Raises:
  • httpx.HTTPError – If the request fails.

  • httpx.TimeoutException – If the request times out.
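
A minimal sketch of listing judges, continuing from the client above:

for judge in client.judges.list():
    print(judge)  # each entry is a judge_resource.Judge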

get(judge_id, version=None)[source]

Get a specific judge by ID and optionally version.

Parameters:
  • judge_id (str) – The ID of the judge to get.

  • version (Optional[int], optional) – The version of the judge to get. If not provided, the latest version will be returned.

Returns:

The judge resource, or None if the judge does not exist.

Return type:

Optional[judge_resource.Judge]

Raises:
  • httpx.HTTPError – If the request fails for reasons other than a missing judge.

  • httpx.TimeoutException – If the request times out.
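
A minimal sketch of fetching a judge; note that get returns None rather than raising when the judge does not exist:

latest = client.judges.get("my-judge")
if latest is None:
    print("No judge with ID 'my-judge'")

# Fetch a specific earlier version of the judge.
first_version = client.judges.get("my-judge", version=1)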

get_versions(judge_id)[source]

Get all versions of a specific judge.

Each time a judge is updated, a new version is created. This method returns all versions of a judge, ordered from newest to oldest.

Parameters:

judge_id (str) – The ID of the judge to get versions for.

Returns:

A list of all versions of the judge, ordered from newest to oldest.

Return type:

List[judge_resource.Judge]

Raises:
  • ResourceNotFoundError – If the judge with the given ID does not exist.

  • httpx.HTTPError – If the request fails.

  • httpx.TimeoutException – If the request times out.
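
A minimal sketch of walking a judge's version history:

versions = client.judges.get_versions("my-judge")
for judge_version in versions:
    print(judge_version)  # judge_resource.Judge, ordered newest to oldest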

render_prompt(judge, completion_request, completion_response)[source]

Render the judging prompt for a judge.

Concatenates the judge’s prescript, rubric, and postscript; evaluates variables in the prompt (e.g. ${min_score}, ${max_score}, ${content}); and returns the rendered prompt.

This is useful for debugging or for previewing exactly what the judge will see, without running a full evaluation.

Parameters:
  • judge (judge_resource.Judge) – The judge to render the prompt for.

  • completion_request (Dict[str, Any]) – The completion request parameters that would be sent to the LLM.

  • completion_response (chat_completion.ChatCompletion) – The completion response from the LLM.

Returns:

The rendered prompt that would be sent to the Judge.

Return type:

str

Raises:
  • ResourceNotFoundError – If the judge with the given ID does not exist.

  • httpx.HTTPError – If the request fails.

  • httpx.TimeoutException – If the request exceeds the configured evaluation_timeout.
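
A minimal sketch of rendering the prompt for inspection, reusing the completion_request and completion_response constructed in the Example Usage below:

judge = client.judges.get("my-judge")
prompt = client.judges.render_prompt(judge, completion_request, completion_response)
print(prompt)  # prescript, rubric, and postscript with variables substituted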

evaluate(judge, completion_request, completion_response)[source]

Evaluate an LLM response using a specific judge.

This method sends the completion request and response to the judge for evaluation. The judge will assess the response based on its rubric and return a structured evaluation.

Parameters:
  • judge (judge_resource.Judge) – The judge to use for evaluation.

  • completion_request (Dict[str, Any]) – The original completion request parameters that were sent to the LLM.

  • completion_response (chat_completion.ChatCompletion) – The completion response from the LLM to evaluate.

Returns:

The evaluation results, including:
  • score: The numerical score assigned by the judge

  • reasoning: The judge’s explanation for the score

  • metadata: Additional evaluation metadata

Return type:

JudgeEvaluation

Raises:
  • ResourceNotFoundError – If the judge with the given ID does not exist.

  • httpx.HTTPError – If the request fails.

  • httpx.TimeoutException – If the request exceeds the configured evaluation_timeout.
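
A minimal sketch of evaluating a response and reading the result fields described above (attribute-style access on JudgeEvaluation is an assumption here):

evaluation = client.judges.evaluate(judge, completion_request, completion_response)
print(evaluation.score)      # numerical score assigned by the judge
print(evaluation.reasoning)  # the judge's explanation for the score
print(evaluation.metadata)   # additional evaluation metadata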

evaluate_using_judge_spec(judge_spec, completion_request, completion_response)[source]

Evaluate an LLM response using a judge specification directly.

Similar to evaluate(), but instead of using a saved judge, this method accepts a judge specification directly. This is useful for testing new judge specifications before creating a permanent judge.

Parameters:
  • judge_spec (Dict[str, Any]) – The judge specification to use for evaluation.

  • completion_request (Dict[str, Any]) – The original completion request parameters that were sent to the LLM.

  • completion_response (chat_completion.ChatCompletion) – The completion response from the LLM to evaluate.

Returns:

The evaluation results, including:
  • score: The numerical score assigned by the judge

  • reasoning: The judge’s explanation for the score

  • metadata: Additional evaluation metadata

Return type:

JudgeEvaluation

Raises:
  • httpx.HTTPError – If the request fails.

  • httpx.TimeoutException – If the request exceeds the configured evaluation_timeout.
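
A minimal sketch of trying out a spec before saving it as a judge; the dict keys below mirror the RubricJudgeSpec constructor arguments and are an assumption about the spec schema:

draft_spec = {
    "rubric": "Penalize responses that do not answer the question",
    "min_score": 1,
    "max_score": 10
}
evaluation = client.judges.evaluate_using_judge_spec(
    draft_spec, completion_request, completion_response
)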

Example Usage

from martian_apart_hack_sdk import MartianClient
from martian_apart_hack_sdk import judge_specs

# Create a client instance
client = MartianClient(
    api_url="https://api.martian.com",
    api_key="your-api-key"
)

# Create a new judge
judge_spec = judge_specs.RubricJudgeSpec(
    rubric="Rate the response on a scale of 1-10",
    min_score=1,
    max_score=10
)
judge = client.judges.create_judge(
    judge_id="my-judge",
    judge_spec=judge_spec,
    description="A judge that rates responses on a scale of 1-10"
)

# List all judges
all_judges = client.judges.list()

# Get a specific judge
my_judge = client.judges.get("my-judge")

# Get all versions of a judge
judge_versions = client.judges.get_versions("my-judge")

# Update a judge
updated_spec = judge_specs.RubricJudgeSpec(
    rubric="Rate the response on a scale of 1-5",
    min_score=1,
    max_score=5
)
updated_judge = client.judges.update_judge("my-judge", updated_spec)

# Evaluate an LLM response
from openai.types.chat import ChatCompletion
completion_request = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Tell me a joke"}]
}
# In practice this would come from an actual chat-completion call; here we
# construct a minimal ChatCompletion object by hand.
completion_response = ChatCompletion(
    id="chatcmpl-123",
    choices=[
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {"role": "assistant", "content": "Why did the chicken cross the road?"},
        }
    ],
    created=1700000000,
    model="gpt-4",
    object="chat.completion",
)
evaluation = client.judges.evaluate(judge, completion_request, completion_response)