Judge Specifications
===================

.. currentmodule:: martian_apart_hack_sdk.judge_specs

The Martian SDK provides several types of judges that can be used to evaluate model outputs. Each judge type has its own specification class that defines its behavior.

RubricJudgeSpec
--------------

.. autoclass:: RubricJudgeSpec
   :members:
   :exclude-members: to_dict

Example Usage
~~~~~~~~~~~~

.. code-block:: python

   from martian_apart_hack_sdk import judge_specs

   # Create a rubric judge that evaluates responses on a scale of 1-5
   rubric = """
   You are tasked with evaluating whether a restaurant recommendation is good.
   The scoring is as follows:
   - 1: If the recommendation doesn't meet any of the criteria.
   - 2: If the recommendation meets only some small part of the criteria.
   - 3: If the recommendation is reasonable, but not perfect.
   - 4: If the recommendation is almost perfect.
   - 5: If the recommendation is perfect.
   """
   
   judge_spec = judge_specs.RubricJudgeSpec(
       model_type="rubric_judge",
       rubric=rubric,
       model="openai/openai/gpt-4o",
       min_score=1,
       max_score=5,
   )

Other Judge Types
----------------

The following judge types are also available but are primarily used internally or in advanced use cases:

- **GoldMatchJudge**: Similar to RubricJudge but specialized for comparing responses against known good answers.
- **MaxScoreJudge**: Takes multiple judges and returns the highest score among them.
- **MinScoreJudge**: Takes multiple judges and returns the lowest score among them.
- **ConstantJudge**: Always returns a fixed score and reason.
- **AverageScoreJudge**: Takes multiple judges and returns their average score.
- **SumJudge**: Takes multiple judges and returns the sum of their scores.
- **ExactMatchJudge**: Checks if responses exactly match a set of known answers.
- **CaseJudge**: Applies different judges based on conditional logic.

For most use cases, the RubricJudgeSpec is recommended as it provides the most flexibility and natural language understanding capabilities.