Classifying text¶
Marvin has a powerful classification tool that can be used to categorize text into predefined labels. It uses a logit bias technique that is faster and more accurate than traditional LLM approaches. This capability is essential across a range of applications, from categorizing user feedback and tagging issues to managing inputs in natural language interfaces.
What it does
The classify
function categorizes text from a set of provided labels. @classifier
is a class decorator that allows you to instantiate Enums with natural language.
Example: categorize user feedback
Categorize user feedback into labels such as "bug", "feature request", or "inquiry":
How it works
Marvin enumerates your options, and uses a clever logit bias trick to force the LLM to deductively choose the index of the best option given your provided input. It then returns the choice associated with that index.
Providing labels¶
Marvin's classification tool is designed to accommodate a variety of label formats, each suited to different use cases.
Lists¶
When quick, ad-hoc categorization is required, a simple list of strings is the most straightforward approach. For example:
Example: sentiment analysis
Enums¶
For applications where classification labels are more structured and recurring, Enums provide an organized and maintainable solution:
from enum import Enum
import marvin
class RequestType(Enum):
SUPPORT = "support request"
ACCOUNT = "account issue"
INQUIRY = "general inquiry"
request = marvin.classify("Reset my password", RequestType)
assert request == RequestType.ACCOUNT
This approach not only enhances code readability but also ensures consistency across different parts of an application.
Booleans¶
For cases where the classification is binary, Booleans are a simple and effective solution. As a simple example, you could map natural-language responses to a yes/no question to a Boolean label:
Literals¶
In scenarios where labels are part of the function signatures or need to be inferred from type hints, Literal
types are highly effective. This approach is particularly useful in ensuring type safety and clarity in the codebase:
from typing import Literal
import marvin
RequestType = Literal["support request", "account issue", "general inquiry"]
request = marvin.classify("Reset my password", RequestType)
assert request == "account issue"
Providing instructions¶
The instructions
parameter in classify()
offers an additional layer of control, enabling more nuanced classification, especially in ambiguous or complex scenarios.
Gentle guidance¶
For cases where the classification needs a slight nudge for accuracy, gentle instructions can be very effective:
comment = "The interface is confusing."
category = marvin.classify(
comment,
["usability feedback", "technical issue", "feature request"],
instructions="Consider it as feedback if it's about user experience."
)
assert category == "usability feedback"
Details and few-shot examples¶
In more complex cases, where the context and specifics are crucial for accurate classification, detailed instructions play a critical role:
# Classifying a task based on project specifications
project_specs = {
"Frontend": "Tasks involving UI design, CSS, and JavaScript.",
"Backend": "Tasks related to server, database, and application logic.",
"DevOps": "Tasks involving deployment, CI/CD, and server maintenance."
}
task_description = "Set up the server for the new application."
task_category = marvin.classify(
task_description,
labels=list(project_specs.keys()),
instructions="Match the task to the project category based on the provided specifications."
)
assert task_category == "Backend"
Enums as classifiers¶
While the primary focus is on the classify
function, Marvin also includes the classifier
decorator. Applied to Enums, it enables them to be used as classifiers that can be instantiated with natural language. This interface is particularly handy when dealing with a fixed set of labels commonly reused in your application.
@marvin.classifier
class IssueType(Enum):
BUG = "bug"
IMPROVEMENT = "improvement"
FEATURE = "feature"
issue = IssueType("There's a problem with the login feature")
assert issue == IssueType.BUG
While convenient for certain scenarios, it's recommended to use the classify
function for its greater flexibility and broader application range.
Model parameters¶
You can pass parameters to the underlying API via the model_kwargs
argument of classify
or @classifier
. These parameters are passed directly to the API, so you can use any supported parameter.
Best practices¶
- Choosing the right labels: Opt for labels that are mutually exclusive and collectively exhaustive for your classification context. This ensures clarity and prevents overlaps in categorization.
- Effective use of instructions: Provide clear, concise, and contextually relevant instructions. This enhances the accuracy of the classification, especially in ambiguous or complex cases.
- Iterative testing and refinement: Continuously test and refine your classification criteria and instructions based on real-world feedback. This iterative process helps in fine-tuning the classification logic for better results.
- Prefer
classify()
over@classifier
:classify()
is more versatile and adaptable for a wide range of scenarios. It should be the primary tool for classification tasks in Marvin.
Async support¶
If you are using Marvin in an async environment, you can use classify_async
:
result = await marvin.classify_async(
"The app crashes when I try to upload a file.",
labels=["bug", "feature request", "inquiry"]
)
assert result == "bug"
Mapping¶
To classify a list of inputs at once, use .map
:
inputs = [
"The app crashes when I try to upload a file.",
"How do change my password?"
]
result = marvin.classify.map(inputs, ["bug", "feature request", "inquiry"])
assert result == ["bug", "inquiry"]
(marvin.classify_async.map
is also available for async environments.)
Mapping automatically issues parallel requests to the API, making it a highly efficient way to classify multiple inputs at once. The result is a list of classifications in the same order as the inputs.