Extracting entities from images
Marvin can use OpenAI's vision API to process images and convert them into structured data, transforming unstructured information into native types that are appropriate for a variety of programmatic use cases.
What it does
The beta extract function can extract entities from images and text.
How it works
This involves a two-step process: first, a caption is generated for the image that is aligned with the structuring goal. Next, the actual extract operation is performed with an LLM.
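The two-step flow described above can be sketched roughly as follows. This is a toy illustration only, not Marvin's implementation: `Image`, `caption_image`, `extract_from_caption`, and `extract_entities` are hypothetical stand-ins, and both model calls are replaced with simple string operations so the example runs offline.

```python
from dataclasses import dataclass


@dataclass
class Image:
    """Hypothetical stand-in for an image; the text plays the role of its pixels."""
    description: str


def caption_image(image: Image, goal: str) -> str:
    # Step 1 (stub): a vision model would write a caption focused on the goal.
    return f"Caption for goal '{goal}': {image.description}"


def extract_from_caption(caption: str, vocabulary: list[str]) -> list[str]:
    # Step 2 (stub): an LLM would extract entities from the caption.
    # Here we keyword-match and return hits in order of appearance.
    found = [(caption.lower().find(term.lower()), term) for term in vocabulary]
    return [term for position, term in sorted(found) if position >= 0]


def extract_entities(image: Image, goal: str, vocabulary: list[str]) -> list[str]:
    # Chain the two steps: caption first, then extract from the caption.
    caption = caption_image(image, goal)
    return extract_from_caption(caption, vocabulary)


image = Image("a Pembroke Corgi sitting next to a Yorkshire Terrier")
breeds = extract_entities(
    image,
    goal="dog breeds",
    vocabulary=["Yorkshire Terrier", "Pembroke Corgi", "Poodle"],
)
```

The point of the sketch is the shape of the pipeline: the caption is written with the extraction goal in mind, so the second step only ever sees text.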
Example: identifying dogs
We will extract the breed of each dog in this image:
Model parameters
You can pass parameters to the underlying API via the model_kwargs argument of extract. These parameters are passed directly to the API, so you can use any supported parameter.
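As a minimal sketch, assuming Marvin is installed and an OpenAI API key is configured, a call with model_kwargs might look like the following. The helper name `extract_state_codes` is hypothetical, and `temperature` and `max_tokens` are used only as examples of parameters the OpenAI chat API accepts.

```python
# Example parameters forwarded to the underlying API (illustrative values).
model_kwargs = {"temperature": 0.0, "max_tokens": 256}


def extract_state_codes(text: str) -> list:
    # Hypothetical helper. Marvin is imported lazily so this block can be
    # loaded without Marvin installed; calling it requires Marvin and an API key.
    import marvin

    return marvin.extract(
        text,
        target=str,
        instructions="2-letter state codes",
        model_kwargs=model_kwargs,
    )
```

Calling `extract_state_codes("I drove from New York to California.")` would then issue the request with those parameters applied.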
Async support
If you are using Marvin in an async environment, you can use extract_async:
import marvin

# Run this inside an async function (or a notebook that supports top-level await).
result = await marvin.extract_async(
    "I drove from New York to California.",
    target=str,
    instructions="2-letter state codes",
)
assert result == ["NY", "CA"]
Mapping
To extract from a list of inputs at once, use .map:
import marvin

inputs = [
    "I drove from New York to California.",
    "I took a flight from NYC to BOS.",
]
result = marvin.extract.map(inputs, target=str, instructions="2-letter state codes")
assert result == [["NY", "CA"], ["NY", "MA"]]
(marvin.extract_async.map is also available for async environments.)
Mapping automatically issues parallel requests to the API, so it is usually much faster than calling extract on each input in turn. The result is a list of outputs in the same order as the inputs.
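To illustrate why parallel requests can still come back in input order, here is a small offline sketch using asyncio.gather. It is not Marvin's implementation: `fake_extract` and `map_extract` are hypothetical stand-ins, and the artificial delays simply make later inputs finish first.

```python
import asyncio


async def fake_extract(text: str, delay: float) -> list[str]:
    # Hypothetical stand-in for one API request: wait, then return
    # the all-uppercase words as the "extracted" entities.
    await asyncio.sleep(delay)
    return [word for word in text.split() if word.isupper()]


async def map_extract(inputs: list[str]) -> list[list[str]]:
    # Start every request concurrently. Earlier inputs get longer delays,
    # so they complete last, yet gather returns results in input order.
    tasks = [
        fake_extract(text, delay=0.01 * (len(inputs) - index))
        for index, text in enumerate(inputs)
    ]
    return await asyncio.gather(*tasks)


inputs = ["from NY to CA", "from MA to NY"]
results = asyncio.run(map_extract(inputs))
```

Because the requests overlap, total time is close to the slowest single request instead of the sum of all of them.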