marvin.utilities.strings
Module for string utilities.
count_tokens
Counts the number of tokens in the given text using the specified model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The text to count tokens in. | required |
model | str | The model to use for token counting. If not provided, the default model is used. | None |
Returns:
Name | Type | Description |
---|---|---|
int | int | The number of tokens in the text. |
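
A typical use of a token counter is checking whether a prompt fits a token budget. The sketch below illustrates that pattern only. `fits_budget` and `whitespace_counter` are hypothetical names that are not part of marvin, and the stand-in counter splits on whitespace so the snippet runs without marvin installed. In practice you would presumably pass `marvin.utilities.strings.count_tokens` as the `counter` argument.

```python
from typing import Callable


def whitespace_counter(text: str) -> int:
    """Stand-in counter (NOT marvin's tokenizer): one token per whitespace-separated word."""
    return len(text.split())


def fits_budget(
    text: str,
    budget: int,
    counter: Callable[[str], int] = whitespace_counter,
) -> bool:
    """Return True if `text` uses at most `budget` tokens according to `counter`."""
    return counter(text) <= budget


prompt = "Summarize the following report in three bullet points."
print(fits_budget(prompt, budget=10))
print(fits_budget(prompt, budget=5))
```

Injecting the counter keeps the budget logic independent of any particular tokenizer, so the same helper works with whichever model's counts you supply.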
detokenize
Detokenizes the given tokens using the specified model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tokens | list[int] | The tokens to detokenize. | required |
model | str | The model to use for detokenization. If not provided, the default model is used. | None |
Returns:
Name | Type | Description |
---|---|---|
str | str | The detokenized text. |
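
Detokenization is presumably the inverse of tokenization: converting text to tokens and back should return the original text. The sketch below demonstrates that round trip with a toy byte-level scheme. `toy_tokenize` and `toy_detokenize` are hypothetical stand-ins that mirror the documented signatures. They are not marvin's implementation, and the `model` argument is accepted but ignored.

```python
from typing import Optional


def toy_tokenize(text: str, model: Optional[str] = None) -> list[int]:
    """Stand-in tokenizer (NOT marvin's): one token per UTF-8 byte; `model` is ignored."""
    return list(text.encode("utf-8"))


def toy_detokenize(tokens: list[int], model: Optional[str] = None) -> str:
    """Inverse of toy_tokenize: reassemble the bytes and decode them as UTF-8."""
    return bytes(tokens).decode("utf-8", errors="replace")


original = "héllo, world"
tokens = toy_tokenize(original)
restored = toy_detokenize(tokens)
print(restored == original)
```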
slice_tokens
Slices the given text to the specified number of tokens.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The text to slice. | required |
n_tokens | int | The number of tokens to slice the text to. | required |
model | str | The model to use for token counting. If not provided, the default model is used. | None |
Returns:
Name | Type | Description |
---|---|---|
str | str | The sliced text. |
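
One plausible reading of this behavior is: tokenize the text, keep the first `n_tokens` tokens, and detokenize the result. The reference above does not confirm that this is how marvin implements it, so treat the sketch below as an assumption. `toy_slice_tokens` is a hypothetical name, and it uses UTF-8 bytes as stand-in tokens rather than a real model tokenizer.

```python
from typing import Optional


def toy_slice_tokens(text: str, n_tokens: int, model: Optional[str] = None) -> str:
    """Keep the first `n_tokens` stand-in tokens of `text` (one token per UTF-8 byte).

    `model` is accepted to mirror the documented signature but is ignored here.
    """
    tokens = list(text.encode("utf-8"))
    kept = tokens[:n_tokens]
    # A cut can land inside a multi-byte character; drop the incomplete tail.
    return bytes(kept).decode("utf-8", errors="ignore")


print(toy_slice_tokens("hello world", 5))
print(toy_slice_tokens("short", 100))
```

With this approach, text shorter than `n_tokens` comes back unchanged, and a slice boundary may fall in the middle of a word or character, which is also possible with real subword tokenizers.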
tokenize
Tokenizes the given text using the specified model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The text to tokenize. | required |
model | str | The model to use for tokenization. If not provided, the default model is used. | None |
Returns:
Type | Description |
---|---|
list[int] | The tokenized text as a list of integers. |
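
Because the result is a plain list of integers, ordinary list operations apply to it. As a hedged illustration, the sketch below splits a token list into fixed-size chunks, a common step before processing long text piece by piece. `chunk_tokens` is a hypothetical helper that is not part of marvin, and the token IDs are made-up values rather than the output of a real model.

```python
def chunk_tokens(tokens: list[int], chunk_size: int) -> list[list[int]]:
    """Split `tokens` into consecutive chunks of at most `chunk_size` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    return [tokens[i : i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# Hypothetical token IDs for illustration only.
token_ids = [101, 7592, 2088, 999, 102]
chunks = chunk_tokens(token_ids, 2)
print(chunks)  # [[101, 7592], [2088, 999], [102]]
```

Each chunk could then be converted back to text with a detokenizer for the same model.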