marvin.utilities.strings

Module for string utilities.

count_tokens

Counts the number of tokens in the given text using the specified model.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| text | str | The text to count tokens in. | required |
| model | str | The model to use for token counting. If not provided, the default model is used. | None |

Returns:

| Type | Description |
|------|-------------|
| int | The number of tokens in the text. |
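A common use for a token count is checking that a prompt fits a budget before sending it. The sketch below illustrates the idea with a toy stand-in rather than Marvin itself: `toy_tokenize`, `toy_count_tokens`, and `MAX_TOKENS` are hypothetical names, and the one-token-per-byte scheme is an assumption chosen so the example runs without any model or network access. Real model tokenizers usually produce far fewer tokens for the same text.

```python
def toy_tokenize(text: str) -> list[int]:
    # Hypothetical stand-in tokenizer: one token per UTF-8 byte.
    # A real model tokenizer merges bytes into larger sub-word tokens.
    return list(text.encode("utf-8"))


def toy_count_tokens(text: str) -> int:
    # Counting tokens is the length of the tokenized sequence.
    return len(toy_tokenize(text))


MAX_TOKENS = 64  # hypothetical budget, not a Marvin setting

prompt = "Summarize the following report."
n_tokens = toy_count_tokens(prompt)
fits_budget = n_tokens <= MAX_TOKENS
print(n_tokens, fits_budget)
```

With the real function, the same check would presumably read `count_tokens(prompt) <= MAX_TOKENS`, with the count depending on the chosen model.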

detokenize

Detokenizes the given tokens using the specified model.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| tokens | list[int] | The tokens to detokenize. | required |
| model | str | The model to use for detokenization. If not provided, the default model is used. | None |

Returns:

| Type | Description |
|------|-------------|
| str | The detokenized text. |
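Detokenization is the inverse of tokenization: decoding the tokens produced from a text with the same model should give that text back. The following is a minimal sketch of that round trip using a toy byte-level scheme. `toy_tokenize` and `toy_detokenize` are hypothetical stand-ins, not Marvin's functions, and they ignore the `model` parameter entirely.

```python
def toy_tokenize(text: str) -> list[int]:
    # Hypothetical stand-in: one integer token per UTF-8 byte.
    return list(text.encode("utf-8"))


def toy_detokenize(tokens: list[int]) -> str:
    # Inverse of toy_tokenize: rebuild the bytes, then decode them.
    return bytes(tokens).decode("utf-8")


original = "naïve café"
tokens = toy_tokenize(original)
restored = toy_detokenize(tokens)

# The round trip is lossless when both directions use the same scheme.
print(restored == original)
```

The practical point carries over to real tokenizers: tokens are only meaningful relative to the model that produced them, so the same `model` value should be passed to both directions.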

slice_tokens

Slices the given text to the specified number of tokens.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| text | str | The text to slice. | required |
| n_tokens | int | The number of tokens to slice the text to. | required |
| model | str | The model to use for token counting. If not provided, the default model is used. | None |

Returns:

| Type | Description |
|------|-------------|
| str | The sliced text. |
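One plausible reading of this description is "tokenize, keep the first `n_tokens` tokens, detokenize". The sketch below implements that reading with a toy byte-level tokenizer. It is an assumption about the behavior, not Marvin's confirmed implementation, and every name in it (`toy_tokenize`, `toy_detokenize`, `toy_slice_tokens`) is hypothetical.

```python
def toy_tokenize(text: str) -> list[int]:
    # Hypothetical stand-in: one integer token per UTF-8 byte.
    return list(text.encode("utf-8"))


def toy_detokenize(tokens: list[int]) -> str:
    # errors="ignore" drops a trailing partial character if the cut
    # lands in the middle of a multi-byte sequence.
    return bytes(tokens).decode("utf-8", errors="ignore")


def toy_slice_tokens(text: str, n_tokens: int) -> str:
    # Tokenize, truncate to the first n_tokens tokens, then decode.
    tokens = toy_tokenize(text)
    return toy_detokenize(tokens[:n_tokens])


sliced = toy_slice_tokens("The quick brown fox", 9)
unchanged = toy_slice_tokens("short", 100)
print(repr(sliced), repr(unchanged))
```

Under this reading, a text that already has `n_tokens` tokens or fewer comes back unchanged, and truncation happens at token boundaries rather than at word or character boundaries.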

tokenize

Tokenizes the given text using the specified model.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| text | str | The text to tokenize. | required |
| model | str | The model to use for tokenization. If not provided, the default model is used. | None |

Returns:

| Type | Description |
|------|-------------|
| list[int] | The tokenized text as a list of integers. |