Generate-Sequences¶
generate-sequences is a package created to generate text from auto-regressive pytorch-based models without tears. You can think of it as the huggingface generation mixin, but for a pytorch model you built from scratch. There is no need to plug your model into the huggingface ecosystem to generate from it. The package features greedy generation as well as beam search generation. Many sampling techniques are also supported.
Installation¶
generate-sequences can be installed with pip as follows:
pip install -U generate-sequences
encoder-decoder architectures¶
In the encoder-decoder architecture, the typical use-case is that the model receives the encoder inputs first. These inputs are passed as a batch to the encoder in order to generate tokens from the decoder. The decoder gets decoder_start_token_id as its first token and then keeps generating until it produces eos_token_id, which indicates the model is done generating for that sequence. It will, however, continue generating for the other sequences in the batch until all of them have reached eos_token_id, at which point generation stops.
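To make that flow concrete, here is a minimal sketch of what a greedy decoding loop does conceptually. It is an illustration only, not the package's actual implementation, and it assumes the logits have shape (batch, sequence length, vocabulary size):

import torch

def conceptual_greedy_loop(generation_forward, encoder_inputs,
                           decoder_start_token_id, eos_token_id, max_length):
    batch_size = encoder_inputs.size(0)
    # every sequence starts with the decoder start token
    decoder_inputs = torch.full((batch_size, 1), decoder_start_token_id)
    finished = torch.zeros(batch_size, dtype=torch.bool)
    for _ in range(max_length):
        logits = generation_forward(encoder_inputs, decoder_inputs)
        # greedy: pick the most likely next token for each sequence
        next_tokens = logits[:, -1, :].argmax(dim=-1)
        decoder_inputs = torch.cat([decoder_inputs, next_tokens.unsqueeze(1)], dim=1)
        finished |= next_tokens == eos_token_id
        if finished.all():  # stop once every sequence in the batch produced eos
            break
    return decoder_inputs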
You can generate from an encoder-decoder architecture using the greedy approach as follows. The same also applies to beam search generation.
First, prepare your encoder inputs:
# usually the sentences are enclosed with bos and eos tokens.
encoder_sentences = [
    '<bos> sentence 1 <eos>',
    '<bos> sentence 2 <eos>',
    '<bos> sentence 3 <eos>',
    ...
]
# You can also handle the <bos> and <eos> in the tokenizer if your tokenizer supports that
encoder_inputs = tokenizer.tokenize(encoder_sentences)
Then, you need to tell the package how to get the logits from your model at each time step while generating. That is, you may define a method that takes the encoder and decoder inputs and returns the logits produced by your model. Usually, you will use the forward method of your model to get the logits, so the recommended name for this method is generation_forward, but you can name it literally anything. This method can be as simple as follows:
model = MyModel(...)

def generation_forward(encoder_inputs, decoder_inputs):
    # do something when receiving the decoder inputs at each time step
    logits = model(encoder_inputs, decoder_inputs)
    return logits
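Because generation_forward is called at every time step, you may want to avoid re-running the encoder each time. The sketch below caches the encoder outputs per input batch; it assumes your model exposes separate encode and decode methods (hypothetical names, not part of the generate-sequences API) and that the generator passes the same encoder batch object at every step:

model = MyModel(...)
_encoder_cache = {}

def generation_forward(encoder_inputs, decoder_inputs):
    # hypothetical sketch: encode()/decode() are assumed methods on your own model
    key = id(encoder_inputs)
    if key not in _encoder_cache:
        _encoder_cache[key] = model.encode(encoder_inputs)
    logits = model.decode(_encoder_cache[key], decoder_inputs)
    return logits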
Then, define the generator as follows, whether for greedy or beam search generation:
from generate_sequences import GreedyGenerator, BeamSearchGenerator

generator = GreedyGenerator(
    device=model.device,  # make sure to use the same device as your model
    batch_size=32,  # number of samples to process at each time step
    max_length=512,  # output max length
    generation_forward=generation_forward,
    eos_token_id=1,  # replace this with your own
    decoder_start_token_id=0,  # replace this with your own
)
Then generate:
generator.generate(encoder_inputs=encoder_inputs)
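The call returns the generated token ids (see return_logits below for the variant that also returns logits). How you turn them back into text depends on your tokenizer; the decode method used here is an assumption about your own tokenizer, not part of generate-sequences:

output_ids = generator.generate(encoder_inputs=encoder_inputs)
# tokenizer.decode is assumed to exist on your own tokenizer
generated_texts = [tokenizer.decode(ids) for ids in output_ids]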
Here is the full code in one chunk:
from generate_sequences import GreedyGenerator, BeamSearchGenerator

# usually the sentences are enclosed with bos and eos tokens.
encoder_sentences = [
    '<bos> sentence 1 <eos>',
    '<bos> sentence 2 <eos>',
    '<bos> sentence 3 <eos>',
    ...
]
# You can also handle the <bos> and <eos> in the tokenizer if your tokenizer supports that
encoder_inputs = tokenizer.tokenize(encoder_sentences)

model = MyModel(...)

def generation_forward(encoder_inputs, decoder_inputs):
    # do something when receiving the decoder inputs at each time step
    logits = model(encoder_inputs, decoder_inputs)
    return logits

generator = GreedyGenerator(
    device=model.device,  # make sure to use the same device as your model
    batch_size=32,
    max_length=512,
    generation_forward=generation_forward,
    eos_token_id=1,  # replace this with your own
    decoder_start_token_id=0,  # replace this with your own
)

# generate
generator.generate(encoder_inputs=encoder_inputs)
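To use beam search instead, swap GreedyGenerator for the BeamSearchGenerator imported above. The sketch below assumes it accepts the same constructor arguments; the beam-width argument name in the comment is a guess, so check the package for the actual parameter:

beam_search_generator = BeamSearchGenerator(
    device=model.device,
    batch_size=32,
    max_length=512,
    generation_forward=generation_forward,
    eos_token_id=1,  # replace this with your own
    decoder_start_token_id=0,  # replace this with your own
    # beam_width=4,  # hypothetical name; not the confirmed API
)
beam_search_generator.generate(encoder_inputs=encoder_inputs)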
decoder-only architectures¶
In the decoder-only architecture, the typical use-case is that the model receives the decoder inputs at each time step in order to generate the next tokens. If you want to generate sentences from scratch, you can prompt the decoder with decoder_start_token_id and the package will continue generating until it reaches eos_token_id. Here is an example:
sentences = [
    '<bos> sentence 1',  # the <eos> token is not expected to be passed!
    '<bos>',  # you can also pass the bos only.
]
# You can also handle the <bos> in the tokenizer if your tokenizer supports that
decoder_inputs = tokenizer.tokenize(sentences)
As in the encoder-decoder architecture, write your generation method as follows. Note that encoder_inputs are still passed, but you do not need to do anything with them.
model = MyModel(...)

def generation_forward(encoder_inputs, decoder_inputs):
    # do something when receiving the decoder inputs at each time step
    logits = model(decoder_inputs)
    return logits
Define your generator:
from generate_sequences import GreedyGenerator, BeamSearchGenerator

generator = GreedyGenerator(
    device=model.device,  # make sure to use the same device as your model
    batch_size=32,  # number of samples to process at each time step
    max_length=512,  # output max length
    generation_forward=generation_forward,
    eos_token_id=0,  # replace this with your own
)
Then generate:
generator.generate(decoder_inputs=decoder_inputs)
If the inputs are a set of sentences rather than just <bos>, and batch_size is greater than 1, the inputs are required to be of the same shape. Pass your padding token to pad_decoder_inputs in the generate method. You can also set the padding side to right, but THIS IS NOT a standard practice in this situation unless you know what you are doing. The typical padding side is left, which is the default value of the decoder_inputs_padding_size parameter.
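For example, here is a sketch of padded decoder-only generation. Whether pad_decoder_inputs expects the padding token itself or its id depends on your tokenizer setup, so the value shown is an assumption:

generator.generate(
    decoder_inputs=decoder_inputs,
    pad_decoder_inputs='<pad>',  # your padding token (assumed value)
)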
Here is the full code in one chunk:
from generate_sequences import GreedyGenerator, BeamSearchGenerator

sentences = [
    '<bos> sentence 1',  # the <eos> token is not expected to be passed!
    '<bos>',  # you can also pass the bos only.
]
# You can also handle the <bos> in the tokenizer if your tokenizer supports that
decoder_inputs = tokenizer.tokenize(sentences)

model = MyModel(...)

def generation_forward(encoder_inputs, decoder_inputs):
    # do something when receiving the decoder inputs at each time step
    logits = model(decoder_inputs)
    return logits

generator = GreedyGenerator(
    device=model.device,  # make sure to use the same device as your model
    batch_size=32,  # number of samples to process at each time step
    max_length=512,  # output max length
    generation_forward=generation_forward,
    eos_token_id=1,  # replace this with your own
    decoder_start_token_id=0,  # replace this with your own
)

# generate
generator.generate(decoder_inputs=decoder_inputs)
Additional parameters¶
Below are some useful parameters that can be passed to the generator. These parameters can be used regardless of the generation method.
Sampling¶
generate-sequences supports various sampling methods. To get an idea of sampling, I strongly advise reading this great article (https://huggingface.co/blog/how-to-generate) for more details on generation methods. Consider going over the following points for an overview (a short example follows the list):

- Setting multinomial_sampling=True will generate tokens based on the multinomial distribution instead of the default greedy approach.
- You can play with the temperature by passing a value between 0 and 1.
- top_k_sampling and top_p_sampling are also supported.
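For instance, here is a sketch of a generator that samples instead of always picking the most likely token. The sampling values are arbitrary examples, and the remaining arguments are the same as in the sections above:

from generate_sequences import GreedyGenerator

generator = GreedyGenerator(
    device=model.device,
    batch_size=32,
    max_length=512,
    generation_forward=generation_forward,
    eos_token_id=1,  # replace this with your own
    decoder_start_token_id=0,  # replace this with your own
    multinomial_sampling=True,  # sample from the distribution instead of taking the argmax
    temperature=0.7,  # a value between 0 and 1
    top_k_sampling=50,  # example value
    top_p_sampling=0.95,  # example value
)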
sort_inputs_by_size¶
Usually, inputs come in various lengths. This is inefficient because the padding always has to match the largest sample in the batch. If the samples are sorted by size, the largest samples come first: the early batches take more time but use the padding effectively, and generation gets faster as it paces over the later batches. This parameter is True by default. Usually, you do not want to set it to False unless you know what you are doing.
return_logits¶
By setting this parameter to True in the generator, two lists will be returned. The first list contains the output ids. The second list is a list of tuples, where each tuple is an output token id along with its logit value. Logits are useful for many use-cases, such as calculating perplexity.
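A minimal sketch of what this looks like, assuming the two lists are returned in the order described above and with the same constructor arguments as in the earlier sections:

from generate_sequences import GreedyGenerator

generator = GreedyGenerator(
    device=model.device,
    batch_size=32,
    max_length=512,
    generation_forward=generation_forward,
    eos_token_id=1,  # replace this with your own
    decoder_start_token_id=0,  # replace this with your own
    return_logits=True,
)

output_ids, output_logits = generator.generate(encoder_inputs=encoder_inputs)
# output_ids: the generated token ids for each sequence
# output_logits: for each sequence, (token_id, logit) tuples as described above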