Generate-Sequences¶
generate-sequences is a package created to generate text from auto-regressive pytorch-based models without tears. You can think of it as the huggingface generation mixin, but for a pytorch model you built from scratch. There is no need to plug your model into the huggingface ecosystem to generate from it. The package features greedy generation as well as beam search generation. Many sampling techniques are also supported.
Installation¶
generate-sequences can be installed with pip as follows:
pip install -U generate-sequences
encoder-decoder architectures¶
In the encoder-decoder architecture, the typical use-case is that the model receives the encoder inputs first. These inputs are passed as a batch to the encoder in order to generate tokens from the decoder. The decoder gets decoder_start_token_id as its first token and then keeps generating until it produces eos_token_id, which indicates the model is done generating for that sequence. It will, however, continue generating for the other sequences in the batch until all of them have reached eos_token_id, at which point generation stops.
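To make that flow concrete, here is a minimal sketch of what a greedy decoding loop does conceptually. It is an illustration only, not the package's actual implementation, and it assumes the logits have shape (batch, sequence length, vocabulary size):

import torch

def conceptual_greedy_loop(generation_forward, encoder_inputs,
                           decoder_start_token_id, eos_token_id, max_length):
    batch_size = encoder_inputs.size(0)
    # every sequence starts with the decoder start token
    decoder_inputs = torch.full((batch_size, 1), decoder_start_token_id)
    finished = torch.zeros(batch_size, dtype=torch.bool)
    for _ in range(max_length):
        logits = generation_forward(encoder_inputs, decoder_inputs)
        # greedy: pick the most likely next token for each sequence
        next_tokens = logits[:, -1, :].argmax(dim=-1)
        decoder_inputs = torch.cat([decoder_inputs, next_tokens.unsqueeze(1)], dim=1)
        finished |= next_tokens == eos_token_id
        if finished.all():  # stop once every sequence in the batch produced eos
            break
    return decoder_inputs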
You can generate from an encoder-decoder architecture using the greedy approach as follows. The same also applies to beam search generation.
First, prepare your encoder inputs:
# usually the sentences are enclosed with bos and eos tokens.
encoder_sentences = [
    '<bos> sentence 1 <eos>',
    '<bos> sentence 2 <eos>',
    '<bos> sentence 3 <eos>',
    ...
]
# You can also handle the <bos> and <eos> in the tokenizer if your tokenizer supports that
encoder_inputs = tokenizer.tokenize(encoder_sentences)
Then, you need to tell the package how to get the logits from your model at each time step while generating. That is, you may define a method that takes the encoder and decoder inputs and returns the logits produced by your model. Usually, you will use the forward method of your model to get the logits, so the recommended name for this method is generation_forward, but you can name it literally anything. This method can be as simple as follows:
model = MyModel(...)

def generation_forward(encoder_inputs, decoder_inputs):
    # do something when receiving the decoder inputs at each time step
    logits = model(encoder_inputs, decoder_inputs)
    return logits
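Because generation_forward is called at every time step, you may want to avoid re-running the encoder each time. The sketch below caches the encoder outputs per input batch; it assumes your model exposes separate encode and decode methods (hypothetical names, not part of the generate-sequences API) and that the generator passes the same encoder batch object at every step:

model = MyModel(...)
_encoder_cache = {}

def generation_forward(encoder_inputs, decoder_inputs):
    # hypothetical sketch: encode()/decode() are assumed methods on your own model
    key = id(encoder_inputs)
    if key not in _encoder_cache:
        _encoder_cache[key] = model.encode(encoder_inputs)
    logits = model.decode(_encoder_cache[key], decoder_inputs)
    return logits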
Then, define the generator as follows, whether for greedy or beam search generation:
from generate_sequences import GreedyGenerator, BeamSearchGenerator

generator = GreedyGenerator(
    device=model.device,  # make sure to use the same device as your model
    batch_size=32,  # number of samples to process at each time step
    max_length=512,  # output max length
    generation_forward=generation_forward,
    eos_token_id=1,  # replace this with your own
    decoder_start_token_id=0,  # replace this with your own
)
Then generate:
generator.generate(encoder_inputs=encoder_inputs)
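The call returns the generated token ids (see return_logits below for the variant that also returns logits). How you turn them back into text depends on your tokenizer; the decode method used here is an assumption about your own tokenizer, not part of generate-sequences:

output_ids = generator.generate(encoder_inputs=encoder_inputs)
# tokenizer.decode is assumed to exist on your own tokenizer
generated_texts = [tokenizer.decode(ids) for ids in output_ids]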
Here is the full code in one chunk:
from generate_sequences import GreedyGenerator, BeamSearchGenerator

# usually the sentences are enclosed with bos and eos tokens.
encoder_sentences = [
    '<bos> sentence 1 <eos>',
    '<bos> sentence 2 <eos>',
    '<bos> sentence 3 <eos>',
    ...
]
# You can also handle the <bos> and <eos> in the tokenizer if your tokenizer supports that
encoder_inputs = tokenizer.tokenize(encoder_sentences)

model = MyModel(...)

def generation_forward(encoder_inputs, decoder_inputs):
    # do something when receiving the decoder inputs at each time step
    logits = model(encoder_inputs, decoder_inputs)
    return logits

generator = GreedyGenerator(
    device=model.device,  # make sure to use the same device as your model
    batch_size=32,
    max_length=512,
    generation_forward=generation_forward,
    eos_token_id=1,  # replace this with your own
    decoder_start_token_id=0,  # replace this with your own
)

# generate
generator.generate(encoder_inputs=encoder_inputs)
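To use beam search instead, swap GreedyGenerator for the BeamSearchGenerator imported above. The sketch below assumes it accepts the same constructor arguments; the beam-width argument name in the comment is a guess, so check the package for the actual parameter:

beam_search_generator = BeamSearchGenerator(
    device=model.device,
    batch_size=32,
    max_length=512,
    generation_forward=generation_forward,
    eos_token_id=1,  # replace this with your own
    decoder_start_token_id=0,  # replace this with your own
    # beam_width=4,  # hypothetical name; not the confirmed API
)
beam_search_generator.generate(encoder_inputs=encoder_inputs)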
decoder-only architectures¶
In the decoder-only architecture, the typical use-case is that the model receives the decoder inputs at each time step in order to generate the next tokens. If you want to generate sentences from scratch, you can prompt the decoder with decoder_start_token_id and the package will continue generating until it reaches eos_token_id. Here is an example:
sentences = [
    '<bos> sentence 1',  # the <eos> token is not expected to be passed!
    '<bos>',  # you can also pass the bos only.
]
# You can also handle the <bos> in the tokenizer if your tokenizer supports that
decoder_inputs = tokenizer.tokenize(sentences)
As in the encoder-decoder architecture, write your generation method as follows. Note that encoder_inputs are still passed, but you do not need to do anything with them.
model = MyModel(...)

def generation_forward(encoder_inputs, decoder_inputs):
    # do something when receiving the decoder inputs at each time step
    logits = model(decoder_inputs)
    return logits
Define your generator:
from generate_sequences import GreedyGenerator, BeamSearchGenerator

generator = GreedyGenerator(
    device=model.device,  # make sure to use the same device as your model
    batch_size=32,  # number of samples to process at each time step
    max_length=512,  # output max length
    generation_forward=generation_forward,
    eos_token_id=0,  # replace this with your own
)
Then generate:
generator.generate(decoder_inputs=decoder_inputs)
If the inputs are a set of sentences rather than just <bos>, and batch_size is greater than 1, the inputs are required to be of the same shape. Pass your padding token to pad_decoder_inputs in the generate method. You can also set the padding side to right, but THIS IS NOT a standard practice in this situation unless you know what you are doing. The typical padding side is left, which is the default value of the decoder_inputs_padding_size parameter.
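For example, here is a sketch of padded decoder-only generation. Whether pad_decoder_inputs expects the padding token itself or its id depends on your tokenizer setup, so the value shown is an assumption:

generator.generate(
    decoder_inputs=decoder_inputs,
    pad_decoder_inputs='<pad>',  # your padding token (assumed value)
)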
Here is the full code in one chunk:
from generate_sequences import GreedyGenerator, BeamSearchGenerator

sentences = [
    '<bos> sentence 1',  # the <eos> token is not expected to be passed!
    '<bos>',  # you can also pass the bos only.
]
# You can also handle the <bos> in the tokenizer if your tokenizer supports that
decoder_inputs = tokenizer.tokenize(sentences)

model = MyModel(...)

def generation_forward(encoder_inputs, decoder_inputs):
    # do something when receiving the decoder inputs at each time step
    logits = model(decoder_inputs)
    return logits

generator = GreedyGenerator(
    device=model.device,  # make sure to use the same device as your model
    batch_size=32,  # number of samples to process at each time step
    max_length=512,  # output max length
    generation_forward=generation_forward,
    eos_token_id=1,  # replace this with your own
    decoder_start_token_id=0,  # replace this with your own
)

# generate
generator.generate(decoder_inputs=decoder_inputs)
Additional parameters¶
Below are some useful parameters that can be passed to the generator. These parameters can be used regardless of the generation method.
Sampling¶
generate-sequences supports various sampling methods. To get an idea of sampling, I strongly advise reading this great article (https://huggingface.co/blog/how-to-generate) for more details on generation methods. Consider going over the following points for an overview (a short example follows the list):

- Setting multinomial_sampling=True will generate tokens based on the multinomial distribution instead of the default greedy approach.
- You can play with the temperature by passing a value between 0 and 1.
- top_k_sampling and top_p_sampling are also supported.
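For instance, here is a sketch of a generator that samples instead of always picking the most likely token. The sampling values are arbitrary examples, and the remaining arguments are the same as in the sections above:

from generate_sequences import GreedyGenerator

generator = GreedyGenerator(
    device=model.device,
    batch_size=32,
    max_length=512,
    generation_forward=generation_forward,
    eos_token_id=1,  # replace this with your own
    decoder_start_token_id=0,  # replace this with your own
    multinomial_sampling=True,  # sample from the distribution instead of taking the argmax
    temperature=0.7,  # a value between 0 and 1
    top_k_sampling=50,  # example value
    top_p_sampling=0.95,  # example value
)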
sort_inputs_by_size¶
Usually, inputs come in various lengths. This is inefficient because the padding always has to match the largest sample in the batch. If the samples are sorted by size, the largest samples come first: the early batches take more time but use the padding effectively, and generation gets faster as it paces over the later batches. This parameter is True by default. Usually, you do not want to set it to False unless you know what you are doing.
return_logits¶
By setting this parameter to True in the generator, two lists will be returned. The first list contains the output ids. The second list is a list of tuples, where each tuple is an output token id along with its logit value. Logits are useful for many use-cases, such as calculating perplexity.
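A minimal sketch of what this looks like, assuming the two lists are returned in the order described above and with the same constructor arguments as in the earlier sections:

from generate_sequences import GreedyGenerator

generator = GreedyGenerator(
    device=model.device,
    batch_size=32,
    max_length=512,
    generation_forward=generation_forward,
    eos_token_id=1,  # replace this with your own
    decoder_start_token_id=0,  # replace this with your own
    return_logits=True,
)

output_ids, output_logits = generator.generate(encoder_inputs=encoder_inputs)
# output_ids: the generated token ids for each sequence
# output_logits: for each sequence, (token_id, logit) tuples as described above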