Decoder-Only Architecture¶
In this notebook, we run Hugging Face's .generate method as well as the generation process of generate-sequences. The model we use is a decoder-only, GPT-like architecture (GPT-2).
In [1]:
import torch
from tqdm.auto import tqdm
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from generate_sequences import GreedyGenerator
Load the Model and Tokenizer¶
In [2]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_name = "gpt2" # You can choose other variants like 'gpt2-medium', 'gpt2-large', 'gpt2-xl'
model = GPT2LMHeadModel.from_pretrained(model_name).to(device)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.decode(model.generation_config.bos_token_id)
tokenizer.padding_side = 'left'
/home/majed_alshaibani/.virtualenvs/generate-sequences/lib/python3.10/site-packages/torch/cuda/__init__.py:628: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
Preparation¶
In [3]:
# prompts to generate
input_texts = [
    "Once upon a time",
    "The quick brown fox",
    "Last night I dreamed",
    "In the heart of the city",
    "At the edge of the world",
]

MAX_LENGTH = 50
BATCH_SIZE = 2
In [4]:
def get_batches(texts, batch_size):
    """Yield successive batch_size-sized batches from texts."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]
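As a quick sanity check, the helper can be exercised directly: with the input_texts and BATCH_SIZE defined above, it yields lists of at most BATCH_SIZE prompts (expected output shown as comments).

for batch in get_batches(input_texts, BATCH_SIZE):
    print(batch)
# ['Once upon a time', 'The quick brown fox']
# ['Last night I dreamed', 'In the heart of the city']
# ['At the edge of the world']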
Generate text using the HuggingFace generate method¶
In [5]:
generated_texts = []

for batch in tqdm(get_batches(input_texts, BATCH_SIZE), desc="Generating Texts"):
    # Tokenize batch
    encoded_input = tokenizer(
        batch,
        padding=True,
        return_tensors="pt",
    ).to(device)

    # Generate text
    output = model.generate(
        input_ids=encoded_input["input_ids"],
        attention_mask=encoded_input["attention_mask"],
        max_length=MAX_LENGTH,  # Max length of the generated text
    )

    # Decode generated texts
    batch_generated_texts = [tokenizer.decode(t, skip_special_tokens=True) for t in output]
    generated_texts.extend(batch_generated_texts)

# Print all collected results
for input_text, generated_text in zip(input_texts, generated_texts):
    print(f"Input: {input_text}\nGenerated: {generated_text}\n")
Generating Texts: 0it [00:00, ?it/s]
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Input: Once upon a time
Generated: Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a

Input: The quick brown fox
Generated: The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown fox

Input: Last night I dreamed
Generated: Last night I dreamed of a day when I could go to the beach and swim with my friends. I was so excited to be back in the ocean. I was so excited to be back in the ocean. I was so excited to be

Input: In the heart of the city
Generated: In the heart of the city, the city of San Francisco is a city of people. It's a place where people come together to celebrate, to celebrate, to celebrate. It's a place where people come together to celebrate, to celebrate, to

Input: At the edge of the world
Generated: At the edge of the world, the world is a place of great beauty. The world is a place of great fear. The world is a place of great fear. The world is a place of great fear. The world is a place of great
Generate with sampling¶
In [6]:
generated_texts = []

for batch in tqdm(get_batches(input_texts, BATCH_SIZE), desc="Generating Texts"):
    # Tokenize batch
    encoded_input = tokenizer(batch, return_tensors="pt", padding=True).to(device)

    # Generate text
    output = model.generate(
        input_ids=encoded_input["input_ids"],
        attention_mask=encoded_input["attention_mask"],
        max_length=MAX_LENGTH,  # Max length of the generated text
        top_k=50,  # Limits the sampling pool to the top_k tokens
        top_p=0.95,  # Nucleus sampling: sample only from top_p probability mass
        temperature=0.7,  # Sampling temperature: lower value -> more conservative, higher value -> more random
        do_sample=True,  # Enable sampling
    )

    # Decode generated texts
    batch_generated_texts = [tokenizer.decode(t, skip_special_tokens=True) for t in output]
    generated_texts.extend(batch_generated_texts)

# Print all collected results
for input_text, generated_text in zip(input_texts, generated_texts):
    print(f"Input: {input_text}\nGenerated: {generated_text}\n")
Generating Texts: 0it [00:00, ?it/s]
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Input: Once upon a time
Generated: Once upon a time, you might have heard about the "Halloween Horror" phenomenon. The Halloween Horror is a Halloween film that was made for a Halloween convention in New York City and screened at the 2015 Halloween Horror Film Festival. The film had a

Input: The quick brown fox
Generated: The quick brown fox is a bit more challenging as the fox can't even be seen unless you're looking closely. The fox also has a tendency to go straight at you, and it's more difficult to get your eyes on the fox if you're

Input: Last night I dreamed
Generated: Last night I dreamed about being the first person to actually see what it was like to be in a place like this. It was amazing. I was so honored to be able to be a part of it. I really feel like I'm

Input: In the heart of the city
Generated: In the heart of the city is the Church of the Holy Trinity. The Trinity is the living God and Father of all things, the Savior of the world, the Creator and Ruler of all things. In the Bible, God is the Father

Input: At the edge of the world
Generated: At the edge of the world, the men of my village would come to my tent to meet me at the door. I said nothing. The men of my village were the men of the city. I said nothing.
Generate with generate-sequences, greedy generation¶
In [7]:
def generation_forward(encoder_inputs, decoder_inputs):
    return model(input_ids=decoder_inputs).logits
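The GreedyGenerator receives this callable and is expected to invoke it at every decoding step with the tokens generated so far, picking the highest-probability next token from the last position's logits. As a rough illustration only (not the library's internals), one greedy step with this callable, reusing the model, tokenizer, and device loaded above, looks like:

with torch.no_grad():
    decoder_inputs = tokenizer(["Once upon a time"], return_tensors="pt")["input_ids"].to(device)
    logits = generation_forward(None, decoder_inputs)  # shape: (batch, seq_len, vocab_size)
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # most likely next token
    decoder_inputs = torch.cat([decoder_inputs, next_token], dim=-1)  # append and repeat until EOS or MAX_LENGTH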
In [8]:
gpt2_greedy_generator = GreedyGenerator(
    use_tqdm=True,
    batch_size=BATCH_SIZE,
    max_length=MAX_LENGTH,
    device=model.device,
    generation_forward=generation_forward,
    eos_token_id=model.generation_config.eos_token_id,
    decoder_start_token_id=model.generation_config.decoder_start_token_id,
)
In [9]:
generated_texts = []

for batch in get_batches(input_texts, BATCH_SIZE):
    # Tokenize batch
    encoded_input = tokenizer(batch, return_tensors="pt", padding=True).to(device)

    # Generate text
    output = gpt2_greedy_generator.generate(
        encoder_inputs=None,
        decoder_inputs=encoded_input["input_ids"],
        pad_decoder_inputs=tokenizer.bos_token_id,
    )

    # Decode generated texts
    batch_generated_texts = [tokenizer.decode(t, skip_special_tokens=True) for t in output]
    generated_texts.extend(batch_generated_texts)

# Print all collected results
for input_text, generated_text in zip(input_texts, generated_texts):
    print(f"Input: {input_text}\nGenerated: {generated_text}\n")
Generating Sequences: 0%| | 0/1 [00:00<?, ?it/s]
Generating Sequences: 0%| | 0/1 [00:00<?, ?it/s]
Generating Sequences: 0it [00:00, ?it/s]
Input: Once upon a time
Generated: Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a

Input: The quick brown fox
Generated: The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown fox

Input: Last night I dreamed
Generated: Last night I dreamed of a day when I could go to the beach and swim with my friends. I was so excited to see the ocean, the waves, the waves. I was so excited to see the ocean, the waves, the

Input: In the heart of the city
Generated: In the heart of the city, the city of San Francisco is a city of people. It's a place where people come together to celebrate, to celebrate, to celebrate. It's a place where people come together to celebrate, to celebrate, to

Input: At the edge of the world
Generated: At the edge of the world, the world is a place of great beauty. The world is a place of great fear. The world is a place of great fear. The world is a place of great fear. The world is a place of great
Generate with generate-sequences, greedy with sampling¶
In [10]:
def generation_forward(encoder_inputs, decoder_inputs):
    return model(input_ids=decoder_inputs).logits
In [11]:
gpt2_greedy_generator = GreedyGenerator(
    use_tqdm=True,
    top_k_sampling=50,
    top_p_sampling=0.95,
    device=model.device,
    batch_size=BATCH_SIZE,
    max_length=MAX_LENGTH,
    multinomial_sampling=True,
    generation_forward=generation_forward,
    eos_token_id=model.generation_config.eos_token_id,
    decoder_start_token_id=model.generation_config.decoder_start_token_id,
)
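Here top_k_sampling, top_p_sampling, and multinomial_sampling mirror the Hugging Face sampling arguments used earlier, replacing the greedy argmax with sampling from a filtered distribution. Conceptually (illustration only, not the library's internals), a single top-k sampled step with the same forward callable looks like the sketch below; top_p_sampling would further restrict the pool to the smallest set of tokens whose cumulative probability exceeds 0.95:

with torch.no_grad():
    decoder_inputs = tokenizer(["Once upon a time"], return_tensors="pt")["input_ids"].to(device)
    next_logits = generation_forward(None, decoder_inputs)[:, -1, :]  # logits for the next token
    top_logits, top_indices = next_logits.topk(50, dim=-1)  # keep the 50 most likely tokens
    probs = torch.softmax(top_logits, dim=-1)  # renormalize over the filtered pool
    sampled = top_indices.gather(-1, torch.multinomial(probs, num_samples=1))  # draw one token
    decoder_inputs = torch.cat([decoder_inputs, sampled], dim=-1)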
In [12]:
generated_texts = []

for batch in get_batches(input_texts, BATCH_SIZE):
    # Tokenize batch
    encoded_input = tokenizer(batch, return_tensors="pt", padding=True).to(device)

    # Generate text
    output = gpt2_greedy_generator.generate(
        encoder_inputs=None,
        decoder_inputs=encoded_input["input_ids"],
        pad_decoder_inputs=tokenizer.bos_token_id,
    )

    # Decode generated texts
    batch_generated_texts = [tokenizer.decode(t, skip_special_tokens=True) for t in output]
    generated_texts.extend(batch_generated_texts)

# Print all collected results
for input_text, generated_text in zip(input_texts, generated_texts):
    print(f"Input: {input_text}\nGenerated: {generated_text}\n")
Generating Sequences: 0%| | 0/1 [00:00<?, ?it/s]
Generating Sequences: 0%| | 0/1 [00:00<?, ?it/s]
Generating Sequences: 0it [00:00, ?it/s]
Input: Once upon a time
Generated: Once upon a time when every thought and emotion of the human mind was to be consumed by the same thought and emotion, we are confronted with a false and utterly ungrateful reality. Our ignorance is the only thing that can bring about the correct mental

Input: The quick brown fox
Generated: The quick brown fox out of the corner of my eye and I realised I'd found my spot on this list for the best price. My sister and I, our only child, just had started school in the summer of 2015 so we'd not been

Input: Last night I dreamed
Generated: Last night I dreamed about how beautiful and beautiful this summer's beautiful people were. The day after the premiere of my new book 'The End of History', I was having dinner in the park at the time of the premiere at the

Input: In the heart of the city
Generated: In the heart of the city, two-thirds of Chicago's schools don't have a superintendent. While the city offers some flexibility in terms of whether or not district leaders can appoint schools superintendent, the mayor says he is not making public education

Input: At the edge of the world
Generated: At the edge of the world, he was the first person on Earth who took on more energy. His heart didn't want anything to do with it. His body was empty as his body had been built in the beginning. It's because he has