Chapter 2: Position Embeddings #1006

Stampede · 2026-04-11T00:27:37Z

Thanks for the book. I am slowly making my way through it. I'm trying to make sure that I understand each part before moving on, and I am getting stuck towards the end of chapter 2.

Code block 1

import torch

output_dim = 256  # inferred from the torch.Size([4, 256]) output shown below
context_length = max_length = 4
pos_embedding_layer = torch.nn.Embedding(context_length, output_dim)

# print the embedding layer weights to see what they look like
print(pos_embedding_layer.weight)

(output below)

Code block 2

pos_embeddings = pos_embedding_layer(torch.arange(max_length))
print(pos_embeddings.shape)

# print the embeddings to see what they look like
print(pos_embeddings)

Output of code block 2:

torch.Size([4, 256])
tensor([[ 1.7375, -0.5620, -0.6303,  ..., -0.2277,  1.5748,  1.0345],
        [ 1.6423, -0.7201,  0.2062,  ...,  0.4118,  0.1498, -0.4628],
        [-0.4651, -0.7757,  0.5806,  ...,  1.4335, -0.4963,  0.8579],
        [-0.6754, -0.4628,  1.4323,  ...,  0.8139, -0.7088,  0.4827]],
       grad_fn=<EmbeddingBackward0>)

Output of code block 1:

Parameter containing:
tensor([[ 1.7375, -0.5620, -0.6303,  ..., -0.2277,  1.5748,  1.0345],
        [ 1.6423, -0.7201,  0.2062,  ...,  0.4118,  0.1498, -0.4628],
        [-0.4651, -0.7757,  0.5806,  ...,  1.4335, -0.4963,  0.8579],
        [-0.6754, -0.4628,  1.4323,  ...,  0.8139, -0.7088,  0.4827]],
       requires_grad=True)

So the outputs look exactly the same, except for the last part: grad_fn=... versus requires_grad=True.

Questions:

1. What is the difference between pos_embeddings and pos_embedding_layer, and why do the outputs look the same?
2. What is torch.arange(max_length) doing? I know that it produces a list (tensor?) of integers similar to Python's built-in range() function, so I guess torch.arange has something to do with setting the absolute position of each token for each context window. But I don't see that it had any effect on the tensors that were output.
I probably have more questions but maybe it will clear up when I understand the two above.

Thanks!

casinca · 2026-04-17T10:15:51Z

Hello,

Concerning 1:
pos_embedding_layer is the lookup table for the position embeddings. It stores one weight vector per position (each row is the embedding vector for that position). Because it is initialized with context_length=4, it stores 4 positions, which is enough for a sequence of at most 4 tokens.
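
To make the "lookup table" idea concrete, here is a minimal sketch. It assumes output_dim = 256 (taken from the shape printed in the question) and uses an arbitrary seed; neither is prescribed by the book's code. The out-of-range lookup at the end is only an illustration of why the table limits the sequence length.

```python
import torch

torch.manual_seed(123)  # arbitrary seed, only so repeated runs print the same values

context_length = 4
output_dim = 256  # assumption: matches torch.Size([4, 256]) from the question

pos_embedding_layer = torch.nn.Embedding(context_length, output_dim)

# The table has one row per position and one column per embedding dimension.
print(pos_embedding_layer.weight.shape)

# Valid positions are 0..context_length-1. Position 4 has no row, so the lookup fails.
raised = False
try:
    pos_embedding_layer(torch.tensor([context_length]))
except (IndexError, RuntimeError) as err:  # exact exception type may vary by device
    raised = True
    print("lookup failed:", err)
```

In other words, context_length fixes how many rows exist, so it also fixes the longest sequence the layer can provide positions for.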

pos_embeddings are the position embeddings retrieved from pos_embedding_layer.
This can be a bit confusing here: torch.arange(context_length) produces 0, 1, 2, 3, i.e. all 4 positions, so the lookup retrieves every row. If you had passed torch.arange(2) instead, it would retrieve only the first 2 positions. In the example, it would return something like:

tensor([[ 1.7375, -0.5620, -0.6303,  ..., -0.2277,  1.5748,  1.0345],
        [ 1.6423, -0.7201,  0.2062,  ...,  0.4118,  0.1498, -0.4628]],
       grad_fn=<EmbeddingBackward0>)

The output is the same because, in this example, the two are effectively the same thing. With .weight you look directly at all 4 position embeddings stored in pos_embedding_layer.weight, and with the pos_embeddings tensor you are asking for rows 0, 1, 2, 3 (by indexing with torch.arange(max_length)), which is also every row in this case, hence the same output.
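
A quick way to check this yourself is sketched below, again assuming output_dim = 256. It also shows where the grad_fn versus requires_grad difference comes from: the weight is the trainable parameter itself, while pos_embeddings is a new tensor produced by the lookup operation.

```python
import torch

context_length = 4
output_dim = 256  # assumption: matches the shape printed in the question

pos_embedding_layer = torch.nn.Embedding(context_length, output_dim)
pos_embeddings = pos_embedding_layer(torch.arange(context_length))

# Same numbers: looking up rows 0..3 returns every row of the table, in order.
print(torch.equal(pos_embeddings, pos_embedding_layer.weight))

# Different objects: the weight is a trainable Parameter (a leaf tensor with no
# grad_fn), while the lookup result records the operation that created it.
print(type(pos_embedding_layer.weight).__name__)
print(pos_embedding_layer.weight.grad_fn)
print(pos_embeddings.grad_fn)

# Asking for fewer positions returns fewer rows.
first_two = pos_embedding_layer(torch.arange(2))
print(first_two.shape)
```

So the printed values match, but only the weight is what the optimizer would update during training; pos_embeddings is just the result of reading from it.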

Concerning 2:

torch.arange generates the indices used to look up rows in the weight matrix of pos_embedding_layer, so you get the position embeddings up to the sequence length. For example, with a sequence length of 2 (seq_len=2), torch.arange(seq_len) retrieves the position embeddings for the first 2 tokens only (like the snippet above), not for the whole max_length/context_length.
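
The effect of the indices is easier to see when they are not simply 0, 1, 2, 3. The sketch below assumes output_dim = 256; seq_len and the index tensor [3, 0, 0] are made up purely for illustration and do not appear in the book's code.

```python
import torch

context_length = 4
output_dim = 256  # assumption: matches the shape printed in the question
pos_embedding_layer = torch.nn.Embedding(context_length, output_dim)

# torch.arange(n) is just the integer tensor 0, 1, ..., n-1.
print(torch.arange(context_length))

# A shorter sequence only needs the first seq_len rows of the table.
seq_len = 2  # hypothetical shorter sequence
short = pos_embedding_layer(torch.arange(seq_len))
print(torch.equal(short, pos_embedding_layer.weight[:seq_len]))

# Arbitrary indices select (and may repeat) rows in the order given.
idx = torch.tensor([3, 0, 0])  # illustrative only
picked = pos_embedding_layer(idx)
print(picked.shape)
print(torch.equal(picked[0], pos_embedding_layer.weight[3]))
print(torch.equal(picked[1], picked[2]))
```

With torch.arange(max_length) the indices happen to select every row in its stored order, which is why the lookup appears to have no effect in the original example.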

Hope that helps clear things up a bit in the meantime/until @rasbt can add more details or explain better :D

arnavsharma7447-lang · 2026-05-12T16:31:17Z

Don't do it
