Replies: 2 comments
-
Hello,
Concerning 1: The output is the same because, in this example, the two are effectively the same thing: with .weight you look directly at all 4 position embeddings stored in the embedding layer's weight matrix.
Concerning 2: torch.arange is used for indexing into the weight matrix of the embedding layer.
Hope that helps clear things up a bit in the meantime, until @rasbt can add more details or explain better :D
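To make the two points above concrete, here is a minimal sketch. The original code blocks from the question are not shown on this page, so the values are assumptions: max_length = 4 matches the "4 position embeddings" mentioned above, and output_dim = 3 is picked only to keep the tensors small.

```python
import torch

torch.manual_seed(123)

max_length = 4  # number of positions (assumed from "4 position embeddings")
output_dim = 3  # assumed small embedding size, for readability only

pos_embedding_layer = torch.nn.Embedding(max_length, output_dim)

# Looking at the parameter directly: the full (4, 3) weight matrix.
weights = pos_embedding_layer.weight

# Calling the layer with indices 0..3 looks up rows 0..3 of that matrix,
# which here happens to be every row, in order.
pos_embeddings = pos_embedding_layer(torch.arange(max_length))

print(torch.equal(pos_embeddings, weights))  # True: identical values
print(weights.requires_grad)                 # True: a trainable leaf parameter
print(pos_embeddings.grad_fn is not None)    # True: result of a lookup operation

# With different indices, the lookup no longer returns the whole matrix.
subset = pos_embedding_layer(torch.tensor([2, 0]))
print(subset.shape)
```

The values match because torch.arange(max_length) selects every row once, in order. The printed tensors differ only in their trailing metadata: the weight is a parameter (requires_grad=True), while the lookup result records the operation that produced it (grad_fn=...).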
-
Don't do it
-
Thanks for the book. I am slowly making my way through it. I'm trying to make sure that I understand each part before moving on, and I am getting stuck towards the end of chapter 2.
Code block 1
(output below)
Code block 2
Output of code block 2:
Output of code block 1:
So the outputs look exactly the same, except for the last part, with grad_fn=... versus requires_grad=True.
Questions:
1. What is the difference between pos_embeddings and pos_embedding_layer, and why do the outputs look the same?
2. What is torch.arange(max_length) doing? I know that it produces a list (tensor?) of integers, similar to Python's built-in range() function, so I guess torch.arange has something to do with setting the absolute position of each token for each context window. But I don't see that it had any effect on the tensors that were output.
Thanks!