Weird tokens in GPT-2 token embeddings
Large language models use an embedding table of tokens to represent words or subwords. OpenAI makes its tokens available for download. However, when you inspect some of the longer tokens in the table, you can't help but wonder what they're doing there. Here are some of the weirder ones I've found among the three thousand longest tokens.
PsyNetMessage
Something to do with Rocket League, it appears.
Yiannopoulos
As in, Milo Yiannopoulos.
Timberwolves
I think the Minnesota Timberwolves. Not sure why though.
ForgeModLoader
A downloader for Minecraft mods?
externalToEVA
Extra-vehicular activity. I got a prompt when writing this that it has something to do with Kerbal Space Program.
SpaceEngineers
Maybe also something to do with Kerbal Space Program? Either that or the obvious.
TPPStreamerBot
Seems like this has something to do with Twitch Plays Pokemon. Again, Copilot prompts me with that suggestion.
TheNitromeFan
This seems to be an entire YouTube channel.
DragonMagazine
From the 70s and 80s. I think it was a magazine about Dungeons and Dragons. Copilot also confirms this.
RandomRedditor
I think this is a bot that posts random Reddit comments.
FactoryReloaded
Possibly referring to MineFactoryReloaded.
Charlottesville
Referring to the 2017 Charlottesville protests.
SolidGoldMagikarp
and GoldMagikarp
Pokemon. I don't know why one is solid though.
RandomRedditorWithNo
Possibly some sort of insult? Or another bot.
Fitzpatrick
No idea, just thought this was funny because Irish.
glyphosate
A weed killer, the active ingredient in Monsanto's Roundup.
DragonBound
Could be a book or a game. No idea.
Yanukovych
Referring to Viktor Yanukovych.
Christensen
Possibly Clayton Christensen?
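If you want to go hunting for your own weird tokens, here is a rough sketch of how the longest ones could be pulled out of the vocabulary. It assumes the downloaded vocabulary is a JSON file mapping token strings to ids, and that a leading space is written as "Ġ", as in GPT-2's byte-level BPE. The function names `longest_tokens` and `load_vocab` are my own, and this is not necessarily how the list above was produced.

```python
import json
from pathlib import Path


def longest_tokens(vocab, n=3000):
    """Return the n longest token strings, longest first (ties sorted alphabetically)."""
    # Assumption: a leading space is stored as "Ġ". Strip it so that
    # " Timberwolves" and "Timberwolves" collapse into one entry.
    cleaned = {token.lstrip("Ġ") for token in vocab}
    return sorted(cleaned, key=lambda t: (-len(t), t))[:n]


def load_vocab(path):
    """Load a token -> id mapping from a JSON file (the path is up to you)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Something like `longest_tokens(load_vocab("encoder.json"), n=20)` would then give the twenty longest entries, where `encoder.json` is a placeholder for wherever the vocabulary file was saved.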