Weird tokens in the GPT-2 token embeddings

Large language models use an embeddings table of tokens to represent words or subwords. OpenAI makes their tokens available for download. However, when you inspect some of the longer tokens in the table, you can't help but wonder what they're doing there. Here are some of the weirder ones I've found among the three thousand longest tokens.
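For anyone who wants to repeat the exercise, here is a minimal sketch of how the longest tokens could be pulled out of a vocabulary. It assumes the vocabulary has been saved locally as a JSON object mapping token strings to integer ids (the GPT-2 release ships such a file as `encoder.json`). The function names `load_vocab` and `longest_tokens` are my own, not part of any library, and this is not necessarily how the list below was produced.

```python
import json
from pathlib import Path


def load_vocab(path):
    """Load a token -> id mapping from a JSON file (path is whatever you saved it as)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def longest_tokens(vocab, n=3000):
    """Return the n longest token strings, longest first.

    Ties in length are broken alphabetically so the result is deterministic.
    """
    return sorted(vocab, key=lambda tok: (-len(tok), tok))[:n]


def print_longest(path, n=20):
    """Print the length and text of the n longest tokens in the file at path."""
    for token in longest_tokens(load_vocab(path), n=n):
        print(len(token), token)
```

Calling `print_longest("encoder.json")` on a local copy would list the top twenty. Note that in the GPT-2 vocabulary file a leading space is stored as the character `Ġ`, so many tokens will show up with that prefix and count as one character longer than the visible word.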

PsyNetMessage

Something to do with Rocket League, it appears.

Yiannopoulos

As in, Milo Yiannopoulos.

Timberwolves

I think the Minnesota Timberwolves. Not sure why though.

ForgeModLoader

A loader for Minecraft mods?

externalToEVA

Extra-vehicular activity. Copilot suggested while I was writing this that it has something to do with Kerbal Space Program.

SpaceEngineers

Maybe also something to do with Kerbal Space Program? Either that or the obvious.

TPPStreamerBot

Seems like this has something to do with Twitch Plays Pokemon. Again, Copilot prompted me with that suggestion.

TheNitromeFan

This seems to be an entire YouTube channel.

DragonMagazine

From the 70s and 80s. I think it was a magazine about Dungeons and Dragons. Copilot also confirms this.

RandomRedditor

I think this is a bot that posts random Reddit comments.

FactoryReloaded

Possibly referring to MineFactoryReloaded.

Charlottesville

Referring to the 2017 Charlottesville protests.

SolidGoldMagikarp and GoldMagikarp

Pokemon. I don't know why one is solid though.

RandomRedditorWithNo

Possibly some sort of insult? Or another bot.

Fitzpatrick

No idea, just thought this was funny because Irish.

glyphosate

A weed killer sold by Monsanto.

DragonBound

Could be a book or a game. No idea.

Yanukovych

Referring to Viktor Yanukovych.

Christensen

Possibly Clayton Christensen?