LLMs
A token is the basic unit an LLM processes. Tokens are usually whole words or subword pieces such as prefixes, suffixes, and endings; for example, "reading" may split into 2 tokens: read + ing. Using subwords instead of individual letters makes models more efficient, since a single token already carries some meaning.
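The split of "reading" into read + ing can be sketched with a toy greedy longest-match tokenizer. The vocabulary below is hypothetical and hand-picked for illustration; real tokenizers (e.g. BPE) learn theirs from data.

```python
# Hypothetical subword vocabulary; a real one is learned from a corpus.
VOCAB = {"read", "ing", "un", "happi", "ness", "ed", "er", "s"}

def tokenize(word: str) -> list[str]:
    """Greedy longest-match: at each position, take the longest vocab entry."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # fall back to a single character
            i += 1
    return tokens

print(tokenize("reading"))      # ['read', 'ing']
print(tokenize("unhappiness"))  # ['un', 'happi', 'ness']
```

Each resulting token is a meaningful chunk, which is why subword tokenization is more efficient than working letter by letter.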
There are 2 main kinds of LLMs: masked language models and autoregressive language models. A masked language model is trained to fill in the blanks, i.e. to predict the value of a missing (masked) token in a sentence. It is well suited for non-generative uses such as classification, or any task that requires understanding the entire surrounding context. An autoregressive language model is trained to predict the next token given all preceding context, which makes it suited for text generation.
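The difference between the two objectives can be sketched with a toy count-based model (an illustration only; the tiny corpus is made up, and real LLMs use neural networks rather than n-gram counts). The autoregressive predictor sees only the left context; the masked predictor uses context on both sides of the blank.

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate the fish".split()

# Autoregressive objective: predict the next token from what came before
# (here simplified to just the previous token).
bigrams = Counter(zip(corpus, corpus[1:]))

def next_token(prev: str) -> str:
    candidates = {b: c for (a, b), c in bigrams.items() if a == prev}
    return max(candidates, key=candidates.get)

# Masked objective: predict a missing token using BOTH its neighbours.
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))

def fill_mask(left: str, right: str) -> str:
    candidates = {m: c for (a, m, b), c in trigrams.items()
                  if a == left and b == right}
    return max(candidates, key=candidates.get)

print(next_token("the"))       # 'cat' — most frequent continuation
print(fill_mask("cat", "on"))  # 'sat' — fills "cat [MASK] on"
```

Note how the masked model resolves the blank using the token to its right as well, which the autoregressive model by construction never sees.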