Let’s chat about DeepSeek

LLMs

DeepSeek

Open weights model

Mixture of experts

Multi-head latent attention

Model distillation

Presentation and reading of DeepSeek whitepapers

Author

Peter O’Connor, Mike Gallimore

Published

February 12, 2024

Only Mike and Peter attended this event, so we went to the pub and read the whitepapers for two of DeepSeek's recently released models. In particular, Peter helped Mike understand the math on pages 7-9 of the DeepSeek-V3 technical report.

Mike also prepared a presentation on the differences between the DeepSeek models and other flagship LLMs. The presentation is available here.