Let’s chat about DeepSeek

LLMs
DeepSeek
Open weights model
Mixture of experts
Multi-head latent attention
Model distillation
Presentation and reading of DeepSeek whitepapers
Author

Peter O’Connor, Mike Gallimore

Published

February 12, 2024

Image: DeepSeek logo

Only Mike and Peter attended this event, so we went to the pub and read the whitepapers for two of DeepSeek's recently released models. In particular, Pete helped Mike understand the math on pages 7-9 of the DeepSeek-V3 technical report.

Mike also prepared a presentation explaining the differences between the DeepSeek models and other flagship LLMs. The presentation is available here.