
Semantic and Perceptual Captions
Yesterday, DeepSeek released Janus-Pro, a unified model for multimodal understanding and text-to-image generation. The abstract reminded me of a talk I gave in 2016, where I mentioned that “data, compute and algorithm” are the three driving forces of AI. This was the common understanding at the time. The tech report opens with: “In this work, we introduce Janus-Pro, an advanced version of the previous work Janus. Specifically, Janus-Pro incorporates (1) an optimized training strategy, (2) expanded training data, and (3) scaling to larger model size.”
The three driving forces remain unchallenged and still play crucial roles in AI advancement.

