RubberDuckFM #15: Princess Mononoke rages against Image Generation

Masa looked into several engineers' predictions about how GPT-4o image generation works, and we talk through them in this episode.

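Learning Image Generation with Python, Machine Learning Practice Series (Pythonで学ぶ画像生成 機械学習実践シリーズ)
Let's Put Types on Everything with dataclass (dataclass で万物に型を付けよう)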
Limitless Pendant
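The Creative Gene: The MEMEs I Loved (創作する遺伝子 僕が愛したMEMEたち)
[Talk] Impulse's Itakura: the top 10 comedians he was jealous of! He speaks candidly about the comedians who shattered the various "excuse crystals" he had been holding on to!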
Mickey 17
Bong Joon Ho
Robert Pattinson
Mickey7
try! Swift Tokyo Timetable
WWDC 2025
Apple Park
Claude 3.7 Sonnet
OpenAI Realtime API
TC39
SeattleJS
Temporal
ts-blank-space
TypeScript syntax not supported by `ts-blank-space`
Oracle justified its JavaScript trademark with Node.js—now it wants that ignored
Sun Microsystems
Oracle JavaScript Extension Toolkit
Princess Mononoke 4K IMAX
Introducing 4o Image Generation
Autoregressive model
Understanding Next Token Prediction
Sora: Creating video from text
Video generation models as world simulators
There was in fact an OpenAI office near the Bay Bridge
Golden Gate Bridge
San Francisco–Oakland Bay Bridge
Person 1: 動詞's prediction
How are the image generation capabilities of GPT-4o and Gemini-2.0 built? (GPT-4oとGemini-2.0の画像生成能力はいかにして作られているのか)
[2206.10789] Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
[2110.04627] Vector-quantized Image Modeling with Improved VQGAN
[2309.02591] Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
[2312.17172] Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
[2402.12226] AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
[2404.02905] Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
A GPT-4o generated image, May 2024
Person 2: Sangyun Lee's prediction
[2310.01400] Sequential Data Generation with Groupwise Diffusion Process
Person 3: Wh's prediction
[2406.11838] Autoregressive Image Generation without Vector Quantization
[2105.01601] MLP-Mixer: An all-MLP Architecture for Vision
Conditional probability distribution
Person 4: K.Ishi's prediction
[2408.11039] Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Person 5: Saining Xie's prediction
[2103.00020] Learning Transferable Visual Models From Natural Language Supervision
[2112.10752] High-Resolution Image Synthesis with Latent Diffusion Models
Person 6: Nayan Saxena's prediction
OpenAI image gen actually shows just 5 frames
[2005.14165] Language Models are Few-Shot Learners
4o Image Generation In-Context Learning