#15: Princess Mononoke rages against Image Generation
Masa surveyed what various engineers have guessed about how GPT-4o image generation works, and we talk through what he found.
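- Pythonで学ぶ画像生成 機械学習実践シリーズ (Learning Image Generation with Python, Machine Learning in Practice series)
- dataclass で万物に型を付けよう (Let's give everything a type with dataclass)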
- Limitless Pendant
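- 創作する遺伝子 僕が愛したMEMEたち (The Creative Gene)
- 【トーク】インパルス板倉 嫉妬した芸人ベスト10!板倉が抱えていた様々な「言い訳クリスタル」を粉砕した芸人たちを本音で話す! ([Talk] Itakura of Impulse: the top 10 comedians he was jealous of! He speaks candidly about the comedians who shattered the various "excuse crystals" he had been holding on to!)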
- Mickey 17
- Bong Joon Ho
- Robert Pattinson
- Mickey7
- try! Swift Tokyo Timetable
- WWDC 2025
- Apple Park
- Claude 3.7 Sonnet
- OpenAI Realtime API
- TC39
- SeattleJS
- Temporal
- ts-blank-space
- TypeScript syntax not supported by `ts-blank-space`
- Oracle justified its JavaScript trademark with Node.js—now it wants that ignored
- Sun Microsystems
- Oracle JavaScript Extension Toolkit
- Princess Mononoke 4K IMAX
- Introducing 4o Image Generation
- Autoregressive model
- Understanding Next Token Prediction
- Sora: Creating video from text
- Video generation models as world simulators
- There was indeed an OpenAI office near the Bay Bridge
- Golden Gate Bridge
- San Francisco–Oakland Bay Bridge
- Guess 1: 動詞
- GPT-4oとGemini-2.0の画像生成能力はいかにして作られているのか (How are the image generation capabilities of GPT-4o and Gemini-2.0 built?)
- [2206.10789] Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
- [2110.04627] Vector-quantized Image Modeling with Improved VQGAN
- [2309.02591] Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
- [2312.17172] Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
- [2402.12226] AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
- [2404.02905] Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
- A GPT-4o generated image, May 2024
- Guess 2: Sangyun Lee
- [2310.01400] Sequential Data Generation with Groupwise Diffusion Process
- Guess 3: Wh
- [2406.11838] Autoregressive Image Generation without Vector Quantization
- [2105.01601] MLP-Mixer: An all-MLP Architecture for Vision
- Conditional probability distribution
- Guess 4: K.Ishi
- [2408.11039] Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
- Guess 5: Saining Xie
- [2103.00020] Learning Transferable Visual Models From Natural Language Supervision
- [2112.10752] High-Resolution Image Synthesis with Latent Diffusion Models
- Guess 6: Nayan Saxena
- OpenAI image gen actually shows just 5 frames
- [2005.14165] Language Models are Few-Shot Learners
- 4o Image Generation In-Context Learning