Kento Sasaki

I am a graduate student in Informatics at the University of Tsukuba, affiliated with the Communication Understanding Laboratory. My current research focuses on applying Vision-Language-Action (VLA) models to autonomous driving.

Work Experience

Turing Inc., Research Engineer (April 2023 - present)
Turing Inc., Internship (June 2022 - March 2023)
National Institute for Materials Science, Technical Staff (December 2021 - June 2022)
National Institute for Materials Science, Research Internship (August 2021 - September 2021)

Education

Master of Science in Informatics, University of Tsukuba (2023 - present)
Bachelor of Arts in Library and Information Science, University of Tsukuba (2021 - 2023)
Associate Degree in Electronic Control System Engineering, National Institute of Technology (KOSEN), Numazu College (2015 - 2020)

Publications

*Equal contribution.

Journals

Kento Sasaki, Yohei Seki. Exploration of Commentary Generation Methods Considering the Components of Shogi Commentary Texts, DBSJ Journal Data-Driven Studies, Vol. 2, Article No. 3, 2024.

International Conferences

Keishi Ishihara*, Kento Sasaki*, Tsubasa Takahashi, Daiki Shiono, Yu Yamaguchi. STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes, In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Oral, January 2026.
Hidehisa Arai*, Keita Miwa*, Kento Sasaki*, Yu Yamaguchi, Kohei Watanabe, Shunsuke Aoki, Issei Yamamoto. CoVLA: Comprehensive Vision-Language Action Dataset for Autonomous Driving, In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Oral, pp. 1933-1943, February 2025. [arXiv] [project page]

Workshops

Shingo Yokoi, Kento Sasaki, Yu Yamaguchi. Hierarchical Reasoning with Vision-Language Models for Incident Reports from Dashcam Videos, International Conference on Computer Vision (ICCV), 2nd Workshop on the Challenge Of Out Of Label Hazards In Autonomous Driving (short paper), October 2025.
Kento Sasaki*, Keishi Ishihara*, Tsubasa Takahashi, Yu Yamaguchi. STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes, International Conference on Computer Vision (ICCV), The 1st End-to-End 3D Learning Workshop (short paper), October 2025.
Keita Miwa, Kento Sasaki, Hidehisa Arai, Tsubasa Takahashi, Yu Yamaguchi. One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression, Proceedings of the 42nd International Conference on Machine Learning (ICML), Tokenization Workshop, July 2025. [arXiv] [project page]
Yuichi Inoue*, Kento Sasaki*, Yuma Ochi, Kazuki Fujii, Kotaro Tanahashi, Yu Yamaguchi. Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), The 3rd Workshop on Computer Vision in the Wild, June 2024. [arXiv] [leaderboard]

arXiv

Tsubasa Takahashi, Shojiro Yamabe, Futa Waseda, Kento Sasaki. Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness, arXiv preprint arXiv:2510.00517.
Futa Waseda, Shojiro Yamabe, Daiki Shiono, Kento Sasaki, Tsubasa Takahashi. Read or Ignore? A Unified Benchmark for Typographic-Attack Robustness and Text Recognition in Vision-Language Models, arXiv preprint arXiv:2512.11899.

Domestic Conferences

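Keita Miwa*, Hidehisa Arai*, Kento Sasaki*, Kohei Watanabe, Yu Yamaguchi. Construction of an Integrated Language, Vision, and Action Dataset for Autonomous Driving, The 19th YANS Symposium, 2024, S5-P04.
Kento Sasaki*, Yuichi Inoue*, Kazuki Fujii, Kotaro Tanahashi, Yu Yamaguchi. Building a Japanese Vision-Language Model Using Large Language Models and a Proposed Evaluation Method, The 27th Meeting on Image Recognition and Understanding (MIRU), 2024, OS-2A-01.
Kento Sasaki, Yohei Seki. Exploration of Commentary Generation Methods Considering the Components of Shogi Commentary Texts, The 15th Forum on Data Engineering and Information Management (DEIM), 2023, 1a-7-5.
Kento Sasaki, Yohei Seki. Definition and Classification of the Components of Shogi Commentary Texts, The 18th ARG Meeting on Web Intelligence and Interaction (WI2), 2022, pp. 75-78.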
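Kento Sasaki, 山路倍弘, 橋本敬之, 北本朝展, 鈴木静男. A Preliminary Investigation of Deep-Learning-Based Character Recognition for Historical Documents from the Izu Region, Theory and Applications of GIS, 2019, Vol. 27, No. 2, p. 159(93).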

Awards and Honors

Talks & Media

🗣️ Turing AI Day 2025 (December 2025)
🗣️ CULTURAL LECTURE 2023, National Institute of Technology (KOSEN), Numazu College (October 2023)
📗 MOTOR FAN illustrated Vol. 219, Generative AI and World Models Powering the Tokyo 30 Project

Academic Service

Reviewer: WACV’25, ICCV’25 Workshop