Kento Sasaki
Kento Sasaki is a Research Engineer at Turing Inc., leading the development of Vision-Language-Action (VLA) models for autonomous driving. He is also a part-time graduate student in Informatics at the University of Tsukuba.
Work Experience
- Turing Inc., Research Engineer (April 2023 - present)
- Turing Inc., Internship (June 2022 - March 2023)
- National Institute for Materials Science, Technical Staff (December 2021 - June 2022)
- National Institute for Materials Science, Research Internship (August 2021 - September 2021)
Education
- Master of Science in Informatics, University of Tsukuba (2023 - present)
- Bachelor of Arts in Library and Information Science, University of Tsukuba (2021 - 2023)
- Associate Degree in Electronic Control System Engineering, National Institute of Technology (KOSEN), Numazu College (2015 - 2020)
Selected Publications
*Equal contribution.

STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes (Oral)
Keishi Ishihara*, Kento Sasaki*, Tsubasa Takahashi, Daiki Shiono, Yu Yamaguchi
AAAI Conference on Artificial Intelligence (AAAI), January 2026
Abs / arXiv / Code / Dataset / Benchmark /
Bibtex
@article{STRIDEQA,
author = {Keishi Ishihara and Kento Sasaki and Tsubasa Takahashi and Daiki Shiono and Yu Yamaguchi},
title = {STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes},
journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
volume = {40},
number = {7},
pages = {5257--5266},
year = {2026},
month = mar,
doi = {10.1609/aaai.v40i7.37441},
url = {https://ojs.aaai.org/index.php/AAAI/article/view/37441}
}
CoVLA: Comprehensive Vision-Language Action Dataset for Autonomous Driving (Oral)
Hidehisa Arai*, Keita Miwa*, Kento Sasaki*, Yu Yamaguchi, Kohei Watanabe, Shunsuke Aoki, Issei Yamamoto
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1933-1943, February 2025
Abs / arXiv / Dataset /
Bibtex
@inproceedings{arai2025covla,
author = {Hidehisa Arai and Keita Miwa and Kento Sasaki and Yu Yamaguchi and Kohei Watanabe and Shunsuke Aoki and Issei Yamamoto},
title = {CoVLA: Comprehensive Vision-Language Action Dataset for Autonomous Driving},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
pages = {1933--1943},
year = {2025},
}
One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression
Keita Miwa, Kento Sasaki, Hidehisa Arai, Tsubasa Takahashi, Yu Yamaguchi
Proceedings of the 42nd International Conference on Machine Learning (ICML), Tokenization Workshop, July 2025
Abs / arXiv / Code / Model /
Bibtex
@inproceedings{miwa2025onedpiece,
author = {Keita Miwa and Kento Sasaki and Hidehisa Arai and Tsubasa Takahashi and Yu Yamaguchi},
title = {One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression},
booktitle = {ICML Workshop on Tokenization (TokShop)},
year = {2025},
}
Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese
Yuichi Inoue*, Kento Sasaki*, Yuma Ochi, Kazuki Fujii, Kotaro Tanahashi, Yu Yamaguchi
CVPR, The 3rd Workshop on Computer Vision in the Wild, June 2024
arXiv / Weights & Biases Public Leaderboard /
Bibtex
@inproceedings{inoue2024heron,
author = {Yuichi Inoue and Kento Sasaki and Yuma Ochi and Kazuki Fujii and Kotaro Tanahashi and Yu Yamaguchi},
title = {Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese},
booktitle = {CVPR Workshop on Computer Vision in the Wild},
year = {2024},
}
Publications
Journals
- Kento Sasaki, Tsubasa Takahashi. From Vision and Language to Action: Evolution of Multimodal AI for Autonomous Driving, The Journal of the Institute of Electronics, Information and Communication Engineers, Vol. 109, No. 5, pp. 382–387, 2026.
- Kento Sasaki, Yohei Seki. Exploration of Commentary Generation Methods Considering the Components of Shogi Commentary Texts, DBSJ Journal Data-Driven Studies, Vol. 2, Article No 3, 2024.
International Conferences
- Tsubasa Takahashi, Shojiro Yamabe, Futa Waseda, Kento Sasaki. Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness, In Proceedings of the International Conference on Learning Representations (ICLR), April 2026.
- Keishi Ishihara*, Kento Sasaki*, Tsubasa Takahashi, Daiki Shiono, Yu Yamaguchi. STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes, In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Oral, January 2026.
- Hidehisa Arai*, Keita Miwa*, Kento Sasaki*, Yu Yamaguchi, Kohei Watanabe, Shunsuke Aoki, Issei Yamamoto. CoVLA: Comprehensive Vision-Language Action Dataset for Autonomous Driving, In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Oral, pp. 1933-1943, February 2025.
Workshops
- Shingo Yokoi, Kento Sasaki, Yu Yamaguchi. Hierarchical Reasoning with Vision-Language Models for Incident Reports from Dashcam Videos, International Conference on Computer Vision (ICCV), 2nd Workshop on the Challenge Of Out Of Label Hazards In Autonomous Driving (short paper), October 2025.
- Kento Sasaki*, Keishi Ishihara*, Tsubasa Takahashi, Yu Yamaguchi. STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes, International Conference on Computer Vision (ICCV), The 1st End-to-End 3D Learning Workshop (short paper), October 2025.
- Keita Miwa, Kento Sasaki, Hidehisa Arai, Tsubasa Takahashi, Yu Yamaguchi. One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression, Proceedings of the 42nd International Conference on Machine Learning (ICML), Tokenization Workshop, July 2025.
- Yuichi Inoue*, Kento Sasaki*, Yuma Ochi, Kazuki Fujii, Kotaro Tanahashi, Yu Yamaguchi. Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), The 3rd Workshop on Computer Vision in the Wild, June 2024.
arXiv
- Futa Waseda, Shojiro Yamabe, Daiki Shiono, Kento Sasaki, Tsubasa Takahashi. Read or Ignore? A Unified Benchmark for Typographic-Attack Robustness and Text Recognition in Vision-Language Models, arXiv preprint arXiv:2512.11899.
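Domestic Conferences
- Keita Miwa*, Hidehisa Arai*, Kento Sasaki*, Kohei Watanabe, Yu Yamaguchi. Construction of an Integrated Language, Vision, and Action Dataset for Autonomous Driving (in Japanese), The 19th YANS Symposium, 2024, S5-P04.
- Kento Sasaki*, Yuichi Inoue*, Kazuki Fujii, Kotaro Tanahashi, Yu Yamaguchi. Building a Japanese Vision-Language Model with Large Language Models and a Proposed Evaluation Method (in Japanese), The 27th Meeting on Image Recognition and Understanding (MIRU), 2024, OS-2A-01.
- Kento Sasaki, Yohei Seki. Exploration of Commentary Generation Methods Considering the Components of Shogi Commentary Texts (in Japanese), The 15th Forum on Data Engineering and Information Management (DEIM), 2023, 1a-7-5.
- Kento Sasaki, Yohei Seki. Definition and Classification of the Components of Shogi Commentary Texts (in Japanese), ARG 18th Workshop on Web Intelligence and Interaction (WI2), 2022, pp. 75-78.
- Kento Sasaki, 山路倍弘, 橋本敬之, 北本朝展, 鈴木静男. A Preliminary Study of Deep-Learning-Based Character Recognition for Historical Documents from the Izu Region (in Japanese), Theory and Applications of GIS, 2019, Vol. 27, No. 2, p. 159(93).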
Dataset

Japan Open Driving Dataset
Turing Inc. (served as Project Lead)
Japan Open Driving Dataset is a large-scale autonomous driving dataset comprising over 100 hours of driving data collected in Tokyo, Japan. The data is stored in nuScenes format and can be loaded with the nuscenes-devkit.
Awards and Honors
- GENIAC (Generative AI Accelerator Challenge) Social Implementation Award
- ICCV 2025 2COOOL Competition, 2nd Place Winner
- ICML 2025 Tokenization Workshop (TokShop) Best Paper Runner-up Award
- YANS 2024 Encouragement Award
- MIRU 2024 Student Encouragement Award
- University of Tsukuba Alumni Association Ezaki Award 2023
- DEIM 2023 Excellent Interactive Award
- DEIM 2023 Sponsor Award (LayerX Inc.)
- ARG 18th Workshop on WI2 Excellent Research Award
- 28th GISA Conference Poster Session Award
- Suzuki Education & Culture Foundation Scholarship
Talks & Media
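- Nikkei Robotics, June 2026 issue: Turing's new VLM training method approaches real-image performance by rendering caption text as images (June 2026)
- TURING AI DAY 2025 (December 2025)
- MOTOR FAN illustrated Vol. 219, [Turing's AD Development] Generative AI and world models are the key to realizing the Tokyo30 project (December 2024)
- National Institute of Technology (KOSEN), Numazu College, FY2023 Cultural Lecture (October 2023)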
Academic Service
- Reviewer: WACV’25, ICCV’25 Workshop
