Kento Sasaki

Kento Sasaki is a Research Engineer at Turing Inc., leading the development of Vision-Language-Action (VLA) models for autonomous driving. He is also a part-time graduate student in Informatics at the University of Tsukuba.

Work Experience

Turing Inc., Research Engineer (April 2023 - present)
Turing Inc., Internship (June 2022 - March 2023)
National Institute for Materials Science, Technical Staff (December 2021 - June 2022)
National Institute for Materials Science, Research Internship (August 2021 - September 2021)

Education

Master of Science in Informatics, University of Tsukuba (2023 - present)
Bachelor of Arts in Library and Information Science, University of Tsukuba (2021 - 2023)
Associate Degree in Electronic Control System Engineering, National Institute of Technology (KOSEN), Numazu College (2015 - 2020)

Selected Publications

*Equal contribution.

STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes (Oral)
Keishi Ishihara*, Kento Sasaki*, Tsubasa Takahashi, Daiki Shiono, Yu Yamaguchi
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 40(7), pp. 5257-5266, 2026

Abs / arXiv / Code / Dataset / Benchmark

Bibtex

@article{STRIDEQA,
    author = {Keishi Ishihara and Kento Sasaki and Tsubasa Takahashi and Daiki Shiono and Yu Yamaguchi},
    title = {STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes},
    journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
    volume = {40},
    number = {7},
    pages = {5257--5266},
    year = {2026},
    month = mar,
    doi = {10.1609/aaai.v40i7.37441},
    url = {https://ojs.aaai.org/index.php/AAAI/article/view/37441}
}

CoVLA: Comprehensive Vision-Language Action Dataset for Autonomous Driving (Oral)
Hidehisa Arai*, Keita Miwa*, Kento Sasaki*, Yu Yamaguchi, Kohei Watanabe, Shunsuke Aoki, Issei Yamamoto
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1933-1943, February 2025

Abs / arXiv / Dataset

Bibtex

@inproceedings{arai2025covla,
    author = {Hidehisa Arai and Keita Miwa and Kento Sasaki and Yu Yamaguchi and Kohei Watanabe and Shunsuke Aoki and Issei Yamamoto},
    title = {CoVLA: Comprehensive Vision-Language Action Dataset for Autonomous Driving},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    pages = {1933--1943},
    year = {2025},
}

One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression
Keita Miwa, Kento Sasaki, Hidehisa Arai, Tsubasa Takahashi, Yu Yamaguchi
Proceedings of the 42nd International Conference on Machine Learning (ICML), Tokenization Workshop, July 2025

Abs / arXiv / Code / Model

Bibtex

@inproceedings{miwa2025onedpiece,
    author = {Keita Miwa and Kento Sasaki and Hidehisa Arai and Tsubasa Takahashi and Yu Yamaguchi},
    title = {One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression},
    booktitle = {ICML Workshop on Tokenization (TokShop)},
    year = {2025},
}

Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese
Yuichi Inoue*, Kento Sasaki*, Yuma Ochi, Kazuki Fujii, Kotaro Tanahashi, Yu Yamaguchi
CVPR, The 3rd Workshop on Computer Vision in the Wild, June 2024

arXiv / Weights & Biases Public Leaderboard

Bibtex

@inproceedings{inoue2024heron,
    author = {Yuichi Inoue and Kento Sasaki and Yuma Ochi and Kazuki Fujii and Kotaro Tanahashi and Yu Yamaguchi},
    title = {Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese},
    booktitle = {CVPR Workshop on Computer Vision in the Wild},
    year = {2024},
}

Publications

Journals

Kento Sasaki, Tsubasa Takahashi. From Vision and Language to Action: Evolution of Multimodal AI for Autonomous Driving, The Journal of the Institute of Electronics, Information and Communication Engineers, Vol. 109, No. 5, pp. 382–387, 2026.
Kento Sasaki, Yohei Seki. Exploration of Commentary Generation Methods Considering the Components of Shogi Commentary Texts, DBSJ Journal Data-Driven Studies, Vol. 2, Article No. 3, 2024.

International Conferences

Futa Waseda, Shojiro Yamabe, Daiki Shiono, Kento Sasaki, Tsubasa Takahashi. Read or Ignore? A Unified Benchmark for Typographic-Attack Robustness and Text Recognition in Vision-Language Models, In Proceedings of the European Conference on Computer Vision (ECCV), September 2026.
Tsubasa Takahashi, Shojiro Yamabe, Futa Waseda, Kento Sasaki. Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness, The Fourteenth International Conference on Learning Representations (ICLR), April 2026.
Keishi Ishihara*, Kento Sasaki*, Tsubasa Takahashi, Daiki Shiono, Yu Yamaguchi. STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes, Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Oral, Vol. 40, No. 7, pp. 5257-5266, January 2026.
Hidehisa Arai*, Keita Miwa*, Kento Sasaki*, Yu Yamaguchi, Kohei Watanabe, Shunsuke Aoki, Issei Yamamoto. CoVLA: Comprehensive Vision-Language Action Dataset for Autonomous Driving, In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Oral, pp. 1933-1943, February 2025.

Workshops

Shingo Yokoi, Kento Sasaki, Yu Yamaguchi. Hierarchical Reasoning with Vision-Language Models for Incident Reports from Dashcam Videos, International Conference on Computer Vision (ICCV), 2nd Workshop on the Challenge Of Out Of Label Hazards In Autonomous Driving (short paper), October 2025.
Kento Sasaki*, Keishi Ishihara*, Tsubasa Takahashi, Yu Yamaguchi. STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes, International Conference on Computer Vision (ICCV), The 1st End-to-End 3D Learning Workshop (short paper), October 2025.
Keita Miwa, Kento Sasaki, Hidehisa Arai, Tsubasa Takahashi, Yu Yamaguchi. One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression, Proceedings of the 42nd International Conference on Machine Learning (ICML), Tokenization Workshop, July 2025.
Yuichi Inoue*, Kento Sasaki*, Yuma Ochi, Kazuki Fujii, Kotaro Tanahashi, Yu Yamaguchi. Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), The 3rd Workshop on Computer Vision in the Wild, June 2024.

Domestic Conferences

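Keita Miwa*, Hidehisa Arai*, Kento Sasaki*, Kohei Watanabe, Yu Yamaguchi. Construction of an Integrated Language, Vision, and Action Dataset for Autonomous Driving, The 19th YANS Symposium, 2024, S5-P04.
Kento Sasaki*, Yuichi Inoue*, Kazuki Fujii, Kotaro Tanahashi, Yu Yamaguchi. Construction of a Japanese Vision-Language Model Using Large Language Models and a Proposed Evaluation Method, The 27th Meeting on Image Recognition and Understanding (MIRU), 2024, OS-2A-01.
Kento Sasaki, Yohei Seki. Exploration of Commentary Generation Methods Considering the Components of Shogi Commentary Texts, The 15th Forum on Data Engineering and Information Management (DEIM), 2023, 1a-7-5.
Kento Sasaki, Yohei Seki. Definition and Classification of the Components of Shogi Commentary Texts, The 18th ARG Special Interest Group on Web Intelligence and Interaction (WI2), 2022, pp. 75-78.
Kento Sasaki, 山路倍弘, 橋本敬之, 北本朝展, 鈴木静男. A Preliminary Study of Deep-Learning-Based Character Recognition for Historical Documents in the Izu Region, Theory and Applications of GIS, 2019, Vol. 27, No. 2, p. 159(93).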

Dataset

Japan Open Driving Dataset
Turing Inc. (served as Project Lead)
Japan Open Driving Dataset is a large-scale autonomous driving dataset comprising over 100 hours of driving data collected in Tokyo, Japan. The data is stored in nuScenes format and can be loaded with the nuscenes-devkit.

Abs / Dataset

RACER
Daichi Nagai, Kento Sasaki
RACER (Rationale-Aware Captioning of Edge-Case Driving Scenarios) is a reasoning caption dataset designed for training vision-language-action (VLA) models in autonomous driving.
