

Benchmarking GPT-5 Performance and Repeatability on the Japanese National Examination for Radiological Technologists over the Past Decade (2016–2025)

https://repo.qst.go.jp/records/2001857
Item type: Journal Article
Release date: 2025-12-23
Title: Benchmarking GPT-5 Performance and Repeatability on the Japanese National Examination for Radiological Technologists over the Past Decade (2016–2025) (en)
Language: eng
Resource type: journal article (http://purl.org/coar/resource_type/c_6501)
Authors: Umehara Kensuke, Ota Junko, Tatsuya Nishii, Kishimoto Riwa, Takayuki Ishida
Abstract

Purpose: To evaluate GPT-5 against GPT-4o on the Japanese National Examination for Radiological Technologists (2016–2025), assessing accuracy, repeatability, and factors influencing performance differences.

Materials and methods: We analyzed 1992 multiple-choice questions involving text and images, spanning the medical and engineering domains. Both models answered all questions in Japanese under identical conditions across three independent runs. Majority-vote accuracy (correct if ≥ 2 of 3 runs were correct) and first-attempt accuracy were compared using McNemar's test. Repeatability was quantified with Fleiss' κ. Univariable and multivariable analyses were conducted to identify question-level factors associated with GPT-5 improvements.

Results: Across all 10 examination years, GPT-5 achieved a majority-vote accuracy of 92.8 % (95 % CI: 91.5–93.8), consistently outperforming GPT-4o at 72.4 % (95 % CI: 70.4–74.4; P < .001). Repeatability was higher for GPT-5 (κ = 0.925, 95 % CI: 0.915–0.935) than for GPT-4o (κ = 0.904, 95 % CI: 0.894–0.914), with correct answers in all three runs for 88.2 % vs. 68.9 % of items. GPT-5 performed better than GPT-4o in text-based (96.5 % vs. 78.1 %) and image-based questions (72.6 % vs. 41.9 %). Significant improvements were observed for MRI, CT, and radiography images; however, performance improvements were smaller for clinically oriented ultrasound and nuclear medicine images. The greatest advantages were observed in calculation questions (97.3 % vs. 39.3 %) and engineering-related domains, consistent with external benchmarks highlighting GPT-5's improved reasoning.

Conclusion: GPT-5 demonstrated significantly higher accuracy and repeatability than GPT-4o over a decade of examinations, with improvements in quantitative reasoning, engineering content, and diagram interpretation. Although improvements extended to medical images, performance in clinical image interpretation remains limited.
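The evaluation protocol described in the abstract (a question counts as correct if at least 2 of 3 runs are correct, and paired model accuracies are compared with McNemar's test) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names and toy data are invented, and the exact binomial form of McNemar's test is one common choice, not necessarily the variant used in the paper.

```python
# Sketch of majority-vote scoring and an exact two-sided McNemar test.
from math import comb

def majority_vote_correct(runs):
    """A question counts as correct if >= 2 of 3 runs were correct."""
    return sum(runs) >= 2

def majority_vote_accuracy(all_runs):
    """all_runs: per-question lists of three booleans (True = run correct)."""
    return sum(majority_vote_correct(r) for r in all_runs) / len(all_runs)

def mcnemar_exact_p(b, c):
    """Exact two-sided McNemar test on the discordant pairs:
    b = questions only model A answered correctly,
    c = questions only model B answered correctly."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    p_one_sided = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * p_one_sided)

# Toy data: three runs per question for one model (True = correct answer).
model_a_runs = [[True, True, False], [True, True, True], [False, False, True]]
print(majority_vote_accuracy(model_a_runs))  # 2 of 3 questions pass by majority
```

In a real analysis, `b` and `c` would be counted from the per-question majority-vote outcomes of the two models on the same 1992 questions; the fabricated three-question list above only demonstrates the scoring rule.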
Bibliographic information: European Journal of Radiology Artificial Intelligence, Vol. 5, p. 100064, issued 2025-12
Publisher: Elsevier
ISSN: 3050-5771
DOI: 10.1016/j.ejrai.2025.100064
Powered by WEKO3