A new study reveals that AI struggles with telling time and reading calendars. Despite its prowess at complex tasks, it fails to interpret basic clock images and dates. These findings, presented at ICLR 2025, underscore significant gaps in AI's ability to handle combined perception-and-reasoning tasks, raising concerns about its use in time-sensitive applications.
In a rather eye-opening revelation, a recent study has highlighted that artificial intelligence models struggle with some surprisingly basic tasks. These include reading an analogue clock and determining the day of the week based on a given date. While AI continues to impress with its ability to write code and create lifelike images, it seems that something as fundamental as telling time is far beyond its grasp.
The findings, presented at the 2025 International Conference on Learning Representations (ICLR) and also shared on arXiv on March 18, indicate a glaring shortfall in AI capabilities. Study lead author Rohit Saxena from the University of Edinburgh emphasized, “Most people can tell the time and use calendars from an early age. Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people.” Unless these issues are tackled, integrating AI into tasks that require time sensitivity could run into serious roadblocks.
To uncover these shortcomings, the researchers built a custom dataset of clock and calendar images and tested several multimodal large language models (MLLMs) against it, including Meta’s Llama 3.2-Vision and OpenAI’s GPT-4o. The results were disappointing: more than half the time, the models could not read the correct time from a clock image or identify the day of the week for a given date.
Saxena provided some context on these failures: “Early systems were trained based on labelled examples. Clock reading requires something different — spatial reasoning. The model has to detect overlapping hands, measure angles and navigate diverse designs like Roman numerals or stylized dials.” This complexity is far more challenging than simply recognizing that an image is a clock.
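For a conventional program, the step after hand detection is pure arithmetic: the minute hand sweeps 6° per minute and the hour hand 30° per hour plus 0.5° per elapsed minute. A minimal sketch (the function name and angle convention are illustrative assumptions, not from the study) of converting detected hand angles back into a time:

```python
def angles_to_time(hour_angle: float, minute_angle: float) -> str:
    """Convert clock-hand angles (degrees clockwise from 12) to a time string.

    Assumes the angles have already been extracted from the image;
    that perceptual step is exactly what the study found models struggle with.
    """
    # Minute hand: 360 degrees per 60 minutes -> 6 degrees per minute.
    minute = round(minute_angle / 6) % 60
    # Hour hand: 30 degrees per hour, plus 0.5 degrees of drift per minute.
    # Remove the drift before dividing to recover the whole hour.
    hour = int(((hour_angle - minute * 0.5) % 360) / 30) % 12
    return f"{hour if hour else 12}:{minute:02d}"
```

For example, a hour hand at 95° with the minute hand at 180° decodes to 2:30. The arithmetic is trivial; the study's point is that the hard part for MLLMs is the perception that precedes it.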
Calendar dates proved just as difficult. Faced with questions like “What day is the 153rd day of the year?”, the models struggled: across the test set, they read clocks correctly only 38.7% of the time and answered calendar questions correctly a mere 26.3% of the time. These failures are striking given that date arithmetic is trivial for conventional software. As Saxena explained, LLMs don’t execute arithmetic the way traditional computers do; they predict answers based on patterns in their training data.
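The “153rd day of the year” question that trips up the models is a two-line computation for ordinary software. A minimal sketch (hypothetical helper name, using Python’s standard library):

```python
from datetime import date, timedelta

def day_of_year_weekday(year: int, n: int) -> str:
    """Return the weekday name of the n-th day of the given year."""
    # Day 1 is January 1, so offset by n - 1 days from there.
    d = date(year, 1, 1) + timedelta(days=n - 1)
    return d.strftime("%A")
```

For 2025, the 153rd day works out to June 2, a Monday. An LLM answering the same question has no such calendar routine to call; it must pattern-match its way to the answer, which is where the 26.3% accuracy comes from.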
The research shines a light on the growing body of work that reveals fundamental differences between human and AI reasoning. AI tends to perform well with enough examples in its training data but flounders when attempting tasks that require generalization or abstract reasoning. Saxena noted, “What for us is a very simple task like reading a clock may be very hard for them, and vice versa.”
Moreover, there’s a glaring issue with AI’s training limitations, especially with rare occurrences like leap years. Although LLMs can describe leap years conceptually, they often can’t apply this knowledge practically. This indicates that more targeted examples and a better approach to logical and spatial reasoning in AI training are urgently needed.
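The gap the study points to is not knowledge but application: the Gregorian leap-year rule itself is a single line of arithmetic. A minimal sketch (hypothetical function name) of the rule the models can describe but often fail to apply:

```python
def is_leap(year: int) -> bool:
    """Gregorian leap-year rule: divisible by 4, except century years,
    which must also be divisible by 400 (so 2000 is a leap year, 1900 is not)."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
```

Because leap years are rare in training data relative to ordinary dates, pattern-based prediction sees few examples of the rule being exercised, which is consistent with the targeted-training-data need the study identifies.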
Ultimately, the study underscores the importance of cautious trust in AI systems, especially when they’re called upon to blend perception and logic. As Saxena cautions, “AI is powerful, but when tasks mix perception with precise reasoning, we still need rigorous testing, fallback logic, and in many cases, a human in the loop.” The consequences of underestimating AI’s limits could be significant, turning what seems like minor flaws into major pitfalls in real-world applications.
In summary, the study brings to light serious deficiencies in AI’s ability to perform fundamental tasks like reading clocks and calendars, a gap that must be closed before AI can be trusted in time-sensitive applications. As these systems continue to improve, deliberate effort should go into strengthening their spatial reasoning and logical comprehension, particularly for tasks rarely represented in training data. Until then, relying on AI for even this kind of basic reasoning can be misleading or outright risky.
Original Source: www.livescience.com