ImageExplorer: A Mobile Application to Assist Image Exploration for People with Visual Impairments
TECHNOLOGY NUMBER: 2023-331
OVERVIEW
ImageExplorer is a mobile app that empowers blind and visually impaired users to independently explore and verify image content through multi-layered, touch-based interaction, addressing the limitations of automated image captions.
- Multi-layered touch interface allows users to explore object details and spatial relationships within images.
- Unlocks new opportunities for accessible digital experiences, particularly on social platforms and photo-sharing services, by giving users who would otherwise rely on error-prone AI captions direct control over image comprehension.
BACKGROUND
Digital images are everywhere, from social media and news to e-commerce, yet they remain inaccessible to many blind and visually impaired (BVI) users. These users depend on alternative text (“alt-text”), but coverage is poor: fewer than 0.1% of images on platforms like Twitter include usable alt-text. AI-generated captions are increasingly used as a fallback, but they often provide inaccurate or incomplete information, leading users to false conclusions.
The demand for scalable, accurate, and user-verifiable image-understanding tools is growing. Accessibility lawsuits, compliance regulations, and inclusive digital design are driving social platforms, tech companies, and organizations to seek more robust solutions. Trends show a shift from static alt-text toward interactive, user-driven access methods, creating a significant market need for smarter, more flexible image exploration technologies.
INNOVATION
ImageExplorer transforms how BVI users interact with images by combining two key concepts: touch-based scene exploration and hierarchical information layers. Instead of relying solely on a flawed single-sentence AI caption, users can physically navigate image elements and their subcomponents on a mobile device using simple gestures.
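To make the layered interaction concrete, the sketch below models it as a small Swift program. The `ImageElement` type, the normalized coordinate convention, and the `element(at:in:drillDown:)` function are illustrative assumptions, not the app's actual API; the point is that touch hit-testing stays on the first layer until the user chooses to drill down.

```swift
import Foundation

// Hypothetical model of ImageExplorer's two-layer hierarchy (names are
// illustrative, not the shipped API). Each first-layer object carries a
// bounding box and optional second-layer subcomponents.
struct ImageElement {
    let label: String             // e.g. "dog"
    let boundingBox: CGRect       // normalized [0, 1] image coordinates
    let children: [ImageElement]  // second-layer details, e.g. "collar"
}

// Return the element under a touch point, descending into the second
// layer only when the user has opted to drill down.
func element(at point: CGPoint, in elements: [ImageElement],
             drillDown: Bool) -> ImageElement? {
    for obj in elements where obj.boundingBox.contains(point) {
        if drillDown,
           let child = obj.children.first(where: { $0.boundingBox.contains(point) }) {
            return child
        }
        return obj
    }
    return nil
}

// Example: the same touch announces "dog" on the first layer and
// "collar" after the user drills down.
let scene = [
    ImageElement(label: "dog",
                 boundingBox: CGRect(x: 0.1, y: 0.3, width: 0.5, height: 0.5),
                 children: [
                     ImageElement(label: "collar",
                                  boundingBox: CGRect(x: 0.25, y: 0.45, width: 0.1, height: 0.1),
                                  children: [])
                 ])
]
let touch = CGPoint(x: 0.3, y: 0.5)
print(element(at: touch, in: scene, drillDown: false)?.label ?? "background")  // "dog"
print(element(at: touch, in: scene, drillDown: true)?.label ?? "background")   // "collar"
```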
ImageExplorer intelligently detects and organizes image content using multiple deep learning models, presenting primary objects and their relationships as “first layer” information and unlocking detailed subcomponents (the “second layer”) when users want to drill down. Audio cues guide users throughout, and progress feedback helps ensure no elements are missed.
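The sketch below illustrates one way such progress feedback could work, assuming the app tracks which first-layer labels the user has already touched. `ExplorationProgress` and its cue strings are assumptions for illustration, not the shipped implementation; in practice the returned strings would feed a speech synthesizer.

```swift
import Foundation

// Hypothetical progress tracker mirroring ImageExplorer's exploration
// feedback (type and method names are illustrative). It records which
// first-layer labels have been touched and reports coverage so the app
// can cue any elements not yet visited.
final class ExplorationProgress {
    private let allLabels: Set<String>
    private var visited: Set<String> = []

    init(firstLayerLabels: [String]) {
        self.allLabels = Set(firstLayerLabels)
    }

    // Called whenever a touch lands on an element; returns an audio-cue
    // string the app could pass to a speech synthesizer.
    func recordVisit(to label: String) -> String {
        visited.insert(label)
        let remaining = allLabels.subtracting(visited)
        if remaining.isEmpty {
            return "\(label). All \(allLabels.count) objects explored."
        }
        return "\(label). \(remaining.count) of \(allLabels.count) objects remaining."
    }
}

// Usage: three detected objects; the cues update as the user explores.
let progress = ExplorationProgress(firstLayerLabels: ["dog", "bench", "tree"])
print(progress.recordVisit(to: "dog"))    // "dog. 2 of 3 objects remaining."
print(progress.recordVisit(to: "tree"))   // "tree. 1 of 3 objects remaining."
print(progress.recordVisit(to: "bench"))  // "bench. All 3 objects explored."
```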
This dual-layer, touch-driven approach gives users direct, multi-dimensional access to image content, encouraging healthy skepticism of AI results and enabling independent verification of captions. Compared to current solutions such as Facebook’s text summaries or Seeing AI’s single-layer touch exploration, ImageExplorer offers users greater autonomy, accuracy, and control, fulfilling unmet accessibility needs and enabling adoption in both consumer and enterprise contexts.