BUILDING A DOCUMENT READING ASSISTANT FOR THE VISUALLY IMPAIRED
Abstract
This study introduces a solution that applies document analysis and recognition technologies to make documents more accessible to people with visual impairments. The objective is to develop an algorithm that accurately analyses the content of each document component and converts it into speech. A pre-trained YOLOv8 model performs document analysis, and optical character recognition extracts the text. The image annotation model uses the AIAnytime API, and Pix2Tex extracts LaTeX code from images so that mathematical formulas can be converted into spoken words. The results indicate that the approach supports document reading effectively, contributing to assistive technology for the visually impaired.
Keywords
document analysis and recognition, document image processing, visual impairments
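The pipeline outlined in the abstract (detect document components, recognise each one according to its type, then produce spoken text) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class labels, the Region type, the handler functions, and the small symbol table are all hypothetical, and plain strings stand in for the outputs that a layout detector, an OCR engine, an image annotation service, and a LaTeX extractor would produce.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Region:
    """One detected document component (hypothetical structure)."""
    label: str                      # assumed class names: "text", "formula", "figure"
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
    content: str                    # stand-in for what a recogniser would return


# Tiny illustrative symbol table; a real verbaliser would need far more rules.
SPOKEN_SYMBOLS = [("^2", " squared "), ("+", " plus "), ("=", " equals ")]


def read_text(region: Region) -> str:
    # Stand-in for OCR output on a text block.
    return region.content


def read_formula(region: Region) -> str:
    # Stand-in for LaTeX extraction followed by verbalisation.
    spoken = region.content
    for symbol, words in SPOKEN_SYMBOLS:
        spoken = spoken.replace(symbol, words)
    return "Formula: " + " ".join(spoken.split())


def read_figure(region: Region) -> str:
    # Stand-in for a caption returned by an image annotation model.
    return "Figure: " + region.content


HANDLERS: Dict[str, Callable[[Region], str]] = {
    "text": read_text,
    "formula": read_formula,
    "figure": read_figure,
}


def to_script(regions: List[Region]) -> List[str]:
    """Order regions top-to-bottom, then left-to-right, and verbalise each."""
    ordered = sorted(regions, key=lambda r: (r.box[1], r.box[0]))
    # Regions with an unknown label are skipped rather than guessed at.
    return [HANDLERS[r.label](r) for r in ordered if r.label in HANDLERS]


page = [
    Region("formula", (40, 200, 300, 240), "x^2 + 1 = y"),
    Region("text", (40, 20, 500, 80), "Consider the following equation."),
    Region("figure", (40, 300, 500, 600), "a parabola opening upwards"),
]
script = to_script(page)
```

Each string in script could then be passed to a text-to-speech engine. Sorting by the top edge of each bounding box is a simplification that assumes a single-column layout; multi-column pages would need a more careful reading-order step.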