BUILDING A DOCUMENT READING ASSISTANT FOR THE VISUALLY IMPAIRED

Thị Kim Yến Thái; Thị Thu Hà Nguyễn; Thị Quế Trân Võ; Ngô Mỹ Vy Huỳnh; Hoàng Yến Nhi Trần; Quốc Việt Ngô

doi:10.54607/hcmue.js.21.9.4118(2024)

Date Published: 30/09/2024

Online Published: 30/09/2024

Issue: Vol. 21 No. 9 (2024)

Section: Articles

How to Cite

Thái, T. K. Y., Nguyễn, T. T. H., Võ, T. Q. T., Huỳnh, N. M. V., Trần, H. Y. N., & Ngô, Q. V. (2024). BUILDING A DOCUMENT READING ASSISTANT FOR THE VISUALLY IMPAIRED. HCMUE Journal of Science, 21(9), 1623. https://doi.org/10.54607/hcmue.js.21.9.4118(2024)

Abstract

This study presents a solution that applies document analysis and recognition technologies to make documents more accessible to people with visual impairments. The objective is to develop an algorithm that accurately analyses the content of each document component and converts it into speech. The approach uses a pre-trained YOLOv8 model for document analysis together with optical character recognition. The image annotation model uses the AIAnytime API, and Pix2Tex extracts LaTeX code from images so that mathematical formulas can be converted into spoken words. The results indicate that the system supports document reading effectively, contributing to assistive technology for the visually impaired.
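The final step described in the abstract, turning analysed components into spoken words, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the layout detector, OCR engine, and formula recogniser have already run, and it represents their output with a hypothetical `Region` record. The class names `text`, `formula`, and `figure`, the `latex_to_words` helper, and its tiny rule set are all assumptions made for illustration. The speech synthesis step itself is left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Region:
    """One detected page component (hypothetical structure, not from the paper)."""
    label: str    # assumed class names: "text", "formula" or "figure"
    top: float    # y-coordinate of the top edge of the bounding box
    content: str  # OCR text, LaTeX code, or an image caption


# Small illustrative vocabulary; a real system would need far more rules.
SYMBOLS = {r"\alpha": "alpha", r"\beta": "beta", r"\pi": "pi",
           r"\times": "times", "+": "plus", "-": "minus", "=": "equals"}

FRAC = re.compile(r"\\frac\{([^{}]*)\}\{([^{}]*)\}")
SQRT = re.compile(r"\\sqrt\{([^{}]*)\}")
POWER = re.compile(r"\^\{([^{}]*)\}|\^(\w)")


def _power_words(match):
    # Group 1 is the braced exponent, group 2 the single-character exponent.
    exponent = match.group(1) if match.group(1) is not None else match.group(2)
    return f" to the power of {exponent} "


def latex_to_words(latex: str) -> str:
    """Turn a small subset of LaTeX into a speakable phrase."""
    text = latex
    previous = None
    # Rewrite innermost groups first and repeat until nothing changes,
    # so that simple nesting such as \frac{x^{2}}{y} is handled.
    while previous != text:
        previous = text
        text = FRAC.sub(r" \1 over \2 ", text)
        text = SQRT.sub(r" square root of \1 ", text)
        text = POWER.sub(_power_words, text)
    for symbol, word in SYMBOLS.items():
        text = text.replace(symbol, f" {word} ")
    return " ".join(text.split())  # collapse repeated whitespace


def narrate(regions):
    """Return the phrases to be spoken, in top-to-bottom reading order."""
    spoken = []
    for region in sorted(regions, key=lambda r: r.top):
        if region.label == "formula":
            spoken.append("Formula: " + latex_to_words(region.content))
        elif region.label == "figure":
            spoken.append("Figure: " + region.content)
        else:
            spoken.append(region.content)
    return spoken


page = [
    Region("formula", 120.0, r"E = mc^2"),
    Region("text", 10.0, "Energy and mass are related."),
    Region("figure", 300.0, "a diagram of a rocket"),
]
for phrase in narrate(page):
    print(phrase)
```

Sorting by the top edge is a simplification that only suits single-column pages; multi-column layouts would need a more careful reading-order step.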

Keywords

document analysis and recognition, document image processing, visual impairments
