A MULTI-LABEL IMAGE CLASSIFICATION METHOD BASED ON CONVOLUTIONAL NEURAL NETWORK

Văn Thịnh Nguyễn , Văn Lăng Trần , Thế Thành Văn

Abstract

Multi-label image classification is a critical and challenging task in computer vision. In this paper, a multi-label image classification method based on the Graph Convolutional Network (GCN) is proposed. To improve accuracy, it exploits two kinds of relationships: those between object labels in the dataset and those between objects in the image. First, the image content is represented by features learned with a convolutional neural network (CNN) and with a GCN that operates on the scene graph of the image. Then, a graph describing the dependencies between object labels in the dataset is built. A GCN uses this graph to learn one classifier per label, and these classifiers are applied to the image feature to generate predicted scores. Finally, the entire network is trained using the traditional multi-label classification loss. Experiments are conducted on a dataset formed by the intersection of Visual Genome and MS COCO. The results show that the proposed method is effective and outperforms several recently published methods.
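The label-graph step described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a co-occurrence-based adjacency matrix, a two-layer GCN, random label embeddings, and a random vector standing in for the combined CNN and scene-graph image feature. All function names, dimensions, and weights are hypothetical.

```python
import numpy as np

def label_adjacency(Y):
    """Build a normalized label graph from a binary label matrix Y of shape (N, C).

    Assumption: edges are conditional co-occurrence probabilities estimated
    from the training labels; the abstract does not state how the graph is built.
    """
    co = Y.T @ Y                              # co-occurrence counts, shape (C, C)
    counts = np.maximum(np.diag(co), 1.0)     # occurrences of each label (avoid /0)
    A = co / counts[:, None]                  # A[i, j] ~ P(label j | label i)
    np.fill_diagonal(A, 1.0)                  # self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def gcn_classifiers(A_hat, E, W1, W2):
    """Two-layer GCN mapping label embeddings E (C, d) to classifiers (C, D)."""
    H = np.maximum(A_hat @ E @ W1, 0.0)       # graph convolution + ReLU
    return A_hat @ H @ W2                     # one D-dimensional classifier per label

def multilabel_bce(scores, y):
    """Mean binary cross-entropy over labels, with a sigmoid on the raw scores."""
    p = 1.0 / (1.0 + np.exp(-scores))
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))

# Toy usage with random stand-ins (hypothetical sizes).
rng = np.random.default_rng(0)
N, C, d, hidden, D = 50, 4, 8, 16, 32
Y = (rng.random((N, C)) > 0.5).astype(float)  # training label matrix
E = rng.normal(size=(C, d))                   # label embeddings
W1 = rng.normal(size=(d, hidden)) * 0.1       # GCN layer 1 weights
W2 = rng.normal(size=(hidden, D)) * 0.1       # GCN layer 2 weights
x = rng.normal(size=D)                        # stand-in image feature

A_hat = label_adjacency(Y)
classifiers = gcn_classifiers(A_hat, E, W1, W2)
scores = classifiers @ x                      # one predicted score per label
loss = multilabel_bce(scores, Y[0])
print(scores.shape, float(loss))
```

In an actual training setup the GCN weights and the image encoder would be optimized jointly by backpropagating this loss. The sketch only shows the forward pass.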

 
