DISCRIMINATIVE MOTIF FINDING TO PREDICT HCV TREATMENT OUTCOMES WITH A SEMI-SUPERVISED FEATURE SELECTION METHOD
Abstract
Hepatitis C treatment currently faces many challenges, such as the high cost of medicines, side effects in patients, and low success rates with Hepatitis C Virus genotype 1b (HCV-1b). To identify which characteristics of HCV-1b cause drug resistance, many sequence analysis methods have been applied, and biomarkers that help predict failure rates have also been proposed. However, the results may be imprecise when these methods are applied to a dataset with few labeled sequences and short sequences. In this paper, we aim to predict the outcomes of HCV-1b treatment and to characterize the properties of HCV-1b by combining feature selection with semi-supervised learning. Our proposed framework improves prediction accuracy by about 5% to 8% compared with previous methods. In addition, we obtain a set of good discriminative subsequences that could be considered biological signals for predicting response or resistance to HCV-1b therapy.
Keywords
discriminative motif, hepatitis C virus, sequential forward floating selection, semi-supervised feature selection
References
Bailey, T. L., Boden, M. B., Whitington, T., & Machanick, P. (2010). The value of position-specific priors in motif discovery using MEME. BMC Bioinformatics, 11(1).
Chayama, K., Tsubota, A., Kobayashi, M., Okamoto, K., Hashimoto, M., Miyano, Y.,… & Kumada, H. (1997). Pretreatment virus load and multiple amino acid substitutions in the interferon sensitivity-determining region predict the outcome of interferon treatment in patients with chronic genotype 1b hepatitis C virus infection. Journal of Hepatology, 25(3), 745-749.
Chen, X., Nie, F., Yuan, G., & Huang, J. Z. (2017). Semi-supervised feature selection via rescaled linear regression. Proceedings of the 26th International Joint Conference on Artificial Intelligence.
Chin, A., Mirzal, A., Haron, H., & Hamed, H. (2016). Supervised, unsupervised, and semi-supervised feature selection: A review on gene selection. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 13.
El-Shamy, A., Shoji, I., Saito, T., Watanabe, H., Ide, Y., Deng, L.,… & Hotta, H. (2011). Sequence heterogeneity of NS5A and core proteins of hepatitis C virus and virological responses to pegylated-interferon/ribavirin combination therapy. Microbiology and Immunology, 55, 418-426.
Enomoto, N., Sakuma, N., Asahina, I., Kurosaki, Y., Murakami, M., Yamamoto, T.,… & Chifumi Sato, M. D. (1996). Mutations in nonstructural protein 5A gene and response to interferon in patients with chronic hepatitis C virus 1b infection. The New England Journal of Medicine, 334, 77-81.
Gao, M., Nettles, R. E., Belema, M., Snyder, L. B., Nguyen, V. N., Fridell, R. A.,… & Hamann, L. G. (2010). Chemical genetics strategy identifies an HCV NS5A inhibitor with a potent clinical effect. Nature Letters, 465, 96-100.
Han, J., & Kamber, M. (2006). Data mining concepts and techniques. Diane Cerra.
Kim, J. K., & Choi, S. (2011). Probabilistic models for semi-supervised discriminative motif discovery in DNA sequences. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(5).
Lin, T., Murphy, R. F., & Bar-Joseph, Z. (2011). Discriminative motif finding for predicting protein subcellular localization. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(2).
Manns, M., McHutchison, J. G., Gordon, S. C., Rustgi, V. K., Shiffman, M., Reindollar, R.,… & Albrecht, J. K. (2001). Peginterferon alfa-2b plus ribavirin compared with interferon alfa-2b plus ribavirin for initial treatment of chronic hepatitis C: A randomised trial. The Lancet, 358, 958-965.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
Ren, J., Qiu, Z., Fan, W., Cheng, H., & Yu, P. S. (2008). Forward semi-supervised feature selection. Proceedings of the 12th Pacific-Asia Conference in Knowledge Discovery and Data Mining.
Rueda, P. M., Casado, J., Paton, R., Quintero, D., Palacios, A., Gila, A.,… & Salmeron, J. (2008). Mutations in E2-PePHD, NS5A-PKRBD, NS5A-ISDR, and NS5A-V3 of hepatitis C virus genotype 1 and their relationship to pegylated interferon-ribavirin treatment responses. Journal of Virology, 82, 6644-6653.
Sami, A., & Nagatomi, R. (2008). A new definition and look at DNA motif. Intech.
Sheikhpour, R., Sarram, M. A., Gharaghani, S., & Chahooki, M. A. Z. (2017). A survey on semi-supervised feature selection methods. Pattern Recognition, 64.
Vens, C., Rosso, M. N., & Danchin, E. G. J. (2011). Identifying discriminative classification-based motifs in biological sequences. Bioinformatics, 27(9), 1231-1238.
Wu, J., & Xie, J. (2010). Hidden Markov model and its application in motif findings. Statistical Methods in Molecular Biology, 620, 405-416.
Xu, Z., King, I., Lyu, M. R. T., & Jin, R. (2010). Discriminative semi-supervised feature selection via manifold regularization. IEEE Transactions on Neural Networks, 21.
Yoon, J., Lee, J. I., Baik, S. K., Lee, K. H., Sohn, J. Y., Lee, H. W., … & Yeh, B. I. (2007). Predictive factors for interferon and ribavirin combination therapy in patients with chronic hepatitis C. World Journal of Gastroenterology, 13(46), 6236-6242.
Zhao, Z., & Liu, H. (2007). Semi-supervised feature selection via spectral analysis. Proceedings of the 7th SIAM International Conference on Data Mining.