{"@context":"http://iiif.io/api/presentation/2/context.json","@id":"https://repo.library.stonybrook.edu/cantaloupe/iiif/2/manifest.json","@type":"sc:Manifest","label":"ROC Random Forest and Its Application","metadata":[{"label":"dc.description.sponsorship","value":"This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree."},{"label":"dc.format","value":"Monograph"},{"label":"dc.format.medium","value":"Electronic Resource"},{"label":"dc.identifier.uri","value":"http://hdl.handle.net/11401/77477"},{"label":"dc.language.iso","value":"en_US"},{"label":"dc.publisher","value":"The Graduate School, Stony Brook University: Stony Brook, NY."},{"label":"dcterms.abstract","value":"Classification algorithms that optimize overall accuracy or class distribution purity often have difficulty classifying class-imbalanced data, in which most cases in the test set are assigned to the majority class. However, in imbalanced data classification one usually cares more about the accuracy in identifying the minority class (e.g., diseased samples), that is, the sensitivity, than about the overall accuracy; low sensitivity is therefore highly undesirable. The receiver operating characteristic (ROC) curve is a two-dimensional graph of sensitivity against specificity (conventionally plotted as 1 - specificity), where specificity is the accuracy in identifying the majority class (e.g., normal samples). The curve is traced out by varying the decision threshold, and the area under the ROC curve (AUC) serves as an accuracy measure for evaluating classification performance. Random Forest, a modern ensemble classifier, has attracted increasing attention for its strong classification performance. Each base learner is a decision tree built on a bootstrap (bagging) sample of the data, with each node split on a randomly selected subset of features. 
As a result, each base learner is relatively \"independent\" of the others, which improves the overall classification accuracy of the ensemble. In this dissertation, we combine ROC analysis and Random Forest to establish the proposed ROC Random Forest algorithm. The algorithm has two goals: (1) improving the AUC value, and (2) producing a balanced classification result. Verification was carried out on 18 public data sets from the UCI repository. The results show that the ROC Random Forest not only improves classification accuracy, in terms of a higher AUC value, but also delivers a more balanced classification result than other Random Forest settings. One drawback of the ROC Random Forest is its difficulty in processing categorical predictors. Given the importance of categorical predictors in many classification problems, we further combine the ROC Random Forest with optimal node-splitting algorithms other than ROC for categorical predictors. The resulting Hybrid ROC Random Forest is evaluated on 8 UCI data sets."},{"label":"dcterms.available","value":"2017-09-20T16:52:46Z"},{"label":"dcterms.contributor","value":"Gao, Yi"},{"label":"dcterms.creator","value":"Song, Bowen"},{"label":"dcterms.dateAccepted","value":"2017-09-20T16:52:46Z"},{"label":"dcterms.dateSubmitted","value":"2017-09-20T16:52:46Z"},{"label":"dcterms.description","value":"Department of Applied Mathematics and Statistics."},{"label":"dcterms.extent","value":"139 pg."},{"label":"dcterms.format","value":"Application/PDF"},{"label":"dcterms.identifier","value":"http://hdl.handle.net/11401/77477"},{"label":"dcterms.issued","value":"2015-05-01"},{"label":"dcterms.language","value":"en_US"},{"label":"dcterms.provenance","value":"Made available in DSpace on 2017-09-20T16:52:46Z (GMT). No. 
of bitstreams: 1\nSong_grad.sunysb_0771E_12222.pdf: 2581679 bytes, checksum: 9bc61738410d361cce431a75030e339e (MD5)\n Previous issue date: 2015"},{"label":"dcterms.publisher","value":"The Graduate School, Stony Brook University: Stony Brook, NY."},{"label":"dcterms.subject","value":"classification, random forest, ROC analysis, supervised learning"},{"label":"dcterms.title","value":"ROC Random Forest and Its Application"},{"label":"dcterms.type","value":"Dissertation"},{"label":"dc.type","value":"Dissertation"}],"description":"This manifest was generated dynamically","viewingDirection":"left-to-right","sequences":[{"@type":"sc:Sequence","canvases":[{"@id":"https://repo.library.stonybrook.edu/cantaloupe/iiif/2/canvas/page-1.json","@type":"sc:Canvas","label":"Page 1","height":1650,"width":1275,"images":[{"@type":"oa:Annotation","motivation":"sc:painting","resource":{"@id":"https://repo.library.stonybrook.edu/cantaloupe/iiif/2/23%2F34%2F26%2F23342669900201000237574185649107834810/full/full/0/default.jpg","@type":"dctypes:Image","format":"image/jpeg","height":1650,"width":1275,"service":{"@context":"http://iiif.io/api/image/2/context.json","@id":"https://repo.library.stonybrook.edu/cantaloupe/iiif/2/23%2F34%2F26%2F23342669900201000237574185649107834810","profile":"http://iiif.io/api/image/2/level2.json"}},"on":"https://repo.library.stonybrook.edu/cantaloupe/iiif/2/canvas/page-1.json"}]}]}]}