Piero P. Bonissone's Research Interests:

Model Ensemble and Fusion

 

 

Timeline

 

 

Papers

 

(2010) [13] P. Bonissone, J. M. Cadenas, M.C. Garrido, R.A. Diaz, A Fuzzy Random Forest, The International Journal of Approximate Reasoning, to appear, doi:10.1016/j.ijar.2010.02.003, 2010.

 

When individual classifiers are combined appropriately, a statistically significant increase in classification accuracy is usually obtained. Multiple classifier systems are the result of combining several individual classifiers. Following Breiman's methodology, in this paper a multiple classifier system based on a forest of fuzzy decision trees, i.e., a Fuzzy Random Forest, is proposed. This approach combines the robustness of multiple classifier systems, the power of randomness to increase the diversity of the trees, and the flexibility of fuzzy logic and fuzzy sets for imperfect data management. Various combination methods to obtain the final decision of the multiple classifier system are proposed and compared. Some of them are weighted combination methods, which weight the decisions of the different elements of the multiple classifier system (leaves or trees). A comparative study with several datasets is made to show the efficiency of the proposed multiple classifier system and the various combination methods. The proposed multiple classifier system exhibits good classification accuracy, comparable to that of the best classifiers when tested with conventional datasets. However, unlike other classifiers, the proposed classifier provides a similar accuracy when tested with imperfect datasets (with missing and fuzzy values) and with noisy datasets.
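As a rough illustration of weighting decisions at the leaf and tree level, the sketch below combines the class support of several fuzzy trees. It is not the paper's algorithm: the data layout (a list of reached leaves per tree, each with a satisfaction degree and per-class weights) and the names `tree_support` and `forest_decision` are assumptions made for this example.

```python
from collections import defaultdict

def tree_support(leaves):
    """Class support of one fuzzy tree for one example.

    `leaves` is a list of (degree, {label: weight}) pairs, one per leaf the
    example reaches. In a fuzzy tree an example may reach several leaves,
    each with a satisfaction degree in [0, 1] (assumed representation).
    """
    support = defaultdict(float)
    for degree, class_weights in leaves:
        for label, weight in class_weights.items():
            support[label] += degree * weight
    return dict(support)

def forest_decision(forest, tree_weights=None):
    """Weighted sum of per-tree support; returns (winning label, totals)."""
    if tree_weights is None:
        tree_weights = [1.0] * len(forest)  # unweighted combination
    total = defaultdict(float)
    for leaves, w in zip(forest, tree_weights):
        for label, s in tree_support(leaves).items():
            total[label] += w * s
    return max(total, key=total.get), dict(total)

# Hypothetical forest of two trees for a single example.
forest = [
    [(0.7, {"a": 0.9, "b": 0.1}), (0.3, {"a": 0.2, "b": 0.8})],
    [(1.0, {"a": 0.4, "b": 0.6})],
]
label_equal, _ = forest_decision(forest)
label_weighted, _ = forest_decision(forest, tree_weights=[0.2, 1.0])
```

With equal tree weights the first tree dominates; down-weighting it changes the forest's decision, which is the effect a weighted combination method is meant to exploit.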

 

(2010) [12] P. Bonissone, J.M. Cadenas, M.C. Garrido, R.A. Diaz, Fundamentals for Design and Construction of a Fuzzy Random Forest, in Foundations of Reasoning under Uncertainty, B. Bouchon-Meunier, Manuel Ojeda, R.R. Yager (eds.), STUDFUZZ 249, pp. 23-42, Springer-Verlag Berlin Heidelberg (2010).

 

Following Breiman's methodology, we propose the fundamentals to design and construct a forest of randomly generated fuzzy decision trees, i.e., a Fuzzy Random Forest. This approach combines the robustness of multi-classifiers, the construction efficiency of decision trees, the power of randomness to increase the diversity of the trees in the forest, and the flexibility of fuzzy logic and fuzzy sets for data management. A prototype of the method has been constructed, and we have implemented some specific strategies for inference in the Fuzzy Random Forest. Some experimental results are given.

 

(2009) [11] P. Bonissone, J.M. Cadenas, M.C. Garrido, R.A. Diaz, R. Martinez, Weighted Decisions in a Fuzzy Random Forest, 2009 IFSA World Congress, Lisbon, Portugal, July 20-24, 2009 - [GE GR Technical Report, 2000, GRC850, Sept. 2009 (pdf)].

 

A multi-classifier system - obtained by combining several individual classifiers - usually exhibits better performance (precision) than any of the original classifiers. In this work we use a multi-classifier based on a forest of randomly generated fuzzy decision trees (Fuzzy Random Forest), and we propose a new method to combine their decisions to obtain the final decision of the forest. The proposed combination is a weighted method based on the concept of local fusion and on the data set Out Of Bag (OOB) error.
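The role of the OOB error can be sketched as follows: each tree is trained on a bootstrap sample, its error is estimated on the examples left out of that sample, and the estimate is turned into a voting weight. This is only a minimal sketch; the weight `1 - error`, the neutral fallback value, and the function names are assumptions, and the local-fusion part of the paper's method is not modeled.

```python
import random

def bootstrap_indices(n, rng):
    """Sample n indices with replacement (the in-bag set of one tree)."""
    return [rng.randrange(n) for _ in range(n)]

def oob_error(predict, in_bag, X, y):
    """Error rate of `predict` on the examples that are not in its bag."""
    bag = set(in_bag)
    oob = [i for i in range(len(X)) if i not in bag]
    if not oob:
        return 0.5  # assumption: neutral error when no OOB example exists
    wrong = sum(predict(X[i]) != y[i] for i in oob)
    return wrong / len(oob)

def weighted_vote(predictors, weights, x):
    """Each predictor votes for one label with its weight."""
    scores = {}
    for predict, w in zip(predictors, weights):
        label = predict(x)
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# Toy data and two stand-in "trees" (hypothetical threshold rules).
X = [0, 1, 2, 3]
y = [0, 0, 1, 1]
good = lambda x: int(x >= 2)
bad = lambda x: 0

rng = random.Random(0)
example_bag = bootstrap_indices(len(X), rng)  # how a bag would be drawn

# Fixed bags keep the example deterministic: examples 2 and 3 are OOB.
bags = [[0, 0, 1, 1], [0, 1, 0, 1]]
weights = [1.0 - oob_error(p, b, X, y) for p, b in zip([good, bad], bags)]
decision = weighted_vote([good, bad], weights, 3)
```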

 

(2008) [10] P. Bonissone, J.M. Cadenas, M.C. Garrido, R.A. Diaz, Combination Methods in Fuzzy Random Forest, SMC 2008, Singapore, Oct. 12-15, 2008 [GE GR Technical Report, 2008, GRC738, Oct. 2008 (pdf)].

 

Following Breiman's methodology, we propose a multi-classifier based on a forest of randomly generated fuzzy decision trees, i.e., a Fuzzy Random Forest. This approach combines the robustness of multi-classifiers, the construction efficiency of decision trees, the power of randomness to increase the diversity of the trees in the forest, and the flexibility of fuzzy logic and fuzzy sets for data management.

 

(2008) [9] P. Bonissone, J.M. Cadenas, M.C. Garrido, R.A. Diaz, A Fuzzy Random Forest: Fundamental for Design and Construction, IPMU 2008, Malaga, Spain, June 22-27, 2008, [GE GR Technical Report, 2008, GRC739, Oct. 2008 (pdf)].

 

Following Breiman's methodology, we propose a multi-classifier based on a forest of randomly generated fuzzy decision trees, i.e., a Fuzzy Random Forest. This approach combines the robustness of multi-classifiers, the construction efficiency of decision trees, the power of randomness to increase the diversity of the trees in the forest, and the flexibility of fuzzy logic and fuzzy sets for data management.

 

(2008) [8] P. Bonissone, F. Xue, and R. Subbu, Fast Meta-models for Local Fusion of Multiple Predictive Models, Applied Soft Computing Journal, 2008, doi:10.1016/j.asoc.2008.03.006 - [GE GR Technical Report, 2007GRC832, Oct 12, 2007 (pdf)].

 

Fusing the outputs of an ensemble of diverse predictive models usually boosts overall prediction accuracy. Such fusion is guided by each model's local performance, i.e., each model's prediction accuracy in the neighborhood of the probe point. Therefore, for each probe we instantiate a customized fusion mechanism. The fusion mechanism is a meta-model, i.e., a model that operates one level above the object-level models whose predictions we want to fuse. Like these models, such a meta-model is defined by structural and parametric information. In this paper, we focus on the definition of the parametric information for a given structure. For each probe point, we either retrieve or compute the parameters to instantiate the associated meta-model. The retrieval approach is based on a CART-derived segmentation of the probe's state space, which contains the meta-model parameters. The computation approach is based on a run-time evaluation of each model's local performance in the neighborhood of the probe. We explore various structures for the meta-model, and for each structure we compare the precompiled (retrieval) and run-time (computation) approaches. We demonstrate this fusion methodology in the context of multiple neural network models. However, our methodology is broadly applicable to other predictive modeling approaches. This fusion method is illustrated in the development of highly accurate models for emissions, efficiency, and load prediction in a complex power plant. The locally weighted fusion method boosts the predictive performance by 30-50% over the baseline single model approach for the various prediction targets. Relative to this approach, typical fusion strategies that use averaging or global weighting schemes only produce a 2-6% performance boost over the same baseline.

 

(2006) [7] F. Xue, R. Subbu, P. Bonissone, Locally Weighted Fusion of Multiple Predictive Models, IEEE International Joint Conference on Neural Networks (IJCNN'06), pp. 2137-2143, Vancouver, BC, Canada, July 16 - 21, 2006 (pdf) - [GE GR Technical Report, 2006GRC454, Nov 10, 2006 (pdf)]

 

Fusing the outputs from an ensemble of models in an effective way can often boost overall model accuracy. This paper presents a novel method, called locally weighted fusion, which aggregates the results of multiple predictive models based on local accuracy measures of these models in the neighborhood of the probe point for which we want to make a prediction. While we demonstrate the method in the context of multiple neural network models, the concepts may be applied to other predictive techniques as well. This fusion method is applied to develop highly accurate models for emissions, efficiency, and load prediction in a complex real-world power plant. The locally weighted fusion method boosts the predictive performance by 20-40% over the baseline single model approach for the various prediction targets. Relative to this approach, fusion strategies which apply averaging or global weighting only produce a 2-6% performance boost over the baseline.
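The idea of weighting each model by its accuracy near the probe point can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the k-nearest-neighbor neighborhood, the inverse mean-absolute-error weights, and the name `locally_weighted_fusion` are all assumptions made for this example.

```python
import math

def locally_weighted_fusion(models, X_val, y_val, probe, k=3, eps=1e-9):
    """Fuse model predictions at `probe`, weighting each model by its
    inverse mean absolute error on the k validation points nearest to it."""
    # Neighborhood of the probe: indices of the k closest validation points.
    by_distance = sorted(range(len(X_val)),
                         key=lambda i: math.dist(X_val[i], probe))
    neighbors = by_distance[:k]

    weights = []
    for model in models:
        mae = sum(abs(model(X_val[i]) - y_val[i]) for i in neighbors)
        mae /= len(neighbors)
        weights.append(1.0 / (mae + eps))  # locally accurate => large weight

    total = sum(weights)
    return sum(w * m(probe) for w, m in zip(weights, models)) / total

# Two hypothetical models of y = 2x: one accurate, one with a constant bias.
accurate = lambda x: 2.0 * x[0]
biased = lambda x: 2.0 * x[0] + 5.0

X_val = [(0.0,), (1.0,), (2.0,), (3.0,)]
y_val = [0.0, 2.0, 4.0, 6.0]
fused = locally_weighted_fusion([accurate, biased], X_val, y_val, (1.5,))
```

In this toy case a plain average of the two models would predict 5.5 at the probe, while the locally weighted result stays close to the accurate model's prediction of 3.0.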

 

(2005) [6] K. Goebel, P. Bonissone, Prognostic Information Fusion for Constant Load Systems, Proc. 7th Annual Conference on Information Fusion, Vol. 2 pp. 1247-1255, 2005 (pdf) - [GE GR Tech. Report, 2005GRC333, Aug 1, 2005 (pdf)].

 

This paper describes a process for aggregating different information sources to estimate remaining equipment life. Specifically, the approach presents a rigorous chain of preprocessing, modeling and post-processing steps that arrive at the desired prognostic result. The preprocessing steps deal with data reduction, filtering, and signature amplification. The prediction model applies ANFIS to the data. The post-processing steps include recursive trending which implicitly forces the prognostic trend to be confirmed before updated estimates are reported. Innovative measures are introduced that help in assessing the performance of the approach. The method is illustrated using real-life data from industrial web paper breakage prediction.

(2005) [5] P. Evangelista, M. Embrechts, P. Bonissone, B. Szymanski, Fuzzy ROC Curves for Unsupervised Nonparametric Ensemble Techniques, IJCNN 2005, Montreal, Canada, 2005 (pdf) - [GE GR Tech. Report, 2005GRC254, Aug 1, 2005 (pdf)].

 

This paper explores a novel ensemble technique for unsupervised classification using nonparametric statistics. Multiple classification systems (MCS), or ensemble techniques, involve considering several classification methods or multiple outputs from the same method and devising techniques to reach a decision. The performance of a binary classification system can be measured on a receiver operating characteristic (ROC) curve, and the area under the curve (AUC) is equivalent to the normalized Wilcoxon Rank Sum or Mann-Whitney U statistic, both of which are nonparametric statistics based upon ranked data. Successful performance of an unsupervised ensemble can be measured through the AUC, and the performance of different aggregation techniques for the combination of the multiple classification system decision values, or rankings in this paper, is illustrated. Aggregation techniques are based upon fuzzy logic theory, creating the fuzzy ROC curve. The one-class SVM is utilized for the unsupervised classification.
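The relation between the AUC and the rank statistics can be checked numerically. The sketch below computes the AUC twice: from the rank sum of the positive scores (the Mann-Whitney U statistic divided by the number of positive-negative pairs) and by directly counting correctly ordered pairs. The function names and the toy scores are assumptions for illustration only.

```python
def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def auc_from_ranks(scores, labels):
    """AUC = U / (n_pos * n_neg), with U obtained from the rank sum."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, l in zip(ranks, labels) if l == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def auc_pairwise(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
auc_rank = auc_from_ranks(scores, labels)
auc_pair = auc_pairwise(scores, labels)
```

Both routes give the same value, which is why an AUC can be computed from rankings alone, without calibrated decision values.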

 

(2005) [4] P. Evangelista, P. Bonissone, M. Embrechts, B. K. Szymanski, Unsupervised Fuzzy Ensembles Applied to Intrusion Detection, Proc. 13th European Symposium on Artificial Neural Networks 2005, pp. 345-350, Bruges, Belgium, April 27-29, 2005 (pdf)

 

This paper proposes a novel method for unsupervised ensembles that specifically addresses unbalanced, unsupervised, binary classification problems. Unsupervised learning often experiences the curse of dimensionality; however, subspace modeling can overcome this problem. For each subspace created, the classifier produces a decision value. The aggregation of the decision values occurs through the use of fuzzy logic, creating the fuzzy ROC curve. The one-class SVM is utilized for unsupervised classification. The primary source of data for this research is a host-based computer intrusion detection dataset.
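Aggregating per-subspace decision values with fuzzy operators can be sketched as below. The sketch assumes the decision values have already been scaled to [0, 1]; the specific operators shown (minimum and product T-norms, maximum and probabilistic-sum T-conorms) are standard fuzzy logic operators chosen for illustration and are not necessarily the ones used in the paper.

```python
from functools import reduce

# T-norms (fuzzy AND) and T-conorms (fuzzy OR) on values in [0, 1].
def t_norm_min(a, b):
    return min(a, b)

def t_norm_product(a, b):
    return a * b

def t_conorm_max(a, b):
    return max(a, b)

def t_conorm_prob_sum(a, b):
    return a + b - a * b

def aggregate(scores_per_subspace, operator):
    """Combine decision values across subspaces, one result per sample.

    `scores_per_subspace` holds one list per subspace model; each list has
    one decision value per sample (assumed already scaled to [0, 1]).
    """
    return [reduce(operator, sample_scores)
            for sample_scores in zip(*scores_per_subspace)]

# Two hypothetical subspace models scoring the same two samples.
subspace_scores = [[0.2, 0.9], [0.4, 0.5]]
pessimistic = aggregate(subspace_scores, t_norm_min)    # all must agree
optimistic = aggregate(subspace_scores, t_conorm_max)   # any one suffices
```

A T-norm flags a sample only when every subspace scores it highly, while a T-conorm flags it when any subspace does; sweeping a threshold over either aggregated score yields one ROC curve per operator.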

 

(2005) [3] P. Bonissone, N. Eklund, K. Goebel, Using an Ensemble of Classifiers to Audit a Production Classifier, 6th International Workshop on Multiple Classifier Systems (MCS 2005), pp. 376-386, Monterey, CA, June 13-15, 2005 (pdf)

 

After deploying a classifier in production, it is essential to support its lifecycle. This paper describes the application of an ensemble of classifiers to support two stages of the lifecycle of an on-line classifier used to underwrite life insurance applications: the monitoring of its decision quality and the updating of the production classifier over time. All combinations of five classification methods and seven fusion methods were assessed from the perspective of accuracy and pairwise diversity of the classifiers, and accuracy, precision, and coverage of the fused classifiers. The proposed architecture consists of three offline classifiers and a fusion module.

 

(2004) [2] P. Bonissone, Automating the Quality Assurance of an On-line Knowledge-Based Classifier By Fusing Multiple Off-line Classifiers, Proc. Conference on Information Processing and Management of Uncertainty (IPMU) 2004, Perugia, Italy, July 2004 (pdf) - [GE GR Tech. Report, 2004GRC134, Apr 28, 2004 (pdf)]

 

We address two problems in the lifecycle of a production classifier: the monitoring of its decision quality and the updating of the classifier over time. The proposed architecture consists of four off-line classifiers and an associative fusion module. The fusion is a T-norm based outer-product of the classifiers' normalized outputs. By attaching a confidence measure to each output of the fusion, we generate a distribution of the production classifier's quality. The lower tail of this distribution identifies the least reliable cases, which become candidates for auditing and manual QA. The upper tail identifies the most reliable cases, which become candidates for updating the standard reference data set used to design and tune the production classifier. We illustrate this approach with an insurance underwriting problem.

 

(2004) [1] P. Bonissone, K. Goebel, and W. Yan, Classifier Fusion using Triangular Norms, Proc. Multiple Classifier Systems (MCS) 2004, pp. 154-163, Cagliari, Italy, June 2004 (pdf) - [GE GR Tech. Report, 2006GRC143, Feb 21, 2006 (pdf)]

 

This paper describes a method for fusing a collection of classifiers where the fusion can compensate for some positive correlation among the classifiers. Specifically, it does not require the assumption of evidential independence of the classifiers to be fused (unlike Dempster-Shafer's fusion rule). The proposed method is associative, which allows fusing three or more classifiers irrespective of the order. The fusion is accomplished using a generalized intersection operator (T-norm) that better represents the possible correlation between the classifiers. In addition, a confidence measure is produced that takes advantage of the consensus and conflict between classifiers.
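A T-norm based outer product of classifier outputs can be sketched as follows. This is a simplified reading under stated assumptions, not the paper's exact formulation: each classifier output is a normalized vector over the same classes, cells of the outer product where all classifiers name the same class count as consensus, all other cells count as conflict, and the confidence is taken as the consensus share of the total mass. The names `fuse`, `t_product`, and `t_min` are hypothetical.

```python
from functools import reduce
from itertools import product

def t_product(a, b):
    """Product T-norm: appropriate when classifiers are independent."""
    return a * b

def t_min(a, b):
    """Minimum T-norm: appropriate for strongly positively correlated classifiers."""
    return min(a, b)

def fuse(outputs, t_norm=t_product):
    """Fuse normalized class-score vectors with a T-norm outer product.

    Returns (fused distribution over classes, confidence in [0, 1]).
    """
    n_classes = len(outputs[0])
    # Diagonal of the outer product: every classifier supports the same class.
    agreement = [reduce(t_norm, (o[c] for o in outputs))
                 for c in range(n_classes)]
    # Mass of the whole outer product (consensus plus conflict cells).
    total = sum(reduce(t_norm, cell) for cell in product(*outputs))
    consensus = sum(agreement)
    if consensus == 0:
        return [1.0 / n_classes] * n_classes, 0.0  # total conflict
    fused = [a / consensus for a in agreement]
    return fused, consensus / total

# Two hypothetical classifiers scoring two classes.
p = [0.7, 0.3]
q = [0.6, 0.4]
fused_prod, conf_prod = fuse([p, q], t_product)
fused_min, conf_min = fuse([p, q], t_min)
```

Because a T-norm is associative and commutative, the diagonal term for each class does not depend on the order in which classifiers are combined, which mirrors the order-independence noted above.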

 

Patents

 

(2009) System And Method For Equipment Life Estimation, K. Goebel, P. Bonissone, W. Yan, N. Eklund, F. Xue, US Patent 7,548,830 (June 16, 2009)

 

A method to reduce uncertainty bounds of predicting a remaining life of a probe using a set of diverse models is disclosed. The method includes generating an estimated remaining life output by each model of the set of diverse models, aggregating each of the respective estimated remaining life outputs via a fusion model, and in response to the aggregating, predicting the remaining life, the predicting having reduced uncertainty bounds based on the aggregating. The method further includes generating a signal corresponding to the predicted remaining life of the probe.

 

(2008) System and process for a fusion classification for insurance underwriting suitable for use by an automated system, P. Bonissone, K. Aggour, R. Subbu, W. Yan, N. Iyer, A. Chakraborty, US Patent No. 7,383,239 (Jun 8, 2008).

 

A method and system for fusing a collection of classifiers used for an automated insurance underwriting system and/or its quality assurance is described. Specifically, the outputs of a collection of classifiers are fused. The fusion of the data will typically result in some amount of consensus and some amount of conflict among the classifiers. The consensus will be measured and used to estimate a degree of confidence in the fused decisions. Based on the decision and degree of confidence of the fusion and the decision and degree of confidence of the production decision engine, a comparison module may then be used to identify cases for audit, cases for augmenting the training/test sets for re-tuning the production decision engine, cases for review, or may simply trigger a record of its occurrence for tracking purposes. The fusion can compensate for the potential correlation among the classifiers. The reliability of each classifier can be represented by a static or dynamic discounting factor, which will reflect the expected accuracy of the classifier. A static discounting factor is used to represent a prior expectation about the classifier's reliability, e.g., it might be based on the average past accuracy of the model, while a dynamic discounting factor is used to represent a conditional assessment of the classifier's reliability, e.g., whenever a classifier bases its output on an insufficient number of points it is not reliable.

 

(2004) Fusion classification for risk categorization in underwriting a financial risk instrument, R. Messmer, P. Bonissone, K. Aggour, R. Subbu, W. Yan, N. Iyer, PUB_20040225587 filed April 23, 2003, published November 11, 2004. (WO2004099946)

 

A system, process and computer program product for underwriting a financial risk instrument application represented by at least one risk attribute is provided. Decision engines examine the at least one risk attribute associated with the financial risk instrument application and assign the application to one of a predetermined set of risk classes. A fusion engine compares the risk classes assigned by each of the decision engines and fuses the assigned risk classes into an aggregated result representative of the risk of the financial risk instrument application. The fusion engine includes a first multi-classifier fusion module that uses an associative function to fuse the assigned risk classes into a first aggregated result and a second multi-classifier fusion that uses a non-associative function to fuse the assigned risk classes into a second aggregated result. A comparison engine selects one of the first aggregated result generated from the first multi-classifier fusion module and the second aggregated result generated from the second multi-classifier fusion module and compares it with a production result generated from the production decision engine. The comparison engine generates an underwriting decision for the financial risk instrument application according to the comparison.

 

(2003) Methods and systems for automated property valuation, P. Khedkar, P. Bonissone, and D. Golibersuch, US Patent No. 6,609,118 (Aug. 19, 2003)

 

The present invention is a method and system for automating the process for valuing a property that produces an estimated value of a subject property, and a quality assessment of the estimated value, that is based on the fusion of multiple processes for valuing a property. In one embodiment, three processes for valuing a subject property are fused. The first process, called LOCVAL, uses the location and living area to provide an estimate of the subject property's value. The second process, called AIGEN, is a generative artificial intelligence method that trains a fuzzy-neural network using a subset of cases from a case-base, and produces a run-time system to provide an estimate of the subject property's value. The third process, called AICOMP, uses a case based reasoning process similar to the sales comparison approach to determine an estimate of the subject property's value.

 

Supervised MS - PhD Theses

 

-

 

Projects

 

  • Optimal Management of Coal-Fired Boilers for Power Generation (GE Energy) [2003-07]
  • Automated Term-Life and Long Term Care Insurance Underwriting (GE Financial Assurance, now Genworth Financial) [2000-2003]
  • Paper web breakage prediction (GE Industrial Systems / GE Trading) [1994-1998]
  • Automated Mortgage Collateral Evaluation (GE Mortgages) [1994-1995]

 

 

Author: Piero P. Bonissone - Email: bonissone@crd.ge.com

 

 