Marco Zaffalon

Professor
Istituto "Dalle Molle" di Studi sull'Intelligenza Artificiale (IDSIA)
Galleria 2
CH-6928 Manno (Lugano), Switzerland
Tel.: +41 (0)58 666 666 5
Fax: +41 (0)58 666 666 1
E-mail: .

(Last update: 9 April 2014.)

Papers



Probabilistic graphical models

Authors: Alessandro Antonucci, Cassio P. de Campos and Marco Zaffalon.
Year: 2014.

This report presents probabilistic graphical models that are based on imprecise probabilities, using a simplified language. In particular, the discussion is focused on credal networks and discrete domains. It describes the building blocks of credal networks and algorithms to perform inference, and discusses complexity results and related work. The goal is to provide an easy-to-follow introduction to the topic.
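
As a toy illustration of the kind of inference the chapter discusses (all numbers below are made up, and brute-force enumeration is only viable for tiny models), lower and upper probabilities in a credal network can be obtained by optimising over the vertices of the local credal sets:

```python
from itertools import product

# Toy credal network A -> B with binary variables. Each local model is
# a set of candidate distributions (here: the vertices of an
# interval-valued specification; the numbers are hypothetical).
P_A = [(0.2, 0.8), (0.4, 0.6)]            # candidates for P(A=0), P(A=1)
P_B_given_A0 = [(0.7, 0.3), (0.9, 0.1)]   # candidates for P(B | A=0)
P_B_given_A1 = [(0.1, 0.9), (0.3, 0.7)]   # candidates for P(B | A=1)

def prob_B0(pa, pb0, pb1):
    """P(B=0) for one joint choice of local distributions."""
    return pa[0] * pb0[0] + pa[1] * pb1[0]

# Lower/upper probability of B=0: optimise over all vertex combinations.
values = [prob_B0(pa, pb0, pb1)
          for pa, pb0, pb1 in product(P_A, P_B_given_A0, P_B_given_A1)]
lower, upper = min(values), max(values)
print(lower, upper)
```

The exact and approximate algorithms described in the chapter exist precisely to avoid this exponential enumeration on realistic networks.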

Published in Augustin, T., Coolen, F., de Cooman, G., Troffaes, M. (Eds), Introduction to Imprecise Probabilities, Wiley, chapter 9.

A version similar to the published paper can be downloaded.


Classification

Authors: Giorgio Corani, Joaquín Abellán, Andrés Masegosa, Serafín Moral and Marco Zaffalon.
Year: 2014.

This report presents an introduction to credal classification. The discussion focuses on the naive credal classifier and its extensions, as well as on credal classifiers based on classification trees. In addition, we discuss the metrics suitable for scoring credal classifiers and present some experiments. The goal is to provide an easy-to-follow introduction to the topic.
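
A minimal sketch of the underlying idea (hypothetical counts; the naive credal classifier proper also models the features, and uses the stronger maximality criterion rather than interval dominance):

```python
# Credal classification in miniature: class probabilities are estimated
# as intervals (here with the imprecise Dirichlet model, IDM, on made-up
# class counts), and every class whose upper probability is not below
# some other class's lower probability stays in the prediction set.
s = 2.0                              # IDM hyperparameter
counts = {"a": 8, "b": 6, "c": 1}    # hypothetical class counts
n = sum(counts.values())

# IDM interval for each class: [k/(n+s), (k+s)/(n+s)]
intervals = {c: (k / (n + s), (k + s) / (n + s)) for c, k in counts.items()}

def non_dominated(intervals):
    """Classes not interval-dominated by any other class."""
    return sorted(c for c, (lo, hi) in intervals.items()
                  if all(hi >= lo2 for c2, (lo2, hi2) in intervals.items()
                         if c2 != c))

print(non_dominated(intervals))   # a set-valued (indeterminate) prediction
```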

Published in Augustin, T., Coolen, F., de Cooman, G., Troffaes, M. (Eds), Introduction to Imprecise Probabilities, Wiley, chapter 10.

A version similar to the published paper can be downloaded.


On the complexity of solving polytree-shaped limited memory influence diagrams with binary variables

Authors: Denis D. Mauá, Cassio P. de Campos and Marco Zaffalon.
Year: 2013.

Influence diagrams are intuitive and concise representations of structured decision problems. When the problem is non-Markovian, an optimal strategy can be exponentially large in the size of the diagram. We can avoid the inherent intractability by constraining the size of admissible strategies, giving rise to limited memory influence diagrams. A valuable question is then how small strategies need to be to enable efficient optimal planning. Arguably, the smallest strategies one can conceive simply prescribe an action for each time step, without considering past decisions or observations. Previous work has shown that finding such optimal strategies even for polytree-shaped diagrams with ternary variables and a single value node is NP-hard, but the case of binary variables was left open. In this paper we address this case by first noting that optimal strategies can be obtained in polynomial time for polytree-shaped diagrams with binary variables and a single value node. We then show that the same problem is NP-hard if the diagram has multiple value nodes. These two results close the fixed-parameter complexity analysis of optimal strategy selection in influence diagrams parametrized by the shape of the diagram, the number of value nodes and the maximum variable cardinality.

Published in Artificial Intelligence 205, 30–38.

A version similar to the published paper can be downloaded.


Discovering subgroups of patients from DNA copy number data using NMF on compacted matrices

Authors: Cassio Polpo de Campos, Paola Maria Vittoria Rancoita, Ivo Kwee, Emanuele Zucca, Marco Zaffalon and Francesco Bertoni.
Year: 2013.

In the study of complex genetic diseases, the identification of subgroups of patients sharing similar genetic characteristics represents a challenging task, for example, to improve treatment decisions. One type of genetic lesion, frequently investigated in such disorders, is the change of the DNA copy number (CN) at specific genomic traits. Non-negative Matrix Factorization (NMF) is a standard technique to reduce the dimensionality of a data set and to cluster data samples, while keeping its most relevant information in meaningful components. Thus, it can be used to discover subgroups of patients from CN profiles. It is however computationally impractical for very high dimensional data, such as CN microarray data. Deciding the most suitable number of subgroups is also a challenging problem. The aim of this work is to derive a procedure to compact high dimensional data, in order to improve NMF applicability without compromising the quality of the clustering. This is particularly important for analyzing high-resolution microarray data. Many commonly used quality measures, as well as our own measures, are employed to decide the number of subgroups and to assess the quality of the results. Our measures are based on the idea of identifying robust subgroups, inspired by biological/clinical relevance rather than simply aiming at well-separated clusters. We evaluate our procedure using four real independent data sets. In these data sets, our method was able to find accurate subgroups with individual molecular and clinical features and outperformed the standard NMF in terms of accuracy in the factorization fitness function. Hence, it can be useful for the discovery of subgroups of patients with similar CN profiles in the study of heterogeneous diseases.

Published in PLoS ONE 8(11), e79720.

A version similar to the published paper can be downloaded.


CREDO: a military decision-support system based on credal networks

Authors: Alessandro Antonucci, David Huber, Marco Zaffalon, Philippe Luginbühl, Ian Chapman and Richard Ladouceur.
Year: 2013.

Abstract: A software tool especially designed for military domains to create and query decision-support systems is presented. Credal networks, which are Bayesian networks whose parameters have the freedom to vary in convex sets, are used to model the relations among the system variables. A novel elicitation procedure of these sets, which allows the military experts to report their knowledge by purely qualitative judgements, is proposed. Two high-level fusion procedures to cope with multiple experts in this framework are also derived. All these features are supported by the software and demonstrated in an application to space security tested during the last NATO multinational experiment.

Published in FUSION 2013: Proceedings of the 16th Conference on Information Fusion.

A version similar to the published paper can be downloaded.


Computing the conglomerable natural extension

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2013.

Abstract: Given a coherent lower prevision P, we consider the problem of computing the smallest coherent lower prevision F ≥ P that is conglomerable, in case it exists. F is called the conglomerable natural extension. Past work has shown that F can be approximated by an increasing sequence (E_n)_{n∈ℕ} of coherent lower previsions. We close an open problem by showing that this sequence can be made of infinitely many distinct elements. Moreover, we give sufficient conditions, of quite broad applicability, to make sure that the point-wise limit of the sequence is F in case P is the lower envelope of finitely many linear previsions. In addition, we study the question of the existence of F and its relationship with the notion of marginal extension.

Published in Cozman, F., Denoeux, T., Destercke, S., Seidenfeld, T. (Eds), ISIPTA '13: Proceedings of the Eighth International Symposium on Imprecise Probability: Theories and Applications. SIPTA, 255–264.

A version similar to the published paper can be downloaded.


Conglomerable coherence

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2013.

We contrast Williams' and Walley's theories of coherent lower previsions in the light of conglomerability. These are two of the most credited approaches to a behavioural theory of imprecise probability. Conglomerability is the notion that distinguishes them the most: Williams' theory does not consider it, while Walley aims at embedding it in his theory. This question is important, as conglomerability has been a major point of disagreement at the foundations of probability since it was first defined by de Finetti in 1930. We show that Walley's notion of joint coherence (which is the single axiom of his theory) for conditional lower previsions does not take all the implications of conglomerability into account. Considering also some previous results in the literature, we deduce that Williams' theory should be the one to use when conglomerability is not required; for the opposite case, we define the new theory of conglomerably coherent lower previsions, which is arguably the one to use, and of which Walley's theory can be understood as an approximation. We show that this approximation is exact in two important cases: when all conditioning events have positive lower probability, and when conditioning partitions are nested.

Published in International Journal of Approximate Reasoning 54(9), 1322–1350.

A version similar to the published paper can be downloaded.


Approximating credal network inferences by linear programming

Authors: Alessandro Antonucci, Cassio Polpo de Campos, David Huber and Marco Zaffalon.
Year: 2013.

An algorithm for approximate credal network updating is presented. The problem in its general formulation is a multilinear optimization task, which can be linearized by an appropriate rule for fixing all the local models apart from those of a single variable. This simple idea can be iterated and quickly leads to very accurate inferences. The approach can also be specialized to classification with credal networks based on the maximality criterion. A complexity analysis for both the problem and the algorithm is reported together with numerical experiments, which confirm the good performance of the method. While the inner approximation produced by the algorithm gives rise to a classifier which might return a subset of the optimal class set, preliminary empirical results suggest that the accuracy of the optimal class set is seldom affected by the approximate probabilities.

Published in van der Gaag, L. C. (Ed.), ECSQARU 2013: Proceedings of the 12th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty, Lecture Notes in Computer Science 7958, Springer, 13–25. Best paper award.

A version similar to the published paper can be downloaded.


Probability and time

Authors: Marco Zaffalon and Enrique Miranda.
Year: 2013.

Probabilistic reasoning is often attributed a temporal meaning, in which conditioning is regarded as a normative rule to compute future beliefs out of current beliefs and observations. However, the well-established 'updating interpretation' of conditioning is not concerned with beliefs that evolve in time, and in particular with future beliefs. On the other hand, a temporal justification of conditioning was proposed already by De Moivre and Bayes, by requiring that current and future beliefs be consistent. We reconsider the latter approach while dealing with a generalised version of the problem, using a behavioural theory of imprecise probability in the form of coherent lower previsions as well as of coherent sets of desirable gambles, and letting the possibility space be finite or infinite. We obtain that using conditioning is normative, in the imprecise case, only if one establishes future behavioural commitments at the same time as current beliefs. In this case it is also normative that present beliefs be conglomerable, which is a result that touches on a long-standing controversy at the foundations of probability. In the remaining case, where one commits to some future behaviour after establishing present beliefs, we characterise the several possibilities to define consistent future assessments; this shows in particular that temporal consistency does not preclude changes of mind. And yet, our analysis does not support that rationality requires consistency in general, even though pursuing consistency makes sense and is useful, at least as a way to guide and evaluate the assessment process.
These considerations narrow down in the special case of precise probability, because this formalism cannot distinguish the two different situations illustrated above: it turns out that the only consistent rule is conditioning and moreover that it is not rational to be willing to stick to precise probability while using a rule different from conditioning to compute future beliefs; rationality requires in addition the disintegrability of the present-time probability.

Published in Artificial Intelligence 198, 1–51.

A version similar to the published paper can be downloaded.


Density-ratio robustness in dynamic state estimation

Authors: Alessio Benavoli and Marco Zaffalon.
Year: 2013.

The filtering problem is addressed by taking into account imprecision in the knowledge about the probabilistic relationships involved. Imprecision is modelled in this paper by a particular closed convex set of probabilities known as the density ratio class or constant odds-ratio (COR) model. The contributions of this paper are the following. First, we shall define an optimality criterion based on the squared-loss function for the estimates derived from a general closed convex set of distributions. Second, after reviewing the properties of the density ratio class in the context of parametric estimation, we shall extend these properties to state estimation accounting for system dynamics. Furthermore, for the case in which the nominal density of the COR model is a multivariate Gaussian, we shall derive closed-form solutions for the set of optimal estimates and for the credible region. Third, we discuss how to perform Monte Carlo integrations to compute lower and upper expectations from a COR set of densities. Then we shall derive a procedure that, employing Monte Carlo sampling techniques, allows us to propagate in time both the lower and upper state expectation functionals and, thus, to derive an efficient solution of the filtering problem. Finally, we empirically compare the proposed estimator with the Kalman filter. This shows that our solution is more robust to the presence of modelling errors in the system and that, hence, it appears to be a more realistic approach than the Kalman filter in such a case.

Published in Mechanical Systems and Signal Processing 37, 54–75.

A version similar to the published paper can be downloaded.


The complexity of approximately solving influence diagrams

Authors: Denis D. Mauá, Cassio P. de Campos and Marco Zaffalon.
Year: 2012.

Influence diagrams allow for intuitive and yet precise description of complex situations involving decision making under uncertainty. Unfortunately, most of the problems described by influence diagrams are hard to solve. In this paper we discuss the complexity of approximately solving influence diagrams. We do not assume no-forgetting or regularity, which makes the class of problems we address very broad. Remarkably, we show that when both the treewidth and the cardinality of the variables are bounded the problem admits a fully polynomial-time approximation scheme.

Published in de Freitas, N., Murphy, K. P. (Eds), UAI-2012: Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence, 604–613.

A version similar to the published paper can be downloaded.


Updating credal networks is approximable in polynomial time

Authors: Denis D. Mauá, Cassio P. de Campos and Marco Zaffalon.
Year: 2012.

Credal networks relax the precise probability requirement of Bayesian networks, enabling a richer representation of uncertainty in the form of closed convex sets of probability measures. The increase in expressiveness comes at the expense of higher computational costs. In this paper, we present a new variable elimination algorithm for exactly computing posterior inferences in extensively specified credal networks, which is empirically shown to outperform a state-of-the-art algorithm. The algorithm is then turned into a provably good approximation scheme, that is, a procedure that for any input is guaranteed to return a solution not worse than the optimum by a given factor. Remarkably, we show that when the networks have bounded treewidth and bounded number of states per variable the approximation algorithm runs in time polynomial in the input size and in the inverse of the error factor, thus being the first known fully polynomial-time approximation scheme for inference in credal networks.
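
For context, the precise-probability variable elimination that the paper extends can be sketched on a tiny chain network (made-up numbers); the credal algorithm generalises this by propagating sets of such factors instead of single ones:

```python
# Variable elimination on the chain A -> B -> C (binary variables,
# hypothetical numbers): compute the marginal P(C) by summing out
# A first, then B.
P_A = [0.3, 0.7]                    # P(A=a)
P_B_A = [[0.9, 0.1], [0.2, 0.8]]    # P_B_A[a][b] = P(B=b | A=a)
P_C_B = [[0.6, 0.4], [0.5, 0.5]]    # P_C_B[b][c] = P(C=c | B=b)

# Eliminate A: message m_B[b] = sum_a P(a) P(b|a)
m_B = [sum(P_A[a] * P_B_A[a][b] for a in range(2)) for b in range(2)]
# Eliminate B: P(C=c) = sum_b m_B[b] P(c|b)
P_C = [sum(m_B[b] * P_C_B[b][c] for b in range(2)) for c in range(2)]
print(P_C)
```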

Published in International Journal of Approximate Reasoning 53(8), 1183–1199.

A version similar to the published paper can be downloaded.


Conglomerable natural extension

Authors: Enrique Miranda, Marco Zaffalon and Gert de Cooman.
Year: 2012.

At the foundations of probability theory lies a question that has been open since de Finetti framed it in 1930: whether or not an uncertainty model should be required to be conglomerable. Conglomerability is related to accepting infinitely many conditional bets. Walley is one of the authors who have argued in favor of conglomerability, while de Finetti rejected the idea. In this paper we study the extension of the conglomerability condition to two types of uncertainty models that are more general than the ones envisaged by de Finetti: sets of desirable gambles and coherent lower previsions. We focus in particular on the weakest (i.e., the least-committal) of those extensions, which we call the conglomerable natural extension. The weakest extension that does not take conglomerability into account is simply called the natural extension. We show that taking the natural extension of assessments after imposing conglomerability—the procedure adopted in Walley's theory—does not yield, in general, the conglomerable natural extension (but it does so in the case of the marginal extension). Iterating this process of imposing conglomerability and taking the natural extension produces a sequence of models that approach the conglomerable natural extension, although it is not known, at this point, whether this sequence converges to it. We give sufficient conditions for this to happen in some special cases, and study the differences between working with coherent sets of desirable gambles and coherent lower previsions. Our results indicate that it is necessary to rethink the foundations of Walley's theory of coherent lower previsions for infinite partitions of conditioning events.
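
For precise probabilities, the condition at stake can be stated compactly (de Finetti's formulation, recalled here only for orientation): a finitely additive probability P is conglomerable in a partition ℬ when, for every event A,

```latex
\inf_{B \in \mathcal{B}} P(A \mid B) \;\le\; P(A) \;\le\; \sup_{B \in \mathcal{B}} P(A \mid B).
```

The paper studies how to extend this requirement to sets of desirable gambles and coherent lower previsions.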

Published in International Journal of Approximate Reasoning 53(8), 1200–1227.

A version similar to the published paper can be downloaded.


Evaluating credal classifiers by utility-discounted predictive accuracy

Authors: Marco Zaffalon, Giorgio Corani and Denis D. Mauá.
Year: 2012.

Predictions made by imprecise-probability models are often indeterminate (that is, set-valued). Measuring the quality of an indeterminate prediction by a single number is important to fairly compare different models, but a principled approach to this problem is currently missing. In this paper we derive, from a set of assumptions, a metric to evaluate the predictions of credal classifiers. These are supervised learning models that issue set-valued predictions. The metric turns out to be made of an objective component, and another that is related to the decision-maker's degree of risk aversion to the variability of predictions. We discuss when the measure can be rendered independent of such a degree, and provide insights as to how the comparison of classifiers based on the new measure changes with the number of predictions to be made. Finally, we make extensive empirical tests of credal, as well as precise, classifiers by using the new metric. This shows the practical usefulness of the metric, while yielding a first insightful and extensive comparison of credal classifiers.
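
The resulting metric can be sketched as follows; u65 and u80 are the two quadratic utility functions highlighted in the published paper, while the predictions below are made up:

```python
# Utility-discounted accuracy: a set-valued prediction containing the
# true class earns u(1/|set|), otherwise 0. The quadratic utilities
# satisfy u(1) = 1, with u65(0.5) = 0.65 and u80(0.5) = 0.80.
def u65(x): return -0.6 * x * x + 1.6 * x
def u80(x): return -1.2 * x * x + 2.2 * x

def discounted_accuracy(predictions, truths, u):
    """Mean utility-discounted accuracy of set-valued predictions."""
    scores = [u(1.0 / len(pred)) if true in pred else 0.0
              for pred, true in zip(predictions, truths)]
    return sum(scores) / len(scores)

preds = [{"a"}, {"a", "b"}, {"b"}]   # hypothetical credal outputs
truth = ["a", "a", "a"]
print(discounted_accuracy(preds, truth, u65))
```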

Published in International Journal of Approximate Reasoning 53(8), 1282–1301.

A version similar to the published paper can be downloaded.


Conglomerable coherent lower previsions

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2013.

Walley's theory of coherent lower previsions builds upon the former theory by Williams with the explicit aim to make it deal with conglomerability. We show that such a construction has been only partly successful because Walley's founding axiom of joint coherence does not entirely capture the implications of conglomerability. As a way to fully achieve Walley's original aim, we propose then the new theory of conglomerable coherent lower previsions. We show that Walley's theory coincides with ours when all conditioning events have positive lower probability, or when conditioning partitions are nested.

Published in Kruse, R., Berthold, M. R., Moewes, C., Gil, M. A., Grzegorzewski, P., Hryniewicz, O. (Eds), Synergies of Soft Computing and Statistics for Intelligent Data Analysis (SMPS 2012: Proceedings of the 6th International Conference on Soft Methods in Probability and Statistics), Advances in Intelligent and Soft Computing 190, Springer Berlin Heidelberg, 419–427.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Conglomerable coherence.


Solving limited memory influence diagrams

Authors: Denis D. Mauá, Cassio P. de Campos and Marco Zaffalon.
Year: 2012.

We present a new algorithm for exactly solving decision making problems represented as influence diagrams. We do not require the usual assumptions of no forgetting and regularity; this allows us to solve problems with simultaneous decisions and limited information. The algorithm is empirically shown to outperform a state-of-the-art algorithm on randomly generated problems of up to 150 variables and 10^64 solutions. We show that these problems are NP-hard even if the underlying graph structure of the problem has low treewidth and the variables take on a bounded number of states, and that they admit no provably good approximation if variables can take on an arbitrary number of states.

Published in Journal of Artificial Intelligence Research 44, 97–140.

The published paper can be downloaded.


A model of prior ignorance for inferences in the one-parameter exponential family

Authors: Alessio Benavoli and Marco Zaffalon.
Year: 2012.

This paper proposes a model of prior ignorance about a scalar variable based on a set of distributions M. In particular, a set of minimal properties that a set M of distributions should satisfy to be a model of prior ignorance without producing vacuous inferences is defined. When the likelihood model corresponds to a one-parameter exponential family of distributions, it is shown that the above minimal properties are equivalent to a special choice of the domains for the parameters of the conjugate exponential prior. This makes it possible to define the largest (that is, the least-committal) set of conjugate priors M that satisfies the above properties. The obtained set M is a model of prior ignorance with respect to the functions (queries) that are commonly used for statistical inferences; it is easy to elicit and, because of conjugacy, tractable; it encompasses frequentist and the so-called objective Bayesian inferences with improper priors. An application of the model to a problem of inference with count data is presented.
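
A sketch of the flavour of the model in the Bernoulli/Beta case (a member of the one-parameter exponential family; the data below are made up and the paper treats the general construction):

```python
# Set of conjugate priors Beta(s*t, s*(1-t)) with fixed strength s and
# prior mean t free in (0, 1). After k successes in n Bernoulli trials
# the posterior mean is (k + s*t)/(n + s), so the set of posterior
# means is the interval [k/(n+s), (k+s)/(n+s)]: vacuous before any
# data (n = 0 gives (0, 1)), shrinking as evidence accumulates.
def posterior_mean_bounds(k, n, s):
    return k / (n + s), (k + s) / (n + s)

lo, hi = posterior_mean_bounds(k=7, n=10, s=2.0)   # hypothetical data
print(lo, hi)
```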

Published in Journal of Statistical Planning and Inference 142(7), 1960–1979.

A version similar to the published paper can be downloaded.


Bayesian networks with imprecise probabilities: theory and application to classification

Authors: Giorgio Corani, Alessandro Antonucci and Marco Zaffalon.
Year: 2012.

Abstract: Bayesian networks are powerful probabilistic graphical models for modelling uncertainty. Among others, classification represents an important application: some of the most used classifiers are based on Bayesian networks. Bayesian networks are precise models: exact numeric values must be provided for their quantification. This requirement is sometimes too narrow. Sets of distributions, instead of single distributions, can provide a more realistic description in these cases. Bayesian networks can be generalized to cope with sets of distributions. This leads to a novel class of imprecise probabilistic graphical models, called credal networks. In particular, classifiers based on Bayesian networks are generalized to so-called credal classifiers. Unlike Bayesian classifiers, which always detect a single class as the one maximizing the posterior class probability, a credal classifier may be unable to discriminate a single class. In other words, if the available information is not sufficient, credal classifiers allow for indecision between two or more classes, thus providing a less informative but more robust conclusion than Bayesian classifiers.

Published in Holmes, D. E., Jain, L. C. (Eds), Data Mining: Foundations and Intelligent Paradigms (Volume 1: Clustering, Association and Classification), Springer-Verlag, Berlin, 49–93.

A version similar to the published paper can be downloaded.


A discussion on learning and prior ignorance for sets of priors in the one-parameter exponential family

Authors: Alessio Benavoli and Marco Zaffalon.
Year: 2011.

Abstract: For a conjugate likelihood-prior model in the one-parameter exponential family of distributions, we show that, by letting the parameters of the conjugate exponential prior vary in suitable sets, it is possible to define a set of conjugate priors M that guarantees prior near-ignorance without producing vacuous inferences. This result is obtained following both a behavioural and a sensitivity analysis interpretation of prior near-ignorance. We also discuss the problem of the incompatibility of learning and prior near-ignorance for sets of priors in the one-parameter exponential family of distributions in the case of imperfect observations. In particular, we prove that learning and prior near-ignorance are compatible under an imperfect observation mechanism if and only if the support of the priors in M is the whole real axis.

Published in Coolen, F., de Cooman, G., Fetz, T., Oberguggenberger, M. (Eds), ISIPTA '11: Proceedings of the Seventh International Symposium on Imprecise Probability: Theories and Applications. SIPTA, 69–78.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is A model of prior ignorance for inferences in the one-parameter exponential family.


A fully polynomial time approximation scheme for updating credal networks of bounded treewidth and number of variable states

Authors: Denis D. Mauá, Cassio P. de Campos and Marco Zaffalon.
Year: 2011.

Abstract: Credal networks lift the precise probability assumption of Bayesian networks, enabling a richer representation of uncertainty in the form of closed convex sets of probability measures. The increase in expressiveness comes at the expense of higher computational costs. In this paper we present a new algorithm, an extension of the well-known variable elimination algorithm, for computing posterior inferences in extensively specified credal networks. The algorithm is empirically shown to outperform a state-of-the-art algorithm. We then provide the first fully polynomial time approximation scheme for inference in credal networks with bounded treewidth and number of states per variable.

Published in Coolen, F., de Cooman, G., Fetz, T., Oberguggenberger, M. (Eds), ISIPTA '11: Proceedings of the Seventh International Symposium on Imprecise Probability: Theories and Applications. SIPTA, 277–286.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Updating credal networks is approximable in polynomial time.


Conglomerable natural extension

Authors: Enrique Miranda, Marco Zaffalon and Gert de Cooman.
Year: 2011.

Abstract: We study the weakest conglomerable model that is implied by desirability or probability assessments: the conglomerable natural extension. We show that taking the natural extension of the assessments while imposing conglomerability—the procedure adopted in Walley's theory—does not yield, in general, the conglomerable natural extension (but it does so in the case of the marginal extension). Iterating this process produces a sequence of models that approach the conglomerable natural extension, although it is not known, at this point, whether it is attained in the limit. We give sufficient conditions for this to happen in some special cases, and study the differences between working with coherent sets of desirable gambles and coherent lower previsions. Our results indicate that it might be necessary to re-think the foundations of Walley's theory of coherent conditional lower previsions for infinite partitions of conditioning events.

Published in Coolen, F., de Cooman, G., Fetz, T., Oberguggenberger, M. (Eds), ISIPTA '11: Proceedings of the Seventh International Symposium on Imprecise Probability: Theories and Applications. SIPTA, 287–296.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Conglomerable natural extension.


Utility-based accuracy measures to empirically evaluate credal classifiers

Authors: Marco Zaffalon, Giorgio Corani and Denis D. Mauá.
Year: 2011.

Abstract: Predictions made by imprecise-probability models are often indeterminate (that is, set-valued). Measuring the quality of an indeterminate prediction by a single number is important to fairly compare different models, but a principled approach to this problem is currently missing. In this paper we derive, from a set of assumptions, a measure to evaluate the predictions of credal classifiers. The measure turns out to be made of an objective component, and another that is related to the decision-maker's degree of risk-aversion. We discuss when the measure can be rendered independent of such a degree, and provide insights as to how the comparison of classifiers based on the new measure changes with the number of predictions to be made. Finally, we empirically study the behavior of the proposed measure.

Published in Coolen, F., de Cooman, G., Fetz, T., Oberguggenberger, M. (Eds), ISIPTA '11: Proceedings of the Seventh International Symposium on Imprecise Probability: Theories and Applications. SIPTA, 401–410.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Evaluating credal classifiers by utility-discounted predictive accuracy.


Independent natural extension

Authors: Gert de Cooman, Enrique Miranda and Marco Zaffalon.
Year: 2011.

Abstract: There is no unique extension of the standard notion of probabilistic independence to the case where probabilities are indeterminate or imprecisely specified. Epistemic independence is an extension that formalises the intuitive idea of mutual irrelevance between different sources of information. This gives epistemic independence very wide scope as well as appeal: this interpretation of independence is often taken as natural also in precise-probabilistic contexts. Nevertheless, epistemic independence has received little attention so far. This paper develops the foundations of this notion for variables assuming values in finite spaces. We define (epistemically) independent products of marginals (or possibly conditionals) and show that there always is a unique least-committal such independent product, which we call the independent natural extension. We supply an explicit formula for it, and study some of its properties, such as associativity, marginalisation and external additivity, which are basic tools to work with the independent natural extension. Additionally, we consider a number of ways in which the standard factorisation formula for independence can be generalised to an imprecise-probabilistic context. We show, under some mild conditions, that when the focus is on least-committal models, using the independent natural extension is equivalent to imposing a so-called strong factorisation property. This is an important outcome for applications as it gives a simple tool to make sure that inferences are consistent with epistemic independence judgements. We discuss the potential of our results for applications in Artificial Intelligence by recalling recent work by some of us, where the independent natural extension was applied to graphical models. It has allowed, for the first time, the development of an exact linear-time algorithm for the imprecise probability updating of credal trees.

Published in Artificial Intelligence 175, 1911–1950.

A version similar to the published paper can be downloaded.


Robust filtering through coherent lower previsions

Authors: Alessio Benavoli, Marco Zaffalon and Enrique Miranda.
Year: 2011.

Abstract: The classical filtering problem is re-examined to take into account imprecision in the knowledge about the probabilistic relationships involved. Imprecision is modeled in this paper by closed convex sets of probabilities. We derive a solution of the state estimation problem under such a framework that is very general: it can deal with any closed convex set of probability distributions used to characterize uncertainty in the prior, likelihood, and state transition models. This is made possible by formulating the theory directly in terms of coherent lower previsions, that is, of the lower envelopes of the expectations obtained from the set of distributions. The general solution is specialized to two particular classes of coherent lower previsions. The first consists of a family of Gaussian distributions whose means are only known to belong to an interval. The second is the so-called linear-vacuous mixture model, which is a family made of convex combinations of a known nominal distribution (e.g., a Gaussian) with arbitrary distributions. For the latter case, we empirically compare the proposed estimator with the Kalman filter. This shows that our solution is more robust to the presence of modelling errors in the system and that, hence, appears to be a more realistic approach than the Kalman filter in such a case.
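For reference, the linear-vacuous mixture mentioned above (also known as the ε-contamination model) can be written compactly. The notation below (nominal distribution P_0, contamination parameter ε) is the standard one in the imprecise-probability literature, not necessarily that of the paper:

```latex
% Linear-vacuous mixture: convex combinations of a nominal model P_0
% with arbitrary distributions Q, with contamination weight \epsilon:
\mathcal{M} \;=\; \{\, (1-\epsilon)\,P_0 + \epsilon\,Q \;:\; Q \text{ any probability distribution} \,\},
\qquad \epsilon \in [0,1].
% The corresponding coherent lower prevision of a bounded gamble f
% is the lower envelope of the expectations over \mathcal{M}:
\underline{P}(f) \;=\; (1-\epsilon)\,P_0(f) \;+\; \epsilon \inf f .
```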

Published in IEEE Transactions on Automatic Control 56(7), 1567–1581.

A version similar to the published paper can be downloaded.


Notes on desirability and conditional lower previsions

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2010.

Abstract: We detail the relationship between sets of desirable gambles and conditional lower previsions. The former is one of the most general models of uncertainty. The latter corresponds to Walley's celebrated theory of imprecise probability. We consider two avenues: when a collection of conditional lower previsions is derived from a set of desirable gambles, and its converse. In either case, we relate the properties of the derived model with those of the originating one. Our results constitute basic tools to move from one formalism to the other, and thus to take advantage of work done on the two fronts.

Published in Annals of Mathematics and Artificial Intelligence 60(3–4), 251–309.

A version similar to the published paper can be downloaded.


Epistemic irrelevance in credal nets: the case of imprecise Markov trees

Authors: Gert de Cooman, Filip Hermans, Alessandro Antonucci and Marco Zaffalon.
Year: 2010.

Abstract: We focus on credal nets, which are graphical models that generalise Bayesian nets to imprecise probability. We replace the notion of strong independence commonly used in credal nets with the weaker notion of epistemic irrelevance, which is arguably more suited for a behavioural theory of probability. Focusing on directed trees, we show how to combine the given local uncertainty models in the nodes of the graph into a global model, and we use this to construct and justify an exact message-passing algorithm that computes updated beliefs for a variable in the tree. The algorithm, which is linear in the number of nodes, is formulated entirely in terms of coherent lower previsions, and is shown to satisfy a number of rationality requirements. We supply examples of the algorithm's operation, and report an application to on-line character recognition that illustrates the advantages of our approach for prediction. We comment on the perspectives opened by the availability, for the first time, of a truly efficient algorithm based on epistemic irrelevance.

Published in International Journal of Approximate Reasoning 51(9), 1029–1052.

A version similar to the published paper can be downloaded.


Inference and risk measurement with the pari-mutuel model

Authors: Renato Pelessoni, Paolo Vicig and Marco Zaffalon.
Year: 2010.

Abstract: We explore generalizations of the pari-mutuel model (PMM), a formalization of an intuitive way of assessing an upper probability from a precise one. We discuss a naive extension of the PMM considered in insurance, compare the PMM with a related model, the Total Variation Model, and generalize the natural extension of the PMM introduced by P. Walley and other related formulae. The results are subsequently given a risk measurement interpretation: in particular it is shown that a known risk measure, Tail Value at Risk (TVaR), is derived from the PMM, and a coherent risk measure more general than TVaR from its imprecise version. We analyze further the conditions for coherence of a related risk measure, Conditional Tail Expectation. Conditioning with the PMM is investigated too, computing its natural extension, characterising its dilation and studying the weaker concept of imprecision increase.
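As a reminder of the basic construction (standard since Walley's work; the notation is generic rather than taken from the paper), the PMM turns a precise probability P_0 into an upper probability through a loading factor δ > 0:

```latex
% Pari-mutuel upper probability from a precise P_0 and loading \delta > 0,
% together with its conjugate lower probability:
\overline{P}(A) \;=\; \min\{\, (1+\delta)\,P_0(A),\; 1 \,\},
\qquad
\underline{P}(A) \;=\; \max\{\, (1+\delta)\,P_0(A) - \delta,\; 0 \,\}.
```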

Published in International Journal of Approximate Reasoning 51(9), 1145–1158.

A version similar to the published paper can be downloaded.


Factorisation properties of the strong product

Authors: Gert de Cooman, Enrique Miranda and Marco Zaffalon.
Year: 2010.

Abstract: We investigate a number of factorisation conditions in the framework of sets of probability measures, or coherent lower previsions, with finite referential spaces. We show that the so-called strong product constitutes one way to combine a number of marginal coherent lower previsions into an independent joint lower prevision, and we prove that under some conditions it is the only independent product that satisfies the factorisation conditions.

Published in Borgelt, C., González Rodríguez, G., Trutschnig, W., Asunción Lubiano, M., Gil, M. A., Grzegorzewski, P., Hryniewicz, O. (Eds), SMPS 2010: Proceedings of the 5th International Conference on Soft Methods in Probability and Statistics, Combining Soft Computing and Statistical Methods in Data Analysis, Advances in Intelligent and Soft Computing 77, 139–147.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Independent natural extension.


Independent natural extension

Authors: Gert de Cooman, Enrique Miranda and Marco Zaffalon.
Year: 2010.

Abstract: We introduce a general definition for the independence of a number of finite-valued variables, based on coherent lower previsions. Our definition has an epistemic flavour: it arises from personal judgements that a number of variables are irrelevant to one another. We show that a number of already existing notions, such as strong independence, satisfy our definition. Moreover, there always is a least-committal independent model, for which we provide an explicit formula: the independent natural extension. Our central result is that the independent natural extension satisfies so-called marginalisation, associativity and strong factorisation properties. These allow us to relate our research to more traditional ways of defining independence based on factorisation.

Published in Hüllermeier, E., Kruse, R., Hoffmann, F. (Eds), IPMU 2010: Proceedings of the 13th Information Processing and Management of Uncertainty in Knowledge-Based Systems Conference, Computational Intelligence for Knowledge-Based Systems Design, Lecture Notes in Computer Science 6178, Springer, 737–746.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Independent natural extension.


Conditional models: coherence and inference through sequences of joint mass functions

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2010.

Abstract: We call a conditional model any set of statements made of conditional probabilities or expectations. We take conditional models as primitive compared to unconditional probability, in the sense that conditional statements do not need to be derived from an unconditional probability. We focus on two problems: (coherence) giving conditions to guarantee that a conditional model is self-consistent; (inference) delivering methods to derive new probabilistic statements from a self-consistent conditional model. We address these problems in the case where the probabilistic statements can be specified imprecisely through sets of probabilities, while restricting the attention to finite spaces of possibilities. Using Walley's theory of coherent lower previsions, we fully characterise the question of coherence, and specialise it for the case of precisely specified probabilities, which is the most common case addressed in the literature. This shows that coherent conditional models are equivalent to sequences of (possibly sets of) unconditional mass functions. In turn, this implies that the inferences from a conditional model are the limits of the conditional inferences obtained by applying Bayes' rule, when possible, to the elements of the sequence. In doing so, we unveil the tight connection between conditional models and zero-probability events.

Published in Journal of Statistical Planning and Inference 140(7), 1805–1833.

A version similar to the published paper can be downloaded.


Generalized loopy 2U: a new algorithm for approximate inference in credal networks

Authors: Alessandro Antonucci, Yi Sun, Cassio Polpo de Campos and Marco Zaffalon.
Year: 2010.

Abstract: Credal networks generalize Bayesian networks by relaxing the requirement of precision of probabilities. Credal networks are considerably more expressive than Bayesian networks, but this makes belief updating NP-hard even on polytrees. We develop a new efficient algorithm for approximate belief updating in credal networks. The algorithm is based on an important representation result we prove for general credal networks: that any credal network can be equivalently reformulated as a credal network with binary variables; moreover, the transformation, which is considerably more complex than in the Bayesian case, can be implemented in polynomial time. The equivalent binary credal network is then updated by L2U, a loopy approximate algorithm for binary credal networks. Overall, we generalize L2U to non-binary credal networks, obtaining a scalable algorithm for the general case, which is approximate only because of its loopy nature. The accuracy of the inferences with respect to other state-of-the-art algorithms is evaluated by extensive numerical tests.

Published in International Journal of Approximate Reasoning 51(5), 474–484.

A version similar to the published paper can be downloaded.

The software that implements the generalized loopy 2U is freely available at the GL2U page.


Building knowledge-based systems by credal networks: a tutorial

Authors: Alberto Piatti, Alessandro Antonucci and Marco Zaffalon.
Year: 2010.

Abstract: Knowledge-based systems are computer programs achieving expert-level competence in solving problems for specific task areas. This chapter is a tutorial on the implementation of such systems in the framework of credal networks. Credal networks are a generalization of Bayesian networks where credal sets, i.e., closed convex sets of probability measures, are used instead of precise probabilities. This allows for a more flexible model of the knowledge, which can represent ambiguity, contrast and contradiction in a natural and realistic way. The discussion guides the reader through the different steps involved in the specification of a system, from the evocation and elicitation of the knowledge to the interaction with the system by adequate inference algorithms. Our approach is characterized by a sharp distinction between the domain knowledge and the process linking this knowledge to the perceived evidence, which we call the observational process. This distinction leads to a very flexible representation of both domain knowledge and knowledge about the way the information is collected, together with a technique to aggregate information coming from different sources. The overall procedure is illustrated throughout the chapter by a simple knowledge-based system for the prediction of the result of a football match.

Published in Baswell, A. R. (Ed), Advances in Mathematics Research 11, Nova Science Publishers, New York.

A version similar to the published paper can be downloaded.


Lazy naive credal classifier

Authors: Giorgio Corani and Marco Zaffalon.
Year: 2009.

Abstract: We propose a local (or lazy) version of the naive credal classifier. The latter is an extension of naive Bayes to imprecise probability developed to issue reliable classifications despite small amounts of data, which may then be carrying highly uncertain information about a domain. Reliability is maintained because credal classifiers can issue set-valued classifications on instances that are particularly difficult to classify. We show by extensive experiments that the local classifier outperforms the original one, both in terms of accuracy of classification and because it leads to stronger conclusions (i.e., set-valued classifications made by fewer classes). By comparing the local credal classifier with a local version of naive Bayes, we also show that the former reliably deals with instances which are difficult to classify, unlike the local naive Bayes which leads to fragile classifications.
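To illustrate the set-valued classifications mentioned above, here is a minimal sketch (not the authors' implementation) of how a credal classifier can return more than one class: class-probability intervals are derived from counts via the imprecise Dirichlet model, and a class is kept unless its upper probability falls below another class's lower probability (interval dominance). The counts and the hyperparameter `s` are illustrative assumptions.

```python
def idm_intervals(counts, s=1.0):
    """Lower/upper class probabilities from the imprecise Dirichlet
    model: [n_c / (N + s), (n_c + s) / (N + s)] for each class c."""
    n = sum(counts.values())
    return {c: (k / (n + s), (k + s) / (n + s)) for c, k in counts.items()}

def credal_prediction(intervals):
    """Keep every class whose upper probability is not dominated by
    some other class's lower probability (interval dominance)."""
    return {c for c, (lo, hi) in intervals.items()
            if all(hi >= lo2 for c2, (lo2, hi2) in intervals.items() if c2 != c)}

# Plenty of evidence for class "a": the prediction is determinate.
print(credal_prediction(idm_intervals({"a": 8, "b": 1, "c": 1}, s=2.0)))
# Scarce data: the intervals overlap and the prediction is set-valued.
print(credal_prediction(idm_intervals({"a": 2, "b": 1}, s=2.0)))
```

With scarce data the intervals widen and overlap, so the classifier suspends judgement between classes instead of forcing a single, fragile answer.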

Published in U'09: Proceedings of the First ACM SIGKDD International Workshop on Knowledge Discovery from Uncertain Data. ACM, 30–37.

A version similar to the published paper can be downloaded.


Multiple model tracking by imprecise Markov trees

Authors: Alessandro Antonucci, Alessio Benavoli, Marco Zaffalon, Gert de Cooman and Filip Hermans.
Year: 2009.

Abstract: We present a new procedure for tracking manoeuvring objects by hidden Markov chains. It leads to more reliable modelling of the transitions between hidden states compared to similar approaches proposed within the Bayesian framework: we adopt convex sets of probability mass functions rather than single 'precise probability' specifications, in order to provide a more realistic and cautious model of the manoeuvre dynamics. In general, the downside of such increased freedom in the modelling phase is a higher inferential complexity. However, the simple topology of hidden Markov chains allows for efficient tracking of the object through a recently developed belief propagation algorithm. Furthermore, the imprecise specification of the transitions can produce so-called indecision, meaning that more than one model may be suggested by our method as a possible explanation of the target kinematics. In summary, our approach leads to a multiple-model estimator whose performance, investigated through extensive numerical tests, turns out to be more accurate and robust than that of Bayesian ones.

Published in FUSION 2009: Proceedings of the 12th International Conference on Information Fusion. IEEE, 1767–1774.

A version similar to the published paper can be downloaded.


Reliable hidden Markov model filtering through coherent lower previsions

Authors: Alessio Benavoli, Marco Zaffalon and Enrique Miranda.
Year: 2009.

Abstract: We extend Hidden Markov Models for continuous variables by taking into account imprecision in our knowledge about the probabilistic relationships involved. To achieve that, we consider sets of probabilities, also called coherent lower previsions. In addition to the general formulation, we study in detail a particular case of interest: linear-vacuous mixtures. We also show, in a practical case, that our extension outperforms the Kalman filter when modelling errors are present in the system.

Published in FUSION 2009: Proceedings of the 12th International Conference on Information Fusion. IEEE, 1743–1750.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Robust filtering through coherent lower previsions.


Epistemic irrelevance in credal networks: the case of imprecise Markov trees

Authors: Gert de Cooman, Filip Hermans, Alessandro Antonucci and Marco Zaffalon.
Year: 2009.

Abstract: We replace strong independence in credal networks with the weaker notion of epistemic irrelevance. Focusing on directed trees, we show how to combine local credal sets into a global model, and we use this to construct and justify an exact message-passing algorithm that computes updated beliefs for a variable in the tree. The algorithm, which is essentially linear in the number of nodes, is formulated entirely in terms of coherent lower previsions. We supply examples of the algorithm's operation, and report an application to on-line character recognition that illustrates the advantages of our model for prediction.

Published in Augustin, T., Coolen, F., Moral, S., Troffaes, M. C. M. (Eds), ISIPTA '09: Proceedings of the Sixth International Symposium on Imprecise Probability: Theories and Applications. SIPTA, 149–158.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Epistemic irrelevance in credal nets: the case of imprecise Markov trees.


Natural extension as a limit of regular extensions

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2009.

Abstract: This paper is devoted to the extension of conditional assessments that satisfy some consistency criteria, such as weak or strong coherence, to further domains. In particular, we characterise the natural extension of a number of conditional lower previsions on finite spaces, by showing that it can be calculated as the limit of a sequence of conditional lower previsions defined by regular extension. Our results are valid for conditional lower previsions with non-linear domains, and allow us to give an equivalent formulation of the notion of coherence in terms of credal sets.

Published in Augustin, T., Coolen, F., Moral, S., Troffaes, M. C. M. (Eds), ISIPTA '09: Proceedings of the Sixth International Symposium on Imprecise Probability: Theories and Applications. SIPTA, 327–336.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Conditional models: coherence and inference through sequences of joint mass functions.


The pari-mutuel model

Authors: Renato Pelessoni, Paolo Vicig and Marco Zaffalon.
Year: 2009.

Abstract: We explore generalizations of the pari-mutuel model (PMM), a formalization of an intuitive way of assessing an upper probability from a precise one. We discuss a naive extension of the PMM considered in insurance and generalize the natural extension of the PMM introduced by P. Walley and other related formulae. The results are subsequently given a risk measurement interpretation: in particular it is shown that a known risk measure, Tail Value at Risk (TVaR), is derived from the PMM, and a coherent risk measure more general than TVaR from its imprecise version. We analyze further the conditions for coherence of a related risk measure, Conditional Tail Expectation. Explicit formulae for conditioning the PMM and conditions for dilation or imprecision increase are also supplied and discussed.

Published in Augustin, T., Coolen, F., Moral, S., Troffaes, M. C. M. (Eds), ISIPTA '09: Proceedings of the Sixth International Symposium on Imprecise Probability: Theories and Applications. SIPTA, 347–356.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Inference and risk measurement with the pari-mutuel model.


Conservative inference rule for uncertain reasoning under incompleteness

Authors: Marco Zaffalon and Enrique Miranda.
Year: 2009.

Abstract: In this paper we formulate the problem of inference under incomplete information in very general terms. This includes modelling the process responsible for the incompleteness, which we call the incompleteness process. We allow the process' behaviour to be partly unknown. Then we use Walley's theory of coherent lower previsions, a generalisation of the Bayesian theory to imprecision, to derive the rule to update beliefs under incompleteness that logically follows from our assumptions, and that we call conservative inference rule. This rule has some remarkable properties: it is an abstract rule to update beliefs that can be applied in any situation or domain; it gives us the opportunity to be neither too optimistic nor too pessimistic about the incompleteness process, which is a necessary condition to draw conclusions that are reliable and yet strong enough; and it is a coherent rule, in the sense that it cannot lead to inconsistencies. We give examples to show how the new rule can be applied in expert systems, in parametric statistical inference, and in pattern classification, and discuss more generally the view of incompleteness processes defended here as well as some of its consequences.

Published in Journal of Artificial Intelligence Research 34, 757–821.

The published paper can be downloaded.


Coherence graphs

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2009.

Abstract: We study the consistency of a number of probability distributions, which are allowed to be imprecise. To make the treatment as general as possible, we represent those probabilistic assessments as a collection of conditional lower previsions. The problem then becomes proving Walley's (strong) coherence of the assessments. In order to maintain generality in the analysis, we assume to be given nearly no information about the numbers that make up the lower previsions in the collection. Under this condition, we investigate the extent to which the above global task can be decomposed into simpler and more local ones. This is done by introducing a graphical representation of the conditional lower previsions that we call the coherence graph: we show that the coherence graph allows one to isolate some subsets of the collection whose coherence is sufficient for the coherence of all the assessments; and we provide a polynomial-time algorithm that finds the subsets efficiently. We show some of the implications of our results by focusing on three models and problems: Bayesian and credal networks, of which we prove coherence; the compatibility problem, for which we provide an optimal graphical decomposition; probabilistic satisfiability, of which we show that some intractable instances can instead be solved efficiently by exploiting coherence graphs.

Published in Artificial Intelligence 173, 104–144.

A version similar to the published paper can be downloaded.


JNCC2, the implementation of naive credal classifier 2

Authors: Giorgio Corani and Marco Zaffalon.
Year: 2008.

Abstract: JNCC2 implements the naive credal classifier 2, which is an extension of naive Bayes to imprecise probabilities. JNCC2 makes it possible to efficiently infer the classifier from data, to evaluate its performance, and to make it predict the classes of new instances. Moreover, the software is designed to cross-compare the performance of the classifier with that of naive Bayes as a way to verify the robustness of naive credal classifier 2 to the challenges raised by small sample sizes and missing data. JNCC2 is released under the GNU GPL license.

Published in Journal of Machine Learning Research 9, 2695–2698.

The published paper can be downloaded.

The software that implements the naive credal classifier 2 is freely available at the JNCC2 page.


Credal networks for military identification problems

Authors: Alessandro Antonucci, Ralph Brühlmann, Alberto Piatti and Marco Zaffalon.
Year: 2009.

Abstract: Credal networks are imprecise probabilistic graphical models generalizing Bayesian networks to convex sets of probability mass functions. This makes credal networks particularly suited to model expert knowledge under very general conditions, including states of qualitative and incomplete knowledge. In this paper, we present a credal network for risk evaluation in case of intrusion of civil aircraft into a restricted flight area. The different factors relevant for this evaluation, together with an independence structure over them, are initially identified. These factors are observed by sensors, whose reliabilities can be affected by variable external factors, and even by the behaviour of the intruder. A model of these observation processes, and the necessary fusion scheme for the information returned by the sensors measuring the same factor, are both completely embedded into the structure of the credal network. A pool of experts, facilitated in their task by specific techniques to convert qualitative judgements into imprecise probabilistic assessments, has made the quantification of the network possible. We show the capabilities of the proposed model by means of some preliminary tests on simulated scenarios. Overall, we can regard this application as a useful tool to support military experts in their decisions, but also as a quite general imprecise-probability paradigm for information fusion.

Published in International Journal of Approximate Reasoning 50, 666–679.

A version similar to the published paper can be downloaded.


Limits of learning about a categorical latent variable under prior near-ignorance

Authors: Alberto Piatti, Marco Zaffalon, Fabio Trojani and Marcus Hutter.
Year: 2009.

Abstract: In this paper, we consider the coherent theory of (epistemic) uncertainty of Walley, in which beliefs are represented through sets of probability distributions, and we focus on the problem of modeling prior ignorance about a categorical random variable. In this setting, it is a known result that a state of prior ignorance is not compatible with learning. To overcome this problem, another state of beliefs, called near-ignorance, has been proposed. Near-ignorance resembles ignorance very closely, by satisfying some principles that can arguably be regarded as necessary in a state of ignorance, and allows learning to take place. What this paper does is provide new and substantial evidence that near-ignorance, too, cannot really be regarded as a way out of the problem of starting statistical inference in conditions of very weak beliefs. The key to this result is focusing on a setting characterized by a variable of interest that is latent. We argue that such a setting is by far the most common case in practice, and we provide, for the case of categorical latent variables (and general manifest variables), a condition that, if satisfied, prevents learning from taking place under prior near-ignorance. This condition is shown to be easily satisfied even in the most common statistical problems. We regard these results as a strong form of evidence against the possibility of adopting a condition of prior near-ignorance in real statistical problems.

Published in International Journal of Approximate Reasoning 50, 597–611.

A version similar to the published paper can be downloaded.


Decision-theoretic specification of credal networks: a unified language for uncertain modeling with sets of Bayesian networks

Authors: Alessandro Antonucci and Marco Zaffalon.
Year: 2008.

Abstract: Credal networks are models that extend Bayesian nets to deal with imprecision in probability, and can actually be regarded as sets of Bayesian nets. Credal nets appear to be powerful means to represent and deal with many important and challenging problems in uncertain reasoning. We give examples to show that some of these problems can only be modeled by credal nets called non-separately specified. These, however, are still missing a graphical representation language and updating algorithms. The situation is quite the opposite with separately specified credal nets, which have been the subject of much study and algorithmic development. This paper gives two major contributions. First, it delivers a new graphical language to formulate any type of credal network, both separately and non-separately specified. Second, it shows that any non-separately specified net represented with the new language can be easily transformed into an equivalent separately specified net, defined over a larger domain. This result opens up a number of new outlooks and concrete outcomes: first of all, it immediately enables the existing algorithms for separately specified credal nets to be applied to non-separately specified ones. We explore this possibility for the 2U algorithm: an algorithm for exact updating of singly connected credal nets, which is extended by our results to a class of non-separately specified models. We also consider the problem of inference on Bayesian networks, when the reason that prevents some of the variables from being observed is unknown. The problem is first reformulated in the new graphical language, and then mapped into an equivalent problem on a separately specified net. This provides a first algorithmic approach to this kind of inference, which is also proved to be NP-hard by similar transformations based on our formalism.

Published in International Journal of Approximate Reasoning 49(2), 345–361.

A version similar to the published paper can be downloaded.


Generalized loopy 2U: a new algorithm for approximate inference in credal networks

Authors: Alessandro Antonucci, Marco Zaffalon, Yi Sun and Cassio Polpo de Campos.
Year: 2008.

Abstract: Credal nets generalize Bayesian nets by relaxing the requirement of precision of probabilities. Credal nets are considerably more expressive than Bayesian nets, but this makes belief updating NP-hard even on polytrees. We develop a new efficient algorithm for approximate belief updating in credal nets. The algorithm is based on an important representation result we prove for general credal nets: that any credal net can be equivalently reformulated as a credal net with binary variables; moreover, the transformation, which is considerably more complex than in the Bayesian case, can be implemented in polynomial time. The equivalent binary credal net is updated by L2U, a loopy approximate algorithm for binary credal nets. Thus, we generalize L2U to non-binary credal nets, obtaining an accurate and scalable algorithm for the general case, which is approximate only because of its loopy nature. The accuracy of the inferences is evaluated by empirical tests.

Published in Jaeger, M., Nielsen, T. D. (Eds), PGM'08: Proceedings of the Fourth European Workshop on Probabilistic Graphical Models. Hirtshals (Denmark), 17–24.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Generalized loopy 2U: a new algorithm for approximate inference in credal networks.

The software that implements the generalized loopy 2U is freely available at the GL2U page.


Credal model averaging: an extension of Bayesian model averaging to imprecise probabilities

Authors: Giorgio Corani and Marco Zaffalon.
Year: 2008.

Abstract: We deal with the arbitrariness in the choice of the prior over the models in Bayesian model averaging (BMA), by modelling prior knowledge by a set of priors (i.e., a prior credal set). We consider Dash and Cooper's BMA applied to naive Bayesian networks, replacing the single prior over the naive models by a credal set; this models a condition close to prior ignorance about the models, which leads to credal model averaging (CMA). CMA returns an indeterminate classification, i.e., multiple classes, on the instances for which the learning set is not informative enough to smooth the effect of the choice of the prior. We give an algorithm to compute exact credal model averaging for naive networks. Extensive experiments show that indeterminate classifications preserve the reliability of CMA on the instances which are classified in a prior-dependent way by BMA.

Published in Daelemans, W., Goethals, B., Morik, K. (Eds), Proceedings of ECML PKDD 2008: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Lecture Notes in Computer Science 5211, Springer, 257–271.

A version similar to the published paper can be downloaded.
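As an aside on how set-valued (indeterminate) classifications of this kind can be produced in practice: given lower and upper posterior probabilities for each class, one common criterion is interval dominance, which discards a class whenever some other class has a lower probability exceeding its upper probability. The sketch below is only illustrative (the function name is hypothetical, and CMA itself works with exact credal-set computations rather than this generic criterion):

```python
def credal_predict(intervals):
    """Return the set of undominated classes under interval dominance.

    intervals: dict mapping class label -> (lower, upper) posterior probability.
    A class c is dominated when another class has a lower probability
    strictly greater than c's upper probability.
    """
    undominated = set()
    for c, (_, up_c) in intervals.items():
        if all(low <= up_c for other, (low, _) in intervals.items() if other != c):
            undominated.add(c)
    return undominated
```

When the intervals overlap, several classes survive and the classification is indeterminate; when one class clearly dominates, a single class is returned, as in an ordinary classifier.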


Spatially distributed identification of debris flow source areas by credal networks

Authors: Andrea Salvetti, Alessandro Antonucci and Marco Zaffalon.
Year: 2008.

Abstract: Debris flows represent a very destructive natural hazard, affecting buildings and transport infrastructures and often causing human losses in mountain regions. This makes the identification of potential source areas of debris flows inside a watershed particularly important. In this paper we present a general identification procedure based on the credal network (an imprecise probabilistic graphical model generalizing Bayesian networks) originally introduced by [Antonucci et al., 2004]. That model is significantly improved by a more refined description of the meteorological and hydrological processes contributing to debris flow initiation. The price of this improvement is a slight increase in the computational time needed for identifications. This does not prevent its extensive, spatially distributed application to whole basins, thanks to a preliminary deterministic analysis that rejects local areas where the triggering of a debris flow cannot take place. The overall procedure is tested on a debris flow prone watershed in Southern Switzerland. The model detects the areas in the basin most prone to debris flow initiation and also shows that different rainfall return periods produce different patterns of hazard in the basin. The procedure thus makes it possible to determine the return period of the critical rainfall that triggers a debris flow, as a result of channel-bed failure, at a specific point along the drainage network.

Published in Sànchez-Marrè, M., Béjar, J., Comas, J., Rizzoli, A. E., Guariso, G. (Eds), iEMSs 2008: International Congress on Environmental Modelling and Software: Integrating Sciences and Information Technology for Environmental Assessment and Decision Making (Transactions of the 4th Biennial Meeting of the International Environmental Modelling and Software Society), iEMSs (Manno, Switzerland), 380–387.

A version similar to the published paper can be downloaded.


Learning reliable classifiers from small or incomplete data sets: the naive credal classifier 2

Authors: Giorgio Corani and Marco Zaffalon.
Year: 2008.

Abstract: In this paper, the naive credal classifier, which is a set-valued counterpart of naive Bayes, is extended to a general and flexible treatment of incomplete data, yielding a new classifier called naive credal classifier 2 (NCC2). The new classifier delivers classifications that are reliable even in the presence of small sample sizes and missing values. Extensive empirical evaluations show that, by issuing set-valued classifications, NCC2 is able to isolate and properly deal with instances that are hard to classify (on which naive Bayes accuracy drops considerably), and to perform as well as naive Bayes on the other instances. The experiments point to a general problem: they show that with missing values, empirical evaluations may not reliably estimate the accuracy of a traditional classifier, such as naive Bayes. This phenomenon adds even more value to the robust approach to classification implemented by NCC2.

Published in Journal of Machine Learning Research 9, 581–621.

The published paper can be downloaded.

The software that implements the naive credal classifier 2 is described in another paper published in Journal of Machine Learning Research: JNCC2, the implementation of naive credal classifier 2, and it is freely available at the JNCC2 page.


Naive credal classifier 2: an extension of naive Bayes for delivering robust classifications

Authors: Giorgio Corani and Marco Zaffalon.
Year: 2008.

Abstract: Naive credal classifier 2 (NCC2) extends naive Bayes in order to deliver more robust classifications. NCC2 is based on a set of prior densities rather than on a single prior; as a consequence, when faced with instances whose classification is prior-dependent (and therefore might not be reliable), it returns a set of classes (we call this an indeterminate classification) instead of a single class. Moreover, NCC2 introduces a very general and flexible treatment of missing data, which, under certain circumstances, can also lead to indeterminate classifications. In this case, indeterminacy can be regarded as a way to preserve reliability despite the information hidden by missing values. We call hard-to-classify the instances classified indeterminately by NCC2. Extensive empirical evaluations show that naive Bayes' accuracy drops considerably on the hard-to-classify instances identified by NCC2, and that on the other hand, NCC2 has high set-accuracy (the proportion of times that the actual class is contained in the set of returned classes) when it is indeterminate.

Published in Stahlbock, R., Crone, S. F., Lessmann, S. (Eds), Proceedings of the 4th International Conference on Data Mining: DMIN'08, CSREA Press, 84–90.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Learning reliable classifiers from small or incomplete data sets: the naive credal classifier 2.


Credal networks for operational risk measurement and management

Authors: Alessandro Antonucci, Alberto Piatti and Marco Zaffalon.
Year: 2007.

Abstract: According to widely accepted guidelines for self-regulation, the capital requirements of a bank should relate to the level of risk with respect to three different categories. Among them, operational risk is the most difficult to assess, as it requires merging expert judgments and quantitative information about the functional structure of the bank. A number of approaches to the evaluation of operational risk based on Bayesian networks have recently been considered. In this paper, we propose credal networks, which are a generalization of Bayesian networks to imprecise probabilities, as a more appropriate framework for the measurement and management of operational risk. The reason is the higher flexibility provided by credal networks compared to Bayesian networks in the quantification of the probabilities underlying the model: this makes it possible to represent the human expertise required for these evaluations in a credible and robust way. We use a real-world application to demonstrate these features and to show how to measure operational risk by means of algorithms for inference over credal nets. This is shown to be possible even when the observation of some factor is vague.

Published in Apolloni, B., Howlett, R. J., Jain, L. C. (Eds), Proceedings of the 11th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems: KES2007, Lecture Notes in Computer Science 4693, Springer, 604–611.

A version similar to the published paper can be downloaded.


Credal networks for military identification problems

Authors: Alessandro Antonucci, Ralph Brühlmann, Alberto Piatti and Marco Zaffalon.
Year: 2007.

Abstract: Credal networks are imprecise probabilistic graphical models generalizing Bayesian networks to convex sets of probability mass functions. This makes credal networks particularly suited to capture and model expert knowledge under very general conditions, including states of qualitative and incomplete knowledge. In this paper, we present a credal network for risk evaluation in case of intrusion of civil aircraft into a no-fly zone. The different factors relevant for this evaluation, together with an independence structure over them, are initially identified. These factors are observed by sensors, whose reliabilities can be affected by variable external factors, and even by the behavior of the intruder. A model of these observation mechanisms, and the necessary fusion scheme for the information returned by the sensors measuring the same factor, are both completely embedded into the structure of the credal network. A pool of experts, facilitated in their task by specific techniques to convert qualitative judgments into imprecise probabilistic assessments, has made the quantification of the network possible. We show the capabilities of the proposed network by means of some preliminary tests on simulated scenarios. Overall, we can regard this application as a useful tool to support military experts in their decisions, but also as a quite general imprecise-probability paradigm for information fusion.

Published in de Cooman, G., Vejnarová, J., Zaffalon, M. (Eds), ISIPTA '07: Proceedings of the Fifth International Symposium on Imprecise Probability: Theories and Applications, Action M Agency, Prague (Czech Republic), 1–10.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Credal networks for military identification problems.


Coherence graphs

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2007.

Abstract: We consider the task of proving Walley's (joint or strong) coherence of a number of probabilistic assessments, when these assessments are represented as a collection of conditional lower previsions. In order to maintain generality in the analysis, we assume to be given nearly no information about the numbers that make up the lower previsions in the collection. Under this condition, we investigate the extent to which the above global task can be decomposed into simpler and more local ones. This is done by introducing a graphical representation of the conditional lower previsions, that we call the coherence graph: we show that the coherence graph allows one to isolate some subsets of the collection whose coherence is sufficient for the coherence of all the assessments. The situation is shown to be completely analogous in the case of Walley's notion of weak coherence, for which we prove in addition that the subsets found are optimal, in the sense that they embody the maximal degree to which the task of checking weak coherence can be decomposed. In doing all of this, we obtain a number of related results: we give a new characterisation of weak coherence; we characterise, by means of a special kind of coherence graph, when the local notion of separate coherence is sufficient for coherence; and we provide an envelope theorem for collections of lower previsions whose graph is of the latter type.

Published in de Cooman, G., Vejnarová, J., Zaffalon, M. (Eds), ISIPTA '07: Proceedings of the Fifth International Symposium on Imprecise Probability: Theories and Applications, Action M Agency, Prague (Czech Republic), 297–306.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Coherence graphs.


Learning about a categorical latent variable under prior near-ignorance

Authors: Alberto Piatti, Marco Zaffalon, Fabio Trojani and Marcus Hutter.
Year: 2007.

Abstract: It is well known that complete prior ignorance is not compatible with learning, at least in a coherent theory of (epistemic) uncertainty. What is less widely known is that there is a state similar to full ignorance, which Walley calls near-ignorance, that does permit learning to take place. In this paper we provide new and substantial evidence that near-ignorance too cannot really be regarded as a way out of the problem of starting statistical inference in conditions of very weak beliefs. The key to this result is focusing on a setting characterized by a variable of interest that is latent. We argue that such a setting is by far the most common case in practice, and we show, for the case of categorical latent variables (and general manifest variables), that there is a sufficient condition that, if satisfied, prevents learning from taking place under prior near-ignorance. This condition is shown to be easily satisfied in the most common statistical problems.

Published in de Cooman, G., Vejnarová, J., Zaffalon, M. (Eds), ISIPTA '07: Proceedings of the Fifth International Symposium on Imprecise Probability: Theories and Applications, Action M Agency, Prague (Czech Republic), 357–364.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Limits of learning about a categorical latent variable under prior near-ignorance.


Notes on "Notes on conditional previsions"

Authors: Paolo Vicig, Marco Zaffalon and Fabio G. Cozman.
Year: 2007.

Abstract: These notes comment on Williams' fundamental essay Notes on Conditional Previsions, written as a research report in 1975 and published in the present issue. Basic aspects of that work are discussed, including historical background and relevance to the foundations of probability; examples are supplied to help understanding.

Published in International Journal of Approximate Reasoning 44(3), 358–365.

A version similar to the published paper can be downloaded.


Fast algorithms for robust classification with Bayesian nets

Authors: Alessandro Antonucci and Marco Zaffalon.
Year: 2007.

Abstract: We focus on a well-known classification task with expert systems based on Bayesian networks: predicting the state of a target variable given an incomplete observation of the other variables in the network, i.e., an observation of a subset of all the possible variables. To provide conclusions robust to near-ignorance about the process that prevents some of the variables from being observed, a new rule, called conservative updating, has recently been derived. In this paper we address the problem of efficiently computing the conservative updating rule for robust classification with Bayesian networks. We show first that the general problem is NP-hard, thus establishing a fundamental limit on the possibility of doing robust classification efficiently. Then we define a wide subclass of Bayesian networks that does admit efficient computation. We show this by developing a new classification algorithm for such a class, which substantially extends the limits of efficient computation with respect to the previously existing algorithm. The algorithm is formulated as a variable elimination procedure, whose computation time is linear in the input size.

Published in International Journal of Approximate Reasoning 44(3), 200–223.

A version similar to the published paper can be downloaded.


Locally specified credal networks

Authors: Alessandro Antonucci and Marco Zaffalon.
Year: 2006.

Abstract: Credal networks are models that extend Bayesian nets to deal with imprecision in probability, and can actually be regarded as sets of Bayesian nets. Evidence suggests that credal nets are a powerful means to represent and deal with many important and challenging problems in uncertain reasoning. We give examples to show that some of these problems can only be modelled by credal nets called non-separately specified. These, however, are still missing a graphical representation language and solution algorithms. The situation is quite the opposite with separately specified credal nets, which have been the subject of much study and algorithmic development. This paper gives two major contributions. First, it delivers a new graphical language to formulate any type of credal network, both separately and non-separately specified. Second, it shows that any non-separately specified net represented with the new language can be easily transformed into an equivalent separately specified net, defined over a larger domain. This result opens up a number of new perspectives and concrete outcomes: first of all, it immediately enables the existing algorithms for separately specified credal nets to be applied to non-separately specified ones.

Published in Studený, M., Vomlel, J. (Eds), PGM'06: Proceedings of the third European Workshop on Probabilistic Graphical Models, Action M Agency, Prague (Czech Republic), 25–34.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Decision-theoretic specification of credal networks: a unified language for uncertain modeling with sets of Bayesian networks.


Classification of dementia types from cognitive profiles data

Authors: Giorgio Corani, Chris Edgar, Isabelle Marshall, Keith Wesnes and Marco Zaffalon.
Year: 2006.

Abstract: The Cognitive Drug Research (CDR) system is specifically validated for dementia assessment; it consists of a series of computerized tests, which assess the cognitive faculties of the patient to derive a cognitive profile. We use six different classification algorithms to classify clinically diagnosed diseases from their cognitive profiles. Good accuracy was obtained in separating patients affected by Parkinson's disease from demented patients, and in discriminating between Alzheimer's disease and Vascular Dementia. However, in discriminating between Parkinson disease with dementia (PDD) and dementia with Lewy bodies (DLB), the accuracy was only slightly superior to chance; the existence of a significant difference in the cognitive profiles of DLB and PDD is indeed questioned in the medical literature.

Published in ECML 2006: Proceedings of the seventeenth European Conference on Machine Learning, Lecture Notes in Computer Science 4213, Springer, Berlin, 470–477.

A version similar to the published paper can be downloaded.


Binarization algorithms for approximate updating in credal nets

Authors: Alessandro Antonucci, Marco Zaffalon, Jaime S. Ide and Fabio G. Cozman.
Year: 2006.

Abstract: Credal networks generalize Bayesian networks by relaxing their numerical parameters. This considerably expands expressivity, but makes belief updating a hard task even on polytrees. Nevertheless, if all the variables are binary, polytree-shaped credal networks can be efficiently updated by the 2U algorithm. In this paper we present a binarization algorithm that makes it possible to approximate an updating problem in a credal net by a corresponding problem in a credal net over binary variables. The procedure leads to outer bounds for the original problem. The binarized nets are in general multiply connected, but can be updated by the loopy variant of 2U. The quality of the overall approximation is investigated in numerical experiments, with promising results.

Published in Penserini, L., Peppas, P., Perini, A. (Eds), STAIRS'06: Proceedings of the third European Starting AI Researcher Symposium. IOS Press, Amsterdam (Netherlands), 120–131.

A version similar to the published paper can be downloaded.
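The representational core of binarization is the replacement of a variable with d states by ceil(log2 d) binary variables that jointly encode its state. A minimal sketch of this encoding step, for illustration only (the actual algorithm must also rewrite the conditional credal sets over the new variables, which is where the outer-bounding approximation arises):

```python
import math

def binarize_state(state, n_states):
    """Encode a categorical state as a tuple of bits.

    A variable with n_states values is replaced by ceil(log2(n_states))
    binary variables; each original state maps to its binary expansion
    (least significant bit first).
    """
    n_bits = max(1, math.ceil(math.log2(n_states)))
    return tuple((state >> i) & 1 for i in range(n_bits))
```

For example, state 5 of a six-state variable becomes the three bits (1, 0, 1); when n_states is not a power of two, some joint bit configurations correspond to no original state and must be handled by the transformation.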


Equivalence between Bayesian and credal nets on an updating problem

Authors: Alessandro Antonucci and Marco Zaffalon.
Year: 2006.

Abstract: We establish an intimate connection between Bayesian and credal nets. Bayesian nets are precise graphical models, credal nets extend Bayesian nets to imprecise probability. We focus on traditional belief updating with credal nets, and on the kind of belief updating that arises with Bayesian nets when the reason for the missingness of some of the unobserved variables in the net is unknown. We show that the two updating problems are formally the same.

Published in Lawry, J., Miranda, E., Bugarin, A., Li, S., Gil, M. A., Grzegorzewski, P., Hryniewicz, O. (Eds), Soft Methods for Integrated Uncertainty Modelling, Springer (Proceedings of the third international conference on Soft Methods in Probability and Statistics: SMPS 2006), 223–230.

A version similar to the published paper can be downloaded.


Robust inference of trees

Authors: Marco Zaffalon and Marcus Hutter.
Year: 2005.

Abstract: This paper is concerned with the reliable inference of optimal tree-approximations to the dependency structure of an unknown distribution generating the data. The traditional approach to the problem measures the dependency strength between random variables by the index called mutual information. In this paper reliability is achieved by Walley's imprecise Dirichlet model, which generalizes Bayesian learning with Dirichlet priors. Adopting the imprecise Dirichlet model results in posterior interval expectation for mutual information, and in a set of plausible trees consistent with the data. Reliable inference about the actual tree is achieved by focusing on the substructure common to all the plausible trees. We develop an exact algorithm that infers the substructure in time $O(m^4)$, $m$ being the number of random variables. The new algorithm is applied to a set of data sampled from a known distribution. The method is shown to reliably infer edges of the actual tree even when the data are very scarce, unlike the traditional approach. Finally, we provide lower and upper credibility limits for mutual information under the imprecise Dirichlet model. These enable the previous developments to be extended to a full inferential method for trees.

Published in Annals of Mathematics and Artificial Intelligence 45(1–2), 215–239.

A version similar to the published paper can be downloaded.
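For context, the traditional (precise-probability) approach that the paper robustifies is the Chow–Liu procedure: estimate the mutual information of each pair of variables and take a maximum-weight spanning tree over those weights. A Kruskal-style sketch under that reading (the function name and the weights input are illustrative):

```python
def max_spanning_tree(weights, n):
    """Kruskal-style maximum spanning tree over mutual-information weights.

    weights: dict mapping frozenset({i, j}) -> estimated mutual information
    between variables i and j (0 <= i, j < n).
    Returns the edge set of the optimal tree approximation (Chow-Liu).
    """
    parent = list(range(n))

    def find(x):  # union-find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = set()
    # Greedily add the heaviest edges that do not close a cycle.
    for edge in sorted(weights, key=weights.get, reverse=True):
        i, j = tuple(edge)
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.add(edge)
    return edges
```

The paper's contribution replaces the point estimates of mutual information with posterior intervals under the imprecise Dirichlet model and keeps only the edges common to all plausible trees; that robust step is not reproduced here.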


Fast algorithms for robust classification with Bayesian nets

Authors: Alessandro Antonucci and Marco Zaffalon.
Year: 2005.

Abstract: We focus on a well-known classification task with expert systems based on Bayesian networks: predicting the state of a target variable given an incomplete observation of the other variables in the network, i.e., an observation of a subset of all the possible variables. To provide conclusions robust to near-ignorance about the process that prevents some of the variables from being observed, a new rule, called conservative updating, has recently been derived. In this paper we address the problem of efficiently computing the conservative updating rule for robust classification with Bayesian networks. We show first that the general problem is NP-hard, thus establishing a fundamental limit on the possibility of doing robust classification efficiently. Then we define a wide subclass of Bayesian networks that does admit efficient computation. We show this by developing a new classification algorithm for such a class, which substantially extends the limits of efficient computation with respect to the previously existing algorithm.

Published in Cozman, F. G., Nau, R., Seidenfeld, T. (Eds), ISIPTA '05: Proceedings of the Fourth International Symposium on Imprecise Probabilities and Their Applications. SIPTA, 11–20.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Fast algorithms for robust classification with Bayesian nets.


Limits of learning from imperfect observations under prior ignorance: the case of the imprecise Dirichlet model

Authors: Alberto Piatti, Marco Zaffalon and Fabio Trojani.
Year: 2005.

Abstract: Consider a relaxed multinomial setup, in which there may be mistakes in observing the outcomes of the process, as is often the case in real applications. What can we say about the next outcome if we start learning about the process in conditions of prior ignorance? To answer this question we extend the imprecise Dirichlet model to the case of imperfect observations and we focus on posterior predictive probabilities for the next outcome. The results are very surprising: the posterior predictive probabilities are vacuous, irrespective of the number of observations we make, and however small the probability of making mistakes is. In other words, the imprecise Dirichlet model cannot help us learn from data when the observational mechanism is imperfect. This result seems to raise a serious question about the use of the imprecise Dirichlet model for practical applications, and, more generally, about the possibility of learning from imperfect observations under prior ignorance.

Published in Cozman, F. G., Nau, R., Seidenfeld, T. (Eds), ISIPTA '05: Proceedings of the Fourth International Symposium on Imprecise Probabilities and Their Applications. SIPTA, 276–286.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Limits of learning about a categorical latent variable under prior near-ignorance.


Conservative rules for predictive inference with incomplete data

Author: Marco Zaffalon.
Year: 2005.

Abstract: This paper addresses the following question: how should we update our beliefs after observing some incomplete data, in order to make credible predictions about new, and possibly incomplete, data? There may be several answers to this question according to the model of the process that creates the incompleteness. This paper develops a rigorous modelling framework that makes clear the conditions justifying the different answers; and, on this basis, it derives a new conditioning rule for predictive inference to be used in a wide range of states of knowledge about the incompleteness process, including near-ignorance, which, surprisingly, does not seem to have received attention so far. Such a case is particularly important, as modelling incompleteness processes can be highly impractical, and because there are limitations to statistical inference with incomplete data: it is generally not possible to learn how incompleteness processes work from the available data; and it may not be possible, as the paper shows, to measure empirically the quality of the predictions. Yet, these depend heavily on the assumptions made.

Published in Cozman, F. G., Nau, R., Seidenfeld, T. (Eds), ISIPTA '05: Proceedings of the Fourth International Symposium on Imprecise Probabilities and Their Applications. SIPTA, 406–415.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Conservative inference rule for uncertain reasoning under incompleteness.


Updating beliefs with incomplete observations

Authors: Gert de Cooman and Marco Zaffalon.
Year: 2004.

Abstract: Currently, there is renewed interest in the problem, raised by Shafer in 1985, of updating probabilities when observations are incomplete (or set-valued). This is a fundamental problem in general, and of particular interest for Bayesian networks. Recently, Grünwald and Halpern have shown that commonly used updating strategies fail in this case, except under very special assumptions. In this paper we propose a new method for updating probabilities with incomplete observations. Our approach is deliberately conservative: we make no assumptions about the so-called incompleteness mechanism that associates complete with incomplete observations. We model our ignorance about this mechanism by a vacuous lower prevision, a tool from the theory of imprecise probabilities, and we use only coherence arguments to turn prior into posterior (updated) probabilities. In general, this new approach to updating produces lower and upper posterior probabilities and previsions (expectations), as well as partially determinate decisions. This is a logical consequence of the existing ignorance about the incompleteness mechanism. As an example, we use the new updating method to properly address the apparent paradox in the 'Monty Hall' puzzle. More importantly, we apply it to the problem of classification of new evidence in probabilistic expert systems, where it leads to a new, so-called conservative updating rule. In the special case of Bayesian networks constructed using expert knowledge, we provide an exact algorithm to compare classes based on our updating rule, which has linear-time complexity for a class of networks wider than polytrees. This result is then extended to the more general framework of credal networks, where computations are often much harder than with Bayesian nets. Using an example, we show that our rule appears to provide a solid basis for reliable updating with incomplete observations, when no strong assumptions about the incompleteness mechanism are justified.

Published in Artificial Intelligence 159(1–2), 75–125.

A version similar to the published paper can be downloaded.

This paper has been reviewed by M. F. McNally in Computing Reviews 46(12), 2005.


Assessing debris flow hazard by credal nets

Authors: Alessandro Antonucci, Andrea Salvetti and Marco Zaffalon.
Year: 2004.

Published in López-Díaz, M., Gil, M. A., Grzegorzewski, P., Hryniewicz, O., Lawry, J. (Eds), Soft Methodology and Random Information Systems, Springer (Proceedings of the Second International Conference on Soft Methods in Probability and Statistics: SMPS 2004), 125–132.

A version similar to the published paper can be downloaded.


Hazard assessment of debris flows by credal networks

Authors: Alessandro Antonucci, Andrea Salvetti and Marco Zaffalon.
Year: 2004.

Abstract: Debris flows are destructive natural hazards that affect human life, buildings, and infrastructures. Despite their importance, debris flows are only partially understood, and human expertise still plays a key role for hazard identification. This paper proposes filling the modelling gap by using credal networks, an imprecise-probability model. The model uses a directed graph to capture the causal relationships between the triggering factors of debris flows. Quantitative influences are represented by probability intervals, determined from historical data, expert knowledge, and theoretical models. Most importantly, the model joins the empirical and the quantitative modelling levels, in the direction of more credible inferences. The model is evaluated on real case studies related to dangerous areas of the Ticino Canton, southern Switzerland. The case studies highlight the good capabilities of the model: for all the areas the model produces significant probabilities of hazard.

Published in Pahl-Wostl, C., Schmidt, S., Rizzoli, A. E., Jakeman, A. J. (Eds), iEMSs 2004: Complexity and Integrated Resources Management, Transactions of the 2nd Biennial Meeting of the International Environmental Modelling and Software Society, iEMSs (Manno, Switzerland), 98–103.

A version similar to the published paper can be downloaded.


Credal networks for hazard assessment of debris flows

Authors: Alessandro Antonucci, Andrea Salvetti and Marco Zaffalon.
Year: 2007.

Published in Kropp, J., Scheffran, J. (Eds), Advanced Methods for Decision Making and Risk Management in Sustainability Science, Nova Science Publishers, New York, 237–256.

A version similar to the published paper can be downloaded.


Distribution of mutual information from complete and incomplete data

Authors: Marcus Hutter and Marco Zaffalon.
Year: 2005.

Abstract: Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sample-to-population inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived. These approximations have a guaranteed accuracy level of the order O(n^{-3}), where n is the sample size. Leading order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information become a concrete alternative to descriptive mutual information in many applications which would benefit from moving to the inductive side. We discuss some of these prospective applications, and for one of them, namely feature selection, we show how inductive mutual information leads to significant improvements in the inferred models.

Published in Computational Statistics and Data Analysis 48(3), 633–657.

A version similar to the published paper can be downloaded.
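The descriptive quantity the abstract contrasts with is plain empirical mutual information, the maximum-likelihood plug-in estimate computed from joint counts. A self-contained sketch of that baseline (the paper's actual contribution, the posterior distribution of this quantity under a second-order Dirichlet prior, is not reproduced here):

```python
import math

def mutual_information(counts):
    """Descriptive mutual information (in nats) from a contingency table.

    counts: 2-D list of joint counts n_ij for a pair of categorical variables.
    Uses the empirical (maximum-likelihood) plug-in estimate
    sum_ij p_ij * log(p_ij / (p_i * p_j)) with p_ij = n_ij / n.
    """
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # 0 * log 0 is taken as 0
                mi += (n_ij / n) * math.log(n_ij * n / (row_tot[i] * col_tot[j]))
    return mi
```

Independent variables give a value near zero and deterministic dependence gives log of the number of states; the sampling variability of this estimate is exactly what the inferential treatment in the paper quantifies.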


Credible classification for environmental problems

Author: Marco Zaffalon.
Year: 2005.

Abstract: Classifiers that aim at doing reliable predictions should rely on carefully elicited prior knowledge. Often this is not available, so they should be able to start learning from data in a condition of prior ignorance. This paper shows empirically, on an agricultural data set, that established methods of classification do not always adhere to this principle. Common ways to represent prior ignorance are shown to have an overwhelming weight compared to the information in the data, producing overconfident predictions. This point is crucial for problems, such as environmental ones, where prior knowledge is often scarce and even the data may not be known precisely. Credal classification, and in particular the naive credal classifier, is proposed as a more faithful way to represent states of ignorance. We show that with credal classification, conditions of ignorance may limit the power of the inferences, not the reliability of the predictions.

Published in Environmental Modelling & Software 20(8), 1003–1012.

A version similar to the published paper can be downloaded.


Bayesian treatment of incomplete discrete data applied to mutual information and feature selection

Authors: Marcus Hutter and Marco Zaffalon.
Year: 2003.

Abstract: Given the joint chances of a pair of random variables one can compute quantities of interest, like the mutual information. The Bayesian treatment of unknown chances involves computing, from a second order prior distribution and the data likelihood, a posterior distribution of the chances. A common treatment of incomplete data is to assume ignorability and determine the chances by the expectation maximization (EM) algorithm. The two different methods above are well established but typically separated. This paper joins the two approaches in the case of Dirichlet priors, and derives efficient approximations for the mean, mode and the (co)variance of the chances and the mutual information. Furthermore, we prove the unimodality of the posterior distribution, whence the important property of convergence of EM to the global maximum in the chosen framework. These results are applied to the problem of selecting features for incremental learning and naive Bayes classification. A fast filter based on the distribution of mutual information is shown to outperform the traditional filter based on empirical mutual information on a number of incomplete real data sets.

Published in Günter, A., Kruse, R., Neumann, B. (Eds), KI-2003: Proceedings of the 26th German Conference on Artificial Intelligence, Lecture Notes in Computer Science 2821, Springer, Heidelberg, 396–406.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Distribution of mutual information from complete and incomplete data.

Updating with incomplete observations

Authors: Gert de Cooman and Marco Zaffalon.
Year: 2003.

Abstract: Currently, there is renewed interest in the problem, raised by Shafer in 1985, of updating probabilities when observations are incomplete (or set-valued). This is a fundamental problem, and of particular interest for Bayesian networks. Recently, Grünwald and Halpern have shown that commonly used updating strategies fail here, except under very special assumptions. We propose a new rule for updating probabilities with incomplete observations. Our approach is deliberately conservative: we make no or weak assumptions about the so-called incompleteness mechanism that produces incomplete observations. We model our ignorance about this mechanism by a vacuous lower prevision, a tool from the theory of imprecise probabilities, and we derive a new updating rule using coherence arguments. In general, our rule produces lower posterior probabilities, as well as partially determinate decisions. This is a logical consequence of the ignorance about the incompleteness mechanism. We show how the new rule can properly address the apparent paradox in the `Monty Hall' puzzle. In addition, we apply it to the classification of new evidence in Bayesian networks constructed using expert knowledge. We provide an exact algorithm for this task with linear-time complexity, also for multiply connected nets.

Published in Kjærulff, U., Meek, C. (Eds), UAI-2003: Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, San Francisco, 142–150.

A version similar to the published paper can be downloaded.

Powerpoint slides of presentation are also available.

A revised and updated version of this conference paper is Updating beliefs with incomplete observations.
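The Monty Hall analysis mentioned in the abstract can be reproduced numerically. The sketch below is only an illustration, not the paper's general updating rule: it treats the host's protocol as the unknown incompleteness mechanism and ranges over all of its values, which yields lower and upper posteriors instead of the single textbook answer:

```python
def posterior_switch_wins(q):
    """P(car behind door 2 | we chose door 1 and the host opened door 3),
    where q = P(host opens door 3 | car behind door 1) is the unknown
    part of the host's protocol."""
    like_car2 = 1.0  # car behind door 2: the host is forced to open door 3
    like_car1 = q    # car behind our own door: the host chooses freely
    return (like_car2 / 3) / (like_car2 / 3 + like_car1 / 3)

# Range over all protocols q in [0, 1] to get lower/upper posteriors.
posts = [posterior_switch_wins(q / 100) for q in range(101)]
print(min(posts), max(posts))  # 0.5 1.0
```

The vacuous interval [1/2, 1] for winning by switching reflects genuine ignorance about the mechanism: switching is never worse, but its exact probability is undetermined.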

Tree-based credal networks for classification

Authors: Marco Zaffalon and Enrico Fagiuoli.
Year: 2003.

Abstract: Bayesian networks are models for uncertain reasoning successfully applied to the data-mining task of classification. Credal networks extend Bayesian nets to sets of distributions, or credal sets. This paper extends a state-of-the-art Bayesian net for classification, called the tree-augmented naive Bayes classifier, to credal sets originated from probability intervals. We formalize the new model, develop an exact linear-time classification algorithm, and evaluate the credal net-based classifier on a number of real data sets. The empirical analysis shows that the new classifier is good and reliable, and raises a problem of excessive caution that is discussed in the paper. Overall, given the favorable trade-off between expressiveness and efficient computation, the newly proposed classifier appears to be a good candidate for the wide-scale application of credal networks to real and complex tasks.

Published in Reliable Computing 9(6), 487–509.

A version similar to the published paper can be downloaded.


Reliable diagnoses of dementia by the naive credal classifier inferred from incomplete cognitive data

Authors: Marco Zaffalon, Keith Wesnes and Orlando Petrini.
Year: 2003.

Abstract: Dementia is a serious personal, medical and social problem. Recent research indicates early and accurate diagnoses as the key to effectively cope with it. No definitive cure is available but in some cases when the impairment is still mild the disease can be contained. This paper describes a diagnostic tool that jointly uses the naive credal classifier and the most widely used computerized system of cognitive tests in dementia research, the Cognitive Drug Research system. The naive credal classifier extends the discrete naive Bayes classifier to imprecise probabilities. The naive credal classifier models both prior ignorance and ignorance about the likelihood by sets of probability distributions. This is a new way to deal with small or incomplete data sets that departs significantly from most established classification methods. In the empirical study presented here the naive credal classifier provides reliability and unmatched predictive performance. It delivers up to 95% correct predictions while being very robust with respect to the partial ignorance due to the largely incomplete data. The diagnostic tool also proves to be very effective in discriminating between Alzheimer's disease and Dementia with Lewy Bodies.

Published in Artificial Intelligence in Medicine 29(1–2), 61–79.

A version similar to the published paper can be downloaded.


Robust feature selection by mutual information distributions

Authors: Marco Zaffalon and Marcus Hutter.
Year: 2002.

Abstract: Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. In order to address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported. Asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a theoretical development is reported that allows one to efficiently extend the above methods to incomplete samples in an easy and effective way.

Published in Darwiche, A., Friedman, N. (Eds), UAI-2002: Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, San Francisco, 577–584.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Distribution of mutual information from complete and incomplete data.


Exact credal treatment of missing data

Author: Marco Zaffalon.
Year: 2002.

Abstract: This paper proposes an exact, no-assumptions approach to dealing with incomplete sets of multivariate categorical data. An incomplete data set is regarded as a finite collection of complete data sets, and a joint distribution is obtained from each of them, at a descriptive level. The tools to simultaneously treat all the possible joint distributions compatible with an incomplete set of data are given. In particular, a linear description of the set of distributions is formulated, and it is shown that the computation of bounds on the expectation of real-valued functions under such distributions is both possible and efficient, by means of linear programming. Specific algorithms are also developed whose complexity grows linearly in the number of observations. An analysis is then carried out to estimate population probabilities from incomplete multinomial samples. The descriptive tool extends in a straightforward way to the inferential problem by exploiting Walley's imprecise Dirichlet model.

Published in Journal of Statistical Planning and Inference 105(1), 105–122.

A version similar to the published paper can be downloaded.
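The paper's starting point, an incomplete data set regarded as a finite collection of complete data sets, can be made concrete by brute force. The sketch below enumerates all completions of a small binary sample and bounds a relative frequency; it is only an illustration, since the paper replaces this exponential enumeration with efficient linear programming:

```python
from itertools import product

def frequency_bounds(sample, value):
    """Lower/upper relative frequency of `value` over all completions of a
    binary sample with missing entries (None): the incomplete data set is
    treated as the collection of its complete completions."""
    missing = [i for i, x in enumerate(sample) if x is None]
    freqs = []
    for fill in product([0, 1], repeat=len(missing)):
        completed = list(sample)
        for i, v in zip(missing, fill):
            completed[i] = v
        freqs.append(completed.count(value) / len(completed))
    return min(freqs), max(freqs)

# Two of five observations missing: the frequency of 1 is only known
# to lie in an interval.
print(frequency_bounds([1, 0, None, 1, None], 1))  # (0.4, 0.8)
```

No missingness assumption is made: every way the gaps could be filled is taken as possible, which is exactly what produces interval-valued conclusions.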


The naive credal classifier

Author: Marco Zaffalon.
Year: 2002.

Abstract: Convex sets of probability distributions are also called credal sets. They generalize probability theory by relaxing the requirement that probability values be precise. Classification, i.e. assigning class labels to instances described by a set of attributes, is an important domain of application of Bayesian methods, where the naive Bayes classifier has a surprisingly good performance. This paper proposes a new method of classification which involves extending the naive Bayes classifier to credal sets. Exact and effective solution procedures for naive credal classification are derived, and the related dominance criteria are discussed. Credal classification appears as a new method, based on more realistic assumptions and in the direction of more reliable inferences.

Published in Journal of Statistical Planning and Inference 105(1), 5–21.

A version similar to the published paper can be downloaded.
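As a toy illustration of how credal sets change classification (using interval dominance, a simpler and more conservative test than the exact credal-dominance criterion derived in the paper, and made-up counts), classes can be compared through probability intervals from Walley's imprecise Dirichlet model:

```python
def idm_interval(count, total, s=2.0):
    """Lower/upper probability for an event seen `count` times in
    `total` trials, under Walley's imprecise Dirichlet model (IDM)."""
    return count / (total + s), (count + s) / (total + s)

# Hypothetical one-attribute problem: 15 instances, 10 of class a
# (8 with the attribute present) and 5 of class b (1 with it present).
pa_low, _ = idm_interval(10, 15)   # lower prior of class a
_, pb_up = idm_interval(5, 15)    # upper prior of class b
la_low, _ = idm_interval(8, 10)    # lower likelihood under class a
_, lb_up = idm_interval(1, 5)     # upper likelihood under class b

# Interval dominance: a is preferred to b if even the most pessimistic
# score for a beats the most optimistic score for b.
print(pa_low * la_low > pb_up * lb_up)  # True
```

When no class dominates, a credal classifier returns the whole set of undominated classes rather than forcing a single, unreliable answer.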


Credal classification for dementia screening

Authors: Marco Zaffalon, Keith Wesnes and Orlando Petrini.
Year: 2001.

Abstract: Dementia is a very serious personal, medical and social problem. Early and accurate diagnoses seem to be the key to effectively cope with it. This paper presents a diagnostic tool that couples the most widely used computerized system of cognitive tests in dementia research, the Cognitive Drug Research system, with the naive credal classifier. Although the classifier is trained on an incomplete database, it provides unmatched predictive performance and reliability. The tool also proves to be very effective in discriminating between Alzheimer's disease and dementia with Lewy bodies, which is a problem on the frontier of research on dementia.

Published in Quaglini, S., Barahona, P., Andreassen, S. (Eds), AIME '01: Proceedings of the Eighth European Conference on Artificial Intelligence in Medicine, Lecture Notes in Computer Science 2101, Springer, 67–76.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Reliable diagnoses of dementia by the naive credal classifier inferred from incomplete cognitive data.


Robust discovery of tree-dependency structures

Author: Marco Zaffalon.
Year: 2001.

Abstract: The problem of inferring dependency structures from random samples is a fundamental topic in artificial intelligence and statistics. This paper reviews an early result of Chow and Liu on the approximation of unknown multinomial distributions by tree-dependency distributions, in the light of imprecise probabilities. Imprecision, arising here from Walley's imprecise Dirichlet model, generally makes many tree structures plausible given the data. This paper focuses on the inference of the substructure common to all the possible trees. Such a common pattern is a set of reliable dependencies. The problem of identifying the common pattern is abstracted and solved here in the general context of graph algorithms. On this basis, an algorithm is developed that infers reliable dependencies from a set of k binary random variables in time O(k^3); the inferred dependencies converge to a tree as the sample grows. The algorithm works by computing bounds on the solutions of global optimization problems. There are a number of reasons why trees are a very important special case of dependence graphs. This work appears as a significant step in the direction of discovering dependency structures under the realistic assumption of imprecise knowledge.

Published in de Cooman, G., Fine, T., Seidenfeld, T. (Eds), ISIPTA '01: Proceedings of the Second International Symposium on Imprecise Probabilities and Their Applications, Shaker Publishing, The Netherlands, 394–403.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Robust inference of trees.
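The Chow and Liu step that the paper builds on reduces to a maximum spanning tree over pairwise mutual-information weights. A minimal sketch of that precise-probability step, with hypothetical weights (the paper's contribution, finding the edges common to all trees when the weights are interval-valued, is not shown here):

```python
def max_spanning_tree(n, weights):
    """Kruskal's algorithm on (weight, i, j) edges: the Chow-Liu tree is
    the maximum spanning tree under pairwise mutual-information weights."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _, i, j in sorted(weights, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:  # keep the edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Hypothetical pairwise mutual-information estimates for four variables.
weights = [(0.9, 0, 1), (0.8, 1, 2), (0.7, 0, 2), (0.3, 2, 3)]
print(max_spanning_tree(4, weights))  # [(0, 1), (1, 2), (2, 3)]
```

Under imprecision, several edge rankings may be compatible with the data, so several such trees become plausible at once.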


Statistical inference of the naive credal classifier

Author: Marco Zaffalon.
Year: 2001.

Abstract: In the wish list of the characteristics of a classifier, there are a reliable approach to small data sets and a clear and robust treatment of incomplete samples. This paper copes with such difficult problems by adopting the paradigm of credal classification. By exploiting Walley's imprecise Dirichlet model, it defines how to infer the naive credal classifier from a possibly incomplete multinomial sample. The derived procedure is exact and linear in the number of attributes. The obtained classifier is robust to small data sets and to all the possible missingness mechanisms. The results of some experimental analyses that compare the naive credal classifier with naive Bayesian models support the presented approach.

Published in de Cooman, G., Fine, T., Seidenfeld, T. (Eds), ISIPTA '01: Proceedings of the Second International Symposium on Imprecise Probabilities and Their Applications, Shaker Publishing, The Netherlands, 384–393.

A version similar to the published paper can be downloaded.


An optimization methodology for intermodal terminal management

Authors: Luca Maria Gambardella, Andrea E. Rizzoli, Monaldo Mastrolilli and Marco Zaffalon.
Year: 2001.

Abstract: A solution to the problems of resource allocation and scheduling of loading and unloading operations in a container terminal is presented. The two problems are formulated and solved hierarchically. First, the solution of the resource allocation problem returns, over a number of work shifts, the set of quay cranes used to load and unload containers from the moored ships and the set of yard cranes to store those containers on the yard. Then, a scheduling problem is formulated to compute the loading and unloading lists of containers for each allocated crane. The feasibility of the solution is verified against a detailed, discrete-event based simulation model of the terminal. The simulation results show that the optimized resource allocation, which reduces costs by 40%, can be effectively adopted in combination with the optimized loading and unloading lists. Moreover, the simulation shows that the optimized lists reduce the number of crane conflicts on the yard and the average length of the truck queues in the terminal.

Published in Journal of Intelligent Manufacturing 12, 521–534.

A version similar to the published paper can be downloaded.


Tree-augmented naive credal classifiers

Authors: Enrico Fagiuoli and Marco Zaffalon.
Year: 2000.

Abstract: Credal classification is a generalization of common classification which allows a coherent treatment of imprecision to be realized. This paper defines a credal classifier by extending the discrete tree-augmented naive Bayes classifiers to sets of probability distributions. A solution algorithm is developed whose computational complexity is linear in the number of features. A method to induce the credal classifier from data is proposed.

Published in IPMU 2000: Proceedings of the 8th Information Processing and Management of Uncertainty in Knowledge-Based Systems Conference, Universidad Politécnica de Madrid, Spain, 1320–1327.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is Tree-based credal networks for classification.


A credal approach to naive classification

Author: Marco Zaffalon.
Year: 1999.

Abstract: Convex sets of probability distributions are also called credal sets. They generalize probability theory with special regard to the relaxation of the precision requirement about the probability values. Classification, i.e., assigning class labels to instances described by a set of attributes, is a typical domain of application of Bayesian methods, where the naive Bayesian classifier is considered among the best tools. This paper explores the classification model obtained when the naive Bayesian classifier is extended to credal sets. Exact and effective solution procedures for classification are derived, and the related dominance criteria are discussed. A methodology to induce the classifier from data is proposed.

Published in de Cooman, G., Cozman, F. G., Moral, S., Walley, P. (Eds), ISIPTA '99: Proceedings of the First International Symposium on Imprecise Probabilities and Their Applications, The Imprecise Probabilities Project, Universiteit Gent, Belgium, 405–414.

A version similar to the published paper can be downloaded.

A revised and updated version of this conference paper is The naive credal classifier.


2U: an exact interval propagation algorithm for polytrees with binary variables

Authors: Enrico Fagiuoli and Marco Zaffalon.
Year: 1998.

Abstract: This paper addresses the problem of computing posterior probabilities in a discrete Bayesian network where the conditional distributions of the model belong to convex sets. The computation on a general Bayesian network with convex sets of conditional distributions is formalized as a global optimization problem. It is shown that such a problem can be reduced to a combinatorial problem, suitable for exact algorithmic solutions. An exact propagation algorithm for the updating of a polytree with binary variables is derived. The overall complexity is linear in the size of the network when the maximum number of parents is fixed.

Published in Artificial Intelligence 106(1), 77–107.

A version similar to the published paper can be downloaded.
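The combinatorial reduction mentioned in the abstract can be illustrated on a hypothetical two-node network A → B with binary variables (this sketch is not the 2U propagation scheme itself): posterior extrema are attained at vertices of the local credal sets, so enumerating the interval endpoints suffices here:

```python
from itertools import product

def posterior_bounds(pa_interval, pb_given_a):
    """Bounds on P(A=1 | B=1) in a two-node credal network A -> B with
    binary variables, obtained by enumerating the vertices (interval
    endpoints) of each local credal set."""
    values = []
    for pa, pb1, pb0 in product(pa_interval, pb_given_a[1], pb_given_a[0]):
        num = pa * pb1              # P(A=1) P(B=1 | A=1)
        den = num + (1 - pa) * pb0  # + P(A=0) P(B=1 | A=0)
        values.append(num / den)
    return min(values), max(values)

# Interval prior on A=1 and interval conditionals P(B=1 | A=a).
print(posterior_bounds((0.3, 0.5), {0: (0.1, 0.2), 1: (0.7, 0.9)}))
# approximately (0.6, 0.9)
```

On polytrees with binary variables, 2U organises this vertex reasoning into local message passing, which is what makes the overall cost linear in the network size.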


A note about redundancy in influence diagrams

Authors: Enrico Fagiuoli and Marco Zaffalon.
Year: 1998.

Abstract: Influence diagrams are formal tools for modelling decision processes and for computing optimal strategies under risk. Like Bayesian networks, influence diagrams exploit the sparsity of the dependency relationships among the random variables in order to reduce computational complexity. In this note, we initially observe that an influence diagram can have some arcs that are not necessary for a complete description of the model. We show that while it may not be easy to detect such arcs, it is important to do so, since a redundant graphical structure can exponentially increase the computational time of a solution procedure. We then define a graphical criterion that is shown to allow the identification and removal of the redundant parts of an influence diagram. This technical result is significant because it precisely defines what is relevant to know at the time of a decision. Furthermore, it allows a redundant influence diagram to be transformed into another influence diagram that can be used to compute the optimal policy in an equivalent but more efficient way.

Published in International Journal of Approximate Reasoning 19(3–4), 231–246.

A version similar to the published paper can be downloaded.


Simulation and planning of an intermodal container terminal

Authors: Luca Maria Gambardella, Andrea E. Rizzoli and Marco Zaffalon.
Year: 1998.

Abstract: A decision support system for the management of an intermodal container terminal is presented. Among the problems to be solved, there are the spatial allocation of containers on the terminal yard, the allocation of resources and the scheduling of operations in order to maximise a performance function based on some economic indicators. These problems are solved using techniques from optimisation, like job-shop scheduling, genetic algorithms or mixed-integer linear programming. At the terminal, the same problems are usually solved by the terminal manager, only using his/her experience. The manager can trust computer generated solutions only by validating them by means of a simulation model of the terminal. Thus, the simulation tool also becomes a means to introduce new approaches into traditional settings. In the present paper we focus on the resource allocation problem. We describe our modules for the optimisation of the allocation process and for the simulation of the terminal. The former is based on integer linear programming; the latter is a discrete event simulation tool, based on the process-oriented paradigm. The simulator provides a test bed for checking the validity and the robustness of the policy computed by the optimisation module. The case study of the Contship La Spezia Container Terminal, located in the Mediterranean Sea in Italy, is examined.

Published in Simulation 71(2), 107–116.

A version similar to the published paper can be downloaded.


Constraint logic programming and integer programming approaches and their collaboration in solving an assignment scheduling problem

Authors: Ken Darby-Dowman, James Little, Gautam Mitra and Marco Zaffalon.
Year: 1997.

Abstract: Generalised assignment problems, traditionally solved by integer programming techniques, are addressed in the light of current constraint programming methods. A scheduling application from manufacturing, based on a modified generalised assignment problem, is used to examine the performance of each technique under a variety of problem characteristics. Experimental evidence showed that, for a set of assignment problems, constraint logic programming performed consistently better than integer programming. Analysis of the constraint logic programming and integer programming processes identified ways in which the search was effective. The insight gained from the analysis led to an integer programming approach with significantly improved performance. Finally, the issue of collaboration between the two contrasting approaches is examined with respect to ways in which the solvers can be combined in an effective manner.

Published in Constraints 1(3), 245–264.

A version similar to the published paper can be downloaded.
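For very small instances, the generalised assignment structure underlying this application can be solved by exhaustive search, which makes the model concrete. The data below are hypothetical, and the paper's point is precisely that realistic instances need constraint logic programming or integer programming rather than enumeration:

```python
from itertools import product

def best_assignment(cost, load, capacity):
    """Exhaustive solution of a tiny generalised assignment problem:
    assign each job j to one agent a, minimising total cost[j][a] while
    keeping each agent's summed load[j][a] within capacity[a]."""
    jobs, agents = len(cost), len(capacity)
    best_total, best_assign = float('inf'), None
    for assign in product(range(agents), repeat=jobs):
        used = [0.0] * agents
        for j, a in enumerate(assign):
            used[a] += load[j][a]
        if all(used[a] <= capacity[a] for a in range(agents)):
            total = sum(cost[j][a] for j, a in enumerate(assign))
            if total < best_total:
                best_total, best_assign = total, assign
    return best_total, best_assign

cost = [[4, 2], [3, 5], [2, 4]]  # cost[j][a]
load = [[1, 1], [2, 2], [1, 1]]  # load[j][a]
print(best_assignment(cost, load, [2, 2]))  # (9, (1, 0, 1))
```

The capacity constraints are what distinguish the generalised problem from plain assignment, and they are also what the two solver paradigms in the paper handle so differently.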