Cassio P. de Campos' publications |
Articles in journals |
Kuznetsov independence of variables X and Y means that, for any pair of bounded functions f(X) and g(Y), E[f(X)g(Y)]=E[f(X)] *times* E[g(Y)], where E[.] denotes interval-valued expectation and *times* denotes interval multiplication. We present properties of Kuznetsov independence for several variables, and connect it with other concepts of independence in the literature; in particular we show that strong extensions are always included in sets of probability distributions whose lower and upper expectations satisfy Kuznetsov independence. We introduce an algorithm that computes lower expectations subject to judgments of Kuznetsov independence by mixing column generation techniques with nonlinear programming. Finally, we define a concept of conditional Kuznetsov independence, and study its graphoid properties. |
Credal networks are graph-based statistical models whose parameters take values in a set, instead of being sharply specified as in traditional statistical models (e.g., Bayesian networks). The computational complexity of inferences on such models depends on the irrelevance/independence concept adopted. In this paper, we study inferential complexity under the concepts of epistemic irrelevance and strong independence. We show that inferences under strong independence are NP-hard even in trees with binary variables except for a single ternary one. We prove that under epistemic irrelevance the polynomial-time complexity of inferences in credal trees is not likely to extend to more general models (e.g., singly connected topologies). These results clearly distinguish networks that admit efficient inferences and those where inferences are most likely hard, and settle several open questions regarding their computational complexity. We show that these results remain valid even if we disallow the use of zero probabilities. We also show that the computation of bounds on the probability of the future state in a hidden Markov model is the same whether we assume epistemic irrelevance or strong independence, and we prove an analogous result for inference in Naive Bayes structures. These inferential equivalences are important for practitioners, as hidden Markov models and Naive Bayes networks are used in real applications of imprecise probability. |
A parametric regression model for right-censored data with a log-linear median regression function and a transformation in both response and regression parts, named parametric Transform-Both-Sides (TBS) model, is presented. The TBS model has a parameter that handles data asymmetry while allowing various different distributions for the error, as long as they are unimodal symmetric distributions centered at zero. The discussion is focused on the estimation procedure with five important error distributions (normal, double-exponential, Student's t, Cauchy and logistic) and presents properties, associated functions (that is, survival and hazard functions) and estimation methods based on maximum likelihood and on the Bayesian paradigm. These procedures are implemented in TBSSurvival, an open-source fully documented R package. The use of the package is illustrated and the performance of the model is analyzed using both simulated and real data sets. |
Influence diagrams are intuitive and concise representations of structured decision problems. When the problem is non-Markovian, an optimal strategy can be exponentially large in the size of the diagram. We can avoid the inherent intractability by constraining the size of admissible strategies, giving rise to limited memory influence diagrams. A valuable question is then how small do strategies need to be to enable efficient optimal planning. Arguably, the smallest strategies one can conceive simply prescribe an action for each time step, without considering past decisions or observations. Previous work has shown that finding such optimal strategies even for polytree-shaped diagrams with ternary variables and a single value node is NP-hard, but the case of binary variables was left open. In this paper we address such a case, by first noting that optimal strategies can be obtained in polynomial time for polytree-shaped diagrams with binary variables and a single value node. We then show that the same problem is NP-hard if the diagram has multiple value nodes. These two results close the fixed-parameter complexity analysis of optimal strategy selection in influence diagrams parametrized by the shape of the diagram, the number of value nodes and the maximum variable cardinality. |
In the study of complex genetic diseases, the identification of subgroups of patients sharing similar genetic characteristics represents a challenging task, for example, to improve treatment decision. One type of genetic lesion, frequently investigated in such disorders, is the change of the DNA copy number (CN) at specific genomic traits. Non-negative Matrix Factorization (NMF) is a standard technique to reduce the dimensionality of a data set and to cluster data samples, while keeping its most relevant information in meaningful components. Thus, it can be used to discover subgroups of patients from CN profiles. It is however computationally impractical for very high dimensional data, such as CN microarray data. Deciding the most suitable number of subgroups is also a challenging problem. The aim of this work is to derive a procedure to compact high dimensional data, in order to improve NMF applicability without compromising the quality of the clustering. This is particularly important for analyzing high-resolution microarray data. Many commonly used quality measures, as well as our own measures, are employed to decide the number of subgroups and to assess the quality of the results. Our measures are based on the idea of identifying robust subgroups, inspired by biologically/clinically relevance instead of simply aiming at well-separated clusters. We evaluate our procedure using four real independent data sets. In these data sets, our method was able to find accurate subgroups with individual molecular and clinical features and outperformed the standard NMF in terms of accuracy in the factorization fitness function. Hence, it can be useful for the discovery of subgroups of patients with similar CN profiles in the study of heterogeneous diseases. |
Increased number of circulating monocytes at presentation has been recently associated with shorter survival in Hodgkin lymphoma, follicular lymphoma and diffuse large B cell lymphoma. This study aimed to assess the prognostic impact of the absolute monocyte count (AMC) at diagnosis in mantle cell lymphoma (MCL). From the series of MCL cases recorded on the databases of the Oncology Institute of Southern Switzerland in Bellinzona (Switzerland) and the Division of Haematology of the Amedeo Avogadro University of Eastern Piedmont in Novara (Italy), the AMC at diagnosis was available in 97 cases. Cox regression was used for both univariate and multivariate analysis. With a median follow up of 7 years, the 5-year overall survival (OS) was 29% for patients with AMC >500/ul and 62% for patients with AMC <= 500/ul (p=0.006). Elevated AMC and beta-2 microglobulin at diagnosis remained independent outcome predictors at multivariate analysis and might be used to build a simple prognostic scoring system. Survival was significantly shorter in patients with both AMC and beta-2 microglobulin above the upper limit of normal but the MCL international prognostic index (MIPI) remained the strongest survival predictor in this series. In this relatively small and heterogeneous series an increased AMC identified poor-risk patients. Our results suggest that AMC in conjunction with the beta-2 microglobulin level might provide an inexpensive way to stratify the MCL patient risk as a complement to the MIPI, which was confirmed to be a very powerful prognostic tool. |
We present a new algorithm for exactly solving decision making problems represented as influence diagrams. We do not require the usual assumptions of no forgetting and regularity; this allows us to solve problems with simultaneous decisions and limited information. The algorithm is empirically shown to outperform a state-of-the-art algorithm on randomly generated problems of up to 150 variables and 10^64 solutions. We show that these problems are NP-hard even if the underlying graph structure of the problem has low treewidth and the variables take on a bounded number of states, and that they admit no provably good approximation if variables can take on an arbitrary number of states. |
Credal networks relax the precise probability requirement of Bayesian networks, enabling a richer representation of uncertainty in the form of closed convex sets of probability measures. The increase in expressiveness comes at the expense of higher computational costs. In this paper, we present a new variable elimination algorithm for exactly computing posterior inferences in extensively specified credal networks, which is empirically shown to outperform a state-of-the-art algorithm. The algorithm is then turned into a provably good approximation scheme, that is, a procedure that for any input is guaranteed to return a solution not worse than the optimum by a given factor. Remarkably, we show that when the networks have bounded treewidth and bounded number of states per variable the approximation algorithm runs in time polynomial in the input size and in the inverse of the error factor, thus being the first known fully polynomial-time approximation scheme for inference in credal networks. |
Marginal zone B-cell lymphomas (MZLs) have been divided into 3 distinct subtypes (extranodal MZLs of mucosa-associated lymphoid tissue [MALT] type, nodal MZLs, and splenic MZLs). Nevertheless, the relationship between the subtypes is still unclear. We performed a comprehensive analysis of genomic DNA copy number changes in a very large series of MZL cases with the aim of addressing this question. Samples from 218 MZL patients (25 nodal, 57 MALT, 134 splenic, and 2 not better specified MZLs) were analyzed with the Affymetrix Human Mapping 250K SNP arrays, and the data combined with matched gene expression in 33 of 218 cases. MALT lymphoma presented significantly more frequently gains at 3p, 6p, 18p, and del(6q23) (TNFAIP3/A20), whereas splenic MZLs was associated with del(7q31), del(8p). Nodal MZLs did not show statistically significant differences compared with MALT lymphoma while lacking the splenic MZLs-related 7q losses. Gains of 3q and 18q were common to all 3 subtypes. del(8p) was often present together with del(17p) (TP53). Although del(17p) did not determine a worse outcome and del(8p) was only of borderline significance, the presence of both deletions had a highly significant negative impact on the outcome of splenic MZLs. |
Credal networks provide a scheme for dealing with imprecise probabilistic models. The inference algorithms often used in credal networks compute the interval of the posterior probability of an event of interest given evidence of the specific kind -- evidence that describe the current state of a set of variables. These algorithms do not perform evidential reasoning in case of the evidence must be processed according to the conditioning rule proposed by RC Jeffrey. This paper describes a procedure to integrate evidence with Jeffrey's rule when performing inferences with credal nets. |
This paper addresses the problem of learning Bayesian network structures from data based on score functions that are decomposable. It describes properties that strongly reduce the time and memory costs of many known methods without losing global optimality guarantees. These properties are derived for different score criteria such as Minimum Description Length (or Bayesian Information Criterion), Akaike Information Criterion and Bayesian Dirichlet Criterion. Then a branch-and-bound algorithm is presented that integrates structural constraints with data in a way to guarantee global optimality. As an example, structural constraints are used to map the problem of structure learning in Dynamic Bayesian networks into a corresponding augmented Bayesian network. Finally, we show empirically the benefits of using the properties with state-of-the-art methods and with the new algorithm, which is able to handle larger data sets than before. |
Credal networks generalize Bayesian networks by relaxing the requirement of precision of probabilities. Credal networks are considerably more expressive than Bayesian networks, but this makes belief updating NP-hard even on polytrees. We develop a new efficient algorithm for approximate belief updating in credal networks. The algorithm is based on an important representation result we prove for general credal networks: that any credal network can be equivalently reformulated as a credal network with binary variables; moreover, the transformation, which is considerably more complex than in the Bayesian case, can be implemented in polynomial time. The equivalent binary credal network is then updated by L2U, a loopy approximate algorithm for binary credal networks. Overall, we generalize L2U to non-binary credal networks, obtaining a scalable algorithm for the general case, which is approximate only because of its loopy nature. The accuracy of the inferences with respect to other state-of-the-art algorithms is evaluated by extensive numerical tests. |
We present TANC, a TAN classifier (tree-augmented naive) based on imprecise probabilities. TANC models prior near-ignorance via the Extreme Imprecise Dirichlet Model (EDM). A first contribution of this paper is the experimental comparison between EDM and the global Imprecise Dirichlet Model using the naive credal classifier (NCC), with the aim of showing that EDM is a sensible approximation of the global IDM. TANC is able to deal with missing data in a conservative manner by considering all possible completions (without assuming them to be missing-at-random), but avoiding an exponential increase of the computational time. By experiments on real data sets, we show that TANC is more reliable than the Bayesian TAN and that it provides better performance compared to previous TANs based on imprecise probabilities. Yet, TANC is sometimes outperformed by NCC because the learned TAN structures are too complex; this calls for novel algorithms for learning the TAN structures, better suited for an imprecise probability classifier. |
Despite recent therapeutic improvements, the clinical course of diffuse large B-cell lymphoma (DLBCL) still differs considerably among patients. We conducted this retrospective multi-centre study to evaluate the impact of genomic aberrations detected using a high-density genome wide-single nucleotide polymorphism-based array on clinical outcome in a population of DLBCL patients treated with R-CHOP-21 (rituximab, cyclophosphamide, doxorubicine, vincristine and prednisone repeated every 21_d). 166 DNA samples were analysed using the GeneChip Human Mapping 250K NspI. Genomic anomalies were analysed regarding their impact on the clinical course of 124 patients treated with R-CHOP-21. Unsupervised clustering was performed to identify genetically related subgroups of patients with different clinical outcomes. Twenty recurrent genetic lesions showed an impact on the clinical course. Loss of genomic material at 8p23.1 showed the strongest statistical significance and was associated with additional aberrations, such as 17p- and 15q-. Unsupervised clustering identified five DLBCL clusters with distinct genetic profiles, clinical characteristics and outcomes. Genetic features and clusters, associated with a different outcome in patients treated with R-CHOP, have been identified by arrayCGH. |
We examine the representation of judgements of stochastic independence in probabilistic logics. We focus on a relational logic where (i) judgements of stochastic independence are encoded by directed acyclic graphs, and (ii) probabilistic assessments are flexible in the sense that they are not required to specify a single probability measure. We discuss issues of knowledge representation and inference that arise from our particular combination of graphs, stochastic independence, logical formulas and probabilistic assessments. |
This paper explores the application of semi-qualitative probabilistic networks (SQPNs) that combine numeric and qualitative information to computer vision problems. Our version of SQPN allows qualitative influences and imprecise probability measures using intervals. We describe an Imprecise Dirichlet model for parameter learning and an iterative algorithm for evaluating posterior probabilities, maximum a posteriori and most probable explanations. Experiments on facial expression recognition and image segmentation problems are performed using real data. |
This paper investigates probabilistic logics endowed with independence relations. We review propositional probabilistic languages without and with independence. We then consider graph-theoretic representations for propositional probabilistic logic with independence; complexity is analyzed, algorithms are derived, and examples are discussed. Finally, we examine a restricted first-order probabilistic logic that generalizes relational Bayesian networks. |
This paper investigates the computation of lower/upper expectations that must cohere with a collection of probabilistic assessments and a collection of judgements of epistemic independence. New algorithms, based on multilinear programming, are presented, both for independence among events and among random variables. Separation properties of graphical models are also investigated. |
Book chapters |
This paper explores the application of semi-qualitative probabilistic networks (SQPNs) that combine numeric and qualitative information to computer vision problems. Our version of SQPN allows qualitative influences and imprecise probability measures using intervals. We describe an Imprecise Dirichlet model for parameter learning and an iterative algorithm for evaluating posterior probabilities, maximum a posteriori and most probable explanations. Experiments on facial expression recognition and image segmentation problems are performed using real data. |
Full peer-reviewed articles in conference proceedings (papers rated in a top percentage) |
An algorithm for approximate credal network updating is presented. The problem in its general formulation is a multilinear optimization task, which can be linearized by an appropriate rule for fixing all the local models apart from those of a single variable. This simple idea can be iterated and quickly leads to very accurate inferences. The approach can also be specialized to classification with credal networks based on the maximality criterion. A complexity analysis for both the problem and the algorithm is reported together with numerical experiments, which confirm the good performance of the method. While the inner approximation produced by the algorithm gives rise to a classifier which might return a subset of the optimal class set, preliminary empirical results suggest that the accuracy of the optimal class set is seldom affected by the approximate probabilities. |
Credal networks are graph-based statistical models whose parameters take values on a set, instead of being sharply specified as in traditional statistical models (e.g., Bayesian networks). The result of inferences with such models depends on the irrelevance/independence concept adopted. In this paper, we study the computational complexity of inferences under the concepts of epistemic irrelevance and strong independence. We strengthen complexity results by showing that inferences with strong independence are NP-hard even in credal trees with ternary variables, which indicates that tractable algorithms, including the existing one for epistemic trees, cannot be used for strong independence. We prove that the polynomial time of inferences in credal trees under epistemic irrelevance is not likely to extend to more general models, because the problem becomes NP-hard even in simple polytrees. These results draw a definite line between networks with efficient inferences and those where inferences are hard, and close several open questions regarding the computational complexity of such models. |
Semi-qualitative probabilistic networks (SQPNs) merge two important graphical model formalisms: Bayesian networks and qualitative probabilistic networks. They provide a very general modeling framework by allowing the combination of numeric and qualitative assessments over a discrete domain, and can be compactly encoded by exploiting the same factorization of joint probability distributions that are behind the Bayesian networks. This paper explores the computational complexity of semi-qualitative probabilistic networks, and takes the polytree-shaped networks as its main target. We show that the inference problem is coNP-Complete for binary polytrees with multiple observed nodes. We also show that inferences can be performed in linear time if there is a single observed node, which is a relevant practical case. Because our proof is constructive, we obtain an efficient linear time algorithm for SQPNs under such assumptions. To the best of our knowledge, this is the first exact polynomial-time algorithm for SQPNs. Together these results provide a clear picture of the inferential complexity in polytree-shaped SQPNs. |
This paper presents a new anytime algorithm for the marginal MAP problem in graphical models of bounded treewidth. We show asymptotic convergence and theoretical error bounds for any fixed step. Experiments show that it compares well to a state-of-the-art systematic search algorithm. |
Influence diagrams allow for intuitive and yet precise description of complex situations involving decision making under uncertainty. Unfortunately, most of the problems described by influence diagrams are hard to solve. In this paper we discuss the complexity of approximately solving influence diagrams. We do not assume no-forgetting or regularity, which makes the class of problems we address very broad. Remarkably, we show that when both the treewidth and the cardinality of the variables are bounded the problem admits a fully polynomial-time approximation scheme. |
We present a new algorithm for exactly solving decision-making problems represented as an influence diagram. We do not require the usual assumptions of no forgetting and regularity, which allows us to solve problems with limited information. The algorithm, which implements a sophisticated variable elimination procedure, is empirically shown to outperform a state-of-the-art algorithm in randomly generated problems of up to 150 variables and $10^{64}$ strategies. |
Credal networks lift the precise probability assumption of Bayesian networks, enabling a richer representation of uncertainty in the form of closed convex sets of probability measures. The increase in expressiveness comes at the expense of higher computational costs. In this paper we present a new algorithm which is an extension of the well-known variable elimination algorithm for computing posterior inferences in extensively specified credal networks. The algorithm efficiency is empirically shown to outperform a state-of-the-art algorithm. We then provide the first fully polynomial time approximation scheme for inference in credal networks with bounded treewidth and number of states per variable. |
This paper presents new results for the (partial) maximum a posteriori (MAP) problem in Bayesian networks, which is the problem of querying the most probable state configuration of some of the network variables given evidence. It is demonstrated that the problem remains hard even in networks with very simple topology, such as binary polytrees and simple trees (including the Naive Bayes structure), which extends previous complexity results. Furthermore, a Fully Polynomial Time Approximation Scheme for MAP in networks with bounded treewidth and bounded number of states per variable is developed. Approximation schemes were thought to be impossible, but here it is shown otherwise under the assumptions just mentioned, which are adopted in most applications. |
This paper considers inference from multinomial data and addresses the problem of choosing the strength of the Dirichlet prior under a mean-squared error criterion. We compare the Maxi-mum Likelihood Estimator (MLE) and the most commonly used Bayesian estimators obtained by assuming a prior Dirichlet distribution with non-informative prior parameters, that is, the parameters of the Dirichlet are equal and altogether sum up to the so called strength of the prior. Under this criterion, MLE becomes more preferable than the Bayesian estimators at the increase of the number of categories k of the multinomial, because non-informative Bayesian estimators induce a region where they are dominant that quickly shrinks with the increase of k. This can be avoided if the strength of the prior is not kept constant but decreased with the number of categories. We argue that the strength should decrease at least k times faster than usual estimators do. |
This paper addresses exact learning of Bayesian network structure from data based on the Bayesian Dirichlet score function and its derivations. We describe useful properties that strongly reduce the computational costs of many known methods without losing global optimality guarantees. We show empirically the advantages of the properties in terms of time and memory consumptions, demonstrating that state-of-the-art methods, with the use of such properties, might handle larger data sets than those currently possible. |
In this paper we present TANC, i.e., a tree-augmented naive credal classifier based on imprecise probabilities; it models prior near-ignorance via the Extreme Imprecise Dirichlet Model (EDM) (Cano et al., 2007) and deals conservatively with missing data in the training set, without assuming them to be missing-at-random. The EDM is an approximation of the global Imprecise Dirichlet Model (IDM), which considerably simplifies the computation of upper and lower probabilities; yet, having been only recently introduced, the quality of the provided approximation needs still to be verified. As first contribution, we extensively compare the output of the naive credal classifier (one of the few cases in which the global IDM can be exactly implemented) when learned with the EDM and the global IDM; the output of the classifier appears to be identical in the vast majority of cases, thus supporting the adoption of the EDM in real classification problems. Then, by experiments we show that TANC is more reliable than the precise TAN (learned with uniform prior), and also that it provides better performance compared to a previous (Zaffalon, 2003} |
This paper addresses exact learning of Bayesian network structure from data and expert's knowledge based on score functions that are decomposable. First, it describes useful properties that strongly reduce the time and memory costs of many known methods such as hill-climbing, dynamic programming and sampling variable orderings. Secondly, a branch and bound algorithm is presented that integrates parameter and structural constraints with data in a way to guarantee global optimality with respect to the score function. It is an any-time procedure because, if stopped, it provides the best current solution and an estimation about how far it is from the global solution. We show empirically the advantages of the properties and the constraints, and the applicability of the algorithm to large data sets (up to one hundred variables) that cannot be handled by other current methods (limited to around 30 variables). |
Credal nets generalize Bayesian nets by relaxing the requirement of precision of probabilities. Credal nets are considerably more expressive than Bayesian nets, but this makes belief updating NP-hard even on polytrees. We develop a new efficient algorithm for approximate belief updating in credal nets. The algorithm is based on an important representation result we prove for general credal nets: that any credal net can be equivalently reformulated as a credal net with binary variables; moreover, the transformation, which is considerably more complex than in the Bayesian case, can be implemented in polynomial time. The equivalent binary credal net is updated by L2U, a loopy approximate algorithm for binary credal nets. Thus, we generalize L2U to non-binary credal nets, obtaining an accurate and scalable algorithm for the general case, which is approximate only because of its loopy nature. The accuracy of the inferences is evaluated by empirical tests. |
This paper describes a new algorithm to solve the decision making problem in Influence Diagrams based on algorithms for credal networks. Decision nodes are associated to imprecise probability distributions and a reformulation is introduced that finds the global maximum strategy with respect to the expected utility. We work with Limited Memory Influence Diagrams, which generalize most Influence Diagram proposals and handle simultaneous decisions. Besides the global optimum method, we explore an anytime approximate solution with a guaranteed maximum error and show that imprecise probabilities are handled in a straightforward way. Complexity issues and experiments with random diagrams and an effects-based military planning problem are discussed. |
Probabilistic graphical models such as Bayesian Networks have been increasingly applied to many computer vision problems. Accuracy of inferences in such models depends on the quality of network parameters. Learning reliable parameters of Bayesian networks often requires a large amount of training data, which may be hard to acquire and may contain missing values. On the other hand, qualitative knowledge is available in many computer vision applications, and incorporating such knowledge can improve the accuracy of parameter learning. This paper describes a general framework based on convex optimization to incorporate constraints on parameters with training data to perform Bayesian network parameter estimation. For complete data, a global optimum solution to maximum likelihood estimation is obtained in polynomial time, while for incomplete data, a modified expectation-maximization method is proposed. This framework is applied to real image data from a facial action unit recognition problem and produces results that are similar to those of state-of-the-art methods. |
This papers investigates the manipulation of statements of strong independence in probabilistic logic. Inference methods based on polynomial programming are presented for strong independence, both for unconditional and conditional cases. We also consider graph-theoretic representations, where each node in a graph is associated with a Boolean variable and edges carry a Markov condition. The resulting model generalizes Bayesian networks, allowing probabilistic assessments and logical constraints to be mixed. |
A credal network is a tool to represent uncertainty, where probability values are imprecise or indeterminate. Credal networks generalize Bayesian networks, associating a DAG to a collection of sets of probability measures. In this work we present complexity results for many inference problems in such networks. We describe also new (exact and approximate) algorithms for inference in credal networks based on multilinear programming. Experiments indicate that these algorithms have better performance than state-of-the-art ones. We deal with other related models too, such as semi-qualitative networks, showing new complexity results and algorithms. |
This paper explores semi-qualitative probabilistic networks (SQPNs) that combine numeric and qualitative information. We first show that exact inferences with SQPNs are NPPP-Complete. We then show that existing qualitative relations in SQPNs (plus probabilistic logic and imprecise assessments) can be dealt effectively through multilinear programming. We then discuss learning: we consider a maximum likelihood method that generates point estimates given a SQPN and empirical data, and we describe a Bayesian-minded method that employs the Imprecise Dirichlet Model to generate set-valued estimates. |
This papers investigates the computation of lower/upper expectations that must cohere with a collection of probabilistic assessments and a collection of judgements of epistemic independence. New algorithms, based on multilinear programming, are presented, both for independence among events and among gambles. Separation properties of graphical models are also investigated. |
This paper presents new results on the complexity of graph-theoretical models that represent probabilities (Bayesian networks) and that represent interval and set valued probabilities (credal networks). We define a new class of networks with bounded width, and introduce a new decision problem for Bayesian networks, the maximin a posteriori. We present new links between the Bayesian and credal networks, and present new results both for Bayesian networks (most probable explanation with observations, maximin a posteriori) and for credal networks (bounds on probabilities a posteriori, most probable explanation with and without observations, maximum a posteriori). |
This paper investigates a representation language with flexibility inspired by probabilistic logic and compactness inspired by relational Bayesian networks. The goal is to handle propositional and first-order constructs together with precise, imprecise, indeterminate and qualitative probabilistic assessments. The paper shows how this can be achieved through the theory of credal networks. New exact and approximate inference algorithms based on multilinear programming and iterated/loopy propagation of interval probabilities are presented; their superior performance, compared to existing ones, is shown empirically. |
Inferences in directed acyclic graphs associated with probability sets and probability intervals are NP-hard, even for polytrees. In this paper we focus on such inferences, and propose: 1) a substantial improvement on Tessems A/R algorithm for polytrees with probability intervals; 2) a new algorithm for direction-based local search (in sets of probabilities) that improves on existing methods; 3) a collection of branch-and-bound algorithms that combine the previous techniques.The first two techniques lead to approximate solutions, while branch-and-bound procedures can produce either exact or approximate solutions. We report dramatic improvements on existing techniques for inference with probability sets and intervals, in some cases reducing the computational effort by many orders of magnitude. |
Full peer-reviewed articles in conference proceedings (intermediate acceptance rate) |
Hidden Markov models (HMMs) are widely used models for sequential data. As with other probabilistic graphical models, they require the specification of precise probability values, which can be too restrictive for some domains, especially when data are scarce or costly to acquire. We present a generalized version of HMMs, whose quantification can be done by sets of, instead of single, probability distributions. Our models have the ability to suspend judgment when there is not enough statistical evidence, and can serve as a sensitivity analysis tool for standard non-stationary HMMs. Efficient inference algorithms are developed to address standard HMM usage such as the computation of likelihoods and most probable explanations. Experiments with real data show that the use of imprecise probabilities leads to more reliable inferences without compromising efficiency. |
This paper presents an investigation on the computational complexity of stochastic optimization problems. We discuss a scenario-based model which captures the important classes of two-stage stochastic combinatorial optimization, two-stage stochastic linear programming, and two-stage stochastic integer linear programming. This model can also be used to handle chance constraints, which are used in many stochastic optimization problems. We derive general upper bounds for the complexity of computational problems related to this model, which hold under very mild conditions. Additionally, we show that these upper bounds are matched for some stochastic combinatorial optimization problems arising in the field of transportation and logistics. |
This paper describes an Imprecise Dirichlet Model and the maximum entropy criterion to learn Bayesian network parameters under insufficient and incomplete data. The method is applied to two distinct recognition problems, namely, a facial action unit recognition and an activity recognition in video surveillance sequences. The model treats a wide range of constraints that can be specified by experts, and deals with incomplete data using an ad-hoc expectation-maximization procedure. It is also described how the same idea can be used to learn dynamic Bayesian networks. With synthetic data, we show that our proposal and widely used methods, such as the Bayesian maximum a posteriori, achieve similar accuracy. However, when real data come in place, our method performs better than the others, because it does not rely on a single prior distribution, which might be far from the best one. |
This paper addresses the problem of learning structure of Bayesian and Dynamic Bayesian networks from incomplete data based on the Bayesian Information Criterion. We describe a procedure to map the problem of the dynamic case into a corresponding augmented Bayesian network through the use of structural constraints. Because the algorithm is exact and anytime, it is well suitable for a structural Expectation-Maximization (EM) method where the only source of approximation is due to the EM itself. We show empirically that the use a global maximizer inside the structural EM is computationally feasible and leads to more accurate models. |
We consider the problem of inference from multinomial data with chances theta, subject to the a-priori information that the true parameter vector theta belongs to a known convex polytope theta. The proposed estimator has the parametrized structure of the conditional-mean estimator with a prior Dirichlet distribution, whose parameters (s,t) are suitably designed via a dominance criterion so as to guarantee, for any theta in bigtheta, an improvement of the Mean Squared Error over the Maximum Likelihood Estimator (MLE). The solution of this MLE-dominance problem allows us to give a different interpretation of: (1) the several Bayesian estimators proposed in the literature for the problem of inference from multinomial data; (2) the Imprecise Dirichlet Model (IDM) developed by Walley. |
This paper describes a new approach to unify constraints on parameters with training data to perform parameter estimation in Bayesian networks of known structure. The method is general in the sense that any convex constraint is allowed, which includes many proposals in the literature. Driven by a maximum entropy criterion and the imprecise Dirichlet model, we present a constrained convex optimization formulation to combine priors, constraints and data. Experiments indicate benefits of this framework. |
Full peer-reviewed articles in conference proceedings (others/unknown acceptance rate) |
This paper addresses the problem of estimating the parameters of a Bayesian network from incomplete data. This is a hard problem, which for computational reasons cannot be effectively tackled by a full Bayesian approach. The workaround is to search for the estimate with maximum posterior probability. This is usually done by selecting the highest posterior probability estimate among those found by multiple runs of Expectation-Maximization with distinct starting points. However, many local maxima characterize the posterior probability function, and several of them have similar high probability. We argue that high probability is necessary but not sufficient in order to obtain good estimates. We present an approach based on maximum entropy to address this problem and describe a simple and effective way to implement it. Experiments show that our approach produces significantly better estimates than the most commonly used method. |
Credal nets are probabilistic graphical models which extend Bayesian nets to cope with sets of distributions. This feature makes the model particularly suited for the implementation of classifiers and knowledge-based systems. When working with sets of (instead of single) probability distributions, the identification of the optimal option can be based on different criteria, some of them eventually leading to multiple choices. Yet, most of the inference algorithms for credal nets are designed to compute only the bounds of the posterior probabilities. This prevents some of the existing criteria from being used. To overcome this limitation, we present two simple transformations for credal nets which make it possible to compute decisions based on the maximality and E-admissibility criteria without any modification in the inference algorithms. We also prove that these decision problems have the same complexity of standard inference, being NP^PP-hard for general credal nets and NP-hard for polytrees. |
The formalism of credal networks can be used to represent imprecision in multivariate probabilistic models. Currently, the algorithms for inference with credal nets allow to compute posterior probability intervals given specific evidence. However, they do not deal with soft evidence. This paper presents an approach to integrate soft evidence in credal networks. The proposal is to convert the soft evidence into constraints that are appended to a multilinear program to perform inferences. |
Markov Decision Processes (MDPs) are extensively used to encode sequences of decisions with probabilistic effects. Markov Decision Processes with Imprecise Probabilities (MDPIPs) encode sequences of decisions whose effects are modeled using sets of probability distributions. In this paper we examine the computation of Gamma-maximin policies for MDPIPs using multilinear and integer programming. We discuss the application of our algorithms to ``factored'' models and to a recent proposal, Markov Decision Processes with Set-valued Transitions (MDPSTs), that unifies the fields of probabilistic and ``nondeterministic'' planning in artificial intelligence research. |
A credal network associates a directed acyclic graph with a collection of sets of probability measures; it offers a compact representation for sets of multivariate distributions. In this paper we present a new algorithm for inference in credal networks based on an integer programming reformulation. We are concerned with computation of lower/upper probabilities for a variable in a given credal network. Experiments reported in this paper indicate that this new algorithm has better performance than existing ones for some important classes of networks. |
Partially ordered preferences generally lead to choices that do not abide by standard expected utility guidelines; often such preferences are revealed by imprecision in probability values. We investigate five criteria for strategy selection in decision trees with imprecision in probabilities: extensive Gamma-maximin and Gamma-maximax, interval dominance, maximality and E-admissibility. We present algorithms that generate strategies for all these criteria; our main contribution is an algorithm for E-admissibility that runs over admissible strategies rather than over sets of probability distributions. |
The goal of this contribution is to discuss local computation in credal networks -- graphical models that can represent imprecise and indeterminate probability values. We analyze the inference problem in credal networks, discuss how inference algorithms can benefit from local computation, and suggest that local computation can be particularly important in approximate inference algorithms. |
A credal network is a graphical tool for representation and manipulation of uncertainty, where probability values may be imprecise or indeterminate. A credal network associates a directed acyclic graph with a collection of sets of probability measures; in this context, inference is the computation of tight lower and upper bounds for conditional probabilities. In this paper we present new algorithms for inference in credal networks based on multilinear programming techniques. Experiments indicate that these new algorithms have better performance than existing ones, in the sense that they can produce more accurate results in larger networks. |
Full peer-reviewed articles in local conferences |
This article presents the jOptimum project, a linear/integer optimization package featured to teaching mathematical programming disciplines. We argue why this package is useful for learning and explain details about the project and its development stage. Practical experiences on linear programming courses are also presented, highlighting the use of the software and pointing out new features that will be developed. |
This article presents our experience on using practical lab classes for teaching Operating Systems introductory disciplines. We describe the working environment, the activities that are proposed to students, and why the introduction of such activities has been important for learning and increasing student interest in the area. |
This article presents the jOptimum project, a linear and multilinear modeling package featured to teaching mathematical programming disciplines. We describe the problems that can be solved using jOptimum and how it is integrated with solvers. We also argue why the package is useful for learning and explain details about the project and its development stage. |
A credal network is a graph-theoretic model that represents imprecision in joint probability distributions. An inference in a credal net aims at computing an interval for the probability of an event of interest. Algorithms for inference in credal networks can be divided into exact and approximate. The selection of an algorithm is based on a trade off that ponders how much time someone wants to spend in a particular calculation against the quality of the computed values. This paper presents an algorithm, called IDS, that combines exact and approximate methods for computing inferences in polytree-shaped credal networks. The algorithm provides an approach to trade time and precision when making inferences in credal nets |
In this article we describe BOCA, a software for supporting programming contests developed to be used in the Maratona de Programa{\c{c}\~a}o of the Brazilian Computing Society. The system can also be used for supporting disciplines that make evaluation of student assignments during the classes. |
Technical reports |
This paper strengthens the NP-hardness result for the (partial) maximum a posteriori (MAP) problem in Bayesian networks with topology of trees (every variable has at most one parent) and variable cardinality at most three. MAP is the problem of querying the most probable state configuration of some (not necessarily all) of the network variables given evidence. It is demonstrated that the problem remains hard even in such simplistic networks. |
This paper addresses the estimation of parameters of a Bayesian network from incomplete data. The task is usually tackled by running the Expectation-Maximization (EM) algorithm several times in order to obtain a high log-likelihood estimate. We argue that choosing the maximum log-likelihood estimate (as well as the maximum penalized log-likelihood and the maximum a posteriori estimate) has severe drawbacks, being affected both by overfitting and model uncertainty. Two ideas are discussed to overcome these issues: a maximum entropy approach and a Bayesian model averaging approach. Both ideas can be easily applied on top of EM, while the entropy idea can be also implemented in a more sophisticated way, through a dedicated non-linear solver. A vast set of experiments shows that these ideas produce significantly better estimates and inferences than the traditional and widely used maximum (penalized) log-likelihood and maximum a posteriori estimates. In particular, if EM is adopted as optimization engine, the model averaging approach is the best performing one; its performance is matched by the entropy approach when implemented using the non-linear solver. The results suggest that the applicability of these ideas is immediate (they are easy to implement and to integrate in currently available inference engines) and that they constitute a better way to learn Bayesian network parameters. |
We present a new algorithm for exactly solving decision making problems represented as influence diagrams. We do not require the usual assumptions of no forgetting and regularity; this allows us to solve problems with simultaneous decisions and limited information. The algorithm is empirically shown to outperform a state-of-the-art algorithm on randomly generated problems of up to 150 variables and $10^{64}$ solutions. We show that the problem is NP-hard even if the underlying graph structure of the problem has small treewidth and the variables take on a bounded number of states, but that a fully polynomial time approximation scheme exists for these cases. Moreover, we show that the bound on the number of states is a necessary condition for any efficient approximation scheme. |
This paper presents new results for the (partial) maximum a posteriori (MAP) problem in Bayesian networks, which is the problem of querying the most probable state configuration of some of the network variables given evidence. First, it is demonstrated that the problem remains hard even in networks with very simple topology, such as binary polytrees and simple trees (including the Naive Bayes structure). Such proofs extend previous complexity results for the problem. Inapproximability results are also derived in the case of trees if the number of states per variable is not bounded. Although the problem is shown to be hard and inapproximable even in very simple scenarios, a new exact algorithm is described that is empirically fast in networks of bounded treewidth and bounded number of states per variable. The same algorithm is used as basis of a Fully Polynomial Time Approximation Scheme for MAP under such assumptions. Approximation schemes were generally thought to be impossible for this problem, but we show otherwise for classes of networks that are important in practice. The algorithms are extensively tested using some well-known networks as well as random generated cases to show their effectiveness. |
Short papers and Abstracts |
Prognostic classification schemes are often developed and used in medicine to stratify patients for survival prediction based on clinical and/or biologic variables. One of the most important aims is to identify patients with very poor outcome, that might take advantage from experimental therapies, or with very good prognosis who might be treated with less intensive regimens. Survival trees and their generalizations are powerful in survival prediction, but they do not account for a core property of prognostic indices: different combinations of causes (values of clinical covariates) can lead to similar phenotype/survival outcome. Procedures have been suggested for merging the groups corresponding to leaves (combinations of causes) in survival trees, but they show limitations. We propose to apply a clustering algorithm on the survival curves of leaves' groups, using a proper dissimilarity measure. We study several choices for both the clustering algorithm and the dissimilarity measure and we compare them with standard procedures in the literature, using both simulated and public real data. In order to enhance the evaluation of the performance of the prognostic models, we also derive a new index of separation to be used together with an error measure of the prediction. This new separation index takes into account three important characteristics of risk groups (besides the prediction itself): the retention of their order, the reliability/robustness in terms of size, and the goodness of separation among all corresponding survival curves. We discuss the new separation index and how widely used separation indices miss to consider all these characteristics. |
Introduzione: I database clinici rappresentano uno strumento per studiare i tumori. L'usuale metodo di analisi consiste in una prima selezione dei possibili fattori prognostici attraverso un'analisi univariata (pe, logrank test); il modello prognostico viene poi costruito attraverso una regressione multivariata di tipo backward (ossia, ad ogni passo viene eliminata la variabile meno significativa). Pazienti, con osservazioni mancanti nelle variabili di volta in volta considerate, vengono solitamente eliminati dallo studio, riducendo il potere e rendendo meno probabile la selezione di fattori prognostici rilevanti, ma con poche osservazioni. Anche in presenza di osservazioni complete, l'analisi univariata pu{\`o} eliminare variabili rilevanti ma solo in combinazione con altri fattori. Scopo del progetto {\`e} studiare modalit{\`a} di gestione dei dati mancanti e la costruzione di modelli prognostici affidabili nei pazienti con linfomi. Metodi: Dati utilizzati: 180 pazienti con linfoma marginale (MZL) (Zucca, Blood, 2004); 251 con MZL (Rinaldi, Blood 2011); 166 con linfoma diffusi a grandi cellule (DLBCL) (Scandurra, Mian, BJH 2010); 127 con DLBCL (IOSI). Risultati: Bayesian networks sono risultati il metodo migliore per inferire le possibili relazioni tra le variabili, per imputare il valore delle variabili con osservazioni mancanti sulla base delle variabili presenti da cui dipendono. Una regressione ad albero (che considera tutte le possibili interazioni tra le variabili nella loro selezione) combinata con un metodo di clustering applicato alle curve di sopravvivenza (corrispondenti ai possibili valori dell'indice) si {\`e} dimostrato superiore a quello usuale per una migliore e oggettiva identificazione dei gruppi. Conclusioni: La combinazione di Bayesian networks per la stima dei missing data assieme con regressione ad albero risulta molto promettente per l'identificazione di modelli prognostici in pazienti con linfomi. Le modalit{\`a} ottimizzate per la gestione dei dati mancanti potrebbero avere un impatto anche sulla valutazione della qualit{\`a} delle prestazioni ospedaliere fornite ai pazienti. |
Diffuse large B-cell lymphoma (DLBCL) is one of the most common types of non-Hodgkin lymphoma in adults. Over the past few years, the addition of rituximab immunotherapy to CHOP (R-CHOP) has significantly improved the clinical outcome of patients with DLBCL. As clinical course differs considerably among patients, DLBCL seems to be a heterogeneous disease, possibly related to genomic subtypes. 166 frozen samples of patients affected by DLBCL, 124 of them treated with R-CHOP, were collected from 11 international centers. Clinical parameters including survival data were available for most patients. Extracted DNA was analyzed with Affymetrix Human Mapping 250K arrays. Copy number profiles were estimated from raw data using modified Bayesian piecewise constant regression (mBPCR). Unsupervised clustering was applied using non-negative matrix factorization (NMF) to the copy number data. We have implemented an optimized version of the NMF algorithm, which runs as an external C++ library of the statistical package R and it is able to produce results more than a hundred times faster than pure R code. Furthermore, the code is easily parallelizable, which delivers surprisingly fast results. NMF cluster results were analyzed for association with clinical parameters and survival data. After a first level of filtering to discard very similar probes, unsupervised NMF discovered five clusters of patients of similar genomic characteristics. Two clusters were discarded from further analysis because of their small sample sizes. The three remaining groups showed significant differences in clinical parameters and suggest different biological DLBCL subtypes. Finally, Kaplan-Meier survival analysis indicates differences in clinical course according to these clusters. Unsupervised NMF clustering of genomic copy number data can discover both biological and clinical relevant subtypes. We illustrated this for a cohort of DLBCL patients treated with R-CHOP. |
We examine the representation of judgements of stochastic independence in probabilistic logics. We focus on a relational logic where (i) judgements of stochastic independence are encoded by directed acyclic graphs, and (ii) probabilistic assessments are flexible in the sense that they are not required to specify a single probability measure. We discuss issues of knowledge representation and inference that arise from our particular combination of graphs, stochastic independence, logical formulas and probabilistic assessments. |
This paper presents the WikIA project hosted at TIDIA's Virtual Incubator of Digital Content. WikIA is a web portal where the Brazilian community may freely interact to exchange information about Artificial Intelligence groups in Brazil. We explain details concerning the development of this project and its current stage. |
This article presents the jOptimum project hosted at TIDIA's Virtual Incubator of Digital Content. jOptimum is a linear and multilinear modeling package featured to teaching mathematical programming disciplines. We describe details of the project, its relation to other systems and teaching features. |
The computational manipulation of probability measures often requires the treatment of interval values, not only due to numerical errors, but also due to more fundamental difficulties: we may want to model imprecise beliefs; we may have incomplete knowledge about probability values; we may be interested in merging beliefs from groups of experts; and we may wish to verify the effect of perturbations in probabilistic models. Such difficulties have often led to the study of interval probability and related theories. The goal of this paper is to present a brief overview of methods and results that can be relevant to the validated manipulation of probabilistic models. |
Invited Talks in Conferences/Symposia |
Miscellaneous |
This document was translated from BibT_{E}X by bibtex2html