Workload-Driven Learning of Mallows Mixtures with Pairwise Preference Data

Julia Stoyanovich
Drexel University
stoyanovich@drexel.edu
Lovro Ilijasic
Drexel University
lovro@drexel.edu
Haoyue Ping
Drexel University
hp354@drexel.edu

Abstract

In this paper we present a framework for learning mixtures of Mallows models from large samples of incomplete preferences. The problem we address is of significant practical importance in social choice, recommender systems, and other domains where the preferences of a heterogeneous user base must be aggregated or otherwise analyzed.
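As background, in the standard parametrization (which the paper may express differently, e.g. via θ = -ln φ), a Mallows model with center ranking σ over m items and dispersion φ ∈ (0, 1] assigns a ranking π the probability φ^d(π, σ) / Z(φ). Here d is the Kendall tau distance (the number of discordant item pairs) and Z(φ) = ∏_{i=1..m} (1 + φ + ... + φ^(i-1)). A mixture takes a weighted sum of such components. The brute-force Python sketch below computes these quantities for small m. All function names are hypothetical and are not the API of the accompanying Java library.

```python
from itertools import combinations, permutations
from math import prod


def kendall_tau(ranking, center):
    """Count item pairs that `ranking` orders differently from `center`."""
    pos = {item: k for k, item in enumerate(ranking)}
    # combinations() yields pairs (a, b) with a ahead of b in `center`.
    return sum(1 for a, b in combinations(center, 2) if pos[a] > pos[b])


def mallows_normalizer(m, phi):
    """Z(phi) = prod_{i=1..m} (1 + phi + ... + phi^(i-1)) for m items."""
    return prod(sum(phi ** j for j in range(i)) for i in range(1, m + 1))


def mallows_prob(ranking, center, phi):
    """P(ranking) = phi^d(ranking, center) / Z(phi), with phi in (0, 1]."""
    return phi ** kendall_tau(ranking, center) / mallows_normalizer(len(center), phi)


def mixture_prob(ranking, components):
    """components: (weight, center, phi) triples whose weights sum to 1."""
    return sum(w * mallows_prob(ranking, c, phi) for w, c, phi in components)


# Hypothetical two-component mixture over three items.
mix = [(0.6, ("a", "b", "c"), 0.3), (0.4, ("c", "b", "a"), 0.5)]
total = sum(mixture_prob(p, mix) for p in permutations(("a", "b", "c")))
print(round(total, 10))  # 1.0
```

Summing over all m! rankings only works for tiny m. This is part of why learning from data relies on sampling rather than enumeration.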

We improve on state-of-the-art methods for learning mixtures of Mallows models with pairwise preference data. Exact sampling from the Mallows posterior in the presence of arbitrary pairwise evidence is known to be intractable, even for a single Mallows model. This motivated the development of an approximate sampler called AMP. In this paper we propose AMPx, an ensemble method for approximate sampling from the Mallows posterior that combines AMP with frequency-based estimation of posterior probabilities. We experimentally demonstrate that AMPx converges faster and achieves higher accuracy than AMP alone. We also adapt state-of-the-art clustering techniques, not previously used in this setting, to learn the parameters of the Mallows mixture, and show experimentally that these parameters can be learned accurately and efficiently.
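Pairwise evidence such as "d is preferred to a" rules out some rankings. The sketch below reflects our reading of the AMP idea and shows how an insertion-based approximate sampler can respect such evidence. Items are inserted in the order of the center ranking, as in the repeated insertion model. Each item may only go into slots consistent with the transitive closure of the evidence, and the usual weights φ^(i-j) are renormalized over those legal slots. All names are hypothetical, and this is neither the authors' AMP nor their AMPx code. The distribution it samples from generally differs from the exact posterior; per the abstract, AMPx improves on AMP's accuracy.

```python
import random


def transitive_closure(pairs):
    """Close a set of (a, b) pairs, read 'a is preferred to b', under transitivity."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def amp_style_sample(center, phi, pairs, rng=random):
    """Draw one ranking consistent with `pairs`, inserting items in `center` order."""
    if not 0 < phi <= 1:
        raise ValueError("phi must lie in (0, 1]")
    prefs = transitive_closure(pairs)
    if any(a == b for a, b in prefs):
        raise ValueError("pairwise evidence contains a cycle")
    ranking = []
    for i, item in enumerate(center):
        pos = {x: k for k, x in enumerate(ranking)}
        # Lowest legal slot: just below every inserted item that must precede `item`.
        lo = max((pos[x] + 1 for x in ranking if (x, item) in prefs), default=0)
        # Highest legal slot: just above every inserted item that `item` must precede.
        hi = min((pos[y] for y in ranking if (item, y) in prefs), default=i)
        slots = range(lo, hi + 1)
        # Inserting at slot j adds i - j pairs that disagree with `center`.
        j = rng.choices(slots, weights=[phi ** (i - s) for s in slots])[0]
        ranking.insert(j, item)
    return ranking


rng = random.Random(7)
evidence = {("d", "a"), ("c", "b")}  # d above a; c above b
draws = [amp_style_sample(["a", "b", "c", "d"], 0.5, evidence, rng) for _ in range(2000)]
# Plain Monte Carlo frequency estimate of P(c above a | evidence); not AMPx itself.
freq_c_above_a = sum(r.index("c") < r.index("a") for r in draws) / len(draws)
print(freq_c_above_a)
```

Because already-inserted items always respect the closed evidence, the legal slot range is never empty for acyclic evidence. That is why the sketch rejects cyclic input up front.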

Full paper (PDF)

Experiments, Data and Code

Datasets and code used to obtain the results in this paper are available for download in this archive:

Download

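The archive contains a Java library for managing preference data (as a NetBeans project), along with the data and code for running the experiments.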

Please note that the Java library is a snapshot taken at the time of writing this paper; the full library, with more features, will soon be available on GitHub.

The structure of the archive is as follows:


      .
      |-- Java (Java library for managing preference data, as NetBeans project)
      |   |-- javadoc (Library documentation)
      |   |-- lib (Required .jar files)
      |   |-- nbproject (NetBeans project files)
      |   `-- src (Library source code)
      |
      `-- Experiments (Data and code for running the experiments)
          |-- data (Real datasets)
          `-- lib (Java and Python libraries)
    

The Experiments folder contains run_synthetic_data_experiment.py and run_sushi_experiment.py, which run the experiments on the synthetic and Sushi datasets, respectively. Unless otherwise specified, the results (.csv files and plots) are written to the ./data/synthetic/output and ./data/real/output/ folders.

Please see the Experiments/ReadMe file for detailed instructions on running the experiments.