KernelHerdingTensorized

class otkerneldesign.KernelHerdingTensorized(kernel=None, distribution=None, candidate_set_size=None, candidate_set=None, initial_design=None, is_greedy=False)

Incrementally select new design points with tensorized kernel herding. The main difference from the KernelHerding class lies in the compute_target_potential() method, which requires the kernel to be a product of one-dimensional kernels and the input random variables to be independent. Under these assumptions, the target potential is computed as a product of univariate potentials, which is much faster.

Parameters:
kernel : openturns.CovarianceModel

Covariance kernel used to define potentials. Must be a product of one-dimensional kernels. By default a product of Matern kernels with smoothness 5/2.

distribution : openturns.Distribution

Distribution the design points must represent. Must have an independent copula. If not specified, then candidate_set must be specified instead. Even when candidate_set is specified, providing the distribution can still be useful if it allows the use of tensorized formulas.

candidate_set_size : positive int

Size of the set of all candidate points. Unnecessary if candidate_set is specified. Otherwise, 2^{12} by default.

candidate_set : 2-d list of float

Large sample that empirically represents a distribution. If not specified, then distribution and candidate_set_size must be specified in order to generate it automatically.

initial_design : 2-d list of float

Sample of points that must be included in the design. Empty by default.

is_greedy : Boolean

False by default: the criterion is then the difference between the current and target potentials. When set to True, the MMD minimization is strictly greedy. In practice, the two criteria are very close: the only difference is that the greedy criterion multiplies the current potential by \frac{m}{m+1}.
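The two criteria described for is_greedy can be compared with a minimal NumPy sketch. The function herding_criterion and its arguments are hypothetical names used for illustration only, and reading m as the current design size is an assumption; this is not the library's implementation.

```python
import numpy as np

def herding_criterion(current_potential, target_potential, design_size, is_greedy=False):
    """Criterion values on all candidate points (illustrative sketch).

    Assumption: m in the factor m / (m + 1) is the current design size.
    """
    current = np.asarray(current_potential, dtype=float)
    target = np.asarray(target_potential, dtype=float)
    if is_greedy:
        m = design_size
        # Greedy variant: the current potential is scaled by m / (m + 1)
        return (m / (m + 1)) * current - target
    # Default variant: plain difference between current and target potentials
    return current - target

# Toy potentials evaluated on three candidate points
current = np.array([0.8, 0.5, 0.3])
target = np.array([0.6, 0.6, 0.6])
standard = herding_criterion(current, target, design_size=4)
greedy = herding_criterion(current, target, design_size=4, is_greedy=True)
```

On this toy input both variants rank the candidates in the same order, which is consistent with the statement that the two criteria are very close.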

Examples

>>> import openturns as ot
>>> import otkerneldesign as otkd
>>> distribution = ot.ComposedDistribution([ot.Normal(0.5, 0.1)] * 2)
>>> dimension = distribution.getDimension()
>>> # Kernel definition
>>> ker_list = [ot.MaternModel([0.1], [1.0], 2.5)] * dimension
>>> kernel = ot.ProductCovarianceModel(ker_list)
>>> # Tensorized kernel herding design
>>> kht = otkd.KernelHerdingTensorized(kernel=kernel, distribution=distribution)
>>> kht_design, _ = kht.select_design(20)

Methods

compute_criterion(design_indices)

Compute the criterion on a design. At any point of the candidate set, this criterion is the difference between the potential of the discrete measure defined by the given design and the target potential.

Parameters:
design_indices : list of positive int

List of the indices of the selected points in the Sample of candidate points

Returns:
current_potential - target_potential : numpy.array

Vector of the values taken by the criterion on all candidate points

compute_current_energy(design_indices)

Compute the energy of the discrete measure defined by the design \mat{X}_n. Considering the discrete measure \zeta_n = \frac{1}{n} \sum_{i=1}^{n} \delta(\vect{x}^{(i)}), its energy is defined as

E_{\zeta_n} := \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} k(\vect{x}^{(i)}, \vect{x}^{(j)}).

Parameters:
design_indices : list of positive int

List of the indices of the selected points in the Sample of candidate points

Returns:
potential : float

Energy of the discrete measure defined by the design
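As an illustration of the energy formula above, here is a minimal NumPy sketch. The squared-exponential kernel, its scale, and the function names are assumptions chosen for the example; the class itself works with an openturns.CovarianceModel.

```python
import numpy as np

def gaussian_kernel(x, y, scale=0.2):
    """Squared-exponential kernel matrix between two point sets (illustrative choice)."""
    x = np.atleast_2d(x)
    y = np.atleast_2d(y)
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / scale**2)

def discrete_energy(design):
    """Energy of the uniform discrete measure on the design points."""
    kernel_matrix = gaussian_kernel(design, design)
    # The mean of the n x n kernel matrix equals the double sum divided by n^2
    return kernel_matrix.mean()

design = np.array([[0.1, 0.2], [0.4, 0.9], [0.7, 0.5]])
energy = discrete_energy(design)
```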

compute_current_potential(design_indices)

Compute the potential of the discrete measure (a.k.a. kernel mean embedding) defined by the design \mat{X}_n. Considering the discrete measure \zeta_n = \frac{1}{n} \sum_{i=1}^{n} \delta(\vect{x}^{(i)}), its potential is defined as

P_{\zeta_n}(\vect{x}) = \frac{1}{n} \sum_{i=1}^{n} k(\vect{x}, \vect{x}^{(i)}).

Parameters:
design_indices : list of positive int

List of the indices of the selected points in the Sample of candidate points

Returns:
potential

Potential of the measure defined by the design \mat{X}_n.

compute_mmd(design_indices)

Compute Maximum Mean Discrepancy between \mu and \zeta_n = \frac{1}{n} \sum_{i=1}^{n} \delta(\vect{x}^{(i)}).

Parameters:
design_indices : list of positive int

List of the indices of the selected points in the Sample of candidate points

Returns:
mmd : float

Maximum Mean Discrepancy between target and current measure.
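The sketch below shows one common way to estimate the MMD when the target measure \mu is represented empirically by the candidate set: the squared MMD is the target energy, minus twice the mean target potential over the design, plus the design energy. The Gaussian kernel, the function names, and the choice to return the square root are assumptions for illustration; the library may estimate the target terms differently.

```python
import numpy as np

def gaussian_kernel(x, y, scale=0.2):
    """Squared-exponential kernel matrix between two point sets (illustrative choice)."""
    x = np.atleast_2d(x)
    y = np.atleast_2d(y)
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / scale**2)

def empirical_mmd(candidates, design_indices):
    """MMD between the empirical measure of the candidates and that of the design."""
    design = candidates[design_indices]
    target_energy = gaussian_kernel(candidates, candidates).mean()
    # Mean over the design points of the empirical target potential
    cross_term = gaussian_kernel(design, candidates).mean()
    current_energy = gaussian_kernel(design, design).mean()
    mmd_squared = target_energy - 2.0 * cross_term + current_energy
    # Clip tiny negative values caused by floating-point rounding
    return np.sqrt(max(mmd_squared, 0.0))

rng = np.random.default_rng(0)
candidates = rng.uniform(size=(200, 2))
mmd_small = empirical_mmd(candidates, [0, 1, 2])
mmd_full = empirical_mmd(candidates, list(range(200)))
```

When the design is the whole candidate set, the two empirical measures coincide and the estimate vanishes up to rounding error.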

compute_target_energy()

Compute the energy of the target probability measure \mu.

Returns:
potential : float

Energy of the measure \mu defined by

E_{\mu} := \int \int k(\vect{x}, \vect{x}') d \mu(\vect{x}) d \mu(\vect{x}').

compute_target_potential()

Compute the potential of the target probability measure \mu. In the case of independent input variables, this implementation is more efficient than the one offered by the KernelHerding class.

Let \cX be a Cartesian product of one-dimensional sets \cX_{[i]}, \cX=\cX_{[1]}\times\cdots\times\cX_{[d]}, and let the measure \mu be the product of its marginals \mu_{[i]} on the \cX_{[i]}. When the kernel k is the product of one-dimensional kernels k_{[i]}, then for all \vect{x}=(x_1,\ldots,x_d)\in\cX, the potential P_{k,\mu}(\vect{x}) can be expressed as

P_{k,\mu}(\vect{x}) := \int_\cX k(\vect{x}, \vect{x}') d \mu(\vect{x}')
= \prod_{i=1}^d \int_{\cX_{[i]}} k_{[i]}(x_i, x_i') d \mu_{[i]}(x_i')
= \prod_{i=1}^d P_{k_{[i]},\mu_{[i]}}(x_i),

where for each i\in\{1,\ldots,d\}, P_{k_{[i]},\mu_{[i]}} is the one-dimensional potential with respect to the distribution \mu_{[i]} and the kernel k_{[i]}.

This method exploits this property by computing the potential as a product of univariate potentials, each estimated on a regular grid.

Returns:
potential : numpy.array

Potential of the measure \mu computed as

P_{k,\mu}(\vect{x}) = \prod_{i=1}^d P_{k_{[i]},\mu_{[i]}}(x_i).
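The product formula can be checked numerically with a small NumPy sketch. It assumes independent Uniform(0, 1) marginals, a product of one-dimensional squared-exponential kernels, and a midpoint rule on a regular grid; these choices and all names are illustrative and are not taken from the library.

```python
import numpy as np

SCALE = 0.2  # kernel scale, arbitrary illustrative value

def kernel_1d(x, t):
    """One-dimensional squared-exponential kernel matrix."""
    return np.exp(-0.5 * ((x[:, None] - t[None, :]) / SCALE) ** 2)

def univariate_potential(x, grid):
    """Midpoint-rule estimate of the 1-d potential for a Uniform(0, 1) marginal."""
    return kernel_1d(x, grid).mean(axis=1)

n_grid = 50
grid = (np.arange(n_grid) + 0.5) / n_grid  # midpoints of a regular partition of [0, 1]

rng = np.random.default_rng(1)
points = rng.uniform(size=(5, 2))

# Tensorized estimate: product over dimensions of the univariate potentials
tensorized = np.prod(
    [univariate_potential(points[:, i], grid) for i in range(points.shape[1])],
    axis=0,
)

# Brute-force estimate: the same quadrature on the full two-dimensional tensor grid
g1, g2 = np.meshgrid(grid, grid, indexing="ij")
nodes = np.column_stack([g1.ravel(), g2.ravel()])
sq_dist = ((points[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=-1)
brute_force = np.exp(-0.5 * sq_dist / SCALE**2).mean(axis=1)
```

Both estimates agree, but the tensorized version needs 2 x 50 kernel evaluations per point instead of 50^2, and the gap widens quickly with the dimension.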

draw_energy_convergence(design_indices)

Draws the convergence of the energy for a set of points selected among the candidate set.

Parameters:
design_indices : list of positive int

List of the indices of the selected points in the Sample of candidate points

Returns:
fig : matplotlib.Figure

Energy convergence of the design of experiments

plot_data

Data used to plot the figure

draw_mmd_convergence(design_indices)

Draws the convergence of the MMD between a discrete measure and the target measure.

Parameters:
design_indices : list of positive int

List of the indices of the selected points in the Sample of candidate points

Returns:
fig : matplotlib.Figure

MMD convergence of the design of experiments

plot_data

Data used to plot the figure

get_candidate_set()

Accessor to the candidate set.

Returns:
candidate_set : openturns.Sample

A deep copy of the candidate set.

get_indices(sample)

When provided a subsample of the candidate set, returns the indices of its points in the candidate set.

Parameters:
sample : 2-d list of float

A subsample of the candidate set.

Returns:
indices : list of int

Indices of the points of the sample within the candidate set.

select_design(size)

Select a design with tensorized kernel herding.

Parameters:
size : positive int

Number of points to be selected

design_indices : list of positive int

List of the indices of already selected points (empty by default) in the Sample of candidate points

Returns:
design : openturns.Sample

Sample of all selected points

design_indices : list of positive int or None

List of the indices of the selected points in the Sample of candidate points
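To make the incremental selection concrete, the following NumPy sketch runs a plain kernel herding loop on a candidate set. It assumes a Gaussian kernel, an empirical target potential computed from the candidates, and that each new point minimizes the criterion (current potential minus target potential). The function herding_select is hypothetical and only mirrors the shape of the returned pair (design, design_indices); it is not the library's code.

```python
import numpy as np

def gaussian_kernel(x, y, scale=0.2):
    """Squared-exponential kernel matrix between two point sets (illustrative choice)."""
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / scale**2)

def herding_select(candidates, size):
    """Select `size` candidate points one at a time (sketch of kernel herding)."""
    kernel_matrix = gaussian_kernel(candidates, candidates)
    # Empirical target potential evaluated at every candidate point
    target_potential = kernel_matrix.mean(axis=1)
    potential_sum = np.zeros(len(candidates))
    design_indices = []
    for _ in range(size):
        n = len(design_indices)
        # Potential of the current design; zero while the design is empty
        current_potential = potential_sum / n if n > 0 else potential_sum
        criterion = current_potential - target_potential
        next_index = int(np.argmin(criterion))
        design_indices.append(next_index)
        potential_sum += kernel_matrix[:, next_index]
    return candidates[design_indices], design_indices

rng = np.random.default_rng(2)
candidates = rng.uniform(size=(300, 2))
target_potential = gaussian_kernel(candidates, candidates).mean(axis=1)
design, design_indices = herding_select(candidates, 10)
```

With an empty initial design, the first selected point is the candidate with the largest target potential, since the current potential is zero everywhere.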