.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_example/plot_ML_validation_example.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_example_plot_ML_validation_example.py>` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_example_plot_ML_validation_example.py:

Machine learning validation example
===================================

The aim of this page is to provide a simple use case where kernel-based design methods are used to build a design of experiments complementary to an existing one, either to enhance a machine learning model or to validate it.

.. GENERATED FROM PYTHON SOURCE LINES 9-15

.. code-block:: default

    import numpy as np
    import openturns as ot
    import otkerneldesign as otkd
    import matplotlib.pyplot as plt
    from matplotlib import cm

.. GENERATED FROM PYTHON SOURCE LINES 16-17

The following helper class will make plotting easier.

.. GENERATED FROM PYTHON SOURCE LINES 17-46

.. code-block:: default

    class DrawFunctions:
        def __init__(self):
            dim = 2
            self.grid_size = 100
            lowerbound = [0.] * dim
            upperbound = [1.] * dim
            mesher = ot.IntervalMesher([self.grid_size-1] * dim)
            interval = ot.Interval(lowerbound, upperbound)
            mesh = mesher.build(interval)
            self.nodes = mesh.getVertices()
            self.X0, self.X1 = np.array(self.nodes).T.reshape(2, self.grid_size, self.grid_size)

        def draw_2D_contour(self, title, function=None, distribution=None, colorbar=cm.viridis):
            fig = plt.figure(figsize=(7, 6))
            if distribution is not None:
                Zpdf = np.array(distribution.computePDF(self.nodes)).reshape(self.grid_size, self.grid_size)
                nb_isocurves = 9
                contours = plt.contour(self.X0, self.X1, Zpdf, nb_isocurves, colors='black', alpha=0.6)
                plt.clabel(contours, inline=True, fontsize=8)
            if function is not None:
                Z = np.array(function(self.nodes)).reshape(self.grid_size, self.grid_size)
                plt.contourf(self.X0, self.X1, Z, 18, cmap=colorbar)
                plt.colorbar()
            plt.title(title)
            plt.xlabel("$x_0$")
            plt.ylabel("$x_1$")
            return fig

.. GENERATED FROM PYTHON SOURCE LINES 47-50

Regression model of a 2D function
---------------------------------

Define the function to be approximated.

.. GENERATED FROM PYTHON SOURCE LINES 50-55

.. code-block:: default

    function_expression = 'exp((2*x1-1))/5 - (2*x2-1)/5 + ((2*x2-1)^6)/3 + 4*((2*x2-1)^4) - 4*((2*x2-1)^2) + (7/10)*((2*x1-1)^2) + (2*x1-1)^4 + 3/(4*((2*x1-1)^2) + 4*((2*x2-1)^2) + 1)'
    irregular_function = ot.SymbolicFunction(['x1', 'x2'], [function_expression])
    irregular_function.setName("Irregular")
    print(irregular_function)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [x1,x2]->[exp((2*x1-1))/5 - (2*x2-1)/5 + ((2*x2-1)^6)/3 + 4*((2*x2-1)^4) - 4*((2*x2-1)^2) + (7/10)*((2*x1-1)^2) + (2*x1-1)^4 + 3/(4*((2*x1-1)^2) + 4*((2*x2-1)^2) + 1)]

.. GENERATED FROM PYTHON SOURCE LINES 56-57

Draw the contours of the 2D function.

.. GENERATED FROM PYTHON SOURCE LINES 57-61

.. code-block:: default

    d = DrawFunctions()
    d.draw_2D_contour("Irregular function", function=irregular_function)
    plt.show()
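As a side check independent of OpenTURNS, the same symbolic expression can be transcribed into plain NumPy (a sketch of ours; ``irregular_numpy`` is not part of the example). At the domain center, where ``2*x1-1`` and ``2*x2-1`` both vanish, the value reduces to ``exp(0)/5 + 3 = 3.2``.

```python
import numpy as np

def irregular_numpy(x1, x2):
    """NumPy transcription of the symbolic expression above (illustration only)."""
    u, v = 2 * x1 - 1, 2 * x2 - 1  # the expression is written in shifted coordinates
    return (np.exp(u) / 5 - v / 5 + v**6 / 3 + 4 * v**4 - 4 * v**2
            + 0.7 * u**2 + u**4 + 3 / (4 * u**2 + 4 * v**2 + 1))

# At the center of the domain all shifted terms vanish: exp(0)/5 + 3 = 3.2
print(irregular_numpy(0.5, 0.5))
```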
.. image-sg:: /auto_example/images/sphx_glr_plot_ML_validation_example_001.png
   :alt: Irregular function
   :srcset: /auto_example/images/sphx_glr_plot_ML_validation_example_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 62-65

Define the joint input random vector; it is uniform here since our goal is to build a good regression model on the entire domain.

.. GENERATED FROM PYTHON SOURCE LINES 65-67

.. code-block:: default

    distribution = ot.ComposedDistribution([ot.Uniform(0, 1)] * 2)

.. GENERATED FROM PYTHON SOURCE LINES 68-69

Build a learning set, for example by Latin Hypercube Sampling.

.. GENERATED FROM PYTHON SOURCE LINES 69-75

.. code-block:: default

    learning_size = 20
    ot.RandomGenerator.SetSeed(0)
    LHS_experiment = ot.LHSExperiment(distribution, learning_size, True, True)
    x_learn = LHS_experiment.generate()
    y_learn = irregular_function(x_learn)

.. GENERATED FROM PYTHON SOURCE LINES 76-79

Build a design of experiments complementary to the existing learning set (e.g., for testing). Note that the kernel herding method could also be used.

.. GENERATED FROM PYTHON SOURCE LINES 79-84

.. code-block:: default

    test_size = 10
    sp = otkd.GreedySupportPoints(distribution, initial_design=x_learn)
    x_test = sp.select_design(test_size)
    y_test = irregular_function(x_test)

.. GENERATED FROM PYTHON SOURCE LINES 85-87

Plot the learning set (in red) and the test set (in black, with indices showing the order in which the test design was constructed).

.. GENERATED FROM PYTHON SOURCE LINES 87-96
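The greedy support points method above picks each new test point by optimizing an energy-distance criterion: a point should be representative of the target distribution (close, on average, to a large candidate sample) while staying away from points already in the design. A rough NumPy sketch of one greedy step under these assumptions (our own illustration, not the otkerneldesign implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
candidates = rng.uniform(size=(500, 2))  # large sample of the target (uniform) distribution
design = rng.uniform(size=(20, 2))       # stands in for the existing learning set

def next_support_point(candidates, design):
    # average distance from each candidate to the target sample (representativeness term)
    to_target = np.linalg.norm(candidates[:, None, :] - candidates[None, :, :], axis=-1).mean(axis=1)
    # average distance from each candidate to the current design (repulsion term)
    to_design = np.linalg.norm(candidates[:, None, :] - design[None, :, :], axis=-1).mean(axis=1)
    # minimize attraction-to-target minus repulsion-from-design
    return candidates[np.argmin(to_target - to_design)]

new_point = next_support_point(candidates, design)
```

In the example itself this selection is handled by ``otkd.GreedySupportPoints``, which also accepts the existing design through ``initial_design``.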
.. code-block:: default

    fig = d.draw_2D_contour("Irregular function", function=irregular_function)
    plt.scatter(x_learn[:, 0], x_learn[:, 1], label='Learning set ($m={}$)'.format(len(x_learn)), marker='$L$', color='C3')
    plt.scatter(x_test[:, 0], x_test[:, 1], label='Test set ($n={}$)'.format(len(x_test)), marker='$T$', color='k')
    # Test set indexes
    [plt.text(x_test[i][0] * 1.02, x_test[i][1] * 1.02, str(i + 1), weight="bold", fontsize=np.max((20 - i, 5))) for i in range(test_size)]
    lgd = plt.legend(bbox_to_anchor=(0.5, -0.1), loc='upper center')
    plt.tight_layout(pad=1)
    plt.show()

.. image-sg:: /auto_example/images/sphx_glr_plot_ML_validation_example_002.png
   :alt: Irregular function
   :srcset: /auto_example/images/sphx_glr_plot_ML_validation_example_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 97-100

Kriging model fit and validation
--------------------------------

Build a simple Kriging regression model.

.. GENERATED FROM PYTHON SOURCE LINES 100-108

.. code-block:: default

    dim = distribution.getDimension()
    basis = ot.ConstantBasisFactory(dim).build()
    covariance_model = ot.MaternModel([0.2] * dim, 2.5)
    algo = ot.KrigingAlgorithm(x_learn, y_learn, covariance_model, basis)
    algo.run()
    result = algo.getResult()
    kriging_model = result.getMetaModel()

.. GENERATED FROM PYTHON SOURCE LINES 109-111

Build a large Monte Carlo reference test set and compute a reference performance metric on it.

.. GENERATED FROM PYTHON SOURCE LINES 111-117

.. code-block:: default

    xref_test = distribution.getSample(10000)
    yref_test = irregular_function(xref_test)
    ref_val = ot.MetaModelValidation(xref_test, yref_test, kriging_model)
    ref_Q2 = ref_val.computePredictivityFactor()[0]
    print("Reference Monte Carlo (n=10000) predictivity coefficient: {:.3}".format(ref_Q2))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Reference Monte Carlo (n=10000) predictivity coefficient: 0.828
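The predictivity coefficient computed above is, in essence, :math:`Q_2 = 1 - \mathrm{MSE}/\mathrm{Var}(y)`: it equals 1 for a perfect model and 0 for a model that does no better than predicting the mean. A minimal NumPy version of this metric (our sketch, not the OpenTURNS implementation):

```python
import numpy as np

def predictivity(y_true, y_pred):
    """Q2 = 1 - mean squared prediction error / variance of the observations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.mean((y_true - y_pred) ** 2) / np.var(y_true)

y = np.array([1.0, 2.0, 3.0, 4.0])
print(predictivity(y, y))                 # perfect predictions -> 1.0
print(predictivity(y, np.full(4, 2.5)))   # predicting the mean -> 0.0
```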
.. GENERATED FROM PYTHON SOURCE LINES 118-122

In comparison, our test set underestimates the performance of the Kriging model. This is expected, since the test points are deliberately chosen far from the learning set.

.. GENERATED FROM PYTHON SOURCE LINES 122-126

.. code-block:: default

    val = ot.MetaModelValidation(x_test, y_test, kriging_model)
    estimated_Q2 = val.computePredictivityFactor()[0]
    print("Support points (n={}) predictivity coefficient: {:.3}".format(test_size, estimated_Q2))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Support points (n=10) predictivity coefficient: 0.758

.. GENERATED FROM PYTHON SOURCE LINES 127-131

To take this into account, let us compute optimal weights for the validation. After applying the weights to the squared residuals, the estimated performance is less pessimistic and closer to the reference value.

.. GENERATED FROM PYTHON SOURCE LINES 131-136

.. code-block:: default

    tsw = otkd.TestSetWeighting(x_learn, x_test, sp._candidate_set)
    optimal_test_weights = tsw.compute_weights()
    squared_residuals = (np.array(y_test).flatten() - np.array(kriging_model(x_test)).flatten()) ** 2
    weighted_Q2 = 1 - np.mean(squared_residuals * optimal_test_weights) / y_test.computeVariance()[0]
    print("Weighted support points (n={}) predictivity coefficient: {:.3}".format(test_size, weighted_Q2))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Weighted support points (n=10) predictivity coefficient: 0.871

.. GENERATED FROM PYTHON SOURCE LINES 137-140

Adding test set to learning set
-------------------------------

The test set can now be added to the learning set to enhance the Kriging model.

.. GENERATED FROM PYTHON SOURCE LINES 140-150
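This add-points-and-refit pattern is generic and not specific to Kriging. A small self-contained sketch with a degree-5 polynomial standing in for the metamodel (all data here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2 * np.pi * x)  # hypothetical true function

x_learn = rng.uniform(size=12)  # initial learning set
x_extra = rng.uniform(size=6)   # extra points to be merged in, like the test set above

def fit_and_score(x):
    """Fit a polynomial surrogate on x and return its MSE on a dense reference grid."""
    coeffs = np.polyfit(x, f(x), deg=5)
    x_ref = np.linspace(0.01, 0.99, 200)
    return np.mean((np.polyval(coeffs, x_ref) - f(x_ref)) ** 2)

mse_before = fit_and_score(x_learn)
mse_after = fit_and_score(np.concatenate([x_learn, x_extra]))
```

Note that, once the test points are folded into the learning set, they no longer provide an independent performance estimate; a fresh test design (or the Monte Carlo reference) is needed to validate the enhanced model.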
.. code-block:: default

    x_learn.add(x_test)
    y_learn.add(y_test)
    algo_enhanced = ot.KrigingAlgorithm(x_learn, y_learn, covariance_model, basis)
    algo_enhanced.run()
    result_enhanced = algo_enhanced.getResult()
    kriging_enhanced = result_enhanced.getMetaModel()
    ref_val_enhanced = ot.MetaModelValidation(xref_test, yref_test, kriging_enhanced)
    ref_Q2_enhanced = ref_val_enhanced.computePredictivityFactor()[0]
    print("Enhanced Kriging - Monte Carlo (n=10000) predictivity coefficient: {:.3}".format(ref_Q2_enhanced))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Enhanced Kriging - Monte Carlo (n=10000) predictivity coefficient: 0.922

.. rst-class:: sphx-glr-timing

**Total running time of the script:** ( 0 minutes 1.162 seconds)

.. _sphx_glr_download_auto_example_plot_ML_validation_example.py:

.. only:: html

    .. container:: sphx-glr-footer sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: plot_ML_validation_example.py <plot_ML_validation_example.py>`

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: plot_ML_validation_example.ipynb <plot_ML_validation_example.ipynb>`