Regression

Using the GUI

Training an XRegressor model with the embedded xplainable GUI takes only a few lines of code. Run the following, then configure and optimise your model within the GUI to minimise the amount of code you need to write.

Examples

GUI

import xplainable as xp
import pandas as pd
import os

# Initialise your session
xp.initialise(api_key=os.environ['XP_API_KEY'])

# Load your data
data = pd.read_csv('data.csv')

# Train your model (this will open an embedded GUI)
model = xp.regressor(data)

Using the Python API

You can also train an xplainable regression model programmatically. This works in a very similar way to other popular machine learning libraries.

You can import the XRegressor class and train a model as follows:

Examples

XRegressor

from xplainable.core.models import XRegressor
from sklearn.model_selection import train_test_split
import pandas as pd

# Load data
data = pd.read_csv('data.csv')
x, y = data.drop('target', axis=1), data['target']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Train model
model = XRegressor()
model.fit(x_train, y_train)

# Optimise the model
model.optimise_tail_sensitivity(x_train, y_train)

# <-- Add XEvolutionaryNetwork here -->

# Predict on the test set
y_pred = model.predict(x_test)

PartitionedRegressor

from xplainable.core.models import PartitionedRegressor
from xplainable.core.models import XRegressor
import pandas as pd
from sklearn.model_selection import train_test_split

# Load your data
data = pd.read_csv('data.csv')
train, test = train_test_split(data, test_size=0.2)

# Instantiate the partitioned model
partitioned_model = PartitionedRegressor(partition_on='partition_column')

# Train the base model
base_model = XRegressor()
base_model.fit(
    train.drop(columns=['target', 'partition_column']),
    train['target']
)

# Optimise the model on the same columns it was fitted on
base_model.optimise_tail_sensitivity(
    train.drop(columns=['target', 'partition_column']),
    train['target']
)

# <-- Add XEvolutionaryNetwork here -->

# Add the base model to the partitioned model (call this '__dataset__')
partitioned_model.add_partition(base_model, '__dataset__')

# Iterate over the unique values in the partition column
for partition in train['partition_column'].unique():
      # Get the data for the partition
      part = train[train['partition_column'] == partition]
      x_train, y_train = part.drop('target', axis=1), part['target']

      # Fit the embedded model
      model = XRegressor()
      model.fit(x_train, y_train)

      # Optimise the model
      model.optimise_tail_sensitivity(x_train, y_train)

      # <-- Add XEvolutionaryNetwork here -->

      # Add the model to the partitioned model
      partitioned_model.add_partition(model, partition)

# Prepare the test data
x_test, y_test = test.drop('target', axis=1), test['target']

# Predict on the partitioned model
y_pred = partitioned_model.predict(x_test)

Classes – Regressors

Copyright Xplainable Pty Ltd, 2023

class xplainable.core.ml.regression.PartitionedRegressor(partition_on=None, *args, **kwargs)[source]

Bases: BasePartition

Partitioned XRegressor model.

This class is a wrapper for the XRegressor model that allows for individual models to be trained on subsets of the data. Each model can be used in isolation or in combination with the other models.

Individual models can be accessed using the partitions attribute.

Example

>>> from xplainable.core.models import PartitionedRegressor, XRegressor
>>> import pandas as pd
>>> from sklearn.model_selection import train_test_split

>>> data = pd.read_csv('data.csv')
>>> train, test = train_test_split(data, test_size=0.2)

>>> # Instantiate the partitioned model
>>> partitioned_model = PartitionedRegressor(partition_on='partition_column')

>>> # Iterate over the unique values in the partition column
>>> for partition in train['partition_column'].unique():
...     # Get the data for the partition
...     part = train[train['partition_column'] == partition]
...     x_train, y_train = part.drop('target', axis=1), part['target']
...     # Fit the embedded model
...     model = XRegressor()
...     model.fit(x_train, y_train)
...     model.optimise_tail_sensitivity(x_train, y_train)
...     # <-- Add XEvolutionaryNetwork here -->
...     # Add the model to the partitioned model
...     partitioned_model.add_partition(model, partition)

>>> # Prepare the test data
>>> x_test, y_test = test.drop('target', axis=1), test['target']

>>> # Predict on the partitioned model
>>> y_pred = partitioned_model.predict(x_test)

Parameters:: partition_on (str, optional) – The column to partition on.

add_partition(model, partition: str)

Adds a partition to the model.

All partitions must be of the same type.

Parameters:

model (XClassifier | XRegressor) – The model to add.
partition (str) – The name of the partition to add.

drop_partition(partition: str)

Removes a partition from the model.

Parameters:: partition (str) – The name of the partition to drop.

explain(partition: str = '__dataset__')

Generates a global explainer for the model.

Parameters:: partition (str) – The partition to explain.
Raises:: ImportError – If user does not have altair installed.

predict(x) → array[source]

Predicts the target value for each row in the data across all partitions.

The partition_on column will be used to determine which model to use for each observation. If the partition_on column is not present in the data, the ‘__dataset__’ model will be used.

Parameters:: x (pd.DataFrame | np.ndarray) – The x variables to predict.
Returns:: The predicted target values
Return type:: np.array
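
The routing described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the constant "models" and the region column are hypothetical stand-ins, and falling back to ‘__dataset__’ for partition values that were never added is an assumption that the documentation does not confirm.

```python
FALLBACK = "__dataset__"


def make_constant_model(value):
    """Hypothetical stand-in for a fitted model: always predicts `value`."""
    return lambda row: value


# One model per partition value, plus the fallback model
partitions = {
    FALLBACK: make_constant_model(100.0),
    "north": make_constant_model(120.0),
    "south": make_constant_model(80.0),
}


def predict_partitioned(rows, partition_on, partitions):
    """Route each row to the model registered for its partition value."""
    predictions = []
    for row in rows:
        # Column missing from the row: use the fallback model (documented)
        key = row.get(partition_on, FALLBACK)
        # Unknown partition value: also fall back (assumption of this sketch)
        model = partitions.get(key, partitions[FALLBACK])
        predictions.append(model(row))
    return predictions


rows = [{"region": "north"}, {"region": "south"}, {"region": "east"}, {}]
routed = predict_partitioned(rows, "region", partitions)
print(routed)
```

The last two rows show the fallback path: one has an unregistered partition value and the other has no partition column at all.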

class xplainable.core.ml.regression.XRegressor(max_depth=8, min_info_gain=0.0001, min_leaf_size=0.0001, ignore_nan=False, weight=1, power_degree=1, sigmoid_exponent=0, tail_sensitivity: float = 1.0, prediction_range: tuple = (-inf, inf))[source]

Bases: BaseModel

Xplainable Regression model for transparent machine learning.

XRegressor offers strong predictive power and complete transparency for regression problems on tabular data. It is designed to be used in place of black box models such as Random Forests and Gradient Boosting Machines when explainability is important.

XRegressor is a feature-wise ensemble of decision trees. Each tree is constructed using a custom algorithm that optimises for information with respect to the target variable. The trees are then weighted and normalised against one another to produce a variable step function for each feature. The summation of these functions produces a score that can be explained in real time. The bounds of the prediction can be set using the prediction_range parameter.
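
To make the idea of summed per-feature step functions concrete, here is a minimal sketch in plain Python. It assumes a hypothetical layout (bin edges plus one contribution per bin, added to a base value); the feature names, numbers, and data structure are invented for illustration and do not reflect the library's internal representation.

```python
from bisect import bisect_right

# Hypothetical step functions: sorted bin edges and one contribution per bin
# (each feature has len(edges) + 1 contributions).
STEP_FUNCTIONS = {
    "age": {"edges": [30.0, 50.0], "contributions": [-2.0, 0.5, 3.0]},
    "income": {"edges": [40_000.0], "contributions": [-1.5, 4.0]},
}
BASE_VALUE = 10.0  # assumed baseline, e.g. the mean of the training target


def explain_row(row, step_functions=STEP_FUNCTIONS):
    """Return the contribution of each feature for a single observation."""
    contributions = {}
    for feature, fn in step_functions.items():
        bin_index = bisect_right(fn["edges"], row[feature])
        contributions[feature] = fn["contributions"][bin_index]
    return contributions


def predict_row(row, prediction_range=(float("-inf"), float("inf"))):
    """Sum the base value and contributions, then clip to prediction_range."""
    score = BASE_VALUE + sum(explain_row(row).values())
    lower, upper = prediction_range
    return min(max(score, lower), upper)


row = {"age": 42, "income": 55_000}
print(explain_row(row))                               # per-feature breakdown
print(predict_row(row))                               # unbounded prediction
print(predict_row(row, prediction_range=(0.0, 12.0)))  # clipped prediction
```

Because the prediction is a plain sum, the per-feature breakdown returned by explain_row fully accounts for the score before clipping.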

When the fit method is called, the specified params are set across all features. Following the initial fit, the update_feature_params method may be called on a subset of features to update the params for those features only. This allows for a more granular approach to model tuning.

Important note on performance:

XRegressor alone can be a weak predictor. There are a number of ways to get the most out of the model in terms of predictive power:

- use the optimise_tail_sensitivity method
- fit an XEvolutionaryNetwork to the model. This will iteratively optimise the weights of the model to produce a much more accurate predictor.

You can find more information on this in the XEvolutionaryNetwork documentation at xplainable/core/optimisation/genetic.py.

Example

>>> from xplainable.core.models import XRegressor
>>> import pandas as pd
>>> from sklearn.model_selection import train_test_split

>>> data = pd.read_csv('data.csv')
>>> x = data.drop(columns=['target'])
>>> y = data['target']
>>> x_train, x_test, y_train, y_test = train_test_split(
...     x, y, test_size=0.2, random_state=42)

>>> model = XRegressor()
>>> model.fit(x_train, y_train)

>>> # This will be a weak predictor
>>> model.predict(x_test)

>>> # For a strong predictor, apply optimisations
>>> model.optimise_tail_sensitivity(x_train, y_train)
>>> # Add evolutionary network here
>>> ...

Parameters:

max_depth (int) – The maximum depth of each decision tree.
min_leaf_size (float) – The minimum number of samples required to make a split.
min_info_gain (float) – The minimum information gain required to make a split.
tail_sensitivity (float) – Adds weight to divisive leaf nodes.
prediction_range (tuple) – The lower and upper limits for predictions.

constructs_from_json(data)

constructs_to_json()

convert_to_model_profile_categories(x)

evaluate(x: DataFrame | ndarray, y: Series | ndarray) → dict[source]

Evaluates the model performance.

Parameters:

x (pd.DataFrame | np.ndarray) – The x variables to predict.
y (pd.Series | np.array) – The target values.

Returns:

The model performance metrics.

Return type:

dict
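
The documentation does not list which metrics the returned dict contains. As a rough guide to what such a dict typically holds for a regression model, the sketch below computes a few common metrics by hand; the function name and the keys mae, mse, rmse and r2 are assumptions for illustration, not the keys returned by evaluate.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Compute common regression metrics (hypothetical key names)."""
    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "mse": ss_res / n,
        "rmse": sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,  # undefined when y_true is constant
    }


metrics = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(metrics)
```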

explain(label_rounding=5)

property feature_importances: dict

Calculates the feature importances for the model decision process.

Returns:: The feature importances.
Return type:: dict

fit(x: DataFrame | ndarray, y: Series | ndarray, id_columns: list = [], column_names: list | None = None, target_name: str = 'target', alpha=0.1) → XRegressor[source]

Fits the model to the data.

Parameters:

x (pd.DataFrame | np.ndarray) – The x variables used for training.
y (pd.Series | np.array) – The target values.
id_columns (list, optional) – id_columns to ignore from training.
column_names (list, optional) – column_names to use for training if using a np.ndarray
target_name (str, optional) – The name of the target column if using a np.ndarray
alpha (float, optional) – Sets the number of possible splits with respect to unique values.

Returns:

The fitted model.

Return type:

XRegressor

get_construct_from_column_name(column_name: str)

local_explainer(x, subsample)

optimise_tail_sensitivity(X: DataFrame | ndarray, y: Series | ndarray) → XRegressor[source]

Optimises the tail_sensitivity parameter at a global level.

Parameters:

X (pd.DataFrame | np.ndarray) – The x variables to fit.
y (pd.Series | np.ndarray) – The target values.

Returns:

The optimised model.

Return type:

XRegressor

property params: ConstructorParams

Returns the parameters of the model.

Returns:: The default model parameters.
Return type:: ConstructorParams

predict(x: DataFrame | ndarray) → array[source]

Predicts the target value for each row in the data.

Parameters:: x (pd.DataFrame | np.ndarray) – The x variables to predict.
Returns:: The predicted target values.
Return type:: np.array

predict_explain(x)

Predictions with explanations.

Parameters:: x (array-like) – data to predict
Returns:: prediction and explanation
Return type:: pd.DataFrame

property profile: dict

Returns the model profile.

The model profile contains more granular information about the model and how it makes decisions. It is the primary property for interpreting a model and is used by the xplainable client to render the model.

Returns:: The model profile.
Return type:: dict

set_params(default_parameters: ConstructorParams) → None

Sets the parameters of the model. Generally used for model tuning.

Parameters:: default_parameters (ConstructorParams) – default constructor parameters
Returns:: None

update_feature_params(features: list, max_depth=None, min_info_gain=None, min_leaf_size=None, ignore_nan=None, weight=None, power_degree=None, sigmoid_exponent=None, tail_sensitivity=None, *args, **kwargs) → XRegressor[source]

Updates the parameters for a subset of features.

XRegressor allows you to update the parameters for a subset of features for a more granular approach to model tuning. This is useful when you identify under or overfitting on some features, but not all.

This is also referred to as ‘refitting’ the model to a new set of params. Refitting parameters to an xplainable model is extremely fast as it has already pre-computed the complex metadata required for training. This can yield huge performance gains compared to refitting traditional models, and is particularly powerful when parameter tuning. The desired result is to have a model that is well calibrated across all features without spending considerable time on parameter tuning.

It’s important to note that if a model has been further optimised using an XEvolutionaryNetwork, the optimised feature_params will be overwritten by this method and will need to be re-optimised.
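
The relationship between global params and per-feature overrides can be pictured with a small plain-Python sketch. The helper names init_feature_params and update_subset are hypothetical, and the dict-of-dicts layout is an assumption for illustration; only the parameter names and defaults are taken from the signatures above.

```python
# Defaults taken from the XRegressor signature (subset shown for brevity)
DEFAULTS = {"max_depth": 8, "min_info_gain": 0.0001, "min_leaf_size": 0.0001}


def init_feature_params(features, **params):
    """Apply one set of params to every feature, as fit does initially."""
    merged = {**DEFAULTS, **params}
    return {feature: dict(merged) for feature in features}


def update_subset(feature_params, features, **overrides):
    """Override params for selected features; None leaves a value unchanged."""
    for feature in features:
        for name, value in overrides.items():
            if value is not None:
                feature_params[feature][name] = value
    return feature_params


params = init_feature_params(["age", "income", "tenure"], max_depth=6)
update_subset(params, ["age"], max_depth=3, min_info_gain=None)
print(params["age"])     # only max_depth changed for this feature
print(params["income"])  # untouched by the update
```

Passing None mirrors the None defaults in the update_feature_params signature: only the arguments you actually supply change.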

Parameters:

features (list) – The features to update.
max_depth (int) – The maximum depth of each decision tree in the subset.
min_info_gain (float) – The minimum information gain required to make a split in the subset.
min_leaf_size (float) – The minimum number of samples required to make a split in the subset.
ignore_nan (bool) – Whether to ignore nan/null/empty values
weight (float) – Activation function weight.
power_degree (float) – Activation function power degree.
sigmoid_exponent (float) – Activation function sigmoid exponent.
tail_sensitivity (float) – Adds weight to divisive leaf nodes in the subset.

Returns:

The refitted model.

Return type:

XRegressor

Classes – Regression Optimisation

Regression Optimisers can optimise XRegressor model weights to a specific metric. They are used on top of pre-trained models and can be a powerful tool for optimising models for maximum predictive power while maintaining complete transparency.

Example:

from xplainable.core.models import XRegressor
from xplainable.core.optimisation.genetic import XEvolutionaryNetwork
from xplainable.core.optimisation.layers import Tighten, Evolve
import pandas as pd
from sklearn.model_selection import train_test_split

# Load your data
data = pd.read_csv('data.csv')
x, y = data.drop('target', axis=1), data['target']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Train your model
model = XRegressor()
model.fit(x_train, y_train)
model.optimise_tail_sensitivity(x_train, y_train)

# Create an optimiser
optimiser = XEvolutionaryNetwork(model)
optimiser.fit(x_train, y_train)

# Add layers to the network
optimiser.add_layer(Tighten())
optimiser.add_layer(Evolve())
optimiser.add_layer(Evolve())
optimiser.add_layer(Tighten())

# Optimise the model weights in place
optimiser.optimise()

# Predict on the test set
y_pred = model.predict(x_test)

class xplainable.core.optimisation.genetic.XEvolutionaryNetwork(model: XRegressor, apply_range: bool = False)[source]

Bases: object

A layer-based optimisation framework for XRegressor models.

XEvolutionaryNetwork is a novel optimisation framework for XRegressor models that allows for flexibility and depth. It is inspired by deep learning frameworks, but is applied over additive models for weight optimisation.

It works by taking a pre-trained XRegressor model and fitting it, along with the training data, to an evolutionary network. The evolutionary network consists of a series of layers, each of which is responsible for optimising the model weights given a set of constraints.

What are layers?:

There are currently two types of layers: Tighten() and Evolve().

More information on each layer can be found in their respective documentation.

There is no limit to the number of layers that can be added to the network, and each layer can be customised for specific objectives. Like other machine learning methods, the network can be prone to over-fitting, so it is recommended to use a validation set to monitor performance.

An XEvolutionaryNetwork can be stopped mid-training and resumed at any time. This is useful for long-running optimisations and iterative work. You can track the remaining and completed layers using the future_layers and completed_layers attributes.
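
The stop-and-resume behaviour can be pictured as a queue of pending layers that moves into a list of finished ones. The toy class below is a sketch of that bookkeeping only: ToyNetwork, the max_layers argument, and the layer functions are hypothetical, and only the attribute names future_layers and completed_layers come from the text above.

```python
class ToyNetwork:
    """Toy stand-in that runs layers in order and tracks progress."""

    def __init__(self, weights):
        self.weights = list(weights)
        self.future_layers = []     # layers still to run
        self.completed_layers = []  # layers already run

    def add_layer(self, layer, idx=None):
        """Append a layer, or insert it at position idx."""
        if idx is None:
            self.future_layers.append(layer)
        else:
            self.future_layers.insert(idx, layer)

    def optimise(self, max_layers=None):
        """Run pending layers; max_layers (hypothetical) simulates a pause."""
        ran = 0
        while self.future_layers and (max_layers is None or ran < max_layers):
            layer = self.future_layers.pop(0)
            self.weights = layer(self.weights)
            self.completed_layers.append(layer)
            ran += 1
        return self


def halve(weights):
    return [w / 2 for w in weights]


def shift(weights):
    return [w + 1 for w in weights]


network = ToyNetwork([4.0, 8.0])
for layer in (halve, shift, halve):
    network.add_layer(layer)

network.optimise(max_layers=2)  # stop after two layers
paused_state = (list(network.weights), len(network.future_layers))
network.optimise()              # resume with the remaining layer
print(paused_state, network.weights, len(network.completed_layers))
```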

Parameters:

model (XRegressor) – The model to optimise.
apply_range (bool) – Whether to apply the model’s prediction range to the output.

add_layer(layer, idx: int | None = None)[source]

Adds a layer to the network.

Parameters:

layer (Tighten | Evolve) – The layer to add.
idx (int, optional) – The index to add the layer at.

clear_layers()[source]: Removes all layers from the network.

drop_layer(idx: int)[source]

Removes a layer from the network.

Parameters:: idx (int) – The index of the layer to remove.

fit(x: DataFrame | ndarray, y: Series | ndarray, subset: list = []) → XEvolutionaryNetwork[source]

Fits the model and data to the evolutionary network.

Parameters:

x (pd.DataFrame | np.ndarray) – The data to fit.
y (pd.Series | np.ndarray) – The target to fit.
subset (list, optional) – A list of columns to subset for feature level optimisation.

Returns:

The fitted network.

Return type:

XEvolutionaryNetwork

optimise(callback=None) → XEvolutionaryNetwork[source]

Sequentially runs the layers in the network.

Parameters:: callback (any, optional) – Callback for progress tracking.
Returns:: The evolutionary network.
Return type:: XEvolutionaryNetwork

class xplainable.core.optimisation.layers.BaseLayer(metric='mae')[source]

Bases: object

Base class for optimisation layers.

Parameters:: metric (str, optional) – Metric to optimise on. Defaults to ‘mae’.

class xplainable.core.optimisation.layers.Evolve(mutations: int = 100, generations: int = 50, max_generation_depth: int = 10, max_severity: float = 0.5, max_leaves: int = 20, early_stopping: int | None = None)[source]

Bases: BaseLayer

Evolutionary algorithm to optimise XRegressor leaf weights.

The Evolve layer uses a genetic algorithm to optimise the leaf weights of an XRegressor model. The algorithm works by mutating the leaf weights of the model and scoring the resulting predictions. The best mutations are then selected to reproduce and mutate again. This process is repeated until the maximum number of generations is reached, or the early stopping threshold is reached.

Parameters:

mutations (int, optional) – The number of mutations to generate per generation.
generations (int, optional) – The number of generations to run.
max_generation_depth (int, optional) – The maximum depth of a generation.
max_severity (float, optional) – The maximum severity of a mutation.
max_leaves (int, optional) – The maximum number of leaves to mutate.
early_stopping (int, optional) – Stop early if no improvement after n iters.
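
The mutate, score, and select loop described above can be sketched as follows. This is a simplified illustration rather than the library's algorithm: it keeps only the single best candidate per generation, omits max_generation_depth, represents each row as the list of leaf indices it falls into, and uses a seed argument that is not part of the Evolve signature.

```python
import random


def mae(weights, rows, targets):
    """Mean absolute error when each prediction is a sum of leaf weights."""
    preds = [sum(weights[i] for i in leaves) for leaves in rows]
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def mutate(weights, rng, max_severity, max_leaves):
    """Scale a random subset of leaf weights by up to +/- max_severity."""
    child = list(weights)
    n_leaves = rng.randint(1, min(max_leaves, len(child)))
    for i in rng.sample(range(len(child)), n_leaves):
        child[i] *= 1 + rng.uniform(-max_severity, max_severity)
    return child


def evolve(weights, rows, targets, mutations=100, generations=50,
           max_severity=0.5, max_leaves=20, early_stopping=None, seed=0):
    """Keep the best mutation per generation; stop early when stale."""
    rng = random.Random(seed)  # seed is an assumption, added for repeatability
    best, best_score = list(weights), mae(weights, rows, targets)
    stale = 0
    for _ in range(generations):
        candidates = [mutate(best, rng, max_severity, max_leaves)
                      for _ in range(mutations)]
        challenger = min(candidates, key=lambda w: mae(w, rows, targets))
        challenger_score = mae(challenger, rows, targets)
        if challenger_score < best_score:
            best, best_score, stale = challenger, challenger_score, 0
        else:
            stale += 1
            if early_stopping is not None and stale >= early_stopping:
                break
    return best, best_score


# Each row lists the leaves it lands in (one leaf per feature)
rows = [[0, 2], [0, 3], [1, 2], [1, 3]]
targets = [5.0, 9.0, 7.0, 11.0]
start_weights = [1.0, 1.0, 1.0, 1.0]

initial_score = mae(start_weights, rows, targets)
evolved_weights, evolved_score = evolve(
    start_weights, rows, targets, mutations=20, generations=30, early_stopping=5)
print(initial_score, evolved_score)
```

Because a challenger only replaces the current best when it scores strictly lower, the error on the training data can never increase in this sketch; that guarantee does not extend to unseen data.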

property params: dict

Returns the parameters of the layer.

Returns:: The layer parameters.
Return type:: dict

transform(xnetwork: XEvolutionaryNetwork, x: ndarray, y: array, callback=None)[source]

Optimises an XRegressor profile given the set of parameters.

Parameters:

xnetwork (XEvolutionaryNetwork) – The evolutionary network.
x (np.ndarray) – The input variables used for prediction.
y (np.array) – The target values.
callback (any) – Callback function for progress tracking.

Returns:

The original x data to pass to the next layer (np.ndarray), and the final optimised chromosome to pass to the next layer (np.ndarray).

Return type:

np.ndarray

class xplainable.core.optimisation.layers.Tighten(iterations: int = 100, learning_rate: float = 0.03, early_stopping: int | None = None)[source]

Bases: BaseLayer

A leaf boosting algorithm to optimise XRegressor leaf node weights.

The Tighten layer uses a novel leaf boosting algorithm to optimise the leaf weights of an XRegressor model. The algorithm works by iteratively identifying the leaf node that will have the greatest impact on the overall model score, and then incrementally increasing or decreasing the leaf node weight to improve the model score. This process is repeated until the maximum number of iterations is reached, or the early stopping threshold is reached.

Parameters:

iterations (int) – The number of iterations to run.
learning_rate (float) – How fast the model learns. Between 0.001 and 1.
early_stopping (int) – Stop early if no improvement after n iters.
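
A greedy version of this idea can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the library's leaf boosting algorithm: it tries moving every leaf weight up or down by learning_rate, applies the single move that lowers the mean absolute error most, and stops when no move helps (which stands in for the early-stopping threshold in this deterministic setting).

```python
def mae(weights, rows, targets):
    """Mean absolute error when each prediction is a sum of leaf weights."""
    preds = [sum(weights[i] for i in leaves) for leaves in rows]
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def tighten(weights, rows, targets, iterations=100, learning_rate=0.03):
    """Repeatedly apply the single leaf adjustment that helps the most."""
    weights = list(weights)
    best_score = mae(weights, rows, targets)
    for _ in range(iterations):
        best_move = None
        for leaf in range(len(weights)):
            for direction in (1, -1):
                trial = list(weights)
                trial[leaf] += direction * learning_rate
                score = mae(trial, rows, targets)
                if score < best_score:
                    best_score, best_move = score, (leaf, direction)
        if best_move is None:
            break  # no single step improves the score any further
        leaf, direction = best_move
        weights[leaf] += direction * learning_rate
    return weights, best_score


# Each row lists the leaves it lands in (one leaf per feature)
rows = [[0, 2], [0, 3], [1, 2], [1, 3]]
targets = [5.0, 9.0, 7.0, 11.0]
start_weights = [1.0, 1.0, 1.0, 1.0]

initial_score = mae(start_weights, rows, targets)
tightened_weights, tightened_score = tighten(
    start_weights, rows, targets, iterations=100, learning_rate=0.5)
print(initial_score, tightened_score)
```

A smaller learning_rate takes finer steps and needs more iterations to travel the same distance, which is the usual trade-off behind this parameter.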

property params: dict

Returns the parameters of the layer.

Returns:: The layer parameters.
Return type:: dict

transform(xnetwork: XEvolutionaryNetwork, x: ndarray, y: array, callback=None) → tuple[source]

Optimises an XRegressor profile given the set of parameters.

Parameters:

xnetwork (XEvolutionaryNetwork) – The evolutionary network.
x (np.ndarray) – The input variables used for prediction.
y (np.array) – The target values.
callback (any) – Callback function for progress tracking.

Returns:

The optimised feature score map.

Return type:

dict