Classification – Binary

Using the GUI

The embedded xplainable GUI lets you train an XClassifier model with very little code. Run the following lines, then configure and optimise the model within the GUI.

Example – GUI

import xplainable as xp
import pandas as pd
import os

# Initialise your session
xp.initialise(api_key=os.environ['XP_API_KEY'])

# Load your data
data = pd.read_csv('data.csv')

# Train your model (this will open an embedded GUI)
model = xp.classifier(data)

Using the Python API

You can also train an xplainable classification model programmatically. This works in a very similar way to other popular machine learning libraries.

You can import the XClassifier class and train a model as follows:

Example – XClassifier()

from xplainable.core.models import XClassifier
from sklearn.model_selection import train_test_split
import pandas as pd

# Load your data
data = pd.read_csv('data.csv')
x, y = data.drop('target', axis=1), data['target']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Train your model
model = XClassifier()
model.fit(x_train, y_train)

# Predict on the test set
y_pred = model.predict(x_test)

Example – PartitionedClassifier()

from xplainable.core.models import PartitionedClassifier
from xplainable.core.models import XClassifier
import pandas as pd
from sklearn.model_selection import train_test_split

# Load your data
data = pd.read_csv('data.csv')
train, test = train_test_split(data, test_size=0.2)

# Instantiate the partitioned model
partitioned_model = PartitionedClassifier(partition_on='partition_column')

# Train the base model
base_model = XClassifier()
base_model.fit(
    train.drop(columns=['target', 'partition_column']),
    train['target']
)

# Add the base model to the partitioned model (call this '__dataset__')
partitioned_model.add_partition(base_model, '__dataset__')

# Iterate over the unique values in the partition column
for partition in train['partition_column'].unique():
    # Get the data for the partition
    part = train[train['partition_column'] == partition]
    x_train, y_train = part.drop('target', axis=1), part['target']

    # Fit the embedded model on this partition's data
    model = XClassifier()
    model.fit(x_train, y_train)

    # Add the model to the partitioned model
    partitioned_model.add_partition(model, partition)

# Prepare the test data
x_test, y_test = test.drop('target', axis=1), test['target']

# Predict on the partitioned model
y_pred = partitioned_model.predict(x_test)

Classifier Classes

Copyright Xplainable Pty Ltd, 2023

class xplainable.core.ml.classification.PartitionedClassifier(partition_on: str | None = None, *args, **kwargs)[source]

Bases: BasePartition

Partitioned XClassifier model.

This class is a wrapper for the XClassifier model that allows for individual models to be trained on subsets of the data. Each model can be used in isolation or in combination with the other models.

Individual models can be accessed using the partitions attribute.

Example

>>> from xplainable.core.models import PartitionedClassifier, XClassifier
>>> import pandas as pd
>>> from sklearn.model_selection import train_test_split

>>> data = pd.read_csv('data.csv')
>>> train, test = train_test_split(data, test_size=0.2)

>>> # Instantiate the partitioned model
>>> partitioned_model = PartitionedClassifier(partition_on='partition_column')

>>> # Iterate over the unique values in the partition column
>>> for partition in train['partition_column'].unique():
...     # Get the data for the partition
...     part = train[train['partition_column'] == partition]
...     x_train, y_train = part.drop('target', axis=1), part['target']
...     # Fit the embedded model
...     model = XClassifier()
...     model.fit(x_train, y_train)
...     # Add the model to the partitioned model
...     partitioned_model.add_partition(model, partition)

>>> # Prepare the test data
>>> x_test, y_test = test.drop('target', axis=1), test['target']

>>> # Predict on the partitioned model
>>> y_pred = partitioned_model.predict(x_test)

Parameters: partition_on (str, optional) – The column to partition on.

add_partition(model, partition: str)

Adds a partition to the model.

All partitions must be of the same type.

Parameters:

model (XClassifier | XRegressor) – The model to add.
partition (str) – The name of the partition to add.

drop_partition(partition: str)

Removes a partition from the model.

Parameters: partition (str) – The name of the partition to drop.

explain(partition: str = '__dataset__')

Generates a global explainer for the model.

Parameters: partition (str) – The partition to explain.
Raises: ImportError – If the user does not have altair installed.

predict(x, use_prob=False, threshold=0.5)[source]

Predicts the target for each row in the data across all partitions.

The partition_on column is used to determine which model to use for each observation. If the partition_on column is not present in the data, the ‘__dataset__’ model is used.

Parameters:

x (pd.DataFrame | np.ndarray) – The x variables to predict.
use_prob (bool, optional) – Use probability instead of score.
threshold (float, optional) – The threshold to use for classification.

Returns:

The predicted targets

Return type:

np.array
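
The routing rule described for predict can be sketched without the library. This is an illustrative stand-in, not the PartitionedClassifier implementation: ConstantModel, routed_predict, and the sample rows are hypothetical, and falling back to '__dataset__' when a partition value has no model of its own is an assumption that goes beyond what the text states.

```python
class ConstantModel:
    """Hypothetical stand-in for a fitted model; always predicts one label."""

    def __init__(self, label):
        self.label = label

    def predict(self, rows):
        return [self.label for _ in rows]


def routed_predict(partitions, rows, partition_on):
    """Predict each row with its partition's model, else the '__dataset__' model."""
    predictions = []
    for row in rows:
        # A row without the partition column is routed to '__dataset__'.
        key = str(row.get(partition_on, "__dataset__"))
        # Assumption: unseen partition values also use the '__dataset__' model.
        model = partitions.get(key, partitions["__dataset__"])
        features = {k: v for k, v in row.items() if k != partition_on}
        predictions.append(model.predict([features])[0])
    return predictions


partitions = {"__dataset__": ConstantModel(0), "A": ConstantModel(1)}
rows = [
    {"partition_column": "A", "x": 1},  # has its own model
    {"partition_column": "B", "x": 2},  # no model for 'B'
    {"x": 3},                           # partition column missing
]
routed = routed_predict(partitions, rows, "partition_column")
```

Only the first row reaches a partition-specific model here; the other two are handled by the base model registered under '__dataset__'.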

predict_proba(x)[source]

Predicts the probability for each row in the data across all partitions.

The partition_on column is used to determine which model to use for each observation. If the partition_on column is not present in the data, the ‘__dataset__’ model is used.

Parameters: x (pd.DataFrame | np.ndarray) – The x variables to predict.
Returns: The predicted probabilities
Return type: np.array

predict_score(x: DataFrame | ndarray, proba: bool = False)[source]

Predicts the score for each row in the data across all partitions.

The partition_on column is used to determine which model to use for each observation. If the partition_on column is not present in the data, the ‘__dataset__’ model is used.

Parameters: x (pd.DataFrame | np.ndarray) – The x variables to predict.
Returns: The predicted scores
Return type: np.array

class xplainable.core.ml.classification.XClassifier(max_depth=8, min_info_gain=0.0001, min_leaf_size=0.0001, ignore_nan=False, weight=1, power_degree=1, sigmoid_exponent=0, tail_sensitivity: float = 1.0, map_calibration: bool = True)[source]

Bases: BaseModel

Xplainable Classification model for transparent machine learning.

XClassifier offers strong predictive power and complete transparency for classification problems on tabular data. It is designed to be used in place of black box models such as Random Forests and Gradient Boosting Machines when explainability is important.

XClassifier is a feature-wise ensemble of decision trees. Each tree is constructed using a custom algorithm that optimises for information with respect to the target variable. The trees are then weighted and normalised against one another to produce a variable step function for each feature. The summation of these functions produces a score that can be explained in real time. The score is a float value between 0 and 1 and represents the likelihood of the positive class occurring. The score can also be mapped to a probability when probability is important.
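
The additive scoring idea can be illustrated with a minimal sketch. It is not xplainable's implementation: the base value, bin edges, and contribution values are invented for illustration, and how the library actually builds and normalises its step functions is not shown here.

```python
from bisect import bisect_right

# Hypothetical per-feature step functions: sorted bin edges plus one
# contribution per bin, so len(contributions) == len(edges) + 1.
STEP_FUNCTIONS = {
    "age": {"edges": [30, 50], "contributions": [-0.10, 0.05, 0.15]},
    "balance": {"edges": [1000], "contributions": [-0.05, 0.10]},
}
BASE_VALUE = 0.40  # assumed starting score before any feature contributes


def feature_contribution(feature, value):
    """Look up the contribution of one feature value from its step function."""
    fn = STEP_FUNCTIONS[feature]
    return fn["contributions"][bisect_right(fn["edges"], value)]


def score(row):
    """Sum the base value and every feature's contribution."""
    return BASE_VALUE + sum(feature_contribution(f, v) for f, v in row.items())


def explain(row):
    """Return the per-feature breakdown that makes up the score."""
    return {f: feature_contribution(f, v) for f, v in row.items()}


row = {"age": 42, "balance": 2500}
breakdown = explain(row)
total = score(row)
```

Because the score is a plain sum, the breakdown is the explanation: each feature's contribution can be read directly, with no post-hoc approximation.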

When the fit method is called, the specified params are set across all features. Following the initial fit, the update_feature_params method may be called on a subset of features to update the params for those features only. This allows for a more granular approach to model tuning.

Example

>>> from xplainable.core.models import XClassifier
>>> import pandas as pd
>>> from sklearn.model_selection import train_test_split

>>> data = pd.read_csv('data.csv')
>>> x = data.drop(columns=['target'])
>>> y = data['target']
>>> x_train, x_test, y_train, y_test = train_test_split(
...     x, y, test_size=0.2, random_state=42)

>>> model = XClassifier()
>>> model.fit(x_train, y_train)

>>> model.predict(x_test)

Parameters:

max_depth (int, optional) – The maximum depth of each decision tree.
min_info_gain (float, optional) – The minimum information gain required to make a split.
min_leaf_size (float, optional) – The minimum number of samples required to make a split.
ignore_nan (bool, optional) – Whether to ignore nan/null/empty values.
alpha (float, optional) – Sets the number of possible splits with respect to unique values.
weight (float, optional) – Activation function weight.
power_degree (float, optional) – Activation function power degree.
sigmoid_exponent (float, optional) – Activation function sigmoid exponent.
tail_sensitivity (float, optional) – Adds weight to divisive leaf nodes.
map_calibration (bool, optional) – Maps the associated probability for each possible feature score.

constructs_from_json(data)

constructs_to_json()

convert_to_model_profile_categories(x)

evaluate(x: DataFrame | ndarray, y: Series | array, use_prob: bool = False, threshold: float = 0.5)[source]

Evaluates the model performance.

Parameters:

x (pd.DataFrame | np.ndarray) – The x variables to predict.
y (pd.Series | np.array) – The target values.
use_prob (bool, optional) – Use probability instead of score.
threshold (float, optional) – The threshold to use for classification.

Returns:

The model performance metrics.

Return type:

dict

explain(label_rounding=5)

property feature_importances: dict

Calculates the feature importances for the model decision process.

Returns: The feature importances.
Return type: dict
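
Because feature_importances returns a dict, ranking features needs only standard Python. The sketch below assumes the dict maps feature names to numeric importances; the helper name top_features and the example values are hypothetical. With a fitted model you would pass model.feature_importances instead of the literal dict.

```python
def top_features(importances, n=3):
    """Return the n (feature, importance) pairs with the largest importance."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical importances, standing in for model.feature_importances.
example_importances = {"age": 0.12, "balance": 0.40, "tenure": 0.08, "region": 0.25}
top_two = top_features(example_importances, n=2)
```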

fit(x: DataFrame | ndarray, y: Series | array, id_columns: list = [], column_names: list | None = None, target_name: str = 'target', alpha=0.1) → XClassifier[source]

Fits the model to the data.

Parameters:

x (pd.DataFrame | np.ndarray) – The x variables used for training.
y (pd.Series | np.array) – The target values.
id_columns (list, optional) – ID columns to exclude from training.
column_names (list, optional) – Column names to use for training when x is an np.ndarray.
target_name (str, optional) – The name of the target column when y is an np.array.
alpha (float) – Controls the number of possible splits with respect to unique values.

Returns:

The fitted model.

Return type:

XClassifier

get_construct_from_column_name(column_name: str)

local_explainer(x, subsample)

property params: ConstructorParams

Returns the parameters of the model.

Returns: The default model parameters.
Return type: ConstructorParams

predict(x: DataFrame | ndarray, use_prob: bool = False, threshold: float = 0.5, remap: bool = True) → array[source]

Predicts the target for each row in the data.

Parameters:

x (pd.DataFrame | np.ndarray) – The x variables to predict.
use_prob (bool, optional) – Use probability instead of score.
threshold (float, optional) – The threshold to use for classification.
remap (bool, optional) – Remap the target values to their original values.

Returns:

The predicted targets

Return type:

np.array
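
The interaction between use_prob and threshold can be sketched as follows. This is a simplified illustration rather than the library's code: the function name classify is hypothetical, scores and probabilities are passed in as plain lists, and treating a value equal to the threshold as the positive class is an assumption.

```python
def classify(scores, probabilities, use_prob=False, threshold=0.5):
    """Turn per-row scores (or probabilities) into 0/1 class labels."""
    # use_prob selects which of the two outputs is compared to the threshold.
    values = probabilities if use_prob else scores
    # Assumption: values at or above the threshold are labelled positive.
    return [int(value >= threshold) for value in values]


example_scores = [0.20, 0.60]
example_probabilities = [0.55, 0.40]

by_score = classify(example_scores, example_probabilities)
by_probability = classify(example_scores, example_probabilities, use_prob=True)
strict = classify(example_scores, example_probabilities, threshold=0.7)
```

Raising the threshold trades recall for precision on the positive class, which is why it is exposed as a parameter rather than fixed at 0.5.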

predict_explain(x)[source]

Predictions with explanations.

Parameters: x (array-like) – The data to predict.
Returns: The predictions and their explanations.
Return type: pd.DataFrame

predict_proba(x: DataFrame | ndarray) → array[source]

Predicts the probability for each row in the data.

Parameters: x (pd.DataFrame | np.ndarray) – The x variables to predict.
Returns: The predicted probabilities
Return type: np.array
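
One common way to map a score to a probability is a binned calibration map: group training scores into bins and record the observed positive rate in each bin. The sketch below shows that general technique only. It is not claimed to be how map_calibration or predict_proba is implemented, and the function names, bin count, and data are invented for illustration.

```python
def build_calibration_map(scores, labels, n_bins=10):
    """Return the observed positive rate for each equal-width score bin."""
    totals = [0] * n_bins
    positives = [0] * n_bins
    for score, label in zip(scores, labels):
        # Scores are assumed to lie in [0, 1]; a score of 1.0 goes in the last bin.
        index = min(int(score * n_bins), n_bins - 1)
        totals[index] += 1
        positives[index] += label
    # Bins with no training scores have no estimate, so they map to None.
    return [p / t if t else None for p, t in zip(positives, totals)]


def score_to_probability(score, calibration_map):
    """Look up the calibrated probability for a single score."""
    n_bins = len(calibration_map)
    return calibration_map[min(int(score * n_bins), n_bins - 1)]


train_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
train_labels = [0, 0, 1, 1, 1, 0]
calibration = build_calibration_map(train_scores, train_labels, n_bins=2)
probability = score_to_probability(0.75, calibration)
```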

predict_score(x: DataFrame | ndarray) → array[source]

Predicts the score for each row in the data.

Parameters: x (pd.DataFrame | np.ndarray) – The x variables to predict.
Returns: The predicted scores
Return type: np.array

property profile: dict

Returns the model profile.

The model profile contains more granular information about the model and how it makes decisions. It is the primary property for interpreting a model and is used by the xplainable client to render the model.

Returns: The model profile.
Return type: dict

set_params(default_parameters: ConstructorParams) → None

Sets the parameters of the model. Generally used for model tuning.

Parameters: default_parameters (ConstructorParams) – The default constructor parameters.
Returns: None

update_feature_params(features: list, max_depth=None, min_info_gain=None, min_leaf_size=None, ignore_nan=None, weight=None, power_degree=None, sigmoid_exponent=None, tail_sensitivity=None, x: DataFrame | ndarray | None = None, y: Series | array | None = None, *args, **kwargs) → XClassifier[source]

Updates the parameters for a subset of features.

XClassifier allows you to update the parameters for a subset of features for a more granular approach to model tuning. This is useful when you identify under- or overfitting on some features, but not all.

This is also referred to as ‘refitting’ the model to a new set of params. Refitting parameters to an xplainable model is extremely fast because the model has already pre-computed the complex metadata required for training. This can yield huge performance gains compared to refitting traditional models, and is particularly powerful when parameter tuning. The desired result is a model that is well calibrated across all features without spending considerable time on parameter tuning.

Parameters:

features (list) – The features to update.
max_depth (int) – The maximum depth of each decision tree in the subset.
min_info_gain (float) – The minimum information gain required to make a split in the subset.
min_leaf_size (float) – The minimum number of samples required to make a split in the subset.
ignore_nan (bool) – Whether to ignore nan/null/empty values
weight (float) – Activation function weight.
power_degree (float) – Activation function power degree.
sigmoid_exponent (float) – Activation function sigmoid exponent.
tail_sensitivity (float) – Adds weight to divisive leaf nodes in the subset.
x (pd.DataFrame | np.ndarray, optional) – The x variables used for training. Use if map_calibration is True.
y (pd.Series | np.array, optional) – The target values. Use if map_calibration is True.

Returns:

The refitted model.

Return type:

XClassifier
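
A refit on a subset of features might be wrapped as in the sketch below. It uses only the features, max_depth, and min_info_gain arguments from the signature above. The helper name tighten_features, the chosen values, and the column names are hypothetical, and RecordingModel is a stand-in so the snippet runs without a fitted model; in practice you would pass an XClassifier that has already been fitted.

```python
def tighten_features(model, features, max_depth=4, min_info_gain=0.01):
    """Refit only `features` with more conservative tree settings."""
    return model.update_feature_params(
        features=features,
        max_depth=max_depth,
        min_info_gain=min_info_gain,
    )


class RecordingModel:
    """Stand-in that records the call; a fitted XClassifier would go here."""

    def __init__(self):
        self.calls = []

    def update_feature_params(self, features, **params):
        self.calls.append((list(features), params))
        return self


demo_model = RecordingModel()
refitted = tighten_features(demo_model, ["age", "balance"])
```

Since update_feature_params returns the refitted model, calls like this can be chained or repeated for different feature subsets.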

Hyperparameter Optimisation

You can optimise XClassifier models automatically using the embedded GUI or programmatically using the Python API. Hyperparameter optimisation with xplainable is much faster than with traditional methods because of rapid refits, a concept first introduced by xplainable. You can find documentation on rapid refits in the advanced_concepts/rapid_refitting section.

The hyperparameter optimisation process uses a class called XParamOptimiser which is based on Bayesian optimisation using the Hyperopt library. Xplainable’s wrapper has pre-configured optimisation objectives and an easy way to set the search space for each parameter. You can find more details in the XParamOptimiser docs.

Example

from xplainable.core.models import XClassifier
from xplainable.core.optimisation.bayesian import XParamOptimiser
from sklearn.model_selection import train_test_split
import pandas as pd

# Load your data
data = pd.read_csv('data.csv')
x, y = data.drop('target', axis=1), data['target']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Find optimised params
optimiser = XParamOptimiser(n_trials=200, n_folds=5, early_stopping=40)
params = optimiser.optimise(x_train, y_train)

# Train your optimised model
model = XClassifier(**params)
model.fit(x_train, y_train)

Optimiser Class

class xplainable.core.optimisation.bayesian.XParamOptimiser(metric='roc-auc', n_trials=30, n_folds=5, early_stopping=30, shuffle=False, subsample=1, alpha=0.01, max_depth_space=[4, 10, 2], min_leaf_size_space=[0.005, 0.05, 0.005], min_info_gain_space=[0.005, 0.05, 0.005], ignore_nan_space=[False, True], weight_space=[0, 1.2, 0.05], power_degree_space=[1, 3, 2], sigmoid_exponent_space=[0.5, 1, 0.1], verbose=True, random_state=1)[source]

Bases: object

Bayesian optimisation for hyperparameter tuning of XClassifier models.

This optimiser is built on top of the Hyperopt library. It has pre-configured optimisation objectives and an easy way to set the search space for each parameter.

The accepted metrics are:

‘macro-f1’
‘weighted-f1’
‘positive-f1’
‘negative-f1’
‘macro-precision’
‘weighted-precision’
‘positive-precision’
‘negative-precision’
‘macro-recall’
‘weighted-recall’
‘positive-recall’
‘negative-recall’
‘accuracy’
‘brier-loss’
‘log-loss’
‘roc-auc’
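
Since the metric is passed as a string, it can help to validate it before starting a long optimisation run. The sketch below rebuilds the accepted names listed above; the check_metric helper is hypothetical and not part of the library.

```python
AVERAGES = ("macro", "weighted", "positive", "negative")
MEASURES = ("f1", "precision", "recall")

# The 12 averaged metrics plus the four standalone ones from the list above.
ACCEPTED_METRICS = frozenset(
    [f"{average}-{measure}" for measure in MEASURES for average in AVERAGES]
    + ["accuracy", "brier-loss", "log-loss", "roc-auc"]
)


def check_metric(metric):
    """Return the metric name unchanged, or raise if it is not accepted."""
    if metric not in ACCEPTED_METRICS:
        raise ValueError(
            f"Unknown metric {metric!r}; expected one of {sorted(ACCEPTED_METRICS)}"
        )
    return metric


metric = check_metric("weighted-f1")
```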

Parameters:

metric (str, optional) – Optimisation metric. Defaults to ‘roc-auc’.
n_trials (int, optional) – Number of trials to run. Defaults to 30.
n_folds (int, optional) – Number of folds for CV split. Defaults to 5.
early_stopping (int, optional) – Stops early if no improvement after n trials.
shuffle (bool, optional) – Shuffle the CV splits. Defaults to False.
subsample (float, optional) – Subsamples the training data.
alpha (float, optional) – Sets the alpha of the model.
max_depth_space (list, optional) – Sets the max_depth search space.
min_leaf_size_space (list, optional) – Sets the min_leaf_size search space.
min_info_gain_space (list, optional) – Sets the min_info_gain search space.
ignore_nan_space (list, optional) – Sets the ignore_nan search space.
weight_space (list, optional) – Sets the weight search space.
power_degree_space (list, optional) – Sets the power_degree search space.
sigmoid_exponent_space (list, optional) – Sets the sigmoid_exponent search space.
verbose (bool, optional) – Sets output amount. Defaults to True.
random_state (int, optional) – Random seed. Defaults to 1.
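
The parameter descriptions above do not spell out the format of the three-element search-space lists. The sketch below assumes they are [low, high, step] and shows what a quantised uniform draw from such a space looks like. Both the interpretation and the helper name sample_from_space are assumptions, not confirmed library behaviour, and two-element lists such as ignore_nan_space are not covered.

```python
import random


def sample_from_space(space, rng):
    """Draw one value from an assumed [low, high, step] search space."""
    low, high, step = space
    # Quantise a uniform draw to the nearest multiple of `step`.
    return round(rng.uniform(low, high) / step) * step


rng = random.Random(1)
# Using the default max_depth_space from the signature above.
max_depth_samples = [sample_from_space([4, 10, 2], rng) for _ in range(50)]
```

Under this reading, the default max_depth_space of [4, 10, 2] restricts candidate depths to 4, 6, 8, and 10.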

optimise(x: DataFrame, y: Series, id_columns: list = [], verbose: bool = True, callback=None)[source]

Get an optimised set of parameters for an XClassifier model.

Parameters:

x (pd.DataFrame) – The x variables used for prediction.
y (pd.Series) – The true values used for validation.
id_columns (list, optional) – ID columns in dataset. Defaults to [].
verbose (bool, optional) – Sets output amount. Defaults to True.
callback (any, optional) – Callback for progress tracking.
return_model (bool, optional) – Returns a model if True; otherwise returns params.

Returns:

The optimised set of parameters.

Return type:

dict