Welcome to xplainable’s documentation!
About Xplainable
Docs
See our detailed docs at xplainable docs
Installation
Quickstart
PyPI is the distribution channel for xplainable release versions. The best way to install it is with pip:
pip install xplainable
Optional dependencies
To use xplainable’s embedded GUI in Jupyter, you will need to install xplainable with the gui extra:
pip install xplainable[gui]
To use xplainable’s advanced plotting functions, you will need to install xplainable with the plotting extra:
pip install xplainable[plotting]
Environment
Reproducible Installs
As libraries get updated, results from running your code can change, or your code can break completely. It’s essential to be able to reconstruct the set of packages and versions you’re using. Best practice is to:
use a different environment per project you’re working on,
record package names and versions using your package installer; each has its own metadata format for this:
Conda: conda environments and environment.yml
Pip: virtual environments and requirements.txt
Poetry: virtual environments and pyproject.toml
Cloud Client
What is Xplainable Cloud?
Xplainable Cloud is a hosted service that allows you to persist and load models and preprocessing pipelines, and to collaborate on them within teams and organisations. Persisted models can also be deployed as API endpoints in seconds. The cloud service is accessible via a web interface for managing organisations, teams, and users, and provides an excellent interface for visualising model explainers and metrics. You can find more information about Xplainable Cloud at https://www.xplainable.io.
What is the Cloud Client?
The cloud client is built into the xplainable python package, allowing you to connect to Xplainable Cloud and query the API, enabling you to manage your account, models, and deployments within Python.
Initialising a session
To initialise a session, you must first generate an API key at Xplainable Cloud (https://beta.xplainable.io).
Copyright Xplainable Pty Ltd, 2023
- xplainable.client.init.initialise(api_key=None, hostname='https://api.xplainable.io')[source]
Initialise the client with an API Key.
API Keys can be generated from https://beta.xplainable.io with a valid account.
Example
>>> import xplainable as xp
>>> import os
>>> xp.initialise(api_key=os.environ['XP_API_KEY'])
- Returns:
The user’s account information.
- Return type:
dict
Querying the API
When you connect successfully to Xplainable Cloud, you can use the client to query the API. The client is accessible by running:
import xplainable as xp
import os
# Initialise your session
xp.initialise(api_key=os.environ['XP_API_KEY'])
# Query the API
xp.client.list_models()
- class xplainable.client.client.Client(api_key, hostname='https://api.xplainable.io')[source]
Bases:
object
A client for interfacing with the xplainable web api (xplainable cloud).
Access models, preprocessors and user data from xplainable cloud. API keys can be generated at https://beta.xplainable.io.
- Parameters:
api_key (str) – A valid api key.
- activate_deployment(deployment_id)[source]
Activates a model deployment.
- Parameters:
deployment_id (str) – The deployment id
- add_deployment_middleware(deployment_id, func, name, description=None)[source]
Adds or replaces a middleware function to a deployment.
- Parameters:
deployment_id (str) – The deployment id
func (function) – The middleware function
name (str) – The name of the middleware function
description (str, optional) – A description of the middleware function
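As a sketch of what a middleware function might look like, assuming it receives the request payload as a dict and returns a (possibly modified) dict; the exact contract is defined by the deployment runtime:
# Hypothetical middleware: clamps an assumed 'age' field before inference
def clip_age(payload):
    payload['age'] = min(max(payload.get('age', 0), 18), 99)
    return payload

xp.client.add_deployment_middleware(
    deployment_id='<deployment_id>',  # placeholder id
    func=clip_age,
    name='clip_age',
    description='Clamps age into the supported range'
)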
- create_model_id(model, model_name: str, model_description: str) str [source]
Creates a new model and returns the model id.
- Parameters:
model_name (str) – The name of the model
model_description (str) – The description of the model
model (XClassifier | XRegressor) – The model to create.
- Returns:
The model id
- Return type:
str
- create_model_version(model, model_id: str, x: DataFrame, y: Series) str [source]
Creates a new model version and returns the version id.
- Parameters:
model (XClassifier | XRegressor) – The model to version
model_id (str) – The model id
x (pd.DataFrame) – The x variables used to fit the model
y (pd.Series) – The target values used to fit the model
- Returns:
The model version id
- Return type:
str
- create_preprocessor_id(preprocessor_name: str, preprocessor_description: str) str [source]
Creates a new preprocessor and returns the preprocessor id.
- Parameters:
preprocessor_name (str) – The name of the preprocessor
preprocessor_description (str) – The description of the preprocessor
- Returns:
The preprocessor id
- Return type:
str
- create_preprocessor_version(preprocessor_id: str, pipeline: list, df: DataFrame | None = None) str [source]
Creates a new preprocessor version and returns the version id.
- Parameters:
preprocessor_id (str) – The preprocessor id
pipeline (xplainable.preprocessing.pipeline.Pipeline) – The pipeline to persist
- Returns:
The preprocessor version id
- Return type:
str
- deactivate_deployment(deployment_id)[source]
Deactivates a model deployment.
- Parameters:
deployment_id (str) – The deployment id
- delete_deployment_middleware(deployment_id)[source]
Deletes a middleware function from a deployment.
- Parameters:
deployment_id (str) – The deployment id
- deploy(model_id: str, version_id: str, hostname: str = 'https://inference.xplainable.io', location: str = 'syd', raw_output: bool = True) dict [source]
Deploys a model partition to xplainable cloud.
The hostname should be the url of the inference server. For example: https://inference.xplainable.io
- Parameters:
hostname (str, optional) – The hostname of the inference server
model_id (str) – The model id
version_id (str) – The version id
location (str, optional) – The region to deploy to. Defaults to ‘syd’.
raw_output (bool, optional) – Returns the raw deployment details as a dict. Defaults to True.
- Returns:
deployment status and details.
- Return type:
dict
- generate_deploy_key(description: str, deployment_id: str, days_until_expiry: float = 90, clipboard: bool = True, surpress_output: bool = False) None [source]
Generates a deploy key for a model deployment.
- Parameters:
description (str) – Description of the deploy key use case.
deployment_id (str) – The deployment id.
days_until_expiry (float) – The number of days until the key expires.
clipboard (bool) – Copies the key to the clipboard when True. Defaults to True.
surpress_output (bool) – Suppresses console output. Defaults to False.
- Returns:
No key is returned. The key is copied to the clipboard.
- Return type:
None
- generate_example_deployment_payload(deployment_id)[source]
Generates an example deployment payload for a deployment.
- Parameters:
deployment_id (str) – The deployment id.
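Putting these client methods together, a typical deployment flow might look like the following sketch (the ids are placeholders, and the ‘deployment_id’ key in the response is an assumption):
import xplainable as xp
import os

xp.initialise(api_key=os.environ['XP_API_KEY'])

# Deploy a persisted model version
deployment = xp.client.deploy(model_id='<model_id>', version_id='<version_id>')

# Activate the deployment and generate a key for calling it
deployment_id = deployment['deployment_id']  # assumed response field
xp.client.activate_deployment(deployment_id)
xp.client.generate_deploy_key('example key', deployment_id)

# Inspect an example request body for the endpoint
xp.client.generate_example_deployment_payload(deployment_id)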
- get_user_data() dict [source]
Retrieves the user data for the active user.
- Returns:
User data
- Return type:
dict
- list_deployments()[source]
Lists all deployments of the active user’s team.
- Returns:
Dictionary of deployments.
- Return type:
dict
- list_model_versions(model_id: int) list [source]
Lists all versions of a model.
- Parameters:
model_id (int) – The model id
- Returns:
List of model versions.
- Return type:
list
- list_models() list [source]
Lists all models of the active user’s team.
- Returns:
List of saved models.
- Return type:
list
- list_preprocessor_versions(preprocessor_id: int) list [source]
Lists all versions of a preprocessor.
- Parameters:
preprocessor_id (int) – The preprocessor id
- Returns:
List of preprocessor versions.
- Return type:
list
- list_preprocessors() list [source]
Lists all preprocessors of the active user’s team.
- Returns:
List of preprocessors.
- Return type:
list
- load_classifier(model_id: int, version_id: int, model=None)[source]
Loads a binary classification model by model_id and version_id.
- Parameters:
model_id (int) – A valid model_id
version_id (int) – A valid version_id
model (PartitionedClassifier, optional) – An existing model to add partitions to
- Returns:
The loaded xplainable classifier
- Return type:
xplainable.PartitionedClassifier
- load_preprocessor(preprocessor_id: int, version_id: int, gui_object: bool = False, response_only: bool = False)[source]
Loads a preprocessor by preprocessor_id and version_id.
- Parameters:
preprocessor_id (int) – The preprocessor id
version_id (int) – The version id
response_only (bool, optional) – Returns the preprocessor metadata.
- Returns:
The loaded pipeline
- Return type:
xplainable.preprocessing.pipeline.Pipeline
- load_regressor(model_id: int, version_id: int, model=None)[source]
Loads a regression model by model_id and version_id
- Parameters:
model_id (int) – A valid model_id
version_id (int) – A valid version_id
model (PartitionedRegressor, optional) – An existing model to add partitions to
- Returns:
The loaded xplainable regressor
- Return type:
xplainable.PartitionedRegressor
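For example, a saved classifier can be loaded back into memory with a few lines (the ids below are placeholders):
import xplainable as xp
import os

xp.initialise(api_key=os.environ['XP_API_KEY'])

# Browse the team's saved models, then load a specific version
models = xp.client.list_models()
model = xp.client.load_classifier(model_id=1, version_id=1)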
Classification – Binary
Using the GUI
Training an XClassifier model with the embedded xplainable GUI is easy. Run the following lines of code, and you can configure and optimise your model within the GUI to minimise the amount of code you need to write.
Example – GUI
import xplainable as xp
import pandas as pd
import os
# Initialise your session
xp.initialise(api_key=os.environ['XP_API_KEY'])
# Load your data
data = pd.read_csv('data.csv')
# Train your model (this will open an embedded gui)
model = xp.classifier(data)
Using the Python API
You can also train an xplainable classification model programmatically. This works in a very similar way to other popular machine learning libraries.
You can import the XClassifier class and train a model as follows:
Example – XClassifier()
from xplainable.core.models import XClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
# Load your data
data = pd.read_csv('data.csv')
x, y = data.drop('target', axis=1), data['target']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
# Train your model
model = XClassifier()
model.fit(x_train, y_train)
# Predict on the test set
y_pred = model.predict(x_test)
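You can then check performance on the held-out set with the evaluate method (documented below):
# Evaluate the model on the test set
metrics = model.evaluate(x_test, y_test)
print(metrics)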
Example – PartitionedClassifier()
from xplainable.core.models import PartitionedClassifier
from xplainable.core.models import XClassifier
import pandas as pd
from sklearn.model_selection import train_test_split
# Load your data
data = pd.read_csv('data.csv')
train, test = train_test_split(data, test_size=0.2)
# Instantiate the partitioned model
partitioned_model = PartitionedClassifier(partition_on='partition_column')
# Train the base model
base_model = XClassifier()
base_model.fit(
train.drop(columns=['target', 'partition_column']),
train['target']
)
# Add the base model to the partitioned model (call this '__dataset__')
partitioned_model.add_partition(base_model, '__dataset__')
# Iterate over the unique values in the partition column
for partition in train['partition_column'].unique():
# Get the data for the partition
part = train[train['partition_column'] == partition]
x_train, y_train = part.drop('target', axis=1), part['target']
# Fit the embedded model
model = XClassifier()
model.fit(x_train, y_train)
# Add the model to the partitioned model
partitioned_model.add_partition(model, partition)
# Prepare the test data
x_test, y_test = test.drop('target', axis=1), test['target']
# Predict on the partitioned model
y_pred = partitioned_model.predict(x_test)
Classifier Classes
Copyright Xplainable Pty Ltd, 2023
- class xplainable.core.ml.classification.PartitionedClassifier(partition_on: str | None = None, *args, **kwargs)[source]
Bases:
BasePartition
Partitioned XClassifier model.
This class is a wrapper for the XClassifier model that allows for individual models to be trained on subsets of the data. Each model can be used in isolation or in combination with the other models.
Individual models can be accessed using the partitions attribute.
Example
>>> from xplainable.core.models import PartitionedClassifier
>>> from xplainable.core.models import XClassifier
>>> import pandas as pd
>>> from sklearn.model_selection import train_test_split
>>> data = pd.read_csv('data.csv')
>>> train, test = train_test_split(data, test_size=0.2)
>>> # Instantiate the partitioned model
>>> partitioned_model = PartitionedClassifier(partition_on='partition_column')
>>> # Iterate over the unique values in the partition column
>>> for partition in train['partition_column'].unique():
>>>     # Get the data for the partition
>>>     part = train[train['partition_column'] == partition]
>>>     x_train, y_train = part.drop('target', axis=1), part['target']
>>>     # Fit the embedded model
>>>     model = XClassifier()
>>>     model.fit(x_train, y_train)
>>>     # Add the model to the partitioned model
>>>     partitioned_model.add_partition(model, partition)
>>> # Prepare the test data
>>> x_test, y_test = test.drop('target', axis=1), test['target']
>>> # Predict on the partitioned model
>>> y_pred = partitioned_model.predict(x_test)
- Parameters:
partition_on (str, optional) – The column to partition on.
- add_partition(model, partition: str)
Adds a partition to the model.
All partitions must be of the same type.
- Parameters:
model (XClassifier | XRegressor) – The model to add.
partition (str) – The name of the partition to add.
- drop_partition(partition: str)
Removes a partition from the model.
- Parameters:
partition (str) – The name of the partition to drop.
- explain(partition: str = '__dataset__')
Generates a global explainer for the model.
- Parameters:
partition (str) – The partition to explain.
- Raises:
ImportError – If user does not have altair installed.
- predict(x, use_prob=False, threshold=0.5)[source]
Predicts the target for each row in the data across all partitions.
The partition_on columns will be used to determine which model to use for each observation. If the partition_on column is not present in the data, the ‘__dataset__’ model will be used.
- Parameters:
x (pd.DataFrame | np.ndarray) – The x variables to predict.
use_prob (bool, optional) – Use probability instead of score.
threshold (float, optional) – The threshold to use for classification.
- Returns:
The predicted targets
- Return type:
np.array
- predict_proba(x)[source]
Predicts the probability for each row in the data across all partitions.
The partition_on columns will be used to determine which model to use for each observation. If the partition_on column is not present in the data, the ‘__dataset__’ model will be used.
- Parameters:
x (pd.DataFrame | np.ndarray) – The x variables to predict.
- Returns:
The predicted probabilities
- Return type:
np.array
- predict_score(x: DataFrame | ndarray, proba: bool = False)[source]
Predicts the score for each row in the data across all partitions.
The partition_on columns will be used to determine which model to use for each observation. If the partition_on column is not present in the data, the ‘__dataset__’ model will be used.
- Parameters:
x (pd.DataFrame | np.ndarray) – The x variables to predict.
- Returns:
The predicted scores
- Return type:
np.array
- class xplainable.core.ml.classification.XClassifier(max_depth=8, min_info_gain=0.0001, min_leaf_size=0.0001, ignore_nan=False, weight=1, power_degree=1, sigmoid_exponent=0, tail_sensitivity: float = 1.0, map_calibration: bool = True)[source]
Bases:
BaseModel
Xplainable Classification model for transparent machine learning.
XClassifier offers powerful predictive power and complete transparency for classification problems on tabular data. It is designed to be used in place of black box models such as Random Forests and Gradient Boosting Machines when explainability is important.
XClassifier is a feature-wise ensemble of decision trees. Each tree is constructed using a custom algorithm that optimises for information with respect to the target variable. The trees are then weighted and normalised against one another to produce a variable step function for each feature. The summation of these functions produces a score that can be explained in real time. The score is a float value between 0 and 1 and represents the likelihood of the positive class occurring. The score can also be mapped to a probability when probability is important.
When the fit method is called, the specified params are set across all features. Following the initial fit, the update_feature_params method may be called on a subset of features to update the params for those features only. This allows for a more granular approach to model tuning.
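For example, after an initial fit you might refit a couple of features with more conservative parameters. A minimal sketch (the column names are assumptions):
# Assumes `model` is a fitted XClassifier; refit two assumed features
model.update_feature_params(
    features=['age', 'balance'],
    max_depth=4,
    min_info_gain=0.01
)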
Example
>>> from xplainable.core.models import XClassifier
>>> import pandas as pd
>>> from sklearn.model_selection import train_test_split
>>> data = pd.read_csv('data.csv')
>>> x = data.drop(columns=['target'])
>>> y = data['target']
>>> x_train, x_test, y_train, y_test = train_test_split(
>>>     x, y, test_size=0.2, random_state=42)
>>> model = XClassifier()
>>> model.fit(x_train, y_train)
>>> model.predict(x_test)
- Parameters:
max_depth (int, optional) – The maximum depth of each decision tree.
min_info_gain (float, optional) – The minimum information gain required to make a split.
min_leaf_size (float, optional) – The minimum number of samples required to make a split.
ignore_nan (bool, optional) – Whether to ignore nan/null/empty values.
weight (float, optional) – Activation function weight.
power_degree (float, optional) – Activation function power degree.
sigmoid_exponent (float, optional) – Activation function sigmoid exponent.
tail_sensitivity (float, optional) – Adds weight to divisive leaf nodes.
map_calibration (bool, optional) – Maps the associated probability for each possible feature score.
- constructs_from_json(data)
- constructs_to_json()
- convert_to_model_profile_categories(x)
- evaluate(x: DataFrame | ndarray, y: Series | array, use_prob: bool = False, threshold: float = 0.5)[source]
Evaluates the model performance.
- Parameters:
x (pd.DataFrame | np.ndarray) – The x variables to predict.
y (pd.Series | np.array) – The target values.
use_prob (bool, optional) – Use probability instead of score.
threshold (float, optional) – The threshold to use for classification.
- Returns:
The model performance metrics.
- Return type:
dict
- explain(label_rounding=5)
- property feature_importances: dict
Calculates the feature importances for the model decision process.
- Returns:
The feature importances.
- Return type:
dict
- fit(x: DataFrame | ndarray, y: Series | array, id_columns: list = [], column_names: list | None = None, target_name: str = 'target', alpha=0.1) XClassifier [source]
Fits the model to the data.
- Parameters:
x (pd.DataFrame | np.ndarray) – The x variables used for training.
y (pd.Series | np.array) – The target values.
id_columns (list, optional) – id_columns to ignore from training.
column_names (list, optional) – column_names to use for training if using a np.ndarray
target_name (str, optional) – The name of the target column if using a np.array
alpha (float) – Controls the number of possible splits with respect to unique values.
- Returns:
The fitted model.
- Return type:
XClassifier
- get_construct_from_column_name(column_name: str)
- local_explainer(x, subsample)
- property params: ConstructorParams
Returns the parameters of the model.
- Returns:
The default model parameters.
- Return type:
ConstructorParams
- predict(x: DataFrame | ndarray, use_prob: bool = False, threshold: float = 0.5, remap: bool = True) array [source]
Predicts the target for each row in the data.
- Parameters:
x (pd.DataFrame | np.ndarray) – The x variables to predict.
use_prob (bool, optional) – Use probability instead of score.
threshold (float, optional) – The threshold to use for classification.
remap (bool, optional) – Remap the target values to their original values.
- Returns:
The predicted targets
- Return type:
np.array
- predict_explain(x)[source]
Predictions with explanations.
- Parameters:
x (array-like) – data to predict
- Returns:
prediction and explanation
- Return type:
pd.DataFrame
- predict_proba(x: DataFrame | ndarray) array [source]
Predicts the probability for each row in the data.
- Parameters:
x (pd.DataFrame | np.ndarray) – The x variables to predict.
- Returns:
The predicted probabilities
- Return type:
np.array
- predict_score(x: DataFrame | ndarray) array [source]
Predicts the score for each row in the data.
- Parameters:
x (pd.DataFrame | np.ndarray) – The x variables to predict.
- Returns:
The predicted scores
- Return type:
np.array
- property profile: dict
Returns the model profile.
The model profile contains more granular information about the model and how it makes decisions. It is the primary property for interpreting a model and is used by the xplainable client to render the model.
- Returns:
The model profile.
- Return type:
dict
- set_params(default_parameters: ConstructorParams) None
Sets the parameters of the model. Generally used for model tuning.
- Parameters:
default_parameters (ConstructorParams) – default constructor parameters
- Returns:
None
- update_feature_params(features: list, max_depth=None, min_info_gain=None, min_leaf_size=None, ignore_nan=None, weight=None, power_degree=None, sigmoid_exponent=None, tail_sensitivity=None, x: DataFrame | ndarray | None = None, y: Series | array | None = None, *args, **kwargs) XClassifier [source]
Updates the parameters for a subset of features.
XClassifier allows you to update the parameters for a subset of features for a more granular approach to model tuning. This is useful when you identify under or overfitting on some features, but not all.
This is also referred to as ‘refitting’ the model to a new set of params. Refitting parameters to an xplainable model is extremely fast, as the model has already pre-computed the complex metadata required for training. This can yield huge performance gains compared to refitting traditional models, and is particularly powerful when parameter tuning. The desired result is a model that is well calibrated across all features without spending considerable time on parameter tuning.
- Parameters:
features (list) – The features to update.
max_depth (int) – The maximum depth of each decision tree in the subset.
min_info_gain (float) – The minimum information gain required to make a split in the subset.
min_leaf_size (float) – The minimum number of samples required to make a split in the subset.
ignore_nan (bool) – Whether to ignore nan/null/empty values
weight (float) – Activation function weight.
power_degree (float) – Activation function power degree.
sigmoid_exponent (float) – Activation function sigmoid exponent.
tail_sensitivity (float) – Adds weight to divisive leaf nodes in the subset.
x (pd.DataFrame | np.ndarray, optional) – The x variables used for training. Use if map_calibration is True.
y (pd.Series | np.array, optional) – The target values. Use if map_calibration is True.
- Returns:
The refitted model.
- Return type:
XClassifier
Hyperparameter Optimisation
You can optimise XClassifier models automatically using the embedded GUI or programmatically using the Python API. Hyperparameter optimisation with xplainable is much faster than with traditional methods due to the concept of rapid refits, first introduced by xplainable. You can find documentation on rapid refits in the advanced_concepts/rapid_refitting section.
The hyperparameter optimisation process uses a class called XParamOptimiser, which is based on Bayesian optimisation using the Hyperopt library. Xplainable’s wrapper has pre-configured optimisation objectives and an easy way to set the search space for each parameter. You can find more details in the XParamOptimiser docs.
Example
from xplainable.core.models import XClassifier
from xplainable.core.optimisation.bayesian import XParamOptimiser
from sklearn.model_selection import train_test_split
import pandas as pd
# Load your data
data = pd.read_csv('data.csv')
x, y = data.drop('target', axis=1), data['target']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
# Find optimised params
optimiser = XParamOptimiser(n_trials=200, n_folds=5, early_stopping=40)
params = optimiser.optimise(x_train, y_train)
# Train your optimised model
model = XClassifier(**params)
model.fit(x_train, y_train)
Optimiser Class
- class xplainable.core.optimisation.bayesian.XParamOptimiser(metric='roc-auc', n_trials=30, n_folds=5, early_stopping=30, shuffle=False, subsample=1, alpha=0.01, max_depth_space=[4, 10, 2], min_leaf_size_space=[0.005, 0.05, 0.005], min_info_gain_space=[0.005, 0.05, 0.005], ignore_nan_space=[False, True], weight_space=[0, 1.2, 0.05], power_degree_space=[1, 3, 2], sigmoid_exponent_space=[0.5, 1, 0.1], verbose=True, random_state=1)[source]
Bases:
object
Bayesian optimisation for hyperparameter tuning of XClassifier models.
This optimiser is built on top of the Hyperopt library. It has pre-configured optimisation objectives and an easy way to set the search space for each parameter.
- The accepted metrics are:
‘macro-f1’
‘weighted-f1’
‘positive-f1’
‘negative-f1’
‘macro-precision’
‘weighted-precision’
‘positive-precision’
‘negative-precision’
‘macro-recall’
‘weighted-recall’
‘positive-recall’
‘negative-recall’
‘accuracy’
‘brier-loss’
‘log-loss’
‘roc-auc’
- Parameters:
metric (str, optional) – Optimisation metric. Defaults to ‘roc-auc’.
n_trials (int, optional) – Number of trials to run. Defaults to 30.
n_folds (int, optional) – Number of folds for CV split. Defaults to 5.
early_stopping (int, optional) – Stops early if no improvement after n trials.
shuffle (bool, optional) – Shuffle the CV splits. Defaults to False.
subsample (float, optional) – Subsamples the training data.
alpha (float, optional) – Sets the alpha of the model.
max_depth_space (list, optional) – Sets the max_depth search space.
min_leaf_size_space (list, optional) – Sets the min_leaf_size search space.
min_info_gain_space (list, optional) – Sets the min_info_gain search space.
ignore_nan_space (list, optional) – Sets the ignore_nan search space.
weight_space (list, optional) – Sets the weight search space.
power_degree_space (list, optional) – Sets the power_degree search space.
sigmoid_exponent_space (list, optional) – Sets the sigmoid_exponent search space.
verbose (bool, optional) – Sets output amount. Defaults to True.
random_state (int, optional) – Random seed. Defaults to 1.
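For example, you might optimise for a different metric over a narrower max_depth search space. This sketch assumes the space lists follow a [min, max, step] convention, consistent with the defaults above, and that x_train and y_train come from the earlier example:
from xplainable.core.optimisation.bayesian import XParamOptimiser

optimiser = XParamOptimiser(
    metric='log-loss',
    n_trials=100,
    early_stopping=30,
    max_depth_space=[3, 8, 1]  # assumed [min, max, step]
)
params = optimiser.optimise(x_train, y_train)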
- optimise(x: DataFrame, y: Series, id_columns: list = [], verbose: bool = True, callback=None)[source]
Get an optimised set of parameters for an XClassifier model.
- Parameters:
x (pd.DataFrame) – The x variables used for prediction.
y (pd.Series) – The true values used for validation.
id_columns (list, optional) – ID columns in dataset. Defaults to [].
verbose (bool, optional) – Sets output amount. Defaults to True.
callback (any, optional) – Callback for progress tracking.
return_model (bool, optional) – Returns the model if True, else returns the params
- Returns:
The optimised set of parameters.
- Return type:
dict
Classification – Multi-Class
Important!
Multi-Class is still being developed and is yet to be available in a release version of xplainable. Please check back soon for updates.
The following documentation is a preview of the functionality that will be available in an upcoming release of xplainable.
Using the GUI
Training a classification model with the embedded xplainable GUI is easy. Run the following lines of code, and you can configure and optimise your model within the GUI to minimise the amount of code you need to write.
Example – GUI
import xplainable as xp
import pandas as pd
import os
# Initialise your session
xp.initialise(api_key=os.environ['XP_API_KEY'])
# Load your data
data = pd.read_csv('data.csv')
# Train your model (this will open an embedded gui)
model = xp.multiclass_classifier(data)
Using the Python API
You can also train a multi-class classification model programmatically. This works in a very similar way to other popular machine learning libraries.
You can import the XMultiClassifier class and train a model as follows:
Example – XMultiClassifier()
from xplainable.core.models import XMultiClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
# Load your data
data = pd.read_csv('data.csv')
x, y = data.drop('target', axis=1), data['target']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
# Train your model
model = XMultiClassifier()
model.fit(x_train, y_train)
# Predict on the test set
y_pred = model.predict(x_test)
Example – PartitionedMultiClassifier()
from xplainable.core.models import PartitionedMultiClassifier
from xplainable.core.models import XMultiClassifier
import pandas as pd
from sklearn.model_selection import train_test_split
# Load your data
data = pd.read_csv('data.csv')
train, test = train_test_split(data, test_size=0.2)
# Instantiate the partitioned model
partitioned_model = PartitionedMultiClassifier(partition_on='partition_column')
# Train the base model
base_model = XMultiClassifier()
base_model.fit(
train.drop(columns=['target', 'partition_column']),
train['target']
)
# Add the base model to the partitioned model (call this '__dataset__')
partitioned_model.add_partition(base_model, '__dataset__')
# Iterate over the unique values in the partition column
for partition in train['partition_column'].unique():
# Get the data for the partition
part = train[train['partition_column'] == partition]
x_train, y_train = part.drop('target', axis=1), part['target']
# Fit the embedded model
model = XMultiClassifier()
model.fit(x_train, y_train)
# Add the model to the partitioned model
partitioned_model.add_partition(model, partition)
# Prepare the test data
x_test, y_test = test.drop('target', axis=1), test['target']
# Predict on the partitioned model
y_pred = partitioned_model.predict(x_test)
Regression
Using the GUI
Training an XRegressor model with the embedded xplainable GUI is easy. Run the following lines of code, and you can configure and optimise your model within the GUI to minimise the amount of code you need to write.
Examples
GUI
import xplainable as xp
import pandas as pd
import os
# Initialise your session
xp.initialise(api_key=os.environ['XP_API_KEY'])
# Load your data
data = pd.read_csv('data.csv')
# Train your model (this will open an embedded gui)
model = xp.regressor(data)
Using the Python API
You can also train an xplainable regression model programmatically. This works in a very similar way to other popular machine learning libraries.
You can import the XRegressor class and train a model as follows:
Examples
XRegressor
from xplainable.core.models import XRegressor
from sklearn.model_selection import train_test_split
import pandas as pd
# Load data
data = pd.read_csv('data.csv')
x, y = data.drop('target', axis=1), data['target']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
# Train model
model = XRegressor()
model.fit(x_train, y_train)
# Optimise the model
model.optimise_tail_sensitivity(x_train, y_train)
# <-- Add XEvolutionaryNetwork here -->
# Predict on the test set
y_pred = model.predict(x_test)
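As with the classifier, you can sanity-check performance with the evaluate method (documented below):
# Evaluate the model on the test set
metrics = model.evaluate(x_test, y_test)
print(metrics)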
PartitionedRegressor
from xplainable.core.models import PartitionedRegressor
from xplainable.core.models import XRegressor
import pandas as pd
from sklearn.model_selection import train_test_split
# Load your data
data = pd.read_csv('data.csv')
train, test = train_test_split(data, test_size=0.2)
# Instantiate the partitioned model
partitioned_model = PartitionedRegressor(partition_on='partition_column')
# Train the base model
base_model = XRegressor()
base_model.fit(
train.drop(columns=['target', 'partition_column']),
train['target']
)
# Optimise the model
base_model.optimise_tail_sensitivity(
train.drop('target', axis=1), train['target'])
# <-- Add XEvolutionaryNetwork here -->
# Add the base model to the partitioned model (call this '__dataset__')
partitioned_model.add_partition(base_model, '__dataset__')
# Iterate over the unique values in the partition column
for partition in train['partition_column'].unique():
# Get the data for the partition
part = train[train['partition_column'] == partition]
x_train, y_train = part.drop('target', axis=1), part['target']
# Fit the embedded model
model = XRegressor()
model.fit(x_train, y_train)
# Optimise the model
model.optimise_tail_sensitivity(x_train, y_train)
# <-- Add XEvolutionaryNetwork here -->
# Add the model to the partitioned model
partitioned_model.add_partition(model, partition)
# Prepare the test data
x_test, y_test = test.drop('target', axis=1), test['target']
# Predict on the partitioned model
y_pred = partitioned_model.predict(x_test)
Classes – Regressors
Copyright Xplainable Pty Ltd, 2023
- class xplainable.core.ml.regression.PartitionedRegressor(partition_on=None, *args, **kwargs)[source]
Bases:
BasePartition
Partitioned XRegressor model.
This class is a wrapper for the XRegressor model that allows for individual models to be trained on subsets of the data. Each model can be used in isolation or in combination with the other models.
Individual models can be accessed using the partitions attribute.
Example
>>> from xplainable.core.models import PartitionedRegressor
>>> from xplainable.core.models import XRegressor
>>> import pandas as pd
>>> from sklearn.model_selection import train_test_split
>>> data = pd.read_csv('data.csv')
>>> train, test = train_test_split(data, test_size=0.2)
>>> # Instantiate the partitioned model
>>> partitioned_model = PartitionedRegressor(partition_on='partition_column')
>>> # Iterate over the unique values in the partition column
>>> for partition in train['partition_column'].unique():
>>>     # Get the data for the partition
>>>     part = train[train['partition_column'] == partition]
>>>     x_train, y_train = part.drop('target', axis=1), part['target']
>>>     # Fit the embedded model
>>>     model = XRegressor()
>>>     model.fit(x_train, y_train)
>>>     model.optimise_tail_sensitivity(x_train, y_train)
>>>     # <-- Add XEvolutionaryNetwork here -->
>>>     # Add the model to the partitioned model
>>>     partitioned_model.add_partition(model, partition)
>>> # Prepare the test data
>>> x_test, y_test = test.drop('target', axis=1), test['target']
>>> # Predict on the partitioned model
>>> y_pred = partitioned_model.predict(x_test)
- Parameters:
partition_on (str, optional) – The column to partition on.
- add_partition(model, partition: str)
Adds a partition to the model.
All partitions must be of the same type.
- Parameters:
model (XClassifier | XRegressor) – The model to add.
partition (str) – The name of the partition to add.
- drop_partition(partition: str)
Removes a partition from the model.
- Parameters:
partition (str) – The name of the partition to drop.
- explain(partition: str = '__dataset__')
Generates a global explainer for the model.
- Parameters:
partition (str) – The partition to explain.
- Raises:
ImportError – If user does not have altair installed.
- predict(x) array [source]
Predicts the target value for each row in the data across all partitions.
The partition_on columns will be used to determine which model to use for each observation. If the partition_on column is not present in the data, the ‘__dataset__’ model will be used.
- Parameters:
x (pd.DataFrame | np.ndarray) – The x variables to predict.
- Returns:
The predicted target values
- Return type:
np.array
- class xplainable.core.ml.regression.XRegressor(max_depth=8, min_info_gain=0.0001, min_leaf_size=0.0001, ignore_nan=False, weight=1, power_degree=1, sigmoid_exponent=0, tail_sensitivity: float = 1.0, prediction_range: tuple = (-inf, inf))[source]
Bases:
BaseModel
Xplainable Regression model for transparent machine learning.
XRegressor offers powerful predictive power and complete transparency for regression problems on tabular data. It is designed to be used in place of black box models such as Random Forests and Gradient Boosting Machines when explainability is important.
XRegressor is a feature-wise ensemble of decision trees. Each tree is constructed using a custom algorithm that optimises for information with respect to the target variable. The trees are then weighted and normalised against one another to produce a variable step function for each feature. The summation of these functions produces a score that can be explained in real time. The bounds of the prediction can be set using the prediction_range parameter.
When the fit method is called, the specified params are set across all features. Following the initial fit, the update_feature_params method may be called on a subset of features to update the params for those features only. This allows for a more granular approach to model tuning.
- Important note on performance:
XRegressor alone can be a weak predictor. There are a number of ways to get the most out of the model in terms of predictive power:
use the optimise_tail_sensitivity method
fit an XEvolutionaryNetwork to the model. This will iteratively optimise the weights of the model to produce a much more accurate predictor. You can find more information on this in the XEvolutionaryNetwork documentation at xplainable/core/optimisation/genetic.py.
Example
>>> from xplainable.core.models import XRegressor
>>> import pandas as pd
>>> from sklearn.model_selection import train_test_split
>>> data = pd.read_csv('data.csv')
>>> x = data.drop(columns=['target'])
>>> y = data['target']
>>> x_train, x_test, y_train, y_test = train_test_split(
>>>     x, y, test_size=0.2, random_state=42)
>>> model = XRegressor()
>>> model.fit(x_train, y_train)
>>> # This will be a weak predictor
>>> model.predict(x_test)
>>> # For a strong predictor, apply optimisations
>>> model.optimise_tail_sensitivity(x_train, y_train)
>>> # Add evolutionary network here
>>> ...
- Parameters:
max_depth (int) – The maximum depth of each decision tree.
min_info_gain (float) – The minimum information gain required to make a split.
min_leaf_size (float) – The minimum number of samples required to make a split.
ignore_nan (bool) – Whether to ignore nan/null/empty values.
weight (float) – Activation function weight.
power_degree (float) – Activation function power degree.
sigmoid_exponent (float) – Activation function sigmoid exponent.
tail_sensitivity (float) – Adds weight to divisive leaf nodes.
prediction_range (tuple) – The lower and upper limits for predictions.
- constructs_from_json(data)
- constructs_to_json()
- convert_to_model_profile_categories(x)
- evaluate(x: DataFrame | ndarray, y: Series | ndarray) dict [source]
Evaluates the model performance.
- Parameters:
x (pd.DataFrame | np.ndarray) – The x variables to predict.
y (pd.Series | np.array) – The target values.
- Returns:
The model performance metrics.
- Return type:
dict
- explain(label_rounding=5)
- property feature_importances: dict
Calculates the feature importances for the model decision process.
- Returns:
The feature importances.
- Return type:
dict
- fit(x: DataFrame | ndarray, y: Series | ndarray, id_columns: list = [], column_names: list | None = None, target_name: str = 'target', alpha=0.1) XRegressor [source]
Fits the model to the data.
- Parameters:
x (pd.DataFrame | np.ndarray) – The x variables used for training.
y (pd.Series | np.array) – The target values.
id_columns (list, optional) – id_columns to ignore from training.
column_names (list, optional) – column_names to use for training if using a np.ndarray
target_name (str, optional) – The name of the target column if using a np.ndarray
alpha (float, optional) – Sets the number of possible splits with respect to unique values.
- Returns:
The fitted model.
- Return type:
XRegressor
- get_construct_from_column_name(column_name: str)
- local_explainer(x, subsample)
- optimise_tail_sensitivity(X: DataFrame | ndarray, y: Series | ndarray) XRegressor [source]
Optimises the tail_sensitivity parameter at a global level.
- Parameters:
X (pd.DataFrame | np.ndarray) – The x variables to fit.
y (pd.Series | np.ndarray) – The target values.
- Returns:
The optimised model.
- Return type:
XRegressor
- property params: ConstructorParams
Returns the parameters of the model.
- Returns:
The default model parameters.
- Return type:
ConstructorParams
- predict(x: DataFrame | ndarray) array [source]
Predicts the target value for each row in the data.
- Parameters:
x (pd.DataFrame | np.ndarray) – The x variables to predict.
- Returns:
The predicted target values.
- Return type:
np.array
- predict_explain(x)
Predictions with explanations.
- Parameters:
x (array-like) – data to predict
- Returns:
prediction and explanation
- Return type:
pd.DataFrame
- property profile: dict
Returns the model profile.
The model profile contains more granular information about the model and how it makes decisions. It is the primary property for interpreting a model and is used by the xplainable client to render the model.
- Returns:
The model profile.
- Return type:
dict
- set_params(default_parameters: ConstructorParams) None
Sets the parameters of the model. Generally used for model tuning.
- Parameters:
default_parameters (ConstructorParams) – default constructor parameters
- Returns:
None
- update_feature_params(features: list, max_depth=None, min_info_gain=None, min_leaf_size=None, ignore_nan=None, weight=None, power_degree=None, sigmoid_exponent=None, tail_sensitivity=None, *args, **kwargs) XRegressor [source]
Updates the parameters for a subset of features.
XRegressor allows you to update the parameters for a subset of features for a more granular approach to model tuning. This is useful when you identify under or overfitting on some features, but not all.
This is also referred to as ‘refitting’ the model to a new set of params. Refitting parameters to an xplainable model is extremely fast, as the model has already pre-computed the complex metadata required for training. This can yield huge performance gains compared to refitting traditional models, and is particularly powerful when parameter tuning. The desired result is a model that is well calibrated across all features without spending considerable time on parameter tuning.
It’s important to note that if a model has been further optimised using an XEvolutionaryNetwork, the optimised feature_params will be overwritten by this method and will need to be re-optimised.
- Parameters:
features (list) – The features to update.
max_depth (int) – The maximum depth of each decision tree in the subset.
min_info_gain (float) – The minimum information gain required to make a split in the subset.
min_leaf_size (float) – The minimum number of samples required to make a split in the subset.
ignore_nan (bool) – Whether to ignore nan/null/empty values
weight (float) – Activation function weight.
power_degree (float) – Activation function power degree.
sigmoid_exponent (float) – Activation function sigmoid exponent.
tail_sensitivity (float) – Adds weight to divisive leaf nodes in the subset.
- Returns:
The refitted model.
- Return type:
XRegressor
Classes – Regression Optimisation
Regression optimisers can optimise XRegressor model weights to a specific metric. They are used on top of pre-trained models and can be a powerful tool for optimising models for maximum predictive power while maintaining complete transparency.
Example:
from xplainable.core.optimisation.genetic import XEvolutionaryNetwork
from xplainable.core.optimisation.layers import Tighten, Evolve
from xplainable.core.models import XRegressor
import pandas as pd
from sklearn.model_selection import train_test_split
# Load your data
data = pd.read_csv('data.csv')
x, y = data.drop('target', axis=1), data['target']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
# Train your model
model = XRegressor()
model.fit(x_train, y_train)
model.optimise_tail_sensitivity(x_train, y_train)
# Create an optimiser
optimiser = XEvolutionaryNetwork(model)
optimiser.fit(x_train, y_train)
# Add layers to optimise the model weights
optimiser.add_layer(Tighten())
optimiser.add_layer(Evolve())
optimiser.add_layer(Evolve())
optimiser.add_layer(Tighten())
# Optimise the model weights in place
optimiser.optimise()
# Predict on the test set
y_pred = model.predict(x_test)
Copyright Xplainable Pty Ltd, 2023
- class xplainable.core.optimisation.genetic.XEvolutionaryNetwork(model: XRegressor, apply_range: bool = False)[source]
Bases:
object
A layer-based optimisation framework for XRegressor models.
XEvolutionaryNetwork is a novel optimisation framework for XRegressor models that allows for flexibility and depth. It is inspired by deep learning frameworks, but is applied over additive models for weight optimisation.
It works by taking a pre-trained XRegressor model and fitting it, along with the training data, to an evolutionary network. The evolutionary network consists of a series of layers, each of which is responsible for optimising the model weights given a set of constraints.
- What are layers?:
There are currently two types of layers: Tighten() and Evolve().
More information on each layer can be found in their respective documentation.
There is no limit to the number of layers that can be added to the network, and each layer can be customised for specific objectives. Like other machine learning methods, the network can be prone to over-fitting, so it is recommended to use a validation set to monitor performance.
An XEvolutionaryNetwork can be stopped mid-training and resumed at any time. This is useful for long-running optimisations and iterative work. You can track the remaining and completed layers using the future_layers and completed_layers attributes.
- Parameters:
model (XRegressor) – The model to optimise.
apply_range (bool) – Whether to apply the model’s prediction range to the output.
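For example, a long-running optimisation can be stopped and resumed. This sketch uses the attributes described above, plus the add_layer usage from the earlier example:
from xplainable.core.optimisation.genetic import XEvolutionaryNetwork
from xplainable.core.optimisation.layers import Tighten, Evolve

# Assumes `model`, `x_train` and `y_train` from the earlier example
network = XEvolutionaryNetwork(model)
network.fit(x_train, y_train)
network.add_layer(Tighten())
network.add_layer(Evolve())
network.optimise()

# Track progress between stop/resume cycles
print(network.completed_layers)
print(network.future_layers)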
- drop_layer(idx: int)[source]
Removes a layer from the network.
- Parameters:
idx (int) – The index of the layer to remove.
- fit(x: DataFrame | ndarray, y: Series | ndarray, subset: list = []) XEvolutionaryNetwork [source]
Fits the model and data to the evolutionary network.
- Parameters:
x (pd.DataFrame | np.ndarray) – The data to fit.
y (pd.Series | np.ndarray) – The target to fit.
subset (list, optional) – A list of columns to subset for feature level optimisation.
- Returns:
The fitted network.
- Return type:
XEvolutionaryNetwork
- optimise(callback=None) XEvolutionaryNetwork [source]
Sequentially runs the layers in the network.
- Parameters:
callback (any, optional) – Callback for progress tracking.
- Returns:
The evolutionary network.
- Return type:
XEvolutionaryNetwork
Copyright Xplainable Pty Ltd, 2023
- class xplainable.core.optimisation.layers.BaseLayer(metric='mae')[source]
Bases:
object
Base class for optimisation layers.
- Parameters:
metric (str, optional) – Metric to optimise on. Defaults to ‘mae’.
- class xplainable.core.optimisation.layers.Evolve(mutations: int = 100, generations: int = 50, max_generation_depth: int = 10, max_severity: float = 0.5, max_leaves: int = 20, early_stopping: int | None = None)[source]
Bases:
BaseLayer
Evolutionary algorithm to optimise XRegressor leaf weights.
The Evolve layer uses a genetic algorithm to optimise the leaf weights of an XRegressor model. The algorithm works by mutating the leaf weights of the model and scoring the resulting predictions. The best mutations are then selected to reproduce and mutate again. This process is repeated until the maximum number of generations is reached, or the early stopping threshold is reached.
- Parameters:
mutations (int, optional) – The number of mutations to generate per generation.
generations (int, optional) – The number of generations to run.
max_generation_depth (int, optional) – The maximum depth of a generation.
max_severity (float, optional) – The maximum severity of a mutation.
max_leaves (int, optional) – The maximum number of leaves to mutate.
early_stopping (int, optional) – Stop early if no improvement after n iters.
- property params: dict
Returns the parameters of the layer.
- Returns:
The layer parameters.
- Return type:
dict
- transform(xnetwork: XEvolutionaryNetwork, x: ndarray, y: array, callback=None)[source]
Optimises an XRegressor profile given the set of parameters.
- Parameters:
xnetwork (XEvolutionaryNetwork) – The evolutionary network.
x (np.ndarray) – The input variables used for prediction.
y (np.array) – The target values.
callback (any, optional) – Callback for progress tracking.
- Returns:
The original x data and the final optimised chromosome to pass to the next layer.
- Return type:
tuple
- class xplainable.core.optimisation.layers.Tighten(iterations: int = 100, learning_rate: float = 0.03, early_stopping: int | None = None)[source]
Bases:
BaseLayer
A leaf boosting algorithm to optimise XRegressor leaf node weights.
The Tighten layer uses a novel leaf boosting algorithm to optimise the leaf weights of an XRegressor model. The algorithm works by iteratively identifying the leaf node that will have the greatest impact on the overall model score, and then incrementally increasing or decreasing the leaf node weight to improve the model score. This process is repeated until the maximum number of iterations is reached, or the early stopping threshold is reached.
- Parameters:
iterations (int) – The number of iterations to run.
learning_rate (float) – How fast the model learns. Between 0.001 and 1.
early_stopping (int) – Stop early if no improvement after n iters.
- property params: dict
Returns the parameters of the layer.
- Returns:
The layer parameters.
- Return type:
dict
- transform(xnetwork: XEvolutionaryNetwork, x: ndarray, y: array, callback=None) tuple [source]
Optimises an XRegressor profile given the set of parameters.
- Parameters:
xnetwork (XEvolutionaryNetwork) – The evolutionary network.
x (np.ndarray) – The input variables used for prediction.
y (np.array) – The target values.
callback (any) – Callback function for progress tracking.
- Returns:
The optimised feature score map.
- Return type:
dict
Preprocessing
Xplainable offers a preprocessing module that allows you to build reproducible preprocessing pipelines. The module is designed for rapidly developing and deploying pipelines in production environments, and it plays nicely with ipywidgets.
The preprocessing module is built on the XPipeline class from xplainable and is used similarly to the scikit-learn Pipeline class. All transformers in the pipeline are expected to have fit and transform methods, along with an inverse_transform method.
To create custom transformers, you can inherit from the XBaseTransformer class, as sketched below. You can render these custom transformers in the embedded xplainable GUI, which allows you to build pipelines without writing any code. You can find documentation on how to embed them in the GUI in the advanced_concepts/custom_transformers section.
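A minimal sketch of a custom transformer (the transformer itself is hypothetical; the import path follows the base class documented below):
from xplainable.preprocessing.transformers.base import XBaseTransformer
import pandas as pd

class StripWhitespace(XBaseTransformer):
    """Hypothetical transformer that trims whitespace from strings."""

    supported_types = ['categorical']

    def transform(self, ser: pd.Series) -> pd.Series:
        # No fit needed; the base class provides a default fit
        return ser.str.strip()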
Using the GUI
Xplainable offers a GUI for making preprocessing pipelines easy and reproducible. You can start the GUI by running a few simple lines.
Example
import xplainable as xp
import pandas as pd
from sklearn.model_selection import train_test_split
# Load data
data = pd.read_csv('data.csv')
train, test = train_test_split(data, test_size=0.2, random_state=42)
# Instantiate the preprocessor object
pp = xp.Preprocessor()
# Open the GUI and build pipeline
pp.preprocess(train)
# Apply the pipeline on new data
test_transformed = pp.transform(test)
Using the Python API
You can develop preprocessing pipelines using the Python API with XPipeline. The following example shows how to build a pipeline.
Example
from xplainable.preprocessing import transformers as xtf
from xplainable.preprocessing.pipeline import XPipeline
from sklearn.model_selection import train_test_split
import pandas as pd
# Load data
data = pd.read_csv('data.csv')
train, test = train_test_split(data, test_size=0.2, random_state=42)
# Instantiate a pipeline
pipeline = XPipeline()
# Add stages for specific features
pipeline.add_stages([
{"feature": "age", "transformer": xtf.Clip(lower=18, upper=99)},
{"feature": "balance", "transformer": xtf.LogTransform()}
])
# add stages on multiple features
pipeline.add_stages([
{"transformer": xtf.FillMissing({'job': 'mode', 'age': 'mean'})},
{"transformer": xtf.DropCols(columns=['duration', 'campaign'])}
])
# Share a single transformer across multiple features.
# Note this can only be applied when no fit method is required.
upper_case = xtf.ChangeCase(case='upper')
pipeline.add_stages([
{"feature": "job", "transformer": upper_case},
{"feature": "month", "transformer": upper_case}
])
# Fit and transform the data
train_transformed = pipeline.fit_transform(train)
# Apply transformations on new data
test_transformed = pipeline.transform(test)
# Inverse transform (only applies to configured features)
test_inv_transformed = pipeline.inverse_transform(test_transformed)
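You can also inspect the configured stages at any time with get_blueprint (documented below):
# Returns a list describing each stage in the pipeline
print(pipeline.get_blueprint())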
XPipeline
Copyright Xplainable Pty Ltd, 2023
- class xplainable.preprocessing.pipeline.XPipeline[source]
Bases:
object
Pipeline builder for xplainable transformers.
- Parameters:
stages (list) – list containing xplainable pipeline stages.
- add_stages(stages: list) XPipeline [source]
Adds multiple stages to the pipeline.
- Parameters:
stages (list) – list containing xplainable pipeline stages.
- Returns:
self
- Return type:
XPipeline
- drop_stage(stage: int) XPipeline [source]
Drops a stage from the pipeline.
- Parameters:
stage (int) – index of the stage to drop.
- Returns:
self
- Return type:
XPipeline
- fit(x: DataFrame) XPipeline [source]
Sequentially iterates through pipeline stages and fits data.
- Parameters:
x (pd.DataFrame) – A non-empty DataFrame to fit.
- Returns:
The fitted pipeline.
- Return type:
XPipeline
- fit_transform(x: DataFrame, start: int = 0)[source]
Runs the fit method followed by the transform method.
- Parameters:
x (pd.DataFrame) – A non-empty DataFrame to fit.
start (int) – index of the stage to start fitting from.
- Returns:
The transformed dataframe.
- Return type:
pd.DataFrame
- get_blueprint()[source]
Returns a blueprint of the pipeline.
- Returns:
A list containing the pipeline blueprint.
- Return type:
list
- inverse_transform(x: DataFrame)[source]
Iterates through pipeline stages applying inverse transformations.
- Parameters:
x (pd.DataFrame) – A non-empty DataFrame to inverse transform.
- Returns:
The inverse transformed dataframe.
- Return type:
pd.DataFrame
Base Transformer
Copyright Xplainable Pty Ltd, 2023
- class xplainable.preprocessing.transformers.base.XBaseTransformer[source]
Bases:
object
Base class for all transformers.
This base class is used as a template for all xplainable transformers. It contains the basic methods that all transformers should have, and is used to enforce a consistent API across all transformers.
The __call__ method is used to allow the transformers to be called inside the xplainable GUI in Jupyter, but does not need to be called directly.
- fit(*args, **kwargs)[source]
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)[source]
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pandas.Series
- inverse_transform(x: Series | DataFrame)[source]
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()[source]
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- transform(x: Series | DataFrame)[source]
Placeholder for transformation operation. Intended to be overridden.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
Categorical Transformers
Copyright Xplainable Pty Ltd, 2023
- class xplainable.preprocessing.transformers.categorical.ChangeCase(case='lower')[source]
Bases:
XBaseTransformer
Changes the case of a string.
- Parameters:
case (str) – ‘upper’ or ‘lower’
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pandas.Series
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['categorical']
- class xplainable.preprocessing.transformers.categorical.Condense(pct=0.8, categories=[])[source]
Bases:
XBaseTransformer
Condenses a feature into the categories that make up a specified percentage of observations.
- Parameters:
pct (float) – The minimum percentage of observations the categories should cover.
- fit(ser: Series) → Condense [source]
Determines the categories that make up the specified percentage of observations.
- Parameters:
ser (pandas.Series) – The series to analyse.
- Raises:
TypeError – If the series is not of type string.
- Returns:
The fitted transformer.
- Return type:
Condense
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['categorical']
- class xplainable.preprocessing.transformers.categorical.DetectCategories(max_categories=10, category_list=[])[source]
Bases:
XBaseTransformer
Auto-detects categories from a string column.
- Parameters:
max_categories (int) – The maximum number of categories to extract.
- fit(ser: Series) → DetectCategories [source]
Identifies the top categories from a text series.
- Parameters:
ser (pandas.Series) – The series to analyse.
- Returns:
The fitted transformer.
- Return type:
DetectCategories
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['categorical']
- class xplainable.preprocessing.transformers.categorical.FillMissingCategorical(fill_with='missing')[source]
Bases:
XBaseTransformer
Fills missing values with a specified value.
- Parameters:
fill_with (str) – Text to fill with.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['categorical']
- class xplainable.preprocessing.transformers.categorical.MapCategories(category_values={})[source]
Bases:
XBaseTransformer
Maps all categories of a string column to new values.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['categorical']
- class xplainable.preprocessing.transformers.categorical.MergeCategories(merge_from=[], merge_to='')[source]
Bases:
XBaseTransformer
Merges specified categories in a series into one category.
- Parameters:
merge_from (list) – List of categories to merge from.
merge_to (str) – The category to merge to.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['categorical']
- class xplainable.preprocessing.transformers.categorical.ReplaceCategory(target=None, replace_with='')[source]
Bases:
XBaseTransformer
Replaces a category in a series with a specified value.
- Parameters:
target – The target value to replace.
replace_with – The value to insert in place.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['categorical']
- class xplainable.preprocessing.transformers.categorical.ReplaceWith(target=None, replace_with=None)[source]
Bases:
XBaseTransformer
Replaces a specified value in a series.
- Parameters:
target – The target value to replace.
replace_with – The value to insert in place.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['categorical']
- class xplainable.preprocessing.transformers.categorical.TextContains(selector=None, value=None)[source]
Bases:
XBaseTransformer
Flags series values that contain, start with, or end with a value.
- Parameters:
selector (str) – The type of search to make.
value (str) – The value to search.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['categorical']
- class xplainable.preprocessing.transformers.categorical.TextRemove(numbers=False, characters=False, uppercase=False, lowercase=False, special=False, whitespace=False, stopwords=False, text=None, custom_regex=None)[source]
Bases:
XBaseTransformer
Removes specified values from a str type series.
This transformer cannot be inverse_transformed and does not require fitting.
- Parameters:
numbers (bool, optional) – Removes numbers from string.
characters (bool, optional) – Removes characters from string.
uppercase (bool, optional) – Removes uppercase characters from string.
lowercase (bool, optional) – Removes lowercase characters from string.
special (bool, optional) – Removes special characters from string.
whitespace (bool, optional) – Removes whitespace from string.
stopwords (bool, optional) – Removes stopwords from string.
text (str, optional) – Removes specific text match from string.
custom_regex (str, optional) – Removes matching regex text from string.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['categorical']
- class xplainable.preprocessing.transformers.categorical.TextSlice(start=None, end=None, action='keep')[source]
Bases:
XBaseTransformer
Selects a slice of each string in a categorical column.
- Parameters:
start (int) – Starting character.
end (int) – Ending character.
action (str) – [keep, drop] selected slice.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['categorical']
- class xplainable.preprocessing.transformers.categorical.TextTrim(selector=None, n=0, action='keep')[source]
Bases:
XBaseTransformer
Drops or keeps first/last n characters of a categorical column.
- Parameters:
selector (str) – [first, last].
n (int) – Number of characters to identify.
action (str) – [keep, drop] the identified characters.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['categorical']
Numeric Transformers
Copyright Xplainable Pty Ltd, 2023
- class xplainable.preprocessing.transformers.numeric.Clip(lower=None, upper=None)[source]
Bases:
XBaseTransformer
Clips numeric values to a specified range.
- Parameters:
lower (float) – The lower threshold value.
upper (float) – The upper threshold value.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['numeric']
- class xplainable.preprocessing.transformers.numeric.FillMissingNumeric(fill_with='mean', fill_value=None)[source]
Bases:
XBaseTransformer
Fills missing values with a specified strategy.
- Parameters:
fill_with (str) – The strategy [‘mean’, ‘median’, ‘mode’].
- fit(ser: Series) → FillMissingNumeric [source]
Calculates the fill value from a series.
- Parameters:
ser (pandas.Series) – The series to analyse.
- Returns:
The fitted transformer.
- Return type:
FillMissingNumeric
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['numeric']
- class xplainable.preprocessing.transformers.numeric.LogTransform[source]
Bases:
XBaseTransformer
Log transforms a given numeric series.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(ser: Series) → Series [source]
Inverse transforms a log-transformed series.
- Parameters:
ser (pd.Series) – The series to inverse transform.
- Returns:
The inverse transformed series.
- Return type:
pd.Series
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['numeric']
- class xplainable.preprocessing.transformers.numeric.MinMaxScale(min_value=None, max_value=None)[source]
Bases:
XBaseTransformer
Scales a numeric series between 0 and 1.
- fit(ser: Series) → MinMaxScale [source]
Extracts the min and max value from a series.
- Parameters:
ser (pandas.Series) – The series to analyse.
- Returns:
The fitted transformer.
- Return type:
MinMaxScale
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['numeric']
Mixed Transformers
Copyright Xplainable Pty Ltd, 2023
- class xplainable.preprocessing.transformers.mixed.SetDType(to_type=None)[source]
Bases:
XBaseTransformer
Changes the data type of a specified column.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['numeric', 'categorical']
- class xplainable.preprocessing.transformers.mixed.Shift(step=0)[source]
Bases:
XBaseTransformer
Shifts a series up or down n steps.
- Parameters:
step (int) – The number of steps to shift.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['categorical', 'numeric']
Dataset Transformers
Copyright Xplainable Pty Ltd, 2023
- class xplainable.preprocessing.transformers.dataset.ChangeCases(columns=[], case='lower')[source]
Bases:
XBaseTransformer
Changes the case of all specified categorical columns.
- Parameters:
columns (list) – The columns to apply the case change to.
case (str) – ‘upper’ or ‘lower’.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['dataset']
- class xplainable.preprocessing.transformers.dataset.ChangeNames(col_names={})[source]
Bases:
XBaseTransformer
Changes the names of columns in a dataset.
- Parameters:
col_names (dict) – Dictionary of old and new column names.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['dataset']
- class xplainable.preprocessing.transformers.dataset.DateTimeExtract(target=None, year=False, month=False, day=False, weekday=False, day_name=False, hour=False, minute=False, second=False, drop=False)[source]
Bases:
XBaseTransformer
Extracts datetime components from a datetime column.
- Parameters:
target (str) – The datetime column to extract from.
year (bool) – Extracts year if True.
month (bool) – Extracts month if True.
day (bool) – Extracts day if True.
weekday (bool) – Extracts weekday if True.
day_name (bool) – Extracts day name if True.
hour (bool) – Extracts hour if True.
minute (bool) – Extracts minute if True.
second (bool) – Extracts second if True.
drop (bool) – Drops original datetime column if True.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['dataset']
- class xplainable.preprocessing.transformers.dataset.DropCols(columns=None)[source]
Bases:
XBaseTransformer
Drops specified columns from a dataset.
- Parameters:
columns (list) – The columns to be dropped.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['dataset']
- class xplainable.preprocessing.transformers.dataset.DropNaNs(subset=None)[source]
Bases:
XBaseTransformer
Drops NaN rows from a dataset.
- Parameters:
subset (list, optional) – A subset of columns to apply the transformer to.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['dataset']
- class xplainable.preprocessing.transformers.dataset.FillMissing(fill_with={}, fill_values={})[source]
Bases:
XBaseTransformer
Fills missing values of all columns with a specified value/strategy.
- fit(df: DataFrame) → FillMissing [source]
Calculates the fill_value for all columns in the dataset.
The fill values are based on a specified strategy for each column.
- Parameters:
df (pd.DataFrame) – The dataset to fit.
- Returns:
The fitted transformer.
- Return type:
FillMissing
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['dataset']
- class xplainable.preprocessing.transformers.dataset.GroupbyShift(columns=None, step=0, as_new=True, col_names=[], group_by=None, order_by=None, descending=None)[source]
Bases:
XBaseTransformer
Shifts a series up or down n steps within a specified group.
- Parameters:
columns (list) – The column(s) to shift.
step (int) – The number of steps to shift.
as_new (bool) – Creates new column if True.
group_by (str) – The column to group by.
order_by (str) – The column to order by.
descending (bool) – Orders the value descending if True.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['dataset']
- class xplainable.preprocessing.transformers.dataset.GroupedSignalSmoothing(target=None, group_by=None, order_by=None, descending=None)[source]
Bases:
XBaseTransformer
Smooths signal data within a specified group.
- Parameters:
target (str) – The target feature to smooth.
group_by (str) – The column to group by.
order_by (str) – The column to order by.
descending (bool) – Orders the value descending if True.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['dataset']
- class xplainable.preprocessing.transformers.dataset.Operation(columns=[], operation=None, alias: str | None = None, drop: bool = False)[source]
Bases:
XBaseTransformer
Applies an operation to multiple columns (in order) to create a new feature.
- Parameters:
columns (list) – The column names to apply the operation to, in order.
alias (str) – Name of the newly created column.
drop (bool) – Drops the original columns if True.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['dataset']
- class xplainable.preprocessing.transformers.dataset.OrderBy(order_by=None, ascending=True)[source]
Bases:
XBaseTransformer
Orders the dataset by the values of a given series.
- Parameters:
order_by (str) – The series to order by.
ascending (bool) – Orders in ascending order if True.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['dataset']
- class xplainable.preprocessing.transformers.dataset.RollingOperation(groupby=None, orderby=None, direction=None, columns=[], window=None, operation=None, drop: bool = False)[source]
Bases:
XBaseTransformer
Applies a rolling operation to multiple columns over a specified window.
- Parameters:
columns (list) – The column names to apply the operation to.
window (int) – The size of the rolling window.
drop (bool) – Drops the original columns if True.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['dataset']
- class xplainable.preprocessing.transformers.dataset.SetDTypes(types={})[source]
Bases:
XBaseTransformer
Sets the data type of all columns in the dataset.
- Parameters:
types (dict) – Dictionary of column names and data types.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['dataset']
- class xplainable.preprocessing.transformers.dataset.TextSplit(target=None, separator=None, max_splits=0)[source]
Bases:
XBaseTransformer
Splits a string column into multiple columns on a specified separator.
- Parameters:
target (str) – The column to split.
separator (str) – The separator to split on.
max_splits (int) – The maximum number of splits to make.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['dataset']
- class xplainable.preprocessing.transformers.dataset.TextTrimMulti(column='', selector=None, n=0, action='keep', drop_col=False, alias='')[source]
Bases:
XBaseTransformer
Drops or keeps first/last n characters of a categorical column.
- Parameters:
selector (str) – [first, last].
n (int) – Number of characters to identify.
action (str) – [keep, drop] the identified characters.
- fit(*args, **kwargs)
No fit is required for this transformer.
This is a default fit method in case no fit is needed. This method is used to allow the transformer to be used in a pipeline, and is intended to be overridden by transformers that require fitting.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- fit_transform(x: Series | DataFrame)
Fits and transforms data on a series or dataframe.
- Parameters:
x (pd.Series | pd.DataFrame) – Series or df to fit & transform.
- Returns:
The transformed series or df.
- Return type:
pd.Series | pd.DataFrame
- inverse_transform(x: Series | DataFrame)
No inverse transform is available for this transformer.
This is a default inverse method in case no inverse transform is available.
The input parameter is either a pd.Series or a pd.DataFrame, depending on the transformer. Documentation for each individual transformer should specify which type of input is expected in this method when it is being overridden.
- Parameters:
x (pd.Series | pd.DataFrame) – To be specified by transformer.
- Decorators:
raise_errors (decorator): Raises detailed errors.
- raise_errors()
Decorator to raise detailed errors in transformer functions.
This decorator is used to wrap the transformer methods and raise any errors that occur during processing. This is done to allow the gui to catch the errors and display them.
- supported_types = ['dataset']
NLP
Documentation Coming Soon.