Trainer Class

class Trainer(name: str, auth_key: Optional[str] = None, environment: str = 'default', env_var: str = 'JAI_AUTH', verbose: int = 1, safe_mode: bool = False)

Trainer task class.

An authorization key is needed to use the Jai API.

Parameters
  • name (str) – String with the name of a database in your JAI environment.

  • auth_key (str, optional) – Authorization key for the Jai API. Defaults to None.

  • environment (str) – Jai environment id or name to use. Defaults to “default”.

  • env_var (str) – The environment variable that contains the JAI authentication token. Defaults to “JAI_AUTH”.

  • verbose (int) – The level of verbosity. Defaults to 1.

  • safe_mode (bool) – When True, responses from the Jai API are validated. If validation fails, the SDK version you are using is probably incompatible with the current API version, and we advise updating to a newer version. If the problem persists on the latest SDK version, please open an issue so we can work on a fix. Defaults to False.
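
As a hedged illustration of the env_var parameter, the sketch below checks that the token variable is set before a Trainer is created. read_auth_token is a hypothetical helper that is not part of the jai package, and the token value is a placeholder.

```python
import os


def read_auth_token(env_var: str = "JAI_AUTH") -> str:
    """Return the token stored in `env_var`, or raise a clear error.

    Hypothetical helper for illustration; not part of the jai package.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating a Trainer."
        )
    return token


# Demonstration with a placeholder value (not a real key).
os.environ["JAI_AUTH"] = "placeholder-token"
token = read_auth_token()

# With the jai package installed, the Trainer could then be created as:
#   from jai import Trainer
#   trainer = Trainer("my_database", env_var="JAI_AUTH")
```

Failing early with an explicit message can be easier to debug than an authentication error raised later by an API call.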

Example

>>> from jai import Trainer
...
>>> trainer = Trainer(name)

append(data, *, frequency_seconds: int = 1)

Insert raw data and extract their latent representation.

Use this method when a database has already been set up with fit() and you want to create vector representations of new data using the model already trained for that database.

Parameters
  • data (pandas.DataFrame) – Data to be inserted and used for training.

  • frequency_seconds (int) – Time in seconds between status checks. If less than 1, the method does not wait for setup to finish, which lets you perform other actions but could cause errors in some scripts. Defaults to 1.

Returns

insert_responses – Dictionary of responses for each batch. Each response indicates whether that particular batch was successfully inserted.

Return type

dict

Example

>>> from jai import Trainer
...
>>> trainer = Trainer(name)
>>> trainer.append(data)

property db_type

delete_database()

Delete the database and everything that goes with it.

This method takes no arguments; it acts on the database named when the Trainer was created.

Returns

response – Dictionary with the API response.

Return type

dict

Example

>>> from jai import Trainer
...
>>> trainer = Trainer(name)
>>> trainer.delete_database()

delete_ids(ids)

Delete the specified ids from the database.

Parameters

ids (list) – List of ids to be removed from the database.

Returns

response – Dictionary with the API response.

Return type

dict

Example

>>> from jai import Trainer
...
>>> trainer = Trainer(name)
>>> trainer.delete_ids([0, 1])

delete_raw_data()

Delete raw data. It is good practice to do this after training a model.

Returns

response – Dictionary with the API response.

Return type

dict
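
The practice described above (deleting raw data once training is done) can be sketched as a small wrapper. This is only an illustration: fit_and_clean and RecordingTrainer are hypothetical names, and the stand-in class exists so the snippet runs without the jai package or network access. With a real Trainer, fit() should be left at its default frequency_seconds so that training has finished before the raw data is removed.

```python
class RecordingTrainer:
    """Hypothetical stand-in that records calls; not the jai Trainer."""

    def __init__(self):
        self.calls = []

    def fit(self, data):
        self.calls.append("fit")
        return "query-placeholder"  # a real Trainer returns a Query here

    def delete_raw_data(self):
        self.calls.append("delete_raw_data")
        return {"status": "placeholder"}  # response shape is assumed


def fit_and_clean(trainer, data):
    """Train first, then delete the raw data that is no longer needed."""
    query = trainer.fit(data)
    response = trainer.delete_raw_data()
    return query, response


stub = RecordingTrainer()
query, response = fit_and_clean(stub, data=[{"id": 0}])
```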

Example

>>> from jai import Trainer
...
>>> trainer = Trainer(name)
>>> trainer.delete_raw_data()

describe()

Get the hyperparameters and parameters of the database.

This method takes no arguments; it describes the database named when the Trainer was created.

Returns

response – Dictionary with database description.

Return type

dict

fit(data, *, overwrite: bool = False, frequency_seconds: int = 1, verbose: int = 1)

Takes in a dataframe or a dictionary of dataframes and inserts the data into Jai.

If frequency_seconds is at least 1, it then calls the wait_setup function to wait for the model to finish training and calls the report function to print the model’s performance metrics.

Finally, it returns the result of the get_query function, which provides the class used to consume the model.

Parameters
  • data (pd.DataFrame or dict of pd.DataFrame) – The data to be inserted into the database. It must be a pandas.DataFrame, unless the database is a RecommendationSystem, in which case it must be a dictionary of pandas.DataFrame.

  • overwrite (bool) – If True, deletes any existing database with the same name. Defaults to False.

  • frequency_seconds (int) – How often, in seconds, to check the status of the model. If less than 1, fit returns the insert_responses and setup_response without waiting for training to finish, which lets you perform other actions but could cause errors in scripts that expect the model to be ready for consumption. Defaults to 1.

Returns

  • Tuple (tuple) – If frequency_seconds < 1, the returned value is a tuple of two elements. The first element is a list of responses from the insert_data function. The second element is a dictionary with the response from the setup function.

  • Query (jai.Query class) – If frequency_seconds >= 1, the returned value is a Query object for the trained database. If the database is of the RecommendationSystem type, a dictionary of Query objects is returned.
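
Because fit() can return three different shapes, calling code may want to branch on the result. The sketch below is one hedged way to do that. classify_fit_result and PlaceholderQuery are hypothetical names used only for illustration, and the checks rely solely on the return types described above.

```python
def classify_fit_result(result):
    """Label the value returned by fit() according to the documented cases."""
    if isinstance(result, tuple):
        # frequency_seconds < 1: (insert_responses, setup_response),
        # training may still be running.
        return "not_waited"
    if isinstance(result, dict):
        # RecommendationSystem: a dictionary of Query objects.
        return "query_per_database"
    # Any other database type: a single Query object.
    return "single_query"


class PlaceholderQuery:
    """Hypothetical stand-in for jai.Query so the sketch runs offline."""


labels = [
    classify_fit_result(([], {})),
    classify_fit_result({"users": PlaceholderQuery()}),
    classify_fit_result(PlaceholderQuery()),
]
```

The tuple check comes first because it is the only case in which the model may not be ready to be queried yet.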

Example

>>> from jai import Trainer
...
>>> trainer = Trainer(name)
>>> trainer.fit(data)

property fit_parameters

get_query(name: Optional[str] = None)

Returns a new Query object with the same initial values as the current Trainer object.

Parameters

name (str) – The name of the query. Defaults to the same name as the current Trainer object.

Return type

A Query object with the name and init values.

Example

>>> from jai import Trainer
...
>>> trainer = Trainer(name)
>>> trainer.get_query()

property insert_parameters

Parameters used for inserting data.

is_valid()

Check whether the Trainer’s name is a valid database name (i.e., whether it is in your environment).

Returns

response – True if the name is in your environment, False otherwise.

Return type

bool
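
A common use of is_valid() is to guard a destructive call. The following is a minimal sketch that assumes only the is_valid() and delete_database() methods documented on this page. delete_if_exists and StubTrainer are hypothetical names, and the placeholder response does not reflect the real API response.

```python
def delete_if_exists(trainer):
    """Delete the trainer's database only when it exists in the environment."""
    if not trainer.is_valid():
        return None  # nothing to delete
    return trainer.delete_database()


class StubTrainer:
    """Hypothetical stand-in for jai.Trainer so the sketch runs offline."""

    def __init__(self, exists: bool):
        self._exists = exists

    def is_valid(self) -> bool:
        return self._exists

    def delete_database(self) -> dict:
        self._exists = False
        return {"deleted": True}  # placeholder; real response shape not shown here


first = delete_if_exists(StubTrainer(exists=True))
second = delete_if_exists(StubTrainer(exists=False))
```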

property name

report(verbose: int = 2, return_report: bool = False)

Get a report on the model’s training.

Parameters
  • verbose (int, optional) – Level of description. The default is 2. Use verbose 2 to get the loss graph, verbose 1 to get only the metrics result.

  • return_report (bool, optional) – If True, returns the report dictionary and does not print or plot anything. The default is False.

Returns

Dictionary with the information.

Return type

dict

Example

>>> from jai import Trainer
...
>>> trainer = Trainer(name)
>>> trainer.report()

set_parameters(db_type: str, hyperparams: Optional[Dict[str, Dict]] = None, features: Optional[Dict[str, Dict]] = None, num_process: Optional[dict] = None, cat_process: Optional[dict] = None, datetime_process: Optional[dict] = None, pretrained_bases: Optional[list] = None, label: Optional[dict] = None, split: Optional[dict] = None, verbose: int = 1)

Checks the input parameters and sets the fit_parameters attribute for setup.

Parameters
  • db_type (str) – Type of the database to be created.

  • hyperparams (dict) – Dictionary of the fit parameters. Varies for each database type.

  • features (dict) – Dictionary with feature names as keys and a dictionary of parameters for each feature as values.

  • num_process (dict) – Dictionary defining the default process for numeric features.

  • cat_process (dict) – Dictionary defining the default process for categorical features.

  • datetime_process (dict) – Dictionary defining the default process for datetime features.

  • pretrained_bases (list) – List of dictionaries mapping the features to the databases trained previously.

  • label (dict) – Dictionary defining the label.

  • split (dict) – Dictionary defining the train/validation split for the model training.
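
With this many optional arguments, it can help to assemble the keyword arguments separately and catch misspelled names before calling set_parameters. The sketch below is an assumption-laden illustration: build_set_parameters_kwargs is a hypothetical helper, the argument names are copied from the signature above, and "RecommendationSystem" is used as the db_type only because that database type is mentioned in the fit() description.

```python
# Optional argument names copied from the set_parameters signature.
OPTIONAL_NAMES = {
    "hyperparams", "features", "num_process", "cat_process",
    "datetime_process", "pretrained_bases", "label", "split", "verbose",
}


def build_set_parameters_kwargs(db_type, **optional):
    """Collect set_parameters arguments, dropping those left as None.

    Hypothetical helper for illustration; not part of the jai package.
    """
    unknown = sorted(set(optional) - OPTIONAL_NAMES)
    if unknown:
        raise TypeError(f"Unknown set_parameters argument(s): {', '.join(unknown)}")
    kwargs = {"db_type": db_type}
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


kwargs = build_set_parameters_kwargs(
    "RecommendationSystem", pretrained_bases=None, verbose=1
)

# With a real Trainer, the result could then be passed on as:
#   trainer.set_parameters(**kwargs)
```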

Example

>>> from jai import Trainer
...
>>> trainer = Trainer(name)
>>> trainer.set_parameters(db_type)

status()

Get the status of your JAI environment when training.

Returns

response – Dictionary with the current status of the training tasks.

Return type

dict

update_database(name: str, display_name: Optional[str] = None, project: Optional[str] = None)

property url

Get name and type of each database in your environment.

wait_setup(frequency_seconds: int = 1)

Wait for the fit (model training) to finish.

Parameters

frequency_seconds (int, optional) – Number of seconds between status checks. Default is 1.

Return type

None.
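
The role of frequency_seconds can be illustrated with a generic polling loop. This is a sketch of the pattern only, not the library's implementation: wait_until_done, check_done and max_checks are hypothetical names, and the status source is simulated so the snippet runs offline.

```python
import time
from typing import Callable


def wait_until_done(
    check_done: Callable[[], bool],
    frequency_seconds: float = 1,
    max_checks: int = 100,
) -> int:
    """Poll `check_done` until it returns True; return the number of checks made."""
    for checks in range(1, max_checks + 1):
        if check_done():
            return checks
        # Sleep between status checks, as frequency_seconds suggests.
        time.sleep(frequency_seconds)
    raise TimeoutError(f"Not finished after {max_checks} checks.")


# Simulated status source: unfinished twice, then finished.
states = iter([False, False, True])
checks_made = wait_until_done(lambda: next(states), frequency_seconds=0.01)
```

A larger frequency_seconds means fewer status requests at the cost of noticing completion later.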