Query Class#

class Query(name: str, auth_key: str | None = None, environment: str = 'default', env_var: str = 'JAI_AUTH', verbose: int = 1, safe_mode: bool = False, batch_size: int = 1048576)#

Query task class.

An authorization key is needed to use the Jai API.

Parameters:
  • name (str) – String with the name of a database in your JAI environment.

  • environment (str) – Jai environment id or name to use. Defaults to “default”.

  • env_var (str) – The environment variable that contains the JAI authentication token. Defaults to “JAI_AUTH”.

  • verbose (int) – The level of verbosity. Defaults to 1.

  • safe_mode (bool) – When safe_mode is True, responses from Jai API are validated. If the validation fails, the current version you are using is probably incompatible with the current API version. We advise updating it to a newer version. If the problem persists and you are on the latest SDK version, please open an issue so we can work on a fix. Defaults to False.

  • batch_size (int) – Size of the batches into which data sent to the API is split. It does not change the results, but a value that is too small can increase the total processing time and a value that is too large can exceed the data limit of a request. Defaults to 2**20 (1,048,576).
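The effect of batch_size can be pictured with a small, self-contained sketch. It is not the SDK's implementation: the helper name iter_batches is hypothetical, and it assumes the batch size counts rows, which the description above does not state.

```python
from typing import Iterator, Sequence


def iter_batches(rows: Sequence, batch_size: int = 2**20) -> Iterator[Sequence]:
    """Yield consecutive slices of `rows` holding at most `batch_size` items each.

    Hypothetical helper for illustration only; the SDK may batch differently.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


rows = list(range(10))
batches = [list(batch) for batch in iter_batches(rows, batch_size=4)]
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Concatenating the batches gives back the original rows, which is why the batch size affects only the number of requests and not the results.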

check_features(columns: List[str], name: str | None = None)#

Check whether the columns you want to use in your model match the ones the API expects.

Parameters:
  • columns (List[str]) – Names of the columns to check.

  • name (str, optional) – Defaults to None.

Returns:

A list with the names of the columns that were not found.

Return type:

List[str]
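The kind of comparison described above can be sketched locally without calling the API. The helper missing_columns and the field names are hypothetical, and the sketch assumes that "not found" means columns you passed that are absent from the expected set.

```python
from typing import Iterable, List


def missing_columns(columns: Iterable[str], expected: Iterable[str]) -> List[str]:
    """Return the names in `columns` that are absent from `expected`, keeping order."""
    expected_set = set(expected)
    return [col for col in columns if col not in expected_set]


# Hypothetical field names standing in for what the API would expect.
expected_fields = ["age", "fare", "sex"]
requested = ["age", "cabin", "sex"]

not_found = missing_columns(requested, expected_fields)
print(not_found)  # ['cabin']
```

An empty list would mean every requested column was found.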

property db_type#

describe()#

Get the database hyperparameters and parameters of a specific database.

Parameters:

This method takes no arguments; it uses the database name given when the Query object was created.

Returns:

response – Dictionary with database description.

Return type:

dict

download_vectors()#

Download vectors from a particular database.

Parameters:

This method takes no arguments; it uses the database name given when the Query object was created.

Returns:

vector – Numpy array with all vectors.

Return type:

np.array

Example

>>> from jai import Query
...
>>> q = Query(name)
>>> vectors = q.download_vectors()
>>> print(vectors)
[[ 0.03121682  0.2101511  -0.48933393 ...  0.05550333  0.21190546  0.19986008]
 [-0.03121682 -0.21015109  0.48933393 ...  0.2267401   0.11074653  0.15064166]
 ...
 [-0.03121682 -0.2101511   0.4893339  ...  0.00758727  0.15916921  0.1226602 ]]

fields()#

Get the table fields for a Supervised/SelfSupervised database.

Parameters:

This method takes no arguments; it uses the database name given when the Query object was created.

Returns:

response – Dictionary with table fields.

Return type:

dict

Example

>>> from jai import Query
...
>>> q = Query(name)
>>> q.fields()

filters()#

Gets the valid values of filters.

Returns:

response – List of valid filter values.

Return type:

list of strings

ids(mode: Mode = 'complete')#

Get id information of a given database.

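Parameters:

mode (str, optional) – Level of detail of the response: ‘complete’ returns the actual ids and ‘simple’ returns a summary of the ids. Defaults to ‘complete’.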

Returns:

response – List with the actual ids (mode: ‘complete’) or a summary of ids (‘simple’) of the given database.

Return type:

list

Example

>>> from jai import Query
...
>>> q = Query(name)
>>> ids = q.ids(mode='simple')
>>> print(ids)
['891 items from 0 to 890']

is_valid()#

Check if a given name is a valid database name (i.e., if it is in your environment).

Returns:

response – True if name is in your environment. False, otherwise.

Return type:

bool

property name#

predict(data: Series | DataFrame, predict_proba: bool = False, as_frame: bool = False, max_workers: int | None = None)#

Predict the output of new data for a given database.

Parameters:
  • data (pd.Series or pd.DataFrame) – Data for which predictions are requested.

  • predict_proba (bool) – Whether to return the probability of each prediction if the model is a classifier. Default is False.

  • as_frame (bool) – Whether to return the predictions as a DataFrame instead of a list. Default is False.

  • max_workers (int) – Number of workers used to parallelize the process. If None, use all workers. Defaults to None.

Returns:

results – List of dictionaries, each with the ‘id’ of an input row and ‘predict’ holding the prediction for that row.

Return type:

list of dicts
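A short sketch of working with a result in the shape described above. The sample values are made up for illustration and are not output from a real database; only the ‘id’ and ‘predict’ keys come from the description.

```python
from collections import Counter

# Hypothetical output in the documented shape: one dict per input row.
results = [
    {"id": 0, "predict": "yes"},
    {"id": 1, "predict": "no"},
    {"id": 2, "predict": "yes"},
]

# Map each input id to its prediction for quick lookups.
prediction_by_id = {item["id"]: item["predict"] for item in results}

# Count how often each predicted value occurs.
prediction_counts = Counter(item["predict"] for item in results)

print(prediction_by_id[1])               # no
print(prediction_counts.most_common(1))  # [('yes', 2)]
```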

recommendation(data: list | ndarray | Index | Series | DataFrame, top_k: int = 5, orient: str = 'nested', filters: List[str] | None = None, max_workers: int | None = None)#

Query a database for the top_k most recommended entries for each input passed as an argument.

Parameters:
  • data (list, np.ndarray, pd.Index, pd.Series or pd.DataFrame) – Data to be queried for recommendation in your database. Use a list, np.ndarray or pd.Index to query by id; use a pd.Series or pd.DataFrame to query with raw data.

  • top_k (int) – Number of recommendations to return for each input. Default is 5.

  • orient ("nested" or "flat") – Changes the output format. Default is “nested”.

  • filters (List of strings) – Filters to use on the query. Default is None.

  • max_workers (int) – Number of workers used to parallelize the process. If None, use all workers. Defaults to None.

Returns:

results – A list with one dictionary per input value. Each dictionary has a ‘query_id’ and a ‘result’, which is a list of the ‘top_k’ most recommended items. Each item is a dictionary with the ‘id’ from the previously set up database and the ‘distance’ between that ‘id’ and the ‘query_id’.

Return type:

list of dicts
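As a sketch of reading the nested structure described above, the snippet below collects the recommended ids for each query. The sample values are invented, the helper ids_per_query is hypothetical, and the key names follow the description rather than a verified API response.

```python
from typing import Any, Dict, List

# Hypothetical nested output: one entry per query, each with its recommended items.
results: List[Dict[str, Any]] = [
    {"query_id": 10, "result": [{"id": 4, "distance": 0.12}, {"id": 8, "distance": 0.30}]},
    {"query_id": 11, "result": [{"id": 2, "distance": 0.05}]},
]


def ids_per_query(results: List[Dict[str, Any]]) -> Dict[Any, List[Any]]:
    """Map each query_id to the list of recommended ids, in the order returned."""
    return {
        entry["query_id"]: [item["id"] for item in entry["result"]]
        for entry in results
    }


recommended = ids_per_query(results)
print(recommended)  # {10: [4, 8], 11: [2]}
```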

similar(data: list | ndarray | Index | Series | DataFrame, top_k: int = 5, orient: str = 'nested', filters: List[str] | None = None, max_workers: int | None = None)#

Query a database for the top_k most similar entries for each input passed as an argument.

Parameters:
  • data (list, np.ndarray, pd.Index, pd.Series or pd.DataFrame) – Data to be queried for similar inputs in your database. Use a list, np.ndarray or pd.Index to query by id; use a pd.Series or pd.DataFrame to query with raw data.

  • top_k (int) – Number of similar items to return for each input. Default is 5.

  • orient ("nested" or "flat") – Changes the output format. Default is “nested”.

  • filters (List of strings) – Filters to use on the similarity query. Default is None.

  • max_workers (int) – Number of workers used to parallelize the process. If None, use all workers. Defaults to None.

Returns:

results – A list with one dictionary per input value. Each dictionary has a ‘query_id’ and a ‘result’, which is a list of the ‘top_k’ most similar items. Each item is a dictionary with the ‘id’ from the previously set up database and the ‘distance’ between that ‘id’ and the ‘query_id’.

Return type:

list of dicts
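One possible way to post-process a nested similarity result is to flatten it into rows and pick, for each query, the closest item other than the query itself. This is a sketch on invented data; the helper flatten is hypothetical, and it is not a description of what orient="flat" returns.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical nested output for two queries made by id.
results: List[Dict[str, Any]] = [
    {"query_id": 1, "result": [
        {"id": 1, "distance": 0.0},
        {"id": 7, "distance": 0.25},
        {"id": 3, "distance": 0.5},
    ]},
    {"query_id": 2, "result": [
        {"id": 2, "distance": 0.0},
        {"id": 9, "distance": 0.125},
    ]},
]


def flatten(results: List[Dict[str, Any]]) -> List[Tuple[Any, Any, float]]:
    """Turn the nested structure into (query_id, id, distance) rows."""
    return [
        (entry["query_id"], item["id"], item["distance"])
        for entry in results
        for item in entry["result"]
    ]


rows = flatten(results)

# Keep, per query, the closest item whose id differs from the query id.
nearest: Dict[Any, Tuple[Any, float]] = {}
for query_id, item_id, distance in rows:
    if item_id == query_id:
        continue  # skip the query item itself
    best = nearest.get(query_id)
    if best is None or distance < best[1]:
        nearest[query_id] = (item_id, distance)

print(nearest)  # {1: (7, 0.25), 2: (9, 0.125)}
```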

property url#

Get name and type of each database in your environment.