Processing Results

These are auxiliary functions used internally by Jai's application methods to process outputs.

filter_resolution(results, threshold=None, return_self=True, res_id='resolution_id')

Process the results of similarity for resolution goals.

Differs from filter_similar in cases where A is similar to B and B is similar to C: here the result states that both A and B are similar to C, and so on along the chain.

Parameters:
  • results (List of Dicts.) – output from similar methods.

  • threshold (float, optional) – value for the distance threshold. The default is None. If set to None, the auxiliary function find_threshold is used.

  • return_self (bool, optional) – option to return the queried id from the query result or not. The default is True.

  • res_id (str, optional) – name of the key for the resolution. The default is “resolution_id”.

Returns:

connect – List of dicts with each id and its corresponding resolution.

Return type:

list of dicts
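
The transitive grouping described above can be illustrated with a small union-find structure. This is a minimal sketch, not Jai's implementation: the function name `resolve_sketch`, the input shape (a `query_id` plus a `results` list of `id`/`distance` pairs), and the choice of the smallest id as each group's representative are all assumptions made for illustration.

```python
def resolve_sketch(results, threshold, res_id="resolution_id"):
    """Group ids linked, directly or transitively, by a distance <= threshold."""
    parent = {}

    def find(x):
        # Register unseen ids as their own representative.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: the smaller id represents the merged group.
            low, high = sorted((root_a, root_b))
            parent[high] = low

    for query in results:
        find(query["query_id"])  # keep ids that have no close neighbour
        for item in query["results"]:
            if item["distance"] <= threshold:
                union(query["query_id"], item["id"])

    return [{"id": i, res_id: find(i)} for i in sorted(parent)]


# Hypothetical similarity output: 1 is close to 2, and 2 is close to 3.
example = [
    {"query_id": 1, "results": [{"id": 1, "distance": 0.0}, {"id": 2, "distance": 0.1}]},
    {"query_id": 2, "results": [{"id": 2, "distance": 0.0}, {"id": 3, "distance": 0.2}]},
    {"query_id": 3, "results": [{"id": 3, "distance": 0.0}, {"id": 2, "distance": 0.2}]},
    {"query_id": 4, "results": [{"id": 4, "distance": 0.0}, {"id": 1, "distance": 0.9}]},
]
resolved = resolve_sketch(example, threshold=0.25)
```

In this sketch ids 1, 2 and 3 end up with the same resolution value even though 1 and 3 never appear in each other's results, while id 4 stays on its own because its only neighbour lies beyond the threshold.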

filter_similar(results, threshold: float | None = None, return_self: bool = True, skip_null: bool = True)

Process the output from the similar methods.

For each of the inputs, gives back the closest value. If return_self is False, avoids returning cases where ‘id’ is equal to ‘query_id’ and returns the next closest if necessary.

Parameters:
  • results (List of Dicts.) – output from similar methods.

  • threshold (float, optional) – value for the distance threshold. The default is None. If set to None, the auxiliary function find_threshold is used.

  • return_self (bool, optional) – option to return the queried id from the query result or not. The default is True.

  • skip_null (bool, optional) – option to skip ids without similar results. If False, returns empty results for those ids. The default is True.

Raises:

NotImplementedError – If the priority provided is not implemented.

Returns:

List mapping each query id to its most similar value.

Return type:

list
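
The selection rule described above (closest value, optional exclusion of the query's own id, optional skipping of empty matches) can be sketched in plain Python. This is only an illustration under assumptions: `closest_match_sketch` is a hypothetical name, the input shape is assumed to be a `query_id` plus a `results` list of `id`/`distance` pairs, and using `None` to mark an empty result is a choice made here, not something the source specifies.

```python
def closest_match_sketch(results, threshold, return_self=True, skip_null=True):
    """Pick, for each query, the nearest result within the threshold."""
    matches = []
    for query in results:
        candidates = [
            item
            for item in query["results"]
            if item["distance"] <= threshold
            # When return_self is False, drop the entry that is the query itself.
            and (return_self or item["id"] != query["query_id"])
        ]
        if candidates:
            best = min(candidates, key=lambda item: item["distance"])
            matches.append(
                {"query_id": query["query_id"], "id": best["id"], "distance": best["distance"]}
            )
        elif not skip_null:
            # Assumption: an empty match is represented with None placeholders.
            matches.append({"query_id": query["query_id"], "id": None, "distance": None})
    return matches


# Hypothetical similarity output used only for this sketch.
example = [
    {"query_id": 1, "results": [{"id": 1, "distance": 0.0}, {"id": 2, "distance": 0.1}]},
    {"query_id": 2, "results": [{"id": 2, "distance": 0.0}, {"id": 3, "distance": 0.2}]},
    {"query_id": 4, "results": [{"id": 4, "distance": 0.0}, {"id": 1, "distance": 0.9}]},
]
without_self = closest_match_sketch(example, threshold=0.25, return_self=False)
with_nulls = closest_match_sketch(example, threshold=0.25, return_self=False, skip_null=False)
```

With `return_self=False`, query 4 has no neighbour within the threshold, so it is dropped from `without_self` and appears with placeholder values in `with_nulls`.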

find_threshold(results, sample_size=0.1, quantile=0.05)

Auxiliary function to find a threshold value.

Takes a sample of the results list, with its size given by sample_size, and uses the given quantile of the sampled distances as the threshold.

This function is automated; we strongly advise setting the threshold manually to get more accurate results.

Parameters:
  • results (list of dicts, output of similar) – output from similar methods.

  • sample_size (float, optional) – Percentage of the results taken to calculate the threshold. If len(results) is too small, i.e., len(results) * sample_size is less than 1, then we use sample_size=0.5 or 1. The default is 0.1.

  • quantile (float, optional) – Quantile of the distances across all query results in the sample taken. We suggest calling the similar method with a top_k large enough for the quantile to be meaningful: the total number of distances is len(results) * sample_size * top_k, so a larger top_k yields more distance values. With a small top_k, the distances are mostly close to 0 and the threshold may not be representative. The default is 0.05.

Returns:

Threshold result.

Return type:

float
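
The sampling-and-quantile idea can be sketched with the standard library alone. This is a rough illustration, not the library's code: `threshold_sketch`, the `seed` argument, the linear-interpolation quantile, and the simplified small-sample rule (always take at least one query) are assumptions, as is the input shape of `query_id` plus a `results` list of `id`/`distance` pairs.

```python
import random


def threshold_sketch(results, sample_size=0.1, quantile=0.05, seed=0):
    """Estimate a distance threshold from a random sample of query results."""
    # Simplification: take at least one query when the sample would be empty.
    n_sample = max(1, int(len(results) * sample_size))
    sample = random.Random(seed).sample(results, n_sample)

    # Pool every distance from the sampled queries (n_sample * top_k values).
    distances = sorted(item["distance"] for query in sample for item in query["results"])
    if not distances:
        raise ValueError("no distances available to compute a threshold")

    # Quantile by linear interpolation between the two nearest ranks.
    position = quantile * (len(distances) - 1)
    lower = int(position)
    upper = min(lower + 1, len(distances) - 1)
    fraction = position - lower
    return distances[lower] + fraction * (distances[upper] - distances[lower])


# Hypothetical output: 20 queries with top_k = 5, each including a self match at 0.0.
example = [
    {"query_id": q, "results": [{"id": q + k, "distance": d}
                                for k, d in enumerate([0.0, 0.1, 0.2, 0.3, 0.4])]}
    for q in range(20)
]
low_quantile = threshold_sketch(example, sample_size=0.5, quantile=0.05)
median = threshold_sketch(example, sample_size=0.5, quantile=0.5)
```

Here 20 * 0.5 * 5 = 50 distances are pooled. Because every query contributes a self match at distance 0, the low quantile in this toy data collapses to 0.0, which illustrates why a larger top_k or a manually chosen threshold tends to be more representative.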

predict2df(predicts, digits: int = 2, percentage: bool = True)

Process the output from the predict methods from supervised models.

Parameters:
  • predicts (List of Dicts.) – output from predict methods.

  • digits (int, optional) – If prediction is a probability, number of digits to round the predicted values.

  • percentage (bool, optional) – If prediction is a probability, whether to return percentage value or decimal.

Returns:

Mapping of each query id to its predicted value.

Return type:

list

Example

>>> from jai.utilities import predict2df
...
>>> # predicts is the output of a predict method from a supervised model
>>> processed = predict2df(predicts, digits=2, percentage=True)
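
The effect of the digits and percentage options on a probability can be illustrated in isolation. This is a hedged sketch of the rounding rule as the parameter descriptions suggest it, not the body of predict2df: the helper `format_probability` and the sample class probabilities are hypothetical.

```python
def format_probability(value, digits=2, percentage=True):
    """Round a probability, optionally expressing it as a percentage."""
    scaled = value * 100 if percentage else value
    return round(scaled, digits)


# Hypothetical class probabilities for a single prediction.
probabilities = {"class_a": 0.87654, "class_b": 0.12346}
as_percentage = {name: format_probability(p) for name, p in probabilities.items()}
as_decimal = {name: format_probability(p, percentage=False) for name, p in probabilities.items()}
```
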
treat_unix(df_unix_col)

Transform the type of the unix timestamp column to datetime returning a series that replaces the original column.

Parameters:

df_unix_col (pd.DataFrame) – Dataframe with only the unix column.

Returns:

datetime_col – column with its type converted to datetime, which should replace the unix timestamp column.

Return type:

pd.Series
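
The conversion itself can be shown with the standard library. This sketch assumes the timestamps are in seconds and interprets them as UTC; neither detail is stated in the source, and `unix_to_datetime_sketch` is a hypothetical name. On a pandas Series, `pd.to_datetime(column, unit="s")` performs an analogous conversion for the whole column at once.

```python
from datetime import datetime, timezone


def unix_to_datetime_sketch(unix_values):
    """Convert unix timestamps (assumed to be in seconds) to UTC datetimes."""
    return [datetime.fromtimestamp(value, tz=timezone.utc) for value in unix_values]


# Hypothetical column of unix timestamps.
converted = unix_to_datetime_sketch([0, 1_600_000_000])
```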