Processing Results
These are auxiliary functions used internally by Jai's application methods to process outputs.
- filter_resolution(results, threshold=None, return_self=True, res_id='resolution_id')
Process the results of similarity for resolution goals.
Differs from process_similiar in cases where A is similar to B and B is similar to C: the result should then show both A and B as similar to C, and so on for longer chains.
- Parameters:
results (List of Dicts.) – output from similar methods.
threshold (float, optional) – value for the distance threshold. The default is None. If set to None, the auxiliary function find_threshold is used.
return_self (bool, optional) – option to return the queried id from the query result or not. The default is True.
res_id (str, optional) – name of the key for the resolution. The default is “resolution_id”.
- Returns:
connect – List of dicts with each id and its corresponding resolution.
- Return type:
list of dicts
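The chained behaviour described above (A similar to B, B similar to C, so all three share one resolution) can be illustrated with a union-find pass. This is only a minimal sketch, not Jai's implementation: the function name `resolve_groups` is hypothetical, the input layout (`query_id`, plus a `results` list of `id`/`distance` pairs) is an assumption about the output of the similar methods, and picking the smallest id as the group representative is an arbitrary choice made for the example.

```python
def resolve_groups(results, threshold, res_id="resolution_id"):
    """Group ids linked by chains of matches within `threshold` (hypothetical helper)."""
    parent = {}

    def find(x):
        # Return the representative of x's group, registering x if unseen.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption for this sketch: the smaller id represents the group.
            small, large = sorted((root_a, root_b))
            parent[large] = small

    for item in results:
        query = item["query_id"]
        find(query)  # register queries that have no match within the threshold
        for match in item["results"]:
            if match["distance"] <= threshold:
                union(query, match["id"])

    return [{"id": x, res_id: find(x)} for x in sorted(parent)]


# Assumed input layout: 1 is close to 2, and 2 is close to 3.
example_results = [
    {"query_id": 1, "results": [{"id": 1, "distance": 0.0}, {"id": 2, "distance": 0.1}]},
    {"query_id": 2, "results": [{"id": 2, "distance": 0.0}, {"id": 3, "distance": 0.2}]},
    {"query_id": 3, "results": [{"id": 3, "distance": 0.0}, {"id": 4, "distance": 0.9}]},
]
resolved = resolve_groups(example_results, threshold=0.5)
```

Here ids 1, 2, and 3 end up with the same resolution even though 1 and 3 were never directly matched, while id 4 is left out because its only distance exceeds the threshold.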
- filter_similar(results, threshold: float | None = None, return_self: bool = True, skip_null: bool = True)
Process the output from the similar methods.
For each of the inputs, gives back the closest value. If return_self is False, avoids returning cases where ‘id’ is equal to ‘query_id’ and returns the next closest if necessary.
- Parameters:
results (List of Dicts.) – output from similar methods.
threshold (float, optional) – value for the distance threshold. The default is None. If set to None, the auxiliary function find_threshold is used.
return_self (bool, optional) – option to return the queried id from the query result or not. The default is True.
skip_null (bool, optional) – option to skip ids without similar results. If False, returns empty results for those ids. The default is True.
- Raises:
NotImplementedError – If the priority provided is not implemented.
- Returns:
mapping the query id to the similar value.
- Return type:
list
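The selection rule described above (closest value within the threshold, optionally skipping the query's own id) might look like the sketch below. It is an illustration under assumptions rather than the library's code: `closest_match` and the output keys `similar_id` and `distance` are hypothetical names, and the input layout (`query_id` with a `results` list of `id`/`distance` pairs) is assumed.

```python
def closest_match(results, threshold, return_self=True, skip_null=True):
    """Pick the nearest match per query (hypothetical helper, assumed input layout)."""
    output = []
    for item in results:
        query = item["query_id"]
        # Nearest candidates first.
        candidates = sorted(item["results"], key=lambda m: m["distance"])
        chosen = None
        for match in candidates:
            if match["distance"] > threshold:
                break  # everything after this is even farther away
            if not return_self and match["id"] == query:
                continue  # skip the query itself and try the next closest
            chosen = match
            break
        if chosen is not None:
            output.append(
                {"id": query, "similar_id": chosen["id"], "distance": chosen["distance"]}
            )
        elif not skip_null:
            # Keep the query in the output with an empty result.
            output.append({"id": query, "similar_id": None, "distance": None})
    return output


sample_results = [
    {"query_id": 1, "results": [{"id": 1, "distance": 0.0}, {"id": 2, "distance": 0.1}]},
    {"query_id": 2, "results": [{"id": 2, "distance": 0.0}, {"id": 3, "distance": 0.2}]},
    {"query_id": 3, "results": [{"id": 3, "distance": 0.0}, {"id": 4, "distance": 0.9}]},
]
without_self = closest_match(sample_results, threshold=0.5, return_self=False)
with_nulls = closest_match(sample_results, threshold=0.5, return_self=False, skip_null=False)
```

With return_self=False, query 3 has no match within the threshold, so it is dropped when skip_null=True and kept with empty values when skip_null=False.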
- find_threshold(results, sample_size=0.1, quantile=0.05)
Auxiliary function to find a threshold value.
Takes a sample of size sample_size from the results list and uses the given quantile of the sample's distances as the threshold.
This is an automated function; we strongly advise setting the threshold manually to get more accurate results.
- Parameters:
results (list of dicts, output of similar) – output from similar methods.
sample_size (float, optional) – Fraction of the results taken to calculate the threshold. If len(results) is too small, i.e., len(results) * sample_size is less than 1, then sample_size=0.5 or 1 is used instead. The default is 0.1.
quantile (float, optional) – Quantile of the distances of all the query results in the sample taken. We suggest using the similar method with a top_k large enough for the quantile: the total number of distances is len(results) * sample_size * top_k, and a small top_k yields a group of distances that are all close to 0, so the threshold may not be representative. The default is 0.05.
- Returns:
Threshold result.
- Return type:
float
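The sample-then-quantile idea can be sketched with the standard library alone. This is an approximation for illustration, not the actual find_threshold code: `estimate_threshold` and its `seed` argument are hypothetical, the input layout (`results` lists holding `distance` values) is assumed, the quantile uses plain linear interpolation, and the small-sample fallback is simplified to "take at least one item".

```python
import random


def estimate_threshold(results, sample_size=0.1, quantile=0.05, seed=0):
    """Estimate a distance threshold from a random sample (hypothetical helper)."""
    # Simplification: always sample at least one query result.
    n_sample = max(1, int(len(results) * sample_size))
    sample = random.Random(seed).sample(results, n_sample)

    # Pool every distance of every sampled query: about n_sample * top_k values.
    distances = sorted(m["distance"] for item in sample for m in item["results"])

    # Quantile by linear interpolation between the two closest ranks.
    position = quantile * (len(distances) - 1)
    lower = int(position)
    upper = min(lower + 1, len(distances) - 1)
    fraction = position - lower
    return distances[lower] + fraction * (distances[upper] - distances[lower])


# Assumed input layout: 11 queries with one distance each (0.0, 1.0, ..., 10.0).
toy_results = [
    {"query_id": i, "results": [{"id": i, "distance": float(i)}]} for i in range(11)
]
median_threshold = estimate_threshold(toy_results, sample_size=1.0, quantile=0.5)
low_threshold = estimate_threshold(toy_results, sample_size=1.0, quantile=0.05)
```

Because only a fraction of the queries is pooled, a larger top_k gives the quantile more distances to work with, which is the reason for the suggestion in the quantile parameter description.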
- predict2df(predicts, digits: int = 2, percentage: bool = True)
Process the output from the predict methods from supervised models.
- Parameters:
predicts (List of Dicts.) – output from predict methods.
digits (int, optional) – If prediction is a probability, number of digits to round the predicted values. The default is 2.
percentage (bool, optional) – If prediction is a probability, whether to return percentage value or decimal. The default is True.
- Returns:
mapping the query id to the predicted value.
- Return type:
list
Example
>>> from jai.utilities import predict2df
...
>>> # predicts: output from a predict method
>>> processed = predict2df(predicts, digits=2, percentage=True)
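To make the interaction between digits and percentage concrete, the sketch below formats a single probability. It only illustrates the two options as the parameter descriptions state them; `format_probability` is a hypothetical helper and does not reproduce how predict2df handles its input or builds its output.

```python
def format_probability(probability, digits=2, percentage=True):
    """Round a probability, optionally scaling it to a percentage (hypothetical helper)."""
    value = probability * 100 if percentage else probability
    return round(value, digits)


as_percentage = format_probability(0.87654)                 # scaled to 0-100, then rounded
as_decimal = format_probability(0.87654, percentage=False)  # rounded as a 0-1 decimal
one_digit = format_probability(0.87654, digits=1)
```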
- treat_unix(df_unix_col)
Transform the type of the unix timestamp column to datetime returning a series that replaces the original column.
- Parameters:
df_unix_col (pd.DataFrame) – Dataframe with only the unix column.
- Returns:
datime_col – column with the type altered to datetime that should substitute the unix timestamp column.
- Return type:
series
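A conversion of this kind can be sketched with pandas' `pd.to_datetime`. This is a guess at the general approach, not the body of treat_unix: the helper name `unix_to_datetime` is hypothetical, and the timestamps are assumed to be in seconds (the `unit` argument would need changing for milliseconds).

```python
import pandas as pd


def unix_to_datetime(df_unix_col, unit="s"):
    """Convert a one-column DataFrame of unix timestamps to a datetime Series."""
    # The input is expected to hold only the unix column, so take the first one.
    column = df_unix_col.iloc[:, 0]
    # Assumption: timestamps are expressed in `unit` (seconds by default).
    return pd.to_datetime(column, unit=unit)


df = pd.DataFrame({"created_at": [0, 86_400, 1_600_000_000]})
# The returned series replaces the original unix column.
df["created_at"] = unix_to_datetime(df[["created_at"]])
```

Passing `df[["created_at"]]` (double brackets) keeps the input a DataFrame with a single column, matching the parameter description.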