Skip to main content
Ctrl+K
Logo image
  • What is JAI-SDK?

An overview of JAI-SDK

  • Getting Started
  • JAI in 5 minutes
    • How to configure your auth key
    • Flower classification with Iris dataset
    • Credit Card Fraud Detection
  • Data Cleaning with JAI
  • What’s New
    • Version 0.20.1
    • Version 0.20.0

Basic User Guide

  • Using Jai
    • The Fit Method
    • Getting Fit Reports
    • Working with your environment
    • Removing data
  • Consuming Collections
    • Similarity Seach
    • Model Inference
  • Examples
    • Smartphone - Similarity Search
    • California Housing - Quantile Regression
    • US Census - Classification
  • Task Module
    • Explorer Module
    • Trainer Module
    • Query Module
    • Vectors Module
    • Linear Models
  • Task Examples
    • Smartphone - Similarity Search
    • California Housing - Quantile Regression
    • US Census - Classification

Advanced User Guide

  • Creating a new collection with filters
  • Training Linear Models
  • Recommendation Systems

API and Package Reference

  • JAI Python Class
  • Fit kwargs (keyword arguments)
  • Utilities Functions
    • Authentication
    • Splitting Data
    • Image Functions
    • Processing Results
  • Task Module Reference
    • Explorer Class
    • Trainer Class
    • Query Class
    • Vectors Class
    • Linear Model Class
  • Repository
  • Suggest edit
  • Open issue
  • .ipynb

Smartphone Similarity Search

Contents

  • What are we going to do?
  • Importing libraries
  • Dataset quick look
  • Search by product attribute
  • Checking the collections available on Jai
  • Search similar smartphones - by all attributes

Smartphone Similarity Search#

What are we going to do?#

Note: This is a replication of the Smartphone Similarity Search example using the Task Module.

In this quick demo, we will use JAI to:

  • Train and deploy models into a secure and scalable production-ready environment.

  • Complete a similarity search - Given a database of smartphone models and its attributes, identify which smartphones are most similar to the target.


Importing libraries#

[1]:
import pandas as pd
from jai import Trainer

Dataset quick look#

This dataset contains data related to the specs of different models of smartphones. Some of the analyzed specs are, for example, dimensions, display type, model and brand. In this example, we’re going to identify which cellphones are most similar to one another based on their sspecs.

[2]:
# Obtaining the data from website
file_url = 'https://myceliademo.blob.core.windows.net/smartphone-dataset/smartphones_sample_db.csv?sp=rl&st=2021-05-17T16:30:09Z&se=2025-01-18T16:30:00Z&sv=2020-02-10&sr=b&sig=6LeB2OPLM33LXiPaaQ6LvO00mt0MMfgczKtt92AJvMU%3D'
df = pd.read_csv(file_url,index_col=0)
[3]:
# Show name of columns and non-null count
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 4439 to 2276
Data columns (total 37 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   objectId            1000 non-null   object
 1   Model               999 non-null    object
 2   Brand               999 non-null    object
 3   Network             999 non-null    object
 4   TwoG                970 non-null    object
 5   ThreeG              571 non-null    object
 6   FourG               199 non-null    object
 7   Network_Speed       574 non-null    object
 8   GPRS                996 non-null    object
 9   EDGE                997 non-null    object
 10  Announced           998 non-null    object
 11  Status              999 non-null    object
 12  Dimensions          998 non-null    object
 13  field13             69 non-null     object
 14  SIM                 999 non-null    object
 15  Display_type        999 non-null    object
 16  Display_resolution  874 non-null    object
 17  Display_size        995 non-null    object
 18  Operating_System    569 non-null    object
 19  CPU                 561 non-null    object
 20  Chipset             419 non-null    object
 21  GPU                 409 non-null    object
 22  Memory_card         999 non-null    object
 23  Internal_memory     791 non-null    object
 24  RAM                 543 non-null    object
 25  Primary_camera      861 non-null    object
 26  Secondary_camera    859 non-null    object
 27  Loud_speaker        999 non-null    object
 28  Audio_jack          996 non-null    object
 29  WLAN                998 non-null    object
 30  Bluetooth           999 non-null    object
 31  GPS                 998 non-null    object
 32  NFC                 101 non-null    object
 33  Radio               984 non-null    object
 34  USB                 902 non-null    object
 35  Sensors             551 non-null    object
 36  Battery             999 non-null    object
dtypes: object(37)
memory usage: 296.9+ KB
[4]:
# Showing first 5 rows of dataframe
df.head()
[4]:
objectId Model Brand Network TwoG ThreeG FourG Network_Speed GPRS EDGE ... Loud_speaker Audio_jack WLAN Bluetooth GPS NFC Radio USB Sensors Battery
4439 9X9xdNsAcI _M32 Bird GSM GSM 900 / 1800 NaN NaN NaN Class 10 No ... Yes No No No No NaN FM radio (optional) Yes NaN Removable Li-Ion 600 mAh battery
2922 uWQacAjFLE _Galaxy J3 Pro Samsung GSM / CDMA / HSPA / EVDO / LTE GSM 850 / 900 / 1800 / 1900 - SIM 1 & SIM 2 HSDPA 850 / 900 / 1900 / 2100 LTE band 1(2100)| 3(1800)| 7(2600)| 41(2500) HSPA 42.2/5.76 Mbps LTE Cat4 150/50 Mbps Yes Yes ... Yes Yes Wi-Fi 802.11 b/g/n| Wi-Fi Direct| hotspot 4.1| A2DP Yes with A-GPS GLONASS BDS Yes FM radio| RDS| recording microUSB 2.0| USB On-The-Go Accelerometer| proximity Removable Li-Ion 2600 mAh battery
4071 D60tQhWyLw _KP501 Cookie LG GSM GSM 850 / 900 / 1800 / 1900 NaN NaN NaN Class 10 Class 10 ... Yes No No 2.1| A2DP No NaN Stereo FM radio| RDS 2 Accelerometer Removable Li-Ion 900 mAh battery
3689 JgAmFNzZGT _Iconic Phablet ZTE CDMA / EVDO / LTE CDMA 800 / 1900 CDMA2000 1xEV-DO LTE EV-DO Rev.A 3.1 Mbps LTE Yes Yes ... Yes Yes Wi-Fi 802.11 b/g/n| hotspot 4.0| A2DP Yes with A-GPS NaN To be confirmed microUSB 2.0 Accelerometer| proximity Non-removable Li-Ion 3200 mAh battery
3882 taHMJkJclc _J501 Asus GSM GSM 900 / 1800 / 1900 NaN NaN NaN Class 12 No ... No No No 1.2| A2DP No NaN FM radio 1.1 NaN Removable Li-Ion 750 mAh battery

5 rows × 37 columns


Search by product attribute#

Now we’re going to execute a similarity search by product attribute. To do so, we first need to insert the smartphone dataframe into jai. In this example, we’ll use the Trainer module to send data to Jai. We’ll use the trainer.fit method to created the collection, which can then be consumed through the Query Module.

[5]:
# Instantiate Jai class
trainer = Trainer(
    name = 'smartphones',
    verbose = 2
)

# Define the collection parameters
trainer.set_parameters(
    db_type = 'SelfSupervised'
)

# Using jai setup with self-supervised learning
# The raw data is inserted into jai, which turns the data into a latent vector collection that is then stored in a database
query = trainer.fit(
    data = df,
    overwrite = True
)

Recognized fit arguments:
- db_type: SelfSupervised
Insert Data: 100%|██████████| 1/1 [00:00<00:00,  4.04it/s]

Recognized fit arguments:
- db_type: SelfSupervised
JAI is working: 100%|██████████|20/20 [00:15]
../../_images/source_task_examples_Smartphone_Demo_8_4.png

Setup Report:

Best model at epoch: 32 val_loss: 0.07

Checking the collections available on Jai#

You can use Explorer Module to get an overview of your Jai’s work environment.

[6]:
from jai import Explorer
explorer = Explorer()
[7]:
# Verifying databases that have already been created for the used AuthKey
explorer.names
[7]:
['california_housing', 'smartphones']
[8]:
explorer.info()
# Gets name, type and other information about each created database
[8]:
name type last modified dependencies size embedding_dimension
1 california_housing Supervised 2022-06-27-15h03 [] 20640 64
0 smartphones SelfSupervised 2022-06-27-15h05 [] 1000 64

Search similar smartphones - by all attributes#

Once the data is inserted into Jai, we can execute a similarity search to find smartphones that are most similar to the one we want. The search is based on the distance between the vector representations of the smartphones, meaning that a smartphone will be considered to be the most similar to another if the distance between them is smaller than the distance between that smartphone and any other one.

[9]:
# We can execute a similarity search on a specific index, which is the index of the initial pandas dataframe
results = query.similar([856], top_k=5)
results
# Smaller distances between smartphones mean that they are more similar
# The distance of 0.0 is obtained when comparing the smartphone to itself
Similar: 100%|██████████| 1/1 [00:00<00:00, 10.53it/s]
[9]:
[{'query_id': 856,
  'results': [{'id': 856, 'distance': 0.0},
   {'id': 841, 'distance': 0.31705671548843384},
   {'id': 7061, 'distance': 0.6069900989532471},
   {'id': 3739, 'distance': 0.7463890314102173},
   {'id': 2283, 'distance': 0.7998510003089905}]}]
[10]:
# We can also do it on the whole dataframe at once
res = query.similar(df.index, top_k = 5)
# By doing this, we are finding the top 5 most similar smartphones to each smartphone on the initial dataframe
# Result is a list of dictionaries, where each dictionary shows the id of the top 5 results
res[0]
Similar: 100%|██████████| 1/1 [00:00<00:00,  2.77it/s]
[10]:
{'query_id': 4439,
 'results': [{'id': 4439, 'distance': 0.0},
  {'id': 3298, 'distance': 0.7135592699050903},
  {'id': 5684, 'distance': 0.8416876196861267},
  {'id': 8623, 'distance': 0.883758544921875},
  {'id': 6787, 'distance': 0.9160052537918091}]}
[11]:
# Now, we use the IDs we found on the last step to locate the most similar smartphones on the initial dataframe
df.loc[pd.DataFrame(res[0]['results']).id]
[11]:
objectId Model Brand Network TwoG ThreeG FourG Network_Speed GPRS EDGE ... Loud_speaker Audio_jack WLAN Bluetooth GPS NFC Radio USB Sensors Battery
4439 9X9xdNsAcI _M32 Bird GSM GSM 900 / 1800 NaN NaN NaN Class 10 No ... Yes No No No No NaN FM radio (optional) Yes NaN Removable Li-Ion 600 mAh battery
3298 1h3warRdRh _Grand Energy BLU GSM / HSPA GSM 850 / 900 / 1800 / 1900 - SIM 1 & SIM 2 HSDPA 850 / 1700(AWS) / 1900 / 2100 - G130U NaN HSPA 21.1/5.76 Mbps Yes Yes ... Yes Yes Wi-Fi 802.11 b/g/n| hotspot 4.0| A2DP Yes with A-GPS NaN NaN microUSB 2.0 Accelerometer| proximity Non-removable Li-Po 4000 mAh battery
5684 WIIDgYKAHo _OT-V212 alcatel GSM GSM 900 / 1800 NaN NaN NaN No No ... Yes No No No No NaN NaN miniUSB NaN Removable Li-Ion 670 mAh battery
8623 QH4GMDv4hA _Zum Bianco Parla GSM GSM 850 / 900 / 1800 / 1900 - SIM 1 & SIM 2 NaN NaN NaN No No ... Yes No No 2 No NaN Stereo FM radio miniUSB NaN Removable Li-Ion 600 mAh battery
6787 CtiDBKiMY9 _Snap Plum GSM GSM 850 / 900 / 1800 / 1900 - SIM 1 NaN NaN NaN Class 12 Class 6 ... Yes No No 2.1| A2DP| EDR No NaN Stereo FM radio| recording microUSB NaN Removable Li-Ion 650 mAh battery (BL-4C)

5 rows × 37 columns

previous

Task Examples

next

California Housing Regression

Contents
  • What are we going to do?
  • Importing libraries
  • Dataset quick look
  • Search by product attribute
  • Checking the collections available on Jai
  • Search similar smartphones - by all attributes

By Jai Dev Team

© Copyright 2022, Jai.