Smartphone Similarity Search#
What are we going to do?#
In this quick demo, we will use JAI to:
Train and deploy models into a secure and scalable production-ready environment.
Complete a similarity search: given a database of smartphone models and their attributes, identify which smartphones are most similar to a target model.
Importing libraries#
[6]:
import pandas as pd
from jai import Jai
JAI Auth Key#
If you don’t already have an auth key, you can get one here, free forever. If the key doesn’t show up in your inbox, check your spam folder.
[ ]:
from jai import get_auth_key
get_auth_key(email = 'email@emailnet.com', firstName = 'JAI', lastName = 'Z')
<Response [201]>
Dataset quick look#
This dataset contains the specs of different smartphone models, such as dimensions, display type, model, and brand. In this example, we’re going to identify which smartphones are most similar to one another based on their specs.
[7]:
# Obtaining the data from website
file_url = 'https://myceliademo.blob.core.windows.net/smartphone-dataset/smartphones_sample_db.csv?sp=rl&st=2021-05-17T16:30:09Z&se=2025-01-18T16:30:00Z&sv=2020-02-10&sr=b&sig=6LeB2OPLM33LXiPaaQ6LvO00mt0MMfgczKtt92AJvMU%3D'
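# The first CSV column becomes the dataframe index, which provides the ids used in the searches below
df = pd.read_csv(file_url, index_col=0)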
[8]:
# Show name of columns and non-null count
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 4439 to 2276
Data columns (total 37 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 objectId 1000 non-null object
1 Model 999 non-null object
2 Brand 999 non-null object
3 Network 999 non-null object
4 TwoG 970 non-null object
5 ThreeG 571 non-null object
6 FourG 199 non-null object
7 Network_Speed 574 non-null object
8 GPRS 996 non-null object
9 EDGE 997 non-null object
10 Announced 998 non-null object
11 Status 999 non-null object
12 Dimensions 998 non-null object
13 field13 69 non-null object
14 SIM 999 non-null object
15 Display_type 999 non-null object
16 Display_resolution 874 non-null object
17 Display_size 995 non-null object
18 Operating_System 569 non-null object
19 CPU 561 non-null object
20 Chipset 419 non-null object
21 GPU 409 non-null object
22 Memory_card 999 non-null object
23 Internal_memory 791 non-null object
24 RAM 543 non-null object
25 Primary_camera 861 non-null object
26 Secondary_camera 859 non-null object
27 Loud_speaker 999 non-null object
28 Audio_jack 996 non-null object
29 WLAN 998 non-null object
30 Bluetooth 999 non-null object
31 GPS 998 non-null object
32 NFC 101 non-null object
33 Radio 984 non-null object
34 USB 902 non-null object
35 Sensors 551 non-null object
36 Battery 999 non-null object
dtypes: object(37)
memory usage: 296.9+ KB
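The non-null counts above show that several columns are mostly empty (for example, field13 has 69 non-null values out of 1000). The following is a minimal sketch, not part of the original notebook, for ranking columns by their share of missing values. The helper name `missing_share` and the small `sample` frame are assumptions made for illustration; `sample` is a stand-in for `df` so the block runs without downloading anything.

```python
import pandas as pd


def missing_share(frame: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing values per column, largest first."""
    return frame.isna().mean().sort_values(ascending=False)


# Hypothetical stand-in for the smartphone dataframe (illustrative values only)
sample = pd.DataFrame(
    {
        "Model": ["_M32", "_Galaxy J3 Pro", "_KP501 Cookie", "_J501"],
        "NFC": [None, "Yes", None, None],
        "Sensors": [None, "Accelerometer", "Accelerometer", None],
    }
)

shares = missing_share(sample)
print(shares)
```

On the real data, `missing_share(df)` would give the same ranking for all 37 columns, which can help when deciding whether sparse columns deserve a closer look.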
[9]:
# Showing first 5 rows of dataframe
df.head()
[9]:
index | objectId | Model | Brand | Network | TwoG | ThreeG | FourG | Network_Speed | GPRS | EDGE | ... | Loud_speaker | Audio_jack | WLAN | Bluetooth | GPS | NFC | Radio | USB | Sensors | Battery |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4439 | 9X9xdNsAcI | _M32 | Bird | GSM | GSM 900 / 1800 | NaN | NaN | NaN | Class 10 | No | ... | Yes | No | No | No | No | NaN | FM radio (optional) | Yes | NaN | Removable Li-Ion 600 mAh battery |
2922 | uWQacAjFLE | _Galaxy J3 Pro | Samsung | GSM / CDMA / HSPA / EVDO / LTE | GSM 850 / 900 / 1800 / 1900 - SIM 1 & SIM 2 | HSDPA 850 / 900 / 1900 / 2100 | LTE band 1(2100)| 3(1800)| 7(2600)| 41(2500) | HSPA 42.2/5.76 Mbps LTE Cat4 150/50 Mbps | Yes | Yes | ... | Yes | Yes | Wi-Fi 802.11 b/g/n| Wi-Fi Direct| hotspot | 4.1| A2DP | Yes with A-GPS GLONASS BDS | Yes | FM radio| RDS| recording | microUSB 2.0| USB On-The-Go | Accelerometer| proximity | Removable Li-Ion 2600 mAh battery |
4071 | D60tQhWyLw | _KP501 Cookie | LG | GSM | GSM 850 / 900 / 1800 / 1900 | NaN | NaN | NaN | Class 10 | Class 10 | ... | Yes | No | No | 2.1| A2DP | No | NaN | Stereo FM radio| RDS | 2 | Accelerometer | Removable Li-Ion 900 mAh battery |
3689 | JgAmFNzZGT | _Iconic Phablet | ZTE | CDMA / EVDO / LTE | CDMA 800 / 1900 | CDMA2000 1xEV-DO | LTE | EV-DO Rev.A 3.1 Mbps LTE | Yes | Yes | ... | Yes | Yes | Wi-Fi 802.11 b/g/n| hotspot | 4.0| A2DP | Yes with A-GPS | NaN | To be confirmed | microUSB 2.0 | Accelerometer| proximity | Non-removable Li-Ion 3200 mAh battery |
3882 | taHMJkJclc | _J501 | Asus | GSM | GSM 900 / 1800 / 1900 | NaN | NaN | NaN | Class 12 | No | ... | No | No | No | 1.2| A2DP | No | NaN | FM radio | 1.1 | NaN | Removable Li-Ion 750 mAh battery |
5 rows × 37 columns
Search by product attribute#
Now we’re going to run a similarity search by product attribute. To do so, we first need to insert the smartphone dataframe into Jai. Data is sent with j.setup (or j.fit; they are the same). Once the collection is set up, it can be queried through the methods j.similar and j.predict.
[10]:
# Instantiate Jai class
j = Jai()
# Using jai setup with self-supervised learning
# The raw data is inserted into jai, which turns the data into a latent vector collection that is then stored in a database
j.setup(
    name='smartphones',
    verbose=2,
    data=df,
    db_type='SelfSupervised',
    overwrite=True
)
Insert Data: 100%|██████████| 1/1 [00:00<00:00, 3.08it/s]
Training might finish early due to early stopping criteria.
Recognized setup args:
- db_type: SelfSupervised
JAI is working: 100%|██████████|20/20 [00:10]

Setup Report:
Best model at epoch: 19 val_loss: 0.07
[10]:
({0: {'Task': 'Adding new data for tabular setup',
'Status': 'Completed',
'Description': 'Insertion completed.',
'Interrupted': False}},
{'Task': 'Training Model',
'Status': 'Job Created',
'Description': 'Check status after some time!',
'kwargs': {'db_type': '"SelfSupervised"'}})
Checking the collections available on Jai#
[11]:
# List the databases that have already been created with this auth key
j.names
[11]:
['california_housing', 'census', 'smartphones']
[12]:
# Get the name, type and other information about each created database
j.info
[12]:
index | name | type | last modified | dependencies | size | embedding_dimension |
---|---|---|---|---|---|---|
2 | california_housing | Supervised | 2022-06-20-14h39 | [] | 20640 | 64 |
1 | census | Supervised | 2022-06-20-14h41 | [] | 32561 | 64 |
0 | smartphones | SelfSupervised | 2022-06-20-14h43 | [] | 1000 | 64 |
Search similar smartphones - by all attributes#
Once the data is inserted into Jai, we can run a similarity search to find the smartphones most similar to the one we want. The search is based on the distance between the vector representations of the smartphones: the smaller the distance between two smartphones, the more similar they are, so the most similar smartphone to a given one is its nearest neighbor in that vector space.
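To make the idea of distance-based search concrete, here is a minimal sketch of a brute-force nearest-neighbor lookup. It is not how Jai is implemented: the source does not state which metric or index Jai uses, so Euclidean distance and a full scan are assumptions. The function `top_k_similar` and the 2-dimensional `toy_vectors` are hypothetical (the collections listed by j.info report 64 dimensions).

```python
import heapq
import math


def top_k_similar(vectors, query_id, top_k=5):
    """Return the top_k ids closest to query_id by Euclidean distance.

    `vectors` maps an id to its embedding (a sequence of floats).
    The query itself is part of the result, at distance 0.0.
    """
    query = vectors[query_id]
    scored = ((math.dist(query, vec), item_id) for item_id, vec in vectors.items())
    nearest = heapq.nsmallest(top_k, scored)
    return [{"id": item_id, "distance": dist} for dist, item_id in nearest]


# Toy 2-dimensional embeddings (hypothetical values, chosen for easy arithmetic)
toy_vectors = {
    856: (0.0, 0.0),
    1978: (0.3, 0.4),
    7040: (0.6, 0.8),
    1597: (3.0, 4.0),
}

neighbours = top_k_similar(toy_vectors, 856, top_k=3)
print(neighbours)
```

The output has the same shape as one entry of a j.similar result: a list of ids with their distances, starting with the query itself.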
[13]:
# Run a similarity search for a specific id, taken from the index of the initial pandas dataframe
# Smaller distances between smartphones mean that they are more similar
# The distance of 0.0 is obtained when comparing the smartphone to itself
results = j.similar('smartphones', [856], top_k=5)
results
Similar: 100%|██████████| 1/1 [00:00<00:00, 2.85it/s]
[13]:
[{'query_id': 856,
'results': [{'id': 856, 'distance': 0.0},
{'id': 1978, 'distance': 0.6144497394561768},
{'id': 7040, 'distance': 0.7259373664855957},
{'id': 1597, 'distance': 0.7516287565231323},
{'id': 6476, 'distance': 0.7594107985496521}]}]
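Because the query id always comes back as its own closest match, it is often convenient to drop it before using the results. The sketch below is one way to do that on the structure shown above; the helper name `neighbours_without_self` and the variable `sample_results` are assumptions for illustration, and the data is copied from the output of the previous cell.

```python
def neighbours_without_self(similar_output):
    """Map each query_id to its (id, distance) pairs, skipping the query itself."""
    cleaned = {}
    for entry in similar_output:
        query_id = entry["query_id"]
        cleaned[query_id] = [
            (hit["id"], hit["distance"])
            for hit in entry["results"]
            if hit["id"] != query_id
        ]
    return cleaned


# Output copied from the similarity search above
sample_results = [{'query_id': 856,
                   'results': [{'id': 856, 'distance': 0.0},
                               {'id': 1978, 'distance': 0.6144497394561768},
                               {'id': 7040, 'distance': 0.7259373664855957},
                               {'id': 1597, 'distance': 0.7516287565231323},
                               {'id': 6476, 'distance': 0.7594107985496521}]}]

others = neighbours_without_self(sample_results)[856]
print(others)
```

Note that with top_k=5 this leaves four neighbors per query, so a caller who wants five other smartphones would request top_k=6.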
[14]:
# We can also do it on the whole dataframe at once
# By doing this, we find the top 5 most similar smartphones to each smartphone in the initial dataframe
res = j.similar('smartphones', df.index, top_k=5)
# The result is a list of dictionaries, one per query id, each holding the ids and distances of its top 5 results
res[0]
Similar: 100%|██████████| 1/1 [00:01<00:00, 1.84s/it]
[14]:
{'query_id': 4439,
'results': [{'id': 4439, 'distance': 0.0},
{'id': 4490, 'distance': 0.5862273573875427},
{'id': 8627, 'distance': 0.6433966159820557},
{'id': 1483, 'distance': 0.7437235713005066},
{'id': 6375, 'distance': 0.7853354215621948}]}
[15]:
# Now we use the ids found in the last step to locate the most similar smartphones in the initial dataframe
df.loc[pd.DataFrame(res[0]['results'])['id']]
[15]:
index | objectId | Model | Brand | Network | TwoG | ThreeG | FourG | Network_Speed | GPRS | EDGE | ... | Loud_speaker | Audio_jack | WLAN | Bluetooth | GPS | NFC | Radio | USB | Sensors | Battery |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4439 | 9X9xdNsAcI | _M32 | Bird | GSM | GSM 900 / 1800 | NaN | NaN | NaN | Class 10 | No | ... | Yes | No | No | No | No | NaN | FM radio (optional) | Yes | NaN | Removable Li-Ion 600 mAh battery |
4490 | vxtdVQdFMg | _M580 | BenQ | GSM | GSM 900 / 1800 / 1900 | NaN | NaN | NaN | Class 10 | No | ... | No | No | No | 1.2 | No | NaN | No | No | NaN | Removable Li-Ion 870 mAh battery |
8627 | DJy3DEvEue | _Zum | Parla | GSM | GSM 850 / 900 / 1800 / 1900 | NaN | NaN | NaN | No | No | ... | Yes | No | No | No | No | NaN | Stereo FM radio | miniUSB | NaN | Removable Li-Ion 700 mAh battery |
1483 | 7VWSVUS0lt | _C30 | BenQ | GSM | GSM 900 / 1800 | NaN | NaN | NaN | Class 12 | No | ... | Yes | No | No | No | No | NaN | FM radio | 1.1 | NaN | Removable Li-Ion 650 mAh battery |
6375 | U5MelmKcdH | _S1600 | Kyocera | GSM | GSM 850 / 1900 | NaN | NaN | NaN | Class 10 | No | ... | Yes | No | No | No | No | NaN | No | microUSB | NaN | Removable Li-Ion 650 mAh battery |
5 rows × 37 columns
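The lookup above returns the matching rows but leaves out the distances. A minimal sketch of one way to keep both together follows; the helper `with_distances` and the reduced `phones` frame are assumptions for illustration (only the Model and Brand columns are reproduced, plus one unrelated row), while `query_result` is copied from the res[0] output shown earlier.

```python
import pandas as pd


def with_distances(frame, query_result):
    """Return the matched rows of `frame` with a leading `distance` column."""
    hits = pd.DataFrame(query_result["results"]).set_index("id")
    # .loc with a list of labels keeps the order of the search results
    matched = frame.loc[hits.index].copy()
    matched.insert(0, "distance", hits["distance"].to_numpy())
    return matched


# Reduced stand-in for the smartphone dataframe (Model and Brand only)
phones = pd.DataFrame(
    {
        "Model": ["_M32", "_M580", "_Zum", "_C30", "_S1600", "_Galaxy J3 Pro"],
        "Brand": ["Bird", "BenQ", "Parla", "BenQ", "Kyocera", "Samsung"],
    },
    index=[4439, 4490, 8627, 1483, 6375, 2922],
)

# Copied from the res[0] output above
query_result = {'query_id': 4439,
                'results': [{'id': 4439, 'distance': 0.0},
                            {'id': 4490, 'distance': 0.5862273573875427},
                            {'id': 8627, 'distance': 0.6433966159820557},
                            {'id': 1483, 'distance': 0.7437235713005066},
                            {'id': 6375, 'distance': 0.7853354215621948}]}

ranked = with_distances(phones, query_result)
print(ranked)
```

On the full data the call would be `with_distances(df, res[0])`, which gives the same five rows as the cell above with the distance shown first.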
Requests via REST API#
[ ]:
import requests
[ ]:
url = 'https://mycelia.azure-api.net/similar/id/smartphones?id=841&top_k=5'
auth_header = {'Auth': "INSERT_YOUR_AUTH_KEY_HERE"}
[ ]:
response = requests.get(url, headers=auth_header)
response.json()
{'similarity': [{'query_id': 841,
'results': [{'distance': 0.0, 'id': 841},
{'distance': 0.5942456126213074, 'id': 856},
{'distance': 0.7880990505218506, 'id': 3885},
{'distance': 0.8442420959472656, 'id': 1597},
{'distance': 0.895087480545044, 'id': 2531}]}]}
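The request URL and the response shape above can be handled with two small helpers. This is a sketch under stated assumptions: the base URL, the id and top_k query parameters, and the 'similarity' key are taken from the cells above, while the function names `build_similar_url` and `extract_ids` are hypothetical. Whether the endpoint accepts any other parameters is not covered here.

```python
from urllib.parse import urlencode

# Base URL taken from the request shown above
BASE_URL = "https://mycelia.azure-api.net/similar/id"


def build_similar_url(collection, item_id, top_k=5):
    """Build the similarity-by-id URL for one item of a collection."""
    query = urlencode({"id": item_id, "top_k": top_k})
    return f"{BASE_URL}/{collection}?{query}"


def extract_ids(payload):
    """Return {query_id: [neighbour ids]} from a response shaped like the one above."""
    return {
        entry["query_id"]: [hit["id"] for hit in entry["results"]]
        for entry in payload["similarity"]
    }


# Response copied from the output above
payload = {'similarity': [{'query_id': 841,
                           'results': [{'distance': 0.0, 'id': 841},
                                       {'distance': 0.5942456126213074, 'id': 856},
                                       {'distance': 0.7880990505218506, 'id': 3885},
                                       {'distance': 0.8442420959472656, 'id': 1597},
                                       {'distance': 0.895087480545044, 'id': 2531}]}]}

url = build_similar_url("smartphones", 841)
ids_by_query = extract_ids(payload)
print(url)
print(ids_by_query)
```

The resulting URL can be passed to requests.get(url, headers=auth_header) as in the cells above. Calling response.raise_for_status() before response.json() makes HTTP errors, such as a rejected auth key, fail loudly instead of surfacing later as a missing key in the payload.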