Linear Models#

Basics#

In this section, we’ll show how to use Jai to train a linear model.

[1]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

california_housing = fetch_california_housing(as_frame=True)
df = california_housing['frame']

# The target is the median house value per block group (in units of $100,000)
target = california_housing.target
[2]:
X_train, X_test, y_train, y_test = train_test_split(df, target)

Linear model#

Here is how to train a linear model using the LinearModel module.

We use scikit-learn’s models in the back end, so you can use most of the parameters as described in the scikit-learn documentation.

Tasks:

- regression: LinearRegression
- sgd_regression: SGDRegressor
- classification: LogisticRegression
- sgd_classification: SGDClassifier
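
As a quick illustration of how those task names map onto instances, here is a minimal sketch. The collection names below are arbitrary placeholders; only the task string changes.

from jai import LinearModel

# Each task string selects the corresponding scikit-learn estimator in the back end
reg = LinearModel("my_regression", "regression")                      # LinearRegression
sgd_reg = LinearModel("my_sgd_regression", "sgd_regression")          # SGDRegressor
clf = LinearModel("my_classification", "classification")              # LogisticRegression
sgd_clf = LinearModel("my_sgd_classification", "sgd_classification")  # SGDClassifier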

[3]:
from jai import LinearModel
model = LinearModel("california_linear", "regression")
report = model.fit(X_train, y_train, overwrite=True)
[4]:
# After training, the model is ready to be consumed
model.predict(X_test)
[4]:
predict
id
0 1.308
1 0.885
2 2.193
3 3.427
4 3.042
... ...
5155 3.442
5156 0.665
5157 1.487
5158 2.750
5159 2.329

5160 rows × 1 columns
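
Since predict() returns a DataFrame with a single predict column (as shown above), you can score the held-out set with standard scikit-learn metrics. This is a minimal sketch; it assumes the predictions come back in the same row order as X_test.

from sklearn.metrics import mean_absolute_error, r2_score

preds = model.predict(X_test)

# Compare against the held-out targets, assuming row order matches X_test
y_pred = preds["predict"].to_numpy()
print("MAE:", mean_absolute_error(y_test.to_numpy(), y_pred))
print("R2: ", r2_score(y_test.to_numpy(), y_pred))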

[5]:
# You can improve the model using one new sample
model.learn(X_test.iloc[[0]], y_test.iloc[[0]])
[5]:
{'before': {'MAE': 2.1094237467877974e-14,
  'MSE': 4.44966854351227e-28,
  'MAPE': 1.6127092865350133e-14},
 'after': {'MAE': 2.1094237467877974e-14,
  'MSE': 4.44966854351227e-28,
  'MAPE': 1.6127092865350133e-14},
 'change': True}
[6]:
# Or you can improve the model using multiple new samples
model.learn(X_test.iloc[1:4], y_test.iloc[1:4])
[6]:
{'before': {'MAE': 1.021405182655144e-14,
  'MSE': 1.3995707226796117e-28,
  'MAPE': 8.507385228034694e-15,
  'R2_Score': 1.0},
 'after': {'MAE': 1.021405182655144e-14,
  'MSE': 1.3995707226796117e-28,
  'MAPE': 8.507385228034694e-15,
  'R2_Score': 1.0},
 'change': True}
[7]:
model.predict(X_test)
[7]:
predict
id
0 1.308
1 0.885
2 2.193
3 3.427
4 3.042
... ...
5155 3.442
5156 0.665
5157 1.487
5158 2.750
5159 2.329

5160 rows × 1 columns

Pretrained Bases application#

In Jai, you can reuse previously trained collections for new tasks.

So if we have trained a model before, we can now build a linear model using the vectors from that collection.

[8]:
from jai import Trainer

# Let's create a collection using only the features of the dataset.
trainer = Trainer("pretrained_california")

trainer.set_parameters(db_type="SelfSupervised")

Recognized fit arguments:
- db_type: SelfSupervised
[9]:
query = trainer.fit(X_train, overwrite=True)
Insert Data: 100%|██████████| 1/1 [00:00<00:00,  1.73it/s]

Recognized fit arguments:
- db_type: SelfSupervised
JAI is working: 100%|██████████|20/20 [02:22]

Setup Report:

Best model at epoch: 63 val_loss: 0.71

Now we’ll build a linear model using that collection.

We’ll first build a mapping to train the new linear model.

In this case, we’ll use only one feature, id_california, which holds the id values from the previous collection. Each id corresponds to a vector stored in Jai, and those vectors are what the linear model is trained on.

[10]:
import pandas as pd
df_train = pd.DataFrame(X_train.index, index=X_train.index, columns=["id_california"])
[11]:
model = LinearModel("california_pretrained", "regression")
report = model.fit(
    df_train,
    y_train,
    pretrained_bases=[{"id_name": "id_california", "db_parent": "pretrained_california"}],
    overwrite=True,
)
[12]:
# You can add the new data to the previous collection
trainer.append(X_test)
Insert Data: 100%|██████████| 1/1 [00:00<00:00,  3.75it/s]
JAI is working: 100%|██████████|20/20 [00:02]
[12]:
({0: {'Task': 'Adding new data for tabular setup',
   'Status': 'Completed',
   'Description': 'Insertion completed.',
   'Interrupted': False}},
 {'Task': 'Adding new data to database',
  'Status': 'Running',
  'Description': 'Task started to run now',
  'Interrupted': False})
[13]:
# And map the ids to make the prediction
df_test = pd.DataFrame(X_test.index, index=X_test.index, columns=["id_california"])
model.predict(df_test)
[13]:
predict
id
0 1.386100
1 0.785691
2 2.185110
3 3.764664
4 3.319453
... ...
5155 4.088492
5156 0.703170
5157 1.311731
5158 2.087266
5159 2.143795

5160 rows × 1 columns

[14]:
# Or, if you don't want to modify the previous collection, you can consume the model using the original raw data
model.predict(X_test)
[14]:
predict
id
0 1.386100
1 0.785691
2 2.185110
3 3.764664
4 3.319453
... ...
5155 4.088492
5156 0.703170
5157 1.311731
5158 2.087266
5159 2.143795

5160 rows × 1 columns
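
To wrap up, you can compare the model trained on the raw features with the one built on the pretrained collection, using the same held-out data. This is a minimal sketch with scikit-learn metrics; it assumes a LinearModel instance pointing at an already-trained collection can be used for predict() without refitting, and that predictions come back in the same row order as X_test.

from jai import LinearModel
from sklearn.metrics import mean_absolute_error

# Reconnect to the model trained directly on the raw features
# (assumption: predict() works on an existing collection without calling fit again)
plain_model = LinearModel("california_linear", "regression")
plain_preds = plain_model.predict(X_test)["predict"].to_numpy()

# `model` is the pretrained-base linear model from the cells above
pretrained_preds = model.predict(X_test)["predict"].to_numpy()

y_true = y_test.to_numpy()
print("Raw-feature model MAE:    ", mean_absolute_error(y_true, plain_preds))
print("Pretrained-base model MAE:", mean_absolute_error(y_true, pretrained_preds))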