# Linear Models

## Basics

In this section, we’ll show how to use Jai to train a linear model.

[1]:

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

california_housing = fetch_california_housing(as_frame=True)
df = california_housing['frame']

# target is true median value of house per block group
target = california_housing.target

[2]:

X_train, X_test, y_train, y_test = train_test_split(df, target)


## Linear model

Here is how to train a linear model using the LinearModel module.

Jai uses scikit-learn's models in the back end, so most of the parameters work as described in the scikit-learn documentation.

Supported tasks:

- regression: LinearRegression
- sgd_regression: SGDRegressor
- classification: LogisticRegression
- sgd_classification: SGDClassifier
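The task-to-estimator correspondence above can be sketched as a plain dictionary. This is illustrative only: it assumes the four task names wrap the named scikit-learn classes, while the actual backend wiring is internal to Jai.

```python
# Illustrative sketch: how the Jai task names line up with
# scikit-learn estimator classes (not Jai's actual internals).
from sklearn.linear_model import (
    LinearRegression,
    LogisticRegression,
    SGDClassifier,
    SGDRegressor,
)

TASK_TO_ESTIMATOR = {
    "regression": LinearRegression,
    "sgd_regression": SGDRegressor,
    "classification": LogisticRegression,
    "sgd_classification": SGDClassifier,
}

# Selecting a task yields the corresponding estimator class:
estimator = TASK_TO_ESTIMATOR["regression"]()
```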

[3]:

from jai import LinearModel
model = LinearModel("california_linear", "regression")
report = model.fit(X_train, y_train, overwrite=True)

[4]:

# After training, the model is ready to be consumed
model.predict(X_test)

[4]:

predict
id
0 1.308
1 0.885
2 2.193
3 3.427
4 3.042
... ...
5155 3.442
5156 0.665
5157 1.487
5158 2.750
5159 2.329

5160 rows × 1 columns
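If you also hold the true targets locally, the returned predictions can be scored with scikit-learn's metrics. A minimal sketch, where `preds` stands in for the DataFrame returned by `model.predict` (a single `predict` column indexed by id) and the values are made up for illustration:

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score

# Stand-in for model.predict(X_test): a "predict" column indexed by id.
preds = pd.DataFrame({"predict": [1.308, 0.885, 2.193]}, index=[0, 1, 2])
# Stand-in for y_test, aligned on the same index.
y_true = pd.Series([1.3, 0.9, 2.2], index=[0, 1, 2])

mae = mean_absolute_error(y_true, preds["predict"])
r2 = r2_score(y_true, preds["predict"])
```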

[5]:

# You can improve the model using one new sample
model.learn(X_test.iloc[[0]], y_test.iloc[[0]])

[5]:

{'before': {'MAE': 2.1094237467877974e-14,
'MSE': 4.44966854351227e-28,
'MAPE': 1.6127092865350133e-14},
'after': {'MAE': 2.1094237467877974e-14,
'MSE': 4.44966854351227e-28,
'MAPE': 1.6127092865350133e-14},
'change': True}
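The report returned by `learn` can also be inspected programmatically, for instance to check whether the incremental update actually reduced the error. A small sketch, assuming the before/after structure shown above (metric values abbreviated):

```python
# Stand-in for the dict returned by model.learn(...), with the
# before/after structure shown above and abbreviated values.
report = {
    "before": {"MAE": 2.1e-14, "MSE": 4.4e-28, "MAPE": 1.6e-14},
    "after": {"MAE": 2.1e-14, "MSE": 4.4e-28, "MAPE": 1.6e-14},
    "change": True,
}

# A negative delta means the MAE went down after the update.
mae_delta = report["after"]["MAE"] - report["before"]["MAE"]
improved = mae_delta < 0
```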

[6]:

# Or you can improve the model using multiple new samples
model.learn(X_test.iloc[1:4], y_test.iloc[1:4])

[6]:

{'before': {'MAE': 1.021405182655144e-14,
'MSE': 1.3995707226796117e-28,
'MAPE': 8.507385228034694e-15,
'R2_Score': 1.0},
'after': {'MAE': 1.021405182655144e-14,
'MSE': 1.3995707226796117e-28,
'MAPE': 8.507385228034694e-15,
'R2_Score': 1.0},
'change': True}

[7]:

model.predict(X_test)

[7]:

predict
id
0 1.308
1 0.885
2 2.193
3 3.427
4 3.042
... ...
5155 3.442
5156 0.665
5157 1.487
5158 2.750
5159 2.329

5160 rows × 1 columns

## Pretrained Bases application

In Jai, you can reuse previously trained collections for new tasks.

So if we have already trained a model, we can build a linear model on top of the vectors from that collection.

[8]:

from jai import Trainer

# Let's create a collection using only the features of the dataset.
trainer = Trainer("pretrained_california")

trainer.set_parameters(db_type="SelfSupervised")


Recognized fit arguments:
- db_type: SelfSupervised

[9]:

query = trainer.fit(X_train, overwrite=True)

Insert Data: 100%|██████████| 1/1 [00:00<00:00,  1.73it/s]


Recognized fit arguments:
- db_type: SelfSupervised

JAI is working: 100%|██████████|20/20 [02:22]


Setup Report:

Best model at epoch: 63 val_loss: 0.71


Now we’ll build a linear model using that collection.

We’ll first make a mapping to train the new Linear Model.

In this case, we’ll only use one feature, id_california, which is the id value from the previous collection. Each id will correspond to its vector stored in Jai, and those vectors are used to train the linear model.

[10]:

import pandas as pd
df_train = pd.DataFrame(X_train.index, index=X_train.index, columns=["id_california"])

[11]:

model = LinearModel("california_pretrained", "regression")
report = model.fit(
    df_train,
    y_train,
    pretrained_bases=[{"id_name": "id_california", "db_parent": "pretrained_california"}],
    overwrite=True,
)

[12]:

# You can add the new data to the previous collection
trainer.append(X_test)

Insert Data: 100%|██████████| 1/1 [00:00<00:00,  3.75it/s]
JAI is working: 100%|██████████|20/20 [00:02]

[12]:

({0: {'Task': 'Adding new data for tabular setup',
      'Status': 'Completed',
      'Description': 'Insertion completed.',
      'Interrupted': False}},
 {'Status': 'Running',
  'Description': 'Task started to run now',
  'Interrupted': False})

[13]:

# And map the ids to make the prediction
df_test = pd.DataFrame(X_test.index, index=X_test.index, columns=["id_california"])
model.predict(df_test)

[13]:

predict
id
0 1.386100
1 0.785691
2 2.185110
3 3.764664
4 3.319453
... ...
5155 4.088492
5156 0.703170
5157 1.311731
5158 2.087266
5159 2.143795

5160 rows × 1 columns

[14]:

# Or, if you don't want to modify the previous collection, you can consume the model using the original raw data
model.predict(X_test)

[14]:

predict
id
0 1.386100
1 0.785691
2 2.185110
3 3.764664
4 3.319453
... ...
5155 4.088492
5156 0.703170
5157 1.311731
5158 2.087266
5159 2.143795

5160 rows × 1 columns