How to Train a Prophet Time-Series Forecasting Model in OML4Py

Architecture flow showing how to train a Prophet time-series model in an Oracle Database using OML4Py.
Welcome back! In Part 1, we mastered data ingestion, successfully converting local Excel spreadsheet data into a pristine, ready-to-use Oracle table named ONLINE_RETAIL.

But a table full of transactional data isn’t a forecast. It’s just history. To predict the future, we need to train a machine learning model to learn from past patterns. In this post, we are going to dive into time series forecasting with OML4Py by doing three major things:

  1. Transform our raw data into a clear daily sales trend from thousands of individual sales records.
  2. Execute a Pandas train-test split in Oracle SQL to evaluate our model’s accuracy on unseen data.
  3. Train a Prophet model in the Oracle database and securely save the resulting model to the OML Datastore.

If you are new to this, don’t worry. We’ll try to break down every single piece of logic in the simplest way possible.

 

Step 1: Aggregate Daily Sales and Create a Train/Test Split

Before training any machine learning model, the data needs to be shaped. Our current table (ONLINE_RETAIL) lists individual purchases. But the Prophet forecasting library doesn’t care about who bought what at 10:03 AM. It cares about how much was sold in total on that day. For time series, this means two crucial things

  • Time Series Format: We must aggregate daily sales and rename the columns to the Prophet library standard: ds (for date) and y (for value).
  • Train/Test Split: We designate the last 30 days of data as the “Test Set”. The rest becomes the “Train Set”

 

Defining the Python Scripts with sys.pyqScriptCreate

We define two separate Python functions using sys.pyqScriptCreate. Both scripts load the same raw data, but the final filtering logic sets them apart. The get_train_data script calculates the SalesTotal (Quantity * UnitPrice), aggregates it by date, finds the cutoff date (the last 30 days), and then filters to keep everything BEFORE the cutoff

 

Script A: Getting the Training Data

BEGIN

sys.pyqScriptCreate(‘get_train_data’,

‘def get_train_data(df_final):

import pandas as pd

# 1. Calculate Total Sales per transaction

# We multiply Quantity by UnitPrice to get the total cash value

df_final[“SalesTotal”] = df_final[“Quantity”] * df_final[“UnitPrice”]

 

# 2. Group by Date

# We squash all transactions from one day into a single number

daily_sales = df_final.groupby(df_final[“InvoiceDate”].dt.date)[“SalesTotal”].sum().reset_index()

 

# 3. Rename columns for Prophet

# Prophet assumes the date column is “ds” and the value is “y”

daily_sales.columns = [“ds”, “y”]

daily_sales[“ds”] = pd.to_datetime(daily_sales[“ds”])

 

# 4. The “Cutoff” Logic

# We decide that the last 30 days are for testing.

test_period_days = 30

cutoff_date = daily_sales[“ds”].max() – pd.Timedelta(days=test_period_days)

 

# 5. Filter: Keep only dates BEFORE the cutoff

train_df = daily_sales[daily_sales[“ds”] < cutoff_date]

 

return train_df’,

FALSE, TRUE);

END;

/

 

Script B: Getting the Test Data

The (get_test_data) script is nearly identical, but the final filter is reversed—it keeps only the data ON or AFTER the cutoff date. This is the data the model will predict against later

 

BEGIN

sys.pyqScriptCreate(‘get_test_data’,

‘def get_test_data(df_final):

import pandas as pd

 

# 1. Calculate SalesTotal (Using all-caps column names)

df_final[“SalesTotal”] = df_final[“Quantity”] * df_final[“UnitPrice”]

 

# 2. Aggregate Data to Daily Sales

daily_sales = df_final.groupby(df_final[“InvoiceDate”].dt.date)[“SalesTotal”].sum().reset_index()

daily_sales.columns = [“ds”, “y”]

daily_sales[“ds”] = pd.to_datetime(daily_sales[“ds”])

 

# 3. Define the Cutoff Date

test_period_days = 30

cutoff_date = daily_sales[“ds”].max() – pd.Timedelta(days=test_period_days)

 

# 4. Filter for the Testing Set

test_df = daily_sales[daily_sales[“ds”] >= cutoff_date]

 

return test_df’,

FALSE, TRUE);

END;

/

 

Executing pyqTableEval to Save the Split Tables in Oracle

Now we execute those scripts. Unlike Part 1 where we used pyqEval (which creates data from scratch), here we use pyqTableEval. it is used when you want to feed an existing Oracle table (ONLINE_RETAIL) into your Python script as a DataFrame.

 

— Create the Training Table

CREATE TABLE SALES_TRAIN AS

(SELECT * FROM pyqTableEval(

‘ONLINE_RETAIL’,  — The Input Table

NULL,

‘{“ds”:”TIMESTAMP”, “y”:”NUMBER(15, 2)”}’, — The Output Format

‘get_train_data’  — The Script Name

));

SQL code using pyqTableEval to execute a Pandas train/test split natively in the Oracle Database.

 

— Create the Testing Table

CREATE TABLE SALES_TEST AS

(SELECT * FROM pyqTableEval(

‘ONLINE_RETAIL’,

NULL,

‘{“ds”:”TIMESTAMP”, “y”:”NUMBER(15, 2)”}’,

‘get_test_data’

));

SQL code using pyqTableEval to execute a Pandas train/test split natively in the Oracle Database.

 

Step 2: Train the Prophet Model in Oracle Database

With our clean SALES_TRAIN table ready, we are now ready for the core ML process which is training the model and making sure we save the result. We are going to feed the data in SALES_TRAIN table into the Prophet algorithm.

When you train a model, you are creating a complex mathematical object. If you don’t save it, it disappears when the script ends. OML4Py allows us to save this object into a “Datastore”—think of it like a secure digital locker inside the database.

The script, train_prophet_sales, takes the training data, initializes the Prophet model with seasonality parameters (yearly and weekly patterns are common for sales data), and calls the crucial .fit(dat) method to start the learning process

 

How to Save a Python Model in the Oracle OML Datastore

BEGIN

sys.pyqScriptCreate(‘train_prophet_sales’,

‘def fit_prophet_model(dat, modelName, datastoreName):

import oml

from prophet import Prophet

import pandas as pd

 

# Prophet requires lowercase columns (ds, y)

dat.columns = [col.lower() for col in dat.columns]

 

# 1. Initialize the Prophet Model

# We tell it to look for yearly and weekly patterns.

model = Prophet(yearly_seasonality=True, weekly_seasonality=True, daily_seasonality=False)

 

# 2. Fit the model (This is where the learning happens!)

model.fit(dat)

 

# 3. SAVE THE MODEL to the Datastore

# This is critical. We store the “model” object under the name “modelName”

oml.ds.save(objs={modelName: model}, name=datastoreName, overwrite=True)

 

# Return a status message so we know it finished

return pd.DataFrame({“MODEL_NAME”: [modelName], “STATUS”: [“TRAINED”]})’,

FALSE, TRUE);

END;

/

 

Executing the Training Script using oml_connect:1

We execute the training script by feeding it the SALES_TRAIN database table. The most important part of this execution is passing the parameter “oml_connect”:1.

You might be wondering Why oml_connect:1? By default, Python scripts runs in a sandbox. Setting this flag to 1 gives the script permission to talk to the Oracle Database’s Datastore which is essentially a secure storage area for Python objects so it can save your model inside the database .

We run this using pyqTableEval again, but look closely at the parameters we pass in the JSON string.

 

SELECT *

FROM table(pyqTableEval(

‘SALES_TRAIN’, — Feed it the training data

— The Parameters:

— 1. Name the model “prophet_sales_forecast_model”

— 2. Name the locker (datastore) “prophet_sales_store”

— 3. “oml_connect”:1 <- This is the key that opens the locker!

‘{“modelName”:”prophet_sales_forecast_model”, “datastoreName”:”prophet_sales_store”, “oml_connect”:1}’,

‘{“MODEL_NAME”:”VARCHAR2(50)”, “STATUS”:”VARCHAR2(10)”}’,

‘train_prophet_sales’

));

 

If you ran the code above and got a “STATUS: TRAINED” message, congratulations! You have officially created a machine learning model (prophet_sales_forecast_model) and saved it securely using the OML Datastore. Right now, your model is sitting natively inside the database, waiting for instructions.

By completing this tutorial, you now know how to aggregate transactional data, leverage pyqTableEval, and train a Prophet model in an Oracle database without ever exporting the data locally.

In the third and final part of this series, we will retrieve this saved model and put it to work! We’ll score the hidden SALES_TEST data to evaluate its accuracy and then generate a highly anticipated 90-day future sales forecast.

If you require any assistance regarding AI integrations, DevOps solutions, or optimizing your Oracle APEX environment, please feel free to contact us at MaxAPEX.

See you in Part 3!

Leave a Reply

Your email address will not be published. Required fields are marked *

Table of Contents

What to read next