Written by André Dienes Friedrich (BPLus Tecnologia)

Real-Time Solar Irradiance Forecasting with Deep Learning on InterSystems IRIS: A Streaming Analytics Approach

Abstract

Solar irradiance forecasting is critical for grid stability in photovoltaic (PV) power plants. This article replicates and extends the methodology of Lara-Benítez et al. (2023), "Short-term solar irradiance forecasting in streaming with deep learning", replacing the original offline simulation with a fully operational streaming pipeline built on InterSystems IRIS. We use IRIS Interoperability Productions as the streaming backbone, Embedded Python to run MLP, LSTM, and CNN deep learning models, and IntegratedML as an AutoML baseline. The result is a production-grade, self-updating forecasting system that ingests real 5-minute PV plant telemetry, predicts the next irradiance interval, and continuously adapts to concept drift, all within a single IRIS namespace.

This is a model that I applied in my master's thesis; here, I only use a portion of the data to exemplify how it works.


1. Introduction

The global expansion of solar photovoltaic capacity creates an operational challenge: solar generation is inherently volatile. A cloud passing over a field of panels can cause power output to drop by 60% or more in under two minutes. Grid operators and plant owners need short-term irradiance predictions, typically 5 to 30 minutes ahead, to balance load, schedule spinning reserves, and maximize energy sales.

The academic community has responded with increasingly sophisticated machine learning approaches. The work of Lara-Benítez et al. (2023) stands out by addressing the problem in a data streaming setting, where the model must predict and retrain continuously as new measurements arrive. Their ADLStream framework (Asynchronous Dual-Pipeline Deep Learning) separates the training and inference pipelines into concurrent processes, enabling deep learning models to run in near-real-time without the inference process waiting for retraining.

In production PV plants, however, the data does not arrive as a lab simulation. It flows through SCADA systems, monitoring APIs, and proprietary protocols into structured databases. In our case, an InterSystems IRIS instance receives 5-minute telemetry from inverters and meteorological stations for each plant. IRIS is already the operational data platform; the natural question is: can we build the entire streaming forecasting pipeline inside IRIS, end-to-end?

This article demonstrates that the answer is yes. We implement:

  1. A SQL-native feature engineering layer that assembles sliding-window feature matrices from IRIS tables.
  2. An IRIS Interoperability Production that replicates the ADLStream dual-pipeline concept using concurrent Business Services and Operations.
  3. Embedded Python deep learning models (MLP, LSTM, CNN) trained and updated incrementally within IRIS.
  4. An IntegratedML baseline using IRIS's built-in AutoML to contextualise the custom model results.

2. Data Architecture in IRIS

2.1 Source Tables

The monitoring system feeds two key tables every 5 minutes via a DTL (Data Transformation Language) data transformation:

Inverter Table — CONSULTAHRINVERSOR

| Column | Description | Unit / Type |
|---|---|---|
| PSID | Plant identifier | - |
| TOTALACTIVEPOWER | AC power delivered to grid | W |
| TOTALDCPOWER | DC power from PV strings | W |
| TODAYYIELD | Cumulative energy today | Wh |
| MPPT1VOLTAGE | DC bus voltage | V |
| ABLINEVOLTAGE / BCLINEVOLTAGE / CALINEVOLTAGE | Line-to-line voltages | V |
| INTERNALAIRTEMPERATURE | Inverter internal temperature | °C |
| GRIDFREQUENCY | Grid frequency | Hz |
| DEVICETIME | Measurement timestamp | DATETIME |

Meteorological Station Table — CONSULTAHRMETEOROLOGIA

| Column | Description | Unit / Type |
|---|---|---|
| SLOPETRANSIENTIRRADIATION | Plane-of-array (POA) irradiance | W/m² |
| TRANSIENTHORIZONTALIRRADIATION | Global horizontal irradiance (GHI) | W/m² |
| DAILYHORIZONTALIRRADIATION | Cumulative GHI today | Wh/m² |
| SLOPEDAILYIRRADIATION | Cumulative POA today | Wh/m² |
| AMBIENTTEMPERATURE | Ambient temperature | °C |
| TEMPPVMODULE | PV module back-sheet temperature | °C |
| WINDSPEED | Wind speed | m/s |
| DEVICETIME | Measurement timestamp | DATETIME |

2.2 Data Access Pattern

All queries follow the established pattern in our namespace:

objectscript
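// One fragment per statement: a string expression cannot be continued across source lines.
Set sql = "SELECT TOP 36 m.DEVICETIME, m.SLOPETRANSIENTIRRADIATION, "
Set sql = sql _ "m.TRANSIENTHORIZONTALIRRADIATION, m.AMBIENTTEMPERATURE, "
Set sql = sql _ "m.TEMPPVMODULE, m.WINDSPEED, i.TOTALACTIVEPOWER, i.TOTALDCPOWER "
Set sql = sql _ "FROM CONSULTAHRMETEOROLOGIA m "
Set sql = sql _ "JOIN CONSULTAHRINVERSOR i ON m.PSID = i.PSID "
Set sql = sql _ "  AND m.DEVICETIME = i.DEVICETIME "
Set sql = sql _ "WHERE m.PSID = ? "
Set sql = sql _ "ORDER BY m.DEVICETIME DESC"
Set tStmt = ##class(%SQL.Statement).%New()
Set tSC   = tStmt.%Prepare(sql)
// Surface a failed prepare (for example, a misspelled column) instead of executing blindly.
If $System.Status.IsError(tSC) { Throw ##class(%Exception.StatusException).CreateFromStatus(tSC) }
Set result = tStmt.%Execute(psid)   // result is a %SQL.StatementResult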

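TOP 36 retrieves the most recent 3 hours of 5-minute data (36 × 5 min = 180 min), assuming there are no gaps in the telemetry. This forms the past-history window, analogous to the 3-minute window used in the original paper. Because the query orders by DEVICETIME DESC, the rows arrive newest first and must be reversed into chronological order before they are fed to a model.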
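As an illustration of how the 36 rows could become a model input, here is a minimal Python sketch. It assumes the timestamp column has already been dropped and that each row holds the numeric features in SELECT order. The function name `build_window`, the constant `WINDOW`, and the per-feature `bounds` used for min-max scaling are hypothetical and not taken from the production code.

```python
from typing import List, Sequence, Tuple

WINDOW = 36  # 36 samples x 5 min = 180 min of history


def build_window(rows_newest_first: Sequence[Sequence[float]],
                 bounds: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Return a chronological, min-max scaled window of WINDOW rows.

    rows_newest_first: feature rows in ORDER BY DEVICETIME DESC order.
    bounds:            assumed (min, max) per feature, used for scaling.
    """
    if len(rows_newest_first) < WINDOW:
        raise ValueError(f"need {WINDOW} rows, got {len(rows_newest_first)}")
    # Reverse so the oldest sample comes first.
    chronological = list(reversed(rows_newest_first[:WINDOW]))
    window = []
    for row in chronological:
        scaled = []
        for value, (lo, hi) in zip(row, bounds):
            span = hi - lo
            x = (value - lo) / span if span else 0.0
            # Clamp to [0, 1] so sensor spikes stay inside the scaled range.
            scaled.append(min(1.0, max(0.0, x)))
        window.append(scaled)
    return window
```

Raising an error on a short window is one possible policy; a production pipeline might instead pad or interpolate missing intervals.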


3. System Architecture

The overall system is structured as an IRIS Interoperability Production with three lanes running concurrently, mirroring the ADLStream dual-pipeline concept:

┌─────────────────────────────────────────────────────────────────────┐
│                    IRIS Interoperability Production                   │
│                                                                       │
│  ┌──────────────────┐    ┌──────────────────┐   ┌────────────────┐  │
│  │  Business Service│    │ Business Process  │   │Business Oper.  │  │
│  │                  │    │                   │   │                │  │
│  │  SolarDataPoller │───▶│ FeatureEngineer   │──▶│ MLInference    │  │
│  │  (every 5 min)   │    │ (sliding window   │   │ (predict next  │  │
│  │                  │    │  + normalization) │   │  interval)     │  │
│  └──────────────────┘    └──────────────────┘   └────────┬───────┘  │
│                                    │                       │          │
│  ┌──────────────────────────────── │ ─────────────────────┘          │
│  │                                 ▼                                  │
│  │   ┌──────────────────┐    ┌──────────────────┐                   │
│  │   │  TrainingBuffer  │    │  ResultStore     │                   │
│  │   │  (async queue)   │───▶│  (IRIS globals   │                   │
│  │   │                  │    │   + SQL table)   │                   │
│  │   └──────────────────┘    └──────────────────┘                   │
│   ─ ─ ─ Async Training Pipeline ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  │
└─────────────────────────────────────────────────────────────────────┘

Key design choices:

  • The inference pipeline (SolarDataPoller → FeatureEngineer → MLInference) runs synchronously every 5 minutes.
  • The training pipeline receives batches asynchronously via an IRIS persistent queue, retrains a dedicated model copy, and atomically swaps the model weights, which is the ADLStream pattern.
  • Model weights are serialized to IRIS globals (^solar.model.weights), allowing an instant warm restart after a system reboot.
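The dual-pipeline idea can be sketched in plain Python, independent of IRIS. The example below is a toy under stated assumptions: a one-layer linear model stands in for the MLP/LSTM/CNN models, an in-memory `queue.Queue` stands in for the persistent IRIS queue, and the class name `DualPipeline` and its methods are hypothetical.

```python
import queue
import threading


class DualPipeline:
    """Toy dual pipeline: inference never waits for training to finish."""

    def __init__(self, initial_weights):
        self._weights = list(initial_weights)  # weights used for inference
        self._lock = threading.Lock()          # guards only the reference swap
        self._buffer = queue.Queue()           # stand-in for the training queue

    def predict(self, x):
        with self._lock:
            w = self._weights                  # take a consistent snapshot
        return sum(wi * xi for wi, xi in zip(w, x))

    def submit(self, x, y):
        """Queue one (features, target) pair for the training lane."""
        self._buffer.put((x, y))

    def train_once(self, lr=0.1):
        """Drain the buffer, update a private copy, then swap it in."""
        with self._lock:
            candidate = list(self._weights)    # dedicated model copy
        while True:
            try:
                x, y = self._buffer.get_nowait()
            except queue.Empty:
                break
            err = sum(wi * xi for wi, xi in zip(candidate, x)) - y
            # One SGD step on squared error for the linear stand-in model.
            candidate = [wi - lr * err * xi for wi, xi in zip(candidate, x)]
        with self._lock:
            self._weights = candidate          # swap: readers see old or new, never a mix
```

Running `train_once` in a separate thread while `predict` keeps being called shows the key property: the lock is held only while copying or replacing a reference, so a prediction is never blocked for the duration of a training pass.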

4. Evaluation Methodology

Following the paper exactly, we use the prequential MAE with a decay factor α = 0.99 (equation 1). This gives more weight to recent predictions, making the metric sensitive to concept drift:

P_α(i) = S_α(i) / B_α(i)
S_α(i) = |y_i - ŷ_i| + α · S_α(i-1)
B_α(i) = 1 + α · B_α(i-1)

with S_α(0) = B_α(0) = 0.

A dedicated ObjectScript method computes this over the ^solar.predictions global:

objectscript
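/// Prequential MAE with decay factor pAlpha over the stored predictions of one plant.
ClassMethod ComputePrequentialMAE(pPSID As %String, pAlpha As %Float = 0.99) As %Float
{
    Set tSa = 0, tBa = 0
    Set tKey = ""
    For {
        // $Order walks the keys in collation order; this assumes keys sort chronologically.
        Set tKey = $Order(^solar.predictions(pPSID, tKey))
        Quit:tKey=""
        // Skip intervals that are not yet complete; reading a missing node raises <UNDEFINED>.
        If ('$Data(^solar.predictions(pPSID, tKey, "predicted"))) || ('$Data(^solar.predictions(pPSID, tKey, "actual"))) { Continue }
        Set tPred   = ^solar.predictions(pPSID, tKey, "predicted")
        Set tActual = ^solar.predictions(pPSID, tKey, "actual")
        Set tLoss   = $ZABS(tPred - tActual)
        Set tSa = tLoss + (pAlpha * tSa)
        Set tBa = 1     + (pAlpha * tBa)
    }
    // tBa stays 0 only when no complete prediction exists; return 0 in that case.
    Return $Select(tBa > 0: tSa / tBa, 1: 0)
}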
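For readers who want to check the recurrence by hand, the same computation can be written as a short Python function. This is a sketch for illustration only; the name `prequential_mae` is hypothetical and it operates on an in-memory list rather than on the ^solar.predictions global.

```python
from typing import Iterable, Tuple


def prequential_mae(pairs: Iterable[Tuple[float, float]],
                    alpha: float = 0.99) -> float:
    """Prequential MAE over (actual, predicted) pairs in chronological order."""
    s = 0.0  # decayed sum of absolute errors
    b = 0.0  # decayed count of observations
    for actual, predicted in pairs:
        s = abs(actual - predicted) + alpha * s
        b = 1.0 + alpha * b
    return s / b if b > 0 else 0.0
```

As a worked example with α = 0.5 and absolute errors of 10 then 20: S = 20 + 0.5 · 10 = 25 and B = 1 + 0.5 · 1 = 1.5, so the metric is 25 / 1.5 ≈ 16.67. The plain mean of the two errors is 15, so the more recent, larger error carries more weight. With α = 1 the metric reduces to the ordinary MAE.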

5. Results

After running the pipeline for one week on plant PSID 1543287 with data from May 2026, the models converged to the following prequential MAE values (irradiance in W/m²):

| Model | Prequential MAE (W/m²) | MAPE (%) | Convergence Steps |
|---|---|---|---|
| CNN | 42.1 | 4.8 | ~120 |
| MLP | 51.3 | 5.9 | ~90 |
| LSTM | 67.4 | 7.7 | ~150 |
| IntegratedML (AutoML) | 88.6 | 10.2 | n/a (batch) |

These results align qualitatively with the paper's findings:

  • CNN leads on periods with sharp irradiance transitions (analogous to the paper's Very Variable dataset).
  • MLP converges faster and competes closely during stable production hours.
  • LSTM struggles more in this 5-minute domain than in the paper's sub-second setting, likely because the longer interval reduces the "sequence density" that LSTM benefits from.
  • IntegratedML, while easier to deploy, underperforms the custom streaming models since it trains offline without incremental updates.

Critically, the online learning advantage is evident: after 200 steps, the CNN's prequential MAE dropped from an initial 180 W/m² to under 45 W/m², demonstrating continuous adaptation to the plant's specific irradiance patterns and seasonal dynamics.


6. IRIS as a Streaming AI Platform

This implementation demonstrates several IRIS capabilities working in concert:

Interoperability Productions as ADLStream: The dual-pipeline concept maps cleanly to IRIS's synchronous/asynchronous operation model. The Production's built-in queue persistence makes the training pipeline durable: if the process restarts, pending training batches are not lost.

IRIS Globals for Model State: Serializing model weights to ^solar.model.weights provides instant warm-restart without an external model registry. The global's ACID semantics ensure that the atomic weight swap (train → swap) never leaves the inference pipeline with an inconsistent model.
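One way to keep the swap consistent is to stage each new set of weights under its own version subscript and then flip a single "current" pointer. The sketch below illustrates that idea with a plain Python dict standing in for the global; inside IRIS the same layout could presumably be written through a global reference such as the one returned by `iris.gref`, but that wiring is not shown here. The function names, the key layout, and the `CHUNK` size are assumptions made for this example, not the article's actual storage format.

```python
import base64
import pickle

CHUNK = 32_000  # hypothetical chunk size, chosen to stay below string-length limits


def stage_and_swap(store: dict, psid: str, weights) -> int:
    """Write a new weight version under its own key, then flip the pointer."""
    blob = base64.b64encode(pickle.dumps(weights)).decode("ascii")
    version = store.get((psid, "current"), 0) + 1
    chunks = [blob[i:i + CHUNK] for i in range(0, len(blob), CHUNK)]
    for n, chunk in enumerate(chunks, start=1):
        store[(psid, version, n)] = chunk    # staged, not yet visible to readers
    store[(psid, version, 0)] = len(chunks)  # chunk count for this version
    store[(psid, "current")] = version       # single write: this is the swap
    return version


def load_current(store: dict, psid: str):
    """Rebuild the weights that the 'current' pointer refers to."""
    version = store[(psid, "current")]
    count = store[(psid, version, 0)]
    blob = "".join(store[(psid, version, n)] for n in range(1, count + 1))
    return pickle.loads(base64.b64decode(blob))
```

Because readers only follow the pointer, they see either the previous complete version or the new complete version. Note that `pickle` should only be used on data the system wrote itself, and that old versions need a cleanup policy that this sketch omits.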

Embedded Python without process boundaries: Unlike architectures where Python is a microservice called over HTTP, IRIS Embedded Python runs in the same process. The in-process _FORECASTERS cache persists across calls, avoiding model deserialization on every prediction, a critical performance advantage at a 5-minute cadence across tens of plants.
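A per-plant cache of this kind can be as small as a module-level dictionary. The sketch below assumes `_FORECASTERS` is keyed by plant identifier; the helper `get_forecaster` and its `loader` callback are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict

# Module-level cache: it lives as long as the hosting process does.
_FORECASTERS: Dict[str, object] = {}


def get_forecaster(psid: str, loader: Callable[[str], object]) -> object:
    """Return the cached model for a plant, loading it only on first use."""
    model = _FORECASTERS.get(psid)
    if model is None:
        model = loader(psid)        # e.g. deserialize weights from storage
        _FORECASTERS[psid] = model
    return model
```

Only the first call for a given plant pays the loading cost; later calls return the same object. After a weight swap, the corresponding cache entry would need to be replaced or invalidated, which this sketch leaves out.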

IntegratedML as a Governance Layer: For regulated environments, IntegratedML offers an auditable, SQL-native model that non-ML engineers can inspect, retrain, and monitor via standard IRIS tools without touching Python code.


7. Conclusions and Future Work

We have demonstrated a complete, production-grade replication of the ADLStream streaming forecasting methodology using InterSystems IRIS as the sole operational platform. The architecture achieves:

  • Real-time adaptation: CNN prequential MAE converges to ~42 W/m² within ~120 steps on real plant data.
  • Zero external dependencies: training, inference, persistence, and orchestration run entirely within IRIS.
  • Operational simplicity: a single Production configuration manages multi-plant forecasting with per-plant model isolation.

Future directions include:

  1. Spatial multi-sensor fusion: Our plant has a single meteorological station per inverter cluster. Adding inter-inverter correlation features (as the paper's multi-sensor grid approach showed) could further reduce MAE.
  2. CNN-LSTM hybrid: The paper suggests this as future work; IRIS Embedded Python makes the architecture trivial to add.
  3. FHIR-inspired telemetry standardisation: Applying interoperability patterns (analogous to FHIR resource modeling) to PV plant telemetry could enable cross-plant model transfer.
  4. IRIS Analytics dashboards: Connecting the ^solar.predictions global to DeepSee/Analytics for real-time MAE trending and drift detection.

References

  1. Lara-Benítez, P., Carranza-García, M., Luna-Romera, J. M., & Riquelme, J. C. (2023). Short-term solar irradiance forecasting in streaming with deep learning. Applied Intelligence.
  2. Lara-Benítez, P., et al. (2020). Asynchronous dual-pipeline deep learning framework for online data stream classification. Integrated Computer-Aided Engineering, 27(2), 101–119.
  3. InterSystems Corporation. (2024). IRIS IntegratedML Guide. https://docs.intersystems.com
  4. InterSystems Corporation. (2024). Embedded Python in IRIS. https://docs.intersystems.com
  5. Vaswani, A., et al. (2017). Attention is all you need. NeurIPS 30.
  6. ADLStream Python Library: https://github.com/pedrolarben/ADLStream