Privacy-Preserving Transfer Learning Framework for Building Energy Forecasting with Fully Anonymized Data

A novel framework that enables effective transfer learning using exclusively anonymized time-series data, achieving median MSE reductions of 27–31% across 89 real-world buildings while requiring only 0.51% of federated learning's communication bandwidth.

Abstract

AI-driven forecasting offers a promising solution for optimal building energy control, yet is constrained by scarce labeled data and strict privacy regulations. While transfer learning (TL) can alleviate data scarcity by leveraging data from other buildings, conventional TL relies on metadata unavailable in fully anonymized datasets. We propose a Privacy-Preserving Transfer Learning (PPTL) framework that overcomes this deadlock by learning similarity directly from anonymized time-series dynamics.

Using an unsupervised contrastive encoder, the framework maps each building’s dynamics to high-dimensional representation vectors learned solely from temporal patterns. Cosine distance between representations guides source selection to pretrain a lightweight forecaster, which is then fine-tuned on limited target data.

Leave-one-out experiments on 89 real-world buildings validate that learned similarity strongly correlates with transfer performance: models pretrained on highly similar sources achieve median MSE reductions of 27–31% relative to target-only baselines, peaking at 31% under the optimal configuration. The framework improves forecasting in 99.2% of configurations (353 of 356), with only three instances of marginal degradation (at most 2.2%).


Motivation: The Privacy–Performance Deadlock

Building energy efficiency is a critical mandate for global decarbonization: buildings account for about 37% of global energy-related CO₂ emissions. AI has emerged as a powerful tool for energy forecasting, but the building sector faces two structural barriers:

  1. Heterogeneity — Each building is a unique system defined by distinct materials, form, usage patterns, and microclimates, resisting one-size-fits-all modeling.
  2. Privacy — Energy patterns reveal occupancy behaviors and business operations. Regulations like GDPR strictly restrict data sharing.

These barriers create a paradox: heterogeneity demands diverse training data from many buildings, yet privacy prevents the data aggregation needed to compile such datasets. Transfer learning could bridge this gap, but conventional methods rely on metadata (building type, size, climate zone) that anonymization removes—creating a methodological deadlock.


The PPTL Framework

Our framework shifts the paradigm from metadata-based heuristics to data-native learned similarity. Three modular components work in sequence:

Framework schematic of the PPTL system depicting the interaction between the encoder, transfer strategy controller, and forecaster.

1. TS2Vec Encoder — Unsupervised Contrastive Learning

The encoder employs TS2Vec, a time-series contrastive learning model that learns representations from unlabeled, anonymized data. Unlike image-based contrastive learning, TS2Vec uses contextual consistency: the representation of a timestamp must remain consistent regardless of the temporal window from which it is viewed.

This approach captures temporal dependencies and operational logic—diurnal cycling, seasonal periodicity, load-shape dynamics—without requiring metadata or manual feature engineering.
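To make contextual consistency concrete, here is a toy numpy sketch (not TS2Vec itself, which trains a dilated-CNN encoder with a hierarchical contrastive loss): a causal, recency-weighted summary of a timestamp changes very little when the context window it is viewed from grows, which is the invariance the contrastive objective enforces.

```python
import numpy as np

def toy_repr(window, decay=0.8):
    # Toy causal "encoder": exponentially weighted mean up to the
    # window's last timestamp. Old values are down-weighted, so the
    # summary of timestamp t is nearly invariant to how far back the
    # context reaches -- the property TS2Vec's contextual-consistency
    # loss trains a real encoder to have.
    z = window[0]
    for x in window[1:]:
        z = decay * z + (1 - decay) * x
    return z

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 200)) + 0.05 * rng.normal(size=200)

t = 150
z_long = toy_repr(series[t - 48:t])    # timestamp t from a 48-step window
z_short = toy_repr(series[t - 24:t])   # same timestamp, 24-step window
print(abs(z_long - z_short))           # near zero: context-invariant
```

A trained encoder generalizes this idea: rather than a fixed recency weighting, it learns which temporal features of the context to preserve across views.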

2. TiDE Forecaster — Lightweight and Efficient

The forecasting module uses TiDE (Time-series Dense Encoder), an MLP-based encoder-decoder model whose cost scales linearly, O(L), with the input sequence length L and which supports fully parallel computation.

Schematic of the TiDE architecture showing the dense encoder-decoder structure with residual connections.

TiDE processes historical target features, static features (anonymized building index), and nontarget covariates (weather, time indicators) through a dense encoder-decoder architecture with residual connections and dropout.
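As an illustration of the residual-block pattern that TiDE stacks throughout its encoder and decoder, here is a minimal numpy sketch; the weights are random stand-ins, and the real blocks also apply layer normalization and a learned skip projection when input and output widths differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_block(x, w1, b1, w2, b2, p_drop=0.0, training=False):
    # Dense -> ReLU -> Dense, with dropout, added to a skip connection.
    h = np.maximum(x @ w1 + b1, 0.0)        # dense + ReLU
    h = h @ w2 + b2                         # dense
    if training and p_drop > 0.0:
        # Inverted dropout: zero out units, rescale the survivors.
        h *= (rng.random(h.shape) >= p_drop) / (1.0 - p_drop)
    return x + h                            # residual connection

d, hidden = 8, 32
w1, b1 = rng.normal(size=(d, hidden)) * 0.1, np.zeros(hidden)
w2, b2 = rng.normal(size=(hidden, d)) * 0.1, np.zeros(d)

x = rng.normal(size=(4, d))                 # batch of 4 feature vectors
y = residual_block(x, w1, b1, w2, b2)
print(y.shape)                              # width preserved: (4, 8)
```

Because each block preserves its input width and adds to the skip path, blocks compose freely into the dense encoder-decoder stack described above.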

3. Strategy Controller — Source Selection

The controller orchestrates similarity-based source selection using cosine distance in the learned representation space, then manages a two-stage learning process:

  1. Pretraining on top-ranked source buildings
  2. Full fine-tuning on limited target data

Workflow

The PPTL framework follows a systematic four-step pipeline:

Four-step execution workflow: unsupervised learning, source selection, pretraining, and fine-tuning.

Step 1. Train the TS2Vec encoder on anonymized source data to construct the latent representation space.

Step 2. Generate representations for source and target buildings, then rank sources by cosine distance.

Step 3. Pretrain the TiDE forecaster on the most similar source datasets.

Step 4. Fine-tune the pretrained model on limited target data to produce the final forecaster.
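The four steps can be sketched end to end with toy stand-ins. The function names, the summary-statistic "encoder", and the bias-only "forecaster" below are illustrative placeholders, not the paper's TS2Vec/TiDE implementations; only the cosine-distance ranking in Step 2 mirrors the framework directly.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_encoder(source_series):                   # Step 1 (stand-in)
    # Toy "encoder": mean/std summary instead of a trained TS2Vec model.
    return lambda s: np.array([s.mean(), s.std()])

def rank_sources(encoder, target_series, sources):  # Step 2
    zt = encoder(target_series)
    def cos_dist(z):
        return 1.0 - zt @ z / (np.linalg.norm(zt) * np.linalg.norm(z))
    return sorted(sources, key=lambda k: cos_dist(encoder(sources[k])))

def pretrain(series_list):                          # Step 3 (stand-in)
    # Toy "forecaster": a single bias fitted on the pooled sources.
    return {"bias": float(np.mean([s.mean() for s in series_list]))}

def fine_tune(model, target_series):                # Step 4 (stand-in)
    model["bias"] = 0.5 * model["bias"] + 0.5 * float(target_series.mean())
    return model

# Five anonymized sources with different operating levels; short target.
sources = {f"B{i}": rng.normal(loc=i, size=384) for i in range(5)}
target = rng.normal(loc=1.0, size=240)              # limited target data

encoder = train_encoder(list(sources.values()))
ranking = rank_sources(encoder, target, sources)
model = fine_tune(pretrain([sources[k] for k in ranking[:2]]), target)
print(ranking[0], model["bias"])
```

In this toy, the source whose level matches the target ranks closest and dominates pretraining, which is the behavior the leave-one-out experiments verify at scale.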


Dataset

The framework is validated on the Cambridge University Estates building energy archive: 24 years of hourly electricity usage from ~120 fully anonymized buildings, including lecture halls, offices, laboratories, and museums. Each building is identified only by a randomized numerical index; no other metadata is available.

We curate a 16-month interval with 89 gap-free buildings. The first 14 months are used for model development and the final 2 months for testing.


Results

Learned Similarity Captures Operational Patterns

Validation of the learned representation space showing 2D latent-space visualization and normalized electricity profiles for target building B4 and selected sources.

The 2D visualization confirms that proximity in the learned space correlates with actual operational similarity—nearest buildings show nearly identical weekly patterns, while farthest buildings display irregular profiles incompatible with the target.

Source Selection Strategy Validation

Impact of source selection strategy on forecasting performance for target B4, comparing Closest and Farthest strategies.

Three hypotheses are validated across all 89 buildings:

  • H1: The Closest strategy consistently outperforms the Farthest strategy
  • H2: Under the Closest strategy, performance peaks at an intermediate number of sources (a sweet spot)
  • H3: The Farthest strategy improves monotonically as more sources are added

Distribution of relative MSE across 89 target buildings showing Closest vs. Farthest strategy performance.

Framework Robustness

Heatmap of relative MSE change across 89 target buildings. Blue indicates improvement, red indicates degradation.

The Closest strategy achieves improvements in nearly all cases, with only 3 instances of marginal degradation (max 2.2%). This demonstrates remarkable stability across diverse building types.

Forecasting Performance

Time-series forecasts for target B4 showing superior tracking of demand curves by transfer learning models compared to No-TL baseline.

Transfer learning models demonstrate superior stability and generalization. The optimally configured model (Closest 4) achieves better peak forecasting accuracy, while the No-TL baseline exhibits erratic fluctuations and systematically underestimates peak demand.


Comparison with Federated Learning

Metric               Federated Learning                PPTL Framework
Privacy approach     Structural locality (trustless)   Regulatory compliance (trusted)
Communication        ~608 MB over 100 rounds           ~3.1 MB (0.51% of FL)
Client computation   GPU-class hardware required       No local training needed
Non-IID robustness   Vulnerable                        Robust by design
Personalization      Generic global model              Target-specific models
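The headline bandwidth figure follows directly from the two communication totals in the comparison:

```python
fl_mb = 608.0    # federated learning communication (~100 rounds)
pptl_mb = 3.1    # PPTL framework's total communication
ratio = pptl_mb / fl_mb * 100
print(f"{ratio:.2f}%")  # ~0.51% of FL's bandwidth
```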

Contributions

  1. Metadata-free transfer learning framework — Enables effective TL using exclusively anonymized time-series data.
  2. Representation distance as transferability proxy — Cosine distance reliably predicts transfer success (99.2% improvement rate).
  3. Negative transfer as manageable engineering risk — Characterizes the quantity–quality trade-off for systematic decision-making.
  4. Scalable deployment complementing FL — Only 0.51% communication bandwidth with server-side computation.

Authors

  • Wonjun Choi (School of Architecture, Chonnam National University) — Co-first, Corresponding
  • Sangwon Lee (Dartwork) — Co-first
  • Max Langtry (University of Cambridge)
  • Ruchi Choudhary (University of Cambridge)

Acknowledgements

This work was supported by National Research Foundation of Korea (NRF) grants (Nos. RS-2023-00277318 and RS-2025-00512551) funded by the Korean government (Ministry of Science and ICT).