With both normalizing flows and physics-informed neural networks (PINNs) finally making their way from the machine learning and physics literature into biogeochemical model inference problems, I found myself recently thinking about the similarities between the two approaches. I have arrived at the notions below, with the general sentiment that normalizing flows can almost be thought of as a more specific, less flexible subset of PINNs, apart from a key difference in optimization objectives. (As I will touch on later, flows do not involve dynamical system derivative residuals in their loss calculations.) Please reach out and correct me if you feel that I have erred in my understanding.
Overlap:
- Both approaches inform deep learning neural network weights with knowledge from a user-supplied dynamical systems model in addition to observed data.
- In each approach's training iterations, dynamical system output enters the computation of the optimization objective and thereby contributes to backpropagation and the updating of the neural network weights (see the sketch after this list).
- Both approaches facilitate learning of unknown dynamical system parameters.
- Both approaches suffer when the chosen model is mechanistically inconsistent with the true underlying data-generating process.
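To make the overlap concrete, here is a minimal sketch in the PINN flavor: a toy exponential decay model du/dt = -k u with unknown rate k, where the mechanistic term enters the loss so that gradients update both the network weights and k. All names, sizes, and settings are my own illustrative choices; in a flow-based setup, the mechanistic model would instead enter through the likelihood of simulated state trajectories.

```python
# Minimal PINN-flavored sketch (PyTorch) of the overlap points above.
# Toy model: du/dt = -k * u with unknown rate k; everything is illustrative.
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))      # surrogate u_hat(t)
log_k = torch.nn.Parameter(torch.tensor(0.0))          # unknown system parameter
opt = torch.optim.Adam(list(net.parameters()) + [log_k], lr=1e-3)

t_obs = torch.linspace(0.0, 1.0, 50).unsqueeze(1)      # observation times
u_obs = torch.exp(-2.0 * t_obs)                        # synthetic data, true k = 2

for step in range(2000):
    opt.zero_grad()
    t = t_obs.clone().requires_grad_(True)
    u_hat = net(t)
    loss_data = ((u_hat - u_obs) ** 2).mean()          # fit to observations
    # Dynamical system output enters the objective here, so backpropagation
    # updates the network weights and the unknown parameter log_k together.
    du_dt = torch.autograd.grad(u_hat.sum(), t, create_graph=True)[0]
    loss_phys = ((du_dt + torch.exp(log_k) * u_hat) ** 2).mean()
    (loss_data + loss_phys).backward()
    opt.step()
```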
Contrasts:
- For statistical inference of unknown dynamical system parameters, an approach using normalizing flows inherently operates under a variational inference framework. For PINNs, variational inference of unknown system parameters is still an option, but other Bayesian inference methods, such as transitional Markov chain Monte Carlo or even approximate Bayesian computation algorithms, are also compatible. Non-Bayesian approaches that skip posterior estimation altogether and are predicated on optimization alone rather than uncertainty assessment are likewise compatible with PINNs.
- Normalizing flows specifically involve state space model approximations of dynamical systems for neural network weight training; state trajectories are sampled from these state space models in each training iteration for likelihood computations. PINNs, on the other hand, invoke continuous-time differential equation approximations, with differential equation derivatives calculated directly in the training objective.
- Normalizing flow neural networks parameterize bijective transformations from layer to layer, which make density evaluation of the transformed variables tractable and support state space sampling (a minimal example follows this list). Without the requirement that layers be bijections with tractable densities, PINN neural networks can be less constrained in structure.
- Normalizing flows and PINNs differ in optimization objectives. Normalizing flows are tied to variational Bayesian inference, and their optimization objective is thereby summarized as maximization of the evidence lower bound (ELBO) based on log likelihood calculations. PINN optimization objectives allow for a more general and flexible choice of loss function, such as mean squared error, which is then augmented with differential equation derivative residuals in the loss expression. Of course, a probabilistic ELBO-based objective could also be deployed for PINNs, but flow objectives will still be distinguished by their lack of explicit system derivative calculations (compare the two sketches after this list).
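To illustrate the bijectivity contrast, here is a minimal affine coupling layer, one common way flows parameterize invertible maps with a tractable log-determinant. The class name, layer sizes, and dimensions are illustrative choices on my part, not a prescription.

```python
# Minimal sketch (PyTorch) of one affine coupling layer: a bijection with a
# triangular Jacobian, which is what keeps the transformed density tractable.
import torch

class AffineCoupling(torch.nn.Module):
    """y1 = x1;  y2 = x2 * exp(s(x1)) + t(x1)  -- invertible by construction."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.scale_net = torch.nn.Sequential(
            torch.nn.Linear(self.half, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, dim - self.half))
        self.shift_net = torch.nn.Sequential(
            torch.nn.Linear(self.half, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, dim - self.half))

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.scale_net(x1), self.shift_net(x1)
        y = torch.cat([x1, x2 * torch.exp(s) + t], dim=1)
        log_det = s.sum(dim=1)   # log|det J| is just the summed scales
        return y, log_det

    def inverse(self, y):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.scale_net(y1), self.shift_net(y1)
        return torch.cat([y1, (y2 - t) * torch.exp(-s)], dim=1)

# Change of variables: if x ~ base and y = f(x), log q(y) = log p(x) - log|det J|.
base = torch.distributions.Normal(torch.zeros(4), torch.ones(4))
layer = AffineCoupling(4)
x = base.sample((8,))
y, log_det = layer(x)
log_q_y = base.log_prob(x).sum(dim=1) - log_det
```

A PINN network has no analogue of `inverse` or `log_det`; it only needs to be differentiable.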
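And to illustrate the objective contrast, here is a toy negative-ELBO computation for the same decay model, using a single affine bijection over log k (so the variational posterior stays Gaussian; stacking bijective coupling layers would relax that). All settings are illustrative. The objective is built from sampled parameters, simulated trajectories, and log likelihoods, with no du/dt residual anywhere, whereas the PINN loss sketched earlier is a data misfit plus exactly such a residual.

```python
# Toy negative-ELBO sketch (PyTorch) for the decay model; illustrative only.
import torch

a = torch.nn.Parameter(torch.tensor(0.0))   # log-scale of the affine bijection
b = torch.nn.Parameter(torch.tensor(0.0))   # shift of the affine bijection
opt = torch.optim.Adam([a, b], lr=1e-2)

t_obs = torch.linspace(0.0, 1.0, 50)
u_obs = torch.exp(-2.0 * t_obs) + 0.01 * torch.randn(50)   # true k = 2

base = torch.distributions.Normal(0.0, 1.0)
for step in range(2000):
    opt.zero_grad()
    eps = base.sample((128,))
    log_k = eps * torch.exp(a) + b            # bijection: eps -> log k
    log_q = base.log_prob(eps) - a            # change-of-variables density
    k = torch.exp(log_k)
    # Likelihood term: the analytic solution exp(-k t) stands in for
    # simulating the state space model at each sampled k.
    u_sim = torch.exp(-k.unsqueeze(1) * t_obs)                    # trajectories
    log_lik = torch.distributions.Normal(u_sim, 0.01).log_prob(u_obs).sum(dim=1)
    log_prior = torch.distributions.Normal(0.0, 1.0).log_prob(log_k)
    neg_elbo = (log_q - log_lik - log_prior).mean()   # note: no du/dt residual
    neg_elbo.backward()
    opt.step()
```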