The functional scope of the platform includes:
of health data from different sources.
Create a basic foundational technology layer suitable for any digital health application providing better interoperability, portability, availability, analysis, security of the data.
The platform consists of two primary components:
In theory, any kind of human-generated data which can be ingested and used for deriving health insights should be defined as health data and be made accessible for further analysis.
The challenge is to acquire, extract, transform, and normalize the countless unstandardized data export file formats and data structures and load them into a standardized structure that can be easily analyzed.
Proposed is the development of an application programming interface (API) and OpenAPI specification for receiving and sharing data with the core database. Software development kits (SDK’s) made available for 3rd party applications allow the interaction with the API. SDK’s will enable developers to implement easy automatic sharing options in their applications.
Separate plugins will enable spreadsheet upload/import and scheduled imports from existing third-party APIs. The API connector framework will allow the ongoing regular import of user data after a single user authorization.
API Name | Supporting IGs |
---|---|
Patient Access API | The CARIN Consumer Directed Payer Data Exchange IG (also referred to as the CARIN IG for Blue Button®) HL7 FHIR Da Vinci PDex IGHL7 US Core IG HL7 FHIR Da Vinci - PDex US Drug Formulary IG |
Provider Access API | * See Above IGs for Patient Access API |
Payer-to-Payer API | * See Above IGs for Patient Access API |
Provider Directory API | *HL7 FHIR Da Vinci PDex Plan Net IG |
Documentation Requirements Lookup Service API | HL7 FHIR Da Vinci - Coverage Requirements Discovery (CRD) IG HL7 FHIR Da Vinci - Documentation Templates and Rules (DTR) IG |
Prior Authorization Support (PAS) API | *HL7 FHIR Da Vinci - PAS IG |
Bulk Data | *HL7 FHIR Bulk Data Access(Flat FHIR) Specification |
To preserve originality in case of data processing errors or protocol changes the ingested raw files like CSV files, PDF reports, and the raw API responses are stored separately in a binary data and file storage system. Data will be encrypted and stored in its raw format in flat files on a secure cloud provider defined in the framework instance platform settings. Preservation of the data in its original format will allow for:
The original raw data and files can be accessed at any time by the owner, independently from any other process involved with the structured data storage.
To make the standardized structured storage of health data and the envisioned queries possible, the data has to be ingested from files or API requests and mapped from many different standards and proprietary formats into a standard schema. These will be executed in an asynchronous queue to map the raw data to a standardized format and provide it to the validator. The most common data mappers will be integrated into the core. Less common data mappers will be available as plugins from 3rd party developers.
Core data mappers (Initial proposal):
The database schema will be consistent with existing formats while addressing some gaps of missing definitions. The goal of a simplified, universal standard format is to make multi-omics data as well as environmental, social, digital biomarkers easily analyzable.
In order to ensure a level of quality required by healthcare and clinical trials, data quality and consistency must be ensured. The data validation middleware will validate the data before it is stored in the time-series database.
Validation involves the following steps:
Invalid Data Handling:
Mapping data from different formats into one standardized format suitable for a measurements analysis requires a reference database with tables of definitions and descriptions to be used by the data mappers and by the API for displaying this information in applications. The type of data to be defined includes biomarkers, health-related variables of any kind, interventions, therapies, outcomes, conditions, etc.
Examples of currently used existing reference databases include LOINC, RXNORM, ICD-10 but are partly not suitable enough and have to be unified for the scope of more efficient data handling and analysis. Especially definitions for environmental factors, natural supplementation or therapies, digital biomarkers, and social and lifestyle data sources have a lack of integration.
The proposed solution for overcoming challenges with interoperating with data formats like FHIR is a single measurements table with all definitions query-able by beforehand mentioned categories and types. The main reason for this solution is the complexity of the nature of the definition of a health-related measurement.
Often a variable can be either be both an input factor or a health outcome with respect to the black-box system human body. Thus, whether or not a variable is a controllable input factor and/or a health outcome is stored in the variable settings table. All measured values are thrown in one "pool" and can be retrieved in a flat universal format without having to worry about transforming complex nested data structures for compatibility.
Units of Measurement
The Unified Code for Units of Measure (UCUM) system will be used to include all units of measures being contemporarily used in international science. The full list of units of measure is available here.
After validation and mapping, the data will be stored in a standardized and structured time-series database.
Functional Requirements:
Data should be owned by the individual who generated it. It should remain under their control throughout the entire data life-cycle from generation to deletion. The data owner shall have the unrestricted ability to manage their digital health identity.
Ownership management functionalities will allow the individual to manage their data and access control settings for sharing purposes. It will allow them to:
This feature can be used by user-centered applications and dashboards for personal health management, for data sharing with care providers, research, or for participation in trials.
Health data is a sensitive and valuable commodity.Therefore the handling of the data alongside its attached value is proposed to be built natively into the core. Value stream management functionalities will allow the exchange from data against tokenized value assets in different scenarios. It will allow:
Data Value Scenarios:
This feature can be used for exchanging data on marketplace applications or clinical trial platforms.
Defined interfaces will allow 3rd party development of software modules that interact with the core and provide additional functionality. They may be free or monetized by their creator.
Plugins will be stored in their own repositories based on a plugin template repository. The plugin template repository will contain defined interfaces required for interoperability with the core.
The impact of effective and detailed analysis is:
Data Analysis Plugins will apply statistical and machine learning methods to the ocean of high-frequency longitudinal individual and population-level data. The resulting value will include:
This will mitigate the incidence of chronic illnesses by informing the user of symptom triggers, such as dietary sensitivities, to be avoided. This will also assist patients and clinicians in assessing the effectiveness of treatments despite the hundreds of uncontrollable variables in any prescriptive experiment.
Large cohort clinical analysis could reveal new molecules for longevity.
Data visualization plugins convert data from its raw form into useful insights. They may be used to display data from individual or multiple subjects. Some regular ways to visualize data are scatter plots, timeline charts, heatmaps, or novel ways like the in the following proposed outcome labels. Visualizations can be embedded in studies, publications, or personal dashboards.
Tasks of data visualization plugins:
Example Data Presentation Plugins
Many applications and service providers offer a direct exchange of structured health data through an API, which upon user authentication allow access to automated and scheduled exports of the generated data.
So far the proprietary silo developments have produced many different data formats, which could be replaced with the data standard proposed within this project. Until the success of a common language for all types of health data and between all stakeholders, many API connecting plugins are necessary for this interoperability.
An API connector plugin handles:
Connector Technical Flow
API Connector plugins will be called by the webserver to:
A job scheduler will call the API connectors periodically (usually daily) to:
File importing plugins are needed for specific sources or devices, where APIs are not available and the user only has access to raw files. Types of files include spreadsheets, PDFs, and raw genomic data.
The background job scheduler will:
A link between the created structured data and the original file allows backup and reprocessing (e.g. if the data import plugin functionality is expanded in future versions).
Challenges include changing proprietary formats, spreadsheet column matching, long upload times with raw files like from genomic testing.