Transforming data into actionable insight is key to its value. Leaders at companies across a variety of industries that are investing materially in technology architectures want a return on that investment, and they want it now. Unfortunately, many of these leaders are missing a crucial first step that precedes generating useful analyses.
A range of challenges is putting the brakes on leaders’ efforts to extract value from their data and system investments. First, data used in analysis have to be clean, and data ingestion and curation efforts are time consuming – the process, still quite labor intensive, consumes 70%-90% of a typical data professional's time. Most companies hire data professionals for their specialised and expensive analytical skills, not for their ability to clean data. Second, technology architectures lack end-to-end data pipelines, data ontologies and taxonomies. Instead of performing data ingestion and curation efforts in a centralised manner, data are processed simultaneously in multiple departments within a firm, resulting in fragmentation that limits the ability to spread and use data insights across the organisation. It's just not efficient.
The current state of data engineering has become a key drag on making Machine Intelligence (MI) profitable in the insurance industry.
We define data engineering as the process of refining data into fuel – clean, easily accessible, and able to yield insights anywhere in an organisation. At-scale success requires an automated end-to-end data pipeline performing ingestion, curation, transformation, visualisation, and finally widescale distribution of data. Think of this pipeline as a production line, similar to what one may find in a factory.
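To make the production-line analogy concrete, the sketch below composes a subset of those stages in Python. It is a minimal illustration, not a prescription for any particular platform; the stage functions, record fields and thresholds are assumptions made for the example.

```python
# Minimal sketch of an end-to-end data pipeline as a "production line" of stages.
# Stage names, record fields and thresholds are illustrative assumptions only.

from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[List[Record]], List[Record]]

def ingest(records: List[Record]) -> List[Record]:
    """Pull raw records from a source system (here, passed in directly)."""
    return records

def curate(records: List[Record]) -> List[Record]:
    """Drop records missing mandatory fields."""
    return [r for r in records if r.get("policy_id") and r.get("premium") is not None]

def transform(records: List[Record]) -> List[Record]:
    """Derive analysis-ready fields from curated data."""
    for r in records:
        r["premium_band"] = "high" if float(r["premium"]) > 10_000 else "standard"
    return records

def distribute(records: List[Record]) -> List[Record]:
    """Hand refined data to downstream consumers (stubbed as a print)."""
    print(f"Publishing {len(records)} curated records to downstream consumers")
    return records

def run_pipeline(records: List[Record], stages: List[Stage]) -> List[Record]:
    """Run each stage in sequence, like stations on a production line."""
    for stage in stages:
        records = stage(records)
    return records

if __name__ == "__main__":
    raw = [
        {"policy_id": "P-001", "premium": 12_500},
        {"policy_id": None, "premium": 900},       # rejected at curation
        {"policy_id": "P-002", "premium": 4_300},
    ]
    run_pipeline(raw, [ingest, curate, transform, distribute])
```

The point of the sketch is the shape, not the content: each stage has a single responsibility, and the same line serves every downstream consumer rather than being rebuilt department by department.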
Usefully transformed data provide competitive advantage in the insurance industry. Firms can differentiate themselves by identifying, ingesting, and curating novel (not generally available) data and, further, by developing more useful derived analytics and insights with these data. These data-driven differentiation opportunities can be found at nearly every step in the risk-transfer value chain – from product development through core underwriting and claims management to portfolio analysis.
Historically, firms mastered traditional data sources at the scale needed for success in one or more of those steps. However, the ingestion, curation, and transformation of data within each step became tailored to the particular needs and data types of the department processing it. That tactically fragmented data engineering landscape carries a cost today and places a significant cap on data transformation efforts for tomorrow.
Most insurers today have multiple bespoke ingestion and curation processes embedded in various analytical processes. None is optimised for cost-efficient engineering; each is optimised for its desired target analysis. The costs add up. As different areas try to leverage each other's predictive analytics, data professionals, underwriters, actuaries and claims managers must reconcile how data have been engineered differently in different areas before moving on to analysis. This not only costs time and effort; it often prevents cross-department leverage from occurring at all.
Looking further, the missing ability to engineer data "at scale" is what prevents firms from extracting value from their investments in data scientists and data analytics tools. A good analogy: those investments are powerful engines purchased by the company, but the company either cannot deliver fuel to them, or the fuel it does deliver is full of sand, which materially degrades engine performance. In the insurance industry, traditional data are already riddled with regulatory and privacy challenges, and the need for decision certainty and traceability already slows the exploration of non-traditional data. Fragmenting internal data engineering capabilities compounds the problem.
Looking specifically through the lens of "making MI profitable", at-scale data engineering gaps prevent promising MI projects from becoming successful operational implementations. Making MI systems operational at scale from the beginning, versus more siloed project-scale efforts, requires access to high-quality, always ready-for-use data. Project solutions relying on brute force, i.e., high-cost manual coordination of fragmented data engineering capabilities, cannot scale into production, regardless of how promising the algorithms and models are. New MI techniques, driven by exponentially growing sources of data, will affect competitive advantage in every step of the risk-transfer value chain. However, right now, useful and relevant information often does not find its way into underwriting decisions, and more granular profitability analysis does not steer the business. This will have to change.
Data engineering can be broken out into a series of steps. New data must first be discovered and sourced, on an ongoing basis. Previously, the acronym "ETL" – extract, transform, load – was used to boil these steps down. But the world has changed.
Today, enterprises should have pipes attached to rivers of data so that sourced data are nearly continuous – not "extracted" ad hoc from a static datastore. No longer should we aim to "transform" data to a single format for a given use case. Now, we should apply a collection of analytical transformations to curated data. Thus, ETL becomes something different – continuous ingestion, curation, transformation, and visualisation (ICTV). These steps should inform and evolve the data-value-chain management process. Indeed, validation, robust privacy preservation (including differential privacy), and metadata tracking become integral to a more robust data-value-chain management process.
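The shift from batch ETL to continuous ICTV can be sketched as a stream in which every record carries its validation status and lineage metadata through each step. The example below is a hedged illustration only; the event fields, thresholds and lineage structure are assumptions made for the sake of the sketch.

```python
# Illustrative sketch of ICTV as a continuous stream rather than ad-hoc ETL batches.
# Each record carries lineage metadata through ingestion, curation and transformation.

import datetime
from typing import Dict, Iterator

def _now() -> str:
    return datetime.datetime.now(datetime.timezone.utc).isoformat()

def ingest(source: Iterator[Dict]) -> Iterator[Dict]:
    """Continuously wrap raw events with provenance metadata as they arrive."""
    for event in source:
        yield {"payload": event, "lineage": [{"step": "ingest", "at": _now()}]}

def curate(stream: Iterator[Dict]) -> Iterator[Dict]:
    """Validate each record; failures are tagged rather than silently dropped."""
    for record in stream:
        record["valid"] = "claim_amount" in record["payload"]
        record["lineage"].append({"step": "curate", "at": _now()})
        yield record

def transform(stream: Iterator[Dict]) -> Iterator[Dict]:
    """Apply analytical transformations only to records that passed curation."""
    for record in stream:
        if record["valid"]:
            record["payload"]["claim_band"] = (
                "large" if record["payload"]["claim_amount"] > 50_000 else "attritional"
            )
        record["lineage"].append({"step": "transform", "at": _now()})
        yield record

if __name__ == "__main__":
    events = iter([{"claim_amount": 72_000}, {"note": "missing amount"}])
    for out in transform(curate(ingest(events))):
        print(out["payload"], "valid:", out["valid"],
              "steps:", [s["step"] for s in out["lineage"]])
```

Because each step is a generator over the previous one, records flow through as they arrive rather than waiting for a periodic extract, and the lineage trail supports the validation and metadata tracking described above.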
A challenge threaded through these steps is excessive dependence on manual and script-driven processes. This can happen when processes are designed around manual steps and project code originally developed for experimental, interactive analysis. Multiple (or all) steps might involve a manual trigger, intervention, or review. Manual auditing, versioning and checking for errors and anomalies reduce not only efficiency, but also confidence in downstream modelled outcomes. The lack of a holistic, standardised data engineering process increases operational risk as data scientists tailor existing processes or create brand new ones for their needs. The potential for errors in both data and derived analytics grows.
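As an illustration of what codifying those manual checks might look like, the sketch below runs a small set of rule-based data checks and produces a machine-readable issue log instead of relying on human review. The rule names, required fields and thresholds are invented for the example.

```python
# Hypothetical sketch of replacing manual review with codified, repeatable data checks.
# Rule names, required fields and thresholds are illustrative assumptions only.

from typing import Callable, Dict, List, Optional

Check = Callable[[Dict], Optional[str]]  # returns an issue description, or None if the record passes

def missing_fields(record: Dict) -> Optional[str]:
    required = ("policy_id", "inception_date", "sum_insured")
    missing = [f for f in required if not record.get(f)]
    return f"missing fields: {missing}" if missing else None

def implausible_values(record: Dict) -> Optional[str]:
    sum_insured = record.get("sum_insured")
    if isinstance(sum_insured, (int, float)) and (sum_insured < 0 or sum_insured > 1e10):
        return f"implausible sum_insured: {sum_insured}"
    return None

def audit(records: List[Dict], checks: List[Check]) -> List[Dict]:
    """Run every check on every record and return a machine-readable issue log."""
    issues = []
    for index, record in enumerate(records):
        for check in checks:
            problem = check(record)
            if problem:
                issues.append({"record": index, "issue": problem})
    return issues

if __name__ == "__main__":
    sample = [
        {"policy_id": "P-10", "inception_date": "2023-01-01", "sum_insured": 250_000},
        {"policy_id": "P-11", "inception_date": None, "sum_insured": -5},
    ]
    print(audit(sample, [missing_fields, implausible_values]))
```

Checks expressed as code can be versioned, scheduled and audited like any other pipeline component, which is precisely what ad-hoc manual review cannot offer.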
The use of relational databases as the storage solution limits speed and scale. Traditional database management, which relies on rigid technology infrastructure, is by itself insufficient to handle the rising number of heterogeneous formats that must be processed to unlock value from data. The lack of enterprise-wide data ontologies (the defining of relationships among data) also limits the ability to scale and operationalise data across the organisation. Infrequent release iterations of data, coupled with limited CI/CD (Continuous Integration / Continuous Deployment) of both data and MI models, fail to adapt to changes in the data environment, or to changes in the metadata that describe it.
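To illustrate what "relationships among data" can mean in practice, the toy sketch below encodes a fragment of an insurance data ontology as a machine-readable graph that downstream tools could query. The entity and relationship names are invented for the example, not drawn from any specific standard.

```python
# A toy illustration of a data ontology: explicit, machine-readable relationships
# among data entities. Entity and relationship names are invented for the example.

ONTOLOGY = {
    "Policy": {"insures": "Risk", "billed_via": "Premium", "held_by": "Policyholder"},
    "Claim":  {"filed_against": "Policy", "paid_via": "ClaimPayment"},
    "Risk":   {"located_at": "Address", "classified_by": "PerilCode"},
}

def related_entities(entity: str, ontology: dict = ONTOLOGY) -> dict:
    """Return the entities directly related to a given entity, with relationship labels."""
    return ontology.get(entity, {})

def path_exists(src: str, dst: str, ontology: dict = ONTOLOGY, seen=None) -> bool:
    """Check whether dst is reachable from src by following relationships."""
    seen = seen or set()
    if src == dst:
        return True
    seen.add(src)
    return any(
        nxt not in seen and path_exists(nxt, dst, ontology, seen)
        for nxt in ontology.get(src, {}).values()
    )

if __name__ == "__main__":
    print(related_entities("Claim"))          # {'filed_against': 'Policy', 'paid_via': 'ClaimPayment'}
    print(path_exists("Claim", "PerilCode"))  # True: Claim -> Policy -> Risk -> PerilCode
```

Once relationships are captured this explicitly, they can be shared across departments and kept in version control, rather than living implicitly in each team's bespoke scripts.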
Looking forward, running tomorrow's at-scale, operational data capability will require a pipeline-driven approach to data engineering – similar to how automotive companies implement their production lines. A single rules-based data pipeline is needed that covers the entire data engineering lifecycle. Most processes in the pipeline are algorithmic, supervised by humans. The automated pipeline connects to sources and streams relevant data as, and when, it becomes available. Depending on the data format, an algorithmic template approach is applied to transform and extract data from unstructured and semi-structured documents (e.g. PDFs) in real time. Data are tagged automatically according to prescribed rules derived from business requirements. The data ontology and data dictionary (data terms and definitions) – key to operationalising the data across the organisation – are built dynamically and kept up to date as new data become available. A single, flexible document-based data lake (well fed by various data rivers) houses all structured, semi-structured, and unstructured data types. An automated, high-integrity data versioning and traceability process runs throughout the pipeline as part of scheduled batch processing.
Figure: An automated, end-to-end data engineering pipeline. Source: Galytix
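Two elements of the pipeline described above – rule-based tagging and a data dictionary that keeps itself current as new data arrive – can be sketched briefly in code. The tagging rules, field names and descriptions below are illustrative assumptions only, not a reference implementation.

```python
# Hedged sketch of rule-based tagging and a self-updating data dictionary.
# Rules, field names and descriptions are illustrative placeholders.

from typing import Dict, List

TAGGING_RULES = [
    # (tag, predicate over the record)
    ("large_loss",  lambda r: r.get("claim_amount", 0) > 100_000),
    ("catastrophe", lambda r: r.get("peril") in {"flood", "windstorm", "earthquake"}),
    ("third_party", lambda r: r.get("line_of_business") == "liability"),
]

def tag_record(record: Dict) -> List[str]:
    """Apply prescribed business rules and return the tags that fire."""
    return [tag for tag, predicate in TAGGING_RULES if predicate(record)]

def update_data_dictionary(dictionary: Dict[str, Dict], record: Dict) -> Dict[str, Dict]:
    """Register any previously unseen field so the dictionary stays current as data arrive."""
    for field, value in record.items():
        if field not in dictionary:
            dictionary[field] = {"type": type(value).__name__, "description": "TODO: define"}
    return dictionary

if __name__ == "__main__":
    data_dictionary: Dict[str, Dict] = {}
    incoming = [
        {"claim_amount": 250_000, "peril": "flood", "line_of_business": "property"},
        {"claim_amount": 8_000, "peril": "fire", "line_of_business": "liability", "broker_id": "B-7"},
    ]
    for record in incoming:
        print(tag_record(record))
        update_data_dictionary(data_dictionary, record)
    print(sorted(data_dictionary))
```

In a production pipeline the rules would be maintained by the business, and newly registered fields would be routed to data stewards for definition – the human supervision of an otherwise algorithmic process described above.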
An array of plug-and-play solutions built on open-source technologies – with humans still intelligently in the loop – can be applied to speed up the evolution of data engineering into a fully integrated, end-to-end data pipeline process.
Building a scalable data engineering capability has become table stakes for insurers, not only to extract value from their data investments but also to expedite data transformation across the organisation. As one data leader put it: “Data initiatives are a team sport – requiring business and data staff to work hand in hand to bring data and technology together – ultimately, delivering a business transformation”. Insurers that make the right design and implementation choices in their data engineering transformation will succeed in the future.