What Cloud Marketplaces Do and Don’t Do
Not long ago, we observed here in our blog that the critical insights that drive business value come from data that is both (1) fast and (2) reliable.
In my last blog post, I talked about how informatics firms help companies ‘orchestrate, implement, and operate’ their information supply chains. What exactly do I mean by that? As an informatics firm, here is what Crux does:
Orchestrate: ‘Orchestrating’ in general means pulling together and coordinating a variety of components so that they work together effectively, the way the conductor of an orchestra makes sure the individual musicians play together to bring the music to life. The first step in creating a supply chain is deciding which elements need to go into it. This is driven by the use case of the consuming customer (hedge fund, bank, insurance company, etc.): what data do they need, and in what form do they need it? Crux works with a supplier network of partners: data publishers, analytics firms, and service providers who form the components of the supply chain that Crux implements and operates. In some cases, a consumer has a specific dataset or vendor they know they want to work with. In others, the consumer only knows the type of data they want, and they look to Crux to help them surface potential providers of that data and possibly to run tests on candidate datasets to objectively assess the fitness of that data for the customer’s use case. Crux works with a wide range of tools and third-party service providers and pulls them into the appropriate set to meet the needs of the specific supply chain. For instance, there may be a specialist who transforms the data in some specific way (akin to a ‘refiner’ in my last blog post). Crux partners can make themselves visible to clients on the Crux platform, so that customers can browse and learn about specific datasets, analytics, and services, get inspired, and express interest in exploring any of them more deeply.
Importantly, Crux does not sell or resell any data or analytics itself; producers and consumers can count on Crux being an objective, neutral partner, and producers have full control over where their data goes. Providers license their content directly to customers, and Crux acts as a third-party facilitator that wires up and watches over the data pipelines on behalf of customers, as described below.
Implement: A supply chain fundamentally involves the flow of goods from producer to consumer. In the physical goods world (traditional logistics), that involves transportation, storage, and (potentially) repackaging. In the case of an information supply chain, it involves the transportation, storage, and repackaging of data. These are the fundamental data engineering tasks that allow data to flow between parties in a way that is maximally actionable for the consumer. They generally involve writing software that ingests the data (picks up FTP files, copies from an entitled S3 bucket, legally scrapes a web site, hits an API, etc.), validates it (looking for missing, unrecognizable, or erroneous data), structures it (usually into one or more database tables), cleans it, normalizes it, transforms it, enriches it, maps embedded identifiers, joins it with other data, removes duplicate entries, and so on, all to support the specific use case of the customer. This is the kind of data engineering work that Crux does to implement a specific supply chain for a customer, pulling in the appropriate data providers, tools, and value-added service providers identified in the Orchestration phase.
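To make two of those steps concrete, here is a minimal sketch, in Python, of what record validation and de-duplication might look like. The field names (ticker, date, price) and the rules are purely illustrative assumptions, not Crux's actual pipeline code:

```python
# Hypothetical sketch of two supply-chain steps: validating incoming
# records and removing duplicates. Field names are illustrative only.

def validate(record, required_fields=("ticker", "date", "price")):
    """Return a list of problems found in a single record."""
    problems = []
    for field in required_fields:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    price = record.get("price")
    if isinstance(price, (int, float)) and price < 0:
        problems.append("negative price")
    return problems

def dedupe(records, key_fields=("ticker", "date")):
    """Keep the first record seen for each (ticker, date) key."""
    seen, unique = set(), []
    for r in records:
        key = tuple(r.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

raw = [
    {"ticker": "ABC", "date": "2020-01-02", "price": 10.5},
    {"ticker": "ABC", "date": "2020-01-02", "price": 10.5},  # duplicate
    {"ticker": "XYZ", "date": "2020-01-02", "price": None},  # missing price
]
clean = [r for r in dedupe(raw) if not validate(r)]
```

A real pipeline would of course layer many more checks and transformations on top, tuned to the specific dataset and use case, but the shape is the same: each step takes raw records in and passes cleaner, more structured records downstream.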
Operate: Rarely is a dataset static. The vast majority of datasets receive regular updates, whether once a month or once per millisecond. As that data flows, constant vigilance is needed to make sure it shows up when it is supposed to, that it isn't missing anything, and that it doesn't contain unidentifiable components. Data Operations includes the monitoring and remediation of ongoing data streams. Crux Data Operators set up dashboards and alerts to keep a close eye on data in motion and all the systems it travels through. When a problem is spotted, they immediately begin diagnosing and remediating the issue, in tight collaboration with the relevant data provider(s), to get ahead of it before it affects downstream consumers. Data Operations also includes handling standard maintenance tasks, such as watching for and reacting to data specification changes and scheduled maintenance outages coming from the data provider(s).
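One of the simplest checks a data operator can automate is feed freshness: alert when a feed's latest delivery is older than its expected update cadence. The sketch below is a hypothetical illustration of that idea; the feed names and cadences are made up:

```python
# Hypothetical data-operations freshness check: flag any feed whose most
# recent delivery is older than its expected update cadence.

from datetime import datetime, timedelta, timezone

def stale_feeds(last_delivery, cadence, now=None):
    """Return (feed, lateness) pairs for feeds past their expected cadence."""
    now = now or datetime.now(timezone.utc)
    alerts = []
    for feed, seen_at in last_delivery.items():
        allowed = cadence.get(feed, timedelta(days=1))
        if now - seen_at > allowed:
            alerts.append((feed, now - seen_at))
    return alerts

now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
last_delivery = {
    "daily_prices": datetime(2024, 1, 10, 6, 0, tzinfo=timezone.utc),
    "monthly_fundamentals": datetime(2023, 11, 30, tzinfo=timezone.utc),
}
cadence = {
    "daily_prices": timedelta(days=1),
    "monthly_fundamentals": timedelta(days=31),
}
alerts = stale_feeds(last_delivery, cadence, now=now)  # monthly feed is late
```

In practice such checks feed dashboards and paging systems, and the interesting work begins after the alert fires: diagnosing whether the delay is on the provider's side, the pipeline's, or simply a schedule change.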
These are the key elements of Information Supply Chain Logistics in a nutshell. It is a rich process and gives customers tremendous leverage in harnessing the integrated value of a network of suppliers.
Contact Crux if you’d like to learn more.