A data scientist I met from a large corp cited an internal study showing they spend $700m annually in labor costs searching for, requesting permission to use, and accessing raw internal data, just to begin the process of creating value. What has been your experience?
I think it depends on the industry. It definitely can't be applied to every industry, but it could apply to the tech industry.
1) Discovering data with value is an unsolved problem. Often you have to get your grubby hands on it and play with it to see whether it applies to your problem. The catch is that you have to pay upfront, which no one wants to do. That creates an uncomfortable economic position: producers heavily discount their data to shift the skepticism/prospective-value balance, and consumers take on a lot of risk buying random datasets. From our DMs, you have a partial solution to this - time-bound permissions to allow trial periods - but you can't prevent downloads of data out of your sandbox. Solve this, and you're a billionaire.
2) Permissions are a narrow, tall vertical. I don't have a State or defense background, and quite frankly, most of the economy doesn't exist in that kind of space. Health and pharma are slightly different, but HIPAA has locked a lot of that shit down - build a HIPAA permissioning registry, and you'd own that, and could even be formally blessed by the State the way that credit rating agencies are.
3) "Accessing raw internal data" is interesting. My understanding based on our DMs is that you basically wish to expand your infrastructure's footprint to own everything. Your permissioned data can only ETL with data that your customers upload into your platform, so anything they need to integrate with it has to live there too. Well, Google Analytics wants you to do that too. So does SFDC. Everyone wants to own all of the data for an entire company. And you know what? Customers aren't stupid: they know to avoid that kind of lock-in. My (Infallisys) approach is to let customers own all of their data. Anything they buy from third parties would have to be copied into their own infrastructure. So we have a conundrum: how do you enforce permissions on infrastructure that isn't yours? We seem to be competing directly with each other on this point. We should talk about how to resolve this and be partners instead. I have ideas centered on BLOB permissions delegated with time-bound API keys handed out based on blockchain for public consensus-based evaluation.
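To make the "time-bound API keys" idea a bit more concrete, here is a rough sketch of what a key scoped to a single BLOB with an expiry could look like. Everything in it is my own assumption for illustration: the names (SECRET, issue_key, check_key), the pipe-delimited key format, and the use of a plain HMAC signature. It does not model the blockchain/consensus part at all, and it does nothing to stop someone copying the data once they have legitimately read it, which is the hard problem from point 1.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical signing secret held by whoever grants access (assumption).
SECRET = b"demo-secret-do-not-use"


def issue_key(blob_id: str, consumer: str, ttl_seconds: int,
              now: Optional[float] = None) -> str:
    """Return a key granting `consumer` access to `blob_id` for ttl_seconds.

    Assumes blob_id and consumer contain no '|' characters.
    """
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    payload = f"{blob_id}|{consumer}|{expires}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"


def check_key(key: str, blob_id: str, now: Optional[float] = None) -> bool:
    """Accept the key only if it is untampered, unexpired, and for this blob."""
    current = time.time() if now is None else now
    parts = key.split("|")
    if len(parts) != 4:
        return False
    key_blob, consumer, expires, sig = parts
    payload = f"{key_blob}|{consumer}|{expires}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; compare bytes so odd input cannot raise.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    # The signature matched, so `expires` is the integer we signed earlier.
    return key_blob == blob_id and current < int(expires)


if __name__ == "__main__":
    trial = issue_key("sales-2023", "alice", ttl_seconds=60)
    print(check_key(trial, "sales-2023"))   # valid during the trial window
    print(check_key(trial, "other-blob"))   # wrong blob is rejected
```

The point of signing the expiry into the key is that whoever hosts the BLOB only needs the verification secret, not a live call back to the seller, so in principle the check could run on the customer's own infrastructure. Whether the customer's infrastructure can be trusted to actually run the check is exactly the open question above.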