Your phrasing, "... lots of interesting data ... don't know how to analyze .... PhD candidate to look at it," really set off some red flags for me.
Everyone with a huge tranche of data says "we have all of this VALUABLE data but we don't know how to use it."
Then they imagine a scientist with a magic data wand will zap it, and the billions in value will pop out.
The problem with that story is that the data scientist will know many ways of processing the data, but if they do not intimately know what kinds of outcomes would be really valuable, and to whom, they will not know which way to go.
So no matter who you try to connect with, if you can't articulate the kinds of value propositions that might exist in the data, the collaboration will likely be sterile.
I agree with one of the other posters that if you have the money to hire a data scientist, this is probably your best bet. They will have no agenda other than trying to uncover value in your data, and they will not need to perform novel research (which the PhD student will need to do).
If that is not an option, and you think there is novel research to be done on your data, AND you are in a position to invest many MONTHS in teaching this scientist about your data and what is important to do with it, then you are in a position to go hunting.
I agree that professors are the easiest starting point, and they will be mature enough to be able to quickly think about your ideas for the data.
I was a program manager at DARPA, where it was my job to get professors interested in my agendas. The thing I could offer (besides money) that held the greatest sway was DATA.
DATA is magic for the ML/AI researcher. Often they will have ideas, but cannot easily test them out, because they don't have the data to do it. If you offer easy access to easily processed data, AND you offer them the ability to publish at least some aspects of what they uncover with the data, THEN you have something that will turn heads.
Indeed, if you go fishing for interest, I would LEAD with the data. Write up a description of your dataset: what are the rows and columns, and what kinds of outputs do you think can be derived from the data?
If you describe data that could be bent to fit the agenda of some professor, THEY WILL TAKE NOTICE.
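To make that write-up concrete, here is a minimal sketch of how you might generate a one-page dataset description from a CSV sample. Everything in it is an assumption for illustration: the function name `describe_dataset`, the summary layout, and the idea that your data is available as CSV at all. Treat it as a starting point, not a prescribed format.

```python
import csv
import io


def describe_dataset(csv_text, name, candidate_outputs):
    """Build a plain-text summary of a CSV dataset (hypothetical helper).

    Lists the row count, each column with a few basic stats, and the
    outputs you believe could be derived from the data.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []

    lines = [f"Dataset: {name}", f"Rows: {len(rows)}", "Columns:"]
    for col in columns:
        # Ignore missing/empty cells when computing per-column stats.
        values = [r[col] for r in rows if r[col] not in (None, "")]
        distinct = len(set(values))
        example = values[0] if values else "(empty)"
        lines.append(
            f"  - {col}: {len(values)} non-empty, {distinct} distinct, e.g. {example}"
        )

    lines.append("Outputs we think could be derived:")
    for out in candidate_outputs:
        lines.append(f"  - {out}")
    return "\n".join(lines)
```

You would call it with a small sample of your own data and your own guesses at valuable outputs, then paste the result at the top of your email to the professor.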
Once they do, it will be in their interest to collaborate with you. Ideally you can cover 10% of their salary and a half-time appointment for one of their grad students. They will jump at that if they think it will further their research agenda, and you will have some freedom to aim the grad student's efforts, if they want that data.
And of course you can also try to turn the grad student to the dark side and convince them to leave their PhD and join you.... but don't let the prof know of that nefarious agenda. Btw, cash will likely NOT work... you will need to offer equity and a dream of 'making it'. Still, grad students at strong research institutes often have that hack-80-hours-a-week mentality that you might want in a sweat-equity kind of guy.
P.S. I am also an ML guy.... for me it would be about noticeable equity, and a problem in NLP, probably focused on relational kinds of knowledge.