Uniper employs around 11,000 people.
A large company, eh? The only thing larger is the potential for its data to grow too big to manage efficiently.
The top five tech companies aside, getting the data journey right is a rarity, but with its latest initiative, the Data Lake Project, Uniper is looking to do just that.
Ahead of Uniper’s presentation at ETOT on this very matter, we got the chance to put a few questions to Surya Ayyagari, Senior Data Manager.
Could you give us an overview of what the Uniper Data Lake project is?
A couple of years ago, we led a large data warehouse project using traditional technologies for all our reporting requirements, such as Risk, Regulatory and Finance. The project was a successful one. Over the course of that journey, it became quite clear that the traditional Data Warehouse had limitations for data usage and analytics, data sharing and so on. For a trading organization, where data is at the core of our decision making, rapid data ingestion and data analytics were an important requirement.
And that has been the actual purpose of our Data Lake Initiative: to build a platform where the value of Uniper’s data can be unlocked to its full potential, while overcoming data silos and monetizing our data.
When was the data lake project introduced?
We started our Data Journey in summer 2017, and since then we have gone live with a number of use cases delivering high business value to our stakeholders.
Are there any limitations to the data lake project?
It is not a project or program in the first place. We are executing the initiative within our team as part of our daily business activities. We have a ‘Use Case Driven Approach’, in which business problems with high pain points and maximum business value are prioritized. From a technical standpoint, the platform can deal with all kinds of data and has components that can facilitate varying types of technical solutions, such as mobile apps and data visualization. There are no limitations in that sense. However, from time to time we do experience challenges when establishing data integration points with other systems in our landscape that are on-premises.
Is there such a thing as perfectly clean data, especially for a company as large as Uniper? Is it even necessary?
Of course, it is necessary to have clean and reliable data. There is no point in having data that does not help the business do what it wants to do. Having said that, it is also very important to understand that 100% clean data may not be a necessity for all use cases. There are certainly some cases where 95-98% would suffice, and there are also cases where 100% is a must. The key is that our platform allows that flexibility, and we as Data Stewards define quality requirements for each use case together with the Data Owner and the Use Case Owner.
How do you get perfectly clean data? What is the process?
We define the quality requirements and expectations for the data together with the Data Owner and the Use Case Owner as early as the Ideation or Exploration phase.
For data from external sources, we may have to rely on the data as provided. The Data Integration points should be robust enough to ensure efficient data transfer. But if this data is then married to another set of data (internal or external), the quality focus on the resulting data set is crucial. Imagine that there is an ML program running on this data set: it can go severely wrong and facilitate incorrect decision making.
We have a quality framework in place on the platform and we apply adequate processes based on the use case. There’s no one rule that fits everything.
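To make the idea of per-use-case quality requirements concrete, here is a minimal Python sketch. It is an illustration only and does not describe Uniper’s actual quality framework: the names `QualityRequirement`, `clean_ratio` and `meets_requirement` are hypothetical, and "clean" is assumed to mean simply that every required field is present and non-empty.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QualityRequirement:
    """Hypothetical requirement agreed per use case with the data and use case owners."""
    use_case: str
    required_fields: tuple[str, ...]
    min_clean_ratio: float  # e.g. 0.95 where some gaps are tolerable, 1.0 where none are


def is_clean(record: dict, required_fields: tuple[str, ...]) -> bool:
    """Assumption: a record is clean if every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in required_fields)


def clean_ratio(records: list[dict], required_fields: tuple[str, ...]) -> float:
    """Share of clean records; an empty data set is treated as having no clean data."""
    if not records:
        return 0.0
    return sum(is_clean(r, required_fields) for r in records) / len(records)


def meets_requirement(records: list[dict], req: QualityRequirement) -> bool:
    """Check a data set against the threshold agreed for one use case."""
    return clean_ratio(records, req.required_fields) >= req.min_clean_ratio
```

With a sketch like this, the same data set could pass the check for a use case that tolerates a few incomplete records and fail it for one that requires every record to be clean.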
What role did machine learning play in the project?
Machine learning is a journey that we started this year. So far on our Data Journey, our main focus has been to enable data on the platform. We have reached a point where we are now ready for next steps such as Machine Learning. We have some use cases around sensor data from our Power Plants, and we are working on use cases that apply Machine Learning to help us improve or optimise our performance.
How long do you think it will take for it to be properly implemented across your company? Also, do you think other companies will be adopting machine learning?
Machine learning, as I mentioned, is one of our key focus areas, and I don’t believe that there will be an end to that journey in the foreseeable future. It will certainly remain an ongoing one, simply because with everything we develop we will only learn more, and we will always be looking to take the next step.
We are quite clear that to apply technologies like Machine Learning or AI at scale in a sustainable way, data comes first. And this is what we have focussed on so far. I think that this holds for any company that wishes to get on the ML journey. It will first have to go through a Data Journey.