Businesses are failing their data science teams, and we are seeing a mass transition from Data Science to Data Engineering for that reason.
Data Scientists join a company expecting to apply statistical modeling and machine intelligence to challenging business problems. What they find is an impenetrable maze of low-quality data, virtually indecipherable JSON blobs with little indication of ownership or semantic meaning.
Instead of building models, data scientists spend the majority of their time on validation and 'untangling' spaghetti SQL in the Data Warehouse. They are encouraged to 'prove business value' and 'move fast and break things', yet the underlying infrastructure allows them to do neither effectively.
This is not sustainable. For data scientists to truly add value in a scalable way, data engineers and data scientists need to operate from a shared understanding of data quality and infrastructure. That means:
1. An investment in data architecture early on
2. Clearly defined ownership of core data assets
3. Business meaning of the data, defined centrally
4. A shared responsibility for data quality
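To make points 2 to 4 concrete, here is a minimal sketch of what a shared "data contract" could look like. Everything in it is an assumption for illustration: the `FieldSpec` and `DataContract` classes, the "orders" dataset, the "checkout-team" owner, and the field names are hypothetical, not a reference to any particular tool. The idea is simply that ownership and business meaning live next to the schema, and both teams run the same quality check.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FieldSpec:
    name: str
    dtype: type
    meaning: str            # business meaning, written down once, centrally
    nullable: bool = False


@dataclass(frozen=True)
class DataContract:
    dataset: str
    owner: str              # the team accountable for this data asset
    fields: Tuple[FieldSpec, ...]

    def violations(self, row: dict) -> List[str]:
        """Return human-readable problems found in a single row."""
        problems = []
        for spec in self.fields:
            value = row.get(spec.name)
            if value is None:
                if not spec.nullable:
                    problems.append(f"{spec.name}: missing (owner: {self.owner})")
            elif not isinstance(value, spec.dtype):
                problems.append(
                    f"{spec.name}: expected {spec.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
        return problems


# Hypothetical contract for an illustrative "orders" dataset.
orders_contract = DataContract(
    dataset="orders",
    owner="checkout-team",
    fields=(
        FieldSpec("order_id", str, "Unique identifier of a placed order"),
        FieldSpec("amount_usd", float, "Order total in US dollars, tax included"),
        FieldSpec("coupon", str, "Coupon code applied at checkout", nullable=True),
    ),
)

good_row = {"order_id": "A-1", "amount_usd": 19.99, "coupon": None}
bad_row = {"order_id": "A-2", "amount_usd": "19.99"}  # amount arrived as a string

print(orders_contract.violations(good_row))
print(orders_contract.violations(bad_row))
```

The same check could run where the data is produced and again where it is consumed, so a bad row points at a named owner instead of becoming one more mystery in the warehouse.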
Good luck!
#dataengineering