The following is a popular definition of a data scientist:

"Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician." – Josh Wills
According to this definition, a data scientist is not necessarily the best statistician, and not necessarily the best software engineer.
Hiring such data scientists is clearly a compromise. The reason for the compromise is that a data scientist needs to juggle both software engineering and statistics – so being great at one but not the other might not work out for the best.
It does not have to be that way.
If I were building a data science team, I would rather hire the best statisticians and the best software engineers, and have them work together.
How can they work together?
One way is to separate the analytics logic (the “what”) and the engineering aspects (the “how”). The statisticians can then work on the analytics logic, while the engineers work on the engineering.
In Sclera, this is facilitated by high-level building blocks for data access, data transformation, data cleaning, machine learning, pattern matching, and visualization.
Statisticians specify the analytics logic by building a pipeline of these building blocks, while the engineers provide implementations of these building blocks.
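This division of labor can be illustrated with a minimal sketch. The code below is not Sclera's actual API – all class and function names are hypothetical – but it shows the idea: engineers implement building blocks behind a fixed interface, while statisticians compose those blocks into a pipeline without touching the implementations.

```python
# A sketch of separating the analytics logic (the "what") from the
# engineering (the "how"). All names here are hypothetical illustrations,
# not Sclera's actual API.

from abc import ABC, abstractmethod


# --- The "how": engineers implement building blocks behind a fixed interface.
class Block(ABC):
    @abstractmethod
    def run(self, rows: list[dict]) -> list[dict]:
        """Transform a list of rows into a new list of rows."""


class CleanBlock(Block):
    """Drop rows with missing values -- a stand-in for a data-cleaning step."""
    def run(self, rows):
        return [r for r in rows if all(v is not None for v in r.values())]


class FilterBlock(Block):
    """Keep rows matching a predicate -- a stand-in for a transformation step."""
    def __init__(self, predicate):
        self.predicate = predicate

    def run(self, rows):
        return [r for r in rows if self.predicate(r)]


# --- The "what": statisticians compose blocks into a pipeline declaratively.
def pipeline(*blocks: Block):
    def run(rows):
        for block in blocks:
            rows = block.run(rows)
        return rows
    return run


# The analytics specification is just the composition -- a few lines,
# easy to modify for iterative experimentation.
analysis = pipeline(
    CleanBlock(),
    FilterBlock(lambda r: r["amount"] > 100),
)

data = [{"amount": 250}, {"amount": None}, {"amount": 50}]
print(analysis(data))  # -> [{'amount': 250}]
```

Swapping in a faster implementation of a block, or reordering the pipeline, changes only one side of the interface – which is the point of the separation.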
From the statistician’s point of view, this results in greater productivity. The high-level analytics specification is only a few lines of ScleraSQL code – easy to write, and easy to modify for iterative experimentation. Sclera optimizes the code automatically, ensuring the best performance on the available resources.
From the engineer’s point of view, the problem is well-defined – build the most efficient implementation of a building block. The semantics are clear, so there are no distractions from ever-changing specifications, and the code is reused in a structured manner across multiple applications.
Sclera helps you get the best out of your statisticians, and the best out of your engineers. So why compromise on the hiring?