If data is the true currency of modern business, then data jobs must surely be among the most high-profile — and high-paying — roles in tech.
There’s plenty of evidence that this is precisely what has happened over the past decade or so.
There are now so many graduate programs in data science that those programs inspire rankings like Fortune’s Best Online Master’s in Data Science list, which evokes the long-running U.S. News College Rankings. In fact, U.S. News itself now ranks the top undergraduate programs in data science.
Those programs exist, among other reasons, to prepare people for a glut of available jobs. For example, a search in April for data scientist positions on LinkedIn in the U.S. generated more than 150,000 open positions; a search for data engineer jobs produced nearly 230,000.
Parents who once dreamed of their children becoming doctors and lawyers may now just as well hope their offspring grow up to be data scientists or data engineers. Data scientists and data engineers rank third and seventh, respectively, on Glassdoor’s “50 Best Jobs in America for 2022” list. (For the record, physician ranks 16th and attorney comes in at 23rd.) Another data-centric role, data analyst, also makes the Glassdoor list, at No. 35.
So yes, data jobs are big. They’re also confusing: What’s the difference between these titles? What skills does someone need to land one of these positions? And what’s actually driving demand for these roles when you dig beneath the hype?
We’re here to find out. Let’s start by sorting out the different titles.
Data Analysts, Data Scientists, Data Engineers
There’s no doubt some overlap among these three primary titles (they’re all “data jobs”), but they are distinct roles, and the terms shouldn’t be viewed or used interchangeably.
Let’s dig into some definitions, title by title.
Data analysts are sort of like the descendants of the longstanding business analyst role, evolved for the era of big data and analytics. They’re essentially the folks who both understand the data they’re looking at and can apply it to specific business domains.
“A data analyst is someone who can query a database, visualize data in ways that provide insight, and interpret that data to guide business decisions,” Chris Nicholson, hiring manager for data science at Clipboard Health, an online staffing marketplace for the healthcare industry, told The New Stack.
Of the three titles, data analyst is probably the least technical, but it does require a keen ability to translate the outputs of data analytics for wider audiences.
“Data analysts are usually in charge of doing some relatively simple ETL” (extract, transform, load) “to prepare data for visualization tools like Amazon QuickSight or Tableau,” said Ryan Ries, the practice lead for data, analytics and machine learning at managed cloud services provider and Amazon Web Services partner Mission Cloud Services. “They work closely with business stakeholders and partners to answer analytics questions.”
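For readers who haven’t seen ETL up close, a minimal sketch may help. The Python below uses only the standard library and assumes a hypothetical CSV export of orders (the column names and the `sales_by_region` table are invented for illustration). It extracts the rows, drops records with a missing amount, totals sales per region, and loads the result into a SQLite summary table, the kind of tidy output a visualization tool would typically consume.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file, database, or API.
RAW_CSV = """order_id,region,amount
1,East,120.50
2,West,80.00
3,East,45.25
4,West,
5,North,200.00
"""

def extract(text: str) -> list[dict]:
    """Extract: parse the CSV text into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list[dict]) -> dict[str, float]:
    """Transform: skip rows with no amount, then total sales per region."""
    totals: dict[str, float] = {}
    for row in rows:
        if not row["amount"]:
            continue  # drop incomplete records
        totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["amount"])
    return totals

def load(totals: dict[str, float], conn: sqlite3.Connection) -> None:
    """Load: write the aggregated totals into a summary table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sales_by_region (region, total) VALUES (?, ?)",
        sorted(totals.items()),
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    for region, total in conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY total DESC"
    ):
        print(f"{region}: {total:.2f}")
```

In practice an analyst might do the same work in SQL or inside the BI tool itself; the point of the sketch is the three distinct steps.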
The best data analysts are also storytellers, according to Mike Frazzini, data science chief at Iterate.ai, a low-code app development platform. They don’t just run a query and then hand off the results, even if beautifully visualized. They explain to the end audience what they’re looking at and why it matters.
Of the three main data job titles, data scientist is probably the one that most often overlaps with the other two. What distinguishes data scientists is how they combine mathematical and programming skills to work with data.
“Data scientists may do some level of data engineering and data analysis, but they will typically be more focused on statistical interpretation of data, like designing and interpreting A/B tests,” Frazzini said.
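As an illustration of what “interpreting A/B tests” can involve, here is a minimal sketch of one common approach, a two-proportion z-test on conversion rates. The choice of test and the visitor and conversion counts are assumptions made for illustration, not a method attributed to Frazzini; real experiments also require upfront decisions about sample size and significance level.

```python
import math

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: variants A and B share one conversion rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under H0 both groups share one rate, so estimate it from the pooled data.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

if __name__ == "__main__":
    # Hypothetical experiment: 5,000 visitors in each arm.
    z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"z = {z:.2f}, p = {p:.4f} ({verdict} at the 5% level)")
```

With 200 of 5,000 control visitors converting versus 250 of 5,000 in the variant, z comes out at roughly 2.4, which clears the conventional 5% significance bar. Deciding whether that lift also matters to the business is where the interpretation comes in.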
On most data or artificial intelligence/machine learning (AI/ML) teams, the data scientist is also focused on developing data models based on machine learning, deep learning, or natural language processing techniques, Frazzini added. “An example might be a predictive model for customer demand forecasting, or a fraud classification and detection model.”
Data scientists, Ries noted, are usually concerned with solving data problems that require more complex math than you would typically see in a data analyst’s dashboard. The role also requires programming ability.
“A data scientist is someone who can query a database as part of data exploration, but who also programs proficiently in a language like Python or R, and can use that language to build data pipelines, create features and test algorithms to make useful predictions,” Nicholson said.
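Nicholson’s description of creating features and testing algorithms can be sketched in a few lines. The example below is a deliberately tiny, hypothetical fraud-detection setup: the transactions are made up, the feature (a transaction’s amount relative to the customer’s historical average) is an illustrative choice, and the “algorithm” is a one-feature decision stump that is fit on a training set and then scored on held-out data it has not seen.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    amount: float
    customer_avg: float  # the customer's historical average transaction amount
    is_fraud: bool       # label, known for past transactions

def make_feature(t: Transaction) -> float:
    """Feature: how unusual is this amount for this particular customer?"""
    return t.amount / t.customer_avg

def predict(t: Transaction, threshold: float) -> bool:
    """Flag a transaction as fraud when its feature meets the threshold."""
    return make_feature(t) >= threshold

def accuracy(rows: list[Transaction], threshold: float) -> float:
    """Share of rows whose prediction matches the label."""
    return sum(predict(t, threshold) == t.is_fraud for t in rows) / len(rows)

def fit_threshold(train: list[Transaction]) -> float:
    """Decision stump: try each training feature value as a cutoff and keep the most accurate."""
    candidates = sorted(make_feature(t) for t in train)
    return max(candidates, key=lambda c: accuracy(train, c))

# Hypothetical labeled history, split into a training set and a held-out test set.
TRAIN = [
    Transaction(50, 40, False),
    Transaction(30, 35, False),
    Transaction(400, 50, True),
    Transaction(60, 55, False),
    Transaction(300, 60, True),
    Transaction(90, 45, False),
]
TEST = [
    Transaction(70, 50, False),
    Transaction(500, 80, True),
    Transaction(45, 40, False),
    Transaction(250, 40, True),
    Transaction(120, 100, True),  # fraud that looks ordinary for this customer
]

if __name__ == "__main__":
    threshold = fit_threshold(TRAIN)
    print(f"cutoff={threshold:.2f} "
          f"train acc={accuracy(TRAIN, threshold):.2f} "
          f"test acc={accuracy(TEST, threshold):.2f}")
```

On this toy data the stump picks a cutoff of 5.0 and misclassifies one of the five held-out transactions, a reminder that a rule fit on training data rarely transfers perfectly. Production fraud models use far richer features and algorithms; the train-then-test discipline is what carries over.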
The data science role is also probably most susceptible to debate — as in, what defines a true data scientist? The ultimate distinction between a data scientist and a different role like ML engineer, Ries said, is the former’s ability to understand the why behind what they’re doing, whereas an ML engineer may be more focused on daily operations and making sure things work.
“A true data scientist understands the algorithms, creates a scientific approach, and goes about answering a question and verifying that the answer makes sense,” Ries said. “I can’t stress enough the importance of making sure the answer makes sense.”
Finally, the data engineer is the heavy-machine operator of the data team, so to speak.
“Data engineers do the behind-the-scenes dirty work,” Ries said. “They connect to data sources to pull data into your data lake. They clean the data, create the schemas, and write the more in-depth ETL code. They often also set up infrastructure pieces to be used by data analysts and data scientists.”
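To picture the “clean the data, create the schemas” part of Ries’s description, here is a minimal sketch using Python’s built-in `sqlite3` module. The `customers` table, its columns, and the raw records are hypothetical. The schema itself enforces uniqueness, while a cleaning step normalizes emails, validates dates, and rejects rows it cannot repair.

```python
import sqlite3
from datetime import date
from typing import Optional

# Hypothetical target schema: constraints live in the table, not just in code.
SCHEMA = """
CREATE TABLE IF NOT EXISTS customers (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE,
    signup_date TEXT NOT NULL
)
"""

# Hypothetical raw records, as they might arrive from a source system.
RAW_RECORDS = [
    {"customer_id": "1", "email": " Alice@Example.com ", "signup_date": "2022-01-15"},
    {"customer_id": "2", "email": "bob@example.com", "signup_date": "2022-13-01"},    # impossible month
    {"customer_id": "3", "email": "", "signup_date": "2022-02-20"},                   # missing email
    {"customer_id": "1", "email": "alice@example.com", "signup_date": "2022-01-15"},  # duplicate id
    {"customer_id": "4", "email": "dana@example.com", "signup_date": "2022-03-05"},
]

def clean(record: dict) -> Optional[tuple]:
    """Normalize one raw record into (id, email, ISO date), or return None if it can't be repaired."""
    try:
        customer_id = int(record["customer_id"])
        signup = date.fromisoformat(record["signup_date"].strip())
    except (KeyError, ValueError):
        return None
    email = record.get("email", "").strip().lower()
    if "@" not in email:
        return None
    return customer_id, email, signup.isoformat()

def load_customers(records: list, conn: sqlite3.Connection) -> tuple:
    """Create the table if needed, insert clean rows, and return (loaded, rejected) counts."""
    conn.execute(SCHEMA)
    loaded = rejected = 0
    for record in records:
        row = clean(record)
        if row is None:
            rejected += 1
            continue
        try:
            conn.execute("INSERT INTO customers VALUES (?, ?, ?)", row)
            loaded += 1
        except sqlite3.IntegrityError:
            rejected += 1  # the schema caught a duplicate id or email
    conn.commit()
    return loaded, rejected

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    loaded, rejected = load_customers(RAW_RECORDS, conn)
    print(f"loaded={loaded} rejected={rejected}")
```

Of the five raw records, two load and three are rejected: one for an impossible date, one for a missing email, and one as a duplicate ID caught by the table’s primary key. Real pipelines typically route rejects to a quarantine table for review rather than just counting them.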
At the risk of sparking an intra-team watercooler fight, it should be noted that data analysts and data scientists couldn’t do their jobs if data engineers didn’t do theirs.
“Data engineers do the heavy lifting of transforming and wrangling the data so it’s in a state ready for data analysts and data scientists,” Frazzini said. “This role is foundational and easy to take for granted given how much effort and investment is required here to ensure a reliable pipeline of data.”
How Much Do Data Jobs Pay?
These roles also differ in terms of compensation. Glassdoor lists median base salaries of $120,000 for data scientists, $113,000 for data engineers, and $74,224 for data analysts.
“All three are in high demand, although the additional math skills required to be a data scientist, and the additional programming skills required to be a data engineer, may make them more scarce than a data analyst, and therefore able to command higher salaries,” Nicholson said.
Tech jobs site Dice’s 2022 salary report includes even higher average salaries: $120,650 (data scientist), $117,295 (data engineer), and $84,779 (data analyst). Salary data can fluctuate considerably based on multiple factors, and the hybrid/remote work era has thrown another variable into the mix. But suffice it to say these jobs pay well, especially if you have the skills and experience required of a data scientist or data engineer.
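Because the two sources measure different things (Glassdoor reports median base salary, Dice reports average salary), comparing roles within each source is more meaningful than comparing across them. A quick calculation using only the figures quoted above shows each role’s premium over data analysts:

```python
# Salary figures (USD) as quoted above; Glassdoor = median base, Dice = average.
SALARIES = {
    "Glassdoor (median base)": {"data scientist": 120_000, "data engineer": 113_000, "data analyst": 74_224},
    "Dice 2022 (average)": {"data scientist": 120_650, "data engineer": 117_295, "data analyst": 84_779},
}

def premium_over(base: float, other: float) -> float:
    """Percentage by which `other` exceeds `base`."""
    return (other / base - 1) * 100

if __name__ == "__main__":
    for source, figures in SALARIES.items():
        analyst = figures["data analyst"]
        for role in ("data scientist", "data engineer"):
            pct = premium_over(analyst, figures[role])
            print(f"{source}: {role} earns {pct:.1f}% more than a data analyst")
```

On Glassdoor’s numbers, data scientists earn roughly 62% more than data analysts and data engineers roughly 52% more; on Dice’s, the gaps are roughly 42% and 38%. Either way, the premium is substantial.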
Key Skills Needed for Data Jobs
The required and desired qualifications for each of these jobs will naturally vary from employer to employer and from one job description to the next.
With help from Ries, Frazzini, and Nicholson, as well as Jayaprakash Nair, head of analytics at Altimetrik, a digital transformation company, we culled a representative list of core skills (such as languages and tools) and other knowledge relevant to each position.
These are not exhaustive lists. But whether you’re just starting out, pivoting from another tech role, or already advancing in a data career, these are good areas to concentrate on.
Data Analysts
- SQL: at minimum, a data analyst can run simple queries. Deeper expertise will only help, and an analyst who aspires to grow into a more advanced role would do well to learn R or Python at some point.
- At least one tool for data analysis, dashboarding, or visualization, such as Excel, Tableau, Looker, Amazon QuickSight, or Power BI.
- Business domain acumen and experience: general business acumen counts, but Nair also encourages aspiring data analysts to develop domain- or industry-specific knowledge in areas that interest them, such as healthcare, financial services, or retail.
Data Scientists
- Python and/or R programming.
- Statistics, probability, applied math.
- Machine learning and deep learning.
- Research and test design.
- Advanced degrees: data scientists often (but definitely not always) have a Ph.D. or Master of Science.
- Data governance, security and privacy.
“It is very important that data scientists also have awareness of key governance areas like data privacy and data security, and that they can focus on the important ethical and human elements of the data to mitigate bias and abuse with data models,” Frazzini said.
Data Engineers
- Java, Python, SQL, Spark and/or other programming expertise.
- Experience with at least one of the major cloud data stacks (AWS, Azure, GCP). Frazzini notes that the hyperscale platforms offer data engineering education and certification.
- General proficiency with cloud infrastructure and tools.
- ETL tools.
Why Data Experts Are Likely to Stay in Demand
If you have the right skills (or are actively building them), then you are very much a hot commodity on the job market. Hiring managers and recruiters are looking anywhere and everywhere, especially when it comes to the harder-to-find combination of skills required of data scientists and data engineers.
“The demand for data analysts, data scientists and data engineers have increased dramatically over the last few years,” Nair from Altimetrik said. “At the same time, the supply of this talent has not kept pace. This demand-supply skewness is increasing the need for more such folks to enter the market.”
Moreover, the demand is unlikely to decrease anytime soon, if ever. One reason, as Frazzini of Iterate.ai pointed out: most organizations are still using only a very small percentage of the data available to them today in a meaningful way.
“The incentive to invest in these jobs is strong, as there is gold in the data — from more timely and effective data-driven strategy and decision-making to data productization, which opens up new growth and revenue centers,” he said.
Data growth continues unabated, too. “Available data” is a constantly growing pool.
“Everyone is collecting tons of data from all kinds of sources, and they aren’t really sure what to do with it or what insights could be hidden in all the data they collected,” said Ries, from Mission Cloud Services.
It is the likes of the data analyst and the data engineer who can wrangle and make sense of gobs of information coming from an ever-increasing number of sources.
These organizations are also under pressure to “do AI” — even if they don’t really know what that means.
“Customers are hearing from outside sources, their boards, news and everywhere else that AI/ML is the future and if they aren’t using it they are going to be left behind,” Ries said.
As a result, they rush to get an ML model into production, even if it doesn’t make sense. Yet the whole “make sense” part is absolutely crucial, and it is a vital part of the data scientist’s job, built on the foundation that smart data engineering and analysis provide.
“Once you have a certain sophistication with visualizations, then you can start working on ML problems,” Ries said.
Put another way: Unless you think AI and machine learning are passing fads or flights of fancy, bet on data jobs being mainstays — like doctors and lawyers — for years to come.
Amazon Web Services is a sponsor of The New Stack.
Featured image by Maxim Berg via Unsplash.