Data Engineer Job Description Template 2026
Last updated: April 28, 2026
A data engineer builds and maintains the pipelines that transform raw source data into clean, queryable datasets for analysts and ML teams, with 2026 U.S. base salaries of $110,000 to $155,000 for mid-level engineers and $145,000 to $190,000 for senior engineers. Below is a ready-to-adapt job description template, a four-source salary comparison table, and the specification failures that consistently add 60 to 90 days to an already competitive search.
The data engineering job description that fails most often does not fail because it asks for the wrong stack. It fails because it is actually three different jobs inside one posting. I am Tom Kenaley, and KORE1’s data engineering practice fills searches across the full spectrum — from analytics transformation work in dbt to real-time streaming pipelines in Kafka to cloud data platform builds on Snowflake, Databricks, and Azure. The template below is calibrated for the version that closes.
KORE1 earns a fee when searches come through our data engineer staffing practice. Use this template regardless.

What the Role Actually Involves
A data engineer designs, builds, and maintains the infrastructure that moves data from source systems into a form that analysts, data scientists, and business stakeholders can reliably use for reporting, modeling, and operational decisions.
The daily work splits roughly into three areas. First, ingestion: pulling data from source systems on schedule. Relational databases, SaaS platforms, third-party APIs, event streams. The sources vary; the expectation holds constant: data must arrive on time and in the right shape. Second, transformation: cleaning, joining, and reshaping raw data into the models downstream consumers actually use. Third, platform maintenance: keeping pipelines from silently breaking when upstream schemas change, catching orchestration hangs before 3 a.m. becomes a 9 a.m. crisis, and surfacing data quality issues before they reach the analyst’s dashboard.
What the title does not capture. A data engineer spends a lot of time debugging things that are not their pipelines. Upstream systems change without notice. Schemas drift. Authentication tokens expire. The role requires patience for ambiguous failures and enough operational instinct to know something is wrong before anyone reports it. Often long before.
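The schema-drift problem above is concrete enough to sketch. Below is a minimal guard, in Python, that compares an incoming batch’s columns against an expected contract before anything is loaded — the kind of check that catches an upstream change before it reaches the dashboard. The column names and data shapes are hypothetical, for illustration only.

```python
# Minimal schema-drift guard: compare an incoming batch's columns
# against the expected contract before loading. Names are illustrative.

EXPECTED_COLUMNS = {
    "order_id": "int",
    "customer_id": "int",
    "order_total": "float",
    "created_at": "str",
}

def check_schema(batch: list[dict]) -> list[str]:
    """Return drift warnings for a batch of rows; an empty list means clean."""
    if not batch:
        return []
    observed = set(batch[0].keys())
    expected = set(EXPECTED_COLUMNS)
    problems = []
    for col in sorted(expected - observed):
        problems.append(f"missing column: {col}")    # upstream dropped a field
    for col in sorted(observed - expected):
        problems.append(f"unexpected column: {col}")  # upstream added or renamed a field
    return problems

# A clean batch passes; a drifted batch (order_total renamed) is flagged
# before it reaches the warehouse.
clean = [{"order_id": 1, "customer_id": 7, "order_total": 19.5, "created_at": "2026-01-01"}]
drifted = [{"order_id": 1, "customer_id": 7, "total_amount": 19.5, "created_at": "2026-01-01"}]
```

In production this logic usually lives in a data quality framework or a staging-layer test rather than hand-rolled code, but the instinct it encodes — validate the contract before loading — is exactly what the role requires.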
One clarification that changes how you write the JD. A data engineer is not a data analyst. A data analyst consumes clean data and produces insights. A data engineer produces the clean data. Not the same work. A data engineer is also not a data scientist — data scientists build statistical or ML models using the data the engineer delivers. A JD that assigns all three sets of responsibilities to one person is describing a team. It’s not describing a hire.
The Four Profiles You Are Actually Choosing Between
Data engineering has split into distinct sub-types over the last four years. The profiles share tools and a title. The day-to-day work differs enough that strong candidates in one profile regularly underperform in another. Getting this wrong in the JD means screening for the wrong person and discovering the mismatch after the hire. That’s the expensive version of the problem.
Analytics engineer. The most common data hire in mid-market tech companies right now. Owns the transformation layer: taking data already in a warehouse — Snowflake, BigQuery, Redshift — and building the dbt models that analytics and BI teams consume. SQL fluency is the core requirement. Python is useful but secondary. Orchestration is usually handled by a lightweight tool or a separate platform team. Salary at mid-to-senior level: $105,000 to $150,000 in most markets. The signal is candidates with hands-on dbt in a production project: 50 or more models, multiple staging environments, documentation maintained alongside code. Not course projects. Real production.
Pipeline and ETL engineer. Owns data ingestion from source systems into the warehouse or data lake. Writes Airflow DAGs or Prefect flows. Integrates with APIs, databases, and file-based sources. Python is primary. SQL is necessary. This profile is fluent in failure-handling logic, retry patterns, and incremental load strategies. They know why the nightly sync broke. More importantly, they know what to do when the connector toggle does not fix it. Salary: $115,000 to $165,000 depending on stack and orchestration complexity.
Data platform and infrastructure engineer. The most senior and most expensive profile. Owns the data infrastructure itself: the Databricks workspace configuration, Snowflake compute and storage optimization, and the network architecture that allows cross-account data sharing. This person has opinions about medallion architecture that are worth hearing. Operates close to the cloud infrastructure team. Will be uncomfortable at a company still running on-premises SQL Server with no migration roadmap. Salary: $150,000 to $200,000 and above depending on scope and company stage. The pool is smaller than most teams expect. Start sourcing early.
Streaming and real-time engineer. Owns event-driven pipelines: Kafka producers and consumers, Flink jobs, real-time aggregations feeding operational dashboards or ML models. The smallest candidate pool in 2026. Harder to hire than the other three profiles because batch pipeline experience does not transfer cleanly to streaming. “Experience with streaming data preferred” added to a batch pipeline JD attracts batch engineers who list Kafka on a resume. Write a separate JD for this role if you actually need it. The pool is real. It will not respond to a posting calibrated for someone else.
Three Questions to Settle Before You Write the JD
Most data engineer searches take longer than they should because these three questions were not answered before the posting went live. Fixing them after adds weeks.
What is the data actually for? Analytics and reporting pipelines, ML feature stores, and operational data products are different environments. Different instincts. A data engineer who has spent five years building the analytical layer in a Snowflake warehouse will not automatically be effective building real-time feature pipelines for a recommendation model. The downstream consumer — and what they need from the data — changes the technical profile significantly. A JD that does not specify the use case is attracting candidates who cannot evaluate their own fit before applying. That costs time on both sides.
Stack specificity is the second one. “Experience with cloud data platforms including AWS, Azure, or GCP” is not a technical requirement. It is a category. Your actual environment is either AWS Glue and Redshift, or Azure Data Factory and Azure Synapse, or Databricks on GCP with Unity Catalog. Specific services determine which candidates are immediately productive. A senior Snowflake engineer and a senior BigQuery engineer have different strengths. If you are six months into a migration from Redshift to Snowflake, that context belongs in the JD. Not in the recruiter screen call.
Does domain knowledge matter? Sometimes it does not. But when it does, it matters significantly. A data engineer at an alternative investment firm needs to understand how fund accounting data is structured, what ILPA reporting templates look like, and how portfolio company data flows from systems like Investran into the quarterly LP reporting layer. None of those things appear on a standard data engineering resume. When domain knowledge is genuinely required — financial services, healthcare, manufacturing, compliance-heavy environments — say so explicitly. Lower the stack experience bar accordingly. The candidate who understands your domain by week two and learns your tools by week eight is usually a better hire than the technically perfect engineer who needs six months to understand why the data is structured the way it is. Usually. Not always. But usually.
Data Engineer Job Description Template
Calibrated for a mid-to-senior data engineer on a cloud warehouse or data lake platform. Adjust the stack, the orchestration tools, and the experience floor for your actual environment. Paragraphs beginning “Hiring note” are for internal intake — remove before posting.
Job Title: Data Engineer
Location: [City, State / Remote / Hybrid]
Employment Type: [Full-time / Contract / Contract-to-Hire]
Department: Data Engineering / Data Platform / Analytics Engineering
Reports To: Director of Data Engineering / VP of Data / Head of Data Platform
About the Role
We are looking for a data engineer to own [specific area: pipeline development / transformation layer / platform infrastructure] for our [team description]. You will build and maintain the systems that move data from [source systems — be specific: Salesforce, PostgreSQL, Kafka, third-party APIs] into [Snowflake / Databricks / BigQuery] in a form that [analytics team / data science team / business stakeholders] can reliably use. This is an ownership role.
Hiring note: “Own” should be literally true. If this engineer will maintain pipelines that a senior engineer or architect has already designed, lower the title and experience bar. Misaligning autonomy expectations is one of the fastest ways to lose a good hire in the first 90 days.
What You Will Do
- Design, build, and maintain ELT/ETL pipelines from source systems into [warehouse/lake], with error handling, retry logic, and data quality checks at each stage
- Write and maintain orchestration workflows in [Airflow / Prefect / Dagster / Azure Data Factory] for scheduled and event-triggered pipeline execution
- Build and document the transformation layer in [dbt / Spark / Python] producing the tables and views that downstream analytics and data science consumers depend on
- Monitor pipeline health, resolve failures, and address schema changes from upstream systems before they break downstream consumers
- Collaborate with data analysts, data scientists, and business stakeholders to translate data requirements into reliable infrastructure
- Contribute to architecture decisions including warehouse design, partitioning strategy, and compute and storage cost optimization
- Write technical documentation covering data lineage, pipeline logic, and source-to-target mappings
What We Are Looking For
- [3 or 5]+ years of hands-on data engineering experience building and maintaining production pipelines, not just configuring ETL tooling
- Strong SQL proficiency: complex query writing, execution plan analysis, and warehouse performance troubleshooting
- Python fluency for pipeline development, data transformation, and automation scripting
- Production experience with at least one cloud warehouse or lakehouse platform: Snowflake, Databricks, BigQuery, Redshift, or Azure Synapse
- Experience with a production orchestration tool: Airflow, Prefect, Dagster, Azure Data Factory, or equivalent
- Ability to own work independently: manage task prioritization, communicate blockers early, and deliver without close supervision
Preferred
- Experience with dbt in a production environment: 50 or more models, multiple environments, tests and documentation maintained alongside code
- Familiarity with streaming architectures: Kafka, Flink, Spark Streaming, or equivalent
- Cloud platform certification: AWS Certified Data Engineer, Google Professional Data Engineer, or Microsoft Certified Azure Data Engineer Associate
- [Domain knowledge where applicable: financial services, healthcare, e-commerce, or other regulated data environment — fill in your actual context]
- Version control habits: pull requests for pipeline changes, tests before deployment, documentation alongside code
On Compensation
[$X to $Y base salary, plus [equity / bonus / benefits]. For candidates with strong domain expertise in [specific area], comp is negotiable at the top of the range.]
Hiring note: post a range. California, Colorado, New York, Washington, and a growing list of states require it by law. More practically: data engineering candidates in 2026 screen postings by comp band before reading a single requirement. Hidden ranges narrow your pool and extend your timeline.
Data Engineer Salary in 2026
Salary variance for data engineers is significant. Stack depth, domain expertise, market, and company stage all move the number. The table below pulls from four independent sources for U.S. base salary — not total compensation at tech companies, which runs materially higher.
| Level | Glassdoor | ZipRecruiter | Built In | Levels.fyi (Base) |
|---|---|---|---|---|
| Mid-Level (3–5 yr) | $100K–$135K | $105K–$140K | $110K–$150K | $120K–$165K |
| Senior (5–8 yr) | $130K–$165K | $135K–$170K | $145K–$185K | $155K–$210K |
| Staff / Principal (8+ yr) | $160K–$200K | $170K–$215K | $185K–$230K | $210K–$310K |
Levels.fyi skews toward FAANG and major tech companies, where total compensation — base plus RSU plus bonus — runs 1.5 to 2.5 times base. The Glassdoor and ZipRecruiter ranges are more representative of mid-market employers. Built In tends to reflect funded startups and growth-stage companies.
Use at least two benchmarks when setting the band. A $30,000 range anchored to ZipRecruiter when the role requires Databricks depth and domain expertise in financial services data will sit open longer than it should. The candidate pool will tell you if the range is off — typically in the first two weeks, when applications stop. That’s the signal.
Market adjustments that matter. New York and San Francisco add 20 to 30 percent over national averages. Austin, Dallas, and Denver run close to national benchmark. Remote roles without geographic restrictions create internal comp equity challenges when existing team members are in lower-cost markets. Decide before the offer stage whether you are paying by location or by role. The candidate will ask. Be ready.
Stack fluency matters more than tenure for calibrating the offer. A four-year data engineer who has run production Databricks and Snowflake at meaningful data volume is not the same hire as a four-year engineer who has maintained Talend jobs in a lighter environment. The resume looks similar. The offer should not be. According to the Bureau of Labor Statistics, database and data-focused roles are projected to grow 9 percent through 2033, which tracks with the search volume we see across our data engineering practice year over year.

Four JD Mistakes That Add Months to the Search
These are consistent. They come up in intake calls, they come up during screening, and they come up in post-search debriefs. Each one has a straightforward fix.
Listing every tool the team uses instead of what the candidate must own. A JD that requires Airflow, Prefect, Dagster, dbt, Fivetran, Snowflake, Databricks, Spark, and Kafka is not thorough. It is a committee wish list that describes five people. The candidates who genuinely meet all of those requirements in depth are rare. The candidates who list all of them on a resume have not been production-tested on most of them. Pick the three tools that represent 80 percent of the role’s daily work. Make those required. Put the others in preferred. The pool that results is smaller and stronger. Every time.
Underspecifying domain context. Last year, KORE1 ran a search for a data engineering role at Altriarch, a private equity firm running a Snowflake and Databricks environment on Azure. The original JD said “financial services experience preferred.” Technically accurate. Almost useless. The environment required Investran for fund accounting data ingestion, Affinity CRM for portfolio company relationship data, and quarterly LP reporting feeds built to ILPA reporting standards. When “financial services” means “has worked at a bank,” you attract engineers fluent in retail banking data who have never seen a private equity fund data model. We reset the JD with explicit callouts for alternative investment experience, PE fund data, and ILPA familiarity. Applications went from 40 to 11. Qualified finalists went from 3 to 7. Smaller pool. Faster close.
Setting the experience floor too high for the actual environment. “Seven or more years of data engineering experience” sounds rigorous. In most mid-market environments, it is an arbitrary screen. It eliminates candidates who have three or four years of genuinely sophisticated work in modern stacks and selects for engineers who have seven years of maintaining legacy ETL jobs that have nothing to do with your environment. Not a trade worth making. Set the floor based on what the work actually requires, not on how important the role feels internally. A three-year engineer who has built Snowflake data models with dbt, managed Airflow DAGs in production, and worked in a disciplined engineering culture is ready for most senior data engineering work. The resume floor is not the same as the competence floor.
Posting without a compensation range. California, Colorado, New York, and Washington require it. Full stop. In every other state, data engineering candidates screen postings by comp band before reading a single requirement. A posting without a range signals either that the budget has not been approved or that the company is hoping to anchor low in the offer stage. Neither interpretation is good for time-to-fill. KORE1’s average time-to-hire for data engineer roles with a posted comp range is 17 days. Roles without a range take longer. Close rates on finalist conversations are lower. Post the range.
What Hiring Managers Ask Us
So how long does a realistic data engineer search take?
17 days average for roles with a posted comp range and a specific stack in the JD. Six to ten weeks when the range is hidden, the JD is vague about the stack, or the interview process has more than three rounds without a clear decision point. The JD is the single largest lever. A specific, honest posting with a range closes faster than a thorough, committee-reviewed posting without one — even if the thorough posting is technically better written.
Is it worth naming the cloud platform in the JD, or does that narrow the pool too much?
Name it. “Experience with AWS, Azure, or GCP” is less useful than “production experience in Databricks on Azure.” The pool shrinks slightly. Quality improves significantly. A candidate who has spent three years in Snowflake will be productive faster in your Snowflake environment than a generalist who has worked across three platforms at surface level. Cloud-specific skills transfer at different rates for different services. Engineers with adjacent cloud experience do apply to specific-stack JDs — they self-select based on honest assessments of their own gap. Trust the pool. It is smarter than the screen.
What is the difference between a data engineer and an analytics engineer?
An analytics engineer owns the transformation layer: the dbt models, the staging tables, the documentation of what each data mart contains and why. A data engineer in the traditional sense owns ingestion, orchestration, and the platform infrastructure beneath the transformation layer. Some organizations run these as two separate roles. Others roll them into one. The distinction matters for hiring because the candidate profiles differ: analytics engineers skew toward SQL depth and stakeholder communication; pipeline engineers skew toward Python, distributed systems, and operational reliability. Writing a JD that uses both terms interchangeably without committing to one attracts candidates for both and fits neither cleanly.
What does a reasonable interview process look like for this role?
Three rounds is the ceiling for most qualified candidates in 2026. A skills screen, a technical assessment or paired working session on a realistic scenario, and a hiring manager conversation covering role scope and fit. Adding a fourth or fifth round without a clear rationale signals slow internal alignment. Data engineers with strong Snowflake or Databricks production experience are being actively recruited — they drop out of slow processes faster than other technical profiles. For data engineer interview questions that surface production depth rather than theoretical knowledge, see our breakdown of the questions worth asking and the ones that waste everyone’s time.
At what point does it make sense to bring in a staffing firm?
When you are past 45 days with no qualified finalists, when the role requires domain expertise that is genuinely hard to screen for internally, or when parallel searches are competing for sourcing capacity. KORE1’s data engineer staffing practice fills 30 to 40 searches a year across cloud, pipeline, and platform roles. More than 15,000 professionals placed across tech verticals since 2005. 92 percent 12-month retention rate. Clients who come to us after six weeks of unsuccessful direct recruiting typically close in three to four weeks from our first sourcing pass. The cases where it is not worth it: entry-level roles without urgency, or teams with strong internal sourcing tooling and a role in a stack where the candidate pool is genuinely deep. Most data engineering searches through KORE1 run as direct hire placements, though contract and contract-to-hire options are available when budget or timeline creates uncertainty. Not sure if the economics work? Reach out to our team and we will give you an honest read.
