AI Engineer Job Description Template 2026
Last updated: April 27, 2026
An AI engineer in 2026 earns $145,000 to $210,000 in base pay at mid-level and $200,000 to $310,000 at senior, with total compensation at top AI labs and FAANG companies regularly clearing $400,000 once equity is included. Below is a ready-to-adapt job description, a salary table sourced from five independent benchmarks, and the specific JD mistakes that turn a $180K role into a three-month search that closes at $200K after a round of ghosting and counteroffers.
Tom Kenaley here. KORE1 places AI engineers through our AI/ML engineer staffing practice, and the AI engineer job description is currently the most mis-calibrated posting in tech hiring. Not because hiring managers don’t know the work. They do, mostly. The breakdown happens earlier. The JD gets written by a committee that wasn’t in the intake meeting, or it gets copied from a LinkedIn posting that was already generic six months ago, or it lists every AI framework anyone on the team has ever touched and ends up describing four different jobs in one. The candidate who fits that description doesn’t exist for less than $350,000. The template below is built around the version of this role that actually closes.
One disclosure. We charge a fee when you hire through us. The framework works whether you call us or not.

What Is an AI Engineer?
An AI engineer builds, deploys, and maintains artificial intelligence and machine learning systems that run in production. Not research prototypes. Not Jupyter notebooks that live on a single laptop. Production systems that customers or internal teams depend on, that have to operate within latency budgets, stay within cost constraints, degrade gracefully under load, and continue working when the underlying model is replaced six months later because a better one came out.
The role sits between data science and software engineering. More production-focused than a data scientist, more AI-specific than a general software engineer. In 2026 the title is also one of the most overloaded in tech, which is a large part of why AI engineer JDs fail. Three things that all get called "AI engineer" but are genuinely different jobs:
- LLM / Generative AI engineer. Builds applications on top of large language models. RAG pipelines, fine-tuning, prompt engineering at the system level, evaluation harnesses, and agent architectures using frameworks like LangChain, LlamaIndex, or LangGraph. This is the fastest-growing profile right now and the hardest one to calibrate on compensation, because the market for people who have shipped real production LLM applications at any scale is still small relative to demand.
- Classical ML / production model engineer. Trains, validates, and deploys supervised and unsupervised ML models. Fluent in PyTorch or TensorFlow, comfortable with feature engineering and model registries, owns post-deployment monitoring and drift detection. Closer to the traditional machine learning engineer definition, though the line between this profile and LLM work is blurring fast in 2026 as companies retrofit classical prediction systems with generative components.
- AI infrastructure / platform engineer. Owns the systems that run training and inference. GPU cluster management, distributed training frameworks (PyTorch DDP, Ray, Horovod), model serving infrastructure (Triton, vLLM, TorchServe), and the cost optimization work that determines whether a model costs $4,000 a month to run or $40,000 a month to run. Usually found at companies with significant model training budgets or high inference volume. Hardest to source. Compensation bands are their own category.
Most JDs pull from all three. The intro paragraph sounds like LLM work. The responsibilities section lists model training and evaluation. The “preferred” bullets add GPU cluster experience and distributed training. The candidate who meets all of it exists. They work at OpenAI or Anthropic, where they are not reading your job posting.
Pick the profile before the posting goes live. It changes every word that follows.
The Three AI Engineer Profiles
LLM / Generative AI engineer. The profile most hiring managers want in 2026 and the one with the fewest candidates who have actually shipped something. Production LLM work is not fine-tuning a base model and calling it done. It is evaluation at every layer: faithfulness scoring on the RAG pipeline, latency measurement across the prompt-retrieval-generation cycle, hallucination quantification before a feature goes to customers. The engineers who have done this at scale at a company most people have heard of number in the thousands nationally. Everyone else is learning in your production environment at your expense. The interview filter is not “have you worked with GPT-4.” It is “tell me about a real production failure in an LLM system you owned, what the failure mode was, and what you built to detect it earlier next time.” If the candidate needs to think for more than fifteen seconds, that’s the answer.
Classical ML / production model engineer. The largest pool and the most predictable one to source. Solid Python fundamentals, real PyTorch or TensorFlow production experience, at least one cloud ML platform, and a working understanding of what happens to a model’s performance distribution over time in the real world. The filter for this profile is feature engineering and model monitoring, not algorithm knowledge. Ask them to walk through a model that degraded in production. The strong candidates describe a specific incident with a specific root cause. The weaker ones describe the general concept of data drift.
AI infrastructure / platform engineer. The rarest profile and the one most JDs don't know they need until they've been running inference costs past their budget threshold for two quarters. This person optimizes GPU utilization, writes CUDA or Triton kernels, manages distributed training runs across clusters that would cost tens of thousands of dollars if the job ran inefficiently. Found in meaningful density in San Francisco, Seattle, New York, and Austin, and almost nowhere else. The comp is separate from the other two profiles. Do not benchmark this person against a general software engineer or even a general ML engineer. You will miss them by $40,000 and they will politely decline.

AI Engineer Job Description Template
This template is structured for a mid to senior LLM / generative AI engineer. For a classical ML or infrastructure profile, swap the LLM-specific language for the appropriate stack and adjust the experience requirements accordingly.
Job Title: AI Engineer
Location: [City, State / Remote / Hybrid]
Employment Type: [Full-time / Contract / Contract-to-Hire]
Department: AI Platform / Machine Learning / Product Engineering
Reports To: Director of AI Engineering / VP of Engineering / Head of AI
About the Role
This role owns the generative AI and machine learning systems our product runs on. Not prototypes. Not a research sandbox. The production systems that actual users depend on, from model selection and evaluation through deployment, monitoring, and the evaluation infrastructure that determines whether a new AI feature ships or gets rolled back. You are accountable when inference costs spike, output quality drops, or a model swap breaks something in production that worked fine in staging.
What You’ll Do
- Design and implement LLM-powered application systems including RAG pipelines, agent architectures, and multi-step reasoning workflows using frameworks like LangChain, LlamaIndex, or LangGraph
- Build and maintain evaluation harnesses that measure accuracy, faithfulness, hallucination rate, latency, and instruction-following quality across model versions and prompt changes
- Fine-tune foundation models using supervised fine-tuning, RLHF, or DPO approaches for domain-specific tasks where base model performance falls short of product requirements
- Implement retrieval and embedding systems that make organizational data accessible to LLM-powered features, including chunking strategy, vector store management (Pinecone, Weaviate, pgvector), and hybrid search
- Deploy and operate AI systems on cloud infrastructure (AWS SageMaker, Google Vertex AI, or Azure ML), including inference optimization, cost management, and production monitoring
- Collaborate with product and data teams on feature design, dataset curation, and the boundary between what needs a model and what a simpler approach handles better
- Define and maintain the golden test sets, regression suites, and human-eval workflows that govern how AI features are released and how model upgrades are evaluated
- Mentor engineers earlier in their AI journey on production system design, evaluation methodology, and the practical difference between working in a notebook and working in a production code path
What We’re Looking For
- 4 or more years of software engineering experience, with at least 2 years working directly on production ML or AI systems
- Strong Python. Comfortable in a production codebase, not just a research notebook.
- Real LLM application development experience: RAG, prompt engineering at the system level, evaluation frameworks, at least one production deployment of a feature powered by a foundation model
- Familiarity with at least one vector database or embedding retrieval system in a production context, not just a demo
- Working knowledge of fine-tuning approaches for LLMs, including the tradeoffs between full fine-tuning, LoRA, and retrieval-based approaches
- Experience with one or more cloud ML platforms (SageMaker, Vertex AI, Azure ML) for model serving and pipeline management
- Solid understanding of software engineering fundamentals: APIs, testing, CI/CD, version control. AI engineering is still engineering.
Preferred
- Production experience with open-source foundation models (Llama, Mistral, Gemma) including quantization, inference optimization, and self-hosting for cost or privacy reasons
- Familiarity with inference optimization techniques: quantization (GPTQ, GGUF, AWQ), speculative decoding, batching strategies, and their latency/throughput tradeoffs
- Experience with agentic AI systems, including tool use, multi-agent orchestration, and the failure modes specific to long-horizon AI tasks
- Background in classical ML alongside generative AI work. The candidate who can recognize when a gradient-boosted tree is a better solution than a $30/month LLM call is rare and valuable.
- Familiarity with AI safety and alignment concepts in a production context, including output filtering, red-teaming, and guardrail implementation
Compensation
$160,000 to $220,000 base, plus equity and bonus. [Adjust for your specific market, seniority target, and total comp structure. See the salary breakdown below.]
Core Responsibilities in Depth
The bullets above are the intake document. Here is what those bullets mean in production, because within the first twenty minutes the interview process surfaces the gap between candidates who have done this work and candidates who have only read about it.
RAG pipeline development is the responsibility that gets the most airtime in 2026 AI engineer JDs and the one most hiring managers assess least rigorously in interviews. “Have you built a RAG pipeline” is a question that almost everyone who has touched LangChain for three weeks can answer yes to. The real interview question is “walk me through a specific production RAG system you owned, what the retrieval quality looked like before and after you improved it, and what the biggest problem was that wasn’t solvable by changing the chunk size.” Strong candidates describe a specific vector store, a specific retrieval failure mode, a specific metric they used to quantify improvement. Weak candidates describe the concept of RAG and mention they read the LangChain documentation. That gap is real, and it is one of the primary reasons AI engineering hires fail to perform at expectations in the first six months.
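The "specific metric" a strong candidate cites is usually something as plain as recall@k over a hand-labeled golden set. A minimal sketch of what that measurement looks like, with hypothetical queries, document IDs, and a canned stand-in for the real retriever:

```python
# Minimal sketch: recall@k over a hand-labeled golden set.
# All queries, doc IDs, and the retriever below are hypothetical stand-ins.

def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of human-judged relevant docs that appear in the top-k results."""
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids) if relevant_ids else 0.0

# Golden set: query -> doc IDs a human judged relevant.
golden = {
    "how do I rotate an API key": ["doc-17", "doc-42"],
    "what is the refund window": ["doc-03"],
}

# Stand-in for a real vector-store query; a production version would call
# Pinecone, Weaviate, pgvector, etc.
def retrieve(query, k=5):
    canned = {
        "how do I rotate an API key": ["doc-42", "doc-99", "doc-17", "doc-08", "doc-51"],
        "what is the refund window": ["doc-12", "doc-03", "doc-77", "doc-20", "doc-31"],
    }
    return canned[query][:k]

scores = [recall_at_k(retrieve(q), rel, k=5) for q, rel in golden.items()]
mean_recall = sum(scores) / len(scores)
print(f"mean recall@5 = {mean_recall:.2f}")  # -> mean recall@5 = 1.00
```

A candidate who has owned a real pipeline can tell you what this number was before and after their changes, and why chunk size alone didn't move it.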
Evaluation is where the gap between AI engineering and AI demos becomes most visible. Demos don’t have eval harnesses. Production systems do, or should, and the ones that don’t tend to find out why during a high-visibility product failure rather than during QA. The work is structured: defining golden test sets that reflect the actual distribution of user queries, not the ones that make the model look good. Choosing the right automated metrics for the use case: RAGAS for retrieval faithfulness, G-Eval or a custom LLM-as-judge setup for instruction following, human preference scoring for open-ended generation. Running the eval pipeline on every prompt change, model swap, and embedding model update. The strong AI engineers I’ve placed in the last two years can describe a specific eval failure they caught before launch. Not a general description of evaluation methodology. A specific thing, in a specific system, where the eval caught a real problem that would have reached customers otherwise.
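The release-gate mechanics are simpler than they sound. A toy sketch of the structure: a golden test set, a scorer, and a gate that blocks a prompt or model change when the score regresses. The "judge" here is a trivial keyword check standing in for a real scorer (RAGAS, an LLM-as-judge setup, or human preference data); the test cases and tolerance are illustrative assumptions:

```python
# Sketch of a release gate over a golden test set. The "judge" is a trivial
# keyword check standing in for a real scorer; cases and thresholds are
# illustrative assumptions, not a real eval suite.

GOLDEN = [
    {"query": "reset password", "must_mention": "settings"},
    {"query": "cancel subscription", "must_mention": "billing"},
]

def judge(answer, case):
    """Score 1.0 if the answer covers the required point, else 0.0."""
    return 1.0 if case["must_mention"] in answer.lower() else 0.0

def run_eval(generate):
    """Run every golden case through a generation function; return mean score."""
    scores = [judge(generate(c["query"]), c) for c in GOLDEN]
    return sum(scores) / len(scores)

def gate(candidate_score, baseline_score, max_regression=0.02):
    """Block the release if the candidate regresses past tolerance."""
    return candidate_score >= baseline_score - max_regression

# Stand-ins for the current prompt/model and a proposed replacement.
baseline = run_eval(lambda q: f"Go to settings, then billing, for: {q}")
candidate = run_eval(lambda q: f"Please contact support about: {q}")

print(f"baseline={baseline:.2f} candidate={candidate:.2f} "
      f"ship={gate(candidate, baseline)}")
```

The same gate runs on every prompt change, model swap, and embedding update. The engineering is in the golden set and the judge, not the plumbing.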
Fine-tuning is the most oversold skill in AI engineering JDs right now. Most companies that think they need it don’t. Prompt engineering, better retrieval, or a different base model solves the problem cheaper and faster, with no maintenance burden from managing a fine-tuned checkpoint through every foundation model upgrade cycle. The companies that actually need fine-tuning have highly domain-specific language (medical, legal, proprietary code), strict latency requirements that rule out larger base models, or production cost pressures severe enough to justify the engineering time. In those cases the skill is genuinely valuable and genuinely rare. The interview question is not “have you fine-tuned a model.” It is “tell me about a situation where you considered fine-tuning and chose not to, and why.” The answer tells you more about engineering judgment than any resume line does.
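The cost side of that judgment call is a back-of-envelope calculation, and a strong candidate will have done one. A toy break-even model for "fine-tune and self-host" versus "pay per API call" — every number below is a made-up assumption for illustration only:

```python
# Toy break-even model: fine-tune-and-self-host vs. pay-per-API-call.
# Every figure below is a made-up assumption for illustration only.

def monthly_api_cost(calls_per_month, cost_per_call):
    return calls_per_month * cost_per_call

def monthly_finetune_cost(hosting_per_month, eng_hours, hourly_rate, amortize_months):
    """Hosting plus one-time engineering cost amortized over a planning horizon."""
    one_time = eng_hours * hourly_rate
    return hosting_per_month + one_time / amortize_months

api = monthly_api_cost(calls_per_month=500_000, cost_per_call=0.01)   # $5,000/mo
ft = monthly_finetune_cost(hosting_per_month=2_500, eng_hours=160,
                           hourly_rate=100, amortize_months=12)       # ~$3,833/mo

print(f"API: ${api:,.0f}/mo  fine-tune: ${ft:,.0f}/mo  cheaper: "
      f"{'fine-tune' if ft < api else 'API'}")
```

Note what the model leaves out: the recurring cost of re-validating the fine-tuned checkpoint through every base-model upgrade cycle. Candidates with real judgment bring that up unprompted.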
Collaboration across the AI-product boundary is the responsibility most JDs gesture at with “strong communication skills” and never test. The AI engineer who can tell a product manager “this feature can work, but not in the way you’ve described it, and here’s the version that actually ships in three weeks instead of three months” is the one who prevents the eighteen-month research project that never reached production. The one who says yes to every product specification and then reappears six months later with a technical explanation for why the demo worked but the production deployment didn’t is the one who generates the frustrated Slack messages that come up in every executive offsite. Test this in the interview by asking about a feature they pushed back on and what happened after.
AI Engineer Salary in 2026
Five sources. They disagree by as much as $80,000 on the same title. The disagreement is the content, not a data quality problem.
| Source | Metric | Base / Range | Notes |
|---|---|---|---|
| BLS, May 2024 | Median, Software Developers (closest BLS code) | $132,270 | BLS does not isolate AI engineer. Software developer is the nearest occupation code and captures the full range of software roles, pulling the median well below AI-specific benchmarks from tech-heavy aggregators. |
| Glassdoor, April 2026 | Average, United States | $158,347 average base; range $120,000–$211,000 | Self-reported. Skews toward larger employers and established tech companies. Captures more mid-level and senior than entry-level roles. Sample size approximately 8,500 reports. |
| ZipRecruiter, April 2026 | National average | $147,631 | Job posting data. Reflects what employers are advertising, which trends below self-reported figures because it includes the bottom of the market. Treat as a floor for roles at established tech companies. |
| Built In, 2026 | Average, U.S. tech companies | $165,000–$200,000 average base | Funded tech companies actively hiring through Built In. Skews toward Series B and beyond, where base ranges run higher than the broader labor market. Treats “AI engineer” more specifically than BLS or ZipRecruiter. |
| Levels.fyi, 2026 | Median total comp, ML/AI engineers at tech companies | $211,000 median base; $290,000+ total comp | Late-stage and big tech, self-reported. At Google, Meta, or Anthropic, senior AI engineer total comp routinely clears $400,000. At frontier AI labs, the numbers exist in their own category entirely. |
The gap between ZipRecruiter and Built In, roughly $17,000 against the bottom of Built In's range and over $50,000 against the top, is not a data problem. ZipRecruiter casts the widest net and captures job postings from organizations that added "AI" to a software engineering title in 2024 because it sounded good. Built In captures funded tech companies actively trying to hire engineers with production AI experience. They describe different slices of the market. Your comp band belongs in one of those slices, and picking the wrong one means you either can't close the candidate you want or you've overpaid by $30,000 for someone who doesn't have the production experience the role requires.
For a city-by-city and specialization breakdown, the AI engineer salary guide covers 2026 base and total comp across experience levels, cities, and AI specializations. The gap between a Bay Area LLM engineer and a mid-market MLOps engineer in Columbus, Ohio is over $100,000 in base compensation. Understanding it before the intake call saves a lot of time.

What Great Recruiters Do When a JD Is Not Working
Jenny, a senior technical recruiter, described her process like this: “I take all my clients’ JDs, the intake calls, and completely gut and rewrite JDs using AI so I get those little Christmas presents in my inbox that apply.”
The “Christmas presents” are the qualified applicants who show up when the JD actually describes the job. Not a hundred applications from candidates who don’t fit. Fifteen who genuinely do. The difference between a posting that generates forty unqualified applicants and one that generates twelve qualified ones is not the platform you post on. It’s what you wrote.
The pattern that Jenny and every other high-performing technical recruiter I know follows: start with the intake call, not the old JD. What did the hiring manager say out loud about who they need that never made it into the written posting? Usually it’s something like “we need someone who has actually shipped a RAG system to real users, not just built a demo” or “the last person we interviewed had all the right keywords and couldn’t explain why their chunking strategy affected retrieval recall.” That specificity goes in the JD.
What comes out: cargo-culted bullet points that have nothing to do with the role. “Experience with Agile methodologies.” “Strong communication skills.” The generic line about “passion for AI.” These are the bullets that signal to a strong candidate that the JD wasn’t written by anyone who does this work, and they close the tab. That’s the gap. That’s why the qualified applications don’t come.
Common AI Engineer JD Mistakes
Specific patterns from the postings we see week after week that extend searches from six weeks to twelve.
Listing every AI framework that anyone on the team has ever touched. LangChain, LlamaIndex, PyTorch, TensorFlow, Hugging Face, SageMaker, Vertex AI, Azure ML, Ray, Kubernetes, Docker, Spark. “Experience with all of the above.” The candidate who has deep production experience with every item on that list is either at a frontier AI lab already or is not currently reading your job posting for a role paying under $250,000. Pick five. List the others as preferred. You will see a different applicant pool within a week.
Writing “strong ML fundamentals” without specifying which ones. Classical supervised learning and gradient boosting? LLM evaluation methodology? Distributed training theory? These are different. The candidate who is excellent at the first has not necessarily touched the second. “Working knowledge of transformer architecture, attention mechanisms, and the practical tradeoffs between fine-tuning and retrieval” is a sentence a qualified candidate reads and thinks “yes, that’s me.” “Strong ML fundamentals” is a sentence they skip.
Asking for experience with tools that are six months old. Some AI frameworks have been in production for three years and have stable hiring markets. Some were announced at a conference last fall and have a user base of a few hundred engineers globally. When a JD lists them at the same level of requirement, it tells candidates that the team chases trends rather than ships products. The tools that belong in a JD are the ones that will be in production eighteen months from now, not the ones generating the most LinkedIn posts this quarter.
Underspecifying the production context. “Experience deploying ML models” is one of the least useful sentences in any AI engineer JD. Deploying to a Streamlit demo is deploying. Deploying to an API endpoint that serves 50,000 requests per day with a p99 latency requirement and an uptime SLA is a different job. How many users? What latency? What does the serving infrastructure look like? A good AI engineer reads the JD and figures out whether this is a startup doing fifty requests a day or an enterprise platform doing five million. If they can’t tell, something is missing.
Mis-titling adjacent roles. AI engineer, ML engineer, data scientist, AI research scientist, and AI software engineer are not synonyms. The candidate pools overlap by roughly 30 to 40 percent. Posting the wrong title pulls applicants who aren’t quite right and misses engineers who are right but aren’t searching that keyword. If the role is primarily building LLM applications on top of foundation models using existing tools, it is probably closer to “AI software engineer” than “ML engineer.” Title it accordingly. Our AI/ML engineer staffing practice can help sort out which profile you’re actually hiring for at intake.
Questions Worth Asking About AI Engineer Searches
How Quickly Can You Fill an AI Engineer Role?
KORE1’s average time-to-hire for AI/ML engineers is 17 days for contract roles and 3 to 5 weeks for direct hire. Generative AI specialists and AI infrastructure engineers take longer. That number degrades quickly if the JD is misspecified or the comp band is off-market, which is why the intake call matters more for this role than for most.
Contract or Direct Hire for an AI Engineer?
Project scope is a cleaner signal than job type. Contract works well for specific project-scoped work, like building a RAG pipeline for a defined use case or standing up an evaluation framework before a product launch. Direct hire is the right structure when you need the person to own the AI roadmap long-term, grow the function, and be the institutional memory for production systems that will run for three to five years. A lot of AI engineering work in 2026 is still project-scoped, which makes contract and contract-to-hire common arrangements.
How Do You Screen AI Engineers Without an Internal ML Team?
Ask candidates to walk through a specific AI system they built, describe what went wrong in production that didn’t go wrong in development, and explain a time they recommended against using AI for something. Generic descriptions of RAG or fine-tuning without a specific production failure story are the most common signal that the experience is theoretical. If you don’t have someone internal to run a technical screen, our team can help structure the first-round assessment at intake.
What’s the Realistic Comp for a Mid-Level AI Engineer Outside Major Tech Hubs?
Outside San Francisco, Seattle, and New York, strong mid-level AI engineers close between $145,000 and $175,000 in base. In those three metros add $30,000 to $50,000. Remote-first roles still face the same market rates because candidates working remotely are seeing the same range of offers. The days of hiring a San Francisco-caliber AI engineer at Denver rates on a remote contract are largely over. If the comp band is materially below market, the JD can be perfect and the search still won’t close.
Do You Need a PhD to Hire a Strong AI Engineer?
For most production AI engineering roles, no. A PhD matters for AI research positions, novel architecture work, and roles at frontier labs where the job is advancing the state of the art. For building LLM applications, deploying models to production, and operating AI systems at scale, industry experience shipping real systems is a better signal than an academic credential. Filtering for PhD on a general AI engineering JD eliminates a large portion of the strongest candidates and is one of the more reliable ways to extend a search by two months.
If you’re ready to start the search, reach out to our team and we’ll work through the profile at intake before a word of the JD gets written.
