AWS re:Invent 2025: New AgentCore Features for Production AI Agents
At AWS re:Invent 2025, Amazon announced significant enhancements to Amazon Bedrock AgentCore, addressing the key challenges enterprises face when moving AI agents from prototype to production. The three major new capabilities—Policy, Evaluations, and Memory—tackle security, observability, and personalization respectively.
This post breaks down each new feature and what it means for teams building production AI agents.
TL;DR
Three New AgentCore Capabilities:
| Feature | Purpose | Status |
|---|---|---|
| Policy | Define agent boundaries using natural language | Preview |
| Evaluations | 13 pre-built evaluators for agent performance | Available |
| Memory | Episodic memory for personalized responses | Available |
Key Takeaways:
- Policy enables governance without code—define what agents can/cannot do in plain English
- Evaluations provides out-of-the-box monitoring for correctness, safety, and quality
- Memory allows agents to learn from past interactions and personalize responses over time
- These features address the #1 blocker for enterprise AI adoption: trust and control
Policy in AgentCore (Preview)
The Problem
Moving AI agents to production requires clear operational boundaries. Without guardrails, agents might:
- Access sensitive data they shouldn’t see
- Take actions outside their intended scope
- Interact with systems without proper authorization
Traditionally, implementing these controls required custom code, complex IAM policies, and manual review processes.
The Solution
Policy in AgentCore lets developers define operational boundaries using natural language. Instead of writing code or complex policy documents, you describe what the agent should and shouldn’t do:
This agent should:
- Only access customer data for the logged-in user
- Never modify financial records directly
- Require human approval for refunds over $500
- Not access internal HR systems
How It Works
- Natural Language Definition: Write policies in plain English describing agent constraints
- Gateway Integration: Policies integrate with AgentCore Gateway to intercept every agent action
- Automatic Enforcement: The system automatically checks each action against defined policies
- Violation Handling: Actions that violate policies are halted, logged, and can trigger human review (see the sketch below)
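To see what enforcement might look like from the calling application, here is a minimal sketch. The invoke_agent operation name and the PolicyViolationException error code are assumptions for illustration, not confirmed AgentCore API surface:

import boto3
from botocore.exceptions import ClientError

agentcore = boto3.client('bedrock-agentcore')

try:
    # Ask the agent to do something the example policy forbids
    # (refunds over $500 require human approval).
    response = agentcore.invoke_agent(          # hypothetical operation
        agentId='your-agent-id',
        sessionId='session-123',
        inputText='Refund $750 to this customer'
    )
except ClientError as err:
    if err.response['Error']['Code'] == 'PolicyViolationException':  # assumed code
        # The action is halted and logged; route it to a human reviewer.
        print('Blocked by policy; escalating for human approval')
    else:
        raise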
Example Use Cases
Customer Service Agent:
Restrictions:
- Can only view order history for the authenticated customer
- Cannot issue refunds exceeding $100 without manager approval
- Cannot access payment card details (only last 4 digits)
- Must not share customer data with external systems
Internal IT Agent:
Restrictions:
- Can only access systems the requesting employee has permissions for
- Cannot modify production databases directly
- Requires ticket approval for any infrastructure changes
- Cannot disable security monitoring or alerting
Why This Matters
Policy addresses the governance gap that prevents many enterprises from deploying AI agents. Security and compliance teams can now:
- Review agent boundaries in plain English (no code review needed)
- Set organization-wide policies that apply across all agents
- Audit policy violations with clear logs
- Iterate on policies without redeploying agent code
AgentCore Evaluations
The Problem
Once agents are in production, how do you know they’re working correctly? Traditional monitoring tracks latency and errors, but doesn’t capture AI-specific issues:
- Is the agent selecting the right tools for each task?
- Are responses accurate and safe?
- Is the agent following instructions correctly?
- How does performance change over time?
The Solution
AgentCore Evaluations provides 13 pre-built evaluators that continuously monitor agent behavior across multiple dimensions.
The 13 Evaluators
AgentCore Evaluations covers three main categories:
Correctness Evaluators:
- Task Completion: Did the agent successfully complete the user’s request?
- Response Accuracy: Is the information provided factually correct?
- Instruction Following: Does the agent adhere to its system instructions?
- Tool Selection Accuracy: Did the agent choose the appropriate tools?
- Tool Usage Correctness: Were tools called with correct parameters?
Safety Evaluators:
- Harmful Content Detection: Does the response contain inappropriate content?
- PII Exposure: Is personally identifiable information being leaked?
- Prompt Injection Detection: Is the agent being manipulated by adversarial inputs?
- Off-Topic Detection: Is the agent staying within its intended domain?
Quality Evaluators:
- Response Relevance: Does the response address what the user asked?
- Response Coherence: Is the response well-structured and clear?
- Groundedness: Are claims supported by the agent’s knowledge sources?
- Tone Appropriateness: Does the response match the expected communication style?
How It Works
- Continuous Sampling: Evaluations samples live interactions automatically
- Parallel Assessment: Multiple evaluators run simultaneously on each sample
- Threshold Alerts: Define acceptable ranges; get alerted when performance drops
- Dashboard Visibility: View trends and drill into specific failed evaluations
- Historical Analysis: Track how agent performance changes over time
Example Configuration
evaluations:
  sampling_rate: 0.1        # Evaluate 10% of interactions
  alerts:
    task_completion:
      threshold: 0.85
      alert_channel: pagerduty
    pii_exposure:
      threshold: 0.0        # Zero tolerance
      alert_channel: security-team
    tool_selection_accuracy:
      threshold: 0.90
      alert_channel: slack-engineering
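Two design choices in this configuration are worth noting: a 10% sampling rate keeps evaluation cost bounded while still catching trend-level regressions, and the zero-tolerance PII threshold routes every single violation straight to the security team rather than waiting for an aggregate metric to drift.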
Why This Matters
Evaluations transforms AI agent monitoring from “hope it works” to “know it works.” Teams can:
- Catch regressions before users complain
- Quantify agent quality for stakeholders
- Identify specific failure modes to fix
- Build confidence for expanding agent capabilities
AgentCore Memory
The Problem
Stateless agents treat every interaction as if it’s the first. This creates frustrating user experiences:
- Repeating preferences every session
- No continuity in ongoing tasks
- Unable to learn from past interactions
- Generic responses that don’t adapt to users
The Solution
AgentCore Memory introduces episodic memory functionality, enabling agents to retain and utilize information from past interactions.
Memory Types
AgentCore Memory supports different memory patterns:
User Preferences:
Memory: User prefers window seats on flights
Memory: User is vegetarian
Memory: User's timezone is Pacific
Memory: User prefers formal communication style
Interaction History:
Memory: Last discussed project Alpha timeline on Dec 5
Memory: User reported bug #1234 three times this month
Memory: User's team uses Slack for notifications
Learned Behaviors:
Memory: User typically needs Python code examples
Memory: User prefers detailed explanations over summaries
Memory: User asks follow-up questions about security implications
How It Works
- Interaction Logging: Agent interactions are automatically logged
- Memory Extraction: System identifies memorable information from conversations
- Memory Storage: Information stored with user context and timestamps
- Memory Retrieval: Relevant memories surfaced during future interactions (see the sketch after this list)
- Memory Decay: Old/unused memories can be deprioritized or removed
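The snippet below sketches the storage and retrieval steps from application code. The put_memory and retrieve_memories operation names and their parameters are hypothetical, used only to make the flow concrete:

import boto3

agentcore = boto3.client('bedrock-agentcore')

# Steps 2-3: persist a memory extracted from the conversation,
# scoped to a user and timestamped by the service (hypothetical API).
agentcore.put_memory(
    agentId='your-agent-id',
    userId='user-42',
    memoryType='user_preferences',
    content='User prefers window seats on flights'
)

# Step 4: surface memories relevant to the next request so they can be
# injected into the agent's context (hypothetical API).
memories = agentcore.retrieve_memories(
    agentId='your-agent-id',
    userId='user-42',
    query='book a flight to Chicago',
    maxResults=5
)
for memory in memories.get('memories', []):
    print(memory['content'])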
Example Flow
Session 1:
User: Book me a flight to NYC next Tuesday
Agent: I found several options. Do you have a seat preference?
User: Always window seat, and I'm vegetarian for meals.
Agent: Got it! [Books flight with window seat, vegetarian meal]
[Stores: user prefers window seats, user is vegetarian]
Session 2 (weeks later):
User: I need a flight to Chicago on Friday
Agent: I found 3 options with window seats available.
I'll also note your vegetarian meal preference.
Which departure time works best?
Privacy and Control
AgentCore Memory includes privacy controls:
- User Consent: Memory collection can require explicit opt-in
- Memory Visibility: Users can view what the agent remembers about them
- Deletion Rights: Users can request memory deletion (see the sketch after this list)
- Retention Policies: Automatic memory expiration based on time or usage
- Scope Limits: Restrict what types of information can be memorized
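In practice, the visibility and deletion controls could surface through calls like the following sketch; list_memories and delete_memory are illustrative names, not a confirmed interface:

import boto3

agentcore = boto3.client('bedrock-agentcore')

# Memory visibility: show the user what the agent remembers about them
visible = agentcore.list_memories(          # hypothetical operation
    agentId='your-agent-id',
    userId='user-42'
)
for memory in visible.get('memories', []):
    print(memory['memoryId'], memory['content'])

# Deletion rights: honor a request to forget a specific memory
agentcore.delete_memory(                    # hypothetical operation
    agentId='your-agent-id',
    userId='user-42',
    memoryId='mem-abc123'
)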
Why This Matters
Memory transforms agents from tools into assistants that genuinely know you. This enables:
- Significantly improved user satisfaction
- Reduced friction in repetitive tasks
- Natural conversation continuity
- Personalized recommendations and responses
How These Features Work Together
The real power comes from combining Policy, Evaluations, and Memory, as the following scenario and the sketch after it show:
Scenario: Enterprise Customer Service Agent
Policy defines boundaries:
- Only access customer's own order history
- Cannot process refunds over $200 without approval
- Must not store payment information in memory
Memory enables personalization:
- Remembers customer's preferred contact method
- Tracks ongoing support cases
- Notes communication preferences
Evaluations ensures quality:
- Monitors resolution rate
- Tracks policy violations
- Measures customer satisfaction signals
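Wiring the three together for this scenario might look like the sketch below, which reuses the illustrative API shapes from the Quick Start section later in this post:

import boto3

agentcore = boto3.client('bedrock-agentcore')
AGENT_ID = 'customer-service-agent'

# Policy: boundaries that security can review in plain English
agentcore.create_policy(
    agentId=AGENT_ID,
    policyName='cs-boundaries',
    policyDefinition="""
    - Only access the authenticated customer's own order history
    - Cannot process refunds over $200 without approval
    - Must not store payment information in memory
    """
)

# Memory: personalization, gated behind user consent
agentcore.configure_memory(
    agentId=AGENT_ID,
    memoryConfig={
        'enabled': True,
        'memoryTypes': ['user_preferences', 'interaction_history'],
        'requireConsent': True
    }
)

# Evaluations: continuous monitoring of quality and instruction adherence
agentcore.configure_evaluations(
    agentId=AGENT_ID,
    evaluators=['task_completion', 'instruction_following'],
    samplingRate=0.1
)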
Architecture Overview
┌─────────────────────────────────────────────────────────┐
│                      User Request                       │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                    AgentCore Gateway                    │
│    ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │
│    │   Policy    │  │   Memory    │  │ Evaluations │    │
│    │ Enforcement │  │  Retrieval  │  │  Sampling   │    │
│    └─────────────┘  └─────────────┘  └─────────────┘    │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                    AgentCore Runtime                    │
│           (Your agent logic + tool execution)           │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                     Tool Execution                      │
│    (APIs, databases, external services via Gateway)     │
└─────────────────────────────────────────────────────────┘
Early Adopter Results
Several organizations shared their AgentCore experiences at re:Invent:
Cox Automotive:
- Scaled from experimentation to production in one month
- Leveraged runtime, identity, gateway, and memory components
- Achieved significant reduction in development complexity
PGA TOUR:
- Used AgentCore for content generation
- Reported major improvements in content generation speed
MongoDB:
- Streamlined AI initiative deployment
- Reduced operational complexity with managed infrastructure
Getting Started
Prerequisites
- AWS Account with Bedrock access
- AgentCore enabled in your region
- Existing agent or new agent project
Quick Start: Adding Policy
import boto3

# Note: the operation names and parameters in these Quick Start examples
# mirror the announcement's description; confirm the exact API surface in
# the current AgentCore SDK documentation.
agentcore = boto3.client('bedrock-agentcore')

# Define policy in natural language
policy = """
This agent assists with customer inquiries.
Restrictions:
- Only access data for the authenticated customer
- Cannot modify account settings
- Cannot process payments or refunds
- Must escalate billing disputes to human agents
"""

response = agentcore.create_policy(
    agentId='your-agent-id',
    policyName='customer-service-boundaries',
    policyDefinition=policy
)
Quick Start: Enabling Evaluations
# Enable evaluations for your agent
response = agentcore.configure_evaluations(
    agentId='your-agent-id',
    evaluators=[
        'task_completion',
        'response_accuracy',
        'pii_exposure',
        'tool_selection_accuracy'
    ],
    samplingRate=0.1,
    alertConfig={
        'snsTopicArn': 'arn:aws:sns:...',
        'thresholds': {
            'task_completion': 0.85,
            'pii_exposure': 0.0
        }
    }
)
Quick Start: Enabling Memory
# Enable memory for your agent
response = agentcore.configure_memory(
    agentId='your-agent-id',
    memoryConfig={
        'enabled': True,
        'memoryTypes': ['user_preferences', 'interaction_history'],
        'retentionDays': 90,
        'requireConsent': True
    }
)
What’s Next for AgentCore
Based on the re:Invent announcements, AWS is positioning AgentCore as the complete platform for enterprise AI agents. Expect continued investment in:
- Multi-agent orchestration: Coordinating multiple specialized agents
- Advanced policy templates: Industry-specific compliance frameworks
- Enhanced observability: Deeper integration with CloudWatch and X-Ray
- Cross-account agent sharing: Deploying agents across organizational boundaries
Conclusion
The re:Invent 2025 AgentCore announcements address the practical challenges of deploying AI agents in production:
- Policy gives enterprises the governance controls they need
- Evaluations provides visibility into agent behavior at scale
- Memory enables the personalization users expect
Together, these features significantly reduce the gap between “cool demo” and “production-ready system.” If you’ve been waiting for AI agent infrastructure to mature before committing to production deployments, that time may have arrived.