Teradata Architecture Overview
Teradata is a shared-nothing, massively parallel processing (MPP) database designed to scale linearly. Every component participates in parallel query execution, which is why Teradata performs especially well for large analytical workloads.
At a high level:
Client → Parsing Engine (PE) → BYNET → AMPs → Virtual Disk (vDisk) → Physical Storage
All of this runs on one or more nodes.
1. Nodes
A node is a physical or virtual machine in the Teradata system.
Each node contains:
- CPU(s)
- Memory
- Storage
- Teradata software components (PEs, AMPs)
- BYNET network interface
Key points:
- A Teradata system can have many nodes
- Each node operates independently (shared-nothing)
- Nodes can host both PEs and AMPs
- Adding nodes increases CPU, memory, and I/O in parallel
Scalability principle: More nodes = more parallelism
2. Parsing Engines (PEs)
The Parsing Engine is responsible for SQL understanding and coordination, not data access.
PE responsibilities
- Accept SQL from clients (JDBC/ODBC/CLI)
- Parse SQL syntax
- Validate objects and privileges
- Optimize the query (cost-based optimizer)
- Generate a query execution plan
- Dispatch steps to AMPs
- Manage sessions and locks
- Collect and return results
Characteristics
- PEs do not store data
- Multiple PEs exist for concurrency
- A single session is always connected to one PE
- PEs coordinate all AMPs involved in a query
When users say "Teradata is slow at parsing", they are almost always describing a PE bottleneck, not an AMP bottleneck.
3. Access Module Processors (AMPs)
AMPs do the actual data work. They are the heart of Teradata’s parallelism.
AMP responsibilities
- Store table rows
- Retrieve rows
- Insert, update, delete data
- Perform aggregations (SUM, COUNT, GROUP BY)
- Perform joins locally
- Sort data
- Apply filters (WHERE, HAVING)
Key principles
- Each AMP owns a portion of the data
- Data is distributed by Primary Index (PI) hash
- AMPs work independently and in parallel
- No shared disk between AMPs
Why AMPs matter
- Good PI choice = even AMP workload
- Skewed PI = hot AMPs = bad performance
- Teradata tuning usually means AMP-level tuning
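To make the link between PI choice and AMP workload concrete, here is a minimal Python sketch. It is a toy model, not Teradata code: the AMP count, the `amp_for` helper, and the use of SHA-256 as the hash are all assumptions made for illustration (Teradata uses its own hashing algorithm and hash maps).

```python
import hashlib
from collections import Counter

NUM_AMPS = 8  # hypothetical system size, chosen for illustration


def amp_for(pi_value, num_amps=NUM_AMPS):
    """Map a primary-index value to an AMP number.

    SHA-256 is a stand-in here; it is NOT Teradata's real hashing algorithm.
    """
    digest = hashlib.sha256(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(pi_values, num_amps=NUM_AMPS):
    """Count how many rows land on each AMP for the given PI values."""
    counts = Counter(amp_for(value, num_amps) for value in pi_values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


def skew_factor(per_amp):
    """Busiest AMP divided by the average AMP (1.0 means perfectly even)."""
    average = sum(per_amp) / len(per_amp)
    return max(per_amp) / average


# A near-unique PI (order_id) versus a low-cardinality PI (region).
regions = ["WEST", "EAST", "NORTH", "SOUTH"]
by_order_id = rows_per_amp(range(10_000))
by_region = rows_per_amp(regions[i % 4] for i in range(10_000))

print("order_id PI:", by_order_id, "skew", round(skew_factor(by_order_id), 2))
print("region PI:  ", by_region, "skew", round(skew_factor(by_region), 2))
```

With only four distinct region values, at most four of the eight toy AMPs can ever receive rows, so the busiest AMP carries at least twice the average load. A near-unique PI should spread rows far more evenly.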
4. BYNET (Messaging Fabric)
The BYNET is Teradata’s high-speed interconnect that connects all components.
What BYNET does
- Sends messages between PEs and AMPs
- Redistributes rows between AMPs
- Supports broadcast and point-to-point communication
- Ensures fault tolerance (dual BYNETs)
Think of BYNET as:
- A private, ultra-fast network optimized for database messaging
- Separate from the Ethernet TCP/IP network that carries client traffic
- Critical for joins requiring data redistribution
Example
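If two tables are joined on a column that is not the primary index of both tables:
- Rows from one or both tables must be redistributed across AMPs via the BYNET before the join can run locally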
Below is an overview of the Teradata Database architecture, focused on its core building blocks: Parsing Engines (PEs), AMPs, the BYNET, nodes, and vdisks. It also covers how they work together to deliver massively parallel processing (MPP).
Teradata Architecture — Expert Explanation
Teradata is built on a shared-nothing, massively parallel processing (MPP) architecture. Each component has a well-defined role, and performance comes from doing many things in parallel with minimal contention.
1. High-Level Architecture View
At the highest level, a Teradata system consists of:
 Client Applications
          |
          v
+-------------------+
|  Parsing Engines  |
|       (PEs)       |
+-------------------+
          |
          v
=====================  <-- BYNET (Interconnect)
          |
          v
+-------------------+
|   Access Module   |
| Processors (AMPs) |
+-------------------+
          |
          v
+-------------------+
|   Virtual Disks   |
|     (vdisks)      |
+-------------------+
These components are deployed across Nodes, which are physical or virtual servers.
2. Nodes
What is a Node?
A node is a physical or virtual server in the Teradata system.
Each node typically contains:
- CPUs
- Memory
- One or more PEs
- Multiple AMPs
- Disk storage (managed as vdisks)
- One or more BYNET connections
Key points:
- Nodes are independent (shared-nothing)
- Scaling is achieved by adding nodes
- Node failure does not stop the system (fault tolerance)
👉 MPP principle: More nodes = more parallelism.
3. Parsing Engines (PEs)
Role of the PE
The Parsing Engine is responsible for query orchestration, not data access.
Main responsibilities:
- Session control
  - Logon authentication
  - Managing user sessions
- SQL parsing
  - Syntax and semantic checks
  - Object resolution
- Query optimization
  - Cost-based optimization
  - Generates the execution plan
- Dispatcher
  - Breaks the plan into steps
  - Sends those steps to AMPs
- Result set assembly
  - Collects AMP results
  - Returns final output to client
Important Characteristics
- PEs do not store data
- Multiple PEs handle multiple sessions concurrently
- Each request is handled by a single PE, and a system has many PEs to spread the session load
4. BYNET (Teradata Interconnect)
What is the BYNET?
The BYNET is Teradata’s high-speed, fault-tolerant messaging fabric.
Think of it as:
The “nervous system” of Teradata
Key Functions
- Routes all messages between:
  - PEs ↔ AMPs
  - AMP ↔ AMP
  - Node ↔ Node
- Ensures guaranteed message delivery
- Automatically reroutes traffic on failure
Performance Features
- Very low latency
- Broadcast and point-to-point communication
- Parallel message delivery
Without the BYNET, Teradata’s parallelism would collapse.
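To see why AMP-to-AMP traffic matters, here is a rough Python sketch of row redistribution before a join. Everything in it is hypothetical: the `customers` and `orders` tables, the four-AMP system, and the SHA-256 stand-in hash. It only models the idea that rows hashed on one column must be re-hashed on the join column, and that most of them end up crossing the interconnect.

```python
import hashlib

NUM_AMPS = 4  # hypothetical AMP count


def amp_for(value, num_amps=NUM_AMPS):
    """Stand-in hash (SHA-256); not Teradata's actual hashing algorithm."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def distribute(rows, key, num_amps=NUM_AMPS):
    """Place each row (a dict) on the AMP chosen by hashing row[key]."""
    amps = [[] for _ in range(num_amps)]
    for row in rows:
        amps[amp_for(row[key], num_amps)].append(row)
    return amps


def redistribute(amps, key):
    """Re-hash rows on a new key and count rows that move to another AMP."""
    new_amps = [[] for _ in amps]
    moved = 0
    for source_amp, rows in enumerate(amps):
        for row in rows:
            target_amp = amp_for(row[key], len(amps))
            if target_amp != source_amp:
                moved += 1  # this row would travel over the interconnect
            new_amps[target_amp].append(row)
    return new_amps, moved


def local_join(left_amps, right_amps, key):
    """Join AMP by AMP; only valid when both sides are hashed on `key`."""
    result = []
    for left_rows, right_rows in zip(left_amps, right_amps):
        for left_row in left_rows:
            for right_row in right_rows:
                if left_row[key] == right_row[key]:
                    result.append({**left_row, **right_row})
    return result


customers = [{"customer_id": c, "name": f"cust{c}"} for c in range(20)]
orders = [{"order_id": o, "customer_id": o % 20, "amount": 10 * o} for o in range(100)]

customer_amps = distribute(customers, "customer_id")  # PI = customer_id
order_amps = distribute(orders, "order_id")           # PI = order_id

# orders is not hashed on the join column, so re-hash it on customer_id first.
order_amps_by_customer, moved = redistribute(order_amps, "customer_id")
joined = local_join(customer_amps, order_amps_by_customer, "customer_id")
print(f"{moved} of {len(orders)} order rows changed AMP; {len(joined)} joined rows")
```

Once both tables are hashed on `customer_id`, matching rows sit on the same toy AMP and each AMP can join its own slice without talking to the others.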
5. Access Module Processors (AMPs)
What is an AMP?
The AMP is the workhorse of Teradata.
Each AMP:
- Owns a portion of the data
- Manages that data exclusively
- Performs all row-level data operations
Responsibilities of an AMP
- Store table rows
- Retrieve rows
- Insert, update, delete data
- Perform:
  - Joins
  - Aggregations
  - Sorts
  - Index access
💡 No two AMPs share the same data.
6. Data Distribution & Primary Index
Hash-Based Distribution
Data is distributed using a hash of the Primary Index (PI):
Row PI → HASH → AMP Number → Stored on that AMP
Benefits:
- Even data distribution
- Parallel access
- Minimal data skew (if PI is chosen well)
Why this matters
When you query by PI:
- Teradata knows exactly which AMP has the row
- Only that AMP is accessed → very fast
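The difference between a PI lookup and a non-PI filter can be sketched in a few lines of Python. This is a simplified model under stated assumptions: `ToySystem`, its methods, and the SHA-256 stand-in hash are invented for illustration, and the PI is treated as unique.

```python
import hashlib

NUM_AMPS = 8  # hypothetical AMP count


def amp_for(value, num_amps=NUM_AMPS):
    """Stand-in hash (SHA-256); not Teradata's actual hashing algorithm."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


class ToySystem:
    """Toy model: each AMP is a dict keyed by a unique primary-index value."""

    def __init__(self, num_amps=NUM_AMPS):
        self.amps = [{} for _ in range(num_amps)]

    def insert(self, pi_value, row):
        self.amps[amp_for(pi_value, len(self.amps))][pi_value] = row

    def select_by_pi(self, pi_value):
        """Hash the PI value, go straight to the owning AMP: one AMP touched."""
        amp = amp_for(pi_value, len(self.amps))
        return self.amps[amp].get(pi_value), 1

    def select_by_filter(self, predicate):
        """No PI value to hash, so every AMP must scan its own rows."""
        matches = []
        for rows in self.amps:
            matches.extend(row for row in rows.values() if predicate(row))
        return matches, len(self.amps)


system = ToySystem()
for customer_id in range(1000):
    region = "WEST" if customer_id % 4 == 0 else "EAST"
    system.insert(customer_id, {"customer_id": customer_id, "region": region})

row, amps_touched = system.select_by_pi(42)
print("PI lookup:", row, "AMPs touched:", amps_touched)

west_rows, amps_touched = system.select_by_filter(lambda r: r["region"] == "WEST")
print("Filter on region:", len(west_rows), "rows, AMPs touched:", amps_touched)
```

The all-AMP scan is still parallel, so it is not necessarily slow, but it costs work on every AMP, whereas the PI lookup costs work on exactly one.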
7. Virtual Disks (vdisks)
What are vdisks?
vdisks are Teradata’s logical abstraction of physical disks.
Key ideas:
- Physical disks are pooled
- Storage is virtualized
- AMPs see vdisks, not raw disks
Advantages
- Better load balancing across disks
- Simplified management
- Improved I/O parallelism
- Supports hot spare disks and recovery
Each AMP owns a set of vdisks where its data resides.
8. How a Query Executes (End-to-End Flow)
Let’s tie it all together.
Example Query
```sql
SELECT customer_id, SUM(amount)
FROM sales
WHERE region = 'WEST'
GROUP BY customer_id;
```
Execution Flow
- Client → PE
  - SQL submitted
- PE
  - Parses SQL
  - Optimizes query
  - Creates execution steps
- PE → AMPs (via BYNET)
  - Sends steps to all relevant AMPs
- AMPs
  - Scan local rows
  - Perform aggregation in parallel
- AMP ↔ AMP (via BYNET)
  - Redistribute intermediate results if needed
- AMP → PE
  - Return partial/final results
- PE → Client
  - Assemble and return final result set
The AMP-side steps (scan, filter, aggregate, redistribute) run in parallel across all participating AMPs.
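As a rough illustration of this flow, the Python sketch below mimics the example query over a handful of simulated AMPs. It is a toy under stated assumptions, not how Teradata is implemented: the `sales` rows are made up, `sale_id` is assumed to be the primary index, SHA-256 stands in for the real hash, and a thread pool stands in for AMPs working concurrently.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 4  # hypothetical AMP count


def amp_for(value, num_amps=NUM_AMPS):
    """Stand-in hash (SHA-256); not Teradata's actual hashing algorithm."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def local_step(rows):
    """Runs on each AMP: apply WHERE region = 'WEST', then a partial SUM."""
    partial = defaultdict(int)
    for row in rows:
        if row["region"] == "WEST":
            partial[row["customer_id"]] += row["amount"]
    return dict(partial)


def redistribute_partials(partials, num_amps=NUM_AMPS):
    """Send each partial sum to the AMP that owns its customer_id hash."""
    per_amp = [defaultdict(int) for _ in range(num_amps)]
    for partial in partials:
        for customer_id, subtotal in partial.items():
            per_amp[amp_for(customer_id, num_amps)][customer_id] += subtotal
    return per_amp


def run_query(sales, num_amps=NUM_AMPS):
    # Rows live on AMPs according to the (assumed) primary index, sale_id.
    amps = [[] for _ in range(num_amps)]
    for row in sales:
        amps[amp_for(row["sale_id"], num_amps)].append(row)
    # Each AMP scans and aggregates only its own rows, concurrently.
    with ThreadPoolExecutor(max_workers=num_amps) as pool:
        partials = list(pool.map(local_step, amps))
    # Final groups are disjoint across AMPs, so the PE side just merges them.
    result = {}
    for groups in redistribute_partials(partials, num_amps):
        result.update(groups)
    return result


sales = [
    {"sale_id": i, "customer_id": i % 5,
     "region": "WEST" if i % 2 == 0 else "EAST", "amount": i}
    for i in range(40)
]
print(sorted(run_query(sales).items()))
```

The two-phase shape (local partial sums, then a redistribution by `customer_id`) is the point of the sketch: no single worker ever has to see all the rows.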
9. Fault Tolerance & Reliability
Teradata is built for continuous availability:
- Fallback: Second copy of data on a different AMP
- AMP vproc migration: an AMP vproc can move to another node in the same clique when its node fails
- BYNET redundancy: Dual networks
- RAID / vdisk recovery
Result: high availability even during hardware failures.
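The Fallback idea can be sketched with a small Python model. The placement rule here (fallback copy on the next AMP number) is an assumption made to keep the example short; Teradata places fallback copies on a different AMP within a cluster of AMPs, and the hash shown is a SHA-256 stand-in rather than the real algorithm.

```python
import hashlib

NUM_AMPS = 4  # hypothetical AMP count


def primary_amp(row_id, num_amps=NUM_AMPS):
    """Stand-in hash (SHA-256); not Teradata's actual hashing algorithm."""
    digest = hashlib.sha256(str(row_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def fallback_amp(row_id, num_amps=NUM_AMPS):
    """Toy rule (an assumption): keep the second copy on the next AMP."""
    return (primary_amp(row_id, num_amps) + 1) % num_amps


def load(row_ids, num_amps=NUM_AMPS):
    """Store every row twice: a primary copy and a fallback copy."""
    primary = [set() for _ in range(num_amps)]
    fallback = [set() for _ in range(num_amps)]
    for row_id in row_ids:
        primary[primary_amp(row_id, num_amps)].add(row_id)
        fallback[fallback_amp(row_id, num_amps)].add(row_id)
    return primary, fallback


def readable_rows(primary, fallback, failed_amps):
    """Rows still reachable when the AMPs in `failed_amps` are down."""
    rows = set()
    for amp in range(len(primary)):
        if amp not in failed_amps:
            rows |= primary[amp] | fallback[amp]
    return rows


row_ids = set(range(1000))
primary, fallback = load(row_ids)
for failed in range(NUM_AMPS):
    survivors = readable_rows(primary, fallback, {failed})
    print(f"AMP {failed} down: {len(survivors)} of {len(row_ids)} rows still readable")
```

Because the two copies of a row never share an AMP in this model, any single AMP failure leaves every row readable. Losing two neighbouring AMPs at once can still make some rows unavailable, which is why fallback is combined with the other mechanisms listed above.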
10. Summary Table
| Component | Purpose |
|---|---|
| Node | Physical/virtual server |
| PE | SQL parsing, optimization, coordination |
| BYNET | High-speed interconnect for messaging |
| AMP | Data storage and parallel processing |
| vdisk | Virtualized disk storage layer |
Final Takeaway
Teradata’s power comes from:
- Shared-nothing design
- Hash-based data distribution
- Parallel execution at every layer
- Highly optimized interconnect (BYNET)