TeradataArch

Teradata Architecture Overview

Teradata is a shared-nothing, massively parallel processing (MPP) database designed to scale close to linearly as hardware is added. Work is divided among many independent units that execute in parallel, which is why Teradata performs especially well for large analytical workloads.

At a high level:

Client → Parsing Engine (PE) → BYNET → AMPs → Virtual Disk (vDisk) → Physical Storage

All of this runs on one or more nodes.

1. Nodes

A node is a physical or virtual machine in the Teradata system.

Each node contains:

CPU(s)
Memory
Storage
Teradata software components (PEs, AMPs)
BYNET network interface

Key points:

A Teradata system can have many nodes
Each node operates independently (shared‑nothing)
Nodes can host both PEs and AMPs
Adding nodes increases CPU, memory, and I/O in parallel

Scalability principle: More nodes = more parallelism
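
The scalability principle can be made concrete with a little arithmetic. The sketch below assumes an idealized system with perfectly even data distribution and no coordination overhead; the function name, the AMPs-per-node count, and the per-AMP scan rate are illustrative assumptions, not Teradata figures.

```python
def full_scan_seconds(total_rows, nodes, amps_per_node=4,
                      rows_per_sec_per_amp=1_000_000):
    """Idealized full-table scan time: every AMP scans only its own share."""
    amps = nodes * amps_per_node
    rows_per_amp = total_rows / amps
    return rows_per_amp / rows_per_sec_per_amp


# Doubling the node count halves the scan time in this idealized model.
for nodes in (1, 2, 4, 8):
    print(nodes, "node(s):", full_scan_seconds(8_000_000_000, nodes), "s")
```

Real systems fall short of this ideal whenever data is skewed or rows must move between AMPs.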

2. Parsing Engines (PEs)

The Parsing Engine is responsible for interpreting SQL and coordinating its execution, not for data access.

PE responsibilities

Accept SQL from clients (JDBC/ODBC/CLI)
Parse SQL syntax
Validate objects and privileges
Optimize the query (cost-based optimizer)
Generate a query execution plan
Dispatch steps to AMPs
Manage sessions and locks
Collect and return results

Characteristics

PEs do not store data
Multiple PEs exist for concurrency
A single session is always connected to one PE
PEs coordinate all AMPs involved in a query

When users say “Teradata is slow at parsing”, they are almost always referring to PE bottlenecks, not AMP bottlenecks.

3. Access Module Processors (AMPs)

AMPs do the actual data work. They are the heart of Teradata’s parallelism.

AMP responsibilities

Store table rows
Retrieve rows
Insert, update, delete data
Perform aggregations (SUM, COUNT, GROUP BY)
Perform joins locally
Sort data
Apply filters (WHERE, HAVING)

Key principles

Each AMP owns a portion of the data
Data is distributed by Primary Index (PI) hash
AMPs work independently and in parallel
No shared disk between AMPs

Why AMPs matter

Good PI choice = even AMP workload
Skewed PI = hot AMPs = bad performance
Teradata tuning usually means AMP-level tuning
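
The effect of PI choice on AMP workload can be illustrated with a toy model. This is a sketch under stated assumptions: `amp_for` uses MD5 modulo the AMP count as a stand-in for Teradata's real hashing, and the 8-AMP system, the column values, and all function names are hypothetical.

```python
import hashlib
from collections import Counter

NUM_AMPS = 8  # hypothetical system size


def amp_for(value, num_amps=NUM_AMPS):
    """Toy stand-in for PI hashing: deterministic hash modulo AMP count."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(pi_values, num_amps=NUM_AMPS):
    """Count how many rows land on each AMP for the given PI values."""
    counts = Counter(amp_for(v, num_amps) for v in pi_values)
    return [counts.get(a, 0) for a in range(num_amps)]


def skew_ratio(per_amp):
    """Busiest AMP's row count relative to the average (1.0 = perfectly even)."""
    return max(per_amp) / (sum(per_amp) / len(per_amp))


# 100,000 rows: a near-unique PI versus a PI with only three distinct values.
good_pi = range(100_000)                                # e.g. an order id
bad_pi = ["WEST", "EAST", "NORTH"] * 33_333 + ["WEST"]  # e.g. a region code

good = rows_per_amp(good_pi)
bad = rows_per_amp(bad_pi)
print("unique PI :", good, round(skew_ratio(good), 2))
print("3-value PI:", bad, round(skew_ratio(bad), 2))
```

With only three distinct PI values, at most three AMPs hold any rows, so the busiest AMP carries several times the average load while the rest sit idle.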

4. BYNET (Messaging Fabric)

The BYNET is Teradata’s high-speed interconnect that connects all components.

What BYNET does

Sends messages between PEs and AMPs
Redistributes rows between AMPs
Supports broadcast and point-to-point communication
Ensures fault tolerance (dual BYNETs)

Think of BYNET as:

A private, ultra-fast network optimized for database messaging
Separate from the Ethernet/TCP-IP network that carries client traffic
Critical for joins requiring data redistribution

Example

If two tables have different primary indexes:

Rows must be redistributed via BYNET
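
A minimal sketch of that redistribution, assuming a toy 4-AMP system in which hashing is simulated with MD5 modulo the AMP count. The `customers` and `orders` tables, their columns, and the helper names are hypothetical; the point is only that re-hashing one table on the join column brings matching rows onto the same AMP so each AMP can join locally.

```python
import hashlib
from collections import defaultdict

NUM_AMPS = 4  # hypothetical system size


def amp_for(value):
    """Toy stand-in for PI hashing."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


def distribute(rows, key):
    """Place each row on the AMP chosen by hashing row[key]."""
    amps = defaultdict(list)
    for row in rows:
        amps[amp_for(row[key])].append(row)
    return amps


customers = [{"customer_id": c, "name": f"cust{c}"} for c in range(1, 6)]
orders = [{"order_id": 100 + i, "customer_id": 1 + i % 5, "amount": 10 * (i + 1)}
          for i in range(10)]

cust_by_amp = distribute(customers, "customer_id")  # PI = customer_id
orders_by_amp = distribute(orders, "order_id")      # PI = order_id (different PI)

# Redistribution: re-hash every order row on the join column and "send" it
# to the AMP that owns that hash. In Teradata this traffic crosses the BYNET.
moved = 0
redistributed = defaultdict(list)
for src_amp, rows in orders_by_amp.items():
    for row in rows:
        dst_amp = amp_for(row["customer_id"])
        moved += dst_amp != src_amp
        redistributed[dst_amp].append(row)

# Each AMP now joins only its local rows; no further messaging is needed.
joined = []
for amp in range(NUM_AMPS):
    names = {c["customer_id"]: c["name"] for c in cust_by_amp.get(amp, [])}
    for o in redistributed.get(amp, []):
        joined.append((o["order_id"], names[o["customer_id"]], o["amount"]))

print(f"{moved} of {len(orders)} order rows changed AMP; {len(joined)} rows joined")
```

Had both tables shared `customer_id` as their PI, the redistribution loop would have been unnecessary, which is why matching PIs on frequent join columns is a common design goal.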

Below is an overview of the Teradata Database architecture, focused on the core building blocks you mentioned: Parsing Engines (PEs), AMPs, the BYNET, nodes, and vdisks. It also covers how they work together to deliver massively parallel processing (MPP).

Teradata Architecture — Expert Explanation

Teradata is built on a shared-nothing, massively parallel processing (MPP) architecture. Each component has a well-defined role, and performance comes from doing many things in parallel with minimal contention.

1. High-Level Architecture View

At the highest level, a Teradata system consists of:

Client Applications
        |
        v
+-------------------+
| Parsing Engines   |
|     (PEs)         |
+-------------------+
        |
        v
=====================  <-- BYNET (Interconnect)
        |
        v
+-------------------+
| Access Module     |
| Processors (AMPs) |
+-------------------+
        |
        v
+-------------------+
| Virtual Disks     |
|   (vdisks)        |
+-------------------+

These components are deployed across Nodes, which are physical or virtual servers.

2. Nodes

What is a Node?

A node is a physical or virtual server in the Teradata system.

Each node typically contains:

CPUs
Memory
One or more PEs
Multiple AMPs
Disk storage (managed as vdisks)
One or more BYNET connections

Key points:

Nodes are independent (shared-nothing)
Scaling is achieved by adding nodes
A single node failure does not normally stop the system (fault tolerance)

👉 MPP principle: More nodes = more parallelism.

3. Parsing Engines (PEs)

Role of the PE

The Parsing Engine is responsible for query orchestration, not data access.

Main responsibilities:

Session control
- Logon authentication
- Managing user sessions
SQL parsing
- Syntax and semantic checks
- Object resolution
Query optimization
- Cost-based optimization
- Generates the execution plan
Dispatcher
- Breaks the plan into steps
- Sends those steps to AMPs
Result set assembly
- Collects AMP results
- Returns final output to client

Important Characteristics

PEs do not store data
Multiple PEs handle multiple sessions concurrently
Each session is assigned to a single PE, and each PE manages many sessions at once (traditionally up to 120), so a system runs many PEs

4. BYNET (Teradata Interconnect)

What is the BYNET?

The BYNET is Teradata’s high-speed, fault-tolerant messaging fabric.

Think of it as:

The “nervous system” of Teradata

Key Functions

Routes all messages between:
- PEs ↔ AMPs
- AMP ↔ AMP
- Node ↔ Node
Ensures guaranteed message delivery
Automatically reroutes traffic on failure

Performance Features

Very low latency
Broadcast and point-to-point communication
Parallel message delivery

Without the BYNET, Teradata’s parallelism would collapse.
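
To see why the choice between broadcast and point-to-point delivery matters, here is a deliberately simplified cost model for a join. It is a sketch, not the optimizer's actual costing: it counts only rows sent over the interconnect, assumes the worst case in which both tables must be re-hashed for the point-to-point option, and all function names are hypothetical.

```python
def rows_sent_if_duplicated(small_rows, num_amps):
    """Broadcast: every AMP receives a full copy of the smaller table."""
    return small_rows * num_amps


def rows_sent_if_redistributed(left_rows, right_rows):
    """Point-to-point: each row of both tables is re-hashed and sent once."""
    return left_rows + right_rows


def cheaper_strategy(small_rows, large_rows, num_amps):
    """Pick whichever option moves fewer rows in this toy model."""
    dup = rows_sent_if_duplicated(small_rows, num_amps)
    redis = rows_sent_if_redistributed(small_rows, large_rows)
    return "duplicate" if dup < redis else "redistribute"


# A tiny lookup table is cheap to copy everywhere ...
print(cheaper_strategy(1_000, 10_000_000, num_amps=100))      # duplicate
# ... but a million-row table is not, once multiplied by 100 AMPs.
print(cheaper_strategy(1_000_000, 10_000_000, num_amps=100))  # redistribute
```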

5. Access Module Processors (AMPs)

What is an AMP?

The AMP is the workhorse of Teradata.

Each AMP:

Owns a portion of the data
Manages that data exclusively
Performs all row-level data operations

Responsibilities of an AMP

Store table rows
Retrieve rows
Insert, update, delete data
Perform:
- Joins
- Aggregations
- Sorts
- Index access

💡 No two AMPs share the same data.
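
The idea that each AMP works only on its own rows can be sketched with a thread pool standing in for the AMPs. The four-AMP layout and the sample values are made up for illustration; real AMPs are separate virtual processors, not Python threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: 4 AMPs, each owning a disjoint slice of one column's values.
amp_rows = {
    0: [5, 12, 7],
    1: [20, 1],
    2: [9, 9, 9, 3],
    3: [40],
}


def amp_local_sum(amp_id):
    """Work one AMP does on its own rows only; no other AMP's data is read."""
    return amp_id, sum(amp_rows[amp_id])


# All AMPs run the same step at the same time on different data.
with ThreadPoolExecutor(max_workers=len(amp_rows)) as pool:
    partials = dict(pool.map(amp_local_sum, amp_rows))

total = sum(partials.values())  # partial results are combined afterwards
print(partials, total)
```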

6. Data Distribution & Primary Index

Hash-Based Distribution

Data is distributed using a hash of the Primary Index (PI):

PI value → hashing algorithm → row hash → hash bucket → hash map → owning AMP

Benefits:

Even data distribution
Parallel access
Minimal data skew (if PI is chosen well)

Why this matters

When you query by PI:

Teradata knows exactly which AMP has the row
Only that AMP is accessed → very fast
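
The difference between PI access and non-PI access can be sketched as follows. This is a toy model: MD5 stands in for the real hashing algorithm, the bucket count and the round-robin bucket-to-AMP map are assumptions chosen for readability, and the function names are hypothetical.

```python
import hashlib

NUM_BUCKETS = 1024  # assumption for the sketch; real systems use far more
NUM_AMPS = 4

# Bucket number -> owning AMP (round-robin here, purely illustrative).
HASH_MAP = [b % NUM_AMPS for b in range(NUM_BUCKETS)]


def row_hash(pi_value):
    """Toy 32-bit row hash derived from the PI value."""
    digest = hashlib.md5(str(pi_value).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def amp_for(pi_value):
    """PI value -> row hash -> bucket -> AMP."""
    bucket = row_hash(pi_value) % NUM_BUCKETS
    return HASH_MAP[bucket]


# Store 1,000 customer rows, each on its owning AMP.
storage = {amp: {} for amp in range(NUM_AMPS)}
for cust_id in range(1, 1001):
    storage[amp_for(cust_id)][cust_id] = {"customer_id": cust_id}


def lookup_by_pi(cust_id):
    """PI access: hash the value and go straight to the one AMP that can hold it."""
    amp = amp_for(cust_id)
    return storage[amp].get(cust_id), [amp]  # (row, AMPs touched)


def lookup_by_other_column(predicate):
    """Non-PI access with no index: every AMP must scan its own rows."""
    hits = [r for amp in storage.values() for r in amp.values() if predicate(r)]
    return hits, list(storage)  # (rows, AMPs touched)


row, amps_touched = lookup_by_pi(42)
print(row, "AMPs touched:", len(amps_touched))
```

Both lookups return the same row, but the second one involves every AMP in the system.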

7. Virtual Disks (vdisks)

What are vdisks?

vdisks are Teradata’s logical abstraction of physical disks.

Key ideas:

Physical disks are pooled
Storage is virtualized
AMPs see vdisks, not raw disks

Advantages

Better load balancing across disks
Simplified management
Improved I/O parallelism
Supports hot spare disks and recovery

Each AMP is associated with its own vdisk, which is where its data resides.

8. How a Query Executes (End-to-End Flow)

Let’s tie it all together.

Example Query

SELECT customer_id, SUM(amount)
FROM sales
WHERE region = 'WEST'
GROUP BY customer_id;

Execution Flow

Client → PE
- SQL submitted
PE
- Parses SQL
- Optimizes query
- Creates execution steps
PE → AMPs (via BYNET)
- Sends steps to all relevant AMPs
AMPs
- Scan local rows
- Perform aggregation in parallel
AMP ↔ AMP (via BYNET)
- Redistribute intermediate results if needed
AMP → PE
- Return partial/final results
PE → Client
- Assemble and return final result set
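
The flow above can be sketched as a toy simulation of the example query. Everything here is an assumption made for illustration: the 4-AMP system, the `sale_id` primary index, the MD5-based hashing, and the function names. The AMP steps run in a simple loop for readability; on a real system they would run concurrently.

```python
import hashlib
from collections import defaultdict

NUM_AMPS = 4  # hypothetical system size


def amp_for(value):
    """Toy stand-in for Teradata's hashing."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


# sales rows spread over AMPs by a hypothetical PI (sale_id).
sales = [
    {"sale_id": 1, "customer_id": "A", "region": "WEST", "amount": 10},
    {"sale_id": 2, "customer_id": "B", "region": "EAST", "amount": 99},
    {"sale_id": 3, "customer_id": "A", "region": "WEST", "amount": 5},
    {"sale_id": 4, "customer_id": "C", "region": "WEST", "amount": 7},
    {"sale_id": 5, "customer_id": "C", "region": "WEST", "amount": 3},
    {"sale_id": 6, "customer_id": "B", "region": "WEST", "amount": 20},
]
amp_storage = defaultdict(list)
for row in sales:
    amp_storage[amp_for(row["sale_id"])].append(row)


def amp_scan_and_aggregate(rows):
    """AMP step: filter local rows and build local partial sums."""
    partial = defaultdict(int)
    for r in rows:
        if r["region"] == "WEST":
            partial[r["customer_id"]] += r["amount"]
    return partial


def redistribute(partials_per_amp):
    """AMP <-> AMP step: send each partial sum to the AMP owning its group key."""
    inbox = defaultdict(list)
    for partial in partials_per_amp:
        for cust, subtotal in partial.items():
            inbox[amp_for(cust)].append((cust, subtotal))
    return inbox


def amp_final_aggregate(messages):
    """AMP step: combine the partial sums received for each group."""
    final = defaultdict(int)
    for cust, subtotal in messages:
        final[cust] += subtotal
    return final


def pe_run_query():
    """PE role: dispatch the steps in order, then assemble the answer set."""
    partials = [amp_scan_and_aggregate(amp_storage[a]) for a in range(NUM_AMPS)]
    inbox = redistribute(partials)
    result = {}
    for a in range(NUM_AMPS):
        result.update(amp_final_aggregate(inbox[a]))
    return sorted(result.items())


print(pe_run_query())  # [('A', 15), ('B', 20), ('C', 10)]
```

Aggregating locally before redistributing means only one partial sum per customer per AMP is sent, rather than every qualifying row.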

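The AMP-side steps run in parallel on every participating AMP; parsing, optimization, and final result assembly are coordinated by the single PE that owns the session.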

9. Fault Tolerance & Reliability

Teradata is built for continuous availability:

Fallback: Second copy of data on a different AMP
AMP vproc migration: after a node failure, its AMPs can restart on another node in the same clique
BYNET redundancy: Dual networks
RAID / vdisk recovery

Result: high availability even during hardware failures.
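
How Fallback keeps data readable through an AMP failure can be sketched with a toy placement rule. The rule used here (fallback copy on the next AMP) is an assumption made for simplicity; on a real system the fallback copy is placed on another AMP according to the system's cluster configuration. All names are hypothetical.

```python
NUM_AMPS = 4  # hypothetical system size


def primary_amp(row_id):
    """Stand-in for PI hashing."""
    return row_id % NUM_AMPS


def fallback_amp(row_id):
    """Toy rule: the fallback copy always lives on a different AMP."""
    return (primary_amp(row_id) + 1) % NUM_AMPS


primary = {a: set() for a in range(NUM_AMPS)}
fallback = {a: set() for a in range(NUM_AMPS)}
for row_id in range(20):
    primary[primary_amp(row_id)].add(row_id)
    fallback[fallback_amp(row_id)].add(row_id)


def readable_rows(down_amp):
    """Rows still reachable when one AMP is offline."""
    rows = set()
    for a in range(NUM_AMPS):
        if a != down_amp:
            rows |= primary[a]
            rows |= fallback[a]
    return rows


# Whichever single AMP fails, every row survives on another AMP.
for down in range(NUM_AMPS):
    print("AMP", down, "down ->", len(readable_rows(down)), "of 20 rows readable")
```

The cost is storage: every protected row is written twice.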

10. Summary Table

Component	Purpose
Node	Physical/virtual server
PE	SQL parsing, optimization, coordination
BYNET	High-speed interconnect for messaging
AMP	Data storage and parallel processing
vdisk	Virtualized disk storage layer

Final Takeaway

Teradata’s power comes from:

Shared-nothing design
Hash-based data distribution
Parallel execution at every layer
Highly optimized interconnect (BYNET)