TeradataFASTLoad

FastLoad is a high‑performance bulk data loading utility in Teradata, designed to load very large volumes of data into empty tables as fast as possible.

FastLoad achieves its speed by:

  • Bypassing most SQL overhead
  • Loading data directly into AMPs in parallel
  • Minimizing logging
  • Avoiding row‑by‑row processing

In short: FastLoad is optimized for speed, not flexibility.


2. Key Characteristics of FastLoad

FeatureDescription
Load TypeBulk insert only
Target TableMust be empty
Supported TablesPermanent tables only
SpeedFastest Teradata load utility
LoggingMinimal (no rollback)
SQL SupportVery limited
Error HandlingError tables required
ConcurrenceSingle FastLoad per table

3. What FastLoad Can Do

FastLoad is ideal for:

  • Initial data population
  • Loading millions or billions of rows
  • Batch ETL processing
  • Data warehouse staging or base tables

FastLoad:

  • Loads data directly to AMPs based on hashing
  • Uses multiple sessions to maximize parallelism
  • Skips row‑level locking
  • Writes data blocks efficiently to vdisks

4. What FastLoad Cannot Do (Important!)

FastLoad has many restrictions by design:

🚫 Cannot load into:

  • Tables with existing data
  • Tables with:
    • Secondary indexes
    • Join indexes
    • Triggers
    • Referential integrity
    • Identity columns (older versions)
  • Volatile tables
  • Non-empty tables

🚫 Cannot perform:

  • UPDATE or DELETE
  • Upserts
  • Row‑level restart
  • Concurrent loads into the same table

This is why FastLoad is usually followed by index creation after the load.


5. FastLoad Architecture (Behind the Scenes)

FastLoad works below SQL:

Source File
   ↓
FastLoad Controller
   ↓
Multiple FastLoad Sessions
   ↓
BYNET
   ↓
Target AMPs
   ↓
Data written directly to vdisks

Each AMP receives only the rows it owns based on PI hashing.


6. FastLoad Processing Phases

FastLoad runs in two major phases, often called Phase 1 and Phase 2.


Phase 1 – Acquisition Phase

Purpose: Collect and distribute data to AMPs

Steps in Phase 1

  1. Target table is locked with an exclusive lock
  2. FastLoad sessions start (typically 8–32)
  3. Data is read from the input file
  4. Rows are hashed on Primary Index
  5. Rows are sent directly to owning AMPs via BYNET
  6. AMPs store rows in worktables (temporary storage)

Important Notes

  • No data is yet committed to the base table
  • No secondary indexes are updated
  • Duplicate PI rows are not yet validated
  • Errors are captured into error tables

📌 This phase is pure parallel data acquisition


Phase 2 – Application Phase

Purpose: Move data into final table structure

Steps in Phase 2

  1. AMPs sort rows by rowID
  2. Rows are merged into the base table
  3. Duplicate PI rows are detected
  4. Error rows are written to error tables
  5. Data blocks are finalized on disk
  6. Table lock is released

Only at the end of Phase 2 is data fully visible.

📌 If FastLoad fails before Phase 2 completes, no data is loaded


7. FastLoad Error Tables

FastLoad automatically creates two error tables:

Error TablePurpose
Error Table 1Data conversion errors (bad input data)
Error Table 2Constraint violations (duplicate PI rows)

These tables:

  • Must not exist before FastLoad
  • Are dropped at the end unless specified otherwise
  • Are crucial for data validation

8. Performance Why FastLoad Is So Fast

FastLoad performance comes from:

✅ AMP‑level parallelism
✅ Direct data path (bypass SQL engine)
✅ Minimal logging
✅ No index maintenance during load
✅ Block‑level I/O instead of row‑level

This is why FastLoad often outperforms:

  • Multi‑row INSERT
  • TPump
  • BTEQ INSERT

9. When Is FastLoad Optimal?

✅ Use FastLoad When:

  • Table is empty
  • Loading millions+ of rows
  • Data comes from flat files
  • No need for row‑level rollback
  • No secondary indexes during load
  • Batch/offline ETL processing
  • Performance is the #1 goal

Ideal example use cases

  • Initial warehouse load
  • Historical backfill
  • Nightly staging loads
  • One‑time migration

❌ Do NOT Use FastLoad When:

  • Table already contains data
  • You need to:
    • UPDATE existing rows
    • DELETE rows
    • Perform upserts
  • Table has secondary indexes or constraints you must maintain
  • Small data volume (overhead is too heavy)
  • Need concurrent loads
  • Need restartability at row level

In these cases, consider:

  • MultiLoad
  • TPump
  • INSERT…SELECT
  • TPT Load Operator

10. FastLoad vs Other Utilities (Quick Context)

UtilityBest For
FastLoadInitial bulk loads
MultiLoadBulk UPSERT (I/U/D)
TPumpSmall trickle inserts
TPTModern replacement of all above

Final Expert Summary

FastLoad:

  • Is Teradata’s fastest bulk load utility
  • Works in two distinct phases
  • Achieves speed through parallelism and minimal logging
  • Is ideal for large, empty-table loads
  • Trades flexibility and logging for raw performance

If you care about speed and your table is empty → FastLoad
If you need flexibility or incremental loads → use something else

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *