NVIDIA A40

The NVIDIA A40 GPU, built on the NVIDIA Ampere architecture, is designed to accelerate demanding visual computing workloads in data centers. It combines second-generation RT Cores, third-generation Tensor Cores, and 48GB of GDDR6 memory with ECC to handle rendering, AI training, virtual workstations, and data science workflows. With support for NVIDIA vGPU, NVLink, and secure boot with a hardware root of trust, the A40 suits a wide range of enterprise deployments.

Basic Product Information

Product Name

NVIDIA A40 GPU

Architecture

NVIDIA Ampere

Memory

48GB GDDR6 with ECC

Compute Power

Up to 299.4 TFLOPS (FP16 Tensor Core, with sparsity)

Release Year

2020

Use Cases

Virtual workstations, 3D rendering, AI training, data science, visual computing

Key Advantages

48GB GDDR6 Memory

Large memory capacity for data-intensive workloads such as AI training and 3D design.

Second-Generation RT Cores

Up to 2X the ray-tracing throughput of the previous generation for rendering and visualization tasks.

Third-Generation Tensor Cores

Enhanced AI performance with support for TF32 and structural sparsity for faster training.

Virtualization Ready

Supports NVIDIA vGPU for multiple user access and scalable workloads.

NVLink Support

Scalable up to 96GB memory with NVLink for larger datasets and higher performance.

Specifications

Performance Specifications

CUDA Cores

10,752

RT Cores

84 (Second-Generation)

Tensor Cores

336 (Third-Generation)

Peak FP32 Performance

37.4 TFLOPS

NVENC | NVDEC (includes AV1 decode)

1x | 2x

Tensor Core Performance:

FP16

149.7 | 299.4 TFLOPS (with sparsity)

TF32

74.8 | 149.6 TFLOPS (with sparsity)

BFLOAT16

149.7 | 299.4 TFLOPS (with sparsity)

INT8

299.3 | 598.6 TOPS (with sparsity)

INT4

598.7 | 1,197.4 TOPS (with sparsity)
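
The performance figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes each CUDA core retires one fused multiply-add (counted as 2 FLOPs) per clock, which is not stated on this page, and the function names are made up for this example. It derives the clock speed implied by the quoted peak FP32 figure and confirms that each "with sparsity" value is twice its dense counterpart.

```python
# Figures quoted in the specification list above.
CUDA_CORES = 10_752
PEAK_FP32_TFLOPS = 37.4

# Assumption (not from this page): one fused multiply-add per CUDA core
# per clock, counted as 2 floating-point operations.
FLOPS_PER_CORE_PER_CLOCK = 2

# (dense, with sparsity) pairs from the list above.
# Units are TFLOPS for floating-point formats and TOPS for integer formats.
TENSOR_THROUGHPUT = {
    "FP16": (149.7, 299.4),
    "TF32": (74.8, 149.6),
    "BFLOAT16": (149.7, 299.4),
    "INT8": (299.3, 598.6),
    "INT4": (598.7, 1197.4),
}


def implied_clock_ghz(peak_tflops: float, cores: int) -> float:
    """Clock in GHz at which `cores` would deliver `peak_tflops` of FP32."""
    flops_per_clock = cores * FLOPS_PER_CORE_PER_CLOCK
    return peak_tflops * 1e12 / flops_per_clock / 1e9


def sparsity_speedup(dense: float, sparse: float) -> float:
    """Ratio of the sparse throughput figure to the dense one."""
    return sparse / dense


if __name__ == "__main__":
    print(f"Implied clock: {implied_clock_ghz(PEAK_FP32_TFLOPS, CUDA_CORES):.3f} GHz")
    for fmt, (dense, sparse) in TENSOR_THROUGHPUT.items():
        print(f"{fmt}: {sparsity_speedup(dense, sparse):.2f}x with sparsity")
```

The sparse figures are peak numbers for workloads that can use structured sparsity; dense models should be sized against the first value in each pair.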

Memory and Bandwidth

GPU Memory

48GB GDDR6 with ECC

Memory Bandwidth

696GB/s
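
One way to read the bandwidth figure alongside the peak FP32 figure is a simple roofline estimate. The sketch below uses only the two numbers quoted on this page; the roofline model itself and the function names are assumptions introduced for illustration, and real kernels will typically land below these bounds.

```python
# Figures quoted on this page.
PEAK_FP32_TFLOPS = 37.4
MEMORY_BANDWIDTH_GBS = 696.0


def ridge_point_flop_per_byte(peak_tflops: float = PEAK_FP32_TFLOPS,
                              bandwidth_gbs: float = MEMORY_BANDWIDTH_GBS) -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound."""
    return (peak_tflops * 1e12) / (bandwidth_gbs * 1e9)


def attainable_tflops(intensity_flop_per_byte: float,
                      peak_tflops: float = PEAK_FP32_TFLOPS,
                      bandwidth_gbs: float = MEMORY_BANDWIDTH_GBS) -> float:
    """Roofline upper bound on FP32 throughput for a given arithmetic intensity."""
    memory_bound_tflops = intensity_flop_per_byte * bandwidth_gbs * 1e9 / 1e12
    return min(peak_tflops, memory_bound_tflops)


if __name__ == "__main__":
    print(f"Ridge point: {ridge_point_flop_per_byte():.1f} FLOP/byte")
    for intensity in (1, 10, 100):
        print(f"{intensity:>3} FLOP/byte -> at most {attainable_tflops(intensity):.2f} TFLOPS")
```

Under this model, a kernel performing fewer than roughly 54 FP32 operations per byte read from memory is limited by the 696GB/s bandwidth rather than by compute.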

Thermal and Power

Max Power Consumption

300W

Cooling

Passive cooling

Power Connector

8-pin CPU
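
For capacity planning, the 300W maximum can be turned into rough efficiency and energy bounds. The sketch below assumes the board draws its full rated power for the whole period, which is a worst case rather than a measurement; the function names are hypothetical.

```python
# Figures quoted on this page.
MAX_BOARD_POWER_W = 300
PEAK_FP32_TFLOPS = 37.4


def peak_gflops_per_watt(peak_tflops: float = PEAK_FP32_TFLOPS,
                         power_w: float = MAX_BOARD_POWER_W) -> float:
    """Peak FP32 GFLOPS per watt at maximum board power."""
    return peak_tflops * 1000 / power_w


def max_energy_kwh(hours: float, gpu_count: int = 1,
                   power_w: float = MAX_BOARD_POWER_W) -> float:
    """Upper bound on GPU energy use, assuming sustained maximum power draw."""
    return power_w * gpu_count * hours / 1000


if __name__ == "__main__":
    print(f"Peak FP32 efficiency: {peak_gflops_per_watt():.1f} GFLOPS/W")
    print(f"8 GPUs for 24 hours: at most {max_energy_kwh(24, gpu_count=8):.1f} kWh")
```

These bounds cover the GPUs only; host CPUs, fans, and power-supply losses are outside the scope of the sketch.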

Board Specifications

Form Factor

Dual-slot (4.4” H x 10.5” L)

Interconnect Interface:

NVIDIA NVLink® (112.5GB/s bidirectional)
PCIe Gen4 x16 (64GB/s bidirectional)

Display Outputs

3x DisplayPort 1.4a (disabled by default, can be enabled via software)

Dimensions

4.4” (H) x 10.5” (L) - dual slot

NVENC/NVDEC

1x NVENC / 2x NVDEC (includes AV1 decode support)
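
The two interconnect figures listed under Interconnect Interface can be compared by estimating how long a one-way copy would take. This is a rough sketch under stated assumptions: the bidirectional figures are taken to split evenly between the two directions, and protocol overhead is ignored, so real transfers will be slower. The function name is hypothetical.

```python
# Bidirectional bandwidth figures quoted on this page, in GB/s.
NVLINK_BIDIRECTIONAL_GBS = 112.5
PCIE_GEN4_X16_BIDIRECTIONAL_GBS = 64.0


def one_way_transfer_seconds(size_gb: float, bidirectional_gbs: float) -> float:
    """Best-case time to copy `size_gb` in one direction.

    Assumption: half of the bidirectional figure is available per direction,
    with no protocol or software overhead.
    """
    if size_gb < 0 or bidirectional_gbs <= 0:
        raise ValueError("size must be non-negative and bandwidth positive")
    return size_gb / (bidirectional_gbs / 2)


if __name__ == "__main__":
    full_memory_gb = 48  # the A40's memory capacity
    pcie = one_way_transfer_seconds(full_memory_gb, PCIE_GEN4_X16_BIDIRECTIONAL_GBS)
    nvlink = one_way_transfer_seconds(full_memory_gb, NVLINK_BIDIRECTIONAL_GBS)
    print(f"48GB over PCIe Gen4 x16: about {pcie:.2f} s")
    print(f"48GB over NVLink:        about {nvlink:.2f} s")
```

On these assumptions, NVLink moves the same data in a little over half the time PCIe Gen4 x16 needs, which is the main reason to bridge two cards for GPU-to-GPU traffic.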

Supported Technologies

Virtual GPU (vGPU)

Supports NVIDIA vPC/vApps, NVIDIA RTX Virtual Workstation, and NVIDIA Virtual Compute Server for multi-user virtualization.

Secure Boot

Hardware root of trust for enhanced security.

NVLink Support

Yes. Two A40 GPUs connected with NVLink provide up to 96GB of combined memory.

Server Compatibility

Supported by a wide range of servers from worldwide OEMs, with backward compatibility for PCIe Gen3 systems.
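
To illustrate how vGPU sharing relates to the 48GB of memory, the sketch below estimates how many virtual machines one A40 could host if each receives a fixed, equal slice of GPU memory. The equal-slice model and the example profile sizes are assumptions made for illustration; the profiles actually available depend on the NVIDIA vGPU software release and license, so check the vGPU documentation before sizing a deployment.

```python
GPU_MEMORY_GB = 48  # memory capacity quoted on this page


def vms_per_gpu(profile_gb: int, gpu_memory_gb: int = GPU_MEMORY_GB) -> int:
    """Number of equal-sized memory slices of `profile_gb` that fit on one GPU."""
    if profile_gb <= 0:
        raise ValueError("profile size must be positive")
    if profile_gb > gpu_memory_gb:
        raise ValueError("profile is larger than the GPU's memory")
    return gpu_memory_gb // profile_gb


if __name__ == "__main__":
    # Hypothetical profile sizes, chosen only to show the arithmetic.
    for profile_gb in (2, 4, 8, 24, 48):
        print(f"{profile_gb:>2}GB per VM -> {vms_per_gpu(profile_gb)} VMs per A40")
```

Memory is only one constraint; compute, encoder capacity, and licensing also limit how many users a single card can serve well.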

Additional Features

01 Second-Generation RT Cores

Deliver up to 2X the ray-tracing throughput of the previous generation, along with faster denoising, making the A40 well suited to rendering, architectural design, and virtual prototyping.

02 Third-Generation Tensor Cores

Accelerate AI model training and inference with TF32, BFLOAT16, and INT8 support.

03 vGPU Support

Enables efficient sharing of GPU resources across multiple virtual machines for scalable enterprise deployments.

04 NVLink

Linking two A40 GPUs with NVLink scales memory and performance, providing up to 96GB of combined memory for larger datasets.
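
As a rough guide to when the 96GB NVLink configuration matters, the sketch below checks whether a model's weights alone would fit on one A40 or on an NVLink-connected pair. It is a simplification under stated assumptions: it counts weights only (no activations, optimizer state, or framework overhead), treats 1GB as 10^9 bytes, and assumes the application can actually spread a model across both GPUs. The function names are hypothetical.

```python
from typing import Optional

# Memory capacities quoted on this page.
SINGLE_GPU_MEMORY_GB = 48
NVLINK_PAIR_MEMORY_GB = 96


def weights_size_gb(params_billions: float, bytes_per_param: int) -> float:
    """Size of the model weights in GB (1GB = 1e9 bytes)."""
    return params_billions * bytes_per_param


def smallest_config(params_billions: float, bytes_per_param: int) -> Optional[str]:
    """Smallest A40 configuration whose memory holds the weights, or None."""
    size_gb = weights_size_gb(params_billions, bytes_per_param)
    if size_gb <= SINGLE_GPU_MEMORY_GB:
        return "single A40"
    if size_gb <= NVLINK_PAIR_MEMORY_GB:
        return "NVLink pair"
    return None


if __name__ == "__main__":
    # Example model sizes in billions of parameters, stored as FP16 (2 bytes each).
    for params in (13, 30, 70):
        size = weights_size_gb(params, 2)
        print(f"{params}B parameters in FP16: {size:.0f}GB -> {smallest_config(params, 2)}")
```

Because training needs memory well beyond the weights, a model that passes this check may still not fit in practice; the check is most useful for ruling configurations out.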
