CN111782565A

CN111782565A - GPU server and data transfer method

Info

Publication number: CN111782565A
Application number: CN202010611759.9A
Authority: CN
Inventors: 武正辉
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-06-30
Filing date: 2020-06-30
Publication date: 2020-10-16
Anticipated expiration: 2040-06-30
Also published as: CN111782565B

Abstract

The application discloses a GPU server and a data transmission method, and relates to the field of server architecture. In the disclosed scheme, the GPU server comprises a GPU, a CPU, a first converter, a second converter and a network interface controller. The GPU is connected with the CPU through the first converter; the GPU is connected with the second converter through the first converter, and the second converter is connected with the network interface controller. The second converter is used for sending first data received from the network interface controller to the GPU through the first converter, and for forwarding to the network interface controller second data that the GPU sends to the second converter through the first converter. This can improve the performance of the GPU server, and the scheme is applicable to fields such as high-performance computing and deep learning.

Description

GPU server and data transfer method

Technical Field

The embodiments of the present application relate to server architecture in the field of computer technology, and in particular to a GPU server and a data transmission method.

Background

A graphics processing unit (GPU) has powerful computing capabilities and is widely used in fields such as high-performance computing and deep learning.

Currently, in a GPU server, the GPU is connected to a central processing unit (CPU) through a PCIE switch, and a network interface controller (NIC) is connected to one CPU. Communication between the GPU and the network interface controller therefore has to pass through the PCIE switch and at least one CPU, which degrades the performance of the GPU server.

Summary of the Invention

The present application provides a GPU server and a data transmission method.

According to an aspect of the present application, a GPU server is provided, including: a GPU, a CPU, a first converter, a second converter, and a network interface controller;

the GPU is connected to the CPU through the first converter;

the GPU is connected to the second converter through the first converter, and the second converter is connected to the network interface controller;

the second converter is configured to send first data received from the network interface controller to the GPU via the first converter, and to forward second data received from the GPU to the network interface controller, wherein the second data is sent by the GPU to the second converter via the first converter.

According to an aspect of the present application, a data transmission method is provided. The method is applied to a GPU server that includes a GPU, a CPU, a first converter, a second converter, and a network interface controller, wherein the GPU is connected to the CPU through the first converter, the GPU is connected to the second converter through the first converter, and the second converter is connected to the network interface controller; the method includes:

the second converter sends first data received from the network interface controller to the GPU via the first converter;

the second converter forwards second data received from the GPU to the network interface controller, wherein the second data is sent by the GPU to the second converter via the first converter.

The technology of the present application improves GPU server performance.

It should be understood that the content described in this section is not intended to identify key or critical features of the embodiments of the application, nor is it intended to limit the scope of the application. Other features of the present application will become readily understood from the following description.

Brief Description of the Drawings

The accompanying drawings are provided for a better understanding of the present solution and do not constitute a limitation of the present application. In the drawings:

FIG. 1 is a schematic diagram of a GPU server architecture provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of another GPU server architecture provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of another GPU server architecture provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of another GPU server architecture provided by an embodiment of the present application;

FIG. 5 is a flowchart of a data transmission method provided by an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application are described below with reference to the accompanying drawings. The description includes various details of the embodiments to facilitate understanding, and these details should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.

The present application provides a GPU server, applied to server architecture in the field of computer technology, that shortens the communication delay between the GPU and the network interface controller in the GPU server and thereby improves the performance of the GPU server. It can be applied in fields such as high-performance computing and deep learning.

FIG. 1 is a schematic diagram of a GPU server architecture provided by an embodiment of the present application. As shown in FIG. 1, the GPU server includes a GPU 10, a CPU 11, a first converter 12, a second converter 13, and a network interface controller 14.

The GPU 10 is connected to the CPU 11 through the first converter 12. The GPU 10 is connected to the second converter 13 through the first converter 12, and the second converter 13 is connected to the network interface controller 14.

The second converter 13 is configured to send first data received from the network interface controller 14 to the GPU 10 via the first converter 12, and to forward second data received from the GPU 10 to the network interface controller 14, wherein the second data is sent by the GPU 10 to the second converter 13 via the first converter 12.

In many training tasks for large-scale machine learning models, a large part of the data that a participating GPU server receives over the network from other GPU servers does not need to be processed by the CPU and can be processed directly by the GPU. In the embodiments of the present application, data that is transmitted over the Internet to the network interface controller, requires no CPU processing, and can be handed directly to the GPU for processing is collectively referred to as first data.

To transmit the first data, the network interface controller 14 sends the first data to the second converter 13, the second converter 13 sends the received first data to the first converter 12, and the first converter 12 sends the received first data to the GPU 10. Because the first data received by the network interface controller reaches the GPU through the second converter and then the first converter, without being forwarded by the CPU, the embodiment reduces the load on the CPU and shortens the delay of transmitting the first data from the network interface controller to the GPU, which can improve the overall performance of the GPU server.

In the embodiments of the present application, the vast majority of the data processed by the GPU does not need to be processed by the CPU and can be passed directly into an Ethernet or other network for transmission to other GPU servers or other devices. Data processed by the GPU that requires no further CPU processing and can be passed directly into an Ethernet or other network is collectively referred to as second data.

To transmit the second data, the GPU 10 sends the second data to the first converter 12, the first converter 12 sends the received second data to the second converter 13, and the second converter 13 sends the received second data to the network interface controller 14, which passes the second data into the Ethernet or other network. Because the second data processed by the GPU reaches the network interface controller through the first converter and then the second converter, without being forwarded by the CPU, the embodiment reduces the load on the CPU and shortens the delay of transmitting the second data from the GPU to the network interface controller, which can improve the overall performance of the GPU server.

In the technical solution of this embodiment, a second converter is added to the GPU server, and the GPU is connected to the network interface controller through the first converter and the second converter. The second converter is responsible for forwarding data between the network interface controller and the first converter, so data transmitted between the GPU and the network interface controller does not pass through the CPU. This shortens the data transmission delay between the GPU and the network interface controller and can improve the overall performance of the GPU server. Further, the GPUs can exchange data with one another through the first converter and the second converter without passing through the CPU, which enables high-bandwidth, low-latency P2P communication between GPUs and can further improve the overall performance of the GPU server.
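The FIG. 1 connections described above can be modelled as a small undirected graph to check that the route between the network interface controller and the GPU never touches the CPU. The following is a minimal sketch, not part of the application: the node names (`gpu`, `cpu`, `switch_1`, `switch_2`, `nic`) are hypothetical labels for the GPU 10, CPU 11, first converter 12, second converter 13 and network interface controller 14.

```python
from collections import deque

# Hypothetical node names standing in for the FIG. 1 components.
LINKS = [
    ("gpu", "switch_1"),       # GPU 10 <-> first converter 12
    ("cpu", "switch_1"),       # CPU 11 <-> first converter 12
    ("switch_1", "switch_2"),  # first converter 12 <-> second converter 13
    ("switch_2", "nic"),       # second converter 13 <-> NIC 14
]


def build_graph(links):
    """Build an undirected adjacency map from a list of (a, b) links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, src, dst):
    """Breadth-first search returning the hop list from src to dst, or None."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in sorted(graph[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


GRAPH = build_graph(LINKS)
first_data_path = shortest_path(GRAPH, "nic", "gpu")   # inbound (first data)
second_data_path = shortest_path(GRAPH, "gpu", "nic")  # outbound (second data)
print(first_data_path)
print(second_data_path)
```

In this model both paths consist of the two converters only, which mirrors the statement that the CPU does not forward the first or second data.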

On the basis of the above embodiment, in a training task for a machine learning model the CPU still has to distribute the training task and perform processing such as parsing part of the data. That is, in practical application scenarios, a small part of the data needs to be transmitted between the network interface controller and the CPU, and between the CPU and the GPU. In the embodiments of the present application, data that needs to be transmitted from the network interface controller to the CPU is collectively referred to as third data, and data that needs to be transmitted from the CPU to the network interface controller is collectively referred to as fourth data.

In the embodiments of the present application, based on the GPU server architecture shown in FIG. 1, to enable data transmission between the CPU and the network interface controller, the second converter is further configured to send third data received from the network interface controller to the CPU via the first converter.

Specifically, the network interface controller sends the third data to the second converter, the second converter forwards the received third data to the first converter, and the first converter forwards the received third data to the CPU. In this way, data can be transmitted from the network interface controller to the CPU on the GPU server architecture provided by the embodiments of the present application.

The second converter is further configured to forward received fourth data to the network interface controller, where the fourth data is sent by the CPU via the first converter.

Specifically, the CPU sends the fourth data to the first converter, the first converter forwards the received fourth data to the second converter, and the second converter forwards the received fourth data to the network interface controller. In this way, data can be transmitted from the CPU to the network interface controller on the GPU server architecture provided by the embodiments of the present application.

In the embodiments of the present application, data that needs to be transmitted from the GPU to the CPU is collectively referred to as fifth data, and data that needs to be transmitted from the CPU to the GPU is collectively referred to as sixth data.

Based on the GPU server architecture shown in FIG. 1, data transmission between the GPU and the CPU is implemented through the first converter. The first converter is configured to send fifth data received from the GPU to the CPU, and to send sixth data received from the CPU to the GPU.

Specifically, the GPU sends the fifth data to the first converter, and the first converter forwards the received fifth data to the CPU; the CPU sends the sixth data to the first converter, and the first converter forwards the received sixth data to the GPU. In this way, data can be transmitted between the GPU and the CPU on the GPU server architecture provided by the embodiments of the present application.
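The six data classes defined above each follow a fixed hop sequence through the FIG. 1 components. As an illustrative summary only, the sketch below writes those sequences as a lookup table and checks each one against the FIG. 1 links; the dictionary keys and node names are hypothetical labels chosen for this example.

```python
# Undirected links of the FIG. 1 architecture (hypothetical node names).
LINKS = {
    frozenset(pair)
    for pair in [
        ("gpu", "first_converter"),
        ("cpu", "first_converter"),
        ("first_converter", "second_converter"),
        ("second_converter", "nic"),
    ]
}

# Hop sequence for each data class, as described in the text.
ROUTES = {
    "first":  ["nic", "second_converter", "first_converter", "gpu"],
    "second": ["gpu", "first_converter", "second_converter", "nic"],
    "third":  ["nic", "second_converter", "first_converter", "cpu"],
    "fourth": ["cpu", "first_converter", "second_converter", "nic"],
    "fifth":  ["gpu", "first_converter", "cpu"],
    "sixth":  ["cpu", "first_converter", "gpu"],
}


def is_valid(route):
    """True if every consecutive pair of hops is a physical link."""
    return all(frozenset((a, b)) in LINKS for a, b in zip(route, route[1:]))


def bypasses_cpu(route):
    """True if the CPU does not appear anywhere on the route."""
    return "cpu" not in route


for name, route in ROUTES.items():
    print(name, " -> ".join(route), "| bypasses CPU:", bypasses_cpu(route))
```

Only the first and second data bypass the CPU; the remaining four classes have the CPU as their source or destination, so they necessarily include it.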

In addition, in this embodiment, the GPU server may contain one or more CPUs; the number is not specifically limited here.

FIG. 2 is a schematic diagram of another GPU server architecture provided by an embodiment of the present application; FIG. 3 is a schematic diagram of another GPU server architecture provided by an embodiment of the present application. On the basis of any of the above embodiments, in this embodiment there may be multiple CPUs, each CPU is connected to at least one first converter, and each first converter is connected to one CPU. For each CPU, the more first converters there are between that CPU and the GPU, the more data transmission links there are between them and the higher the data transmission efficiency.

In a possible implementation, the GPU server may include a plurality of network interface controllers, each connected to the second converter, to increase the network bandwidth of the GPU. The GPU server may also include only one network interface controller.

Exemplarily, the GPU server may also include at least one network interface controller connected to the CPU. This enables direct data transmission between that network interface controller and the CPU, which makes the transfer of data that must travel from the network to the CPU more efficient and can improve the overall performance of the GPU server.

Exemplarily, the GPU server may contain a GPU board that carries multiple GPUs. Each first converter in the GPU server is connected to each GPU on the GPU board. The multiple CPUs can communicate with one another over Ultra Path Interconnect (UPI).

In a possible implementation, the first converter may be a PCIE switch chip operating in base mode, and the first converter is connected to the CPU through a PCIE x16 link.

The second converter is a PCIE switch chip operating in fabric mode, and the second converter is connected to the first converter through a PCIE x16 link.

The PCIE x16 link may be of any version, for example a PCIE 4.0 x16 link. As PCIE technology develops, a higher-version PCIE x16 link may be used in the embodiments of the present application, or a lower-version PCIE x16 link may be chosen to suit the actual application scenario; this is not specifically limited here.

In the embodiments of the present application, both the first converter and the second converter are implemented with PCIE switch chips, which can improve the stability of data transmission between the first converter and the second converter.
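For a rough sense of scale, the raw one-direction bandwidth of an x16 link can be estimated from the per-lane transfer rate. The sketch below is an approximation under stated assumptions: the per-lane rates (8, 16 and 32 GT/s for PCIe 3.0, 4.0 and 5.0) and the 128b/130b line encoding come from the PCI Express specifications rather than from this application, and packet and protocol overhead is ignored.

```python
# Per-lane raw transfer rate in GT/s for PCIe generations that use
# 128b/130b encoding (values from the PCIe specifications, not the patent).
GT_PER_S = {"3.0": 8.0, "4.0": 16.0, "5.0": 32.0}

# 128 payload bits are carried in every 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbytes(version, lanes=16):
    """Approximate one-direction bandwidth in GB/s, ignoring protocol overhead."""
    gbits_per_lane = GT_PER_S[version] * ENCODING_EFFICIENCY
    return gbits_per_lane * lanes / 8


for version in ("3.0", "4.0", "5.0"):
    print(f"PCIe {version} x16: {link_bandwidth_gbytes(version):.2f} GB/s per direction")
```

Under these assumptions a PCIe 4.0 x16 link carries roughly 31.5 GB/s in each direction, and each generation step doubles that figure.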

In a possible implementation, the GPU server may include at least one first converter, as shown in FIG. 2 and FIG. 3.

For example, FIG. 2 illustrates the architecture using a GPU server that includes: a GPU board containing multiple GPUs, one second converter (switch2 in FIG. 2), two network interface controllers (NIC0 and NIC1 in FIG. 2), and two CPUs (CPU0 and CPU1 in FIG. 2), with each CPU corresponding to one first converter. The two CPUs communicate through UPI.

As shown in FIG. 2, the first converter corresponding to CPU0 is switch0, and the first converter corresponding to CPU1 is switch1. CPU0 and CPU1 are connected to each GPU on the GPU board through switch0 and switch1, respectively. CPU0 and switch0, and CPU1 and switch1, are each connected through a PCIE 4.0 x16 link. There is one data transmission link between each CPU and the GPU board.

As shown in FIG. 2, the two network interface controllers NIC0 and NIC1 are both connected to the second converter switch2. The second converter switch2 is connected to the first converters switch0 and switch1 through PCIE 4.0 x16 links.
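The FIG. 2 topology can be written down as a link list and searched for routes from a network interface controller to the GPU board that avoid both CPUs. This is a minimal sketch: it assumes the GPU board can be treated as a single node named `gpu_board` (the text connects each first converter to every GPU on the board), and the lowercase node names are labels chosen for this example.

```python
# Links of the FIG. 2 architecture; the GPU board is modelled as one node.
LINKS = [
    ("cpu0", "cpu1"),          # UPI between the two CPUs
    ("cpu0", "switch0"),       # PCIE 4.0 x16
    ("cpu1", "switch1"),       # PCIE 4.0 x16
    ("switch0", "gpu_board"),
    ("switch1", "gpu_board"),
    ("switch2", "switch0"),    # PCIE 4.0 x16
    ("switch2", "switch1"),    # PCIE 4.0 x16
    ("nic0", "switch2"),
    ("nic1", "switch2"),
]


def build_graph(links):
    """Build an undirected adjacency map from a list of (a, b) links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def simple_paths(graph, src, dst, banned=frozenset(), _path=None):
    """Depth-first enumeration of loop-free paths that skip banned nodes."""
    path = (_path or []) + [src]
    if src == dst:
        return [path]
    found = []
    for nxt in sorted(graph[src]):
        if nxt not in path and nxt not in banned:
            found.extend(simple_paths(graph, nxt, dst, banned, path))
    return found


GRAPH = build_graph(LINKS)
cpu_free = simple_paths(GRAPH, "nic0", "gpu_board", banned={"cpu0", "cpu1"})
for route in cpu_free:
    print(" -> ".join(route))
```

In this model each network interface controller has two CPU-free routes to the GPU board, one through each first converter.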

In a possible implementation, the number of first converters corresponding to each CPU can be increased to improve the data transmission efficiency between the CPU and the GPU.

Exemplarily, each CPU corresponds to the same number of first converters. The first converters are then distributed evenly across the CPUs and the number of data transmission links between each CPU and the GPU is balanced, which makes the structure of the GPU server more balanced and helps improve its overall performance.

Alternatively, the CPUs may correspond to different numbers of first converters. For example, the number of first converters corresponding to each CPU can be adjusted flexibly according to the load on that CPU; this is not specifically limited here.
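The application does not say how the number of first converters per CPU would be chosen from the load. One possible policy, shown purely as an illustration, gives every CPU one converter (each CPU is connected to at least one first converter) and then hands out the rest one at a time to the CPU with the highest load per converter. The function name `allocate_converters` and the load figures are hypothetical.

```python
def allocate_converters(total, loads):
    """Split `total` first converters across CPUs.

    Every CPU gets one converter; each remaining converter goes to the CPU
    whose load per already-assigned converter is currently highest.
    `loads` maps a CPU name to a relative load figure.
    """
    if total < len(loads):
        raise ValueError("need at least one first converter per CPU")
    counts = {cpu: 1 for cpu in loads}
    for _ in range(total - len(loads)):
        busiest = max(counts, key=lambda cpu: loads[cpu] / counts[cpu])
        counts[busiest] += 1
    return counts


# Equal loads give the balanced split; skewed loads give an uneven one.
balanced = allocate_converters(4, {"cpu0": 1.0, "cpu1": 1.0})
skewed = allocate_converters(4, {"cpu0": 3.0, "cpu1": 1.0})
print(balanced)
print(skewed)
```

With equal loads the policy reproduces the even split of the preceding paragraph; with unequal loads it shifts converters toward the busier CPU.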

For example, FIG. 3 illustrates the architecture using a GPU server that includes: a GPU board containing multiple GPUs, one second converter (switch2 in FIG. 3), n network interface controllers (NIC0, NIC1, …, NICn in FIG. 3), and two CPUs (CPU0 and CPU1 in FIG. 3), with each CPU corresponding to two first converters. The two CPUs communicate through UPI. Here n is a positive integer denoting the total number of network interface controllers.

As shown in FIG. 3, the first converters corresponding to CPU0 are switch01 and switch02, and the first converters corresponding to CPU1 are switch11 and switch12. CPU0 is connected to each GPU on the GPU board through switch01 and switch02, and CPU1 is connected to each GPU on the GPU board through switch11 and switch12. CPU0 is connected to switch01 and switch02, and CPU1 to switch11 and switch12, through PCIE 4.0 x16 links. There are two data transmission links between each CPU and the GPU board; compared with the GPU server architecture shown in FIG. 2, each CPU has more data communication links to the GPU and the transmission efficiency is higher.

As shown in FIG. 3, the n network interface controllers NIC0 to NICn are all connected to the second converter switch2. The second converter switch2 is connected to each of the first converters switch01, switch02, switch11, and switch12 through a PCIE 4.0 x16 link.
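The difference between the FIG. 2 and FIG. 3 layouts is the number of first converters per CPU. The sketch below generates a link list for a given count and reports how many CPU-to-GPU-board links each CPU ends up with. It is an illustrative model only: `build_topology` and `cpu_to_gpu_links` are hypothetical helpers, the GPU board is treated as one node, the UPI link is omitted because it does not affect the count, and the generated converter names follow the FIG. 3 pattern (switch01, switch02, switch11, switch12).

```python
def build_topology(num_cpus, first_per_cpu, num_nics):
    """Return (a, b) links for a server with a single second converter."""
    links = []
    for c in range(num_cpus):
        for k in range(1, first_per_cpu + 1):
            first = f"switch{c}{k}"               # e.g. switch01, switch12
            links.append((f"cpu{c}", first))      # CPU <-> first converter
            links.append((first, "gpu_board"))    # first converter <-> GPUs
            links.append(("switch2", first))      # second <-> first converter
    for n in range(num_nics):
        links.append((f"nic{n}", "switch2"))      # NIC <-> second converter
    return links


def cpu_to_gpu_links(links, cpu):
    """Count the first converters that join `cpu` to the GPU board."""
    firsts = {b for a, b in links if a == cpu}
    return sum(1 for a, b in links if a in firsts and b == "gpu_board")


fig2_like = build_topology(num_cpus=2, first_per_cpu=1, num_nics=2)
fig3_like = build_topology(num_cpus=2, first_per_cpu=2, num_nics=4)
print(cpu_to_gpu_links(fig2_like, "cpu0"))
print(cpu_to_gpu_links(fig3_like, "cpu0"))
```

Doubling the first converters per CPU doubles the CPU-to-GPU-board links in this model, matching the comparison drawn between FIG. 2 and FIG. 3.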

In another implementation of the embodiments of the present application, the GPU server may contain multiple second converters, to increase the bandwidth of data transmission between the network interface controllers and the GPU and so increase the network bandwidth of the GPU server.

In that case, each first converter is connected to each second converter, and each network interface controller is connected to one second converter. The network interface controllers are distributed evenly among the second converters, which balances the network of the GPU server better and can improve its overall performance.

In the technical solution of this embodiment, the GPU server may include multiple network interface controllers connected to the second converter, to increase the network bandwidth of the GPU. Further, the number of first converters corresponding to each CPU can be increased, with each CPU corresponding to the same number of first converters; this improves the data transmission efficiency between the CPU and the GPU and balances the number of data transmission links between each CPU and the GPU, which can improve the overall performance of the GPU server. Further, increasing the number of second converters increases the bandwidth of data transmission between the network interface controllers and the GPU and the network bandwidth of the GPU server, further improving the overall performance of the GPU server.
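The wiring rule for multiple second converters has two parts: a full mesh between first and second converters, and an even spread of network interface controllers over the second converters. The following sketch shows one simple way to produce such a layout, assuming a round-robin assignment; the converter names `switch2a` and `switch2b` and both helper functions are hypothetical.

```python
def assign_nics(nics, second_converters):
    """Attach each NIC to exactly one second converter, round-robin."""
    assignment = {sw: [] for sw in second_converters}
    for i, nic in enumerate(nics):
        target = second_converters[i % len(second_converters)]
        assignment[target].append(nic)
    return assignment


def full_mesh(first_converters, second_converters):
    """Connect every first converter to every second converter."""
    return [(f, s) for f in first_converters for s in second_converters]


firsts = ["switch01", "switch02", "switch11", "switch12"]
seconds = ["switch2a", "switch2b"]  # hypothetical names for two second converters
nics = ["nic0", "nic1", "nic2", "nic3"]

nic_map = assign_nics(nics, seconds)
mesh = full_mesh(firsts, seconds)
print(nic_map)
print(len(mesh), "first-to-second converter links")
```

Round-robin keeps the per-converter NIC counts within one of each other even when the NIC count is not a multiple of the number of second converters.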

FIG. 4 is a schematic diagram of another GPU server architecture provided by an embodiment of the present application. On the basis of any of the foregoing embodiments, in this embodiment the GPU server further includes at least one third converter. The GPU is connected to the at least one third converter, each third converter is connected to each second converter, and the at least one third converter is not connected to the CPU.

The second converter is further configured to send data received from the network interface controller to the GPU via the third converter, and to forward to the network interface controller data that the GPU sends to the second converter via the third converter.

In this embodiment, adding a third converter between the second converter and the GPU lets the third converter share the data transmission work of the first converter. This increases the number of data transmission links between the second converter and the GPU, increases the network bandwidth of the GPU, and can improve the overall performance of the GPU server.

Specifically, for the first data that the first converter would otherwise have to receive from the second converter, the second converter may send part of that first data to the third converter, which then sends it to the GPU.

For the second data that the first converter would otherwise have to receive from the GPU, the GPU may send part of that second data to the third converter, which then sends it to the second converter.

For example, FIG. 4 illustrates the architecture by adding two third converters to the GPU server architecture of FIG. 3. As shown in FIG. 4, two third converters, switch31 and switch32, are added on the basis of the GPU server architecture provided in FIG. 3. switch31 and switch32 are each connected to each second converter switch2, and each connected to every GPU on the GPU board. As shown in FIG. 4, the third converters switch31 and switch32 are not connected to any CPU and are dedicated to data transmission between the second converter switch2 and the GPUs on the GPU board.

In this embodiment, the number of third converters may be an integer multiple of the number of CPUs, with each CPU corresponding to the same number of third converters. The added third converters are then distributed evenly across the CPUs, and each handles part of the first data and part of the second data that would otherwise be forwarded by the first converters connected to its corresponding CPU. This improves the balance of the GPU server's network bandwidth and further improves the overall performance of the GPU server.

Exemplarily, in another implementation, the total number of third converters need not be an integer multiple of the number of CPUs, and the CPUs may correspond to different numbers of third converters. For example, the number of third converters corresponding to each CPU can be adjusted flexibly according to the load on that CPU; this is not specifically limited here.

In a possible implementation, the third converter may be a PCIE switch chip operating in base mode, and the third converter is connected to the second converter through a PCIE x16 link. In this embodiment, both the third converter and the second converter are implemented with PCIE switch chips, which can improve the stability of data transmission between the third converter and the second converter.

For example, as shown in FIG. 4, two third converters (switch30 and switch31 in FIG. 4) may be added between the second converter switch2 and the GPU, with one third converter corresponding to each CPU: the third converter corresponding to CPU0 is switch30, and the third converter corresponding to CPU1 is switch31.

As shown in FIG. 4, the third converter corresponding to CPU0 is switch30, and the first converters connected to CPU0 are switch01 and switch02. Of the first data that the second converter switch2 receives from the network interface controller and that would otherwise be forwarded to the GPU via switch01 and switch02, switch2 sends part to the third converter switch30, which forwards the received first data to the GPU. Likewise, of the second data that the GPU would otherwise forward to switch2 via switch01 and switch02, the GPU sends part to switch30, which forwards the received second data to switch2.

As shown in FIG. 4, the third converter corresponding to CPU1 is switch31, and the first converters connected to CPU1 are switch11 and switch12. Of the first data that the second converter switch2 receives from the network interface controller and that would otherwise be forwarded to the GPU via switch11 and switch12, switch2 sends part to the third converter switch31, which forwards the received first data to the GPU. Likewise, of the second data that the GPU would otherwise forward to switch2 via switch11 and switch12, the GPU sends part to switch31, which forwards the received second data to switch2.
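
The correspondence between first converters, CPUs, and third converters in FIG. 4 can be written down as two lookup tables. The sketch below is illustrative only. The application does not state how switch2 decides which part of the traffic goes through a third converter, so the alternating policy in `pick_hop` is purely an assumption, and both function names are hypothetical.

```python
# Connections taken from the FIG. 4 description.
FIRST_TO_CPU = {
    "switch01": "CPU0",
    "switch02": "CPU0",
    "switch11": "CPU1",
    "switch12": "CPU1",
}
CPU_TO_THIRD = {"CPU0": "switch30", "CPU1": "switch31"}


def candidate_hops(first_converter):
    """Return the converters that may carry traffic nominally handled by
    first_converter: the first converter itself and the third converter
    corresponding to the same CPU."""
    third = CPU_TO_THIRD[FIRST_TO_CPU[first_converter]]
    return [first_converter, third]


def pick_hop(first_converter, packet_index):
    """Assumed split policy (not from the application): alternate packets
    between the first converter and its corresponding third converter."""
    hops = candidate_hops(first_converter)
    return hops[packet_index % len(hops)]


hops_for_switch11 = candidate_hops("switch11")
```

Any other split, for example one weighted by measured link load, would fit the description equally well; the only constraint taken from the text is that switch30 offloads switch01 and switch02, and switch31 offloads switch11 and switch12.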

As shown in FIG. 4, the third converters switch30 and switch31 are each connected to the second converter switch2 through a PCIE 4.0 x16 link.
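
For a sense of scale, the nominal throughput of such a link can be estimated from the published PCIe 4.0 signaling rate (16 GT/s per lane) and its 128b/130b line encoding. These figures come from the PCIe specification rather than from this application, the helper name is illustrative, and the result ignores packet and protocol overhead.

```python
def pcie_link_bandwidth_gbytes(gt_per_s, lanes, payload_bits=128, total_bits=130):
    """Approximate one-direction bandwidth of a PCIe link in GB/s.

    gt_per_s: per-lane transfer rate in GT/s (16.0 for PCIe 4.0).
    lanes: link width (16 for an x16 link).
    payload_bits / total_bits: line-encoding efficiency (128b/130b).
    """
    usable_gbits = gt_per_s * lanes * payload_bits / total_bits
    return usable_gbits / 8  # bits -> bytes


# PCIe 4.0 x16: roughly 31.5 GB/s in each direction before protocol overhead.
bw = pcie_link_bandwidth_gbytes(16.0, 16)
```

Under this estimate, each third converter added between switch2 and the GPUs contributes one more link of about this capacity in each direction.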

In addition, in this embodiment, the first converter, the second converter, and the third converter may also be implemented with chips similar to the PCIE switch chip, which is not specifically limited here. The first converter, the second converter, and the third converter may also be implemented with different types of chips, which is likewise not specifically limited here.

In the technical solution of this embodiment of the present application, at least one third converter is added to the GPU server: the GPU is connected to the at least one third converter, each third converter is connected to each second converter, and the at least one third converter is not connected to the CPU. Using the third converter to share the data transmission tasks of the first converter increases the number of data transmission links between the second converter and the GPU, which can increase the network bandwidth of the GPU and improve its overall performance. Further, the number of third converters may be an integer multiple of the number of CPUs, with an equal number of third converters corresponding to each CPU. The added third converters are then evenly distributed across the CPUs, each handling part of the first data and part of the second data that would otherwise be forwarded by the first converters connected to its corresponding CPU, which improves the balance of network bandwidth in the GPU server and further improves its overall performance.

The present application further provides a data transmission method applied to a GPU server. The GPU server includes a GPU, a CPU, a first converter, a second converter, and a network interface controller. The GPU is connected to the CPU through the first converter, the GPU is connected to the second converter through the first converter, and the second converter is connected to the network interface controller. FIG. 5 is a flowchart of a data transmission method provided by an embodiment of the present application. As shown in FIG. 5, the method includes the following steps:

S101. The second converter sends first data received from the network interface controller to the GPU via the first converter.

S102. The second converter forwards second data received from the GPU to the network interface controller, where the second data is sent by the GPU to the second converter via the first converter.
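
Steps S101 and S102 can be traced with a small hop-by-hop model. This is a sketch for illustration, not the claimed method itself: the function names, the node labels `NIC` and `GPU0`, and the choice of `switch01` as the first converter are assumptions made for the example.

```python
def s101_inbound(first_data, first_converter="switch01", gpu="GPU0"):
    """S101: the second converter (switch2) takes first data from the NIC
    and sends it to the GPU via a first converter."""
    return {"payload": first_data, "path": ["NIC", "switch2", first_converter, gpu]}


def s102_outbound(second_data, first_converter="switch01", gpu="GPU0"):
    """S102: the GPU sends second data to switch2 via a first converter,
    and switch2 forwards it to the NIC."""
    return {"payload": second_data, "path": [gpu, first_converter, "switch2", "NIC"]}


inbound = s101_inbound(b"first data")
outbound = s102_outbound(b"second data")
```

In both directions the traced path contains only the network interface controller, the second converter, a first converter, and the GPU; no CPU appears on it.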

In this embodiment, for details of the data transmission steps performed by the second converter, refer to the foregoing embodiments; they are not repeated here.

According to the technical solutions of the embodiments of the present application, the GPU server architecture combines PCIE switch chips in system mode with PCIE switch chips in mesh connection mode. Multiple first converters operating in system mode each communicate with a CPU through a PCIE x16 link; each first converter is connected to every GPU on the GPU board, and the first converters are evenly distributed among the CPUs to carry data between the GPUs and the CPUs. Multiple second converters operating in mesh connection mode are connected to multiple network interface controllers, which are evenly distributed among the second converters; each second converter is also connected to every first converter and carries data between the network interface controllers and the first converters. This allows dynamic, non-blocking data transmission between the network interface controllers and the GPUs. The GPUs can also exchange data with one another through the first and second converters without passing through a CPU, which provides high bandwidth and low latency for P2P communication between GPUs and further improves the overall performance of the GPU server.
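
The connectivity summarized above can be checked on a small graph model. The sketch below assumes a reduced instance of the architecture: two CPUs, the four first converters named in FIG. 4, a single second converter `switch2`, and hypothetical counts of four GPUs and two network interface controllers. It uses a breadth-first search that is forbidden from visiting CPU nodes.

```python
from collections import deque


def build_topology(num_gpus=4, num_nics=2):
    """Return the set of undirected links in a reduced model of the server."""
    edges = set()

    def link(a, b):
        edges.add(frozenset((a, b)))

    first_to_cpu = {
        "switch01": "CPU0", "switch02": "CPU0",
        "switch11": "CPU1", "switch12": "CPU1",
    }
    for sw, cpu in first_to_cpu.items():
        link(sw, cpu)            # first converter <-> its CPU
        link(sw, "switch2")      # first converter <-> second converter
        for g in range(num_gpus):
            link(sw, f"GPU{g}")  # first converter <-> every GPU
    for n in range(num_nics):
        link("switch2", f"NIC{n}")  # second converter <-> NICs
    return edges


def shortest_path(edges, src, dst, forbidden=()):
    """Breadth-first search from src to dst that never enters a forbidden node."""
    adj = {}
    for edge in edges:
        a, b = tuple(edge)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(adj.get(path[-1], ())):
            if nxt not in seen and nxt not in forbidden:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


topology = build_topology()
cpus = {"CPU0", "CPU1"}
nic_to_gpu = shortest_path(topology, "NIC0", "GPU3", forbidden=cpus)
gpu_to_gpu = shortest_path(topology, "GPU0", "GPU1", forbidden=cpus)
```

In this model both searches succeed with the CPUs excluded: the path from a network interface controller to a GPU crosses the second converter and one first converter, and two GPUs reach each other through a shared first converter.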

It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired result of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.

The specific embodiments above do not limit the protection scope of the present application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its protection scope.

Claims

1. A graphics processor GPU server, comprising: a GPU, a CPU, a first converter, a second converter, and a network interface controller;

the GPU is connected to the CPU through the first converter;

The GPU is connected to the second converter through the first converter, and the second converter is connected to the network interface controller;

The second converter is configured to send first data received from the network interface controller to the GPU via the first converter, and forward second data received from the GPU to the network interface controller, wherein the second data is sent by the GPU to the second converter via the first converter.

2. The GPU server of claim 1, wherein the second converter is further configured to send third data received from the network interface controller to the CPU via the first converter;

The second converter is further configured to forward the received fourth data to the network interface controller, where the fourth data is sent by the CPU via the first converter.

3. The GPU server of claim 2, wherein the first converter is configured to transmit fifth data received from the GPU to the CPU and sixth data received from the CPU to the GPU.

4. The GPU server according to claim 1, wherein the number of the CPUs is multiple, each of the CPUs is connected to at least one of the first converters, and each of the first converters is connected to one of the CPUs.

5. The GPU server of claim 4, wherein the number of the first converters connected to each of the CPUs is equal.

6. The GPU server according to claim 1, wherein the number of the network interface controllers is plural, and each of the network interface controllers is connected to the second converter.

7. The GPU server according to claim 6, wherein the number of the second converters is multiple,

each of the first converters is connected to each of the second converters, respectively;

Each of the network interface controllers is connected to one of the second converters, and a plurality of the network interface controllers are equally distributed to a plurality of the second converters.

8. The GPU server according to claim 1, wherein the first converter is a PCIE switch chip operating in a system mode, and the first converter is connected to the CPU through a PCIE x16 link.

9. The GPU server according to claim 8, wherein the second converter is a PCIE switch chip operating in a mesh connection mode, and the second converter is connected to the first converter through a PCIE x16 link.

10. The GPU server according to any one of claims 1-9, further comprising: at least one third converter,

the GPU is connected to the at least one third converter, each of the third converters is connected to each of the second converters, and the at least one third converter is not connected to the CPU;

The second converter is further configured to send data received from the network interface controller to the GPU via the third converter, and to forward, to the network interface controller, data sent by the GPU to the second converter via the third converter.

11. The GPU server according to claim 10, wherein the number of the third converters is an integer multiple of the number of the CPUs, and the number of the third converters corresponding to each of the CPUs is equal.

12. The GPU server according to claim 10, wherein the third converter is a PCIE switch chip operating in a system mode, and the second converter and the third converter are connected through a PCIE x16 link.

13. The GPU server according to any one of claims 1-9, further comprising: a network interface controller connected to the CPU.

14. A data transmission method, applied to a GPU server, wherein the GPU server comprises a GPU, a CPU, a first converter, a second converter, and a network interface controller; the GPU is connected to the CPU through the first converter, the GPU is connected to the second converter through the first converter, and the second converter is connected to the network interface controller; the method comprises:

the second converter sends first data received from the network interface controller to the GPU via the first converter;

The second converter forwards second data received from the GPU to the network interface controller, wherein the second data is sent by the GPU to the second converter via the first converter.