CN114612600B - Virtual image generation method, device, electronic device and storage medium - Google Patents

Publication number
CN114612600B
Authority
CN
China
Prior art keywords
point data
frequency domain
virtual image
avatar
perceptual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210244262.7A
Other languages
Chinese (zh)
Other versions
CN114612600A (en
Inventor
李�杰
赵晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210244262.7A (CN114612600B)
Publication of CN114612600A
Priority to KR1020220155155A (KR20220161233A)
Priority to JP2022211477A (JP2023026531A)
Application granted
Publication of CN114612600B
Priority to US18/181,371 (US20230206578A1)
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three-dimensional [3D] modelling for computer graphics
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 Three-dimensional [3D] animation
    • G06T13/40 Three-dimensional [3D] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 Three-dimensional [3D] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating three-dimensional [3D] models or images for computer graphics
    • G06T19/20 Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 Three-dimensional [3D] image rendering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 Three-dimensional [3D] image rendering
    • G06T15/02 Non-photorealistic rendering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three-dimensional [3D] modelling for computer graphics
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 Indexing scheme for image generation or computer graphics
    • G06T2210/56 Particle system, point based geometry or rendering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20 Indexing scheme for editing of 3D models
    • G06T2219/2012 Colour editing, changing, or manipulating; Use of colour codes

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure provides an avatar generation method, which relates to the technical field of artificial intelligence, and in particular to computer vision, virtual/augmented reality and the metaverse. A specific implementation is as follows: converting a plurality of point data of an initial three-dimensional avatar into the frequency domain to obtain a plurality of frequency-domain point data; rendering the plurality of frequency-domain point data to generate a first three-dimensional avatar; determining a perceptual feature of the first three-dimensional avatar; and generating a second three-dimensional avatar according to a difference between the perceptual feature and a predetermined style feature. The present disclosure also provides an avatar generation apparatus, an electronic device and a storage medium.

Figure 202210244262

Description

Virtual image generation method, device, electronic device and storage medium

Technical field

The present disclosure relates to the technical field of artificial intelligence, in particular to computer vision, virtual/augmented reality and the metaverse, and can be applied to image processing scenarios. More specifically, the present disclosure provides a virtual image (avatar) generation method and apparatus, an electronic device and a storage medium.

Background

Avatars are widely used in scenarios such as the metaverse, social networking, live streaming and gaming. Avatars can be generated manually.

Summary

The present disclosure provides an avatar generation method, apparatus, device and storage medium.

According to one aspect of the present disclosure, an avatar generation method is provided. The method includes: converting a plurality of point data of an initial three-dimensional avatar into the frequency domain to obtain a plurality of frequency-domain point data; adjusting the plurality of frequency-domain point data to obtain a plurality of adjusted point data; rendering the plurality of adjusted point data to generate a first three-dimensional avatar; determining a perceptual feature of the first three-dimensional avatar; and generating a second three-dimensional avatar according to a difference between the perceptual feature and a predetermined style feature.

According to another aspect of the present disclosure, an avatar generation apparatus is provided. The apparatus includes: a conversion module configured to convert a plurality of point data of an initial three-dimensional avatar into the frequency domain to obtain a plurality of frequency-domain point data; a rendering module configured to render the plurality of frequency-domain point data to generate a first three-dimensional avatar; a first determination module configured to determine a perceptual feature of the first three-dimensional avatar; and a generation module configured to generate a second three-dimensional avatar according to a difference between the perceptual feature and a predetermined style feature.

According to another aspect of the present disclosure, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method provided according to the present disclosure.

According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, the computer instructions being used to cause a computer to perform the method provided according to the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, including a computer program which, when executed by a processor, implements the method provided according to the present disclosure.

It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.

Brief description of the drawings

The accompanying drawings are provided for a better understanding of the present solution and do not limit the present disclosure. In the drawings:

Fig. 1 is a schematic diagram of an exemplary system architecture to which the avatar generation method and apparatus can be applied according to an embodiment of the present disclosure;

Fig. 2 is a flowchart of an avatar generation method according to an embodiment of the present disclosure;

Fig. 3 is a flowchart of an avatar generation method according to an embodiment of the present disclosure;

Fig. 4 is a flowchart of an avatar generation method according to an embodiment of the present disclosure;

Fig. 5 is a block diagram of an avatar generation apparatus according to an embodiment of the present disclosure; and

Fig. 6 is a block diagram of an electronic device to which the avatar generation method can be applied according to an embodiment of the present disclosure.

Detailed description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

An avatar may include a virtual body. An avatar can be designed, generated and optimized manually, but this takes considerable time. Moreover, manually generated avatars tend to be limited in style.

Fig. 1 is a schematic diagram of an exemplary system architecture to which the avatar generation method and apparatus can be applied according to an embodiment of the present disclosure. It should be noted that Fig. 1 is only an example of a system architecture to which embodiments of the present disclosure can be applied, intended to help those skilled in the art understand the technical content of the present disclosure. It does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments or scenarios.

As shown in Fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 serves as the medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104, for example to receive or send messages. The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.

The server 105 may be a server that provides various services, such as a back-end management server (merely an example) that supports websites browsed by users on the terminal devices 101, 102, 103. The back-end management server may analyze and otherwise process received data such as user requests, and feed the processing results (for example, web pages, information or data obtained or generated according to the user requests) back to the terminal devices.

It should be noted that the avatar generation method provided by embodiments of the present disclosure may generally be performed by the server 105. Accordingly, the avatar generation apparatus provided by embodiments of the present disclosure may generally be located in the server 105. The avatar generation method may also be performed by a server or server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the avatar generation apparatus may also be located in such a server or server cluster.

Fig. 2 is a flowchart of an avatar generation method according to an embodiment of the present disclosure.

As shown in Fig. 2, the method 200 may include operations S210 to S240.

In operation S210, a plurality of point data of an initial three-dimensional avatar are converted into the frequency domain to obtain a plurality of frequency-domain point data.

For example, the initial three-dimensional avatar may be a preset three-dimensional avatar.

For example, a Fourier transform may be performed on the plurality of point data of the initial three-dimensional avatar to convert them into the frequency domain.
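
As a minimal sketch of operation S210, the conversion could look like the following. The array layout (one row per point, holding x, y, z coordinates followed by r, g, b color values) and the choice of a one-dimensional FFT along the point axis are assumptions made for illustration; the disclosure only states that a Fourier transform may be used. The function names are hypothetical.

```python
import numpy as np


def to_frequency_domain(points: np.ndarray) -> np.ndarray:
    """Apply a 1-D FFT along the point axis, one transform per channel."""
    return np.fft.fft(points, axis=0)


def to_spatial_domain(freq_points: np.ndarray) -> np.ndarray:
    """Invert the transform; for real input the imaginary part is round-off noise."""
    return np.fft.ifft(freq_points, axis=0).real


# Hypothetical point data: 4 points, each stored as (x, y, z, r, g, b).
points = np.array([
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
])
freq_points = to_frequency_domain(points)   # complex array, same shape as points
restored = to_spatial_domain(freq_points)   # round trip back to the original data
```

The round trip shows that no information is lost by the conversion, so later adjustments can be made on the frequency-domain values and mapped back when needed.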

In operation S220, the plurality of frequency-domain point data are rendered to generate a first three-dimensional avatar.

For example, the plurality of frequency-domain point data may be rendered using various renderers. In one example, the PyTorch3D renderer may be used.

In operation S230, a perceptual feature of the first three-dimensional avatar is determined.

For example, the perceptual feature of the first three-dimensional avatar may be determined using various feature extraction models.

In operation S240, a second three-dimensional avatar is generated according to the difference between the perceptual feature and a predetermined style feature.

For example, various feature extraction models may be used to perform feature extraction on acquired style description information to obtain the predetermined style feature.

For example, various loss functions may be used to determine a difference value between the perceptual feature and the predetermined style feature. In one example, the difference value may be determined according to an L2 loss function. If the difference value satisfies a predetermined condition, the first three-dimensional avatar described above may be used as the second three-dimensional avatar. If the difference value does not satisfy the predetermined condition, the first three-dimensional avatar may be adjusted until the difference between the perceptual feature and the predetermined style feature satisfies the predetermined condition. For example, the predetermined condition may be that the difference value is less than a predetermined threshold.
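
The difference check described above can be sketched as follows. This assumes the L2 loss is taken as the sum of squared element-wise differences between two feature vectors of equal length; the feature values and thresholds are made up for illustration, and the function names are hypothetical.

```python
import numpy as np


def l2_difference(perceptual: np.ndarray, style: np.ndarray) -> float:
    """Sum of squared differences between the two feature vectors."""
    return float(np.sum((perceptual - style) ** 2))


def meets_condition(difference: float, threshold: float) -> bool:
    """Predetermined condition from the text: difference below a threshold."""
    return difference < threshold


# Illustrative feature vectors (not real model outputs).
perceptual_feature = np.array([0.2, 0.4, 0.4])
style_feature = np.array([0.2, 0.5, 0.3])
diff = l2_difference(perceptual_feature, style_feature)
```

With a threshold of 0.05 these features would count as matching, while a stricter threshold of 0.01 would trigger a further adjustment of the first three-dimensional avatar.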

Through embodiments of the present disclosure, a three-dimensional avatar matching the predetermined style feature can be generated.

In some embodiments, the first three-dimensional avatar may be processed using a Contrastive Language-Image Pre-training model to obtain the perceptual feature of the first three-dimensional avatar.

For example, a Contrastive Language-Image Pre-training (CLIP) model can extract features from text as well as from images. The CLIP model is an open-source, general-purpose model that connects text and images. The task the CLIP model needs to accomplish is to recognize the various kinds of visual information in an image and associate that information with one of a massive number of images.

In one example, a screenshot operation may be performed on a display screen showing the first three-dimensional avatar to obtain a screenshot image. The screenshot image is then processed using the CLIP model to obtain the perceptual feature.

In some embodiments, the predetermined style feature is determined from the style description information using the CLIP model.

For example, text entered by a target object may be acquired and used as the style description information. The style description information may then be processed using the CLIP model described above to determine the predetermined style feature. In one example, the style description information may be text containing keywords such as "cute" or "cool". The CLIP model can efficiently determine whether an image and a text match. In addition, the predetermined style feature and the perceptual feature may be determined by the same CLIP model. After the difference between the two is reduced through adjustment, the two match more closely, so that a three-dimensional avatar more consistent with the style description information is generated.

In some embodiments, the plurality of point data of the initial three-dimensional avatar are converted into the frequency domain so that various 3D tools can process the plurality of frequency-domain point data. For example, the 3D tool may be the Unity 3D tool.

In some embodiments, the predetermined condition described above may be that the difference value converges.

Fig. 3 is a flowchart of an avatar generation method according to another embodiment of the present disclosure.

As shown in Fig. 3, the method 300 may include operations S310 to S330 and operations S341 to S344.

In operation S310, a plurality of point data of an initial three-dimensional avatar are converted into the frequency domain to obtain a plurality of frequency-domain point data.

For example, operation S310 is the same as or similar to operation S210 described above and is not described again here.

In operation S320, the plurality of frequency-domain point data are rendered to generate a first three-dimensional avatar.

For example, operation S320 is the same as or similar to operation S220 described above and is not described again here.

In operation S330, a perceptual feature of the first three-dimensional avatar is determined.

For example, the perceptual feature of the first three-dimensional avatar may be determined using the CLIP model described above.

In operation S341, a difference value between the perceptual feature and the predetermined style feature is determined.

For example, the predetermined style feature is determined from the style description information using the CLIP model described above.

For example, the L2 loss function may be used to determine the difference value between the perceptual feature and the predetermined style feature. The L2 loss function is also known as the least square error (LSE) loss function.

In operation S342, it is determined whether the difference value converges.

In embodiments of the present disclosure, if it is determined that the difference value converges, operation S343 is performed.

For example, after the n-th difference value is determined to be less than or equal to a predetermined difference threshold, if the i difference values following the n-th difference value are all also less than or equal to the predetermined difference threshold, the difference value may be determined to have converged. In one example, n is an integer greater than or equal to 1, and i is an integer greater than or equal to 1. For instance, i is a preset value, such as i = 1.

In embodiments of the present disclosure, if it is determined that the difference value does not converge, operation S344 is performed, and the method then returns to operation S320.

For example, after the m-th difference value is determined to be less than or equal to the predetermined difference threshold, if any one of the j difference values following the m-th difference value is greater than the predetermined difference threshold, the difference value may be determined not to have converged, and operation S344 may be performed. After operation S344 is performed, the method may return to operation S320. In one example, m is an integer greater than or equal to 1, and j is an integer greater than or equal to 1. For instance, j is a preset value, such as j = 1.
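
One way to read the convergence rule of operation S342 is sketched below: the difference value is treated as converged once a difference value at or below the threshold is followed by i further difference values that are also at or below the threshold. The function name, the list-based history and the sample numbers are assumptions for illustration only.

```python
from typing import Sequence


def has_converged(differences: Sequence[float], threshold: float, i: int = 1) -> bool:
    """Return True if the latest i + 1 difference values are all <= threshold."""
    window = i + 1  # the n-th value plus the i values that follow it
    if len(differences) < window:
        return False  # not enough history yet to decide
    return all(d <= threshold for d in differences[-window:])


# Illustrative histories of difference values with a threshold of 0.5.
steady = has_converged([0.9, 0.4, 0.3], threshold=0.5, i=1)
rebound = has_converged([0.4, 0.6], threshold=0.5, i=1)
too_short = has_converged([0.4], threshold=0.5, i=1)
```

In the second history the value after the first sub-threshold difference rises above the threshold, which corresponds to the non-converged case that leads to operation S344.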

In operation S343, the current first three-dimensional avatar is used as the second three-dimensional avatar.

For example, as described above, after the difference value is determined to have converged, the first three-dimensional avatar Vir_n corresponding to the n-th difference value may be used as the second three-dimensional avatar.

In operation S344, the plurality of frequency-domain point data are adjusted.

For example, as described above, after the difference value is determined not to have converged, the plurality of frequency-domain point data corresponding to the (m+j)-th difference value may be adjusted to obtain a plurality of adjusted frequency-domain point data. The method then returns to operation S320, where the plurality of adjusted frequency-domain point data are rendered to generate the (m+j+1)-th first three-dimensional avatar, and the subsequent operations are performed again.

For example, point data include point coordinate data and color data. In one example, frequency-domain point data include frequency-domain point coordinate data and frequency-domain color data.

Through embodiments of the present disclosure, when the difference value does not converge, the frequency-domain point data are adjusted until the difference value converges, so that the perceptual feature of the second three-dimensional avatar matches the predetermined style feature, which improves the user experience.
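
The loop formed by operations S320 to S344 can be sketched structurally as follows. The renderer, the feature extractor and the adjustment rule are passed in as callables because the disclosure leaves them open; the toy stand-ins in the usage lines (identity rendering, identity features, halving the data) are purely hypothetical and are not the renderer or the CLIP model named in the text. The `max_steps` cap is a safety assumption added here, and returning the most recent avatar is one reading of operation S343.

```python
from typing import Callable, List, Tuple

import numpy as np

Array = np.ndarray


def generate_second_avatar(
    freq_points: Array,
    render: Callable[[Array], Array],           # S320: frequency-domain data -> avatar
    extract_feature: Callable[[Array], Array],  # S330: avatar -> perceptual feature
    adjust: Callable[[Array], Array],           # S344: update rule (hypothetical)
    style_feature: Array,
    threshold: float,
    i: int = 1,
    max_steps: int = 100,                       # safety cap, not part of the source
) -> Tuple[Array, List[float]]:
    """Repeat render / compare / adjust until the difference value converges."""
    differences: List[float] = []
    avatar = render(freq_points)
    for _ in range(max_steps):
        feature = extract_feature(avatar)
        # S341: L2 difference between perceptual and style features.
        differences.append(float(np.sum((feature - style_feature) ** 2)))
        # S342: converged when the latest i + 1 values are all <= threshold.
        recent = differences[-(i + 1):]
        if len(recent) == i + 1 and all(d <= threshold for d in recent):
            break  # S343: keep the current avatar as the second avatar
        freq_points = adjust(freq_points)  # S344
        avatar = render(freq_points)       # back to S320
    return avatar, differences


# Toy usage with stand-in callables; every value here is illustrative.
avatar, history = generate_second_avatar(
    np.ones(3),
    render=lambda f: f,
    extract_feature=lambda a: a,
    adjust=lambda f: 0.5 * f,
    style_feature=np.zeros(3),
    threshold=0.05,
)
```

Passing the three stages as callables keeps the control flow separate from any particular renderer or feature model, which mirrors how the text describes them as interchangeable.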

In some embodiments, unlike the method 300, a single difference value may be compared with a predetermined difference threshold to determine whether the difference value has converged.

For example, if the n-th difference value is less than or equal to the predetermined difference threshold, the difference value is determined to have converged.

As another example, if the n-th difference value is greater than the predetermined difference threshold, the difference value is determined not to have converged.

Fig. 4 is a flowchart of an avatar generation method according to another embodiment of the present disclosure.

As shown in Fig. 4, the method 444 may adjust the plurality of frequency-domain point data, as described in detail below with reference to operations S4441 and S4442.

In operation S4441, for each of the plurality of frequency-domain point data, a point normal of that frequency-domain point data is determined.

For example, the Unity 3D tool may be used to determine a mesh model Model_Mesh_k from the plurality of frequency-domain point data. The mesh model Model_Mesh_k contains a plurality of triangular patch sub-models. In one example, one triangular patch sub-model may correspond to one frequency-domain point data. In one example, rendering according to the mesh model Model_Mesh_k yields a three-dimensional avatar.

In operation S4442, each frequency-domain point data is adjusted along the direction in which its point normal extends.

For example, as described above, frequency-domain point data include frequency-domain point coordinate data and frequency-domain point color data. The values of the frequency-domain point coordinate data may be adjusted along the direction in which the point normal extends.

For example, after the values of the point coordinate data are adjusted along the direction in which the point normal extends, an adjusted mesh model Model_Mesh_k+1 is obtained, where k is an integer greater than or equal to 1.

In one example, rendering the mesh model Model_Mesh_k+1 with a renderer yields the first three-dimensional avatar after the (k+1)-th adjustment. Adjusting along the direction in which the point normal extends ensures that each frequency-domain point moves within a certain range, so that the adjusted frequency-domain point data are distributed more uniformly.
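
Operation S4442 can be illustrated with a small sketch that displaces each point along its unit normal. Treating the coordinate data as plain three-component vectors, using a single scalar step size, and the function name itself are assumptions for illustration; the disclosure does not specify how large the adjustment is or how it is chosen.

```python
import numpy as np


def move_along_normals(coords: np.ndarray, normals: np.ndarray, step: float) -> np.ndarray:
    """Shift every point by `step` along its normalized point normal."""
    unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    return coords + step * unit


# Two illustrative points with non-unit normals.
coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0]])
normals = np.array([[0.0, 0.0, 2.0],
                    [0.0, 3.0, 0.0]])
adjusted = move_along_normals(coords, normals, step=0.1)
```

Because each point can only slide along its own normal, a bounded step keeps the displacement of every point within a known range, which is the property the text attributes to this adjustment.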

In some embodiments, adjusting the plurality of frequency-domain point data may include adjusting the value of each frequency-domain color data.

In some embodiments, the data structure of the mesh model may be a graph structure. Accordingly, the mesh model may include a plurality of points, a plurality of edges and a plurality of faces.

For example, the data structure of the mesh model may be a directed graph structure. As another example, the data structure of the mesh model may be an undirected graph structure.
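
To make the graph view of a mesh concrete, the following sketch derives the undirected edge set from a list of triangular faces given as vertex-index triples. The representation and the function name are assumptions for illustration and are not taken from the disclosure.

```python
from typing import Iterable, Set, Tuple


def undirected_edges(faces: Iterable[Tuple[int, int, int]]) -> Set[Tuple[int, int]]:
    """Collect each triangle side once, stored as (smaller index, larger index)."""
    edges: Set[Tuple[int, int]] = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((min(u, v), max(u, v)))
    return edges


# Two triangles sharing the side between vertices 0 and 2.
faces = [(0, 1, 2), (0, 2, 3)]
edges = undirected_edges(faces)
```

For a directed graph structure, the ordered pairs `(u, v)` could be kept as they are instead of being sorted, which preserves the winding order of each face.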

In some embodiments, the point normal may be a vertex normal.

For example, the normals of the faces adjacent to a vertex of a triangular patch may be weighted and averaged to obtain the vertex normal.
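One common way to realize such a weighted average is area weighting, sketched below. The choice of triangle area as the weight is an assumption; the disclosure only says that the adjacent face normals are weighted and averaged.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def vertex_normals(
    vertices: Sequence[Sequence[float]],
    faces: Sequence[Tuple[int, int, int]],
) -> List[Vec3]:
    """Area-weighted vertex normals for a triangle mesh."""
    acc = [[0.0, 0.0, 0.0] for _ in vertices]
    for i, j, k in faces:
        e1 = [vertices[j][a] - vertices[i][a] for a in range(3)]
        e2 = [vertices[k][a] - vertices[i][a] for a in range(3)]
        # The unnormalized cross product has length 2 * area, so summing it
        # weights each face normal by the face's area.
        fn = cross(e1, e2)
        for idx in (i, j, k):
            for a in range(3):
                acc[idx][a] += fn[a]
    normals = []
    for n in acc:
        length = (n[0] ** 2 + n[1] ** 2 + n[2] ** 2) ** 0.5
        if length == 0.0:
            normals.append((0.0, 0.0, 0.0))
        else:
            normals.append((n[0] / length, n[1] / length, n[2] / length))
    return normals
```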

In some embodiments, unlike method 400, adjusting the multiple frequency-domain point data includes: determining face point normals from the multiple frequency-domain point data; and adjusting the multiple frequency-domain point data along the direction in which the face point normals extend.

For example, a face may be determined from at least one triangular patch. A face point normal characterizes a vertex within a face rather than a vertex of the mesh model. The relationship between face point normals and mesh vertices is many-to-one. For example, a corner point of a cube mesh model has three mutually perpendicular adjacent faces, and the face point normals at that corner can be determined from these three faces.
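The cube-corner case can be made concrete with a small sketch. It assumes face point normals are stored per (face, vertex) pair, so that three entries map to the same mesh vertex; the face labels and helper names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical table: one normal per (face label, mesh vertex index).
# Vertex 0 is a cube corner shared by the +x, +y, and +z faces.
face_point_normals: Dict[Tuple[str, int], Vec3] = {
    ("+x", 0): (1.0, 0.0, 0.0),
    ("+y", 0): (0.0, 1.0, 0.0),
    ("+z", 0): (0.0, 0.0, 1.0),
}


def normals_at_vertex(table: Dict[Tuple[str, int], Vec3], vertex: int) -> List[Vec3]:
    """Collect every face point normal attached to one mesh vertex."""
    return [n for (_, v), n in table.items() if v == vertex]


def combined_normal(normals: List[Vec3]) -> Vec3:
    """Average a list of normals into a single unit vector."""
    s = [sum(n[a] for n in normals) for a in range(3)]
    length = math.sqrt(s[0] ** 2 + s[1] ** 2 + s[2] ** 2)
    return (s[0] / length, s[1] / length, s[2] / length)
```

Keeping the three normals separate preserves the hard edges of the cube, whereas `combined_normal` collapses them into the single diagonal direction that a vertex-normal scheme would use.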

FIG. 5 is a block diagram of an avatar generation apparatus according to an embodiment of the present disclosure.

As shown in FIG. 5, the apparatus 500 may include a conversion module 510, a rendering module 520, a first determination module 530, and a generation module 540.

The conversion module 510 is configured to convert multiple point data of an initial three-dimensional avatar into the frequency domain to obtain multiple frequency-domain point data. In one example, the conversion module 510 may be used to perform, for example, operation S210 in FIG. 2.

The rendering module 520 is configured to render the multiple frequency-domain point data to generate a first three-dimensional avatar. In one example, the rendering module 520 may be used to perform, for example, operation S220 in FIG. 2.

The first determination module 530 is configured to determine a perceptual feature of the first three-dimensional avatar. In one example, the first determination module 530 may be used to perform, for example, operation S230 in FIG. 2.

The generation module 540 is configured to generate a second three-dimensional avatar according to the difference between the perceptual feature and a predetermined style feature. In one example, the generation module 540 may be used to perform, for example, operation S240 in FIG. 2.

In some embodiments, the generation module includes: a first determination submodule, configured to determine a difference value between the perceptual feature and the predetermined style feature; a second determination submodule, configured to determine whether the difference value converges; an obtaining submodule, configured to take the current first three-dimensional avatar as the second three-dimensional avatar when the difference value is determined to converge; and an adjustment submodule, configured to adjust the multiple frequency-domain point data when the difference value is determined not to converge, and to return to the operation of rendering the multiple frequency-domain point data.
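The control flow of these submodules can be sketched as a loop. This is only an outline under stated assumptions: `render`, `perceive`, `difference`, and `adjust` are caller-supplied placeholders, convergence is taken to mean that the difference value changes by less than `tol` between iterations, and `max_iters` is a safety cap not mentioned in the disclosure.

```python
from typing import Any, Callable, Optional, Tuple


def stylize(
    points: Any,
    render: Callable[[Any], Any],
    perceive: Callable[[Any], Any],
    style_feature: Any,
    difference: Callable[[Any, Any], float],
    adjust: Callable[[Any, float], Any],
    tol: float = 1e-4,      # assumed convergence threshold
    max_iters: int = 100,   # assumed iteration cap
) -> Tuple[Any, Any]:
    """Render, compare with the style feature, and adjust until convergence."""
    previous: Optional[float] = None
    image = render(points)  # first avatar
    for _ in range(max_iters):
        diff = difference(perceive(image), style_feature)
        if previous is not None and abs(previous - diff) < tol:
            break  # converged: the current first avatar becomes the second
        previous = diff
        points = adjust(points, diff)  # adjust the frequency-domain points
        image = render(points)         # return to the rendering operation
    return image, points
```

The returned `image` plays the role of the second three-dimensional avatar; in practice `adjust` would typically be a gradient step through a differentiable renderer, which this sketch does not model.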

In some embodiments, the point data includes point coordinate data and color data.

In some embodiments, the adjustment submodule includes: a determination unit, configured to determine, for each frequency-domain point data item of the multiple frequency-domain point data, the point normal of that item; and an adjustment unit, configured to adjust each frequency-domain point data item along the direction in which its point normal extends.

In some embodiments, the first determination module is configured to process the first three-dimensional avatar with a contrastive image-text pre-training model to obtain the perceptual feature of the first three-dimensional avatar.

In some embodiments, the apparatus 500 further includes a second determination module, configured to determine the predetermined style feature from style description information by using the contrastive image-text pre-training model.
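Since both features come from the same contrastive image-text model, they lie in a shared embedding space and can be compared directly. The sketch below uses cosine distance as the difference value; that metric is an assumption chosen because it is common for such models, and the embeddings are passed in as plain lists rather than produced by any particular library.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cos(a, b); 0 means same direction, 2 means opposite."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)
```

Here `a` would stand for the perceptual feature of the rendered first avatar and `b` for the style feature obtained from the style description text.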

In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of any user personal information involved comply with relevant laws and regulations and do not violate public order and good morals.

According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.

For example, the present disclosure further provides an electronic device including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method provided by the present disclosure.

For example, the present disclosure further provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method provided by the present disclosure.

For example, the present disclosure further provides a computer program product including a computer program that, when executed by a processor, implements the method provided by the present disclosure. A detailed description follows with reference to FIG. 6.

FIG. 6 shows a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.

As shown in FIG. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random-access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Multiple components of the device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard or a mouse; an output unit 607, such as various types of displays and speakers; a storage unit 608, such as a magnetic disk or an optical disc; and a communication unit 609, such as a network card, a modem, or a wireless communication transceiver. The communication unit 609 allows the device 600 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.

The computing unit 601 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 601 performs the methods and processing described above, such as the avatar generation method. For example, in some embodiments, the avatar generation method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the avatar generation method described above can be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the avatar generation method in any other suitable manner (for example, by means of firmware).

Various implementations of the systems and techniques described above can be realized in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be carried out. The program code may execute entirely on the machine; partly on the machine; as a stand-alone software package, partly on the machine and partly on a remote machine; or entirely on a remote machine or server.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in conjunction with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).

The systems and techniques described herein can be implemented in a computing system that includes a back-end component (for example, a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer having a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.

A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other.

It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solutions disclosed herein can be achieved; no limitation is imposed in this respect.

The specific implementations described above do not limit the scope of protection of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims (12)

1. A method for generating an avatar, comprising:
converting multiple point data of an initial three-dimensional avatar into the frequency domain to obtain multiple frequency-domain point data;
rendering the multiple frequency-domain point data to generate a first three-dimensional avatar;
determining a perceptual feature of the first three-dimensional avatar; and
generating a second three-dimensional avatar according to a difference between the perceptual feature and a predetermined style feature,
wherein generating the second three-dimensional avatar according to the difference between the perceptual feature and the predetermined style feature comprises:
determining a difference value between the perceptual feature and the predetermined style feature;
determining whether the difference value converges; and
in a case where it is determined that the difference value does not converge, adjusting the multiple frequency-domain point data and returning to the operation of rendering the multiple frequency-domain point data;
wherein adjusting the multiple frequency-domain point data comprises:
for each frequency-domain point data of the multiple frequency-domain point data,
determining a point normal of the frequency-domain point data; and
adjusting the frequency-domain point data along a direction in which the point normal extends.

2. The method according to claim 1, wherein generating the second three-dimensional avatar according to the difference between the perceptual feature and the predetermined style feature further comprises:
in a case where it is determined that the difference value converges, taking the current first three-dimensional avatar as the second three-dimensional avatar.

3. The method according to claim 1 or 2, wherein the point data comprises point coordinate data and color data.

4. The method according to claim 1, wherein determining the perceptual feature of the first three-dimensional avatar comprises:
processing the first three-dimensional avatar with a contrastive image-text pre-training model to obtain the perceptual feature of the first three-dimensional avatar.

5. The method according to claim 1, further comprising:
determining the predetermined style feature from style description information by using a contrastive image-text pre-training model.

6. An apparatus for generating an avatar, comprising:
a conversion module, configured to convert multiple point data of an initial three-dimensional avatar into the frequency domain to obtain multiple frequency-domain point data;
a rendering module, configured to render the multiple frequency-domain point data to generate a first three-dimensional avatar;
a first determination module, configured to determine a perceptual feature of the first three-dimensional avatar; and
a generation module, configured to generate a second three-dimensional avatar according to a difference between the perceptual feature and a predetermined style feature;
wherein the generation module comprises:
a first determination submodule, configured to determine a difference value between the perceptual feature and the predetermined style feature;
a second determination submodule, configured to determine whether the difference value converges; and
an adjustment submodule, configured to, in a case where it is determined that the difference value does not converge, adjust the multiple frequency-domain point data and return to the operation of rendering the multiple frequency-domain point data;
wherein the adjustment submodule comprises:
a determination unit, configured to determine, for each frequency-domain point data of the multiple frequency-domain point data, a point normal of the frequency-domain point data; and
an adjustment unit, configured to adjust the frequency-domain point data along a direction in which the point normal extends.

7. The apparatus according to claim 6, wherein the generation module comprises:
an obtaining submodule, configured to, in a case where it is determined that the difference value converges, take the current first three-dimensional avatar as the second three-dimensional avatar.

8. The apparatus according to claim 6 or 7, wherein the point data comprises point coordinate data and color data.

9. The apparatus according to claim 6, wherein the first determination module is further configured to:
process the first three-dimensional avatar with a contrastive image-text pre-training model to obtain the perceptual feature of the first three-dimensional avatar.

10. The apparatus according to claim 6, further comprising:
a second determination module, configured to determine the predetermined style feature from style description information by using a contrastive image-text pre-training model.

11. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method according to any one of claims 1 to 5.

12. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 5.

CN202210244262.7A 2022-03-11 2022-03-11 Virtual image generation method, device, electronic device and storage medium Active CN114612600B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202210244262.7A CN114612600B (en) 2022-03-11 2022-03-11 Virtual image generation method, device, electronic device and storage medium
KR1020220155155A KR20220161233A (en) 2022-03-11 2022-11-18 Method for generating virtual characters, device for generating virtual characters, electronic equipment, storage medium, and computer program
JP2022211477A JP2023026531A (en) 2022-03-11 2022-12-28 Virtual character generating method, apparatus, electronic equipment, storage medium, and computer program
US18/181,371 US20230206578A1 (en) 2022-03-11 2023-03-09 Method for generating virtual character, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210244262.7A CN114612600B (en) 2022-03-11 2022-03-11 Virtual image generation method, device, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN114612600A CN114612600A (en) 2022-06-10
CN114612600B true CN114612600B (en) 2023-02-17

Family

ID=81863540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210244262.7A Active CN114612600B (en) 2022-03-11 2022-03-11 Virtual image generation method, device, electronic device and storage medium

Country Status (4)

Country Link
US (1) US20230206578A1 (en)
JP (1) JP2023026531A (en)
KR (1) KR20220161233A (en)
CN (1) CN114612600B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114792355B (en) * 2022-06-24 2023-02-24 北京百度网讯科技有限公司 Virtual image generation method and device, electronic equipment and storage medium
CN114820908B (en) * 2022-06-24 2022-11-01 北京百度网讯科技有限公司 Virtual image generation method and device, electronic equipment and storage medium
CN115359220B (en) * 2022-08-16 2024-05-07 支付宝(杭州)信息技术有限公司 Virtual image updating method and device for virtual world
CN116310134B (en) * 2023-04-10 2026-02-27 北京百度网讯科技有限公司 Virtual avatar rendering methods, devices and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8890931B2 (en) * 2010-08-26 2014-11-18 City University Of Hong Kong Fast generation of holograms
CN107330966A (en) * 2017-06-21 2017-11-07 杭州群核信息技术有限公司 A kind of rendering intent and device
CN110531860A (en) * 2019-09-02 2019-12-03 腾讯科技(深圳)有限公司 A kind of animating image driving method and device based on artificial intelligence

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2918499B2 (en) * 1996-09-17 1999-07-12 株式会社エイ・ティ・アール人間情報通信研究所 Face image information conversion method and face image information conversion device
JP3062181B1 (en) * 1999-03-17 2000-07-10 株式会社エイ・ティ・アール知能映像通信研究所 Real-time facial expression detection device
US9520101B2 (en) * 2011-08-31 2016-12-13 Microsoft Technology Licensing, Llc Image rendering filter creation
CN109427088B (en) * 2017-08-18 2023-02-03 腾讯科技(深圳)有限公司 Rendering method for simulating illumination and terminal
CN113643412B (en) * 2021-07-14 2022-07-22 北京百度网讯科技有限公司 Virtual image generation method and device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Simulation of combined head and room impulse response based on sound ray tracing in frequency domain;Junwei He;《IET International Conference on Smart and Sustainable City 2013 (ICSSC 2013)》;20140213;全文 *
单幅正视灰度图像三维重建及伪彩处理的研究与实现;厉为;《信息科技》;20120815(第8期);全文 *
飞行模拟器中的虚拟声音建模与实时渲染技术;杨新颖等;《航空学报》;20090725(第07期);全文 *

Also Published As

Publication number Publication date
JP2023026531A (en) 2023-02-24
CN114612600A (en) 2022-06-10
KR20220161233A (en) 2022-12-06
US20230206578A1 (en) 2023-06-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant