# Deep learning

MNN-LLM Android App

MNN-LLM is an efficient inference framework for optimizing and accelerating the deployment of large language models on mobile devices and local PCs. It tackles high memory consumption and computational cost through model quantization, hybrid storage, and hardware-specific optimizations. MNN-LLM delivers significant speedups in CPU benchmarks, making it well suited to users who need privacy protection and efficient local inference.

Artificial intelligence
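MNN-LLM's description names model quantization as one of its memory savers. As a rough illustration of the general idea only (not MNN-LLM's implementation or API), the sketch below shows symmetric per-tensor int8 quantization: each weight is stored as an 8-bit integer plus one shared floating-point scale. The function names are hypothetical.

```python
# Minimal sketch of symmetric per-tensor int8 weight quantization.
# Generic technique for illustration; not MNN-LLM's actual code or API.


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 codes in [-127, 127] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats from int8 codes and the shared scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w = [0.12, -0.5, 0.33, 0.0, 0.49]
    q, s = quantize_int8(w)
    w_hat = dequantize_int8(q, s)
    # Each weight now takes 1 byte instead of 4 (float32); rounding keeps
    # the reconstruction error within half a quantization step (s / 2).
    print(q)
    print(max(abs(a - b) for a, b in zip(w, w_hat)) <= s / 2)
```

The trade-off is a 4x smaller weight footprint in exchange for a bounded rounding error; production frameworks typically refine this with per-channel or per-group scales.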

Thera

Thera is an advanced super-resolution technique that generates high-quality images at various scales. Its main advantage is a built-in physical observation model that effectively avoids aliasing artifacts. Developed by a research team at ETH Zurich, it applies to image enhancement and computer vision, particularly remote sensing and photogrammetry.

Image Enhancement
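Thera's description credits its physical observation model with avoiding aliasing. The sketch below is only a generic illustration of what aliasing is, not Thera's method: when a signal contains frequencies above half the sampling rate (the Nyquist limit), sampling folds them onto lower frequencies, which show up as artifacts in resampled images. The function name is hypothetical.

```python
import math

# Generic illustration of aliasing, not Thera's method: a cosine at
# 0.9 cycles/sample exceeds the Nyquist limit of 0.5 cycles/sample, so its
# samples coincide with those of a 0.1 cycles/sample cosine.


def sample_cosine(freq: float, n_samples: int) -> list[float]:
    """Sample cos(2*pi*freq*n) at integer positions n = 0..n_samples-1."""
    return [math.cos(2 * math.pi * freq * n) for n in range(n_samples)]


high = sample_cosine(0.9, 16)  # above Nyquist: gets aliased
low = sample_cosine(0.1, 16)   # the low frequency it collapses onto

# The two sampled signals are indistinguishable (up to float rounding).
print(all(math.isclose(h, l, abs_tol=1e-9) for h, l in zip(high, low)))  # True
```

Because cos(2π·0.9·n) = cos(2π·0.1·n) for every integer n, no amount of post-processing on the samples alone can tell the two apart; avoiding the artifact has to happen in how the signal is modeled and sampled.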

Flex.1-alpha

Flex.1-alpha is a powerful text-to-image model based on an 8-billion-parameter rectified flow transformer architecture. It inherits features from FLUX.1-schnell and, thanks to a trained guidance embedder, generates images without classifier-free guidance (CFG). The model supports fine-tuning, is open source under the Apache 2.0 license, and can be used with inference engines such as Diffusers and ComfyUI. Its main advantages are efficient generation of high-quality images, flexible fine-tuning, and strong community support. It was developed to address the compression and optimization of image generation models, and its performance continues to improve through ongoing training.

Image Generation
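Flex.1-alpha is said to skip CFG by using a trained guidance embedder. For context, the sketch below shows standard classifier-free guidance, which needs two denoiser calls per sampling step (one conditional, one unconditional) combined as uncond + w · (cond − uncond). The denoiser is a hypothetical toy that only counts its forward passes; it is not Flex.1-alpha's API.

```python
from typing import Callable, Optional

Prediction = list[float]


def cfg_step(predict: Callable[[Optional[str]], Prediction],
             prompt: str, guidance_scale: float) -> Prediction:
    """Classifier-free guidance: guided = uncond + w * (cond - uncond)."""
    cond = predict(prompt)   # forward pass 1: conditioned on the prompt
    uncond = predict(None)   # forward pass 2: unconditional (no prompt)
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


class ToyDenoiser:
    """Hypothetical stand-in for a diffusion model; counts forward passes."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: Optional[str]) -> Prediction:
        self.calls += 1
        return [1.0, 2.0] if prompt is not None else [0.5, 0.5]


model = ToyDenoiser()
guided = cfg_step(model, "a red fox in snow", guidance_scale=3.0)
print(guided, model.calls)  # [2.0, 5.0] 2
```

With a guidance embedder, the guidance scale is instead fed to the model as an extra input, so one forward pass per step replaces the two above; that trained model is not sketched here.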

One Shot, One Talk

One Shot, One Talk is a deep-learning-based image generation technique that reconstructs a full-body, dynamic talking avatar with personalized details from a single image, supporting realistic animation with vivid body movements and natural changes of expression. Its significance is that it greatly lowers the barrier to creating realistic animated avatars: users can produce highly personalized, expressive avatars from just one image. It was developed by research teams from the University of Science and Technology of China and The Hong Kong Polytechnic University. It combines recent image-to-video diffusion models with a hybrid 3D Gaussian Splatting (3DGS) and mesh avatar representation, and uses key regularization techniques to reduce inconsistencies caused by imperfect labels.

Flux.1 Lite

Flux.1 Lite is an 8B-parameter text-to-image model released by Freepik and distilled from FLUX.1-dev. Compared with the original model, it uses 7 GB less RAM and runs 23% faster while keeping the same precision (bfloat16). The release aims to make high-quality AI models more accessible, especially to users with consumer-grade GPUs.

Image Generation

Revisit Anything

Revisit Anything is a visual place recognition system that uses image segment retrieval to identify and match places across different images. It combines SAM (Segment Anything Model) and DINO (a self-supervised vision transformer trained by self-distillation with no labels) to improve the accuracy and efficiency of recognition. The technique has significant application value in fields such as robot navigation and autonomous driving.

AI image detection and recognition
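Segment retrieval means comparing places by their parts rather than by one global image descriptor. The toy sketch below assumes each image has already been reduced to a list of segment descriptor vectors (hypothetically, SAM masks pooled over DINO features) and scores a database image by averaging, over query segments, the best cosine match. This simple max-then-mean aggregation is an illustrative choice, not the aggregation used in Revisit Anything, and all names are hypothetical.

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def image_score(query_segments: list[Vector], db_segments: list[Vector]) -> float:
    """Average, over query segments, of the best-matching database segment."""
    best = [max(cosine(q, d) for d in db_segments) for q in query_segments]
    return sum(best) / len(best)


def rank_places(query_segments: list[Vector],
                database: dict[str, list[Vector]]) -> list[str]:
    """Return database place ids sorted from most to least similar."""
    return sorted(database,
                  key=lambda pid: image_score(query_segments, database[pid]),
                  reverse=True)


# Toy 3-D descriptors: place_a shares two segments with the query.
database = {
    "place_a": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "place_b": [[0.0, 0.0, 1.0]],
}
query = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
print(rank_places(query, database))  # ['place_a', 'place_b']
```

Matching at the segment level lets a place be recognized even when only some of its parts (a facade, a sign) overlap between views.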

Aixploria

Aixploria is a website focused on artificial intelligence that offers an online directory of AI tools, helping users find the AI solutions that best meet their needs. With a simple design and an intuitive search engine, it lets users search for AI applications by keyword. Beyond listing tools, Aixploria publishes articles explaining how each AI works, helping users follow the latest trends and popular applications. Its 'Top 10 AI' section is updated in real time, so users can quickly see the leading AI tools in each category. Aixploria suits anyone interested in AI, from beginners to experts.

AI information platform

FasterLivePortrait

FasterLivePortrait is a deep-learning-based real-time portrait animation project. Using TensorRT on an RTX 3090 GPU, it runs at 30+ FPS, including pre- and post-processing. The project also converts the LivePortrait model to ONNX and reaches about 70 ms per frame with onnxruntime-gpu on the RTX 3090, supporting cross-platform deployment. It additionally supports a native Gradio app with several-fold faster inference and simultaneous inference on multiple faces. The code has been restructured to drop its PyTorch dependency; all models now run inference through ONNX or TensorRT.

AI image generation
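To read the frame-rate claims for FasterLivePortrait side by side, the arithmetic below converts between per-frame latency and frames per second: a 30 FPS target leaves a budget of about 33.3 ms per frame, while a 70 ms per-frame time (the onnxruntime-gpu figure in the description) corresponds to roughly 14 FPS. The helper names are illustrative only.

```python
def fps_from_latency(ms_per_frame: float) -> float:
    """Frames per second achievable at a given per-frame latency in ms."""
    return 1000.0 / ms_per_frame


def latency_budget(target_fps: float) -> float:
    """Maximum per-frame latency in ms that still reaches target_fps."""
    return 1000.0 / target_fps


print(round(latency_budget(30), 1))    # 33.3
print(round(fps_from_latency(70), 1))  # 14.3
```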

image-textualization

image-textualization is an automated framework for generating rich and detailed image descriptions. This framework utilizes deep learning technology, enabling it to automatically extract information from images and generate accurate, comprehensive textual descriptions. This technology holds significant application value in areas such as image recognition, content generation, and assisting individuals with visual impairments.

AI image detection and recognition

SDXL Flash

SDXL Flash is a text-to-image generation model developed by the SD community in collaboration with Project Fluently. It offers faster processing speeds than LCM, Turbo, Lightning, and Hyper while maintaining high image quality. Based on the Stable Diffusion XL technology, the model achieves high efficiency and quality in image generation through optimized steps and CFG (Guidance) parameters.

AI image generation

LighTDiff

LighTDiff is a deep learning model that enhances surgical endoscope images captured in low-light conditions. Using T-Diffusion, it effectively increases image brightness and clarity, contributing significantly to surgical safety and efficiency. The work has been accepted for early publication at MICCAI 2024, and its code is open source for research and practical applications.

AI image enhancement

TensorDock

TensorDock is a professional cloud service provider built for workloads that demand unwavering reliability. It offers a range of GPU servers, including NVIDIA H100 SXM, and cost-effective virtual machine infrastructure for deep learning, AI, and rendering. TensorDock also provides fully managed container hosting with OS-level monitoring, auto-scaling, and load balancing, backed by world-class enterprise support.

Development & Tools

RAGFlow

RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine built on deep document understanding, offering a streamlined RAG workflow for enterprises of any size. It combines large language models (LLMs) with retrieval to provide grounded question answering, backed by verifiable citations drawn from data in a variety of complex formats.

Knowledge Management
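As a rough illustration of the retrieve-then-cite flow that RAG engines such as RAGFlow automate, the sketch below ranks text chunks by bag-of-words cosine similarity and builds a prompt that carries citation ids, so an LLM's answer can point back to its sources. This is a toy stand-in: RAGFlow's document parsing, embedding models, and APIs are not shown, and every name and chunk id here is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question: str, chunks: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Return the top-k (citation_id, chunk_text) pairs for a question."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks.items(),
                    key=lambda item: cosine(q, Counter(tokenize(item[1]))),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Assemble an LLM prompt that asks for answers citing [ids]."""
    context = "\n".join(f"[{cid}] {text}" for cid, text in hits)
    return ("Answer using only the sources below and cite them by id.\n"
            f"{context}\n\nQuestion: {question}")


chunks = {
    "doc1#p3": "The warranty period is 24 months from delivery.",
    "doc2#p1": "Shipping takes five business days.",
    "doc1#p7": "Warranty claims require the original invoice.",
}
hits = retrieve("How long is the warranty period?", chunks)
print([cid for cid, _ in hits])  # ['doc1#p3', 'doc1#p7']
print(build_prompt("How long is the warranty period?", hits))
```

Real engines replace the term counts with learned embeddings and a vector index, but the shape of the pipeline (retrieve, attach ids, generate) stays the same.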

FaceChain

FaceChain is a deep learning toolkit powered by ModelScope that can create your digital twin from as little as one portrait photo and generate personal portraits in different settings and in multiple styles. Users can train digital-twin models and generate images through FaceChain's Python scripts, the familiar Gradio interface, or the Stable Diffusion WebUI. Its main advantages are personalized portrait generation, support for multiple styles, and an easy-to-use interface.

AI avatar generation

GenAI Courses

GenAI Courses is an online platform offering AI courses. Through them, users can learn technologies such as generative AI (GenAI), machine learning, deep learning, ChatGPT, DALL·E, and image, video, and text generation, and keep up with the latest developments in AI in 2024.

ControlNet++

ControlNet++ is a novel text-to-image diffusion model that significantly improves controllability under various conditional controls by explicitly optimizing pixel-level cycle consistency between the generated image and the conditioning control. It uses a pre-trained discriminative reward model to extract the corresponding condition from the generated image and optimizes a consistency loss between the input conditioning control and the extracted condition. In addition, ControlNet++ introduces an efficient reward strategy: it adds noise to the input image and uses the single-step denoised image for reward fine-tuning, avoiding the substantial time and memory cost of full image sampling.

AI image generation
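The two ControlNet++ ideas described above, a cycle-consistency loss and reward fine-tuning from a single-step denoised image, can be sketched on a toy 1-D "image". The sketch assumes the standard DDPM noising relation x_t = √ᾱ·x₀ + √(1−ᾱ)·ε and its one-step inversion; `extract_condition` is a hypothetical stand-in for the pre-trained discriminative reward model, and none of this is ControlNet++'s code.

```python
import math
import random


def add_noise(x0, eps, alpha_bar):
    """Forward diffusion step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def one_step_x0(x_t, eps_hat, alpha_bar):
    """Single-step clean-image estimate from a predicted noise eps_hat."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - b * e) / a for xt, e in zip(x_t, eps_hat)]


def consistency_loss(cond, extracted):
    """Pixel-level mean squared error between two condition maps."""
    return sum((c - e) ** 2 for c, e in zip(cond, extracted)) / len(cond)


def extract_condition(image):
    """Hypothetical stand-in for the reward model: a binary threshold map."""
    return [1.0 if p > 0.5 else 0.0 for p in image]


rng = random.Random(0)
x0 = [0.2, 0.8, 0.4]                      # toy training "image"
cond = extract_condition(x0)              # its conditioning control
eps = [rng.gauss(0.0, 1.0) for _ in x0]   # sampled Gaussian noise
x_t = add_noise(x0, eps, alpha_bar=0.5)
# Assume a perfect noise prediction (eps_hat == eps) for illustration;
# in training, eps_hat would come from the conditioned diffusion model.
x0_hat = one_step_x0(x_t, eps, alpha_bar=0.5)
loss = consistency_loss(cond, extract_condition(x0_hat))
print(loss)  # 0.0
```

Estimating x₀ in one step from a noised training image is what lets the reward loss be computed without running the full multi-step sampler.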

img2img-turbo

img2img-turbo is an open-source project that improves on the original img2img project to provide faster image-to-image translation. It uses advanced deep learning techniques to handle a variety of image transformation tasks, such as style transfer, image colorization, and image restoration.

AI image generation

WSE-3

Cerebras Systems has announced its third-generation Wafer-Scale Engine (WSE-3), a 5 nm chip designed specifically for training the industry's largest AI models. WSE-3 delivers twice the performance of its predecessor, WSE-2, at the same power consumption and price. It contains 4 trillion transistors and 900,000 AI-optimized compute cores, delivering 125 petaflops of peak AI performance.

Model Training and Deployment
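A quick sanity check on the WSE-3 figures: dividing the quoted peak AI performance by the core count gives the average peak throughput per core. This is derived arithmetic on the stated numbers, not a Cerebras specification, and real per-core throughput depends on the workload.

```python
PEAK_AI_FLOPS = 125e15   # 125 petaflops peak AI performance, per the text
CORE_COUNT = 900_000     # AI-optimized compute cores, per the text

# Average peak throughput per core, in GFLOPS (1 GFLOPS = 1e9 FLOP/s).
per_core_gflops = PEAK_AI_FLOPS / CORE_COUNT / 1e9
print(f"{per_core_gflops:.1f} GFLOPS per core")  # 138.9 GFLOPS per core
```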

Depthify.ai

Depthify.ai is a tool that converts RGB images into spatial formats compatible with Apple Vision Pro and Meta Quest. Turning RGB images into spatial photos supports a range of computer vision and 3D modeling applications. It can generate depth maps, stereoscopic images, and HEIC files for use on Apple Vision Pro.

Intel NPU Acceleration Library

The Intel NPU Acceleration Library is designed to enhance the performance of deep learning and machine learning applications on Intel's Neural Processing Units (NPUs). It offers algorithms and tools optimized for Intel hardware, supports various deep learning frameworks, and significantly improves model inference speed and efficiency.

AI model training and inference

ComfyUI-layerdiffusion

ComfyUI-layerdiffusion is a GitHub project that provides custom ComfyUI nodes implementing the Layer Diffusion model. It is installed through its Python dependencies and currently supports only SDXL models. The project aims to give ComfyUI users a convenient integration of Layer Diffusion.

AI image generation

Stable Video Diffusion

Stable Video Diffusion is an AI-powered video generation platform that lets users turn concepts into compelling videos from text or images. Using cutting-edge deep learning, it generates a wide range of high-quality videos, including commercial promotional, educational, and demonstration videos. Its advantages are fast generation, high quality, and ease of use. Pricing is subscription-based and depends on the number of videos created. It targets enterprise customers who frequently need high-quality video generation.

Video Production

Magika

Magika is a fast and accurate file type identification tool developed by Google. Based on deep learning models, it can identify binary and text file types in milliseconds, with accuracy significantly higher than existing tools, particularly for code and configuration files.

File Type Recognition

Keras

Keras is a human-centric API designed around best practices. It reduces cognitive load by offering consistent, simple APIs, minimizing the number of user actions required for common use cases, and providing clear, actionable error messages. Keras aims to give developers who ship machine-learning-powered applications an unfair advantage, prioritizing debugging speed, code elegance and conciseness, maintainability, and deployability. With Keras, your codebase becomes smaller, more readable, and easier to iterate on. Your models run faster thanks to XLA compilation and AutoGraph optimization, and they are easier to deploy across platforms (servers, mobile devices, browsers, and embedded devices).

Development & Tools

AmigoAI

AmigoAI is an AI creative assistant based on large language models, designed to boost user productivity and streamline automated content creation. From a prompt, it can generate a variety of content, including code, articles, and stories, and it also supports intelligent dialogue. Built on its own deep learning technology, AmigoAI accepts Chinese input and produces smooth, coherent output, making it a powerful tool for increasing the output of individuals and organizations.

Writing Assistant

Kaggle

Kaggle is an online learning platform for data scientists. It offers datasets, code examples, forum discussions, online courses, and machine learning competitions. Users can learn data science for free, exchange ideas with peers, and gain practice by taking part in machine learning competitions.

ClearImage

ClearImage is an image processing tool based on deep learning. It quickly converts blurry images into high-definition ones, using advanced algorithms to reconstruct images with clearer, sharper details. ClearImage also offers background cutout, ID photo processing, black-and-white photo colorization, strong image compression, and DPI adjustment. It suits individual users, photographers, designers, and similar use cases.

MyHeritage

Deep Nostalgia™ is an amazing technology that can bring faces in your family photos to life with animation. Experience your family history in a whole new way! Deep Nostalgia™ uses deep learning to apply pre-prepared motion sequences to the faces in your still photos, creating high-quality, realistic video clips. This feature can make your ancestors smile, blink, and turn their heads, truly bringing your photos to life!

AI image generation

Featured AI Tools

Tencent Hunyuan Image 2.0

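Tencent Hunyuan Image 2.0 is Tencent's latest AI image generation model, with significantly improved generation speed and image quality. An ultra-high-compression-ratio codec and a new diffusion architecture bring generation speed down to the millisecond level, eliminating the wait of conventional generation. The model also combines reinforcement learning with human aesthetic knowledge to improve image realism and detail, making it suitable for designers, creators, and other professional users.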

Lovart

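Lovart is a revolutionary AI design agent that turns creative prompts into artwork, covering design needs from storyboards to brand visuals. Its importance lies in breaking with the traditional design workflow, saving time and sparking creative inspiration. Lovart is currently in beta; users can join the waitlist and enjoy the fun of design at any time.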

FastVLM

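FastVLM is an efficient vision encoding model designed for vision-language models. Its novel FastViTHD hybrid vision encoder reduces both the encoding time for high-resolution images and the number of output tokens, giving the model excellent speed and accuracy. FastVLM is mainly positioned to give developers powerful vision-language processing capabilities for a wide range of applications, and it performs especially well on mobile devices that need fast responses.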
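The token-count problem that FastVLM targets follows from simple arithmetic: in a ViT-style encoder that emits one token per patch, the number of visual tokens grows with the square of the image resolution. The sketch assumes that simplified one-token-per-patch model; the stride of 64 is a hypothetical value for a more aggressively downsampling encoder, not a published FastViTHD figure.

```python
def visual_tokens(image_size: int, stride: int) -> int:
    """Visual tokens for a square image when the encoder emits one token
    per stride x stride patch (simplified model)."""
    side = image_size // stride
    return side * side


print(visual_tokens(336, 14))    # 576: 14-px patches at 336 px
print(visual_tokens(1344, 14))   # 9216: same patch size at 4x the resolution
print(visual_tokens(1344, 64))   # 441: hypothetical 64x downsampling encoder
```

Quadrupling the resolution multiplies the token count by 16 at a fixed stride, which is why reducing output tokens matters for both encoder time and the language model's prefill cost.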

KeySync

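KeySync is a leakage-free lip-sync framework for high-resolution video. It addresses the temporal consistency problems of traditional lip-sync techniques, while handling expression leakage and facial occlusion through a carefully designed masking strategy. KeySync's strength shows in its state-of-the-art results on lip reconstruction and cross-synchronization, making it suitable for practical applications such as automatic dubbing.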

Manus

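Manus, developed by Monica.im, is the world's first truly autonomous AI agent product, able to deliver complete task results rather than just suggestions or answers. It uses a multiple-agent architecture and runs in an independent virtual machine, completing tasks directly by writing and executing code, browsing the web, and operating applications. Manus achieved state-of-the-art (SOTA) performance on the GAIA benchmark, demonstrating strong task-execution capabilities. Its goal is to act as the user's 'agent' in the digital world, helping users complete all kinds of complex tasks efficiently.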

Trae (China edition)

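Trae is an AI-native IDE designed for Chinese-language development scenarios, deeply integrating AI into the development environment. Through intelligent code completion, context understanding, and other features, it significantly improves development efficiency and code quality. Trae fills a gap among AI-integrated development tools in China, meeting Chinese developers' need for efficient tooling. It is positioned as a high-end development tool that provides strong technical support for professional developers; pricing has not yet been publicly announced, but a paid model is expected to match its high-end positioning.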

Development & Tools

Pika

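Pika is a video creation platform: users upload their creative ideas, and Pika automatically generates matching videos. Key features include turning a wide range of creative ideas into video, professional-looking results, and simple, easy operation. The platform offers a free trial and targets creators and video enthusiasts.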

LiblibAI

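LiblibAI is a leading Chinese AI creation platform that provides powerful AI creation capabilities to help creators realize their ideas. The platform offers a large library of free AI models that users can search and use to create images, text, audio, and more, and it also lets users train their own AI models. Aimed at a broad base of creators, the platform is committed to making creation widely accessible, serving the creative industries, and letting everyone enjoy the fun of creating.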

AIbase

Empowering the Future, Your AI Solution Knowledge Base

© 2025 AIbase