# Voice Interaction

Vibe Coder
Vibe Coder is an open-source VS Code extension developed by Deepgram, aimed at exploring the possibilities of voice-driven programming. It leverages speech recognition technology, allowing users to interact with an AI programming assistant through voice commands, quickly translating ideas into code prototypes. This innovative programming method, termed 'vibe coding', is designed to enhance coding efficiency and reshape the future of software development. Vibe Coder is currently in its experimental phase, and Deepgram hopes to continuously improve the tool through community feedback.
Coding Assistance
70.9K
Sesame
Sesame is a cross-disciplinary product and research team focused on voice technology, aiming to make interaction with computers more natural and efficient through natural speech. Its main products are a personal voice companion and lightweight wearable eyewear, designed to make computers feel more human and to help users organize information and work more efficiently. The product's main advantages are the naturalness of its voice interaction and the portability of the device, which make it suitable for everyday use. Sesame is currently recruiting and is committed to driving innovation in voice technology.
Personal Assistance
107.4K
Riviera
Riviera is an AI-powered voice platform designed specifically for the hotel industry, aiming to enhance customer experience and optimize hotel operational efficiency through intelligent voice interaction. It supports multilingual conversations, enabling quick responses to customer inquiries and handling requests such as reservations and room service. It also offers personalized service through data analysis. Using advanced AI technology, Riviera reduces manual intervention, lowers operating costs, and helps alleviate employee workload, especially during peak seasons. The product stems from the digital transformation of the hotel industry and the increasing customer demand for service immediacy and personalization. Pricing and specific positioning are customized based on hotel size and needs.
Customer Service
71.8K
Lovify
Lovify is an extension for Lovable.dev designed to improve developer productivity through a series of AI-driven features. It supports GitHub integration for quick repository import and management, offers intelligent prompt enhancement that optimizes prompts based on context, provides project planning tools to automatically generate PRDs and action plans, supports voice interaction for hands-free coding and debugging, and offers quick slash commands for rapid access to various functions. These features combined enable developers to write code, manage projects, and receive real-time help more efficiently. Currently, the product is in the promotion phase, and the specific price is not yet defined, but it is available for free trial through the Chrome Web Store.
Coding Assistant
68.4K
Step-Audio
Step-Audio is the first production-level open-source intelligent voice interaction framework, integrating voice understanding and generation capabilities. It supports multilingual dialogue, emotional intonation, dialects, speech rate, and prosodic style control. Its core technologies include a 130B parameter multimodal model, a generative data engine, fine-grained voice control, and enhanced intelligence. This framework promotes the development of intelligent voice interaction technology through open-source models and tools, and is suitable for a variety of voice application scenarios.
Speech Recognition
79.5K
Chirp AI
Chirp AI is an intelligent voice assistant application designed specifically for Apple Watch. Through powerful voice recognition and artificial intelligence technology, it allows users to complete various operations with voice commands only, such as sending messages, obtaining information, and searching the web, greatly improving the operational efficiency of users in mobile scenarios. The main advantage of this product is that it achieves efficient information interaction and task processing without frequent use of the phone. It's suitable for users who want to reduce their dependence on mobile phones in their daily lives while still being able to quickly access information and complete tasks. Currently, the application is available for free download and is positioned as an intelligent tool to improve user productivity and convenience.
Personal Assistance
59.9K
FoloUp
FoloUp is an AI-driven voice interview platform focused on the recruitment process. It facilitates efficient candidate screening and evaluation by intelligently generating interview questions, enabling real-time voice interaction, and providing in-depth analysis of candidate responses. The platform leverages advanced AI technology to ensure a natural and fluid interview process, while also delivering detailed candidate performance reports. FoloUp aims to enhance recruitment efficiency, reduce labor costs, and provide a fairer interview experience for candidates through technology. Currently, the platform is available as open-source, supporting custom deployment and usage.
Job seeking
61.3K
Storytelling Chatbot
This product utilizes the Gemini 2.0 language model and Google Imagen image generation technology, integrating speech recognition and synthesis to provide users with an interactive storytelling experience. Users can choose the direction of the story through voice input, and the system will generate story content and related images in real-time. Its main advantages are innovative interaction methods and powerful content generation capabilities, making it suitable for education, entertainment, and creative inspiration. Currently, the product is in the open-source phase, with no specific pricing established, primarily targeting developers and educational institutions.
AI image generation
69.6K
SpeechGPT 2.0-preview
SpeechGPT 2.0-preview is an advanced voice interaction model developed by the Natural Language Processing Laboratory at Fudan University. It employs vast amounts of voice data for training, achieving low-latency and highly natural speech interaction capabilities. The model simulates various emotional, stylistic, and role-based voice expressions while supporting tool invocation, online search, and access to external knowledge bases. Key advantages include strong voice style generalization, multi-role simulation, and low-latency interaction experience. Currently, the model supports Chinese voice interaction, with plans to expand to more languages in the future.
Speech-to-text
58.5K
Chinese Picks
Ideal Classmate
Ideal Classmate is an AI application developed by Li Auto (Lixiang), built on its self-developed large model to give users an always-available smart assistant. It answers questions across fields such as automobiles, travel, finance, and technology, and handles English translation and text generation to support learning and daily life. It also has visual perception, so users and their families can identify things they encounter on outings. The interface is simple and clean, voice input is fast and accurate, and the spoken output is natural and close to human speech, combining knowledge Q&A, visual recognition, and voice interaction in one assistant.
Personal Assistance
108.5K
Agentplace
Agentplace is a platform that allows users to build AI applications and websites on AI models without any coding knowledge. It leverages the adaptability, common sense, knowledge, and voice capabilities of AI, enabling users to program entirely through text. Key advantages of the product include dynamic user interfaces, voice interactions, common sense understanding, and instant publishing. Agentplace aims to simplify the website and application creation process with AI technology, allowing non-technical users to easily create interactive and dynamic websites. Regarding pricing, Agentplace offers both free and paid plans to cater to different user needs.
Development Platform
55.5K
Speek
Speek is an AI-driven assistant that guides users through website interactions with voice and an animated mouse pointer. It answers questions, shows users how to use website features, and simplifies purchasing decisions. It is easy to install and provides real-time assistance from the start, improving the user experience, boosting sales, and reducing customer support inquiries.
User Guidance
56.0K
Google Gemini App
Google Gemini is an AI assistant app developed by Google, designed to help users enhance creativity and productivity. Users can interact with the app by voice to brainstorm ideas, simplify complex topics, and rehearse for important occasions. Gemini connects with Google apps such as Search, YouTube, Google Maps, and Gmail to provide interactive visuals, real-world examples, and customized information on any subject. It also helps users plan trips, create AI-generated images, and get summaries, in-depth research, and source links.
Personal Assistance
55.8K
GPTS4O.SO
GPT-4o is a multimodal AI model launched by OpenAI that extends GPT-4 to cover text, images, and audio. It is designed to be faster, more cost-effective, and more widely accessible. It supports natural conversation, interpretation of complex texts, and recognition of subtle emotions in speech.
AI Model
56.9K
XGO Rider
XGO Rider is a desktop dual-wheeled biped AI robot that integrates ChatGPT, featuring self-balancing capabilities and omnidirectional movement. Built on the Raspberry Pi CM4 core module, it supports Python and C++ programming, making it ideal for AI programming education. XGO Rider not only helps students and developers seamlessly enter the world of robotics but also facilitates various interactions and learning opportunities through its rich array of sensors and AI functionalities, including gesture recognition, facial detection, and skeletal tracking.
Education
52.7K
voice-chat-pdf
voice-chat-pdf is a sample built on the LlamaIndex project using Next.js. It allows users to interact with PDF documents via voice using a simple Retrieval-Augmented Generation (RAG) system. This project requires an OpenAI API key to access the real-time API and generate embedding vectors for document interactions. It demonstrates how advanced machine learning technologies can be applied to enhance the efficiency and convenience of document interaction.
AI Conversational Agents
58.0K
English Picks
Realtime API
The Realtime API, launched by OpenAI, is a low-latency voice interaction API that enables developers to create fast voice-to-voice experiences within their applications. This API supports natural voice-to-voice conversation and can handle interruptions, similar to the advanced voice mode of ChatGPT. It operates through a WebSocket connection and supports function calls, allowing voice assistants to respond to user requests, trigger actions, or introduce new contexts. With this API, developers no longer need to combine multiple models to construct voice experiences; instead, they can achieve natural conversational interactions through a single API call.
AI speech recognition
95.8K
Wen Xiaoyan
Wen Xiaoyan is an intelligent search assistant app launched by Baidu, based on the Wenxin large model. It leverages AI technology to offer various services such as search, creation, and chat. The product personalizes search results and creative suggestions by remembering user preferences and needs, and supports voice and photo input for easier access to information and content creation.
AI search engine
103.2K
Open-LLM-VTuber
Open-LLM-VTuber is an open-source project designed for interaction with large language models (LLMs) via voice, featuring real-time Live2D facial capture and cross-platform long-term memory capabilities. This project supports macOS, Windows, and Linux platforms, allowing users to select different speech recognition and speech synthesis backends, as well as customized long-term memory solutions. It is particularly suited for developers and enthusiasts looking to implement natural language conversations with AI across various platforms.
AI Agents
107.9K
Chinese Picks
Spaceship
The Spaceship App is an intelligent assistant application based on artificial intelligence technology, designed specifically for mobile devices. It provides engaging, informative, and useful interactive experiences through natural language dialogues, catering to users' needs for entertainment and efficiency. The product supports both text and voice inputs and offers multiple TTS options, making interactions more natural and friendly.
Personal care
66.0K
Rich AI
Rich AI is an app for iPad and iPhone that offers business and money-making ideas, personalized advice, voice-mode interaction, learning opportunities, expert insights, and instant feedback. It aims to help users succeed in entrepreneurship by explaining core entrepreneurial philosophies and marketing strategies.
AI Creation Methodology
59.6K
Xiaochuang AI Chatbot
Xiaochuang AI Chatbot is an AI product based on large language models (LLMs), aimed primarily at children. It helps them acquire knowledge and strengthen their independent thinking, questioning, and language expression in both learning and daily life. It can act as a knowledgeable expert, an on-demand foreign language tutor, a creative writing assistant, and a compassionate listener. Positioned as an AI assistant for families and education, Xiaochuang gives children broad access to knowledge and interactive communication.
Education
121.4K
English Picks
Play.ai
Play.ai is an advanced voice interaction platform that leverages AI technology to provide users with smooth and natural conversation experiences. The platform not only comprehends user instructions but also intelligently responds based on context, offering personalized services. Play.ai's main advantages lie in its high degree of interactivity and intelligence, which can adapt to the needs of different users and provide customized conversation services. Additionally, Play.ai is user-friendly and responsive, making it a powerful tool for businesses and individuals to enhance communication efficiency.
Chatbot
54.6K
Krutrim
Krutrim is an AI assistant independently developed in India, capable of communicating in local Indian languages. It features voice interaction, supports 22 official Indian languages, contains built-in Indian cultural knowledge, and can generate text that aligns with Indian cultural contexts. Krutrim is widely applicable in e-commerce and customer service, helping businesses enhance customer experience.
Chatbot
55.8K
Tab
Tab is a wearable AI device integrated with a voice assistant, real-time translation, and schedule management functions. It can become your intelligent companion. With its lightweight and portable design, it is comfortable to wear. Through voice interaction, it can help users improve work efficiency and accompany them in their daily lives.
Personal Care
73.4K
English Picks
Rabbit r1
r1 is a personal intelligent voice assistant that uses natural language interaction and provides a personalized operating system, allowing users to communicate with it like a friend. It incorporates artificial intelligence technologies such as voice recognition, human-computer dialogue, and personalized recommendations, helping users manage their daily affairs more efficiently and becoming a reliable helper.
Personal Assistance
225.8K
Cerence Chat Pro
Cerence Chat Pro is an application for car manufacturers that integrates generative AI systems such as ChatGPT into vehicle systems through voice interaction. It is highly customizable and compatible, allowing car manufacturers to quickly create AI dialogue experiences tailored to their brand positioning and user needs. It is positioned as easier to integrate, expand, and iterate than competing products, and as a way to reduce research and development costs for car manufacturers.
Chatbot
56.9K
Dittin AI
Dittin AI is an application that provides AI voice role-playing services. Users can choose from various virtual characters, each with unique stories and personalities. Through Dittin AI, users can enjoy the fun of interacting with virtual characters and experience different scenarios and plots.
Social robot
560.8K
Imagine with Meta AI
Imagine with Meta AI is an image generation tool. Using AI technology, users can automatically generate images simply by describing them with voice. This greatly enriches image content, allowing users to create freely. Currently, the product is in the internal testing stage, and users need to log in to use the image generation function.
AI image generation
178.6K
GardenofAI
Garden of AI is an AI assistant with strong comprehension skills, designed to handle the tasks you assign it. Communicating with it feels as natural as conversing with a person, without robotic prompts. It automatically understands your commands and executes them.
Personal care
53.5K
Featured AI Tools
English Picks
Jules AI
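Jules is an asynchronous coding agent that automatically handles tedious coding tasks so you can spend your time on core coding work. Its main strength is GitHub integration: it automates pull requests (PRs), runs tests, and validates code on cloud virtual machines, which significantly improves development efficiency. Jules suits a wide range of developers and is especially helpful for busy teams that need to manage projects and code quality effectively.
Development & Programming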
50.8K
NoCode
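NoCode is a platform that requires no programming experience: users describe their ideas in natural language and quickly generate applications. This lowers the barrier to development and lets more people realize their own ideas. The platform offers real-time preview and one-click deployment, and is designed to be easy to use even for users without technical knowledge.
Development Platform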
45.8K
ListenHub
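ListenHub is a lightweight AI podcast generator that supports Chinese and English. Using state-of-the-art AI technology, it quickly generates podcast content that matches users' interests. Its main advantages include natural conversation and very high-quality audio, providing a high-quality listening experience anytime, anywhere. Besides speeding up content generation, ListenHub also supports mobile devices and is easy to use in a variety of settings. It is positioned as an efficient tool for getting information and serves the needs of a wide range of listeners.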
AI
43.6K
Chinese Picks
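Tencent Hunyuan Image 2.0
Tencent Hunyuan Image 2.0 is Tencent's latest AI image generation model, with major improvements in generation speed and image quality. It adopts an encoder-decoder with an ultra-high compression ratio and a new diffusion architecture, bringing image generation time down to the millisecond level and avoiding the long waits of conventional generation. In addition, by combining reinforcement learning algorithms with human aesthetic knowledge, it improves the realism and detail of its images, making it suitable for professional users such as designers and creators.
Image Generation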
43.6K
OpenMemory MCP
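OpenMemory is an open-source personal memory layer that provides private, portable memory management for large language models (LLMs). Users retain full control over their data and can keep it secure while building AI applications. The project supports Docker, Python, and Node.js, and is suited to developers building personalized AI experiences. It is also recommended for users who want to use AI without exposing personal information.
Open Source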
46.4K
FastVLM
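FastVLM is an efficient vision encoding model designed for vision-language models. By using the novel FastViTHD hybrid vision encoder, it reduces both the encoding time for high-resolution images and the number of output tokens, improving model throughput and accuracy. FastVLM is positioned to give developers powerful vision-language processing capabilities, and it performs especially well on mobile devices that need fast responses.
Image Processing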
43.6K
English Picks
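Pika
Pika is a video creation platform where users upload their creative ideas and AI automatically generates videos based on them. Its main features are video generation from a wide variety of ideas, professional video effects, and simple, easy-to-use operation. It offers a free trial and targets creators and video enthusiasts.
Video Production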
17.6M
Chinese Picks
LiblibAI
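LiblibAI is a leading AI creation platform in China. It offers powerful AI creation capabilities to support creators' creativity. The platform provides a vast number of free AI creation models; users can search for and use models to create images, text, audio, and more. It also supports users training their own AI models. As a platform for a broad base of creators, it aims to offer equal opportunities to create and to contribute to the creative industry so that everyone can enjoy the pleasure of creating.
AI Model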
6.9M
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025 AIbase