A comprehensive AI service framework that integrates various AI functionalities, including image processing, speech-to-text, text-to-speech, and more, leveraging a Flask server as its backbone.
This project is a Flask-based server designed to handle multiple AI-related tasks such as image generation, depth estimation, object detection, image variation, sketch generation, music generation, text-to-speech, and speech-to-text conversion. It uses a modular architecture, where server.py serves as the entry point and ai_main.py directs specific tasks to their respective modules.
- Image Processing: Generate images from text, detect objects, create image variations, convert images to sketches, and estimate image depth.
- Speech Processing: Convert text to speech, and speech to text.
- Music Generation: Create music based on text prompts.
- LLM Queries: Interact with large language models to process and answer queries.
- Clone the Repository
    git clone https://github.com/TheCompAce/ai_server
    cd ai_server
- Install Dependencies: Ensure you have Python 3.8+ installed. Then install the required Python packages.
    pip install -r requirements.txt
- Environment Variables: Set up the necessary environment variables, such as OPENAI_API_KEY, for modules that require external API access.
- Running the Server
    python server.py
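Before starting the server, it can help to confirm that the required environment variables are actually set. The following is a minimal sketch, not part of the project: the helper names `missing_env_vars` and `check_environment` are hypothetical, and `OPENAI_API_KEY` is the only variable the setup steps name, so the list may need extending for other modules.

```python
import os

# OPENAI_API_KEY is the variable named in the setup steps; any others
# your modules need are an assumption and should be added here.
REQUIRED_VARS = ["OPENAI_API_KEY"]


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def check_environment(environ=None):
    """Raise RuntimeError listing any required variables that are missing."""
    missing = missing_env_vars(REQUIRED_VARS, environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `check_environment()` with no arguments reads the real process environment; passing a dictionary makes the check easy to exercise in isolation.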
The server exposes various endpoints for interacting with AI models:
- Home: Serves static files and the main test page.
- /vision: Process a single image with a text prompt.
- /vision/multi: Process multiple images with a text prompt.
- /ask: Ask a question to a language model.
- /ask/json: Query a language model with a JSON payload.
- /ask/embed: Get embeddings from a language model.
- /music: Generate music from a text prompt.
- /speak: Convert text to speech.
- /speak/voice: Text-to-speech with predefined voices.
- /hear: Convert audio to text.
- /image: Generate images from text prompts.
- /image/variation: Generate image variations.
- /image/transform: Transform an image based on a prompt.
- /image/inpaint: Inpaint an image with a given mask.
- /image/detect: Detect objects in an image.
- /image/sketch: Generate sketches from images.
- /image/depth: Estimate depth in images.
- /image/removebg: Remove background from images.
- /sound: Generate sound effects from text prompts.
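import requests
# Generate an image from text (assumes the server is running locally on port 5000)
response = requests.post('http://localhost:5000/image', data={'prompt': 'A scenic view of the mountains'})
# Raise an exception if the server returned an HTTP error status
response.raise_for_status()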
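The same request pattern can be wrapped in a small client helper. This is a sketch under stated assumptions: `endpoint_url` and `post_prompt` are hypothetical names, the endpoint paths are copied from the list above, and the `prompt` form field is only shown for /image in this README, so other endpoints may expect different fields or file uploads.

```python
BASE_URL = "http://localhost:5000"  # host and port taken from the example above

# Endpoint paths as listed in this README.
ENDPOINTS = {
    "/vision", "/vision/multi",
    "/ask", "/ask/json", "/ask/embed",
    "/music", "/speak", "/speak/voice", "/hear",
    "/image", "/image/variation", "/image/transform", "/image/inpaint",
    "/image/detect", "/image/sketch", "/image/depth", "/image/removebg",
    "/sound",
}


def endpoint_url(path, base_url=BASE_URL):
    """Build the full URL for a listed endpoint, rejecting unknown paths."""
    if path not in ENDPOINTS:
        raise ValueError(f"Unknown endpoint: {path}")
    return base_url.rstrip("/") + path


def post_prompt(path, prompt, timeout=300):
    """POST a text prompt as form data and return the response.

    Assumption: the endpoint accepts a 'prompt' form field, as /image does.
    """
    import requests  # third-party; imported here so the URL helper works without it

    response = requests.post(endpoint_url(path), data={"prompt": prompt}, timeout=timeout)
    response.raise_for_status()
    return response
```

With the server running, `post_prompt("/image", "A scenic view of the mountains")` would send the same request as the example above. The long default timeout is a guess, on the assumption that generation requests can take a while.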
- ai_main.py directs tasks to specific AI models, such as text_to_image, speech_to_text, etc.
- Modules like moondream1.py, openai.py, sdc.py, and whisper.py are used for specific AI tasks, interfacing with models from Hugging Face, OpenAI, and others.
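One common way to implement this kind of routing is a dispatch table that maps task names to handler functions. The sketch below only illustrates the idea and is not the project's actual code: the handlers are placeholder stand-ins that return strings, and `TASKS` and `run_task` are hypothetical names. Only the task names text_to_image and speech_to_text come from this README.

```python
from typing import Callable, Dict


# Placeholder handlers: the real modules would call the underlying models.
def text_to_image(prompt: str) -> str:
    return f"[image for: {prompt}]"


def speech_to_text(audio_path: str) -> str:
    return f"[transcript of: {audio_path}]"


# Dispatch table mapping a task name to the function that handles it.
TASKS: Dict[str, Callable[[str], str]] = {
    "text_to_image": text_to_image,
    "speech_to_text": speech_to_text,
}


def run_task(name: str, payload: str) -> str:
    """Look up the handler for `name` and call it with `payload`."""
    try:
        handler = TASKS[name]
    except KeyError:
        raise ValueError(f"Unsupported task: {name}") from None
    return handler(payload)
```

Adding a new capability under this design means writing one handler and registering it in the table, which keeps the routing layer separate from the model-specific code.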
This project integrates various AI technologies and models, including OpenAI's GPT, Stable Diffusion for image generation, Whisper for speech-to-text, and more. Special thanks to the creators and maintainers of these models and the Flask framework for making web server integration straightforward.