Shortcomings

Multimodal Capabilities

When this project was developed, few robust open-source pre-trained models supported video input (both visual and audio). Consequently, the project accepts only image and text inputs.

Ollama Limitations

Because the project serves its models through Ollama, it can only use models that are compatible with the Ollama platform, which narrows the range of models that can be deployed in the system.

However, Ollama has a large ecosystem of third-party apps, such as Open WebUI and Ollama Telegram, that are actively developed and improved. In the future, it may become possible to integrate all models on Hugging Face into Ollama.
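
To make these two constraints concrete, the following is a minimal sketch of how an image-plus-text request could be sent to a locally running Ollama server through its `/api/generate` endpoint. It is not taken from this project's code: the helper names `build_payload` and `generate`, the model name `llava`, and the file name in the usage comment are assumptions for illustration, and the URL assumes Ollama's default local port.

```python
import base64
import json
import urllib.request
from typing import Optional

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, image_bytes: Optional[bytes] = None) -> dict:
    """Build a non-streaming request body with text and an optional image."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_bytes is not None:
        # Ollama expects images as a list of base64-encoded strings.
        payload["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    return payload


def generate(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload to Ollama and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Hypothetical usage (requires a running Ollama server and a pulled
# multimodal model; "llava" and "photo.jpg" are placeholders):
#
#   with open("photo.jpg", "rb") as f:
#       payload = build_payload("llava", "Describe this image.", f.read())
#   print(generate(payload))
```

The `model` field is where the Ollama limitation shows up: the request only succeeds for a model that Ollama can serve, and the `images` field is only meaningful for models with image support.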