Download DeepSeek R1 Model Locally for Free | AI RAG with LlamaIndex, Local Embedding and Ollama
In this article I will explain, step by step, how to download and use the DeepSeek R1 model locally with Ollama for free. I will also explain how to set up AI-powered Retrieval-Augmented Generation (RAG) using the nomic-embed-text:latest embedding model while running the DeepSeek R1 model locally via Ollama.
The prerequisites for this example are as follows:
- Visual Studio Code
- Python
- Ollama
Open Visual Studio Code and create a file named "sample.py". Then go to the Terminal menu and click New Terminal to open a new terminal. In the terminal, enter the command below to install the LlamaIndex library along with the LlamaIndex Ollama and LlamaIndex Ollama embedding libraries on your machine.
pip install llama-index llama-index-llms-ollama llama-index-embeddings-ollama
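To confirm the installation worked, you can run a quick import check like the minimal sketch below; if any of these imports fail, re-run the pip command above.

# Quick check that the installed libraries can be imported.
from llama_index.core import VectorStoreIndex
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

print("LlamaIndex, Ollama LLM and Ollama embedding packages are available.")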
Create a folder named "doc" in the root directory of the application and store the documents you want to query inside it.
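If you prefer to create the folder from Python, a small sketch is shown below; it simply creates the "doc" folder and lists whatever files you have copied into it (the folder name matches the path used later in sample.py).

import os

# Create the "doc" folder if it does not exist yet and list the documents inside it.
os.makedirs("doc", exist_ok=True)
print(os.listdir("doc"))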
To browse the list of available Ollama models, you can visit the Ollama website and select the models you want to download locally.
To install the models, enter the commands below one by one. Installation can take some time depending on your network bandwidth and the model size.
ollama pull nomic-embed-text
ollama pull deepseek-r1:1.5b
To get a list of all models installed locally, run the command below:
ollama list
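You can also confirm the pulled models from Python. The sketch below assumes Ollama is running on its default local endpoint (http://localhost:11434) and uses its /api/tags endpoint, which returns the locally installed models.

import json
import urllib.request

# Ask the local Ollama server for its installed models and print their names.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    tags = json.load(resp)

for model in tags.get("models", []):
    print(model["name"])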
Once the above models are installed, enter the code below in sample.py.
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# LLM and embedding model initialization with LlamaIndex
Embeddingmodel = "nomic-embed-text:latest"
llmModel = "deepseek-r1:1.5b"
embedObj = OllamaEmbedding(model_name=Embeddingmodel)
llmObj = Ollama(model=llmModel, request_timeout=360.0)
Settings.llm = llmObj
Settings.embed_model = embedObj
Settings.node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=20)

# Document ingestion: read the files in the "doc" folder, split them into
# chunks, embed the chunks and store them in an in-memory vector index
documents = SimpleDirectoryReader("doc").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query the documents: retrieve the chunks relevant to the user query and
# generate the answer with the LLM
queryengineObj = index.as_query_engine()
inputString = input("Enter the query: ")
results = queryengineObj.query(inputString)
print(results.response)
In the above code there are three sections that are key for the RAG application:
- The first block initializes the LLM and embedding models with LlamaIndex and registers them in Settings, along with the SentenceSplitter used for chunking.
- The second block handles document ingestion: the files in the "doc" folder are loaded, split into chunks, embedded with the embedding model, and stored in an in-memory vector index.
- The third block handles document querying: the chunks most relevant to the user query are retrieved using the embedding model and passed to the LLM for answer generation (see the retrieval sketch after this list).
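If you want to see which chunks the retrieval step actually returns before they are sent to the LLM, below is a minimal sketch that reuses the index built above; the similarity_top_k value and the sample query string are assumptions for illustration only.

# Retrieve the top matching chunks for a query without generating an answer.
retriever = index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve("your question here")
for n in nodes:
    print(n.score, n.node.get_content()[:200])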
Now enter the command below to run the application:
python sample.py
This will run the application; you can then enter your query at the prompt and the answer will be printed as output.
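As a small optional variation (a sketch based on the same code), you can wrap the query engine in a loop so you can ask several questions against the same in-memory index without rebuilding it each time.

# Keep asking questions until the user types 'exit'.
while True:
    query = input("Enter the query (or 'exit' to quit): ")
    if query.strip().lower() == "exit":
        break
    print(queryengineObj.query(query).response)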
Happy coding with Generative AI applications.😀
Check out my next article:
Use Chroma DB vector database in RAG application using llama index & Deepseek R1 local model