Download the DeepSeek R1 Model Locally for Free | AI RAG with LlamaIndex, Local Embeddings and Ollama

In this article I will explain, step by step, how to download and use the DeepSeek R1 model locally with Ollama for free, and how to set up AI-powered Retrieval Augmented Generation (RAG) using the nomic-embed-text:latest embedding model with the DeepSeek R1 model running locally via Ollama.

The prerequisites for this example are as follows:

  1. Visual Studio Code
  2. Python
  3. Ollama

Open Visual Studio Code and create a file named "sample.py". Then go to the Terminal menu and click New Terminal to open a new terminal. In the terminal, enter the command below to install the LlamaIndex library along with the LlamaIndex Ollama LLM and LlamaIndex Ollama embedding integrations on your machine.

 pip install llama-index llama-index-llms-ollama llama-index-embeddings-ollama  
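Once the install finishes, you can optionally confirm that the three packages import correctly; the one-liner below is just a quick sanity check run from the same terminal.

 python -c "import llama_index.core, llama_index.llms.ollama, llama_index.embeddings.ollama; print('LlamaIndex packages imported successfully')"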

Create a folder named "doc" in the root directory of the application and store the documents you want to query in it.
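If you want to verify that LlamaIndex can read the files in that folder before wiring up the full pipeline, a minimal check like the one below works (it assumes the "doc" folder sits next to the script you run it from).

 from llama_index.core import SimpleDirectoryReader
 
 # Load every supported file from the "doc" folder and report how many documents were found
 docs = SimpleDirectoryReader("doc").load_data()
 print(f"Loaded {len(docs)} document(s) from the doc folder")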


To browse the available Ollama models, you can visit the Ollama website below and select the models you want to download locally.

Ollama Search 


To install the models, enter the commands below one by one. Downloading will take some time depending on your network bandwidth and the model size.

 

 ollama pull nomic-embed-text  
 ollama pull deepseek-r1:1.5b  

To get the list of all models installed locally, run the command below:

 ollama list  
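If you prefer checking from Python, Ollama also exposes a small local HTTP API; the sketch below queries it to confirm both models are present (it assumes Ollama is running on its default port, 11434).

 import json
 import urllib.request
 
 # Ask the local Ollama server (default port 11434) which models are installed
 with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
     installed = [m["name"] for m in json.load(resp)["models"]]
 
 print(installed)
 # Both names should appear before running the RAG script
 print("nomic-embed-text:latest" in installed, "deepseek-r1:1.5b" in installed)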


Once the above models are installed, enter the code below in sample.py.

 from llama_index.embeddings.ollama import OllamaEmbedding
 from llama_index.llms.ollama import Ollama
 from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
 from llama_index.core.node_parser import SentenceSplitter
 
 # LLM and embedding model initialization with LlamaIndex
 embeddingModel = "nomic-embed-text:latest"
 llmModel = "deepseek-r1:1.5b"
 
 embedObj = OllamaEmbedding(model_name=embeddingModel)
 llmObj = Ollama(model=llmModel, request_timeout=360.0)
 
 Settings.llm = llmObj
 Settings.embed_model = embedObj
 Settings.node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=20)
 
 # Document ingestion: load the files from the "doc" folder, chunk them,
 # embed the chunks and store them in an in-memory vector index
 documents = SimpleDirectoryReader("doc").load_data()
 index = VectorStoreIndex.from_documents(documents)
 
 # Query the documents: retrieve the most relevant chunks for the user query
 # and let the LLM generate the answer from them
 queryEngineObj = index.as_query_engine()
 
 inputString = input("Enter the query: ")
 results = queryEngineObj.query(inputString)
 print(results.response)
 

In the above code there are three sections that are key to the RAG application:

  1. The model initialization section sets up the LLM and the embedding model with LlamaIndex and registers them in Settings.
  2. The document ingestion section loads the files from the "doc" folder, embeds the chunks using the embedding model and stores them in an in-memory vector index.
  3. The document querying section retrieves the most relevant chunks for the user query using the embedding model and passes them to the LLM for answer generation; an optional variation of this step is sketched after this list.
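If you want more control over the querying step, as_query_engine accepts tuning parameters; the sketch below is one optional variation (the similarity_top_k value of 3 and the query string are only illustrative choices, not part of the original code).

 # Optional: retrieve the top 3 most similar chunks instead of the default,
 # then query as before
 queryEngineObj = index.as_query_engine(similarity_top_k=3)
 results = queryEngineObj.query("What is this document about?")
 print(results.response)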

Now enter the command below to run the application:

  python sample.py  

This runs the application; you can then type your question at the "Enter the query: " prompt and the answer is printed to the terminal.

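The exact output depends on your documents and on the model, but a run looks roughly like this (the question and answer below are only a hypothetical illustration):

 python sample.py
 Enter the query: What topics are covered in the documents?
 The documents cover ... (answer generated by deepseek-r1:1.5b from the retrieved chunks)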

 

Happy coding with Generative AI applications.😀 

Check out my next article:

Use Chroma DB vector database in RAG application using llama index & Deepseek R1 local model
