Download the DeepSeek R1 Model Locally for Free | AI RAG with LlamaIndex, Local Embeddings and Ollama

In this article I will explain, step by step, how to download and use the DeepSeek R1 model locally with Ollama for free, and how to set up AI-powered Retrieval Augmented Generation (RAG) using the nomic-embed-text:latest embedding model with the DeepSeek R1 model running locally via Ollama.

The prerequisites for this example are as follows:

  1. Visual Studio Code
  2. Python
  3. Ollama

Open Visual Studio Code and create a file named "sample.py". Then go to the Terminal menu and click New Terminal to open a new terminal. In the terminal, enter the command below to install the LlamaIndex library along with the LlamaIndex Ollama LLM and LlamaIndex Ollama embedding integrations on your machine.

 pip install llama-index llama-index-llms-ollama llama-index-embeddings-ollama  
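Once the install finishes, you can optionally confirm that the three packages import correctly; the one-liner below is just a quick sanity check run from the same terminal.

 python -c "import llama_index.core, llama_index.llms.ollama, llama_index.embeddings.ollama; print('LlamaIndex packages imported successfully')"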

Create a folder named "doc" in the root directory of the application and store the documents you want to query in it.
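If you want to verify that LlamaIndex can read the files in that folder before wiring up the full pipeline, a minimal check like the one below works (it assumes the "doc" folder sits next to the script you run it from).

 from llama_index.core import SimpleDirectoryReader
 
 # Load every supported file from the "doc" folder and report how many documents were found
 docs = SimpleDirectoryReader("doc").load_data()
 print(f"Loaded {len(docs)} document(s) from the doc folder")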


To browse the available Ollama models, you can visit the Ollama website below and select the models you want to download locally.

Ollama Search 


To install the models, enter the commands below one by one. Downloading will take some time depending on your network bandwidth and the model size.

 

 ollama pull nomic-embed-text  
 ollama pull deepseek-r1:1.5b  

To get the list of all models installed locally, run the command below:

 ollama list  
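If you prefer checking from Python, Ollama also exposes a small local HTTP API; the sketch below queries it to confirm both models are present (it assumes Ollama is running on its default port, 11434).

 import json
 import urllib.request
 
 # Ask the local Ollama server (default port 11434) which models are installed
 with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
     installed = [m["name"] for m in json.load(resp)["models"]]
 
 print(installed)
 # Both names should appear before running the RAG script
 print("nomic-embed-text:latest" in installed, "deepseek-r1:1.5b" in installed)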


Once the above models are installed, enter the code below in sample.py.

 from llama_index.embeddings.ollama import OllamaEmbedding
 from llama_index.llms.ollama import Ollama
 from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
 from llama_index.core.node_parser import SentenceSplitter
 
 # LLM and embedding model initialization with LlamaIndex
 embeddingModel = "nomic-embed-text:latest"
 llmModel = "deepseek-r1:1.5b"
 
 embedObj = OllamaEmbedding(model_name=embeddingModel)
 llmObj = Ollama(model=llmModel, request_timeout=360.0)
 
 Settings.llm = llmObj
 Settings.embed_model = embedObj
 Settings.node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=20)
 
 # Document ingestion: load the files from the "doc" folder, chunk them,
 # embed the chunks and store them in an in-memory vector index
 documents = SimpleDirectoryReader("doc").load_data()
 index = VectorStoreIndex.from_documents(documents)
 
 # Query the documents: retrieve the most relevant chunks for the user query
 # and let the LLM generate the answer from them
 queryEngineObj = index.as_query_engine()
 
 inputString = input("Enter the query: ")
 results = queryEngineObj.query(inputString)
 print(results.response)
 

In the above code there are three sections that are key to the RAG application:

  1. The model initialization section sets up the LLM and the embedding model with LlamaIndex and registers them in Settings.
  2. The document ingestion section loads the files from the "doc" folder, embeds the chunks using the embedding model and stores them in an in-memory vector index.
  3. The document querying section retrieves the most relevant chunks for the user query using the embedding model and passes them to the LLM for answer generation; an optional variation of this step is sketched after this list.
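If you want more control over the querying step, as_query_engine accepts tuning parameters; the sketch below is one optional variation (the similarity_top_k value of 3 and the query string are only illustrative choices, not part of the original code).

 # Optional: retrieve the top 3 most similar chunks instead of the default,
 # then query as before
 queryEngineObj = index.as_query_engine(similarity_top_k=3)
 results = queryEngineObj.query("What is this document about?")
 print(results.response)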

Now enter the command below to run the application:

  python sample.py  

This runs the application; you can then type your question at the "Enter the query: " prompt and the answer is printed to the terminal.

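The exact output depends on your documents and on the model, but a run looks roughly like this (the question and answer below are only a hypothetical illustration):

 python sample.py
 Enter the query: What topics are covered in the documents?
 The documents cover ... (answer generated by deepseek-r1:1.5b from the retrieved chunks)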

 

Happy coding with Generative AI applications.😀 

Check out my next article:

Use Chroma DB vector database in RAG application using llama index & Deepseek R1 local model
