
Protégé to Neo4j GraphRAG: Transforming OWL Ontologies into AI-Ready, Powerful Knowledge Graphs

Author: Vishal Mysore

Published:

Source: https://medium.com/@visrow/prot%C3%A9g%C3%A9-to-neo4j-graphrag-transforming-owl-ontologies-into-ai-ready-powerful-knowledge-graphs-700963c46a42

Fetched: 2026-06-07T02:15:46.305159


Building AI-Queryable Knowledge Graphs with Protégé, Neo4j, and Qdrant-Based GraphRAG for Real-World Retrieval-Augmented Generation

It all begins with structured meaning. Consider a cybersecurity ontology in which WannaCry ransomware exploits CVE-2023-1234, APT28 targets enterprise assets, SQL injection attacks originate from threat actors, and web applications are protected by WAFs and firewalls, all formally defined in OWL/RDF using Protégé. This is not just documentation; it is machine-computable intelligence. In this article, I show how such semantic models are transformed into AI-queryable Neo4j knowledge graphs and combined with Qdrant-based GraphRAG, so that large language models can answer complex questions like “Which malware exploits our most critical vulnerabilities and how is it mitigated?” with precision, traceability, and relevance to production use.

Neo4j is a Labeled Property Graph (LPG) database in which nodes and relationships are both first-class citizens: nodes carry labels, relationships carry a type and a direction, and both can hold properties. That makes it a good runtime engine for ontology-driven GraphRAG systems.

<IPIndicator rdf:about="#IOC_MaliciousIP">
    <rdfs:label>Malicious IP Address</rdfs:label>
    <ipAddress>45.123.45.67</ipAddress>
    <threatLevel>High</threatLevel>
</IPIndicator>

<FileHashIndicator rdf:about="#IOC_MalwareHash">
    <rdfs:label>Malware File Hash</rdfs:label>
    <fileHash>ed01ebfbc9eb5bbea545af4d01bf5f1071661840480439c6e5babe8e080e41aa</fileHash>
</FileHashIndicator>

<DomainIndicator rdf:about="#IOC_MaliciousDomain">
    <rdfs:label>Malicious Domain</rdfs:label>
    <description>evil-command-server.com</description>
    <threatLevel>Critical</threatLevel>
</DomainIndicator>

<CommandAndControlServer rdf:about="#C2_Server01">
    <rdfs:label>C2 Server</rdfs:label>
    <ipAddress>45.123.45.67</ipAddress>
</CommandAndControlServer>

<!-- Relationships -->

The full ontology is available here: https://github.com/vishalmysore/graphrag/blob/main/graphrag/ontologies/cybersecurity-threat.owl
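To make the shape of these individuals concrete, here is a minimal Python sketch that reads a shortened copy of the snippet with the standard library and turns each individual into a node-like dictionary. The wrapping `rdf:RDF` element and the `http://example.org/cybersecurity#` namespace are assumptions added so the fragment parses on its own; the real ontology declares its own IRI, and the Protégé plugin may represent nodes differently.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
# Hypothetical default namespace; the real ontology declares its own IRI.
ONT_NS = "http://example.org/cybersecurity#"

SNIPPET = f"""
<rdf:RDF xmlns="{ONT_NS}" xmlns:rdf="{RDF_NS}" xmlns:rdfs="{RDFS_NS}">
    <IPIndicator rdf:about="#IOC_MaliciousIP">
        <rdfs:label>Malicious IP Address</rdfs:label>
        <ipAddress>45.123.45.67</ipAddress>
        <threatLevel>High</threatLevel>
    </IPIndicator>
    <CommandAndControlServer rdf:about="#C2_Server01">
        <rdfs:label>C2 Server</rdfs:label>
        <ipAddress>45.123.45.67</ipAddress>
    </CommandAndControlServer>
</rdf:RDF>
"""


def local_name(qualified):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return qualified.split("}", 1)[-1]


def parse_individuals(rdf_xml):
    """Return one dict per individual: id, class label, literal properties."""
    nodes = []
    for element in ET.fromstring(rdf_xml):
        node = {
            "id": element.get(f"{{{RDF_NS}}}about").lstrip("#"),
            "labels": [local_name(element.tag)],
            "properties": {},
        }
        for child in element:
            # Only literal-valued children are kept as properties here.
            if child.text and child.text.strip():
                node["properties"][local_name(child.tag)] = child.text.strip()
        nodes.append(node)
    return nodes


nodes = parse_individuals(SNIPPET)
for node in nodes:
    print(node)
```

Each element name becomes a candidate node label and each literal child becomes a property, which is the intuition behind the class-to-label mapping discussed later in the article.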

Now let's load it into Protégé!

For setup instructions, please see my previous article.

Once you load the ontology, you can export it to Neo4j AuraDB cloud directly from the plugin.

After the export, you will get a confirmation showing the total number of nodes and classes exported.

Log in to your Neo4j cloud instance and view the graph directly.

You can write Cypher or natural-language queries directly in Neo4j and view the results.

Create a custom dashboard for exploits

Or run a Cypher query directly

View the graph in different formats in Neo4j

Let's take a deeper look

1. Core Classes (Neo4j Node Labels)

The ontology defines the main entities in the cybersecurity domain, organized into a hierarchy:

[Image in the original article: core classes and their hierarchy]

2. Key Relationships (Neo4j Relationship Types)

These define how the different entities interact, forming the connections between the nodes in your Neo4j graph:

[Image in the original article: key relationships between entities]

3. Instantiated Knowledge Graph Examples

The INSTANCES section of the ontology creates concrete individuals and connects them, illustrating how the model works and what the resulting graph structure looks like.

[Image in the original article: instantiated knowledge graph examples]

4. Example Cypher Queries for Neo4j

Once this ontology is migrated to Neo4j, you can write Cypher queries to perform threat intelligence analysis.

Query 1: Find all assets targeted by attacks originating from a specific Threat Actor.

// An attack originates from an actor, so ORIGINATES_FROM points from Attack to ThreatActor.
MATCH (a:Attack)-[:ORIGINATES_FROM]->(actor:ThreatActor {label: "APT28 Group"})
MATCH (a)-[:TARGETS]->(asset:Asset)
RETURN asset.label, a.label

Query 2: Find the command-and-control servers and indicators of compromise (IOCs) associated with a specific piece of malware.

// First match the malware and the C2 server it talks to, then its indicators.
MATCH (m:Malware {label: "WannaCry Ransomware"})-[:COMMUNICATES_WITH]->(c2:CommandAndControlServer)
MATCH (m)-[:HAS_INDICATOR]->(ioc:IOC)
RETURN m.label, c2.ipAddress, ioc.label, ioc.fileHash
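A mitigation question, such as which security controls mitigate the vulnerabilities a given piece of malware exploits, is a two-hop pattern: follow EXPLOITS forward, then MITIGATES backward. The following Python sketch walks that pattern over a small in-memory edge list. The MITIGATES relationship name, the SecurityControl label, and all sample data are assumptions for illustration and may not match the exported graph.

```python
# Hypothetical sample edges as (source, relationship type, target).
EDGES = [
    ("WannaCry Ransomware", "EXPLOITS", "CVE-2023-1234"),
    ("Patch Management", "MITIGATES", "CVE-2023-1234"),
    ("Network Segmentation", "MITIGATES", "CVE-2023-1234"),
    ("SQL Injection Attack", "EXPLOITS", "Other Vulnerability"),
    ("Web Application Firewall", "MITIGATES", "Other Vulnerability"),
]


def targets(source, rel_type, edges):
    """Follow outgoing edges of one type from a source node."""
    return [t for s, r, t in edges if s == source and r == rel_type]


def sources(target, rel_type, edges):
    """Follow incoming edges of one type into a target node."""
    return [s for s, r, t in edges if t == target and r == rel_type]


def mitigations_for(malware, edges):
    """Map each vulnerability the malware exploits to its mitigating controls."""
    result = {}
    for vulnerability in targets(malware, "EXPLOITS", edges):
        result[vulnerability] = sorted(sources(vulnerability, "MITIGATES", edges))
    return result


# The same pattern in Cypher, under the same naming assumptions.
CYPHER = """
MATCH (m:Malware {label: $malware})-[:EXPLOITS]->(v:Vulnerability)
      <-[:MITIGATES]-(c:SecurityControl)
RETURN v.label, collect(c.label)
"""

print(mitigations_for("WannaCry Ransomware", EDGES))
```

In Neo4j the traversal is done by the database engine; the Python version only illustrates which hops the query pattern describes.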

Now let's export the ontology to Qdrant-based GraphRAG and run some queries

The queries are available here: https://github.com/vishalmysore/graphrag/blob/main/graphrag/queries/cybersecurity-queries.md

Transforming an ontology into a knowledge graph (KG) stored in a system like Neo4j is fundamentally about bridging two different yet complementary paradigms for knowledge representation: Semantic Web models and Labeled Property Graphs (LPGs).

🧠 1. The Two Paradigms

The conversion relies on understanding the distinct strengths of the semantic model (Protégé/OWL) and the property graph model (Neo4j/Cypher).

A. Ontology (OWL/RDF): The Formal Blueprint

An ontology is a formal, explicit specification of a conceptualization. It acts as the schema or blueprint for an entire knowledge domain.

  • Focus: Formal semantics, reasoning, and consistency. What must be true based on logical rules (axioms and constraints).

  • Structure: Uses the Resource Description Framework (RDF), which is built on triples: (Subject, Predicate, Object).

      • Classes (e.g., Threat, Malware) define the types of entities.

      • Object Properties (e.g., exploits, targets) define relationships between entities.

      • Datatype Properties (e.g., cveID, severity) define attributes of entities.

  • Strength: Enables logical reasoning (e.g., inferring that a Ransomware instance is also a Threat instance) and consistency checking with reasoners such as HermiT.
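The subclass inference mentioned above can be illustrated with a few lines of Python. This is only a sketch of the idea, not how HermiT or any OWL reasoner works internally: it follows a single-parent `subClassOf` chain, whereas OWL allows multiple superclasses and far richer axioms. The intermediate Malware class and the exact hierarchy are assumptions.

```python
# Hypothetical excerpt of a class hierarchy: child -> parent (rdfs:subClassOf).
SUBCLASS_OF = {
    "Ransomware": "Malware",
    "Malware": "Threat",
    "ZeroDayVulnerability": "Vulnerability",
}


def inferred_types(asserted_class, subclass_of):
    """Return the asserted class followed by every superclass reachable from it."""
    types = []
    current = asserted_class
    # The 'not in types' check guards against accidental cycles in the input.
    while current is not None and current not in types:
        types.append(current)
        current = subclass_of.get(current)
    return types


# An individual asserted only as Ransomware is also a Malware and a Threat.
print(inferred_types("Ransomware", SUBCLASS_OF))
```

When such inferred types are carried over to the property graph as extra node labels, a query on the general label (for example `:Threat`) also finds nodes that were asserted with a more specific class.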

B. Knowledge Graph (LPG): The Operational Data Structure

A knowledge graph in a system like Neo4j uses the Labeled Property Graph (LPG) model. It focuses on storing and querying large volumes of interconnected, real-world data efficiently.

  • Focus: Efficient traversal, pattern matching, and scalability. What exists in the data and how it is connected.

  • Structure: Composed of four key components:

      • Nodes: Represent entities (e.g., a specific server, a specific malware instance).

      • Labels: Categorize nodes (e.g., :Server, :Malware). A node can have multiple labels.

      • Relationships (Edges): Connect nodes and are always directional (e.g., [:TARGETS]).

      • Properties: Key-value pairs on both nodes and relationships.

  • Strength: Enables fast, iterative pathfinding (Cypher’s MATCH and RETURN) and the application of graph algorithms (e.g., PageRank).
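The four components listed above can be modeled in a few lines of Python. This is a minimal sketch of the data model only, not of Neo4j's storage engine; the class names, the `firstSeen` property, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set = field(default_factory=set)         # a node may carry several labels
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Relationship:
    start: Node      # relationships are directional: start -> end
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)   # relationships hold properties too


def describe(rel):
    """Render a relationship as a Cypher-like pattern string."""
    def labels(node):
        return "".join(f":{label}" for label in sorted(node.labels))
    return f"({labels(rel.start)})-[:{rel.rel_type}]->({labels(rel.end)})"


malware = Node("wannacry", {"Malware", "Ransomware"}, {"label": "WannaCry Ransomware"})
server = Node("srv-01", {"Server"}, {"label": "File Server"})
attack = Relationship(malware, "TARGETS", server, {"firstSeen": "2023-05-01"})

print(describe(attack))
```

Note that the relationship itself carries a property, which is one of the main structural differences from a plain RDF triple.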

🗺️ 2. The Core Conversion Mapping

The conversion translates the formal semantic components of the ontology into the structural components of the Labeled Property Graph.

[Image in the original article: ontology-to-LPG conversion mapping]

Example Conversion Trace

Our ontology defined the following relationship structure:

Threat --exploits--> Vulnerability

In the Neo4j knowledge graph, this is materialized by:

  1. A Threat node (e.g., labeled :Ransomware)

  2. An outbound relationship (type [:EXPLOITS])

  3. A Vulnerability node (e.g., labeled :ZeroDayVulnerability)

A Cypher query to find this pattern would look like:

MATCH (t:Threat)-[:EXPLOITS]->(v:Vulnerability)  
RETURN t.label, v.cveID
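The conversion can be sketched in Python as a small function over RDF-style triples. The mapping used here (class to label, object property to relationship type, datatype property to node property) is a common convention and an assumption about how the export behaves, not a description of the plugin's actual implementation. The triples and identifiers are hypothetical.

```python
import re


def to_rel_type(property_name):
    """Turn a camelCase object property into an UPPER_SNAKE_CASE relationship type."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", property_name).upper()


# Hypothetical triples: (subject, predicate, object).
TRIPLES = [
    ("WannaCry", "rdf:type", "Ransomware"),
    ("Vuln_01", "rdf:type", "Vulnerability"),
    ("Vuln_01", "cveID", "CVE-2023-1234"),
    ("WannaCry", "exploits", "Vuln_01"),
]
# Predicates that link two individuals rather than an individual and a literal.
OBJECT_PROPERTIES = {"exploits", "targets", "originatesFrom"}


def triples_to_lpg(triples, object_properties):
    """Split triples into LPG nodes (labels plus properties) and relationships."""
    nodes = {}
    relationships = []

    def node(node_id):
        return nodes.setdefault(node_id, {"labels": set(), "properties": {}})

    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            node(subject)["labels"].add(obj)            # class -> node label
        elif predicate in object_properties:
            node(subject)
            node(obj)
            relationships.append((subject, to_rel_type(predicate), obj))
        else:
            node(subject)["properties"][predicate] = obj  # datatype property
    return nodes, relationships


nodes, relationships = triples_to_lpg(TRIPLES, OBJECT_PROPERTIES)
print(nodes)
print(relationships)
```

One consequence of this mapping: the query above matches `(t:Threat)` only if nodes also carry their superclass labels, so a node labeled just `:Ransomware` would need the inferred `:Threat` label added during export.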

🔄 3. The Power of Synergy

The core benefit of converting an ontology to an LPG knowledge graph is the ability to combine the best of both worlds:

  • Logical Rigor + Computational Speed: We use the ontology to define the meaning and rules of our domain (e.g., a Vulnerability must have a CVE ID), which supports high data quality. We use the knowledge graph to store and query very large numbers of actual instances at high speed.

  • Schema Flexibility: The LPG model is schema-optional (or schema-flexible), allowing you to quickly ingest new, messy data, while the ontology acts as the canonical semantic layer on top, validating and organizing the data.

  • Advanced Inference: The initial ontology can be used by an OWL reasoner to infer new facts (e.g., inferring that a vulnerability is “High Risk” based on its CVSS score). These inferred facts can then be written back into the Neo4j graph as new nodes or relationships, making the graph richer and ready for querying.
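The write-back step can be sketched as follows. This uses a plain Python rule as a stand-in for an OWL reasoner, and it only builds parameterized Cypher statements rather than connecting to a database. The `cvssScore` property, the `HighRisk` label, the 7.0 threshold, and the sample records are assumptions for illustration.

```python
# Assumed cut-off; CVSS v3 rates scores of 7.0 and above as High or Critical.
HIGH_RISK_THRESHOLD = 7.0

# Hypothetical vulnerability records with an assumed cvssScore property.
vulnerabilities = [
    {"cveID": "CVE-2023-1234", "cvssScore": 9.8},
    {"cveID": "CVE-0000-0001", "cvssScore": 4.3},
]


def infer_high_risk(records, threshold=HIGH_RISK_THRESHOLD):
    """Return the CVE IDs whose score meets the high-risk threshold."""
    return [r["cveID"] for r in records if r["cvssScore"] >= threshold]


# Cypher that adds an extra label to an existing node; parameters stay separate.
WRITE_BACK = "MATCH (v:Vulnerability {cveID: $cveID}) SET v:HighRisk"

statements = [(WRITE_BACK, {"cveID": cve_id}) for cve_id in infer_high_risk(vulnerabilities)]
for query, params in statements:
    print(query, params)
```

Each (query, parameters) pair could then be handed to whatever Neo4j client the project uses; keeping the CVE ID as a parameter avoids string concatenation in the query text.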

This synergy enables applications like graph-based Retrieval-Augmented Generation (GraphRAG), where the graph provides grounded, explicit knowledge to large language models, mitigating hallucination and supporting context-aware answers.
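A GraphRAG retrieval step can be sketched in plain Python: find the graph nodes most similar to the question, expand to their neighboring facts, and place those facts in the prompt. In this sketch a bag-of-words cosine similarity stands in for a real embedding model and a Qdrant vector search, and no LLM is called. All node descriptions, edges, and function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical node descriptions that would normally be embedded and indexed.
NODE_TEXT = {
    "WannaCry Ransomware": "ransomware malware that encrypts files",
    "APT28 Group": "threat actor group targeting enterprise assets",
    "Web Application Firewall": "security control protecting web applications",
}
# Hypothetical graph edges as (source, relationship type, target).
EDGES = [
    ("WannaCry Ransomware", "EXPLOITS", "CVE-2023-1234"),
    ("WannaCry Ransomware", "COMMUNICATES_WITH", "C2 Server"),
    ("Web Application", "PROTECTED_BY", "Web Application Firewall"),
]


def embed(text):
    """Toy embedding: word counts. A real system would use a vector model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, top_k=1):
    """Pick the most similar nodes, then collect the edges that touch them."""
    query = embed(question)
    ranked = sorted(NODE_TEXT, key=lambda n: cosine(query, embed(NODE_TEXT[n])), reverse=True)
    seeds = ranked[:top_k]
    facts = [f"{s} -[{r}]-> {t}" for s, r, t in EDGES if s in seeds or t in seeds]
    return seeds, facts


def build_prompt(question):
    """Assemble a grounded prompt from the retrieved graph facts."""
    _, facts = retrieve(question)
    return "Answer using only these facts:\n" + "\n".join(facts) + f"\n\nQuestion: {question}"


print(build_prompt("Which ransomware encrypts files?"))
```

Because the answer is constrained to facts that came from explicit graph edges, each statement in the response can be traced back to a relationship in the knowledge graph.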