Obsidian tools - a Python package for analysing an Obsidian.md vault
obsidiantools is a Python package for getting structured metadata about your Obsidian.md notes and analysing your vault. Complement your Obsidian workflows by getting metrics and detail about all your notes in one place through the widely-used Python data stack.
It's incredibly easy to explore structured data on your vault through this fluent interface. This is all the code you need to generate a vault object that stores all the data:
import obsidiantools.api as otools
vault = otools.Vault(<VAULT_DIRECTORY>).connect().gather()
These are the basics of the method calls:
- connect(): connect your notes together in a graph structure and get metadata on links (e.g. wikilinks, backlinks, etc.). There is the option to include 'attachment' files in the graph.
- gather(): gather the plaintext content from your notes in one place. This includes the 'source text' that represents how your notes are written. There are arguments to control what text you want to remove, e.g. remove code.
See some of the key features below, all accessible from the vault object either through a method or an attribute.
The package is built to support the 'shortest path when possible' option for links. This should cover the vast majority of vaults that people create. See the wiki for more info on what sort of wikilink syntax is not well-supported and how the graph may be slightly different to what you see in the Obsidian app.
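To make the idea of connecting notes through wikilinks more concrete, here is a minimal standard-library sketch. It is not how obsidiantools is implemented: the regular expression, the `build_link_indices` helper and the sample notes are all assumptions made for illustration, and notes are identified by name only, in the spirit of the 'shortest path when possible' setting.

```python
import re

# Matches [[Target]], [[Target|alias]] and [[Target#Heading]];
# only the target note name is captured.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def build_link_indices(notes):
    """Return (wikilinks, backlinks, nonexistent, isolated) for {note name: text}."""
    wikilinks = {
        name: [target.strip() for target in WIKILINK.findall(text)]
        for name, text in notes.items()
    }
    backlinks = {name: [] for name in notes}
    nonexistent = set()
    for source, targets in wikilinks.items():
        for target in targets:
            if target in notes:
                if source not in backlinks[target]:
                    backlinks[target].append(source)
            else:
                # Linked to, but there is no such note in the vault.
                nonexistent.add(target)
    # A note is isolated here if it has no outgoing links and no backlinks.
    isolated = [n for n in notes if not wikilinks[n] and not backlinks[n]]
    return wikilinks, backlinks, sorted(nonexistent), isolated


notes = {
    "Alpha": "Links to [[Beta]] and [[Gamma|an alias]] and [[Missing note]].",
    "Beta": "Back to [[Alpha#Some heading]].",
    "Gamma": "No outgoing links here.",
    "Delta": "An orphan.",
}
wikilinks, backlinks, nonexistent, isolated = build_link_indices(notes)
```

Real vaults have more link syntax than this sketch handles (embedded files, markdown-style links, paths inside links), which is part of why the graph can differ slightly from what the Obsidian app shows.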
This is how obsidiantools can complement your workflows for note-taking:
- Access a networkx graph of your vault (vault.graph). When setting up a vault, the analysis can also be filtered on specific subdirectories.
- Get metadata on your files via vault.get_note_metadata() (notes / md files), vault.get_media_file_metadata() (media files that can be embedded in notes) and vault.get_canvas_file_metadata() (canvas files).
- Look up files through the indices md_file_index, media_file_index and canvas_file_index (canvas files).
- Find linked files that do not exist, via vault attributes like nonexistent_notes, nonexistent_media_files and nonexistent_canvas_files.
- Find files that are isolated in the graph, via vault attributes like isolated_notes, isolated_media_files and isolated_canvas_files.
- Access links for the whole vault or for one note, e.g. vault.backlinks_index for all backlinks in the vault, or vault.get_backlinks(<NOTE>) for the backlinks of an individual note.
- Front matter: vault.get_front_matter(<NOTE>) or vault.front_matter_index.
- Tags: vault.get_tags(<NOTE>) or vault.tags_index. Nested tags are supported.
- Math: vault.get_math(<NOTE>) or vault.math_index.
- Once gather() is called:
  - Source text of a note (via vault.get_source_text(<NOTE>)). This tries to represent how a note's text appears in Obsidian's 'source mode'.
  - Readable text of a note (via vault.get_readable_text(<NOTE>)). This tries to reduce note text to minimal markdown formatting, e.g. preserving paragraphs, headers and punctuation. Only slight processing is needed for various forms of NLP analysis.
- Canvas file content: vault.canvas_content_index.
- Canvas graph detail: the vault.canvas_graph_detail_index dict.
Check out the functionality in the demo repo. Launch the '15 minutes' demo in a virtual machine via Binder.
There are other API features that try to mirror the Obsidian.md app, for your convenience when working with Python, but they are no substitute for the interactivity of the app!
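As a small illustration of how the link-related attributes could be combined, the sketch below summarises a vault's link health. The attribute names (`backlinks_index`, `isolated_notes`, `nonexistent_notes`) come from the feature list, but their shapes (a dict of note name to list of linking notes, and plain lists) are assumptions here, and `summarise_links` and `fake_vault` are hypothetical names used so that the example runs without a real vault.

```python
from types import SimpleNamespace


def summarise_links(vault):
    """Summarise link health from a connected vault-like object."""
    # Assumption: backlinks_index maps each note name to a list of notes linking to it.
    backlink_counts = {
        note: len(sources) for note, sources in vault.backlinks_index.items()
    }
    most_linked = (
        max(backlink_counts, key=backlink_counts.get) if backlink_counts else None
    )
    return {
        "n_notes_with_backlinks": sum(1 for c in backlink_counts.values() if c > 0),
        "most_linked_note": most_linked,
        "n_isolated": len(vault.isolated_notes),
        "n_nonexistent": len(vault.nonexistent_notes),
    }


# Hypothetical stand-in for a vault on which connect() has been called.
fake_vault = SimpleNamespace(
    backlinks_index={"Alpha": ["Beta", "Gamma"], "Beta": ["Alpha"], "Gamma": []},
    isolated_notes=["Delta"],
    nonexistent_notes=["Missing note"],
)
summary = summarise_links(fake_vault)
```

With a real vault, the same function would be called on the object returned by connect(), provided the attribute shapes match the assumptions above.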
The text from vault notes goes through this process: markdown → split out front matter from text → HTML → ASCII plaintext.
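The stages of that process can be sketched with the standard library alone. This is only an illustration of the flow, not the package's code: `split_front_matter`, `toy_markdown_to_html` and `html_to_plaintext` are hypothetical helpers, the front matter parser handles only flat `key: value` lines rather than full YAML, and the markdown step is a toy stand-in that covers headings and paragraphs only.

```python
import re
from html import escape
from html.parser import HTMLParser

FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n?", re.DOTALL)


def split_front_matter(raw):
    """Return (front matter dict, body). Only flat 'key: value' lines are parsed."""
    match = FRONT_MATTER.match(raw)
    if match is None:
        return {}, raw
    meta = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, raw[match.end():]


def toy_markdown_to_html(body):
    """Toy stand-in for a markdown renderer: headings and paragraphs only."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", body) if b.strip()]
    html = []
    for block in blocks:
        heading = re.match(r"(#{1,6})\s+(.*)", block)
        if heading:
            level = len(heading.group(1))
            html.append(f"<h{level}>{escape(heading.group(2))}</h{level}>")
        else:
            html.append(f"<p>{escape(' '.join(block.splitlines()))}</p>")
    return "\n".join(html)


class _TextCollector(HTMLParser):
    """Collect the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def html_to_plaintext(html):
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "\n\n".join(collector.chunks)


raw_note = "---\ntitle: Demo\ntags: a\n---\n# Heading\n\nFirst line\nsecond line.\n"
meta, body = split_front_matter(raw_note)
plaintext = html_to_plaintext(toy_markdown_to_html(body))
```

Going through HTML, rather than stripping markdown syntax directly, lets each stage be handled by a dedicated tool. In this sketch that role is played by `html.parser`.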
pip install obsidiantools
Requires Python 3.9 or higher.
markdown
pymdown-extensions
html2text
pandas
numpy
networkx
python-frontmatter
beautifulsoup4
lxml
bleach
All of these libraries are needed so that the package can separate note text from front matter in a generalised approach.
A small 'dummy vault' of lipsum notes is in tests/vault-stub (generated with the help of the lorem-markdownum tool). Sense-checking of the API functionality was also done on a personal vault of over 800 notes.
I am not sure how the parsing will work outside of Latin languages. If you have ideas on how that can be supported, feel free to suggest a feature or open a pull request.
Modified BSD (3-clause)