Hacking handles into Logseq PDF citations
§ Background
Logseq has relatively best-in-class PDF annotation support. My biggest problem was that all PDF citations (I don't know that Logseq has a name for the annotation link; here, I call it a "citation") only show the page number of the reference. If you refer to multiple documents, you need to click through to tell which citation refers to which document!
§ Before
§ After
§ Asking the Community
I posted in the Logseq Discord:
Hiya folks! I’m looking to extend the (colored dot) P annotation handle thing. I really want to pull in a PDF name—maybe defined as a property on the annotation page? i.e. instead of * P240 “hello world” it may say something like * Logseq for dummies P240 “hello world”. Is this possible? Anyone have any tips or pointers? I am a pretty competent programmer but haven’t done Logseq plugin-y work before.
Someone named Charlie jumped in, and said:
Cool! That’s a great idea! The current plugin SDK doesn’t provide an API to intercept the rendering of PDF annotation blocks. Even if it is possible to achieve through hacking, it may be complicated and have performance issues. We will consider implementing this feature natively or providing a specific plugin API as soon as possible.
Phrased like a seasoned open source developer!
Unfortunately, I have to quickly pick the underlying tool for an annotation project. There are no clear winners, but Logseq is at the top. This is one of the main blockers—if I can hack this feature in, even in a hard-to-maintain and performance-impacting way, it may be enough to make me feel comfortable using Logseq for the project. I hope by the time I’m done with the project, Logseq won’t have this problem anymore.
§ Hacking
I haven’t worked with Logseq’s internals before, and I don’t think I’ve worked with Clojure. I’ve done some work in Lisp-y environments, though, so I opened up the source. Within a few minutes, I was able to find where the citations get added (in block.cljs). I could tell that the section didn’t directly have the document or document name, but it did have the document ID. Time for hacking!
I set up the development environment.
- I installed Clojure using Homebrew, checked out Logseq, ran yarn, and then ran yarn watch.
- Installing the yarn dependencies kept failing. An error message implied the spaces in the directory path were causing a problem. I moved the project to a different directory without spaces, and rerunning the command worked.
- I was able to browse to localhost:3001, but PDF stuff didn’t seem to load right. I had thought the PDF stuff worked on the web interface, actually, but I’m not sure if it does. Before digging into that, I decided to try the Electron app.
- I built the dev Electron app by running yarn watch, waiting for it to get to a steady state, and then opening a new terminal and running yarn dev-electron-app. I kept both running. The PDF citation seemed to work like it did in my regular version of Logseq.
- I could make changes in the source directories, and within a few seconds, the changes occurred in the Electron window. If I ran (println "hello world") in the program, the output would show up in the browser JavaScript console.
- I opened the logseq directory with PyCharm and installed the Clojure plugin.
This wasn’t done with any real knowledge of Clojure or Logseq internals, unfortunately, so please don’t take this as a great example of how to do Clojure work! I saw reference to a REPL, but the Electron change loop was pretty fast, and I wasn’t sure I had the Clojure knowledge to translate things from the REPL into a plugin or patch to the source.
While looking for examples of getting properties through a database lookup, I found some code in Logseq that was using debug/pprint. The advantage of using pprint over println appeared to be that pprint would indent the output to show the structure of the data better.
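To illustrate the difference, here is a minimal sketch in plain Clojure. It assumes Logseq's debug/pprint behaves roughly like the standard clojure.pprint/pprint (cljs.pprint in ClojureScript), which I have not verified. The sample map and its keys are hypothetical, not real Logseq data.

```clojure
(require '[clojure.pprint :refer [pprint]])

;; Hypothetical nested data, loosely shaped like a page entity.
;; The keys and values are made up for illustration.
(def sample-page
  {:db/id 42
   :block/name "logseq-for-dummies"
   :block/properties {:hl-handle "foo"
                      :file-path "../assets/logseq-for-dummies.pdf"}
   :block/children [{:db/id 43} {:db/id 44} {:db/id 45}]})

;; println writes the whole map on a single line.
(println sample-page)

;; pprint wraps long collections across lines and indents nested
;; structures, which makes larger maps easier to scan.
(pprint sample-page)
```

For a small value the two look about the same; the difference shows up once the data is wider than the print margin.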
I incrementally found the reference to the PDF document, got a property of that page through a database lookup, and then rendered it in the citation.
(debug/pprint "adamwolf: t" t)
(debug/pprint "adamwolf: (:block/page t)" (:block/page t))
(debug/pprint "adamwolf xyzzy: (db-utils/pull (:db/id (:block/page t)))"
              (db-utils/pull (:db/id
                              (:block/page t))))
;; Pull the page, then drill into its properties for the handle.
(debug/pprint "adamwolf (db-utils/pull (:block/page t))"
              (:hl-handle
               (:block/properties
                (db-utils/pull
                 (:db/id
                  (:block/page t))))))
I added some comments to explain my intent. You can see the change in context.
;; Get the ID of the page referred to by t's :block/page.
;;
;; Use that ID to pull the page from the database, and only take the
;; :block/properties map.
;;
;; Look up the :hl-handle in the :block/properties, and bind it
;; to awolf-hl-handle.
;;
;; If any of those things are nil or don't exist, awolf-hl-handle is nil.
;;
;; Then, if awolf-hl-handle is not nil, render a span.hl-handle with
;; the handle and a space in it.
(let [awolf-hl-handle (:hl-handle
                       (:block/properties
                        (db-utils/pull [:block/properties]
                                       (:db/id (:block/page t)))))]
  (when awolf-hl-handle
    [:span.hl-handle
     [:strong.forbid-edit (str awolf-hl-handle " ")]]))
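The "if any of those things are nil" behavior comes from how keyword lookups work in Clojure: looking up a keyword in nil returns nil instead of throwing. Here is a self-contained sketch of that chain. fake-db, fake-pull, handle-for, and render-handle are hypothetical stand-ins for illustration; the real db-utils/pull queries Logseq's database and is not reproduced here.

```clojure
;; Hypothetical stand-in for the database: page id -> page entity.
(def fake-db
  {1 {:block/properties {:hl-handle "foo"}}
   2 {:block/properties {}}
   3 {}})

;; Hypothetical stand-in for db-utils/pull.
(defn fake-pull [id]
  (get fake-db id))

;; Same shape as the nested lookup in the patch. Each step returns nil
;; when its input is nil, so a missing page, missing properties, or a
;; missing handle all end in nil.
(defn handle-for [t]
  (:hl-handle
   (:block/properties
    (fake-pull (:db/id (:block/page t))))))

;; Build the hiccup vector only when a handle exists.
(defn render-handle [t]
  (when-let [handle (handle-for t)]
    [:span.hl-handle
     [:strong.forbid-edit (str handle " ")]]))

(handle-for {:block/page {:db/id 1}})  ;; => "foo"
(handle-for {:block/page {:db/id 2}})  ;; => nil (no :hl-handle property)
(handle-for {:block/page {:db/id 99}}) ;; => nil (no such page)
(handle-for {})                        ;; => nil (no :block/page)
(render-handle {:block/page {:db/id 3}}) ;; => nil, so nothing is rendered
```

The when-let form is one way to combine the binding and the nil check; the let plus when in the patch does the same thing.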
Now, if I add a property of hl-handle:: foo to a document page, all the citations for that PDF are formatted like foo P103! Great!
§ Can I use this?
This is definitely a good proof of concept. Is it something I would want to use locally for a bit?
- Speed and Scalability
  I’ve added a database lookup for every block title render of a citation, I think, and I don’t think there were any database lookups before. I don’t know Logseq enough to know if this is something to worry about. Is there a cache I should be explicitly using? Should I pull a copy of the handle into the highlight’s properties, and then update all the highlights if the handle changes? Is the lookup actually fine?
  ¯\_(ツ)_/¯
  Rendering speed seems fine on my largest pages on my desktop, which, for a change intended for myself, seems like a reasonable test.
- Database and Disk Format Agnosticism
  I want to avoid affecting how things are stored in the database or on disk. Locking myself into a custom Logseq sounds like a bad idea!
  I think the change only affects presentation.
- Maintenance Effort
  As Logseq changes, is this patch something that I could easily bring forward without a lot of work?
  The change is short and doesn’t touch a lot of parts of Logseq. Maybe!
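On the caching question, one generic option is to memoize the lookup. The sketch below uses clojure.core/memoize with a hypothetical slow-handle-lookup function and a call counter; it is not how Logseq caches anything, and I don't know whether Logseq's own query layer already makes this unnecessary. A memoized lookup also goes stale if the hl-handle property changes, so a real version would need some form of invalidation.

```clojure
;; Counts how many times the underlying lookup actually runs.
(def lookup-count (atom 0))

;; Hypothetical stand-in for the per-render database lookup.
(defn slow-handle-lookup [page-id]
  (swap! lookup-count inc)
  (get {1 "foo"} page-id))

;; memoize remembers results per argument, so repeated renders of
;; citations for the same page reuse the first result.
(def cached-handle-lookup (memoize slow-handle-lookup))

;; Simulate 100 citation renders that all point at page 1.
(dotimes [_ 100]
  (cached-handle-lookup 1))

@lookup-count ;; => 1
```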
§ Wrapping up
To close the loop, I posted an update in the Discord. I wanted to make sure that if someone came across my request in the future, they wouldn’t find an “Oh yeah, I got it working!” message with no details. I also wanted to make sure I didn’t come across as expecting my quick-and-dirty change to be adopted across the whole project.
I posted the following in the Discord thread.
OK, so for posterity’s sake, lemme post what I did. I added the following to block.cljs.
;; Get the ID of the page referred to by t's :block/page.
;; Use that ID to pull the page from the database, and only take the :block/properties map.
;; Then, look up the :hl-handle in the :block/properties, and bind it to awolf-hl-handle.
;; If any of those things are nil or don't exist, awolf-hl-handle is nil.
;; Then, if awolf-hl-handle is not nil, render a span.hl-handle with the handle and a space in it.
(let [awolf-hl-handle (:hl-handle
                       (:block/properties
                        (db-utils/pull [:block/properties]
                                       (:db/id (:block/page t)))))]
  (when awolf-hl-handle
    [:span.hl-handle
     [:strong.forbid-edit (str awolf-hl-handle " ")]]))
This adds a db lookup for every single block render of an annotation, maybe for every block. The code is probably an abomination. I haven’t ever used Clojure before, and I don’t really know how Logseq works in any way. This is a little personal proof-of-concept patch.
Anyway, after that, you can add an hl-handle:: foo property to the annotation page, and then those annotated things will show up with the handle prefix before the page number.
I hope developers add a PDF citation plugin hook. I’d like to see this feature without maintaining a fork (albeit a tiny one). I like the document handle, but there’s room for improvement. For instance, supporting custom citation formats! I’d like to see a chapter number, actually, even if only in exports. Zotero, for instance, supports thousands of citation formats. There's even a Citation Style Language supported by multiple tools! Giving an export to someone without the PDF becomes more useful when a citation reads “Gideon the Ninth, ch. 32, p. 409” rather than “P409”.
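As a rough sketch of what a richer citation format could look like, here is a small formatter that produces the "Gideon the Ninth, ch. 32, p. 409" style and falls back gracefully when fields are missing. format-citation and its :title, :chapter, and :page keys are hypothetical; nothing like this exists in Logseq or the patch, and it makes no attempt to implement Citation Style Language.

```clojure
(require '[clojure.string :as str])

;; Hypothetical formatter: joins whichever parts are present.
;; :page is assumed to always be there; :title and :chapter are optional.
(defn format-citation [{:keys [title chapter page]}]
  (str/join ", "
            (remove nil?
                    [title
                     (when chapter (str "ch. " chapter))
                     (str "p. " page)])))

(format-citation {:title "Gideon the Ninth" :chapter 32 :page 409})
;; => "Gideon the Ninth, ch. 32, p. 409"

(format-citation {:page 409})
;; => "p. 409"
```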