Turn any RSS feed into a newsletter or notification bot
feedletter is a service that
- watches RSS (or Atom!) feeds with great care
- works great with feeds generated by static-site generators!
- distinguishes between new items and older stuff or stuff already seen that flakily reappears
- awaits "finalization" of items, meaning their stabilization (and nondeletion) over specified time intervals
- lets you define a wide variety of subscriptions to those feeds
  - Over different media
    - Post to Mastodon
    - SMS (coming soon!)
    - etc
  - In different arrangements
    - each item as newsletter
    - daily or weekly digests
    - compilations of every n posts
    - etc
  - which are formatted via rich, customizable untemplates
  - which are managed via a web API for easy subscription, confirmation, and unsubscription by users
The application requires a Java 17+ JVM and a Postgres database.
Typical installations will proxy the web API behind e.g. nginx, and run the daemon as a systemd service.
A (very) detailed tutorial on setting up, configuring, and customizing a feedletter instance is available here.
1. A feed is added
2. One or more subscribables are defined against the feed
3. One or more destinations subscribe to the feed.
4. items are observed in the feed, and are added in the Unassigned state
5. Each item is assigned, in a single transaction, to all the collections (assignables) to which it will ever belong.
   (Steps 4 and 5 can repeat arbitrarily as new items come in.)
6. Separately, collections (assignables) are periodically marked "complete" and, in the same transaction, forwarded to subscribers.
7. Complete assignables are deleted, along with their assignments
8. items that are...
   - Already assigned
   - No longer belong to not-yet-completed assignables

   can drop their cached contents, and then move into the Cleared state.
I want to sketch the not-so-obvious db schema I've adopted for this project while I still understand it.
First there are feeds:
CREATE TABLE feed(
id INTEGER,
url VARCHAR(1024),
min_delay_minutes INTEGER NOT NULL,
await_stabilization_minutes INTEGER NOT NULL,
max_delay_minutes INTEGER NOT NULL,
assign_every_minutes INTEGER NOT NULL,
added TIMESTAMP NOT NULL,
last_assigned TIMESTAMP NOT NULL, -- we'll start at added
PRIMARY KEY(id)
)
Feeds must be defined before subscriptions can be created against them. They are defined by a URL, and they define what it means for an item in the feed to "finalize", in the sense of being ready for notification.
Feeds are permanent and basically unchanging until (when someday I implement this) they are manually removed.
(last_assigned changes, but so far it's just informational and has no role in the application.)
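As a rough illustration of how the three timing columns could combine, here is a minimal sketch. The rule itself is an assumption, not something the text above confirms: an item counts as finalized once it has been known for at least min_delay_minutes and unchanged for at least await_stabilization_minutes, or once max_delay_minutes have passed regardless. FeedTiming and isFinalized are hypothetical names, and the sketch assumes each item carries a first-seen and a stable-since timestamp.

```scala
import java.time.{Duration, Instant}

object FinalizationSketch {
  // Mirrors the three timing columns of the feed table (hypothetical type).
  final case class FeedTiming(
    minDelayMinutes: Int,
    awaitStabilizationMinutes: Int,
    maxDelayMinutes: Int
  )

  // ASSUMED rule: (waited the minimum AND stable long enough) OR waited the maximum.
  def isFinalized(timing: FeedTiming, firstSeen: Instant, stableSince: Instant, now: Instant): Boolean = {
    val sinceFirstSeen = Duration.between(firstSeen, now).toMinutes
    val sinceStable    = Duration.between(stableSince, now).toMinutes
    val waitedMinimum  = sinceFirstSeen >= timing.minDelayMinutes
    val stabilized     = sinceStable >= timing.awaitStabilizationMinutes
    val waitedMaximum  = sinceFirstSeen >= timing.maxDelayMinutes
    (waitedMinimum && stabilized) || waitedMaximum
  }
}
```

Under this assumed rule, the maximum delay acts as a cap, so an item that keeps changing is still eventually released.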
Next there are items:
CREATE TABLE item(
feed_id INTEGER,
guid VARCHAR(1024),
single_item_rss TEXT,
content_hash INTEGER, -- ItemContent.contentHash
link VARCHAR(1024),
first_seen TIMESTAMP NOT NULL,
last_checked TIMESTAMP NOT NULL,
stable_since TIMESTAMP NOT NULL,
assignability ItemAssignability NOT NULL,
PRIMARY KEY(feed_id, guid),
FOREIGN KEY(feed_id) REFERENCES feed(id)
)
- feed_id and guid identify an item.
- single_item_rss caches the RSS item. We want to cache this in case, by the time we get around to notifying, the item is no longer available in the feed.
- content_hash is a hash of the item's content (ItemContent.contentHash). We use it to identify whether an item has changed.
- link may eventually be used as a neurotic double-check so we never notify the same human-perceived item twice.
- first_seen, last_checked, and stable_since are pretty self-explanatory timestamps. We use these to calculate whether an item has stabilized and so can be "assigned". (See below.)
- assignability: items can be in one of four states
  - Unassigned: The item has not yet been assigned to the collections (including single-member collections) to which it will eventually belong, but is eligible for assignment.
  - Assigned: The item has been assigned to all the collections (including single-member collections) to which it will eventually belong. The application may not be done assigning to those collections, and the item may not yet have been distributed to subscribers.
  - Cleared: This is the terminal state for an item. The item has been assigned to all collections, and has already been distributed to subscribers. The cache fields (title, author, article, publication_date, and link) should all be cleared in this state. Cleared items are not deleted, but retained indefinitely, so that we don't renotify if the item (an item with the same guid) reappears in the feed.
  - Excluded: Items which are marked to always be ignored. Items are marked Excluded only upon initial insert. Items can be manually updated from Excluded to Unassigned (timestamps should be reset to the time of the update), to cause Excluded posts to be published.
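The four states and the moves between them can be summarized in a small sketch. This is an illustration only: the type name follows the ItemAssignability column type, but canMove and the encoding as case objects are assumptions, and the allowed transitions are simply those described above.

```scala
object AssignabilitySketch {
  sealed trait ItemAssignability
  case object Unassigned extends ItemAssignability
  case object Assigned   extends ItemAssignability
  case object Cleared    extends ItemAssignability
  case object Excluded   extends ItemAssignability

  // Hypothetical helper: true only for the transitions the text describes.
  def canMove(from: ItemAssignability, to: ItemAssignability): Boolean =
    (from, to) match {
      case (Unassigned, Assigned) => true // assigned to all its collections
      case (Assigned, Cleared)    => true // distributed, cache dropped
      case (Excluded, Unassigned) => true // manual update only
      case _                      => false // Cleared is terminal
    }
}
```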
Next there is subscribable, which represents the definition of a subscription by which parties will be notified of items or collections of items.
CREATE TABLE subscribable(
subscribable_name VARCHAR(64),
feed_id INTEGER NOT NULL,
subscription_manager_json JSONB NOT NULL,
last_completed_wti VARCHAR(1024),
PRIMARY KEY (subscribable_name),
FOREIGN KEY (feed_id) REFERENCES feed(id)
)
A subscribable maps a name to a feed and a SubscriptionManager. For our purposes here, the main role of a SubscriptionManager (a serialization of a Scala ADT) is to
- Generate for items a within_type_id, which is really just a collection identifier. All items in a collection of items that will be distributed will share the same within_type_id.
- Determine whether a collection (identified by its within_type_id) is "complete", that is, no further items need be assigned the same within_type_id.
- When a collection has been notified, it is deleted from the database. However, some SubscriptionManagers need to maintain a sequence of within_type_id identifiers. So for each subscribable, the last_completed_wti is retained.
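To make these roles concrete, here is a minimal sketch of two collection strategies. None of it is feedletter's actual API: the trait shape, the method names, the "YYYY-week-WW" identifier format, and the numeric sequence for every-n compilations are all assumptions chosen for illustration.

```scala
import java.time.{Instant, ZoneId}
import java.time.temporal.IsoFields

object SubscriptionManagerSketch {
  sealed trait SubscriptionManager {
    // Which collection should an item land in?
    def withinTypeId(itemFirstSeen: Instant, lastCompletedWti: Option[String]): String
    // May the collection be closed and sent?
    def isComplete(withinTypeId: String, itemCount: Int, now: Instant): Boolean
  }

  // Weekly digest: the id is derived from the item's ISO week (assumed format).
  final case class Weekly(zone: ZoneId) extends SubscriptionManager {
    private def weekId(t: Instant): String = {
      val date = t.atZone(zone).toLocalDate
      val year = date.get(IsoFields.WEEK_BASED_YEAR)
      val week = date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR)
      f"$year%d-week-$week%02d"
    }
    def withinTypeId(itemFirstSeen: Instant, lastCompletedWti: Option[String]): String =
      weekId(itemFirstSeen)
    // Complete once the current week is later than the collection's week.
    def isComplete(withinTypeId: String, itemCount: Int, now: Instant): Boolean =
      weekId(now).compareTo(withinTypeId) > 0
  }

  // Every n posts: ids form a sequence, so the last completed id is needed.
  final case class EveryN(n: Int) extends SubscriptionManager {
    def withinTypeId(itemFirstSeen: Instant, lastCompletedWti: Option[String]): String =
      lastCompletedWti.map(_.toLong + 1L).getOrElse(1L).toString
    def isComplete(withinTypeId: String, itemCount: Int, now: Instant): Boolean =
      itemCount >= n
  }
}
```

The EveryN case shows why something like last_completed_wti has to survive deletion of the collection: without it, the next identifier in the sequence could not be derived.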
SubscriptionManager determines how collections are compiled, to what kind of destination (e-mail, Mastodon, mobile message, whatever) notifications will be sent, and how they will be formatted.
Names are scoped on a per-feed-URL basis. Users subscribe to a (feed_url, subscribable_name) pair.
Next there is assignable, which represents a collection. Assignables essentially map subscribables (subscription definitions) to within_type_ids (the collections generated by the subscription definition and notified to subscribers).
CREATE TABLE assignable(
subscribable_name VARCHAR(64),
within_type_id VARCHAR(1024),
opened TIMESTAMP NOT NULL,
PRIMARY KEY(subscribable_name, within_type_id),
FOREIGN KEY(subscribable_name) REFERENCES subscribable(subscribable_name)
)
opened is the timestamp of the first assignment to the collection.
Once an assignable has been notified ("completed"), it is simply deleted from the database.
For each subscribable, the within_type_id of only the most recently completed assignable is retained (see subscribable table above).
Next there is assignment, which represents an item in an assignable, i.e. a collection.
It's pretty self-explanatory I think.
CREATE TABLE assignment(
subscribable_name VARCHAR(64),
within_type_id VARCHAR(1024),
guid VARCHAR(1024),
PRIMARY KEY( subscribable_name, within_type_id, guid ),
FOREIGN KEY( subscribable_name, within_type_id ) REFERENCES assignable( subscribable_name, within_type_id )
)
Next there is subscription, which just maps a destination to a subscribable.
The destination is a JSON blob that can refer to a variety of things: e-mail addresses, SMS numbers, Mastodon instances, etc.
Each SubscriptionManager works with a destination subtype.
CREATE TABLE subscription(
subscription_id BIGINT,
destination_json JSONB NOT NULL,
destination_unique VARCHAR(1024) NOT NULL,
subscribable_name VARCHAR(64) NOT NULL,
confirmed BOOLEAN NOT NULL,
added TIMESTAMP NOT NULL,
PRIMARY KEY( subscription_id ),
FOREIGN KEY( subscribable_name ) REFERENCES subscribable( subscribable_name )
)
Destinations can carry ornamentation: an e-mail address, for example, might have a personal part (e.g. Buffy in "Buffy [email protected]"). Insisting that the JSON entities be unique is therefore not sufficient to prevent multiple subscriptions. So destinations declare a unique core, whose uniqueness within a subscribable the database enforces:
CREATE UNIQUE INDEX destination_unique_subscribable_name ON subscription(destination_unique, subscribable_name)
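A sketch of what "declaring a unique core" might look like on the Scala side. The types, field names, prefixes, and normalization rules (lowercasing addresses, keeping only digits of phone numbers) are all assumptions for illustration, not feedletter's actual definitions; the example.com addresses are placeholders.

```scala
object DestinationSketch {
  sealed trait Destination {
    // The value that would be stored in destination_unique.
    def unique: String
  }

  // The display name is ornamentation; only the address identifies the recipient.
  final case class Email(addressPart: String, displayNamePart: Option[String]) extends Destination {
    def unique: String = "e-mail:" + addressPart.toLowerCase // assumed normalization
  }

  final case class Sms(number: String) extends Destination {
    def unique: String = "sms:" + number.filter(_.isDigit) // assumed normalization
  }
}
```

With this shape, Email("buffy@example.com", Some("Buffy")) and Email("Buffy@Example.com", None) serialize to different JSON but share one unique core, so the index above would reject the second subscription to the same subscribable.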
That's it for the base schema! There are also tables that convert destinations specific to subscription types into their various queues for notification. I'm omitting those for now.
There are two layers of templating in feedletter:
Many notifications are rendered via untemplates. However, what untemplates render goes to all subscribers of a subscribable. We generate one "form letter" for all recipients, and store it only once.
But since we may want to customize our notifications on a per-recipient basis, the output of the untemplates can take the form of a trivial template with case-insensitive, percentage-delimited %Fields% that get filled in separately for each recipient.
We try to refer to the former, initial, shared templates as untemplates (because that's the technology that underlies them), and the last-minute substitution templates that are generated by the untemplates as mere templates.
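The second, per-recipient layer can be pictured as a simple substitution pass. The sketch below is an illustration under assumptions, not feedletter's implementation: fill is a hypothetical function, the field names in the usage note are made up, and leaving unknown fields untouched is a choice made here.

```scala
import java.util.regex.Matcher
import scala.util.matching.Regex

object TemplateSketch {
  // A field is a run of letters, digits, or underscores between percent signs.
  private val Field: Regex = "%([A-Za-z0-9_]+)%".r

  def fill(template: String, values: Map[String, String]): String = {
    // Lowercase the keys once so lookups are case-insensitive.
    val normalized = values.map { case (k, v) => k.toLowerCase -> v }
    Field.replaceAllIn(template, m =>
      // quoteReplacement keeps '$' and '\' in values from being read as group references.
      Matcher.quoteReplacement(normalized.getOrElse(m.group(1).toLowerCase, m.matched))
    )
  }
}
```

For example, TemplateSketch.fill("Hi %Name%!", Map("name" -> "Buffy")) yields "Hi Buffy!", while a field with no supplied value is left in place as written.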