
Obscure Webdev

Mon, Nov 04 2024 - 14:41

We have long practiced a form of web development so obscure that there are probably barely a couple hundred people following a similar approach.

For that reason, grand and benevolent as we are, we thought it time to finally share some of the esoteric wisdom we gained in our journey on these tubes we call the interbobs.

All while also showing off some of the nice features of this site, both visible and invisible.

Semantically Stupid Shenanigans

We would assume it goes without saying, but because it probably doesn't, we want to spell it out right up front: The decisions we made and the practices we follow in our web development are deeply and inherently political.

If that's a problem for you – well, that's definitely a you problem, what the fuck are you even doing on an explicitly anarchist site anyhow?

Cornerstones

To make sense of the weird-ass smorgasbord of techniques we use when making one of these sites of the wob, we should probably first talk about why we do what we do the way we do it. doo bee doo bee doo…

Reader Privacy

Much of this boils down to the concept of reader privacy, a term which refers to people's right to read things without the government knowing what they read.

The problem this tries to address is that a comprehensive list of your reading materials makes building a detailed profile of your political, sexual and other orientations downright trivial.

The implications of such profiles are dire. In comparison, the data tracking and surveillance methods that enabled both the Nazis with their IBM Hollerith machines and, later, the Stasi in East Germany were laughably primitive.

While reader privacy is usually used in the context of journalistic publications and nonfiction books, and only in connection with government access, our definition is quite a bit broader to remain relevant in these so very dystopian – pardon, interesting – times we find ourselves in.

You may or may not know it, but we do in fact live in an age of ubiquitous and routine mass surveillance. For well over a decade now, the systems perpetrating this surveillance have been built and run not only by shady government organizations like the NSAs and GCHQs of ECHELON and Snowden-leak fame, but even more so by at least equally shady corporations aiming to squeeze money out of each and every interaction on the net (and, given enough time and CCTV cams, in our physical spaces, too).

Quick digression: Did you know that GCHQ has had a working "full take" – i.e. it reads all of the UK's internet traffic – since 2011? You literally can't put more mass into mass surveillance than that…

So, in our definition of reader privacy, we include pretty much all writings that are in some way published. In our context, that means every single post on all our wobsites.

To us, protecting reader privacy is an ethical obligation, one for which we aim to make every even halfway reasonable effort – and often more.

In practice, measures taken to ensure reader privacy bleed over seamlessly to general privacy and security, so you can have your multi-domain cake and eat it, too.

A Deep Hatred of JavaScript

Okay, so this heading might be exaggerating, but honestly not by much.

The problems with JS are many and not quite easy to summarize, but we'll spotlight some of what we think are the most important issues.

Cultural Problems

One is a general culture in the JS ecosystem we can only describe as neglectful in the best of cases and recklessly malicious in the worst.

The volatility of the software ecosystem around it has been shown time and time again, with shitty one-line "libraries" that, when deleted, take something like a double-digit percentage of sites globally down with them because everyone and their awful cousin (named Brent) depends on them. This is not an engineering problem, but a cultural one.

As is the confoundingly routine practice of loading JS code from third-party providers (i.e. CDNs). Deployed JS is already impossible to independently audit for security flaws or straight-up malicious code, because the server can deliver different code every time it's requested – including malicious code targeted at selected subgroups of visitors, for example by country, or by almost arbitrarily complex qualifiers if connected with tracking services.

Now with CDNs, even site operators can't control what JS actually gets delivered to their visitors. This puts CDN operators in a position from which it is trivial to exfiltrate people's data from the sites that rely on them. And given that this JS is usually just loaded on every single page, this includes data like private messages and even login credentials.

Why virtually nobody seems to flag this as a security threat of existential proportions is entirely beyond us.

Progenital Factors

Another, somewhat vaguer, factor derives from the origin of most of the JS bits introduced over the last 15 or so years. The problem here is that, while still open in name, both the web standards body W3C and the browser implementations are firmly in the hands of capitalist shitfuckers in general and, in many cases, Google in specific.

Google in particular is deeply problematic in this context because they make most of their money with what is probably the largest surveillance system that ever existed (so far…) and that system is built on JS.

And if making virtually all your money by systematically violating everyone's privacy using web technologies isn't a conflict of interest for one of the central players in shaping the web's technologies, we really don't know what the fuck could ever qualify as one.

Hence, we consider the JS stack itself pretty much malicious.

The surveillance system we were talking about is, of course, Google Analytics.

For the uninitiated (i.e. those fortunate enough not to have worked in the web development sector), Google Analytics is a mostly "free" tracking service that website operators can use to see fancy graphs and statistics about people visiting their site, including a plethora of demographic markers like age, sex, income bracket and so on.

We say "free" in quotes because, of course, the data goes to Google. So while most website operators don't pay for this service, they make all of their visitors pay with their privacy.

Thing is, Google Analytics is used by an obscene number of sites out there, somewhere in the high double digits, percentage-wise.

Reliable numbers are hard to come by, but a cursory lookup implies somewhere near the 85% mark for the top million sites.

To put this more bluntly:

Google can track everyone's behavior across 80+% of the web, building exactly the sort of profiles the concept of reader privacy sought to make impossible.

Digital Inclusivity

Another factor, albeit sadly not weighted nearly as heavily as we'd ideally like, is inclusivity.

We at least endeavor to make our sites more inclusive for people with poor or highly filtered connections as well as people depending on screenreaders.

Praxis

By now you might be asking:

"phryk, I have now read about twelve hundred words of your bullshit and you haven't said a single fucking word about actual goddamn web development – can we get to the fucking point already!?"

Well, what a fortuitously timed question, my good chum – because we just got there.

First of all is a simple thing that most site operators already do:

HTTPS only

While many people would neither count their server setup as web development nor HTTPS only as obscure, this very much pertains to the values we laid out earlier, so we think this bears mentioning.

We follow the commonly established pattern of redirecting all plaintext HTTP URLs to their corresponding HTTPS URLs. So anyone happening to come in over http:// will have the initial URL they visit readable by all hops between their browser and our server, but any subsequent ones will be private between the two.

Automatically escalating HTTP to HTTPS instead of just completely closing port 80 (i.e. plaintext connections) is a concession we make for usability reasons.

This construct is also why we are irreverently religious about putting https:// before any reference to our sites – it makes sure that this first URL exposure doesn't happen, which means the remaining plaintext requests mostly come from people who typed just the site domain into their browser by hand.

If you're running nginx, there's an easy way to handle the HTTP->HTTPS redirects for all sites running on your server, with just one server block:

# catch-all for plaintext HTTP on port 80
server {
    listen 80;

    location / {
        # permanent redirect to the same host and path over HTTPS
        return 301 https://$host$request_uri;
    }
}

Simply paste this into the http block of your nginx.conf and remove the listen 80; directive from all other server blocks, leaving only the listen 443 ssl; directives.

Even most web developers don't think of links as a privacy issue, but following web standards, your browser will actually tell any linked site exactly what URL you came from (via the Referer header).

Luckily, even in this dystopian hellscape, there is actually a mechanism in HTML that lets us tell the browser to not fucking do that.

The way to do this is to add rel="noreferrer" to all links pointing to other sites.

This would of course be intensely involved and error-prone if it weren't automated. But we can point you to practical examples of how to achieve that.

We have used markdown for all our content for a good long while, but this site (speaking from 2024) marks the first time we're using markdown-it-py to parse it and render it to HTML.

Our initial reason for choosing it was that it implements the CommonMark spec while offering extensibility. When we developed our previous sites, no such markdown implementation existed for Python.

One big boon of markdown-it-py is that, instead of doing string manipulation with a big heap of regexps, it's an actual parser that works on a tokenized representation we can manipulate.

The code to add noreferrer to all links is trivial, but since we found the markdown-it-py docs to be lacking, we'll share it here to make things easier for you:

def renderer_link_open(self, tokens, idx, options, env):

    token = tokens[idx]

    # the assumption here is that internal links begin with /,
    # so anything starting with a protocol scheme is treated
    # as an external link
    if 'href' in token.attrs and (
        token.attrs['href'].startswith('http://') or
        token.attrs['href'].startswith('https://')
    ):

        token.attrs['target'] = '_blank'
        token.attrs['rel'] = 'noreferrer'

    attribute_string = ' '.join([f'{name}="{value}"' for (name, value) in token.attrs.items()])
    return f'<a {attribute_string}>'

# somewhere, you'll have something similar to:
# md = markdown_it.MarkdownIt()

md.add_render_rule('link_open', renderer_link_open)

If you have a less cleanly built processor for your authoring language (markdown or whatever it may be) or – Goddess forbid – store your posts directly as HTML, you can still automate this, albeit in a much less elegant fashion – by throwing the entire document into an HTML parser, modifying the DOM and rendering it back to HTML.

An example of that can be found within the article_load function in the code of our XMPP site where we just throw everything into Beautiful Soup.
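
For reference, a minimal sketch of that approach could look like the following – the helper name and details are made up for illustration, the actual article_load does more:

from bs4 import BeautifulSoup

def add_noreferrer(html):

    # hypothetical helper – roughly what happens inside article_load

    soup = BeautifulSoup(html, 'html.parser')

    for link in soup.find_all('a', href=True):
        # same assumption as above: internal links start with /,
        # anything carrying a protocol scheme is external
        if link['href'].startswith(('http://', 'https://')):
            link['target'] = '_blank'
            link['rel'] = 'noreferrer'

    return str(soup)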

A concept we have carried over to this site from our previous one is that of scored links. The concept is pretty simple – it's a link with a score attached to it, intended to give an indication of how privacy-friendly the linked URL is. The idea here is to enable informed consent for readers, empowering them to make their own decisions based on information the site collects.

Look at these links:

The icon next to each link is colored based on the score. If you hover your mouse on it, you get some statistical information about the dataset this is based on.

If you'll allow us another side-rant, these examples show really well how most news outlets have actively murdered reader privacy. In our humble opinion, we need a new paradigm in journalism that goes beyond traditional publishers, as they have shown themselves more than willing to betray their readers' trust and sell them out for profit.

On our previous site, only the main links for curated art used this, but on our new site, we finally integrated it into the markdown parser so scored links are automatically created for every link in every post.

Currently, the score is only numerical and represents how many third-party domains the linked URL loads data from – but in the future, we want to make it more granular and aware of bad actors like the big tech oligopolies, tracking services and dark patterns like forced registrations and paywalls.

As the mechanism is based on scraping the linked URL, it of course also has some limitations. For one, some sites block scrapers, for example:

At the time of writing, WaPo doesn't reply at all, and Axios replies only with a 403 and a completely empty page (not even a doctype or <html>), hence no icon for WaPo and a false positive of 0 external domains for Axios.

Another limitation is that we're currently only scanning the exact document referred to by the linked URL as HTML. So we're not catching things like fonts loaded from CDNs by the CSS and requests to external URLs done by JS. We also can't properly scan sites that are completely based on JS at all because that would require running and instrumenting an entire browser.

Still, if there's one feature we wish more sites would include, it's definitely this one.

If you want to implement this feature on your own site, feel free to look at our implementation in class ScoredLink.
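
If you just want the gist of it, here's a rough sketch of the scoring idea – explicitly not our actual ScoredLink code, just the core concept using requests and Beautiful Soup:

from urllib.parse import urlparse

import requests                      # assumption: plain requests is enough for the sketch
from bs4 import BeautifulSoup

def external_domain_count(url):

    # hypothetical helper – counts third-party domains referenced by the linked document

    own_host = urlparse(url).hostname

    try:
        response = requests.get(url, timeout=10)
    except requests.RequestException:
        return None  # unreachable or blocking scrapers – no score

    soup = BeautifulSoup(response.text, 'html.parser')
    external = set()

    # src covers scripts, images, iframes and the like; <link href> covers
    # stylesheets, icons etc. – a plain <a href> doesn't load anything, so it doesn't count
    for tag in soup.find_all(src=True) + soup.find_all('link', href=True):
        host = urlparse(tag.get('src') or tag.get('href')).hostname
        if host and host != own_host:
            external.add(host)

    return len(external)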

Keep Calm and 360 NoJS

As you might have gathered earlier, this site uses absolutely no JavaScript and loads nothing from third-party domains.

This means that we're dealing only with HTML, CSS and SVG.

One immediate payoff for this is that – assuming clean-ish semantic HTML – our sites are extremely friendly for both screenreaders and text mode browsers.

The only feature that actually breaks on those is the captchas for the commenting system – and we already have an open issue to alleviate that.

This lack of fancy JS bullshit in conjunction with copious use of basic usability features like the alt attribute also means our site remains friendly for people who filter out traffic-heavy components like images to cope with poor connections.

Our avoidance of CDNs additionally improves our reachability from connections that are heavily filtered, encountering routing fuckups or affected by other outages – as long as our machine is reachable, the entire site remains in a working state.

Another nice touch to make things friendlier for visually impaired people who don't require a screenreader is that we're using percentage-based font-size for virtually all text (minus some decorative elements).

This means we're actively supporting people who configure their browser to use a larger default font size. Complementing that, we're using units derived from rem (i.e. rem, em, ex and the like) which should™ minimize layout breakage for these people.
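
As a small sketch (the selectors here are made up, not our actual stylesheet):

html {
    /* 100% = whatever default size the reader configured in their browser */
    font-size: 100%;
}

main p {
    /* rem-derived units scale with that root size instead of fighting it */
    font-size: 1rem;
    max-width: 70ex;
    margin-bottom: 1.5em;
}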

Yet another thing we look out for is always using :hover and :focus together, i.e.:

a:hover,
a:focus {
    color: rebeccapurple;
}

:hover essentially doesn't exist on mobile, but virtually all elements that :hover is commonly used on also support :focus.

This isn't only better for mobile clients, but also for people navigating websites using the keyboard, as switching to an element with <tab> won't trigger :hover but will trigger :focus.

Now, contemporary "wisdom" has it that it's basically impossible to create a modern site without JS. We strongly fucking disagree.

Making Things Interactive with CSS

This site, tho not using a single line of JS, has modals, content toggles, galleries, a fully responsive design with burger menus (and probably a bunch of other things we forgot) – all things web developers usually use JS for.

And yet, CSS was all we needed.

The tiny not-so-secret secret behind most of the interactive components on this site is our favorite CSS pseudo-class, :target.

The basic principle is that :target matches the currently targeted element, i.e. the element the #fragment identifier refers to, usually by its id attribute.

For an example, have a look at this glorious inline image:

Now, when you click on that, a modal opens, and what you'll see in the URL bar of your browser is that the URL gets #modal-image-poobrains-logo appended to it.

If you opened and then closed the modal, you might already have seen the fragment being updated to #__close, which simply is an identifier not used by any element – thus making sure :target matches no element.

The markup behind this (simplified) looks like this:

<a href="#modal-image-foo">
    <img src="/foo.png" />
</a>
<div id="modal-image-foo" class="modal">
    <a href="#__close"></a>
    <div class="modal-content">
        <img src="/foo.png" />
    </div>
</div>

Sidenote: Unicode glyphs are often viable alternatives for icons!

With the most basic CSS being something like:

.modal {
    display: none;
}

.modal:target {
    display: block;
    position: fixed;
    width: 100vw;
    height: 100vh;
}

To extend this to a gallery, all that's needed is linking a few modals together, with previous/next links and a list of linked thumbnails in each modal's content.
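
In hand-wavy, simplified markup (not our actual generated HTML), a two-image gallery boils down to something like:

<a href="#gallery-img-1"><img src="/thumb-1.png" /></a>
<a href="#gallery-img-2"><img src="/thumb-2.png" /></a>

<div id="gallery-img-1" class="modal">
    <a href="#__close">✕</a>
    <div class="modal-content">
        <img src="/img-1.png" />
        <a href="#gallery-img-2">next →</a>
    </div>
</div>

<div id="gallery-img-2" class="modal">
    <a href="#__close">✕</a>
    <div class="modal-content">
        <img src="/img-2.png" />
        <a href="#gallery-img-1">← previous</a>
    </div>
</div>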

If you're curious, feel free to peek at this example with your inspector:

While this means that we're repeating quite a bit of UI markup, its generation is completely automated and documents on this site are usually still well below 100kB.

In poobrains, we even used :target in conjunction with SVG for semi-interactive data visualizations – namely plots and maps/choropleths.

To take an old demo recording from a poobrains devlog, look at what we did with this like a decade back:

Demonstration of an interactive SVG plot with multiple datasets in poobrains.

If you need more inspiration, we recommend youmightnotneedjs.com, tho we do feel compelled to warn that this is hosted on Microsoft GitHub.

There is another nice CSS trick we haven't seen anybody else talk about, and that's extending the :target trick to use :target-within.

Now, we've fruitlessly waited for :target-within to be supported for years and years, but in the meantime, :has() silently saw adoption by all mainstream browsers – and :has(:target) is functionally identical.

We're not practically using it yet, but we are so very excited about this development that it's hard to convey just how boundlessly excited we are.

The essential thing to grok here is that this allows the :target trick to work in an arbitrarily nested fashion. That might not sound like much, so let's look at one of the ways we plan on using this in the future.

As we already said, poobrains had data visualization including maps. It could do things like render an entire world map and have every country be selectable by click to show extra information about it.

But, using only :target one thing we really wanted that wasn't possible was having multiple states of the map in the same SVG – at least not without giving up selectable elements with more info on the map.

Now with :target-within/:has(:target), we can feasibly create an SVG data visualization that has a map as main content with selectable elements on it – BUT also a timeline at the bottom to select different points in time and have different selectable elements on it. That way we could visualize things like troop movements and territory gains/losses for conflicts or geographic movements (and even overlays) for natural disasters.
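
The selector logic for that could look roughly like this (structure and class names are hypothetical):

/* every point in time is a group of its own, hidden by default */
.map-state {
    display: none;
}

/* show a state when it is targeted directly OR when any selectable
   element inside it is targeted – the part plain :target can't do */
.map-state:target,
.map-state:has(:target) {
    display: inline;
}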

And quite honestly, we're pretty sure we've only just scratched at the surface of what this can enable – but we will for sure figure out more uses in practice in the future when we finally get to port over some of the dataviz/analysis features from poobrains.

Long Live the Fucking <form>

One of the biggest things webdev, as a culture, insists on JS for is complex, dynamically built forms. This, however, has no basis in reality.

What you need instead are programmatically composable forms, a thing that Django for example has had for ages.

poobrains, too, had this as a central feature. And we ported much of poobrains' form system to this site, albeit a bit simplified.

Now, we don't know how Django handles administration, but one of the most basic concepts – both for poobrains and this site – is that of a renderable object, implemented by class Renderable.

Form inherits from Renderable.

As does RenderableModel which implements renderable objects whose data is stored in the database. We inherited from that in turn to create Administerable, which uses AutoForm (which, of course, inherits from Form).

We know this might sound confusing, but bear with us – we swear, we're going somewhere with this.

Now, on every Administerable object, you can call .form(mode='edit') to get an AutoForm object representing a fully working form to edit the corresponding object and update its data in the database.

As the last component in this form system, we have the ProxyFieldset class which can wrap any Form object and embed it in another Form.

This might not sound like much at first, but it enables a very neat thing:

Recursively built forms.
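
To make that a little more concrete, here's a deliberately stripped-down sketch – apart from Form, AutoForm, ProxyFieldset and the .form(mode='edit') call, everything below is made up for illustration, and the real API does a lot more:

piece = PropagandaPiece.load(some_id)    # hypothetical loader for an Administerable
piece_form = piece.form(mode='edit')     # a fully working AutoForm for that row

parent_form = SomeOtherForm()            # hypothetical outer Form
# wrap the complete edit form so it can sit inside the parent form like
# any other fieldset; since embedded forms can contain ProxyFieldsets
# themselves, forms compose recursively to arbitrary depth
parent_form.fields['piece'] = ProxyFieldset(piece_form)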

In poobrains, we used this to build an entire multi-pane data analysis and visualization editor that can add, filter and transform datasets, add secondary information as well as set up and parameterize visualizations.

And while this site doesn't yet have anything similarly complex, take a look at the administration form for our propaganda type:

Dynamically built administration form for the propaganda type.

This form is dynamically built from different AutoForm objects corresponding to rows in the database that are linked (even indirectly) through foreign keys, enabling it to transparently handle all involved Propaganda, PropagandaPiece and PropagandaPieceItem objects.

Yet another neat thing this construct enables us to do is slapping extra form elements for tagging, commenting and other things onto administration forms with just one implementation – everything deriving from Taggable or Commentable just automatically gets and handles these fields with zero extra code. In the case of commenting, the same goes for the commenting forms under posts of any type derived from Commentable.

A last thing concerning forms that has nothing to do with the concept of composable form objects, but still bears mentioning, is the existence of the :valid and :invalid pseudo-classes.

We get the impression that many web developers think these aren't adequately powerful, but in conjunction with the newer <input> types and the required and pattern attributes – the latter of which allows matching field values against regular expressions – plain HTML offers more than enough in terms of client-side validation. Just don't ever forget to keep the actual, canonical validation on the server side.
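
A minimal, made-up example:

<!-- a handle that must consist of 3 to 16 word characters -->
<input type="text" name="handle" required pattern="\w{3,16}" />
<input type="email" name="mail" required />

With the matching CSS being as simple as:

input:invalid {
    border-color: firebrick;
}

input:valid {
    border-color: seagreen;
}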

Commenting

As previously implied, we have our own integrated commenting system.

Honestly, we have no idea why we even have to talk about this, but so many webdevs outsource it to third parties (thus further feeding surveillance capitalism) that, apparently, we must.

It's not overly hard to do, even including captchas. People have been telling us for over a decade now that bots will routinely break our captchas – it simply never happened.

Threading is one foreign key and a bit of CSS.
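
As a sketch (assuming a peewee-style ORM – the actual model obviously has more fields), the foreign key part is just:

from peewee import Model, ForeignKeyField, TextField

class Comment(Model):
    # the one foreign key that gives you threading: a reply points at its parent
    parent = ForeignKeyField('self', null=True, backref='replies')
    text = TextField()

The "bit of CSS" then is essentially just indenting nested replies.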

Moderation is like two columns in the database and one class inheriting from Form.

And while this site doesn't have e-mail notifications for replies, that's only because we couldn't be arsed to write it yet. This was a feature in poobrains, and we'll port it eventually.

We don't see the need to go into much more detail, as we feel a commenting system should be like the first thing you implement when learning to create a dynamic website – our first dynamic site, back in '04 or so, implemented most of this.

If you need some inspiration, feel free to read through our commenting module.

2FA, But Cool

We have to say, we really like the concept of multi-factor authentication – but hooo boy, do we hate just about every commonly seen implementation of it. SMS and e-mail tokens not only essentially require you to be on your own device, they always force interaction with at least one other program besides the browser, may fail with mail setups that make MTAs re-send mails to cut down on spam, and the transmission of the extra auth factor might be insecure at some point along the chain.

Usually, the first factor is a password – and this is true for this site, too. We implemented SHA3-512 based salted, peppered and time-obfuscated password storage, which initially started as a refresher project at work, because it had been some time since we last wrote an authentication implementation and we wanted to harness at least the partial post-quantum security SHA3 offers.
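
Schematically, the storage scheme looks roughly like this – a sketch of the general shape rather than our actual implementation, with the time obfuscation left out:

import hashlib
import hmac
import os

PEPPER = b'some-long-server-side-secret'  # placeholder – kept outside the database

def hash_password(password, salt=None):
    if salt is None:
        salt = os.urandom(32)  # per-user random salt, stored next to the digest
    digest = hashlib.sha3_512(salt + PEPPER + password.encode('utf-8')).digest()
    return salt, digest

def check_password(password, salt, stored_digest):
    _, digest = hash_password(password, salt)
    # constant-time comparison so the check itself doesn't leak timing info
    return hmac.compare_digest(digest, stored_digest)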

But our bread and butter since poobrains has been TLS client certificate authentication, leveraging public key cryptography.

Even after the deprecation of <keygen>, we still believe using client cert auth is one of the best ways to increase security on the web.

So, currently, our authentication stack first checks whether a valid client certificate was sent. If it was, and the fingerprint of the sent certificate is associated to a user, you can access the login form – which shows just a password field and a button.

And if – and only if – a valid client cert that is explicitly associated with a user was sent and the corresponding password for that exact user is entered, you are logged in.
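
If you want to build something similar and happen to run nginx, the TLS side of it can be sketched roughly like this – again an illustration, not our actual config; the application then looks up the fingerprint against its user table:

server {
    listen 443 ssl;
    # ssl_certificate / ssl_certificate_key as usual

    # request (but don't require) a client certificate;
    # whether it maps to a user is decided by the application
    ssl_verify_client optional_no_ca;

    location / {
        # hand the certificate fingerprint to the application
        proxy_set_header X-Client-Fingerprint $ssl_client_fingerprint;
        proxy_pass http://127.0.0.1:8000;
    }
}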

Adblock Enforcement

If you didn't have an adblocker installed prior to visiting this site and you're reading this article, you probably have one installed now.

If you already had one installed before – good on you!

This site presents everyone without an adblocker with an unclosable overlay nagging them to install one, along with some nice recommendations we want to repeat here:

  • AdNauseam: The best one, weaponizes simulated ad clicks against advertisers.
  • uBlock Origin: The basis for AdNauseam, all the same features – minus the weaponization.
  • uBlock Origin Lite: Crippled by Manifest v3. Activate the 'complete' filtering mode to proceed, but think about upgrading to Firefox.

What people without adblocker get to see when visiting this site.

Lastly, while testing this out on different clients, we learned Manifest v3 hasn't been the only adblock-hostile move by Google – as mobile Chrome, at least on Android, doesn't even let you install any extensions anymore!

So, we added detection for browsers that categorically can't add an adblocker. Currently, this is only Chrome on Android, but we'll extend the list if we spot more browsers like this.

Visitors with affected browsers are, ahem, encouraged to switch to something better.

What people with especially shitty browsers get to see when visiting this site.

The main technique to achieve this is described on Stefan Bohacek's site and Stefan was extremely helpful in getting it to work on our site. :)

Conclusion

Yes, this was a long fucking post, but we actually stayed pretty close to surface level with this one. The actual nitty-gritty of the implementations often becomes quite a bit more involved for UX and feature reasons, and of course there's a bunch of things that just didn't fit in. Nevertheless, we hope this gave you a good overview of what alternative web development practices can look like and why a select group of people choose to put so much effort into them, even as they become a dying art form that commonly remains entirely undocumented.

Our hope is that at least some of you will join our group of delectably crazy reprobates and do things on the web a bit differently.

Another web is possible!