Tools that have been useful to me in the past, or might be useful in the future, loosely grouped. Not things I use daily.


🦋 Web scrapers

Curl impersonate

A special build of curl that can impersonate the four major browsers: Chrome, Edge, Safari & Firefox. curl-impersonate is able to perform TLS and HTTP handshakes that are identical to those of a real browser.

Finally got around to trying this (I wanted a copy of a site that’s behind Cloudflare).

It works pretty well, and it can be driven from Scrapy with scrapy-impersonate, but these days, more often than not you want the page after JS has messed with it. So in the end I used scrapy-playwright to do the job. Minimal example:

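from pathlib import Path
import urllib.parse

from scrapy import Request
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class ExampleSpider(CrawlSpider):
    name = "example"

    allowed_domains = ["example.com"]

    # CrawlSpider uses parse() internally to apply its rules, so the
    # callback needs a different name or link following breaks.
    rules = (
        Rule(
            LinkExtractor(),
            callback="parse_page",
            follow=True,
            # Render followed links with Playwright too, not just the start URLs.
            process_request="use_playwright",
        ),
    )

    def start_requests(self):
        urls = [
            "https://example.com/",
        ]
        for url in urls:
            yield Request(url, meta={"playwright": True})

    def use_playwright(self, request, response):
        request.meta["playwright"] = True
        return request

    def parse_start_url(self, response):
        # Save the start pages as well as the pages reached through the rules.
        return self.parse_page(response)

    def parse_page(self, response):
        # Percent-encode the whole URL so it is safe to use as a filename.
        page = urllib.parse.quote(response.url, safe="")
        filename = f"{page}.html"
        Path(filename).write_bytes(response.body)
        self.log(f"Saved file {filename}")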

You also need a couple of tweaks to settings.py:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
 
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
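The spider above names each saved file by percent-encoding the page URL. Here is a small standalone sketch of that naming scheme, which also shows that it can be reversed to recover the URL later. The helper names `url_to_filename` and `filename_to_url` are mine, not part of Scrapy.

```python
from urllib.parse import quote, unquote

SUFFIX = ".html"


def url_to_filename(url):
    """Percent-encode every reserved character, including '/' and ':'."""
    return quote(url, safe="") + SUFFIX


def filename_to_url(filename):
    """Invert url_to_filename: strip the suffix, then decode."""
    if filename.endswith(SUFFIX):
        filename = filename[: -len(SUFFIX)]
    return unquote(filename)


name = url_to_filename("https://example.com/a/b?x=1")
print(name)  # https%3A%2F%2Fexample.com%2Fa%2Fb%3Fx%3D1.html
print(filename_to_url(name))  # https://example.com/a/b?x=1
```

One caveat: very long URLs can exceed the filesystem's filename length limit, so a real crawl may want to hash or truncate them.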

Puppeteer heap snapshot

puppeteer-heap-snapshot is a Node.js module that, given a Puppeteer browser page, can capture and parse a heap snapshot and deserialize objects that contain a set of properties. It comes with a nifty CLI tool too so we can quickly prototype scrapers from our terminal.

Instead of trying to use CSS selectors to get at the data we want, we grab the data straight out of the browser’s working memory.
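To make the idea concrete, here is a rough Python analogy. It is not the puppeteer-heap-snapshot API (that is a Node.js module). A nested dict stands in for the heap, and the hypothetical helper `find_objects` pulls out every object that carries a given set of property names, wherever it sits in the structure.

```python
def find_objects(root, wanted):
    """Yield every dict under root whose keys include all of wanted."""
    wanted = set(wanted)
    stack = [root]
    seen = set()  # ids of containers already visited, guards against cycles
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if isinstance(node, dict):
            if wanted.issubset(node):
                yield node
            stack.extend(node.values())
        elif isinstance(node, (list, tuple)):
            stack.extend(node)


# Made-up application state; the nesting is irrelevant to the search.
state = {
    "props": {
        "pageProps": {
            "items": [
                {"id": 1, "title": "First", "price": 9.5},
                {"id": 2, "title": "Second"},
            ]
        }
    },
    "cache": [{"id": 3, "title": "Third", "price": 12.0}],
}

matches = list(find_objects(state, ["id", "title", "price"]))
print(sorted(m["id"] for m in matches))  # [1, 3]
```

The appeal is the same as with the real tool: the query names the shape of the data, not where the page happens to render it.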

Archive Team’s grab-site

grab-site is an easy preconfigured web crawler designed for backing up websites. Give grab-site a URL and it will recursively crawl the site and write WARC files. Internally, grab-site uses a fork of wpull for crawling.

Note

Still looking for something that can maintain an archive of a Facebook group.


🐌 Convert raster to vector (unsorted)

https://www.autotracer.org/

https://www.vectorizer.io/

https://vectorizer.ai/

https://www.vectorization.org/

https://www.visioncortex.org/vtracer/

https://svgco.de/

https://svgconverter.app/

https://online.rapidresizer.com/tracer.php

https://fconvert.com/autotrace/

https://online-converting.com/autotrace/

https://github.com/fromtheexchange/image2svg-awesome


📈 Convert text to diagram

Mermaid

JavaScript-based diagramming and charting tool that renders Markdown-inspired text definitions to create and modify diagrams dynamically.

GraphViz

Not the prettiest output, but the dot language makes generating large diagrams really simple. I use it whenever I need to turn structure (e.g. library dependencies) into a graphical overview.
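As an illustration of how little code that takes, here is a minimal sketch that turns a dependency mapping into dot text. The `deps` data and the `to_dot` helper are made up for the example, and it assumes names contain no double quotes.

```python
def to_dot(deps, name="deps"):
    """Render {package: [requirements]} as a Graphviz digraph."""
    lines = [f"digraph {name} {{"]
    for package in sorted(deps):
        requirements = sorted(deps[package])
        if not requirements:
            # Declare leaf packages so they still appear as nodes.
            lines.append(f'    "{package}";')
        for requirement in requirements:
            lines.append(f'    "{package}" -> "{requirement}";')
    lines.append("}")
    return "\n".join(lines)


deps = {
    "app": ["requests", "click"],
    "requests": ["urllib3"],
    "click": [],
    "urllib3": [],
}

print(to_dot(deps))
```

Saving the output as deps.dot and running `dot -Tsvg deps.dot -o deps.svg` renders it.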

PyGraphViz

Consumes dot files and lets you query them. What’s the shortest path between A and B? What relies on C?
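Both questions boil down to graph traversals. The sketch below shows them over a plain adjacency dict in standard-library Python. It does not use PyGraphviz's API, and the graph and function names are invented for the example. In practice you would load the edges from the dot file first.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def dependents(graph, target):
    """Every node that reaches target through one or more edges."""
    reverse = {}
    for node, neighbours in graph.items():
        for neighbour in neighbours:
            reverse.setdefault(neighbour, set()).add(node)
    found, stack = set(), [target]
    while stack:
        for parent in reverse.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# An edge X -> Y means "X relies on Y".
graph = {"A": ["X", "Y"], "X": ["B"], "Y": ["C"], "C": ["B"]}

print(shortest_path(graph, "A", "B"))  # ['A', 'X', 'B']
print(sorted(dependents(graph, "C")))  # ['A', 'Y']
```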

Online text to diagram tools

A more complete list.


🔪 Edit PDFs

Stirling PDF

This is absolutely fantastic. A whole bunch of PDF tools, slapped in a Docker container with a simple and effective web interface on top. Less than five minutes to get running. I used it to generate my John Pory PDF.


🦀 Shell scripting

Gum

A collection of useful utilities for enhancing shell scripts.

chezmoi

Dotfile (Unix config file) manager. Maybe it’s better than a GitHub repo, maybe it’s just another yak waiting to be shaved.

direnv

Set environment variables based on path.
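"Based on path" means direnv looks for a .envrc file in the current directory and its parents. Here is a rough Python sketch of that lookup only. It does not load any variables and it skips direnv's approval step (`direnv allow`). The function name `find_envrc` is mine.

```python
from pathlib import Path


def find_envrc(start):
    """Return the nearest .envrc at or above start, or None."""
    directory = Path(start).resolve()
    for candidate in (directory, *directory.parents):
        envrc = candidate / ".envrc"
        if envrc.is_file():
            return envrc
    return None


# Prints the governing .envrc for the current directory, or None.
print(find_envrc(Path.cwd()))
```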


🦚 Graphics libraries

Rough.js

Rough.js is a small (<9kB gzipped) graphics library that lets you draw in a sketchy, hand-drawn-like, style. The library defines primitives to draw lines, curves, arcs, polygons, circles, and ellipses. It also supports drawing SVG paths.


Programming

Zed

Code editor. Clean. Fast. Configurable key bindings. I don’t use one tenth of the features of an IDE, so this might be perfect for me.

Difftastic

Absolutely fantastic side-by-side command-line diff. Syntax-based, rather than character-based. My default choice.


🌀 Other

Al Dente

Battery charge limiter for Macs.

bucklespring

Makes your MBP sound like a clicky-clacky buckling spring keyboard. I know it sounds like a joke. It probably is a joke. But for some reason it really helps me focus.

Fire Toolbox

Enhance the capabilities of Amazon’s Fire tablets (requires Windows).

audiogest

Transcribe audio, $4 per hour. Note: if you’re in the UK it’ll charge you £4 per hour (currently a 28% surcharge). I have a few hundred hours of audio to transcribe and that leaves a bad taste, so I’m going to try using the tools directly rather than paying the convenience tax.
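A quick back-of-the-envelope check of what that surcharge adds up to. The exchange rate is the one implied by the 28% figure, and the 300 hours is a hypothetical stand-in for "a few hundred".

```python
USD_PER_GBP = 1.28  # implied by the 28% figure above; exchange rates move
PRICE_PER_HOUR = 4  # the same number is charged in either currency
HOURS = 300  # hypothetical stand-in for "a few hundred hours"

usd_cost = HOURS * PRICE_PER_HOUR  # paying in dollars
gbp_cost_in_usd = HOURS * PRICE_PER_HOUR * USD_PER_GBP  # paying in pounds
surcharge = gbp_cost_in_usd / usd_cost - 1
extra = gbp_cost_in_usd - usd_cost

print(f"surcharge: {surcharge:.0%}")  # surcharge: 28%
print(f"extra cost: ${extra:,.0f}")  # extra cost: $336
```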