Garble

Garble web time machine

Garble is an internet wayback machine ready for your local setup! It is written in Haskell and backed by a Postgres database.

Dependencies: persistent, conduit, yesod, warp, http-conduit, tagstream-conduit et al.

Get started

Database setup

Install and start the PostgreSQL server. Create a user “garble” and a database “garble” (the user “garble” should be its owner; it also needs login capability). The schema will be automatically created on first start.

Garble

Download and compile Garble using cabal. Example:

cabal sandbox init
cabal install --dependencies-only
cabal build

Use the admin tool to set up the database schema and your preferences. Example:

cabal run admin -- shell
[... migrations ...]
0/0> set directory "/var/garble"
Okay, set.
0/0> set admin "me@myself.com"
Okay, set.
0/0> set recent for 96 hours
Okay, set.

If you like, you can already add a download job:

0/0> enqueue "https://example.com/"
New: "https://example.com"
Job id: 1

In the default configuration, Garble will recurse three levels on the same host, and one level into outgoing links. TODO: document how to change this.
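Since the recursion settings are not documented yet, the following Python sketch shows only one possible reading of the default rule. It assumes that every queued link carries two budgets (remaining same-host levels and remaining outgoing levels) and that no further recursion happens once the crawl has left the original host. The function name `child_budget` and this bookkeeping are hypothetical and are not taken from Garble's source.

```python
from urllib.parse import urlsplit


def child_budget(parent_uri, link_uri, same_host_left, outgoing_left):
    """Return the (same_host_left, outgoing_left) budget for a discovered
    link, or None if the link should not be queued.

    Assumed interpretation, not Garble's actual implementation.
    """
    same_host = urlsplit(parent_uri).hostname == urlsplit(link_uri).hostname
    if same_host:
        if same_host_left <= 0:
            return None
        # Staying on the host costs one same-host level.
        return (same_host_left - 1, outgoing_left)
    if outgoing_left <= 0:
        return None
    # Assumption: after leaving the host, same-host recursion stops.
    return (0, outgoing_left - 1)
```

With the defaults described above, a job would start with the budget `(3, 1)`: `child_budget("https://example.com/", "https://example.com/a", 3, 1)` yields `(2, 1)`, while a link to another host yields `(0, 0)`, so that page is fetched but its links are not followed.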

The daemons

To actually execute the queued download jobs you need to start the garbled daemon:

cabal run garbled

Garbled will now download the queued URIs and store the retrieved documents on disk. HTML documents are searched for hyperlinks and included resources (such as style sheets, images, and scripts), which are then added to the download queue.
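To illustrate the kind of link discovery described here, the following is a minimal Python sketch using only the standard library. Garble itself is written in Haskell and uses tagstream-conduit, so this is not its implementation. The tag and attribute list and the names `ReferenceCollector` and `collect_references` are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# (tag, attribute) pairs treated as references. This list is an assumption
# for the sketch, not the set of tags Garble actually inspects.
HYPERLINKS = {("a", "href")}
RESOURCES = {("link", "href"), ("img", "src"), ("script", "src")}


class ReferenceCollector(HTMLParser):
    """Collect absolute URIs of hyperlinks and included resources."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.hyperlinks = []
        self.resources = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            # Resolve relative references against the document's own URI.
            target = urljoin(self.base_uri, value)
            if (tag, name) in HYPERLINKS:
                self.hyperlinks.append(target)
            elif (tag, name) in RESOURCES:
                self.resources.append(target)


def collect_references(base_uri, html):
    """Return (hyperlinks, resources) found in an HTML document."""
    collector = ReferenceCollector(base_uri)
    collector.feed(html)
    collector.close()
    return collector.hyperlinks, collector.resources
```

Both lists would then be candidates for the download queue; whether Garble treats hyperlinks and resources differently when queueing is not stated here.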

For the web interface we need yet another daemon:

cabal run delivery

The delivery daemon will listen on localhost:3020 and accept the following routes:

/c/${CID}                 -- get the document content with content id ${CID}
/d/${DID}                 -- get the content of the document with document id ${DID}
/h/${HASH}                -- get the document content with store hash ${HASH}
/t/${DATETIME}?uri=${URI} -- get the document content for URI ${URI} closest to ${DATETIME}
/l?uri=${URI}             -- get the last known document content for URI ${URI}

Human users will most commonly use one of the latter two routes. The content id and document id routes are useful for debugging. The hash route is used for included style sheets and images.
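The routes above can be assembled on the client side as in this small Python sketch. The helper names are hypothetical. The timestamp format is inferred from the example on this page, and the sketch assumes that the delivery daemon decodes percent-escapes in the `uri` parameter, which this page does not confirm.

```python
from datetime import datetime
from urllib.parse import quote

BASE = "http://localhost:3020"  # address the delivery daemon listens on


def _uri_query(uri):
    # Keep ':' and '/' readable, but escape characters such as '?', '&'
    # and '#' that would otherwise end or split the query string.
    return "uri=" + quote(uri, safe=":/")


def by_content_id(cid):
    return f"{BASE}/c/{cid}"


def by_document_id(did):
    return f"{BASE}/d/{did}"


def by_hash(store_hash):
    return f"{BASE}/h/{store_hash}"


def closest_to(timestamp, uri):
    # Assumed format: YYYY-MM-DDTHH:MM:SS, as in the example on this page.
    stamp = timestamp.isoformat(timespec="seconds")
    return f"{BASE}/t/{stamp}?{_uri_query(uri)}"


def latest(uri):
    return f"{BASE}/l?{_uri_query(uri)}"
```

For instance, `closest_to(datetime(2018, 3, 1, 20, 0, 0), "http://example.com/")` produces `http://localhost:3020/t/2018-03-01T20:00:00?uri=http://example.com/`.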

If the delivered content is an HTML document, all contained hyperlinks and resource references are adapted to point to the closest matching known content. If the target is not yet known to Garble, an absolute URI to the original location is inserted.

Example: You request “/t/2018-03-01T20:00:00?uri=http://example.com/”. The page originally contains a hyperlink to http://example.net/, which is known to Garble. Hence the link is replaced by “/t/2018-03-01T20:00:00?uri=http://example.net/”.

It might also contain a hyperlink to the relative path “/some/strange/things”, which is not tracked by Garble. Hence the link is replaced by “http://example.com/some/strange/things”.
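The two cases in this example can be expressed as a short Python sketch. It covers hyperlinks only, not the hash route used for style sheets and images. The function `rewrite_link` and the `known_uris` set are hypothetical stand-ins for a lookup that Garble presumably performs against its database.

```python
from urllib.parse import quote, urljoin


def rewrite_link(href, page_uri, timestamp, known_uris):
    """Return the replacement for one hyperlink on an archived page."""
    # Resolve relative paths against the page's original location.
    target = urljoin(page_uri, href)
    if target in known_uris:
        # Known target: stay inside the archive at the requested time.
        return f"/t/{timestamp}?uri={quote(target, safe=':/')}"
    # Unknown target: fall back to the absolute original URI.
    return target
```

With `known_uris = {"http://example.net/"}`, the page `http://example.com/` and the timestamp `2018-03-01T20:00:00`, the link `http://example.net/` becomes `/t/2018-03-01T20:00:00?uri=http://example.net/`, and `/some/strange/things` becomes `http://example.com/some/strange/things`.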

Older versions  Editor                Timestamp
Garble          m@doomanddarkness.eu  2018-04-20 22:07:46 UTC
Garble          m@doomanddarkness.eu  2018-04-20 21:57:42 UTC
Garble          m@doomanddarkness.eu  2018-04-20 21:49:04 UTC
Garble          m@doomanddarkness.eu  2018-04-20 21:47:46 UTC
Garble          m@doomanddarkness.eu  2018-04-20 21:47:06 UTC
Garble          m@doomanddarkness.eu  2018-04-20 21:46:09 UTC