static site checker

Introduction

The static site checker (ssc) is a command-line utility that checks pure HTML for the kinds of errors I tend to make when creating static web sites, including this one. It reports:

  • broken links
  • dubious syntax
  • malformed microformats (v2 only)
  • questionable values

It is not complete; it complements HTML Tidy. There are many more checks I really ought to add. It does not understand scripts, so it will get very confused by, for example, pages containing PHP.

It is still in development, and contains innumerable errors, so expect a lot of changes and even more fixes. Despite these issues, though, I already find it very useful, and, suspecting others will do too, I’m releasing it now.

Note

I wrote ssc partly because I needed something performant to check arts & ego, and partly because I wanted to explore recent changes to C++. In consequence, the source makes gratuitous use of recent C++ features, which rules out older compilers.

The program is based on swlc.

Usage

ssc [switch...] file/dir...

Console Options:
  -h [ --help ]                         output this information and exit.
  -V [ --version ]                      display version & copyright info, then exit.

  -f [ --config ] arg (=.ssc/config)    load configuration from this file.

Standard Options:
  -c [ --general.file ] arg (=ssc.db)   file for persistent data, requires -N (note general.datapath).
  -n [ --general.nochange ]             report what will do, but do not actually do it.
  -p [ --general.datapath ] arg (=.ssc) root directory for all ssc files.
  -v [ --general.output ] arg (=3)      output extra information.
  -e [ --link.external ] arg            check external links (requires curl, sets link.check).
  -3 [ --link.301 ] arg                 report http forwarding errors, e.g. 301 and 308 (sets link.external).
  -l [ --link.check ] arg (=1)          check links.
  -o [ --link.once ]                    report each broken external link once (sets link.external).
  -r [ --link.no-revoke ]               do not check whether https certificates have been revoked (sets link.external).
  --microformat.export arg              export microformat data (only verified data if microformat.verify is set).
  --microformat.verify arg              check microformats (see https://microformats.org/).
  -m [ --schema.microdata ]             check microdata itemtypes.
  -x [ --site.extension ] arg           check files with this extension (default html). May be repeated.
  -i [ --site.index ] arg               index file in directories (default none).
  -s [ --site.domain ] arg              domain name(s) for local site (default none). May be repeated.
  -L [ --site.virtual ] arg             define virtual directory, arg syntax directory=path. May be repeated.
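
As a rough illustration of the synopsis above, the following sketch assembles a dry run over a single directory. It uses only switches listed in the help text; the site directory ./public, the index name index.html, and the domain example.org are placeholders rather than anything ssc requires, and the checker is only invoked if it is found on the PATH.

```shell
#!/bin/sh
# Hypothetical site root; replace with the directory holding your HTML files.
SITE_DIR="./public"

# -n: report only, change nothing
# -x: extensions to check (repeated)
# -i: index file used in directories
# -s: domain name of the local site (placeholder value)
SSC_ARGS="-n -x html -x shtml -i index.html -s example.org"

# Show the command that would be run.
echo "ssc $SSC_ARGS $SITE_DIR"

# Run it only if ssc is actually installed.
if command -v ssc >/dev/null 2>&1; then
    # Word splitting of $SSC_ARGS is intentional here.
    ssc $SSC_ARGS "$SITE_DIR"
fi
```

Dropping -n should, going by the help text, let ssc update its persistent data as well as report.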

Configuration File

If a configuration file is specified, it should be in INI file format, with these optional sections: general, link, microformat, and site. Each section contains individual assignments using the identifiers (after the dot) noted above. Such a file might contain:

[general]
output=2

[link]
check=1

[site]
extension=shtml
extension=html
index=index.shtml
domain=example.org
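
One possible way to put such a file to work is sketched below: it writes the example settings to ./.ssc/config and passes that path explicitly with -f. The paths ./.ssc/config and ./public are assumptions chosen for illustration (the first mirrors the default shown in the help text), and ssc is only run if it is found on the PATH.

```shell
#!/bin/sh
# Hypothetical locations; adjust both to suit your site.
SITE_DIR="./public"
CONFIG_FILE="./.ssc/config"

mkdir -p "$(dirname "$CONFIG_FILE")"

# Same settings as the example configuration above.
cat > "$CONFIG_FILE" <<'EOF'
[general]
output=2

[link]
check=1

[site]
extension=shtml
extension=html
index=index.shtml
domain=example.org
EOF

# -f names the configuration file to load.
if command -v ssc >/dev/null 2>&1; then
    ssc -f "$CONFIG_FILE" "$SITE_DIR"
else
    echo "ssc not found on PATH; wrote $CONFIG_FILE only"
fi
```

Since each key is the part of a switch name after the dot, the same settings could presumably be given on the command line instead, for example -x shtml -x html for the two extension lines.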

Building

The project requires a C++17 compiler (I use VC++ 2019, gcc 8, & clang 9), plus recent copies of Boost (perhaps 1.72) and curl.

Grab & unpack the source code (zip). Under macOS or CentOS, copy the relevant centos*/macos* model to makefile, change any paths as necessary, then run make. For any other flavour of Unix, you may have to copy one of the model files and amend it. A future release of ssc may move to CMake.

Under Windows, if you have Visual Studio 2019, copy the model SLN file, amending any paths as appropriate. For earlier versions of Visual Studio, you will have to create a new project file. Your choice of versions is limited, given that ssc requires C++17.

Acknowledgements

The project includes code from Alexander Borisov’s projects modest 0.0.6 & MyHTML 4.0.5.