zograscope, 2017 - 2022
Clone recursively, there are submodules:
git clone --recursive https://github.com/xaizek/zograscope.git
"A zograscope is an optical device for enhancing the sense of depth perception from a flat picture." (wiki)
zograscope is built around syntax-aware diff and includes a number of additional tools.
The nature of syntax-aware diff requires knowledge of the structure of the code, which can be used to build other simple tools that can benefit from this information. Competing with real language front-ends in the level of accuracy is not possible, but making some things that are one step further than regular text-processing utilities seems feasible, and the result might be even more practical than some of the more elaborate tools which end up requiring a complicated setup process.
The project is a work in progress, but is useful in its current state.
The code isn't perfect and isn't extensively documented, as the initial version was more of an experiment, but the situation is improving.
Language | Status |
---|---|
C | C11 and earlier with common extensions, but without K&R syntax |
C++ | C++14 and earlier with common extensions |
GNU Make | Most of the syntax |
Lua | Version 5.3 |
The exact grammar is that of C11 with extensions implemented in popular compilers and additional extensions needed to allow processing of code with macros.
Note the following:
* old K&R style of function declarations isn't parsed (there might be a workaround for it, but this syntax is deprecated either way)
* preprocessor directives aren't tokenized according to language rules yet, but their contents are diffed
* extensive use of macros in unusual places might not be parsed (this probably won't change)
Other than that, code in C89, C99, C11 and GNU versions of the C language should be recognized.
C++ support relies on an external application called srcml and requires it to be installed in binary form (it is not necessary during the build).
The standard version that srcml reports supporting is C++14, so all earlier ones should work too. Although its parser doesn't handle all language constructs equally well, it seems to be good enough, especially for a ready-to-use parser that wasn't that hard to integrate.
Note the following:
* the tuning of comparison is in progress and will be refined over time
It's hard to measure the level of support in the case of GNU Make, because there seems to be no reference for the syntax itself apart from the documentation, which is not concise.
Yet the parser is capable of processing quite complicated Makefiles (like the one used in this project or those generated by automake) which contain many features most people don't know exist. Support is definitely not 100%, but 90% or even more of all existing Makefiles out there should be parsed.
Note the following:
* the comparison might not produce the best results on Makefiles as it needs some tuning; this should happen over time (Makefiles aren't changed that often)
Newly added (March 2021) with very little testing so far. However, the language is small and simple enough not to pose many difficulties.
Note the following:
* non-5.3 versions might still work, but can produce worse results
More languages should be added in the future, maybe with external parsers that are capable of preserving all information about the source code.
Configuration is done per directory tree ("root"), which is the closest parent (or the current directory) that contains a `.zs/` directory. The `.zs/` directory contains files which define how the contents of the root are to be processed.
Settings from multiple nested roots are not combined.
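The root lookup described above can be sketched in a few lines of Python. This is only an illustration of the "closest parent wins" rule, not the project's actual code; `find_root` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def find_root(start: Path) -> Optional[Path]:
    """Return the closest directory at or above `start` that contains `.zs/`."""
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".zs").is_dir():
            # The nearest match wins; settings of outer roots are not consulted.
            return candidate
    # No root found: presumably only default settings apply.
    return None
```

With nested roots, a call from inside the inner tree returns the inner root only, which mirrors the statement that settings from nested roots are not combined.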
`.zs/exclude` file
A `.gitignore`-like (or `.git/info/exclude`-like) file that lists paths relative to the root. The purpose is to exclude uninteresting files (automatically generated, third-party or otherwise). `.zs/exclude` is used by tools that search for files automatically and doesn't prevent the use of the same files when they are specified explicitly.
The following kinds of entries are recognized:
* empty lines, which are ignored
* lines that start with `#` (comments), which are ignored
* lines that end with `/` match only directories; the `/` is stripped and line processing continues
* lines that don't contain `/` are treated as shell-like globs against filename which apply at any directory level and define paths whose processing should be skipped
* lines that start with `!` define exceptions from rules that precede them; you can't undo exclusion of files in excluded directories; for the purpose of this discussion the `!` is stripped and line processing continues
* lines that start with `/` always match paths instead of filename and provide a way to specify files to be ignored only in the root; otherwise they are processed as specified in the next item
* all other lines are treated as shell-like globs against paths (a leading `/` is allowed, but has no effect other than changing the type of a match) which define paths whose processing should be skipped
There is no way to escape a leading `#` or `!`, or a newline, at the moment.
Globs support the following: `[{char-class}]`, `[!{char-class}]`, `[^{char-class}]`, `?` (doesn't match `/`), `*` (matches any number of characters, including zero, except for `/`) and `\{char}` (matches literal `{char}`).
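To make the glob constructs concrete, here is a minimal Python sketch that translates such a glob into a regular expression. It is an illustration under assumptions, not zograscope's matcher: `glob_match` is a hypothetical name, an unterminated `[` is treated as a literal, and whether a character class may match `/` is not specified by the text above (here it may).

```python
import re


def glob_match(glob: str, text: str) -> bool:
    """Match `text` against a glob using only the constructs listed above."""
    parts = []
    i = 0
    while i < len(glob):
        ch = glob[i]
        if ch == "\\" and i + 1 < len(glob):
            # \{char} matches {char} literally.
            parts.append(re.escape(glob[i + 1]))
            i += 2
            continue
        if ch == "[":
            j = i + 1
            negate = j < len(glob) and glob[j] in "!^"
            if negate:
                j += 1
            # A "]" right after the opening is taken as a class member.
            end = glob.find("]", j + 1)
            if end != -1:
                # Escape characters that are special inside a regex class.
                body = re.sub(r"([\\\[\]^])", r"\\\1", glob[j:end])
                parts.append("[" + ("^" if negate else "") + body + "]")
                i = end + 1
                continue
        if ch == "*":
            parts.append("[^/]*")   # any run of characters except "/"
        elif ch == "?":
            parts.append("[^/]")    # exactly one character except "/"
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.fullmatch("".join(parts), text) is not None
```

For example, `glob_match("moc_*.cpp", "moc_window.cpp")` holds, while `glob_match("*.c", "dir/file.c")` does not, because `*` stops at `/`.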
Example:
# .zs/exclude
# automatically generated sources
src/c/c11-lexer.gen.cpp
src/c/c11-parser.gen.cpp
src/make/*.gen.*
# Qt-produced sources
ui_*.h
moc_*.cpp
moc_*.h
# file in root
/config.h
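The example entries can be run through a small classifier to see which kind of rule each one is. The sketch below encodes one reading of the rules (a leading `!` marks an exception, a trailing `/` restricts the entry to directories, an entry without `/` is matched against filenames at any level, anything else is matched against paths from the root). `Entry` and `parse_exclude_line` are hypothetical names, and trimming surrounding whitespace is an assumption.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    pattern: str      # glob with the marker characters stripped
    exception: bool   # line started with "!"
    dirs_only: bool   # line ended with "/"
    match_path: bool  # True: glob against path from root; False: against filename


def parse_exclude_line(line: str) -> Optional[Entry]:
    """Classify one line of an exclude file; None for ignored lines."""
    text = line.strip()  # assumption: surrounding whitespace is not significant
    if not text or text.startswith("#"):
        return None
    exception = text.startswith("!")
    if exception:
        text = text[1:]
    dirs_only = text.endswith("/")
    if dirs_only:
        text = text[:-1]
    # Any remaining "/" (including a leading one) switches to path matching.
    match_path = "/" in text
    if text.startswith("/"):
        text = text[1:]
    if not text:
        return None
    return Entry(text, exception, dirs_only, match_path)
```

Under this reading, `ui_*.h` is a filename glob that applies at any level, while `/config.h` is a path glob that only matches `config.h` directly in the root.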
`.zs/attributes` file
Borrowing from the git project here again. This file consists of lines matching paths to attributes. Lines are trimmed before being processed. Empty lines and comments work like in the `.zs/exclude` file; all other lines follow this pattern:
exclude-expr [attr1=value1 [attr2=value2 [...]]]
Expressions that define exceptions (start with `!`) are recognized but ignored to keep syntax consistent between different files, which basically makes them another type of comment.
Each line of the file is visited in top-down order and attributes from every matching entry are merged with the current state. Hierarchy of configuration values:
1. Default values (lowest priority)
2. Attributes
3. Command-line parameters (highest priority)
Supported attributes:
* `lang`
  Default: ""
  Those accepted by the `--lang` command-line option: c, cxx, make, lua
* `tab-size`
  Default: 4
  Value should be an integer that's greater than zero
Unknown attributes are ignored.
Example:
# .zs/attributes
*.c tab-size=8
*.h tab-size=8 lang=c
tab-2.[ch] tab-size=2
# any.c has tab-size=8
# tab-2.c has tab-size=2
# tab-2.h has tab-size=2 lang=c
# any.h has tab-size=8 lang=c
# any.cpp has tab-size=4
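The merge order in this example can be reproduced with a short Python sketch. It is a simplified model rather than the tool's implementation: `resolve_attributes` is a hypothetical name, values are kept as strings, only filename patterns are handled, and Python's `fnmatch` stands in for the project's own glob matcher.

```python
from fnmatch import fnmatchcase
from typing import Dict, Optional

# Lowest-priority layer: the documented defaults.
DEFAULTS = {"lang": "", "tab-size": "4"}

EXAMPLE = """\
*.c tab-size=8
*.h tab-size=8 lang=c
tab-2.[ch] tab-size=2
"""


def resolve_attributes(attributes_text: str, filename: str,
                       cli: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Merge defaults, matching attribute lines (top down) and CLI overrides."""
    state = dict(DEFAULTS)
    for raw in attributes_text.splitlines():
        line = raw.strip()  # lines are trimmed before being processed
        if not line or line[0] in "#!":
            continue  # empty lines, comments and ignored "!" exceptions
        pattern, *pairs = line.split()
        if fnmatchcase(filename, pattern):
            for pair in pairs:
                key, sep, value = pair.partition("=")
                if sep and key in DEFAULTS:  # unknown attributes are ignored
                    state[key] = value       # later matches override earlier ones
    state.update(cli or {})  # command-line parameters have the highest priority
    return state
```

Calling `resolve_attributes(EXAMPLE, "tab-2.h")` gives `lang` of `c` and `tab-size` of `2`, because the file matches both the second and the third line and the later line wins for `tab-size`.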
A terminal-based syntax-aware diff.
Grep-like tool that finds elements of source code structure.
A Qt5 GUI version of syntax-aware diff.
Simple syntax highlighter for xterm-256color palette.
Counter of lines of code.
TUI interface with underdefined scope of functionality.
# if Qt5 is available (use `qmake` if it corresponds to Qt5 on your machine)
echo 'QT5_PROG := qmake-qt5' >> config.mk
# if libgit2 is present
echo 'HAVE_LIBGIT2 := yes' >> config.mk
# if cursesw is present
echo 'HAVE_CURSESW := yes' >> config.mk
make release check
This will build the release version and run tests. The executables will be named `release/zs-*`.
There is no data to install, so just making the executables available in the `$PATH` will work.
However, it's possible to install conventionally (`/usr` prefix by default):
make install
`DESTDIR` and `PREFIX` can be set to modify the destination location. When invoking `make uninstall`, the same values of these variables should be specified.
* (for the `gdiff` tool) qt5
* (for the `gdiff` tool) libgit2
* (for the `tui` tool) curses with support of wide characters
If you are using Debian or one of its derivatives, you can install the dependencies as follows:
# install make and build tools
sudo apt install -y build-essential
# installing dependencies
sudo apt install -y libboost-filesystem-dev libboost-iostreams-dev
sudo apt install -y libboost-program-options-dev libboost-system-dev
sudo apt install -y libarchive13
sudo apt install -y bison flex
# installing srcml
wget http://131.123.42.38/lmcrs/beta/srcML-Ubuntu18.04.deb
sudo apt install ./srcML-Ubuntu18.04.deb
You can also check out the CI build script in case dependencies change in the future.
At the moment there is only the `--help` option.
Version 3 of the GNU Affero General Public License.
dtl library is used for finding edit distance.
pmr implementation from C++17 with a small addition is employed for custom allocators.
TinyXML2 is used for parsing XML.
tree-sitter is used for parsing of some languages.
Catch2 is used for tests.
Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction. Beat Fluri, Michael Würsch, Martin Pinzger, and Harald C. Gall. 2007.
Change Detection in Hierarchically Structured Information. Sudarshan Chawathe, Hector Garcia-Molina and Jennifer Widom. 1996.
Simple fast algorithms for the editing distance between trees and related problems. Kaizhong Zhang and Dennis Shasha. 1989.