Performance Evaluation

| query                                          | timing (s) |
| ---------------------------------------------- | ---------- |
| Call(Name("len"))                              | 0.025985   |
| BinOp(op=Add() \| Sub())                       | 0.030508   |
| Try(handlers=LEN(min=3, max=5))                | 0.033486   |
| BinOp(left=Constant(), right=Constant())       | 0.146516   |
| FunctionDef(f"run_%", returns=not None)        | 0.0216     |
| ClassDef(body=[Assign(), *..., FunctionDef()]) | 0.28737    |

Analysis

There are two major steps that account for nearly 95% of the whole query operation. The first, and the most obvious one, is actually running the query in the database. There are a few things Reiz can do to optimize this step, such as generating the best possible query while still processing the ReizQL match in a single linear pass (which is needed to support constructs like reference variables). The code generator (reiz.reizql.compiler) went through a couple of major refactors for performance reasons (e.g. #12). There is also a simple, naive AST optimization pass over the IR (EdgeQL) itself.
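To make the last point concrete, here is a minimal sketch of what a naive IR-level optimization pass could look like. This is not Reiz's actual IR or compiler code; the classes and function below are hypothetical stand-ins that only illustrate the general idea of simplifying the generated query (here, flattening nested conjunctions) before emitting the final EdgeQL.

```python
# Hypothetical toy IR, not reiz.reizql.compiler's real data model.
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Filter:
    expression: str  # e.g. ".op = 'Add'"


@dataclass
class And:
    operands: List[Union["And", Filter]]


def flatten_conjunctions(node: Union[And, Filter]) -> Union[And, Filter]:
    """Collapse And(And(a, b), c) into And(a, b, c) before code emission."""
    if isinstance(node, Filter):
        return node

    flattened: List[Union[And, Filter]] = []
    for operand in node.operands:
        operand = flatten_conjunctions(operand)
        if isinstance(operand, And):
            # Merge nested conjunctions into the parent one.
            flattened.extend(operand.operands)
        else:
            flattened.append(operand)
    return And(flattened)
```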

The second step is retrieving the code snippets from disk. We already store a lot of metadata (such as start/end positions and the GitHub project), but the actual source is still kept on disk. So after the query returns the file names, we simply read those files and extract the related segments, as sketched below. This area is open to further optimization (we could statically determine the byte range and fetch only that, or parallelize the reads for multiple matches [the default result set comes with 10 matches], …), though these won't have nearly the same impact as making the database query itself faster.
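The sketch below illustrates this post-query step under the assumptions described above: each match carries a file name plus start/end line metadata, the file is read, and the matched lines are sliced out. The `Match`, `read_segment`, and `fetch_segments` names are hypothetical, not Reiz's actual API; the thread pool shows one way the per-match reads could be parallelized.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, NamedTuple


class Match(NamedTuple):
    filename: Path
    lineno: int      # 1-based start line from the stored metadata
    end_lineno: int  # inclusive end line


def read_segment(match: Match) -> str:
    """Read the whole file and slice out the matched lines."""
    lines = match.filename.read_text().splitlines()
    return "\n".join(lines[match.lineno - 1 : match.end_lineno])


def fetch_segments(matches: List[Match]) -> List[str]:
    """Fetch segments for all matches concurrently (purely I/O-bound work)."""
    with ThreadPoolExecutor() as executor:
        return list(executor.map(read_segment, matches))
```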

Of course, alongside these, there are plenty of ways to tune PostgreSQL itself for different workloads, though that falls outside the scope of the Reiz project.

Setup

Machine:

- provider: DigitalOcean
- service type: droplet (basic plan)
- cpu: 2 vCPU (shared)
- ram: 2 GB
- disk: regular SSD (not NVMe)

IndexDB:

- total files: 53k
- total AST nodes: 17,521,894

The benchmark script is available in the source checkout (scripts/benchmark_doc.py).
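For reference, the timings in the table above can be reproduced with a harness along these lines. This is only a sketch: `run_query` is a hypothetical stand-in for whatever executes a ReizQL query against the index; see scripts/benchmark_doc.py for the actual script.

```python
import time


def time_query(run_query, query: str, repeat: int = 5) -> float:
    """Return the best wall-clock time (in seconds) over `repeat` runs."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        run_query(query)  # hypothetical query executor
        timings.append(time.perf_counter() - start)
    return min(timings)
```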