Generate a database

Anubis helps you generate a database. Use the CLI to get started:

> anubis timemachine --new

This creates an input file ./anubis_time_machine.yml in your current folder. Edit this input file to match your needs:

# Anubis time machine input file

git_path : /Users/dauptain/GITLAB/avbp
branch : dev
rel_source_path : ./SOURCES
year_start : 2017
year_end : 2022
out_dir : ANUBIS_AVBP
lizard_switch : True

git_path is a local path to your .git repository.
branch is the name of the branch you want to scan. It is often master.
rel_source_path is a subfolder in your repository. It limits the cloc and lizard investigation, and usually points to the core of the source code (avoiding tests, documentation, templates, etc.).
year_start and year_end define the time frame. These are integers.
out_dir is where you want to save your Anubis database.
lizard_switch lets you deactivate the lizard analysis, which can be quite slow.
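As a minimal sketch, the fields above can be sanity-checked before launching a long run. This assumes the input file has already been loaded into a Python dict (for instance with a YAML parser); check_config is a hypothetical helper and the checks are illustrative, not Anubis' own validation.

```python
# Keys that appear in the sample input file. The checks below are
# illustrative assumptions, not part of Anubis itself.
EXPECTED_KEYS = ("git_path", "branch", "rel_source_path",
                 "year_start", "year_end", "out_dir", "lizard_switch")


def check_config(config):
    """Return a list of problems found in the config (empty if none)."""
    problems = []
    for key in EXPECTED_KEYS:
        if key not in config:
            problems.append(f"missing key: {key}")
    for key in ("year_start", "year_end"):
        if key in config and not isinstance(config[key], int):
            problems.append(f"{key} must be an integer")
    # Only compare the years once everything else is known to be sane.
    if not problems and config["year_start"] > config["year_end"]:
        problems.append("year_start is after year_end")
    return problems


# Values copied from the sample input file.
config = {
    "git_path": "/Users/dauptain/GITLAB/avbp",
    "branch": "dev",
    "rel_source_path": "./SOURCES",
    "year_start": 2017,
    "year_end": 2022,
    "out_dir": "ANUBIS_AVBP",
    "lizard_switch": True,
}
problems = check_config(config)
print(problems)
```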

Then start the database generation with:

> anubis timemachine --file ./anubis_time_machine.yml

The execution prints output like this:

(...)
ETA : 0:00:01.121897 sec
timewarping (55/72)...  date is now 2021-7-01 12:00
... find last revision
... checkout revision eccfd209bbf9eb90f91645ff8ace621b5441f41e
... running cloc
... gather commit stats
... get git blame info
... branch status
... running lizard

ETA : 0:01:09.197262 sec
timewarping (56/72)...  date is now 2021-8-01 12:00
... find last revision
... checkout revision 89534c42e102fd5db217cf290b564313187281d4
... running cloc
... gather commit stats
... get git blame info
... branch status
... running lizard

ETA : 0:18:13.801281 sec
timewarping (57/72)...  date is now 2021-9-01 12:00
... find last revision
... checkout revision 02da5a82c8f765982e05153d477e727552a5ce30
... running cloc
... gather commit stats
... get git blame info
... branch status
... running lizard
(...)

With this output, you know which date the git repository is currently set to (here September 2021), and you get a rough estimate of the remaining time, or ETA (here about 18 minutes).
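The step counter in the log (55/72, 56/72, ...) is consistent with one snapshot per month from January of year_start to December of year_end. The sketch below reproduces that count under this assumption; monthly_steps is a hypothetical helper, not an Anubis function.

```python
def monthly_steps(year_start, year_end):
    """List (year, month) pairs, one per month, both years included.

    Assumption: the time machine takes one snapshot per month over the
    whole time frame, as the step counter in the log suggests.
    """
    return [(year, month)
            for year in range(year_start, year_end + 1)
            for month in range(1, 13)]


steps = monthly_steps(2017, 2022)
print(len(steps))   # 72 steps for 6 years
print(steps[54])    # 55th step: (2021, 7), i.e. July 2021
```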

Database content

The database looks like:

ANUBIS_Nek5000
├── anubis_2017-01
│   ├── blame.json
│   ├── branch_status.json
│   ├── cloc.json
│   ├── commits.json
│   └── lizard.csv
├── anubis_2017-02
│   ├── cloc.json
│   ├── blame.json
│   ├── branch_status.json
│   ├── commits.json
│   └── lizard.csv
├── anubis_2017-03
│   ├── cloc.json
│   ├── blame.json
│   ├── branch_status.json
│   ├── commits.json
│   └── lizard.csv
(...)

Each month gets its own subfolder, containing several data files.
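A minimal sketch for walking such a database in chronological order is given here. It assumes the subfolders follow the anubis_YYYY-MM naming shown in the tree; list_snapshots is a hypothetical helper, and the demo runs on a throwaway folder rather than a real database.

```python
import re
import tempfile
from pathlib import Path

# Folder naming taken from the tree above: anubis_<year>-<month>.
SNAPSHOT_RE = re.compile(r"^anubis_(\d{4})-(\d{2})$")


def list_snapshots(db_dir):
    """Return (year, month, path) tuples for each monthly subfolder, oldest first."""
    found = []
    for child in Path(db_dir).iterdir():
        match = SNAPSHOT_RE.match(child.name)
        if match and child.is_dir():
            found.append((int(match.group(1)), int(match.group(2)), child))
    return sorted(found)


# Demo on a temporary folder that mimics the layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("anubis_2017-03", "anubis_2017-01", "anubis_2017-02"):
        (Path(tmp) / name).mkdir()
    (Path(tmp) / "notes.txt").write_text("not a snapshot")
    names = [path.name for _, _, path in list_snapshots(tmp)]

print(names)
```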

Blame

The blame.json gives the following information for each non-blank line of code of each file:

the author
the date of last modification
the indentation level (i.e. the number of blanks before the first non-blank character)
the line number

[
    {
        "file": "src/hello_world.py",
        "author": [
            "Aurélien",
            "Aurélien",
            "Aurélien",
            "Aurélien"
        ],
        "date": [
            "2021-05-11",
            "2021-05-11",
            "2021-05-11",
            "2021-05-11"
        ],
        "indentation": [
            0,
            4,
            0,
            4
        ],
        "line_number": [
            1,
            2,
            3,
            4
        ]
    }
]
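Since each file record stores parallel lists, a per-author line count only needs the "author" list. The sketch below works on a record shaped like the sample; lines_per_author is a hypothetical helper, and a real blame.json would be read with json.load first.

```python
from collections import Counter


def lines_per_author(blame):
    """Count non-blank lines attributed to each author across all file records."""
    counts = Counter()
    for record in blame:
        counts.update(record["author"])
    return counts


# Record shaped like the blame.json sample.
blame = [
    {
        "file": "src/hello_world.py",
        "author": ["Aurélien", "Aurélien", "Aurélien", "Aurélien"],
        "date": ["2021-05-11", "2021-05-11", "2021-05-11", "2021-05-11"],
        "indentation": [0, 4, 0, 4],
        "line_number": [1, 2, 3, 4],
    }
]
counts = lines_per_author(blame)
print(counts.most_common())
```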

Branch status

The branch_status.json gives the number of commits behind and ahead for each branch, compared to the branch specified in the timemachine input .yml file, called the reference branch. It also stores the number of commits on the reference branch.

[
    {
        "branch": "remotes/origin/RELEASE/7.0",
        "behind": 906,
        "ahead": 343,
        "nb_commits_ref_branch": 5219
    },
    {
        "branch": "remotes/origin/RELEASE/7.0.1",
        "behind": 577,
        "ahead": 435,
        "nb_commits_ref_branch": 5219
    },
    {
        "branch": "remotes/origin/RELEASE/7.0.1-SEP16",
        "behind": 577,
        "ahead": 393,
        "nb_commits_ref_branch": 5219
    },
    {
        "branch": "remotes/origin/WIP/EM2C-BiPeriodic_Channel_TBLE",
        "behind": 577,
        "ahead": 340,
        "nb_commits_ref_branch": 5219
    },
    {
        "branch": "remotes/origin/WIP/volumic_temporals",
        "behind": 577,
        "ahead": 321,
        "nb_commits_ref_branch": 5219
    }
]
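One possible use of these records is ranking branches by how far they lag behind the reference branch. The sketch below uses two entries from the sample; most_behind is a hypothetical helper, not part of Anubis.

```python
def most_behind(branch_status, top=3):
    """Return the `top` branches with the largest number of commits behind."""
    ranked = sorted(branch_status, key=lambda entry: entry["behind"], reverse=True)
    return ranked[:top]


# Two entries copied from the branch_status.json sample.
branch_status = [
    {"branch": "remotes/origin/RELEASE/7.0.1",
     "behind": 577, "ahead": 435, "nb_commits_ref_branch": 5219},
    {"branch": "remotes/origin/RELEASE/7.0",
     "behind": 906, "ahead": 343, "nb_commits_ref_branch": 5219},
]
ranked = most_behind(branch_status)
for entry in ranked:
    print(entry["branch"], entry["behind"], entry["ahead"])
```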

CLOC, or Count Lines of Code

The cloc.json gives the code sizes:

{
    "header": {
        "cloc_url": "github.com/AlDanial/cloc",
        "cloc_version": "1.92",
        "elapsed_seconds": 2.983234167099,
        "n_files": 1195,
        "n_lines": 519864,
        "files_per_second": 400.571974261766,
        "lines_per_second": 174261.881864116
    },
    "Fortran 90": {
        "nFiles": 1166,
        "blank": 83203,
        "comment": 82278,
        "code": 347867
    },
    "C": {
        "nFiles": 16,
        "blank": 1174,
        "comment": 722,
        "code": 2806
    },
    "Python": {
        "nFiles": 1,
        "blank": 163,
        "comment": 67,
        "code": 496
    },
    "make": {
        "nFiles": 1,
        "blank": 49,
        "comment": 71,
        "code": 266
    },
    "C/C++ Header": {
        "nFiles": 9,
        "blank": 152,
        "comment": 155,
        "code": 167
    },
    "Bourne Shell": {
        "nFiles": 1,
        "blank": 14,
        "comment": 18,
        "code": 162
    },
    "CMake": {
        "nFiles": 1,
        "blank": 7,
        "comment": 7,
        "code": 20
    },
    "SUM": {
        "blank": 84762,
        "comment": 83318,
        "code": 351784,
        "nFiles": 1195
    }
}
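As an illustration, the per-language entries can be turned into a comment ratio, skipping the "header" and "SUM" keys, which are not languages. The sample below is truncated to two languages from the listing; comment_ratio is a hypothetical helper.

```python
def comment_ratio(cloc):
    """Return comment / (comment + code) for each language entry."""
    ratios = {}
    for language, stats in cloc.items():
        if language in ("header", "SUM"):
            continue  # bookkeeping entries, not languages
        total = stats["comment"] + stats["code"]
        ratios[language] = stats["comment"] / total if total else 0.0
    return ratios


# Truncated copy of the cloc.json sample (two languages only).
cloc = {
    "header": {"cloc_version": "1.92"},
    "Fortran 90": {"nFiles": 1166, "blank": 83203, "comment": 82278, "code": 347867},
    "C": {"nFiles": 16, "blank": 1174, "comment": 722, "code": 2806},
}
ratios = comment_ratio(cloc)
for language, ratio in ratios.items():
    print(f"{language}: {ratio:.1%} comments")
```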

Commits

The commits.json gives stats about commits:

[
    {
        "author": "Aurelien PERROT <perrot@cerfacs.fr>",
        "date": "Sat Jan 29 21:59:55 2022 +0100",
        "files": 1,
        "insertions": 1,
        "deletions": 0,
        "revision": "872de25e0372d907bbe366e09a9049c05aa609c2"
    },
    {
        "author": "Victor Xing <xing@cerfacs.fr>",
        "date": "Tue Jan 11 16:37:28 2022 +0100",
        "files": 1,
        "insertions": 3,
        "deletions": 0,
        "revision": "1676e6231f444e6ba924bed689312e019d466f92"
    },
    {
        "author": "Gabriel Staffelbach <gabriel.staffelbach@cerfacs.fr>",
        "date": "Tue Jan 25 14:21:35 2022 +0100",
        "files": 2,
        "insertions": 0,
        "deletions": 9,
        "revision": "0cbe1e4464204f48bbef912428396a00089b6a82"
    }
]
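The "date" strings follow git's default date layout, so they can be parsed with datetime.strptime. The sketch below finds the latest commit and sums insertions and deletions per author; it assumes an English locale for the day and month names, and both helper names are hypothetical.

```python
from datetime import datetime

# Layout of the "date" field, e.g. "Sat Jan 29 21:59:55 2022 +0100".
DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"


def parse_date(commit):
    """Parse the commit date into a timezone-aware datetime."""
    return datetime.strptime(commit["date"], DATE_FORMAT)


def churn_per_author(commits):
    """Map each author to a (insertions, deletions) total."""
    churn = {}
    for commit in commits:
        ins, dels = churn.get(commit["author"], (0, 0))
        churn[commit["author"]] = (ins + commit["insertions"],
                                   dels + commit["deletions"])
    return churn


# Entries copied from the commits.json sample.
commits = [
    {"author": "Aurelien PERROT <perrot@cerfacs.fr>",
     "date": "Sat Jan 29 21:59:55 2022 +0100",
     "files": 1, "insertions": 1, "deletions": 0,
     "revision": "872de25e0372d907bbe366e09a9049c05aa609c2"},
    {"author": "Victor Xing <xing@cerfacs.fr>",
     "date": "Tue Jan 11 16:37:28 2022 +0100",
     "files": 1, "insertions": 3, "deletions": 0,
     "revision": "1676e6231f444e6ba924bed689312e019d466f92"},
    {"author": "Gabriel Staffelbach <gabriel.staffelbach@cerfacs.fr>",
     "date": "Tue Jan 25 14:21:35 2022 +0100",
     "files": 2, "insertions": 0, "deletions": 9,
     "revision": "0cbe1e4464204f48bbef912428396a00089b6a82"},
]
latest = max(commits, key=parse_date)
churn = churn_per_author(commits)
print(latest["revision"])
print(churn)
```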

Complexity, with lizard

The code complexity is computed with Lizard, for each function-like unit, i.e.:

Function
Subroutine
Method
etc.

The file itself looks like:

NLOC,CCN,token,param,size,function@line@file,file,function,call,start,end
54,5,610,2,110,"add_vortex@36-145@./SOURCES/TOOLS/INIT/add_vortex.f90","./SOURCES/TOOLS/INIT/add_vortex.f90","add_vortex","add_vortex( grid , vortex )",36,145
183,33,1629,2,324,"gas_out_main@28-351@./SOURCES/TOOLS/INIT/gas_out_main.f90","./SOURCES/TOOLS/INIT/gas_out_main.f90","gas_out_main","gas_out_main( grid , gas_out )",28,351
(...)

The columns need some explanation:

NLOC is the number of lines of code, per function
CCN is the cyclomatic complexity number, as defined by McCabe
token is the number of tokens (let's say "words") in the function
param is the number of parameters detected
size is the number of lines spanned by the function, i.e. end - start + 1 (110 for lines 36 to 145 in the first row above)
function@line@file is the full location of the context
file is the file of the context
function is the name of the function
call is the function signature, i.e. its name followed by its argument list
start is the first line of the function in the file
end is the last line of the function in the file
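Because some fields are quoted and contain commas, lizard.csv is best read with a CSV parser rather than a plain split. The sketch below reads the two sample rows with the standard csv module and lists functions above a complexity threshold; the threshold of 10 and the helper name complex_functions are illustrative assumptions.

```python
import csv
import io

# Header and rows copied from the lizard.csv sample.
SAMPLE = '''NLOC,CCN,token,param,size,function@line@file,file,function,call,start,end
54,5,610,2,110,"add_vortex@36-145@./SOURCES/TOOLS/INIT/add_vortex.f90","./SOURCES/TOOLS/INIT/add_vortex.f90","add_vortex","add_vortex( grid , vortex )",36,145
183,33,1629,2,324,"gas_out_main@28-351@./SOURCES/TOOLS/INIT/gas_out_main.f90","./SOURCES/TOOLS/INIT/gas_out_main.f90","gas_out_main","gas_out_main( grid , gas_out )",28,351
'''


def complex_functions(csv_file, threshold=10):
    """Return (function, CCN) pairs whose cyclomatic complexity exceeds threshold."""
    reader = csv.DictReader(csv_file)
    return [(row["function"], int(row["CCN"]))
            for row in reader
            if int(row["CCN"]) > threshold]


# For a real database, pass open(".../lizard.csv", newline="") instead.
hot_spots = complex_functions(io.StringIO(SAMPLE))
print(hot_spots)
```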