Generate a database

Anubis helps you generate a database. Use the CLI to get started:

> anubis timemachine --new

This creates an input file ./anubis_time_machine.yml in your current folder. Edit this input file to match your needs:

# Anubis time machine input file

git_path : /Users/dauptain/GITLAB/avbp
branch : dev
rel_source_path : ./SOURCES
year_start : 2017
year_end : 2022
out_dir : ANUBIS_AVBP
lizard_switch : True

  • git_path is the local path to your git repository.

  • branch is the name of the branch you want to scan. It is often master.

  • rel_source_path is a subfolder in your repository. It limits the scope of the cloc and lizard analyses, and usually points to the core of the source code (avoiding tests, documentation, templates, etc.).

  • year_start and year_end define the time frame. These are integers.

  • out_dir is where you want to save your Anubis database.

  • lizard_switch is used to deactivate the lizard analysis, which can be quite slow.
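
As a rough illustration of the settings above, the sketch below checks a settings dictionary before launching a run. It is not part of Anubis: check_settings and the expected types are assumptions made here for illustration (only year_start and year_end are documented as integers), and the dictionary stands in for the content of the .yml file once loaded by a YAML parser.

```python
# Hypothetical sanity checks; Anubis may validate its input differently.
EXPECTED_TYPES = {
    "git_path": str,
    "branch": str,
    "rel_source_path": str,
    "year_start": int,
    "year_end": int,
    "out_dir": str,
    "lizard_switch": bool,
}


def check_settings(settings):
    """Return a list of human-readable problems found in the settings dict."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in settings:
            problems.append(f"missing key: {key}")
        elif not isinstance(settings[key], expected):
            problems.append(f"{key} should be of type {expected.__name__}")
    # Only compare the years once we know both are present and well typed.
    if not problems and settings["year_start"] > settings["year_end"]:
        problems.append("year_start is after year_end")
    return problems


# Values copied from the sample input file above.
settings = {
    "git_path": "/Users/dauptain/GITLAB/avbp",
    "branch": "dev",
    "rel_source_path": "./SOURCES",
    "year_start": 2017,
    "year_end": 2022,
    "out_dir": "ANUBIS_AVBP",
    "lizard_switch": True,
}
print(check_settings(settings))
```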

Then you can start the database generation with:

> anubis timemachine --file ./anubis_time_machine.yml

The execution shows output like this:

(...)
ETA : 0:00:01.121897 sec
timewarping (55/72)...  date is now 2021-7-01 12:00
... find last revision
... checkout revision eccfd209bbf9eb90f91645ff8ace621b5441f41e
... running cloc
... gather commit stats
... get git blame info
... branch status
... running lizard

ETA : 0:01:09.197262 sec
timewarping (56/72)...  date is now 2021-8-01 12:00
... find last revision
... checkout revision 89534c42e102fd5db217cf290b564313187281d4
... running cloc
... gather commit stats
... get git blame info
... branch status
... running lizard

ETA : 0:18:13.801281 sec
timewarping (57/72)...  date is now 2021-9-01 12:00
... find last revision
... checkout revision 02da5a82c8f765982e05153d477e727552a5ce30
... running cloc
... gather commit stats
... get git blame info
... branch status
... running lizard
(...)

From this output, you can tell which date the git repository is currently set to (here Sept. 2021), and you get a rough estimate of the remaining time, or ETA (here 18 minutes).
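
The step counter in this log can be reproduced with a short sketch. It assumes one step per month, from January of year_start to December of year_end, which is consistent with the 72 steps and with step 55 falling on 2021-7-01 for the 2017 to 2022 time frame. The function monthly_steps is a hypothetical helper, not an Anubis function.

```python
from datetime import datetime


def monthly_steps(year_start, year_end):
    """List one date per month, from January of year_start to December of year_end."""
    return [
        datetime(year, month, 1, 12, 0)
        for year in range(year_start, year_end + 1)
        for month in range(1, 13)
    ]


steps = monthly_steps(2017, 2022)
print(len(steps))   # 72 steps in total
print(steps[54])    # step 55 of 72 -> 2021-07-01 12:00:00
```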

Database content

The database looks like:

ANUBIS_Nek5000
├── anubis_2017-01
│   ├── blame.json
│   ├── branch_status.json
│   ├── cloc.json
│   ├── commits.json
│   └── lizard.csv
├── anubis_2017-02
│   ├── cloc.json
│   ├── blame.json
│   ├── branch_status.json
│   ├── commits.json
│   └── lizard.csv
├── anubis_2017-03
│   ├── cloc.json
│   ├── blame.json
│   ├── branch_status.json
│   ├── commits.json
│   └── lizard.csv
(...)

Each month gets its own subfolder, which holds several databases.
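
A minimal sketch for walking through this layout, assuming the anubis_YYYY-MM folder naming shown in the tree. The helper list_snapshots is hypothetical, and the folder name ANUBIS_Nek5000 is simply the one from the example.

```python
from pathlib import Path


def list_snapshots(out_dir):
    """Return the monthly subfolders of an Anubis database, oldest first."""
    # Folder names follow the anubis_YYYY-MM pattern, so sorting by name
    # is also sorting by date.
    return sorted(p for p in Path(out_dir).glob("anubis_*") if p.is_dir())


for folder in list_snapshots("ANUBIS_Nek5000"):
    files = sorted(f.name for f in folder.iterdir())
    print(folder.name, files)
```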

Blame

The blame.json gives the following information for each non-blank line of code of each file:

  • the author

  • the date of last modification

  • the indentation level (i.e. the number of blanks before the first non-blank character)

  • the line number

[
    {
        "file": "src/hello_world.py",
        "author": [
            "Aurélien",
            "Aurélien",
            "Aurélien",
            "Aurélien"
        ],
        "date": [
            "2021-05-11",
            "2021-05-11",
            "2021-05-11",
            "2021-05-11"
        ],
        "indentation": [
            0,
            4,
            0,
            4
        ],
        "line_number": [
            1,
            2,
            3,
            4
        ]
    }
]
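
One possible use of this file is to count how many lines each author currently owns. The sketch below assumes the parallel-list layout of the sample (one entry per file, one list item per non-blank line). The helper lines_per_author is hypothetical, and the data is inlined to keep the example runnable.

```python
import json
from collections import Counter


def lines_per_author(blame):
    """Count the non-blank lines attributed to each author across all files."""
    counts = Counter()
    for entry in blame:
        counts.update(entry["author"])
    return counts


# Same layout as the blame.json sample, inlined instead of read from disk.
blame = json.loads("""
[
    {
        "file": "src/hello_world.py",
        "author": ["Aurélien", "Aurélien", "Aurélien", "Aurélien"],
        "date": ["2021-05-11", "2021-05-11", "2021-05-11", "2021-05-11"],
        "indentation": [0, 4, 0, 4],
        "line_number": [1, 2, 3, 4]
    }
]
""")
print(lines_per_author(blame))
```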

Branch status

The branch_status.json gives, for each branch, the number of commits behind and ahead of the reference branch, i.e. the branch specified in the timemachine input .yml file. It also stores the number of commits on the reference branch.

[
    {
        "branch": "remotes/origin/RELEASE/7.0",
        "behind": 906,
        "ahead": 343,
        "nb_commits_ref_branch": 5219
    },
    {
        "branch": "remotes/origin/RELEASE/7.0.1",
        "behind": 577,
        "ahead": 435,
        "nb_commits_ref_branch": 5219
    },
    {
        "branch": "remotes/origin/RELEASE/7.0.1-SEP16",
        "behind": 577,
        "ahead": 393,
        "nb_commits_ref_branch": 5219
    },
    {
        "branch": "remotes/origin/WIP/EM2C-BiPeriodic_Channel_TBLE",
        "behind": 577,
        "ahead": 340,
        "nb_commits_ref_branch": 5219
    },
    {
        "branch": "remotes/origin/WIP/volumic_temporals",
        "behind": 577,
        "ahead": 321,
        "nb_commits_ref_branch": 5219
    }
]
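
As an illustration, the sketch below ranks branches by how far they lag behind the reference branch. The helper most_behind is hypothetical, and the data is a subset of the sample above.

```python
def most_behind(branch_status, n=3):
    """Return the n branches that lag the most behind the reference branch."""
    return sorted(branch_status, key=lambda b: b["behind"], reverse=True)[:n]


# Three entries taken from the branch_status.json sample.
branch_status = [
    {"branch": "remotes/origin/RELEASE/7.0.1", "behind": 577, "ahead": 435,
     "nb_commits_ref_branch": 5219},
    {"branch": "remotes/origin/RELEASE/7.0", "behind": 906, "ahead": 343,
     "nb_commits_ref_branch": 5219},
    {"branch": "remotes/origin/WIP/volumic_temporals", "behind": 577, "ahead": 321,
     "nb_commits_ref_branch": 5219},
]

for status in most_behind(branch_status, n=2):
    print(status["branch"], status["behind"], status["ahead"])
```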

CLOC, or Count Lines of Code

The cloc.json gives the code sizes:

{
    "header": {
        "cloc_url": "github.com/AlDanial/cloc",
        "cloc_version": "1.92",
        "elapsed_seconds": 2.983234167099,
        "n_files": 1195,
        "n_lines": 519864,
        "files_per_second": 400.571974261766,
        "lines_per_second": 174261.881864116
    },
    "Fortran 90": {
        "nFiles": 1166,
        "blank": 83203,
        "comment": 82278,
        "code": 347867
    },
    "C": {
        "nFiles": 16,
        "blank": 1174,
        "comment": 722,
        "code": 2806
    },
    "Python": {
        "nFiles": 1,
        "blank": 163,
        "comment": 67,
        "code": 496
    },
    "make": {
        "nFiles": 1,
        "blank": 49,
        "comment": 71,
        "code": 266
    },
    "C/C++ Header": {
        "nFiles": 9,
        "blank": 152,
        "comment": 155,
        "code": 167
    },
    "Bourne Shell": {
        "nFiles": 1,
        "blank": 14,
        "comment": 18,
        "code": 162
    },
    "CMake": {
        "nFiles": 1,
        "blank": 7,
        "comment": 7,
        "code": 20
    },
    "SUM": {
        "blank": 84762,
        "comment": 83318,
        "code": 351784,
        "nFiles": 1195
    }
}
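
A small sketch of how this file can be exploited: the share of code lines per language. It assumes the layout of the sample, where every top-level key other than "header" and "SUM" is a language. The helper code_share is hypothetical, and the counts are the "code" values of the sample.

```python
def code_share(cloc):
    """Return the fraction of code lines per language, based on the SUM entry."""
    total = cloc["SUM"]["code"]
    return {
        language: stats["code"] / total
        for language, stats in cloc.items()
        if language not in ("header", "SUM")
    }


# "code" counts copied from the cloc.json sample (other fields omitted).
cloc = {
    "header": {"cloc_version": "1.92"},
    "Fortran 90": {"code": 347867},
    "C": {"code": 2806},
    "Python": {"code": 496},
    "make": {"code": 266},
    "C/C++ Header": {"code": 167},
    "Bourne Shell": {"code": 162},
    "CMake": {"code": 20},
    "SUM": {"code": 351784},
}

for language, share in code_share(cloc).items():
    print(f"{language}: {100 * share:.1f} %")
```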

Commits

The commits.json gives stats about commits:

[
    {
        "author": "Aurelien PERROT <perrot@cerfacs.fr>",
        "date": "Sat Jan 29 21:59:55 2022 +0100",
        "files": 1,
        "insertions": 1,
        "deletions": 0,
        "revision": "872de25e0372d907bbe366e09a9049c05aa609c2"
    },
    {
        "author": "Victor Xing <xing@cerfacs.fr>",
        "date": "Tue Jan 11 16:37:28 2022 +0100",
        "files": 1,
        "insertions": 3,
        "deletions": 0,
        "revision": "1676e6231f444e6ba924bed689312e019d466f92"
    },
    {
        "author": "Gabriel Staffelbach <gabriel.staffelbach@cerfacs.fr>",
        "date": "Tue Jan 25 14:21:35 2022 +0100",
        "files": 2,
        "insertions": 0,
        "deletions": 9,
        "revision": "0cbe1e4464204f48bbef912428396a00089b6a82"
    }
]
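
The sketch below shows two possible treatments of these records: parsing the date string, and summing the lines touched per author. Both helpers are hypothetical. The date format string is inferred from the sample and assumes an English locale for day and month names.

```python
from collections import defaultdict
from datetime import datetime

# Format inferred from the sample, e.g. "Sat Jan 29 21:59:55 2022 +0100".
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"


def commit_datetime(commit):
    """Parse the date of a commit record into a timezone-aware datetime."""
    return datetime.strptime(commit["date"], GIT_DATE_FORMAT)


def churn_per_author(commits):
    """Sum insertions and deletions per author."""
    churn = defaultdict(int)
    for commit in commits:
        churn[commit["author"]] += commit["insertions"] + commit["deletions"]
    return dict(churn)


# Two records copied from the commits.json sample.
commits = [
    {"author": "Aurelien PERROT <perrot@cerfacs.fr>",
     "date": "Sat Jan 29 21:59:55 2022 +0100",
     "files": 1, "insertions": 1, "deletions": 0,
     "revision": "872de25e0372d907bbe366e09a9049c05aa609c2"},
    {"author": "Gabriel Staffelbach <gabriel.staffelbach@cerfacs.fr>",
     "date": "Tue Jan 25 14:21:35 2022 +0100",
     "files": 2, "insertions": 0, "deletions": 9,
     "revision": "0cbe1e4464204f48bbef912428396a00089b6a82"},
]

print(commit_datetime(commits[0]).isoformat())
print(churn_per_author(commits))
```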

Complexity, with lizard

The code complexity is computed with Lizard, for each function-like unit, e.g.:

  • Function

  • Subroutine

  • Method

  • etc.

The lizard.csv file itself looks like:

NLOC,CCN,token,param,size,function@line@file,file,function,call,start,end
54,5,610,2,110,"add_vortex@36-145@./SOURCES/TOOLS/INIT/add_vortex.f90","./SOURCES/TOOLS/INIT/add_vortex.f90","add_vortex","add_vortex( grid , vortex )",36,145
183,33,1629,2,324,"gas_out_main@28-351@./SOURCES/TOOLS/INIT/gas_out_main.f90","./SOURCES/TOOLS/INIT/gas_out_main.f90","gas_out_main","gas_out_main( grid , gas_out )",28,351
(...)

The columns are:

  • NLOC is the number of lines of code, per function

  • CCN is the cyclomatic complexity number, as defined by McCabe

  • token is the number of tokens (roughly, “words”) in the function

  • param is the number of parameters detected

  • size is the number of lines spanned by the function, blank and comment lines included (end - start + 1 in the sample above)

  • function@line@file is the full location of the function: name, line range, and file

  • file is the file containing the function

  • function is the name of the function

  • call is the function name together with its argument list

  • start is the first line of the function in the file

  • end is the last line of the function in the file
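
As an example of how these columns can be used, the sketch below lists the functions whose CCN exceeds a threshold. The helper complex_functions is hypothetical and the threshold of 10 is an arbitrary choice made here, not an Anubis setting. The two data rows are the ones from the sample.

```python
import csv
import io

# Header and rows copied from the lizard.csv sample.
SAMPLE = '''NLOC,CCN,token,param,size,function@line@file,file,function,call,start,end
54,5,610,2,110,"add_vortex@36-145@./SOURCES/TOOLS/INIT/add_vortex.f90","./SOURCES/TOOLS/INIT/add_vortex.f90","add_vortex","add_vortex( grid , vortex )",36,145
183,33,1629,2,324,"gas_out_main@28-351@./SOURCES/TOOLS/INIT/gas_out_main.f90","./SOURCES/TOOLS/INIT/gas_out_main.f90","gas_out_main","gas_out_main( grid , gas_out )",28,351
'''


def complex_functions(csv_file, ccn_threshold=10):
    """Return (function, CCN) pairs whose CCN is above the threshold."""
    reader = csv.DictReader(csv_file)
    return [
        (row["function"], int(row["CCN"]))
        for row in reader
        if int(row["CCN"]) > ccn_threshold
    ]


print(complex_functions(io.StringIO(SAMPLE)))
```

To read a real file instead of the inlined sample, pass an open file handle, e.g. open(path, newline=""), in place of the io.StringIO object.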