Generate a database
Anubis helps you generate a database. Use the CLI to get started:
> anubis timemachine --new
This creates an input file, ./anubis_time_machine.yml, in your local folder.
Edit this inputfile to match your needs:
# Anubis time machine input file
git_path : /Users/dauptain/GITLAB/avbp
branch : dev
rel_source_path : ./SOURCES
year_start : 2017
year_end : 2022
out_dir : ANUBIS_AVBP
lizard_switch : True
git_path: a local path to your .git repository.
branch: the name of the branch you want to scan. It is often `master`.
rel_source_path: a subfolder in your repository, used to limit the cloc and lizard investigation. It usually points to the core of the source code (avoiding tests, documentation, templates, etc.).
year_start and year_end: define the time frame. These are integers.
out_dir: the folder where you want to save your Anubis database.
lizard_switch: used to deactivate the lizard analysis, which can be quite slow.
Then start the database generation with:
> anubis timemachine --file ./anubis_time_machine.yml
The execution prints output like this:
(...)
ETA : 0:00:01.121897 sec
timewarping (55/72)... date is now 2021-7-01 12:00
... find last revision
... checkout revision eccfd209bbf9eb90f91645ff8ace621b5441f41e
... running cloc
... gather commit stats
... get git blame info
... branch status
... running lizard
ETA : 0:01:09.197262 sec
timewarping (56/72)... date is now 2021-8-01 12:00
... find last revision
... checkout revision 89534c42e102fd5db217cf290b564313187281d4
... running cloc
... gather commit stats
... get git blame info
... branch status
... running lizard
ETA : 0:18:13.801281 sec
timewarping (57/72)... date is now 2021-9-01 12:00
... find last revision
... checkout revision 02da5a82c8f765982e05153d477e727552a5ce30
... running cloc
... gather commit stats
... get git blame info
... branch status
... running lizard
(...)
With this output, you know which date the git repository is currently set to (here Sept. 2021), and you get a rough estimated time of completion, or ETA (here about 18 minutes).
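The counter in the log (55/72, 56/72, ...) can be cross-checked with a short sketch. It assumes one snapshot per month and that year_end is inclusive; both are inferred from the sample (2017 to 2022 gives 72 steps, and step 55 is July 2021), not taken from the Anubis source. The function name monthly_snapshots is hypothetical.

```python
from datetime import date


def monthly_snapshots(year_start: int, year_end: int) -> list[date]:
    """First day of every month from year_start to year_end, both inclusive.

    Assumption: mirrors the monthly stepping seen in the timemachine log.
    """
    return [
        date(year, month, 1)
        for year in range(year_start, year_end + 1)
        for month in range(1, 13)
    ]


snapshots = monthly_snapshots(2017, 2022)
print(len(snapshots))     # 72
print(snapshots[55 - 1])  # 2021-07-01
```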
Database content
The database looks like:
ANUBIS_Nek5000
├── anubis_2017-01
│ ├── blame.json
│ ├── branch_status.json
│ ├── cloc.json
│ ├── commits.json
│ └── lizard.csv
├── anubis_2017-02
│ ├── cloc.json
│ ├── blame.json
│ ├── branch_status.json
│ ├── commits.json
│ └── lizard.csv
├── anubis_2017-03
│ ├── cloc.json
│ ├── blame.json
│ ├── branch_status.json
│ ├── commits.json
│ └── lizard.csv
(...)
Each month gets its own subfolder, which holds several data files.
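To check which months and files a database actually contains, a minimal sketch with the standard library is enough. It only relies on the anubis_YYYY-MM folder naming shown in the tree; the helper name list_snapshots is hypothetical and not part of Anubis.

```python
from pathlib import Path


def list_snapshots(db_dir: str) -> dict[str, list[str]]:
    """Map each monthly subfolder (anubis_YYYY-MM) to the sorted names of its files."""
    root = Path(db_dir)
    if not root.is_dir():
        return {}
    return {
        sub.name: sorted(f.name for f in sub.iterdir() if f.is_file())
        for sub in sorted(root.glob("anubis_*"))
        if sub.is_dir()
    }


if __name__ == "__main__":
    # Folder name taken from the tree above; adapt it to your own out_dir.
    for month, files in list_snapshots("ANUBIS_Nek5000").items():
        print(month, files)
```

A month whose list lacks lizard.csv was most likely generated with the lizard analysis deactivated.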
Blame
The blame.json file gives the following information for each non-blank line of code of each file:
the author
the date of last modification
the indentation level (i.e. the number of blanks before the first non-blank character)
the line number
[
{
"file": "src/hello_world.py",
"author": [
"Aurélien",
"Aurélien",
"Aurélien",
"Aurélien"
],
"date": [
"2021-05-11",
"2021-05-11",
"2021-05-11",
"2021-05-11"
],
"indentation": [
0,
4,
0,
4
],
"line_number": [
1,
2,
3,
4
]
}
]
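As an illustration of how this structure can be used, the sketch below counts how many lines each author last touched. It assumes only the fields shown in the sample (one entry per file, with parallel lists); the function name lines_per_author is hypothetical.

```python
import json
from collections import Counter


def lines_per_author(blame: list[dict]) -> Counter:
    """Count non-blank lines per author across all files of a blame.json."""
    counts = Counter()
    for entry in blame:
        counts.update(entry["author"])
    return counts


# Inline copy of the sample entry; in practice, use json.load() on blame.json.
sample = json.loads("""
[
  {
    "file": "src/hello_world.py",
    "author": ["Aurélien", "Aurélien", "Aurélien", "Aurélien"],
    "date": ["2021-05-11", "2021-05-11", "2021-05-11", "2021-05-11"],
    "indentation": [0, 4, 0, 4],
    "line_number": [1, 2, 3, 4]
  }
]
""")

print(lines_per_author(sample))  # Counter({'Aurélien': 4})
```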
Branch status
The branch_status.json file gives, for each branch, the number of commits behind and ahead of the branch specified in the timemachine input .yml file, called the reference branch.
It also stores the number of commits on the reference branch.
[
{
"branch": "remotes/origin/RELEASE/7.0",
"behind": 906,
"ahead": 343,
"nb_commits_ref_branch": 5219
},
{
"branch": "remotes/origin/RELEASE/7.0.1",
"behind": 577,
"ahead": 435,
"nb_commits_ref_branch": 5219
},
{
"branch": "remotes/origin/RELEASE/7.0.1-SEP16",
"behind": 577,
"ahead": 393,
"nb_commits_ref_branch": 5219
},
{
"branch": "remotes/origin/WIP/EM2C-BiPeriodic_Channel_TBLE",
"behind": 577,
"ahead": 340,
"nb_commits_ref_branch": 5219
},
{
"branch": "remotes/origin/WIP/volumic_temporals",
"behind": 577,
"ahead": 321,
"nb_commits_ref_branch": 5219
}
]
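A typical use of this file is to spot the branches that lag furthest behind the reference branch. The sketch below assumes only the keys shown in the sample; the function name most_outdated and its top parameter are hypothetical.

```python
def most_outdated(branches: list[dict], top: int = 3) -> list[tuple[str, int]]:
    """Return the `top` branches with the largest number of commits behind."""
    ranked = sorted(branches, key=lambda b: b["behind"], reverse=True)
    return [(b["branch"], b["behind"]) for b in ranked[:top]]


# Two entries copied from the sample; in practice, load branch_status.json.
sample = [
    {"branch": "remotes/origin/RELEASE/7.0.1", "behind": 577,
     "ahead": 435, "nb_commits_ref_branch": 5219},
    {"branch": "remotes/origin/RELEASE/7.0", "behind": 906,
     "ahead": 343, "nb_commits_ref_branch": 5219},
]

print(most_outdated(sample, top=1))  # [('remotes/origin/RELEASE/7.0', 906)]
```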
CLOC, or Count Lines of Code
The cloc.json file gives the code size per language:
{
"header": {
"cloc_url": "github.com/AlDanial/cloc",
"cloc_version": "1.92",
"elapsed_seconds": 2.983234167099,
"n_files": 1195,
"n_lines": 519864,
"files_per_second": 400.571974261766,
"lines_per_second": 174261.881864116
},
"Fortran 90": {
"nFiles": 1166,
"blank": 83203,
"comment": 82278,
"code": 347867
},
"C": {
"nFiles": 16,
"blank": 1174,
"comment": 722,
"code": 2806
},
"Python": {
"nFiles": 1,
"blank": 163,
"comment": 67,
"code": 496
},
"make": {
"nFiles": 1,
"blank": 49,
"comment": 71,
"code": 266
},
"C/C++ Header": {
"nFiles": 9,
"blank": 152,
"comment": 155,
"code": 167
},
"Bourne Shell": {
"nFiles": 1,
"blank": 14,
"comment": 18,
"code": 162
},
"CMake": {
"nFiles": 1,
"blank": 7,
"comment": 7,
"code": 20
},
"SUM": {
"blank": 84762,
"comment": 83318,
"code": 351784,
"nFiles": 1195
}
}
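One quantity that can be derived from these counts is the share of comment lines per language. The sketch below is only an illustration: it skips the "header" and "SUM" entries and assumes every other key is a language with "code" and "comment" counts, as in the sample. The function name comment_ratio is hypothetical.

```python
def comment_ratio(cloc: dict) -> dict[str, float]:
    """Fraction of comment lines among code + comment lines, per language."""
    ratios = {}
    for language, stats in cloc.items():
        if language in ("header", "SUM"):
            continue  # metadata and totals, not languages
        total = stats["code"] + stats["comment"]
        ratios[language] = stats["comment"] / total if total else 0.0
    return ratios


# Subset of the sample; in practice, use json.load() on cloc.json.
sample = {
    "header": {"cloc_version": "1.92", "n_files": 1195},
    "Fortran 90": {"nFiles": 1166, "blank": 83203, "comment": 82278, "code": 347867},
    "C": {"nFiles": 16, "blank": 1174, "comment": 722, "code": 2806},
    "SUM": {"blank": 84762, "comment": 83318, "code": 351784, "nFiles": 1195},
}

for language, ratio in comment_ratio(sample).items():
    print(f"{language}: {ratio:.1%}")
```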
Commits
The commits.json file gives statistics about each commit:
[
{
"author": "Aurelien PERROT <perrot@cerfacs.fr>",
"date": "Sat Jan 29 21:59:55 2022 +0100",
"files": 1,
"insertions": 1,
"deletions": 0,
"revision": "872de25e0372d907bbe366e09a9049c05aa609c2"
},
{
"author": "Victor Xing <xing@cerfacs.fr>",
"date": "Tue Jan 11 16:37:28 2022 +0100",
"files": 1,
"insertions": 3,
"deletions": 0,
"revision": "1676e6231f444e6ba924bed689312e019d466f92"
},
{
"author": "Gabriel Staffelbach <gabriel.staffelbach@cerfacs.fr>",
"date": "Tue Jan 25 14:21:35 2022 +0100",
"files": 2,
"insertions": 0,
"deletions": 9,
"revision": "0cbe1e4464204f48bbef912428396a00089b6a82"
}
]
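The records above lend themselves to per-author aggregation. The following sketch sums insertions and deletions per author and parses the date string; the format string is an assumption derived from the sample dates, and the %a and %b directives expect English day and month names (the default C locale). The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Assumed from the sample, e.g. "Sat Jan 29 21:59:55 2022 +0100".
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"


def parse_commit_date(text: str) -> datetime:
    """Turn the "date" field into a timezone-aware datetime."""
    return datetime.strptime(text, GIT_DATE_FORMAT)


def churn_per_author(commits: list[dict]) -> dict[str, dict[str, int]]:
    """Sum commits, insertions and deletions for each author."""
    totals = defaultdict(lambda: {"commits": 0, "insertions": 0, "deletions": 0})
    for commit in commits:
        stats = totals[commit["author"]]
        stats["commits"] += 1
        stats["insertions"] += commit["insertions"]
        stats["deletions"] += commit["deletions"]
    return dict(totals)


# Two records copied from the sample; in practice, load commits.json.
sample = [
    {"author": "Aurelien PERROT <perrot@cerfacs.fr>",
     "date": "Sat Jan 29 21:59:55 2022 +0100",
     "files": 1, "insertions": 1, "deletions": 0,
     "revision": "872de25e0372d907bbe366e09a9049c05aa609c2"},
    {"author": "Victor Xing <xing@cerfacs.fr>",
     "date": "Tue Jan 11 16:37:28 2022 +0100",
     "files": 1, "insertions": 3, "deletions": 0,
     "revision": "1676e6231f444e6ba924bed689312e019d466f92"},
]

print(parse_commit_date(sample[0]["date"]).isoformat())  # 2022-01-29T21:59:55+01:00
print(churn_per_author(sample))
```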
Complexity, with lizard
The code complexity is computed with Lizard, for each function-like unit, i.e.:
Function
Subroutine
Method
etc.
The lizard.csv file itself looks like this:
NLOC,CCN,token,param,size,function@line@file,file,function,call,start,end
54,5,610,2,110,"add_vortex@36-145@./SOURCES/TOOLS/INIT/add_vortex.f90","./SOURCES/TOOLS/INIT/add_vortex.f90","add_vortex","add_vortex( grid , vortex )",36,145
183,33,1629,2,324,"gas_out_main@28-351@./SOURCES/TOOLS/INIT/gas_out_main.f90","./SOURCES/TOOLS/INIT/gas_out_main.f90","gas_out_main","gas_out_main( grid , gas_out )",28,351
(...)
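Some explanations are required here:
NLOC: the number of lines of code, per function.
CCN: the cyclomatic complexity number, as defined by McCabe.
token: the number of tokens (roughly, "words") in the function.
param: the number of parameters detected.
size: the number of lines spanned by the function. In the sample rows above it equals end - start + 1 (110 for lines 36 to 145).
function@line@file: the full location of the function (name, line range and file).
file: the file containing the function.
function: the name of the function.
call: the way the function is called, i.e. its name with its argument list.
start: the first line of the function in the file.
end: the last line of the function in the file.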
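Using these columns, the most complex functions can be extracted with the standard csv module. This is a sketch under assumptions: the column names are those of the header line shown above, the function name complex_functions is hypothetical, and the default threshold of 10 is an arbitrary choice for illustration, not an Anubis setting.

```python
import csv
import io


def complex_functions(csv_text: str, ccn_threshold: int = 10) -> list[tuple[str, int]]:
    """Return (function, CCN) pairs whose cyclomatic complexity exceeds the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["function"], int(row["CCN"]))
        for row in reader
        if int(row["CCN"]) > ccn_threshold
    ]


# Header and two rows copied from the sample; in practice, read lizard.csv.
sample = '''NLOC,CCN,token,param,size,function@line@file,file,function,call,start,end
54,5,610,2,110,"add_vortex@36-145@./SOURCES/TOOLS/INIT/add_vortex.f90","./SOURCES/TOOLS/INIT/add_vortex.f90","add_vortex","add_vortex( grid , vortex )",36,145
183,33,1629,2,324,"gas_out_main@28-351@./SOURCES/TOOLS/INIT/gas_out_main.f90","./SOURCES/TOOLS/INIT/gas_out_main.f90","gas_out_main","gas_out_main( grid , gas_out )",28,351
'''

print(complex_functions(sample))  # [('gas_out_main', 33)]
```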