“Storms don’t come to teach us painful lessons, rather they were meant to wash us clean.” - Socrates
As the first step in its analysis, Sokrates cleans the code to simplify analyses and to improve their reliability. Cleaning includes removal of comments, empty lines, and long string constants.
Cleaning for Lines of Code Calculations
The central unit of measurement in Sokrates analyses is a line of code. When counting lines of code, however, Sokrates removes comments and empty lines. Sokrates expresses the size of files and other objects, such as components and concerns, in lines of code that exclude blank lines and comments.
For example, the following fragment of code has 35 lines of uncleaned code:
After cleaning the code to remove comments and empty lines, only 17 lines of code are left, and these lines are counted for size calculations:
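The counting rule described in this section can be approximated with a short script. This is only a minimal sketch, not the Sokrates implementation: it assumes C-style comments (`//` and `/* ... */`), ignores comment markers that appear inside string literals, and the function names `clean_for_loc` and `count_lines_of_code` are hypothetical.

```python
import re

# Assumption: C-style comments only. Comment markers inside string literals
# are not handled in this sketch.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//[^\n]*")


def clean_for_loc(source: str) -> list[str]:
    """Return the lines that remain after dropping comments and blank lines."""
    # Replace each block comment with the newlines it contained, so code on
    # either side of a multi-line comment stays on separate lines.
    text = BLOCK_COMMENT.sub(lambda m: "\n" * m.group(0).count("\n"), source)
    text = LINE_COMMENT.sub("", text)
    return [line for line in text.splitlines() if line.strip()]


def count_lines_of_code(source: str) -> int:
    """Count the lines of code left after cleaning."""
    return len(clean_for_loc(source))


sample = """\
/* Utility class. */
package demo;

// Adds two numbers.
public class Adder {

    public int add(int a, int b) {
        return a + b; // sum
    }
}
"""

# The sample has 10 raw lines; 6 remain after cleaning.
print(count_lines_of_code(sample))
```

A line that holds both code and a trailing comment still counts as one line of code, because only the comment part is removed.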
Cleaning for Duplication Calculations
Before measuring duplication, Sokrates cleans the code to remove empty lines, comments, and frequently duplicated constructs such as import statements.
Here is an example of code cleaning for duplication calculations:
Before the cleaning, the code has 25 lines:
After removal of empty lines and comments, 16 lines are left:
Lastly, Sokrates removes statements that are frequently inserted automatically and highly duplicated, such as import statements. Sokrates also removes leading, trailing, and repeated whitespace in each line, so that it can identify pieces of code that differ only in their whitespace. This process leads to the following 9 lines that are used to detect duplication:
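The cleaning steps for duplication can be sketched in the same way. This is an illustrative approximation under stated assumptions, not the Sokrates implementation: it assumes C-style comments, treats any line starting with `import` as boilerplate, and the name `clean_for_duplication` is hypothetical.

```python
import re

# Assumption: C-style comments, and import statements that start a line.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//[^\n]*")
IMPORT_STATEMENT = re.compile(r"import\s")


def clean_for_duplication(source: str) -> list[str]:
    """Return normalized lines suitable for comparing code fragments."""
    # Keep the newlines of multi-line comments so line boundaries survive.
    text = BLOCK_COMMENT.sub(lambda m: "\n" * m.group(0).count("\n"), source)
    text = LINE_COMMENT.sub("", text)
    cleaned = []
    for line in text.splitlines():
        # Strip leading and trailing whitespace and collapse repeated whitespace.
        normalized = " ".join(line.split())
        # Skip empty lines and import statements.
        if not normalized or IMPORT_STATEMENT.match(normalized):
            continue
        cleaned.append(normalized)
    return cleaned


first = "import java.util.List;\n\nint  total =  a +   b;\n    return total;  // done\n"
second = "int total = a + b;\nreturn total;\n"

# Both fragments reduce to the same two lines, so they compare as duplicates.
print(clean_for_duplication(first) == clean_for_duplication(second))
```

Normalizing whitespace before comparison is what lets two fragments that differ only in indentation or spacing be recognized as the same code.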
Preview the Cleaning in Sokrates Explorer
Sokrates values transparency. To better understand the Sokrates cleaning process, you can use the Sokrates Explorer file preview panel to see how the content of each file looks after cleaning:
Figure 1: You can use the Sokrates Explorer file preview panel to see how the content of each file looks after cleaning.