Source Code Analysis: Code Sniffer
The problem of choice
In the previous article we examined only one approach to code analysis: the PHP tokenizer. Dig deeper, though, and you will find many more options. Sebastian Bergmann, a well-known expert on the subject, lists the following:
- Dynamic code analysis
  - Xdebug (can be coupled with PHPUnit)
  - php-code-coverage
- Static code analysis
  - Token-level analysis with ext/tokenizer and PHP_TokenStream
  - Syntax-level analysis with PHP_Reflection_AST and ext/parse_tree
  - Bytecode-level analysis with ext/bytekit

As you can see, with such a set of tools you can do almost anything. Here, however, we are going to discuss only one application, known as code sniffing.
Code Sniffer
Okay, so you have a team working on projects and, obviously, a coding convention to adhere to. The lead checks compliance with the standard by eye during code review. Come on, admit you are not fond of that. There is an amazing tool for this, PHP CodeSniffer (developed by Squiz Labs). Ideally, you just pick the standards and individual sniffs you need from those provided, list them in a newly created standard folder (/Standards/YourCompany/YourCompanyCodingStandard.php), and from then on you can check your source code automatically. From the command line, it can look like this:
phpcs --standard=./Standards/YourCompany ./pathToSourceCodeFiles > report.txt
report.txt will tell you what is wrong with your code. Going further, you can add this command to your build script, so that whenever you deploy a test or production version of the project from the developers’ SVN repository, all committed code is checked by the code sniffer. If you use a CI tool, the report can show up there too; for example, phpUnderControl has a dedicated CodeSniffer section to which the report can be bound.
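If the build should stop when the sniffer finds errors, one option is to have phpcs write a Checkstyle XML report (`--report=checkstyle`, the format CI tools typically consume) and count its entries. The sketch below assumes that report format; `countCheckstyleIssues()` is a hypothetical helper, not part of PHP CodeSniffer, and the sample report is made-up data.

```php
<?php
// Hypothetical build-step helper (not part of PHP_CodeSniffer): counts the
// errors and warnings in a Checkstyle XML report, such as one written by
//   phpcs --report=checkstyle --standard=./Standards/YourCompany ./pathToSourceCodeFiles > report.xml
function countCheckstyleIssues(string $xml): array
{
    $report = new SimpleXMLElement($xml);
    $counts = ['error' => 0, 'warning' => 0];
    // Each <file> element holds one <error> element per violation; the
    // "severity" attribute tells errors and warnings apart.
    foreach ($report->file as $file) {
        foreach ($file->error as $violation) {
            $severity = (string) $violation['severity'];
            if (isset($counts[$severity])) {
                $counts[$severity]++;
            }
        }
    }
    return $counts;
}

// Made-up sample report, for illustration only.
$sample = <<<XML
<checkstyle version="1.0.0">
  <file name="/path/to/Foo.php">
    <error line="3" column="1" severity="error" message="Example error" source="Example.Sniff"/>
    <error line="9" column="5" severity="warning" message="Example warning" source="Example.Sniff"/>
  </file>
</checkstyle>
XML;

$counts = countCheckstyleIssues($sample);
printf("Errors: %d, warnings: %d\n", $counts['error'], $counts['warning']);
// A build script could stop the deployment at this point, e.g. with exit(1).
$buildFailed = $counts['error'] > 0;
```

Failing only on errors and merely reporting warnings is a design choice; a stricter build could treat warnings the same way.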
But once you try PHP CodeSniffer, you will find you need to modify some of the provided sniffs or write new ones. PHP CodeSniffer has a tutorial on this, but it is much more useful simply to study how the provided sniffs are written.
Let’s look at an example of what I mean. First, create your standard folder and put a configuration file in it that lists all the provided sniffs you want to apply:
<?php
class PHP_CodeSniffer_Standards_YourCompany_YourCompanyCodingStandard extends PHP_CodeSniffer_Standards_CodingStandard
{
    public function getIncludedSniffs()
    {
        return array(
            // Checks that the opening brace of a function is on the line after the
            // function declaration.
            // function fooFunction($arg1, $arg2 = '') [\n] {
            'Generic/Sniffs/Functions/OpeningFunctionBraceBsdAllmanSniff.php',
            // Makes sure that shorthand PHP open tags are not used.
            // Always use the full <?php open tag, never the <? shorthand.
            'Generic/Sniffs/PHP/DisallowShortOpenTagSniff.php',
            // Throws errors if tabs are used for indentation.
            'Generic/Sniffs/WhiteSpace/DisallowTabIndentSniff.php',
            // Checks JavaScript files for calls to the Firebug console.
            'MySource/Sniffs/Debug/FirebugConsoleSniff.php',
            // Checks the declaration of the class is correct.
            'PEAR/Sniffs/Classes/ClassDeclarationSniff.php',
            // A sniff to ensure that parameters defined for a function that have a default
            // value come at the end of the function signature.
            'PEAR/Sniffs/Functions/ValidDefaultValueSniff.php',
            // Checks that the closing braces of scopes are aligned correctly.
            'PEAR/Sniffs/WhiteSpace/ScopeClosingBraceSniff.php',
            // Tests for functions outside of classes.
            'Squiz/Sniffs/Functions/GlobalFunctionSniff.php',
            // Runs the Zend Code Analyzer (from Zend Studio) on the file.
            'Zend/Sniffs/Debug/CodeAnalyzerSniff.php',
            // Checks that the file does not end with a closing tag.
            'Zend/Sniffs/Files/ClosingTagSniff.php',
            // Checks the naming of variables and member variables.
            'Zend/Sniffs/NamingConventions/ValidVariableNameSniff.php',
        );
    }
}
Then you create a Sniffs folder and fill it with your own sniffs or with modified versions of the packaged ones. For example, suppose you follow the Zend Framework coding standard and need a sniff that checks that only one class is declared per file and that the class name follows the naming convention (the path to the file is reflected in the class name). So we look for a similar sniff among the provided ones (ClassDeclarationSniff.php of the Squiz or PEAR standard), study how it is made, and write our own:
<?php
class YourCompany_Sniffs_Classes_ClassDeclarationSniff implements PHP_CodeSniffer_Sniff
{
    /**
     * Files (relative to LIB_PATH) in which a class has already been declared.
     *
     * @var array
     */
    protected $_filesWithClasses = array();

    /**
     * Returns an array of tokens this test wants to listen for.
     *
     * @return array
     */
    public function register()
    {
        return array(
            T_CLASS,
            T_INTERFACE,
        );
    }//end register()

    /**
     * Processes this test, when one of its tokens is encountered.
     *
     * @param PHP_CodeSniffer_File $phpcsFile The file being scanned.
     * @param int                  $stackPtr  The position of the current token in the
     *                                        stack passed in $tokens.
     *
     * @return void
     */
    public function process(PHP_CodeSniffer_File $phpcsFile, $stackPtr)
    {
        $tokens     = $phpcsFile->getTokens();
        $classToken = $phpcsFile->findNext(T_STRING, $stackPtr);
        $className  = $tokens[$classToken]['content'];

        // LIB_PATH is assumed to be defined elsewhere (e.g. in your bootstrap)
        // as the absolute path of the library root. $fileName must be computed
        // before it is used in the check below.
        $fileName = str_replace(LIB_PATH, '', $phpcsFile->getFilename());

        // A second class or interface in the same file is an error.
        if (in_array($fileName, $this->_filesWithClasses)) {
            $phpcsFile->addError('There must be only one class per file', $classToken);
            return;
        }
        $this->_filesWithClasses[] = $fileName;

        // Check that library files follow the ZF naming convention.
        if (false !== strpos($phpcsFile->getFilename(), LIB_PATH)) {
            if ($fileName != '/' . str_replace('_', '/', $className) . '.php') {
                $phpcsFile->addError(
                    $className . ' does not correspond to the class naming convention. Check file name: ' . $fileName,
                    $classToken
                );
                return;
            }
        }
    }//end process()
}//end class
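To experiment with the same two rules without installing PHP CodeSniffer, they can be reproduced with plain ext/tokenizer. This is a rough standalone sketch, not the PHP CodeSniffer API: `checkClassDeclarations()` is a hypothetical function, and `$relativePath` plays the role of the file name with LIB_PATH already stripped. Unlike the sniff, it checks one file at a time.

```php
<?php
// Hypothetical standalone sketch (ext/tokenizer, not the PHP_CodeSniffer API)
// of the two rules above: one class or interface per file, and the file path
// must follow the Zend Framework naming convention.
function checkClassDeclarations(string $source, string $relativePath): array
{
    $tokens = token_get_all($source);
    $errors = [];
    $declared = 0;
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || !in_array($token[0], [T_CLASS, T_INTERFACE], true)) {
            continue;
        }
        // Skip "Foo::class" and "new class", which also produce a T_CLASS token.
        $prev = $i - 1;
        while ($prev >= 0 && is_array($tokens[$prev]) && $tokens[$prev][0] === T_WHITESPACE) {
            $prev--;
        }
        if ($prev >= 0 && is_array($tokens[$prev])
            && in_array($tokens[$prev][0], [T_PAAMAYIM_NEKUDOTAYIM, T_NEW], true)
        ) {
            continue;
        }
        // Like findNext(T_STRING, $stackPtr): the first T_STRING after the keyword is the name.
        $name = null;
        for ($j = $i + 1; $j < count($tokens); $j++) {
            if (is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
                $name = $tokens[$j][1];
                break;
            }
        }
        if ($name === null) {
            continue;
        }
        if (++$declared > 1) {
            $errors[] = "There must be only one class per file, found another: $name";
            continue;
        }
        // Zend_Db_Table must live in /Zend/Db/Table.php, relative to the library root.
        $expected = '/' . str_replace('_', '/', $name) . '.php';
        if ($relativePath !== $expected) {
            $errors[] = "$name does not correspond to the class naming convention, expected $expected";
        }
    }
    return $errors;
}

print_r(checkClassDeclarations("<?php\nclass Zend_Db_Table {}\nclass Zend_Db_Row {}\n", '/Zend/Db/Table.php'));
```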
As you can see, the getTokens() method of the PHP_CodeSniffer_File object returns an array with the following structure:
...
[107] => Array
    (
        [type] => T_OPEN_PARENTHESIS
        [code] => 1004
        [content] => (
        [line] => 33
        [parenthesis_opener] => 107
        [parenthesis_owner] => 104
        [parenthesis_closer] => 108
        [column] => 38
        [level] => 1
        [conditions] => Array
            (
                [16] => 352
            )
    )
...
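Compared to the raw PHP tokenizer output, this is much more detailed: as the fields above show, each token also carries its column, nesting level, matching parentheses and enclosing conditions.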
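For comparison, here is roughly what ext/tokenizer alone gives you for a tiny snippet. This sketch only uses the built-in token_get_all() and token_name() functions; the snippet itself is an arbitrary example.

```php
<?php
// Raw ext/tokenizer output, for comparison with PHP_CodeSniffer's token array.
$tokens = token_get_all('<?php if ($a) {}');

foreach ($tokens as $index => $token) {
    if (is_array($token)) {
        // Array tokens carry only [token id, source text, line number].
        list($id, $text, $line) = $token;
        printf("[%d] %s %s (line %d)\n", $index, token_name($id), var_export($text, true), $line);
    } else {
        // Single-character tokens such as "(" and "{" come back as plain strings.
        printf("[%d] %s\n", $index, var_export($token, true));
    }
}
```

There is no column, nesting level, parenthesis matching or list of enclosing conditions here; PHP CodeSniffer computes those on top of the raw tokens.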
The index of the encountered token (the tokens to listen for are listed in the array returned by the sniff’s register() method) is passed in as the $stackPtr argument. From there you can navigate through the file with $phpcsFile->findNext(..) or $phpcsFile->findPrevious(..). By checking the tokens you find against the expected conditions, you can raise an error with $phpcsFile->addError() or a warning with $phpcsFile->addWarning() where required.
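The register()/process() life cycle can be imitated in a few lines to see how $stackPtr and navigation fit together. Everything below is hypothetical: `MiniFile`, `runSniff()` and `LowercaseFunctionNameSniff` are simplified stand-ins built on ext/tokenizer, not the real PHP CodeSniffer classes, and the example rule (function names must not start with an uppercase letter) is only for illustration.

```php
<?php
// Hypothetical miniature of the sniff life cycle. These are NOT the real
// PHP_CodeSniffer classes; they only imitate register()/process(),
// findNext() and addError() on top of ext/tokenizer.
class MiniFile
{
    public $tokens = [];
    public $errors = [];

    public function __construct(string $source)
    {
        $line = 1;
        foreach (token_get_all($source) as $t) {
            $content = is_array($t) ? $t[1] : $t;
            // Normalise every token (including single characters) to one shape.
            $this->tokens[] = ['type' => is_array($t) ? $t[0] : $t, 'content' => $content, 'line' => $line];
            $line += substr_count($content, "\n");
        }
    }

    // Index of the first token at or after $start whose type is in $types
    // (or, with $exclude, is NOT in $types); null if there is none.
    public function findNext(array $types, int $start, bool $exclude = false)
    {
        for ($i = $start; $i < count($this->tokens); $i++) {
            if (in_array($this->tokens[$i]['type'], $types, true) !== $exclude) {
                return $i;
            }
        }
        return null;
    }

    public function addError(string $message, int $stackPtr)
    {
        $this->errors[] = 'Line ' . $this->tokens[$stackPtr]['line'] . ': ' . $message;
    }
}

class LowercaseFunctionNameSniff
{
    // Tokens this sniff wants to listen for.
    public function register()
    {
        return [T_FUNCTION];
    }

    public function process(MiniFile $file, int $stackPtr)
    {
        // Skip whitespace after "function"; the next token should be the name.
        $namePtr = $file->findNext([T_WHITESPACE], $stackPtr + 1, true);
        if ($namePtr === null || $file->tokens[$namePtr]['type'] !== T_STRING) {
            return; // e.g. a closure, which has no name
        }
        $name = $file->tokens[$namePtr]['content'];
        if (strtolower($name[0]) !== $name[0]) {
            $file->addError("Function name \"$name\" must not start with an uppercase letter", $namePtr);
        }
    }
}

// Dispatch: call process() for every token whose type the sniff registered.
function runSniff($sniff, MiniFile $file)
{
    $registered = $sniff->register();
    foreach ($file->tokens as $stackPtr => $token) {
        if (in_array($token['type'], $registered, true)) {
            $sniff->process($file, $stackPtr);
        }
    }
}

$file = new MiniFile("<?php\nfunction fooBar() {}\nfunction BadName() {}\n");
runSniff(new LowercaseFunctionNameSniff(), $file);
print_r($file->errors);
```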
What custom sniffs will I need?
In my experience, you are going to need a sniff for whitespace in control structures, so that the expected code looks like this:
if (...) { ... } // valid
if(...) { ... } // invalid
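Such a rule could be prototyped like this. It is a hypothetical standalone check built on ext/tokenizer rather than a PHP CodeSniffer sniff; the function name and the keyword list are assumptions, and the spacing between ")" and "{" is not covered because that needs parenthesis matching.

```php
<?php
// Hypothetical standalone check built on ext/tokenizer (not a PHP_CodeSniffer
// sniff): a control-structure keyword must be followed by exactly one space
// and then the opening parenthesis, as in "if (".
function checkControlStructureSpacing(string $source): array
{
    $keywords = [T_IF, T_ELSEIF, T_FOR, T_FOREACH, T_WHILE, T_SWITCH];
    $tokens = token_get_all($source);
    $errors = [];
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || !in_array($token[0], $keywords, true)) {
            continue;
        }
        $next = $tokens[$i + 1] ?? null;
        $afterNext = $tokens[$i + 2] ?? null;
        $valid = is_array($next) && $next[0] === T_WHITESPACE && $next[1] === ' '
            && $afterNext === '(';
        if (!$valid) {
            $errors[] = sprintf('Line %d: expected exactly one space between "%s" and "("', $token[2], $token[1]);
        }
    }
    return $errors;
}

print_r(checkControlStructureSpacing('<?php if($a) { }'));
```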
Another sniff you may need is a modified MultiLineConditionSniff.php that requires a line break between the closing parenthesis of a multi-line condition and the opening brace:
// VALID
if (($veryLongCondition1 >= $veryLongCondition2)
    && ($veryLongCondition2 >= $veryLongCondition3)
    && ($veryLongCondition1 >= $veryLongCondition2))
{
    // ...
}

// INVALID
if (($veryLongCondition1 >= $veryLongCondition2)
    && ($veryLongCondition2 >= $veryLongCondition3)
    && ($veryLongCondition1 >= $veryLongCondition2)) {
    // ...
}
It’s a controversial point, because under the Zend Framework standard the opening brace of a control structure goes on the same line as the condition. With a multi-line condition, however, it becomes hard to see where the condition ends and the body begins.
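The rule can be sketched with ext/tokenizer as follows. This is a hypothetical standalone approximation, not the modified MultiLineConditionSniff itself: it only looks at if/elseif, matches parentheses with a depth counter, and assumes the brace directly follows the whitespace after ")".

```php
<?php
// Hypothetical standalone sketch of the modified MultiLineConditionSniff rule:
// if an if/elseif condition spans several lines, "{" must start a new line.
function tokensWithLines(string $source): array
{
    $result = [];
    $line = 1;
    foreach (token_get_all($source) as $t) {
        $text = is_array($t) ? $t[1] : $t;
        $result[] = ['type' => is_array($t) ? $t[0] : $t, 'content' => $text, 'line' => $line];
        $line += substr_count($text, "\n");
    }
    return $result;
}

function checkMultiLineConditions(string $source): array
{
    $tokens = tokensWithLines($source);
    $total = count($tokens);
    $errors = [];
    foreach ($tokens as $i => $token) {
        if (!in_array($token['type'], [T_IF, T_ELSEIF], true)) {
            continue;
        }
        // Find the opening parenthesis and its matching closer.
        $open = $i + 1;
        while ($open < $total && $tokens[$open]['type'] !== '(') {
            $open++;
        }
        $depth = 0;
        for ($close = $open; $close < $total; $close++) {
            if ($tokens[$close]['type'] === '(') {
                $depth++;
            } elseif ($tokens[$close]['type'] === ')' && --$depth === 0) {
                break;
            }
        }
        if ($close >= $total || $tokens[$close]['line'] === $tokens[$open]['line']) {
            continue; // unbalanced input or single-line condition: nothing to check
        }
        // After ")" we expect whitespace containing a newline, then "{".
        $ws = $tokens[$close + 1] ?? null;
        $brace = $tokens[$close + 2] ?? null;
        $ok = $ws !== null && $ws['type'] === T_WHITESPACE && strpos($ws['content'], "\n") !== false
            && $brace !== null && $brace['type'] === '{';
        if (!$ok) {
            $errors[] = sprintf('Line %d: opening brace of a multi-line condition must be on a new line', $tokens[$close]['line']);
        }
    }
    return $errors;
}

print_r(checkMultiLineConditions("<?php\nif ((\$a >= \$b)\n    && (\$b >= \$c)) {\n}\n"));
```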
I’m sure you have a custom template for PHPDoc headers, so take a look at FileCommentSniff.php to customize it.
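Before customizing FileCommentSniff.php, the template check itself can be prototyped outside PHP CodeSniffer. The sketch assumes the file-level PHPDoc block is the first thing after the open tag, and the default `$requiredTags` list is only an example; replace it with the tags your template actually requires. `checkFileDocBlock()` is a hypothetical name.

```php
<?php
// Hypothetical standalone sketch: the file-level PHPDoc block must contain
// every tag in $requiredTags (the default list is just an example template).
function checkFileDocBlock(string $source, array $requiredTags = ['@category', '@package', '@copyright', '@license']): array
{
    foreach (token_get_all($source) as $token) {
        // Skip the open tag and whitespace before the first real token.
        if (is_array($token) && in_array($token[0], [T_OPEN_TAG, T_WHITESPACE], true)) {
            continue;
        }
        if (!is_array($token) || $token[0] !== T_DOC_COMMENT) {
            return ['Missing file doc comment'];
        }
        $missing = [];
        foreach ($requiredTags as $tag) {
            // A tag counts only at the start of a " * @tag" line.
            if (!preg_match('/^\s*\*\s*' . preg_quote($tag, '/') . '\b/m', $token[1])) {
                $missing[] = "Missing $tag tag in file doc comment";
            }
        }
        return $missing;
    }
    return ['Missing file doc comment'];
}

print_r(checkFileDocBlock("<?php\n/**\n * @package Foo\n */\n"));
```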
You will find that all the provided implementations of LineLengthSniff.php set the limit to 80/85 columns. For God’s sake, this is not the MS-DOS era anymore; let’s raise it to at least 100.
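The check itself is simple; the sketch below is a hypothetical standalone version with the 100-column limit suggested above, not the LineLengthSniff.php code. It counts bytes with strlen(), so multibyte characters and tab widths are not taken into account.

```php
<?php
// Hypothetical standalone check: report every line longer than $limit bytes.
// strlen() counts bytes, so multibyte text and tab stops are not handled here.
function findLongLines(string $source, int $limit = 100): array
{
    $long = [];
    foreach (preg_split('/\r\n|\r|\n/', $source) as $index => $line) {
        if (strlen($line) > $limit) {
            $long[$index + 1] = strlen($line); // 1-based line number => length
        }
    }
    return $long;
}

print_r(findLongLines("<?php\n" . str_repeat('x', 120) . "\n"));
```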
I also have a function syntax sniff to propose. An argument list can span multiple lines, and in some standards (ours, for instance) the comma may start a new line. So a space before the comma is still prohibited, but a line break is permitted:
fooFunction($arg1, $arg1, $arg1, $arg1, $arg1, $arg1, $arg1, $arg1
    , $arg2);
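A rough standalone version of that rule, using ext/tokenizer instead of the PHP CodeSniffer API; `checkSpaceBeforeComma()` is a hypothetical name. The key point is that the whitespace token before a comma is allowed only if it contains a line break.

```php
<?php
// Hypothetical standalone check (ext/tokenizer): whitespace directly before
// a comma is an error unless it contains a line break, so the comma-first
// style shown above is allowed.
function checkSpaceBeforeComma(string $source): array
{
    $errors = [];
    $line = 1;
    $previous = null;
    foreach (token_get_all($source) as $token) {
        $text = is_array($token) ? $token[1] : $token;
        if ($token === ',' && is_array($previous) && $previous[0] === T_WHITESPACE
            && strpos($previous[1], "\n") === false
        ) {
            $errors[] = "Line $line: space found before comma";
        }
        $line += substr_count($text, "\n");
        $previous = $token;
    }
    return $errors;
}

print_r(checkSpaceBeforeComma("<?php\nfooFunction(\$arg1 , \$arg2);\n"));
```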
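Oh, you will like this one. In PEAR’s original ValidFunctionNameSniff.php, protected methods are treated as public (isPublic) and must not be prefixed with an underscore. That is something to fix, since the Zend Framework standard requires both private and protected methods to start with an underscore.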
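The corrected rule can be sketched without PHP CodeSniffer as follows. This is a hypothetical standalone check, not a patch to the PEAR sniff: it determines visibility by walking back over modifier tokens, skips magic methods such as __construct(), and does not distinguish methods from functions outside classes (those count as public).

```php
<?php
// Hypothetical standalone sketch (ext/tokenizer, not the PEAR sniff itself):
// protected and private methods must start with "_", public ones must not.
function checkMethodNames(string $source): array
{
    $tokens = token_get_all($source);
    $modifiers = [T_WHITESPACE, T_STATIC, T_ABSTRACT, T_FINAL, T_PUBLIC, T_PROTECTED, T_PRIVATE];
    $errors = [];
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || $token[0] !== T_FUNCTION) {
            continue;
        }
        // Walk back over the modifiers to find the declared visibility.
        $scope = 'public'; // PHP's default when no visibility keyword is given
        for ($j = $i - 1; $j >= 0 && is_array($tokens[$j]) && in_array($tokens[$j][0], $modifiers, true); $j--) {
            if ($tokens[$j][0] === T_PROTECTED) {
                $scope = 'protected';
            } elseif ($tokens[$j][0] === T_PRIVATE) {
                $scope = 'private';
            }
        }
        // The next non-whitespace token is the name; closures have none.
        $k = $i + 1;
        while (isset($tokens[$k]) && is_array($tokens[$k]) && $tokens[$k][0] === T_WHITESPACE) {
            $k++;
        }
        if (!isset($tokens[$k]) || !is_array($tokens[$k]) || $tokens[$k][0] !== T_STRING) {
            continue;
        }
        $name = $tokens[$k][1];
        if (strpos($name, '__') === 0) {
            continue; // magic methods
        }
        $underscored = $name[0] === '_';
        if ($scope === 'public' && $underscored) {
            $errors[] = "Public method $name must not be prefixed with an underscore";
        } elseif ($scope !== 'public' && !$underscored) {
            $errors[] = ucfirst($scope) . " method $name must be prefixed with an underscore";
        }
    }
    return $errors;
}

print_r(checkMethodNames('<?php class A { protected function bar() {} protected function _baz() {} }'));
```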
You will probably also want to check the whitespace around string concatenation:
$string = "part1" . "part2"; // valid
$string = "part1"."part2"; // invalid
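In PHP’s token stream the concatenation operator is the single-character token ".", so a standalone check only needs to look at its neighbours. A hypothetical sketch with ext/tokenizer, not a PHP CodeSniffer sniff; the ".=" operator is a separate token (T_CONCAT_EQUAL) and is not checked here.

```php
<?php
// Hypothetical standalone check (ext/tokenizer): the "." concatenation
// operator must have whitespace on both sides. ".=" is a different token
// (T_CONCAT_EQUAL) and is ignored by this sketch.
function checkConcatenationSpacing(string $source): array
{
    $tokens = token_get_all($source);
    $errors = [];
    $line = 1;
    foreach ($tokens as $i => $token) {
        $text = is_array($token) ? $token[1] : $token;
        if ($token === '.') {
            $before = $tokens[$i - 1] ?? null;
            $after = $tokens[$i + 1] ?? null;
            $spaced = is_array($before) && $before[0] === T_WHITESPACE
                && is_array($after) && $after[0] === T_WHITESPACE;
            if (!$spaced) {
                $errors[] = "Line $line: concatenation operator must be surrounded by spaces";
            }
        }
        $line += substr_count($text, "\n");
    }
    return $errors;
}

print_r(checkConcatenationSpacing('<?php $string = "part1"."part2";'));
```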
Well, that’s all from me, and it is only an example. The set of sniffs you need will depend on the requirements of your own coding standard.