
I'm trying to get PHP_CodeSniffer to check for camelCase in class names; however, it seems to me that camelCase checking is impossible without a dictionary (one that includes techy words).

I've scoured the internet, but so far the only options I've seen work only if the string has some common delimiter to explode on, e.g. an underscore or a space between words.

And even this is not useful, as checking could only be accurate if the name always contained a delimiter between each word.
The point of "checking" is to determine whether the name is formatted incorrectly, and that could include not delimiting correctly.

Also, resources on PHP_CodeSniffer are either rare or so basic and technical that only the writer/developer would understand them.

Current Standard Sniff Checks

I've found this code in some of the current Sniffs (e.g. the Squiz and PEAR standards):

if (PHP_CodeSniffer::isCamelCaps($functionName, false, true, false) === false) 

However, I've looked at the PHP_CodeSniffer core code and this function only does the following:

// Check the first character first.
// Check that the name only contains legal characters.
// Check that there are not two capital letters next to each other.
// The character is a number, so it cant be a capital.

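These basic checks are better than nothing, but arguably useless for their intended purpose, as they do not really check for camelCase at all: they only validate the first character and the allowed characters, and (in strict mode) reject adjacent capitals.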

The Question

How can a Sniff (or any PHP script) know which "words" to check in a given string to determine whether the string is 100% camelCase?


EDIT

Examples

Correct camelCase: class calculateAdminLoginCount

// Not camelCase
class calculateadminlogincount

// Partially camelCase
class calculateadminLogincount

How can the isCamelCaps() function (or any PHP script for that matter) catch the above two examples?

How can the function or a PHP script identify separate words in a string when it has no concept of "words" unless it is fed that info (e.g. from a dictionary)?

Even if a script were to explode the string, what would it explode on?

Take class calculateadminLogincount.
How can any PHP script identify that "calculate", "admin", "Login" and "count" are separate words in that string, so that it can then check that the first letter of the first word is lowercase and the first letter of every subsequent word is uppercase?

isCamelCaps() function

public static function isCamelCaps(
    $string,
    $classFormat=false,
    $public=true,
    $strict=true
) {

        // Check the first character first.
        if ($classFormat === false) {
            $legalFirstChar = '';
            if ($public === false) {
                $legalFirstChar = '[_]';
            }

            if ($strict === false) {
                // Can either start with a lowercase letter, 
                // or multiple uppercase
                // in a row, representing an acronym.
                $legalFirstChar .= '([A-Z]{2,}|[a-z])';
            } else {
                $legalFirstChar .= '[a-z]';
            }
        } else {
            $legalFirstChar = '[A-Z]';
        }

        if (preg_match("/^$legalFirstChar/", $string) === 0) {
            return false;
        }

        // Check that the name only contains legal characters.
        $legalChars = 'a-zA-Z0-9';
        if (preg_match("|[^$legalChars]|", substr($string, 1)) > 0) {
            return false;
        }

        if ($strict === true) {
            // Check that there are not two capital letters 
            // next to each other.
            $length          = strlen($string);
            $lastCharWasCaps = $classFormat;

            for ($i = 1; $i < $length; $i++) {
                $ascii = ord($string[$i]);
                if ($ascii >= 48 && $ascii <= 57) {
                    // The character is a number, so it cant be a capital.
                    $isCaps = false;
                } else {
                    if (strtoupper($string[$i]) === $string[$i]) {
                        $isCaps = true;
                    } else {
                        $isCaps = false;
                    }
                }

                if ($isCaps === true && $lastCharWasCaps === true) {
                    return false;
                }

                $lastCharWasCaps = $isCaps;
            }
        }//end if

        return true;

    }//end isCamelCaps()
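To make the limitation concrete, here is a minimal standalone port of the strict, public, non-class-format path of the isCamelCaps() code above (strictCamelCapsPort is a hypothetical name, not part of PHP_CodeSniffer). It accepts all three example names, because nothing in it knows where one word ends and the next begins. The sniffs quoted earlier call isCamelCaps() with $strict set to false, which skips the adjacent-capitals loop entirely, so they are even more permissive.

```php
<?php
// Hypothetical standalone port of the strict ($strict = true), public,
// non-class-format path of the isCamelCaps() code quoted above.
function strictCamelCapsPort(string $string): bool
{
    // First char must be a lowercase letter.
    if (preg_match('/^[a-z]/', $string) === 0) {
        return false;
    }

    // Every char after the first must be a letter or digit.
    if (preg_match('/[^a-zA-Z0-9]/', substr($string, 1)) > 0) {
        return false;
    }

    // No two capitals next to each other; digits never count as capitals.
    $lastCharWasCaps = false;
    $length = strlen($string);
    for ($i = 1; $i < $length; $i++) {
        $char = $string[$i];
        $isCaps = !ctype_digit($char) && strtoupper($char) === $char;
        if ($isCaps && $lastCharWasCaps) {
            return false;
        }
        $lastCharWasCaps = $isCaps;
    }

    return true;
}

// All three names from the examples print "pass".
foreach (['calculateAdminLoginCount', 'calculateadminlogincount', 'calculateadminLogincount'] as $name) {
    echo $name, ': ', (strictCamelCapsPort($name) ? 'pass' : 'fail'), PHP_EOL;
}
```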

EDIT 2

A little info for those wondering if this is worthwhile or not, or if I'm just "messing around" and "having fun":

It is imperative that class names are correct throughout, because the file/folder structure and names must match the class names for the autoloader to work reliably.

While I of course have checks in the core code itself for handling cases where a script, class, etc. cannot be loaded, there's nothing wrong with an additional tool (PHP_CodeSniffer) that runs through all files and tells me where a potential issue may lie.
Even if it's just a second check, it also helps ensure the code base is tidy, correctly structured, and consistent throughout.

Andy Lester: Can you give some examples of strings that are causing problems, where something is identified as camelCase but isn't, or vice versa?
sjagr: I don't understand how the isCamelCaps tests are inadequate. What example fails those tests?
sjagr: Hmm, that's super tricky and probably impossible. The best I can think of is a "tolerance" limit that throws warnings if there is a possible (but not definite) failure. For example, if there are 20 characters and only one capital letter, that's a "possible failure."
James: @sjagr Character count might be a useful idea. It won't be accurate, as there can be long words, short words, etc., but coupled with the other checks it might be something I can play with. Whatever checks I end up with in total, they'll only be warnings/notices that can be ignored, or that may sometimes turn out to be useful.
sjagr: Exactly what I mean! Maybe combine it with the dictionary idea too. Generally, if you're programming well, you won't have lengthy words in a camelCase variable or class name.

2 Answers

Answer 1

You can analyze the function names for correct capitalization by breaking each name apart wherever the case changes. Look up each resulting sub-word in a dictionary, or in a dictionary plus a jargon file ('calc', 'url', 'admin', etc.; perhaps check the jargon file first). If any sub-word is not found, the capitalization is not correct.

You can use Solr or Elasticsearch to break the names apart for you with Lucene's WordDelimiterFilter, which creates sub-words wherever the case changes:

"PowerShot" => "Power", "Shot"
"LoginURL" => "Login", "URL"

You can either index the words directly in Solr or Elasticsearch and do your analysis later, or (at least in Elasticsearch) simply use the word delimiter token filter to break apart your input without actually storing the results.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
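If you'd rather not run a search engine just to split names, the case-change splitting described above can be roughly approximated in PHP with preg_split() and lookarounds. This is a sketch under that assumption (splitOnCaseChange is a hypothetical helper): it splits where a lowercase letter or digit is followed by a capital, and where an acronym is followed by a capitalised word, but it does not reproduce the filter's other options or its full set of split rules.

```php
<?php
// Rough regex approximation of splitting on case changes (not Lucene's
// WordDelimiterFilter itself). Split points:
//   1. between a lowercase letter or digit and an uppercase letter
//   2. between an uppercase letter and an uppercase+lowercase pair (acronym end)
function splitOnCaseChange(string $name): array
{
    return preg_split('/(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/', $name);
}

echo implode(' ', splitOnCaseChange('PowerShot')), PHP_EOL; // Power Shot
echo implode(' ', splitOnCaseChange('LoginURL')), PHP_EOL;  // Login URL
echo implode(' ', splitOnCaseChange('URLLogin')), PHP_EOL;  // URL Login
```

A name with no case changes, such as calcadminlogin, comes back as a single piece, which is exactly what the dictionary lookup then needs to catch.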

Example:

calcAdminLogin => calc Admin Login

calcadminlogin => calcadminlogin

If you have a supplemental dictionary that contains words like 'calc' and 'admin', then the first function name decomposes into three words that are all present in the dictionary, so its camel casing is correct.

In the second example, 'calcadminlogin' will not be found in the dictionary, so its camel casing is incorrect.
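The split-then-look-up approach from this answer can also be sketched in plain PHP, with no search engine involved. The word list here is a hypothetical stand-in for a real dictionary plus jargon file, hasValidCamelCase is a hypothetical name, and the splitting only breaks where a lowercase letter or digit is followed by a capital.

```php
<?php
// Sketch of the split-then-look-up check. $knownWords is a hypothetical
// dictionary + jargon list; a real one would be loaded from a word file.
function hasValidCamelCase(string $name, array $knownWords): bool
{
    // Split wherever a lowercase letter or digit is followed by an uppercase letter.
    $parts = preg_split('/(?<=[a-z0-9])(?=[A-Z])/', $name);

    // Build a lookup set of lowercased dictionary words.
    $dictionary = array_fill_keys(array_map('strtolower', $knownWords), true);

    foreach ($parts as $part) {
        if (!isset($dictionary[strtolower($part)])) {
            return false; // Unknown sub-word, so the casing is suspect.
        }
    }

    return true;
}

$words = ['calc', 'admin', 'login', 'count', 'calculate'];
var_dump(hasValidCamelCase('calcAdminLogin', $words)); // bool(true)
var_dump(hasValidCamelCase('calcadminlogin', $words)); // bool(false)
```

The same check also rejects the partially camelCased calculateadminLogincount, since it splits into "calculateadmin" and "Logincount", neither of which is in the word list.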

Answer 2

I've made a few scripts that try to "loosely" identify whether a class name is camelCase.

Some of the scripts I wrote won't help others, as they're too specific to my own naming conventions (I've not included them here).
My full collection of scripts makes it all worthwhile, but hopefully the more generic ones below will help someone else.

For example, I prefix class names with a lowercase word, so I check that the word after that prefix starts with an uppercase letter.
For those (most people) who do not prefix class names with a specific word, it's easy enough to check that the first char of the string is lowercase.
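A minimal sketch of that first check, covering both conventions. The helper name hasValidStart and the example prefix 'app' are hypothetical; substitute whatever prefix your own standard uses.

```php
<?php
// Hypothetical helper: checks the start of a class name.
// With no $prefix, the first char must be lowercase.
// With a $prefix, the name must start with it and the next char must be uppercase.
function hasValidStart(string $name, string $prefix = ''): bool
{
    if ($prefix === '') {
        return preg_match('/^[a-z]/', $name) === 1;
    }

    return strncmp($name, $prefix, strlen($prefix)) === 0
        && preg_match('/^[A-Z]/', substr($name, strlen($prefix))) === 1;
}

var_dump(hasValidStart('calculateAdminLoginCount')); // bool(true)
var_dump(hasValidStart('appUserLogin', 'app'));      // bool(true)
var_dump(hasValidStart('appuserLogin', 'app'));      // bool(false)
```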

Criticisms very welcome.


Only allow mixed case alpha

This makes sure the class name only contains upper or lower case letters (A-Z, a-z), which the camelCase checks rely on (if you remove this check, you'll need to change the other checks to accommodate potential non-alpha chars).

/** Check string is only alpha (A-Z, a-z) */
if (ctype_alpha($name) === false) {
  $error = '%s name must only contain alpha chars (A-Z, a-z)';
  $phpcsFile->addError($error, $stackPtr, 'AlphaChars', $errorData);
  return;
}
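Note that ctype_alpha() follows the current locale: in the standard C locale it accepts exactly A-Z and a-z, but under another locale it may accept other letters. If you want the check to be locale-independent, a regex does the same job. This is a sketch (isAsciiAlphaOnly is a hypothetical name); it anchors with \z rather than $ so that a trailing newline isn't accepted.

```php
<?php
// Locale-independent alpha-only check: exactly A-Z and a-z, at least one char.
// \z anchors at the true end of the string ($ would allow a trailing "\n").
function isAsciiAlphaOnly(string $name): bool
{
    return preg_match('/^[A-Za-z]+\z/', $name) === 1;
}

var_dump(isAsciiAlphaOnly('calculateAdminLoginCount')); // bool(true)
var_dump(isAsciiAlphaOnly('calculate_admin'));          // bool(false)
```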

No two uppercase chars together

Some standards allow acronyms etc.; however, my standards do not, as they're not strict camelCase and break the flow of reading.

e.g. userSitePHPLogin is invalid, and userSitePhpLogin is valid.

(There's probably a more elegant way of doing this, but it works fine, and given it's for PHP_CodeSniffer, I don't need micro-optimisation.)

/** Check for uppercase chars together */
$nameUppercaseExplode = preg_split('/(?=[A-Z])/', $name);
$totalIllegalUpperChars = 0;

foreach ($nameUppercaseExplode as $namePiece) {
  // A one-char uppercase piece means that capital is directly followed by
  // another capital (or ends the name). ctype_upper() skips a lone lowercase
  // first piece such as the "a" in "aUser".
  if (strlen($namePiece) === 1 && ctype_upper($namePiece)) {
    $totalIllegalUpperChars++;
  }
}

if ($totalIllegalUpperChars > 0) {
  $warning = 'Class name seems invalid; Total '.$totalIllegalUpperChars
    .' uppercase chars not part of camelCase';
  $phpcsFile->addWarning($warning, $stackPtr, 'UppercaseTogether', $errorData);
}

e.g. class name DUserPHPUserclassLogin returns:

Class name seems invalid; Total 4 uppercase chars not part of camelCase

It's not perfect, as the count is off by one for that name.
But it only returns a warning if there is at least one occurrence of adjacent uppercase chars.

e.g. class name classDUserPhpUserLogin returns:

Class name seems invalid; Total 1 uppercase chars not part of camelCase

So this at least prompts the dev to check the name and fix it as appropriate.
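As for a more elegant way: one option is to match each run of two or more adjacent capitals directly with preg_match_all(), which also shows the developer exactly which part of the name to fix. This is a sketch (findUppercaseRuns is a hypothetical helper); it reports runs rather than a per-char count, so it is not a drop-in replacement for the counting logic above.

```php
<?php
// Return every run of two or more adjacent uppercase chars in $name.
function findUppercaseRuns(string $name): array
{
    preg_match_all('/[A-Z]{2,}/', $name, $matches);
    return $matches[0]; // Full-pattern matches; empty array if none.
}

echo implode(', ', findUppercaseRuns('DUserPHPUserclassLogin')), PHP_EOL; // DU, PHPU
echo count(findUppercaseRuns('userSitePhpLogin')), PHP_EOL;              // 0
```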


Check if total uppercase chars less than total words

Thanks to sjagr for the idea.

"Total words" is of course a guessed figure, based on an average of 5 chars per word, because the average English word length seems to be around 4.7 chars.

/** Loose check if total (guessed) words not match total uppercase chars */
$totalWordsGuess = ceil(strlen($name) / 5);
$totalUpperChars = strlen(preg_replace('![^A-Z]+!', '', $name));

// Pointless if only 1 word (camelCase not exist)
if ($totalWordsGuess >1) {

  // Remove the first word which should be lowercase
  // (first word should be checked in separate check above this one)
  $totalWordsGuess--;

  if ($totalUpperChars < $totalWordsGuess) {
    $warning = 'Expected '.$totalWordsGuess.' camelCase words in class name; '
      .'Found '.$totalUpperChars;
    $phpcsFile->addWarning($warning, $stackPtr, 'BadCamelCase', $errorData);
  }

}

I've tested it and it works quite well (it only raises a warning for potential issues).

For example, using class name UserLoginToomanywordsWithoutcamelCase, PHP_CodeSniffer returns:

Expected 7 camelCase words in class name; Found 5

If too many false positives are returned (different devs use different words etc), then tweak the current "5" up or down a notch.
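The same heuristic can be pulled out into a standalone function, which makes it easy to experiment with the average word length outside of PHP_CodeSniffer. This is a sketch (camelCaseWordEstimate is a hypothetical name) that mirrors the logic above and returns null when no warning would be raised.

```php
<?php
// Standalone sketch of the word-count heuristic. $avgWordLength is the
// tunable "5". Returns ['expected' => int, 'found' => int] when the name
// looks under-capitalised, or null when no warning would be raised.
function camelCaseWordEstimate(string $name, int $avgWordLength = 5): ?array
{
    $totalWordsGuess = (int) ceil(strlen($name) / $avgWordLength);
    $totalUpperChars = strlen(preg_replace('/[^A-Z]+/', '', $name));

    // A single (guessed) word cannot be camelCase, so skip the check.
    if ($totalWordsGuess <= 1) {
        return null;
    }

    // The first word is lowercase, so it contributes no uppercase char.
    $expected = $totalWordsGuess - 1;

    return $totalUpperChars < $expected
        ? ['expected' => $expected, 'found' => $totalUpperChars]
        : null;
}

$result = camelCaseWordEstimate('UserLoginToomanywordsWithoutcamelCase');
if ($result !== null) {
    echo "Expected {$result['expected']} camelCase words; found {$result['found']}", PHP_EOL;
}
// Expected 7 camelCase words; found 5
```

With the default of 5, even a correctly cased name such as calculateAdminLoginCount is flagged (expected 4, found 3), while an average of 6 lets it through, which shows how sensitive the heuristic is to that number.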

Edit: I've updated the script above:

  • Added a condition so the check only runs if there is more than one (guessed) word, as a single word cannot be camelCase.
  • Added code to deduct 1 from the guessed word total ($totalWordsGuess--), because the first word is lowercase and so contributes no uppercase char.

You should have a separate check before this one that verifies the first word and returns early if it is not lowercase.