At Houzz, we’re constantly identifying new ways to ensure the best possible experience for our community. One way to do that is by testing our systems for hidden bugs. Recently, we developed our own tool, TAP (Type Analyzer for PHP), to add static analysis to our dynamically typed PHP code so that we can catch issues before they are released to the public.
Static vs. dynamic languages
There are two main categories of programming languages: statically typed languages like C++ and Java, and dynamically typed languages like Python, PHP and JavaScript. Dynamic languages offer fast development, require no compilation step and allow flexible design patterns. However, this flexibility comes at the cost of safety. In a large codebase, it’s very easy to make mistakes like passing a map to a function that expects a list, or calling a member function $a->foo() when $a is actually null. Bugs like these typically hide in a particular code path and go undetected until that path breaks in production, causing a negative experience for users.
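As a hedged illustration of the null case described above, here is a minimal sketch. The names User, findUser and greet are hypothetical and do not come from the Houzz codebase, and the sketch assumes PHP 7 or later, where calling a method on null throws an Error. The bug only surfaces when the lookup misses, so a test that exercises only known ids would never trigger it.

```php
<?php
// Hypothetical example: none of these names come from the Houzz codebase.

class User
{
    /** @var string */
    private $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function getName(): string
    {
        return $this->name;
    }
}

/**
 * Looks up a user by id; returns null when the id is unknown.
 *
 * @param int $id
 * @return User|null
 */
function findUser(int $id)
{
    $users = [1 => new User('Ada')];
    return isset($users[$id]) ? $users[$id] : null;
}

/**
 * Bug: the result of findUser() is used without a null check.
 *
 * @param int $id
 * @return string
 */
function greet(int $id): string
{
    $user = findUser($id);
    // Fails at runtime whenever $user is null.
    return 'Hello, ' . $user->getName();
}

echo greet(1), PHP_EOL; // id 1 exists, so this path works

try {
    echo greet(2), PHP_EOL; // id 2 is unknown, so $user is null
} catch (Error $e) {
    echo 'Runtime failure: ', $e->getMessage(), PHP_EOL;
}
```

Because findUser() is annotated as returning User|null, a static checker that reads the annotation has enough information to question the unguarded call in greet() without ever running the code.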
Since a number of components of the Houzz website are written in PHP (a dynamic language), we sought a tool that could identify these types of bugs before new versions of code are released. The available tools we found either lacked the sophistication to catch these bugs or required us to completely rewrite our codebase in a new syntax and reinstall our local and production environments. So, we developed TAP.
The solution
TAP is a C++ tool that scans a PHP codebase, including its PHPDoc comments, to identify irregularities. It is less demanding than other available tools: it introduces no new syntax, so you don’t have to reinstall your local and production environments, and it can run checks without impacting daily user activities. TAP can scan the entire Houzz PHP codebase and produce a report in eight minutes.
How it works
Static analysis combines the flexibility of dynamic languages with the rigor of static languages. By checking the code statically during the development stage, TAP deduces the types of local variables, tests the compatibility of function arguments and checks whether a variable is nullable in order to identify existing bugs.
TAP uses PHPDoc for type declarations. PHPDoc blocks are multi-line comments delimited by /** and */ that contain annotations starting with @. IDEs like PhpStorm also use PHPDoc for a similar purpose.
The most useful annotations are @param (for function param type), @return (for function return type) and @var (for object property type). Here is an example:
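A minimal sketch of how these three annotations might look in practice. The Cart class and its members are hypothetical, and the sketch does not claim to reproduce the exact types or error names that TAP reports.

```php
<?php
// Hypothetical example: illustrates @var, @param and @return only.

class Cart
{
    /** @var float[] prices of the items in the cart */
    private $prices = [];

    /**
     * @param float $price price of the item to add
     * @return void
     */
    public function add($price)
    {
        $this->prices[] = $price;
    }

    /**
     * @return float sum of all item prices
     */
    public function total()
    {
        return array_sum($this->prices);
    }
}

$cart = new Cart();
$cart->add(10.5);
$cart->add(4.5);
echo $cart->total(), PHP_EOL;

// A call such as $cart->add(['10.5']) would contradict the
// "@param float" annotation. PHP itself accepts the call because the
// parameter has no native type hint, which is why an annotation-aware
// checker is needed to catch the mismatch before the code runs.
```

The method signatures here carry no native PHP type hints, so the PHPDoc comments are the only place the intended types are written down.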
This is the resulting example report:
TAP supports four more precise types than PHP itself. To learn more about the types of errors TAP can detect, review the test PHP files at this link.
TAP Usage
TAP supports three running modes:
Single: This mode is typically used to demonstrate TAP’s basic functionality and for TAP’s self-inspection tests. Use -f to specify the single PHP file you want to check. Please note that if this file uses any classes, functions or constants defined elsewhere, TAP won’t know about them and will report errors like DEFINITION_NOT_FOUND.
Batch: This mode does a full scan of the whole repository. If the -s argument is specified, TAP takes it as the source root and uses the .tap config file there. If -s isn’t specified, TAP looks for the first .tap file at or above the current directory and checks all PHP files under the location where that .tap file resides. Here is an example of a .tap config file:
You can specify which directories to skip and which directories should be scan-only. “ScanOnly” means that TAP checks only the function signatures and not the implementations, which is much faster than a full scan. It can be applied to third-party libraries and auto-generated classes.
You can use -r to specify the human-readable error report file, and -d to specify the SQLite database file, which is intended to be read by a web UI tool.
Daemon: Daemon is an experimental mode. It will run interactively during PHP development. After it is started, it will do a quick scan for the whole repository, only recording the types of class properties and function signatures, and skipping the function implementation. It will continue watching for any file changes and updating the recorded signatures in real time.
- If, for example, you think your change is ready and want TAP to do a full scan before you commit, you can explicitly tell TAP to do so on the files and directories you touched.
- This mode will ultimately become much faster than Batch mode, and more suitable for the development process.
- Note: At the time this blog post was published, Daemon mode was under development. Errors may be reported when using this mode.
Open Sourcing TAP
Once we completed TAP and tested it in our environment, we decided to open source the application. Other companies can use TAP on top of their PHP to check their own code for bugs before releasing updates to their customers. Follow this link to access the open source code.
Getting TAP
Developers can use the pre-built binary directly by downloading the Mac OS X version here. Before running the tool, make it executable with chmod +x tap_server.
To build TAP, use cmake. Developers may need to install the dependencies beforehand (using brew, apt-get, yum or another package manager), including:
- Boost
- Folly
- Glog
- Gflags
- Sqlite3
- Fswatch
Then create a separate directory for the build and change into it (for example, mkdir build && cd build).
Assuming the code is at /houzz/tap, run cmake /houzz/tap to generate the makefile, then run make to build.
Maximizing TAP
To make full use of TAP’s capabilities, it’s important to annotate classes as much as possible. TAP takes these annotations as the source of truth, deducing the types of all local variables inside functions and reporting type incompatibilities. The more annotations you provide for your methods and properties, the better TAP will be!
We’re always on the hunt for engineers to help us make the Houzz experience even better. Check out opportunities on our team at houzz.com/jobs.