Several days ago, GitHub unveiled the addition of an experimental machine learning system to its code scanning service to identify common types of vulnerabilities in code. With this, GitHub's CodeQL-based code analysis technology has been revamped and now uses machine learning (ML) to find potential security vulnerabilities in code.
GitHub obtained the CodeQL technology as part of its acquisition of Semmle. CodeQL is used by security research teams to perform semantic analysis of code, and GitHub made it open source.
With the machine learning models, CodeQL can identify more flows of untrusted user data and therefore more potential security vulnerabilities.
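To make the idea of an "untrusted data flow" concrete, the following is a toy sketch, not CodeQL's actual algorithm or API. It assumes a hand-written data-flow graph and asks which dangerous operations (sinks) can be reached from user-controlled inputs (sources). The function name `tainted_sinks` and all node names are hypothetical and chosen only for illustration.

```python
from collections import deque


def tainted_sinks(edges, sources, sinks):
    """Return the sinks reachable from any untrusted source (breadth-first search)."""
    # Build an adjacency list from (from_node, to_node) data-flow edges.
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    # A sink is flagged only if tainted data can reach it.
    return sorted(seen & set(sinks))


# Hypothetical data-flow edges for a small web handler.
edges = [
    ("req.query.name", "name"),   # user input is copied into a variable
    ("name", "html"),             # the variable is interpolated into HTML
    ("html", "res.send"),         # the HTML is sent to the browser
    ("config.title", "title"),    # trusted configuration value
    ("title", "db.query"),        # trusted value used in a query
]

flagged = tainted_sinks(edges, sources=["req.query.name"],
                        sinks=["res.send", "db.query"])
print(flagged)  # ['res.send']
```

In this sketch, deciding which nodes count as sources and sinks is done by hand. The article's point is that the ML models help recognize such sources and sinks in libraries for which nobody has written that list manually.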
The use of machine learning has significantly expanded the range of problems that can be identified: the analysis is no longer limited to checking typical patterns and is not tied to known frameworks.
The problems identified by the new system include errors leading to cross-site scripting (XSS), path traversal (for example, through "/.." sequences in file paths), and SQL and NoSQL injection.
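As an illustration of the file path problem mentioned above, here is a minimal sketch of how a "/.." sequence can escape an intended directory and one common way to guard against it. It is written in Python purely for illustration; the helper name `resolve_inside` and the directory `/srv/uploads` are assumptions, not part of GitHub's tooling.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments, so an escape becomes visible here.
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


# Hypothetical upload directory and requests.
for requested in ["reports/q1.txt", "../../etc/passwd"]:
    try:
        print("allowed:", resolve_inside("/srv/uploads", requested))
    except ValueError as err:
        print("rejected:", err)
```

A scanner looks for the opposite situation: user-controlled input that reaches a file operation without passing through a check of this kind.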
GitHub's new tool was released as a free public beta for all users. The feature uses machine learning and deep learning to scan codebases and identify common security vulnerabilities before a product is shipped.
According to GitHub, with the rapid evolution of the open source ecosystem, there is an ever-growing long tail of libraries that are used less frequently. GitHub uses examples from manually created CodeQL queries to train deep learning models to recognize open source libraries as well as internally developed closed source libraries.
The tool is designed to look for the four most common vulnerabilities that affect projects written in JavaScript and TypeScript: cross-site scripting (XSS), path injection, NoSQL injection and SQL injection.
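To show what one of these vulnerability classes looks like in practice, here is a small sketch of SQL injection using Python's standard `sqlite3` module. It is written in Python purely for illustration. The table, the function names `find_user_unsafe` and `find_user_safe`, and the payload are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a1"), ("bob", "b2")])


def find_user_unsafe(conn, name):
    # Vulnerable: untrusted input is concatenated into the SQL text.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Safer: the value is passed as a bound parameter, never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?",
                        (name,)).fetchall()


payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: the condition matches every row
print(find_user_safe(conn, payload))         # []: no user has that literal name
```

The unsafe variant is the kind of flow a scanner aims to flag: a value controlled by the user reaches a query string without being bound as a parameter.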
The code scanning service allows you to detect vulnerabilities at an early stage of development by scanning each git push operation for potential issues.
The result is attached directly to the pull request. Previously, the check relied on the CodeQL engine alone, which matches code against patterns derived from typical examples of vulnerable code (CodeQL lets you write a template describing vulnerable code and then use it to detect similar vulnerabilities in the code of other projects).
The new machine learning engine can identify previously unknown vulnerabilities because it is not tied to an enumeration of code patterns that describe specific vulnerabilities. The price of this capability is an increase in the number of false positives compared to CodeQL-based checks.
Finally, for those interested in knowing more about it, the details are available at the following link.