GitHub launched a machine learning system to find vulnerabilities in code

A few days ago GitHub announced the addition of an experimental machine learning system to its code scanning service to identify common types of vulnerabilities in code. With this change, GitHub's CodeQL-based code analysis technology now also uses machine learning (ML) to find potential security vulnerabilities.

GitHub obtained the CodeQL technology through its acquisition of Semmle. CodeQL is used by security research teams to perform semantic analysis of code, and GitHub publishes its queries as open source.

With these models, CodeQL can identify more flows of untrusted user data and therefore more potential security vulnerabilities.

The use of machine learning has significantly expanded the range of problems that can be identified: the analysis is no longer limited to checking typical patterns and is not tied to known frameworks.

Among the problems the new system identifies are errors that lead to cross-site scripting (XSS), manipulation of file paths (for example, through "/.." sequences), and injection into SQL and NoSQL queries.
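To make the "/.." case concrete, here is a minimal sketch of the kind of check whose absence leads to a path injection bug. The helper names `normalizeSegments` and `isInsideBase` are hypothetical and written only for this illustration; the sketch assumes POSIX-style paths with "/" separators and is not taken from GitHub's tooling.

```typescript
// Hypothetical helper: splits a POSIX-style path into segments and
// resolves "." and ".." so the result can be compared safely.
function normalizeSegments(p: string): string[] {
  const out: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue; // skip empty and current-dir segments
    if (seg === "..") {
      out.pop(); // step up one directory (no-op when already at the root)
    } else {
      out.push(seg);
    }
  }
  return out;
}

// Hypothetical check: true only if baseDir + "/" + userPath still points
// inside baseDir after ".." segments are resolved.
function isInsideBase(baseDir: string, userPath: string): boolean {
  const base = normalizeSegments(baseDir);
  const full = normalizeSegments(baseDir + "/" + userPath);
  if (full.length < base.length) return false;
  // Compare whole segments, so "/uploads-evil" is not mistaken for "/uploads".
  return base.every((seg, i) => full[i] === seg);
}
```

With a base directory of "/var/www/uploads", a request for "avatar.png" stays inside the base, while "../../etc/passwd" escapes it and is rejected. In real Node.js code the same idea is usually built on `path.resolve` rather than a hand-written normalizer.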

Code scanning can now find more potential security vulnerabilities by leveraging a new deep learning model. This experimental feature is available in public beta for JavaScript and TypeScript repositories on

GitHub's new tool was released as a free public beta for all users. The feature uses machine learning and deep learning to scan codebases and identify common security vulnerabilities before a product is shipped.

The experimental feature is currently available to all platform users, including GitHub Enterprise users as a GitHub Advanced Security feature, and can be used for projects written in JavaScript or TypeScript.

With the rapid evolution of the open source ecosystem, there is an ever-increasing long tail of libraries that are used less frequently. We use examples from manually created CodeQL queries to train deep learning models to recognize open source libraries as well as internally developed closed source libraries.
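The long-tail problem described above can be illustrated with a toy model. This is not how CodeQL works internally; it is only a sketch, under the assumption that a hand-written check knows a fixed list of dangerous functions. The names `Call`, `knownSinks`, `flagByKnownSinks`, and the library function `rareOrm.rawSql` are all invented for the example.

```typescript
// Toy model, not CodeQL: each call records which function received data
// and whether that data originated from user input.
interface Call {
  callee: string;
  argFromUser: boolean;
}

// Hypothetical hand-written list of functions known to be dangerous sinks.
const knownSinks: string[] = ["db.query", "fs.readFile"];

// Flags only calls where user data reaches a function on the known list.
function flagByKnownSinks(calls: Call[]): Call[] {
  return calls.filter(
    (c) => c.argFromUser && knownSinks.indexOf(c.callee) !== -1
  );
}

const calls: Call[] = [
  { callee: "db.query", argFromUser: true },        // flagged
  { callee: "rareOrm.rawSql", argFromUser: true },  // missed: not on the list
  { callee: "db.query", argFromUser: false },       // safe: no user data
];

const flagged = flagByKnownSinks(calls);
```

Here only the first call is flagged. The call into the rarely used library goes unnoticed even though user data reaches it, which is the gap a model trained on existing queries is meant to narrow by predicting likely sinks in unfamiliar libraries.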

The tool is designed to look for the four most common vulnerabilities that affect projects written in these two languages: cross-site scripting (XSS), path injection, NoSQL injection and SQL injection.
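As an illustration of one of these four classes, the following sketch contrasts a query built by string concatenation with one that keeps user input separate from the SQL text. The function names and the `users` table are hypothetical, and no database is involved: the `{ text, values }` object is just a plain value that resembles what some drivers accept for parameterized queries.

```typescript
interface ParamQuery {
  text: string;
  values: string[];
}

// Vulnerable: user input becomes part of the SQL text itself,
// so quotes in the input can change the structure of the query.
function unsafeQuery(name: string): string {
  return "SELECT * FROM users WHERE name = '" + name + "'";
}

// Safer: the SQL text is constant and the input travels separately
// as a parameter, so it is never parsed as SQL.
function paramQuery(name: string): ParamQuery {
  return { text: "SELECT * FROM users WHERE name = $1", values: [name] };
}

// A classic injection payload used only to show the difference.
const payload = "' OR '1'='1";
const unsafeSql = unsafeQuery(payload);
const safe = paramQuery(payload);
```

With this payload, `unsafeSql` ends in `name = '' OR '1'='1'`, a condition that matches every row, while `safe.text` is unchanged and the payload sits untouched in `safe.values`.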

The code scanning service allows you to detect vulnerabilities at an early stage of development by scanning each git push operation for potential issues.

The result is attached directly to the pull request. Previously, the check relied solely on the CodeQL engine, which matches code against patterns derived from typical examples of vulnerable code (CodeQL lets you write a query that describes a vulnerable pattern and then detect similar vulnerabilities in the code of other projects).

With new analysis capabilities, Code Scanning can generate even more alerts for four common vulnerability patterns: Cross-Site Scripting (XSS), Path Injection, NoSQL Injection, and SQL Injection. Together, these four vulnerability types represent many of the recent vulnerabilities (CVEs) in the JavaScript/TypeScript ecosystem, and improving the ability of code scanning to detect such vulnerabilities early in the development process is key to helping developers write more secure code.

The new machine learning engine can identify previously unknown vulnerabilities because it does not depend on enumerating the code patterns that describe specific vulnerabilities. The price of this capability is a higher number of false positives compared with the purely CodeQL-based checks.
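The trade-off mentioned above is commonly expressed as precision (the share of alerts that are real) versus recall (the share of real issues that are found). The sketch below shows the arithmetic; the counts are invented purely for illustration and are not GitHub's measurements.

```typescript
interface ScanCounts {
  truePositives: number;  // alerts that are real vulnerabilities
  falsePositives: number; // alerts that are not real vulnerabilities
  falseNegatives: number; // real vulnerabilities that raised no alert
}

// Share of raised alerts that are real (assumes at least one alert).
function precision(c: ScanCounts): number {
  return c.truePositives / (c.truePositives + c.falsePositives);
}

// Share of real vulnerabilities that were found (assumes at least one exists).
function recall(c: ScanCounts): number {
  return c.truePositives / (c.truePositives + c.falseNegatives);
}

// Hypothetical counts, invented for illustration only.
const patternOnly: ScanCounts = { truePositives: 6, falsePositives: 2, falseNegatives: 6 };
const withMl: ScanCounts = { truePositives: 9, falsePositives: 6, falseNegatives: 3 };
```

In this made-up scenario the pattern-only scan has precision 0.75 and recall 0.5, while the scan with additional ML alerts has precision 0.6 and recall 0.75: more real issues are found, at the cost of more alerts that turn out to be harmless.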

Finally, those interested in knowing more can check the details at the following link.

It is also important to mention that, during the testing stage, the new functionality is only available for repositories with JavaScript and TypeScript code.
