@@ -0,0 +1,4 @@
---
category: feature
---
* Data flow barriers and barrier guards can now be added using data extensions. For more information, see `Customizing library models for C and C++ <https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-cpp/>`__.
@@ -0,0 +1,4 @@
---
category: feature
---
* Data flow barriers and barrier guards can now be added using data extensions. For more information, see `Customizing library models for C# <https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-csharp/>`__.
@@ -58,6 +58,8 @@ The CodeQL library for CPP analysis exposes the following extensible predicates:
- ``sourceModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance)``. This is used to model sources of potentially tainted data. The ``kind`` of the sources defined using this predicate determines which threat model they are associated with. Different threat models can be used to customize the sources used in an analysis. For more information, see ":ref:`Threat models <threat-models-cpp>`."
- ``sinkModel(namespace, type, subtypes, name, signature, ext, input, kind, provenance)``. This is used to model sinks where tainted data may be used in a way that makes the code vulnerable.
- ``summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance)``. This is used to model flow through elements.
- ``barrierModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance)``. This is used to model barriers, which are elements that stop the flow of taint.
- ``barrierGuardModel(namespace, type, subtypes, name, signature, ext, input, acceptingvalue, kind, provenance)``. This is used to model barrier guards, which are elements that can stop the flow of taint depending on a conditional check.

The extensible predicates are populated using the models defined in data extension files.

@@ -176,6 +178,88 @@ The remaining values are used to define the input and output specifications, the
- The ninth value ``"taint"`` is the kind of the flow. ``taint`` means that taint is propagated through the call.
- The tenth value ``"manual"`` is the provenance of the summary, which is used to identify the origin of the summary model.

Example: Taint barrier using the ``mysql_real_escape_string`` function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This example shows how the CPP query pack models the ``mysql_real_escape_string`` function as a barrier for SQL injection.
This function escapes special characters in a string for use in an SQL statement, which prevents SQL injection attacks.

.. code-block:: cpp

   const char *query = "SELECT * FROM users WHERE name = '%s'";
   char *name = get_untrusted_input();
   char *escaped_name = new char[2 * strlen(name) + 1];
   mysql_real_escape_string(mysql, escaped_name, name, strlen(name)); // escaped_name is now safe to use in an SQL statement.
   sprintf(query_buffer, query, escaped_name);

We need to add a tuple to the ``barrierModel``\(namespace, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file.

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/cpp-all
         extensible: barrierModel
       data:
         - ["", "", False, "mysql_real_escape_string", "", "", "Argument[*1]", "sql-injection", "manual"]

Since we are adding a barrier, we need to add a tuple to the ``barrierModel`` extensible predicate.
The first five values identify the callable (in this case a free function) to be modeled as a barrier.

- The first value ``""`` is the namespace name.
- The second value ``""`` is the name of the type (class) that contains the method. Because we're modeling a free function, the type is left blank.
- The third value ``False`` is a flag that indicates whether or not the barrier also applies to all overrides of the method. For a free function, this should be ``False``.
- The fourth value ``"mysql_real_escape_string"`` is the function name.
- The fifth value ``""`` is the function input type signature, which can be used to distinguish between functions that have the same name. It is left empty in this example.

The sixth value should be left empty and is out of scope for this documentation.
The remaining values are used to define the output specification, the ``kind``, and the ``provenance`` (origin) of the barrier.

- The seventh value ``"Argument[*1]"`` is the output specification, which means in this case that the barrier is the first indirection (or pointed-to value, ``*``) of the second argument (``Argument[1]``) passed to the function.
- The eighth value ``"sql-injection"`` is the kind of the barrier. The barrier kind is used to define the queries where the barrier is in scope.
- The ninth value ``"manual"`` is the provenance of the barrier, which is used to identify the origin of the barrier model.
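
The same pattern applies when the sanitized value is returned rather than written to an output buffer. As a minimal sketch, assume a hypothetical free function ``escape_sql`` in a hypothetical namespace ``mylib`` (neither is part of any real library) that returns a pointer to a newly allocated, escaped copy of its argument. Under that assumption, the barrier would be the pointed-to value of the return value, ``ReturnValue[*]``:

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/cpp-all
         extensible: barrierModel
       data:
         # Hypothetical function, for illustration only:
         #   char *mylib::escape_sql(const char *input);
         # The string that the returned pointer points to is treated as safe.
         - ["mylib", "", False, "escape_sql", "", "", "ReturnValue[*]", "sql-injection", "manual"]

Only the callable identification and the output specification differ from the ``mysql_real_escape_string`` model. The ``kind`` stays ``sql-injection``, so the barrier would apply to the same queries.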

Example: Add a barrier guard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This example shows how to model a barrier guard that stops the flow of taint when a conditional check is performed on data.
A barrier guard model is used when a function returns a boolean that indicates whether the data is safe to use.
Consider a function called ``is_safe`` which returns ``true`` when the data is considered safe.

.. code-block:: cpp

   if (is_safe(user_input)) { // The check guards the use, so the input is safe.
     mysql_query(mysql, user_input); // This is safe.
   }

We need to add a tuple to the ``barrierGuardModel``\(namespace, type, subtypes, name, signature, ext, input, acceptingvalue, kind, provenance) extensible predicate by updating a data extension file.

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/cpp-all
         extensible: barrierGuardModel
       data:
         - ["", "", False, "is_safe", "", "", "Argument[*0]", "true", "sql-injection", "manual"]

Since we are adding a barrier guard, we need to add a tuple to the ``barrierGuardModel`` extensible predicate.
The first five values identify the callable (in this case a free function) to be modeled as a barrier guard.

- The first value ``""`` is the namespace name.
- The second value ``""`` is the name of the type (class) that contains the method. Because we're modeling a free function, the type is left blank.
- The third value ``False`` is a flag that indicates whether or not the barrier guard also applies to all overrides of the method. For a free function, this should be ``False``.
- The fourth value ``"is_safe"`` is the function name.
- The fifth value ``""`` is the function input type signature, which can be used to distinguish between functions that have the same name. It is left empty in this example.

The sixth value should be left empty and is out of scope for this documentation.
The remaining values are used to define the input specification, the ``accepting value``, the ``kind``, and the ``provenance`` (origin) of the barrier guard.

- The seventh value ``Argument[*0]`` is the input specification (the value being validated). In this case, it is the first indirection (or pointed-to value, ``*``) of the first argument (``Argument[0]``) passed to the function.
- The eighth value ``true`` is the accepting value of the barrier guard. This is the value that the conditional check must return for the barrier to apply.
- The ninth value ``sql-injection`` is the kind of the barrier guard. The barrier guard kind is used to define the queries where the barrier guard is in scope.
- The tenth value ``manual`` is the provenance of the barrier guard, which is used to identify the origin of the barrier guard.
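
The accepting value does not have to be ``true``. As a sketch, assume a hypothetical function ``contains_sql_metachars`` (not part of any real library) that returns ``true`` when its string argument contains characters that are unsafe in an SQL statement. The data is then safe on the branch where the check returns ``false``, so the accepting value would be ``false``:

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/cpp-all
         extensible: barrierGuardModel
       data:
         # Hypothetical function, for illustration only:
         #   bool contains_sql_metachars(const char *input);
         # The pointed-to value of the first argument is safe when the call returns false.
         - ["", "", False, "contains_sql_metachars", "", "", "Argument[*0]", "false", "sql-injection", "manual"]

With this model, the guarded value would be treated as safe inside ``if (!contains_sql_metachars(user_input)) { ... }`` but not in the corresponding ``else`` branch.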

.. _threat-models-cpp:

Threat models
@@ -58,6 +58,8 @@ The CodeQL library for C# analysis exposes the following extensible predicates:
- ``sourceModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance)``. This is used to model sources of potentially tainted data. The ``kind`` of the sources defined using this predicate determines which threat model they are associated with. Different threat models can be used to customize the sources used in an analysis. For more information, see ":ref:`Threat models <threat-models-csharp>`."
- ``sinkModel(namespace, type, subtypes, name, signature, ext, input, kind, provenance)``. This is used to model sinks where tainted data may be used in a way that makes the code vulnerable.
- ``summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance)``. This is used to model flow through elements.
- ``barrierModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance)``. This is used to model barriers, which are elements that stop the flow of taint.
- ``barrierGuardModel(namespace, type, subtypes, name, signature, ext, input, acceptingvalue, kind, provenance)``. This is used to model barrier guards, which are elements that can stop the flow of taint depending on a conditional check.
- ``neutralModel(namespace, type, name, signature, kind, provenance)``. This is similar to a summary model but used to model the flow of values that have only a minor impact on the dataflow analysis. Manual neutral models (those with a provenance such as ``manual`` or ``ai-manual``) can be used to override generated summary models (those with a provenance such as ``df-generated``), so that the summary model will be ignored. Other than that, neutral models have no effect.

The extensible predicates are populated using the models defined in data extension files.
@@ -307,6 +309,90 @@ For the remaining values for both rows:

That is, the first row specifies that values can flow from the elements of the qualifier enumerable into the first argument of the function provided to ``Select``. The second row specifies that values can flow from the return value of the function to the elements of the enumerable returned from ``Select``.

Example: Add a barrier for the ``RawUrl`` property
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This example shows how we can model a property as a barrier for a specific kind of query.
A barrier model is used to define that the flow of taint stops at the modeled element for the specified kind of query.
Here we model the getter of the ``RawUrl`` property of the ``HttpRequest`` class as a barrier for URL redirection queries.
The ``RawUrl`` property returns the raw URL of the current request, which is considered safe for URL redirects because it is the URL of the current request and cannot be manipulated by an attacker.

.. code-block:: csharp

   public static void TaintBarrier(HttpRequest request) {
     string url = request.RawUrl; // The return value of this property is considered safe for URL redirects.
     Response.Redirect(url); // This is not a URL redirection vulnerability.
   }

We need to add a tuple to the ``barrierModel``\(namespace, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file.

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/csharp-all
         extensible: barrierModel
       data:
         - ["System.Web", "HttpRequest", False, "get_RawUrl", "()", "", "ReturnValue", "url-redirection", "manual"]

Since we are adding a barrier, we need to add a tuple to the ``barrierModel`` extensible predicate.
The first five values identify the callable (in this case the getter of a property) to be modeled as a barrier.

- The first value ``System.Web`` is the namespace name.
- The second value ``HttpRequest`` is the class (type) name.
- The third value ``False`` is a flag that indicates whether or not the barrier also applies to all overrides of the method.
- The fourth value ``get_RawUrl`` is the method name. Getter and setter methods are named ``get_<name>`` and ``set_<name>`` respectively.
- The fifth value ``()`` is the method input type signature.

The sixth value should be left empty and is out of scope for this documentation.
The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the barrier.

- The seventh value ``ReturnValue`` is the access path to the return value of the property getter, which means that the return value is considered safe.
- The eighth value ``url-redirection`` is the kind of the barrier. The barrier kind is used to define the queries where the barrier is in scope. In this case, these are the URL redirection queries.
- The ninth value ``manual`` is the provenance of the barrier, which is used to identify the origin of the barrier.
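
A barrier does not have to be a property getter. As a sketch, assume a hypothetical static method ``ToLocalPath(string)`` on a hypothetical class ``MyCompany.Web.UrlHelper`` (neither exists in any real library) that returns a sanitized, site-relative URL. Under that assumption, the fifth value carries the parameter type so that only this overload is matched:

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/csharp-all
         extensible: barrierModel
       data:
         # Hypothetical method, for illustration only:
         #   public static string MyCompany.Web.UrlHelper.ToLocalPath(string url)
         # The return value is treated as safe for URL redirection queries.
         - ["MyCompany.Web", "UrlHelper", False, "ToLocalPath", "(System.String)", "", "ReturnValue", "url-redirection", "manual"]

As in the ``RawUrl`` model, ``ReturnValue`` marks the result of the call as the point where taint stops.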

Example: Add a barrier guard for the ``IsAbsoluteUri`` property
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This example shows how we can model a property as a barrier guard for a specific kind of query.
A barrier guard model is used to stop the flow of taint when a conditional check is performed on data.
Here we model the getter of the ``IsAbsoluteUri`` property of the ``Uri`` class as a barrier guard for URL redirection queries.
When the ``IsAbsoluteUri`` property returns ``false``, the URL is relative and therefore safe for URL redirects because it cannot redirect to an external site controlled by an attacker.

.. code-block:: csharp

   public static void TaintBarrierGuard(Uri uri) {
     if (!uri.IsAbsoluteUri) { // The check guards the redirect, so the URL is safe.
       Response.Redirect(uri.ToString()); // This is not a URL redirection vulnerability.
     }
   }

We need to add a tuple to the ``barrierGuardModel``\(namespace, type, subtypes, name, signature, ext, input, acceptingvalue, kind, provenance) extensible predicate by updating a data extension file.

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/csharp-all
         extensible: barrierGuardModel
       data:
         - ["System", "Uri", False, "get_IsAbsoluteUri", "()", "", "Argument[this]", "false", "url-redirection", "manual"]

Since we are adding a barrier guard, we need to add a tuple to the ``barrierGuardModel`` extensible predicate.
The first five values identify the callable (in this case the getter of a property) to be modeled as a barrier guard.

- The first value ``System`` is the namespace name.
- The second value ``Uri`` is the class (type) name.
- The third value ``False`` is a flag that indicates whether or not the barrier guard also applies to all overrides of the method.
- The fourth value ``get_IsAbsoluteUri`` is the method name. Getter and setter methods are named ``get_<name>`` and ``set_<name>`` respectively.
- The fifth value ``()`` is the method input type signature.

The sixth value should be left empty and is out of scope for this documentation.
The remaining values are used to define the ``access path``, the ``accepting value``, the ``kind``, and the ``provenance`` (origin) of the barrier guard.

- The seventh value ``Argument[this]`` is the access path to the input whose flow is blocked. In this case, it is the qualifier of the property access (``uri`` in the example).
- The eighth value ``false`` is the accepting value of the barrier guard. This is the value that the conditional check must return for the barrier to apply. In this case, when ``IsAbsoluteUri`` is ``false``, the URL is relative and considered safe.
- The ninth value ``url-redirection`` is the kind of the barrier guard. The barrier guard kind is used to define the queries where the barrier guard is in scope. In this case, these are the URL redirection queries.
- The tenth value ``manual`` is the provenance of the barrier guard, which is used to identify the origin of the barrier guard.
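
When the check is a method call on the value itself rather than a property of it, the guarded input is an argument instead of the qualifier. As a sketch, assume a hypothetical static method ``IsTrustedRedirect(string)`` on a hypothetical class ``MyCompany.Web.UrlChecks`` (neither exists in any real library) that returns ``true`` when the URL is safe to redirect to:

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/csharp-all
         extensible: barrierGuardModel
       data:
         # Hypothetical method, for illustration only:
         #   public static bool MyCompany.Web.UrlChecks.IsTrustedRedirect(string url)
         # The first argument is treated as safe when the call returns true.
         - ["MyCompany.Web", "UrlChecks", False, "IsTrustedRedirect", "(System.String)", "", "Argument[0]", "true", "url-redirection", "manual"]

Compared with the ``IsAbsoluteUri`` model, the input specification changes from ``Argument[this]`` to ``Argument[0]`` and the accepting value from ``false`` to ``true``.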

Example: Add a ``neutral`` method
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This example shows how we can model a method as being neutral with respect to flow. We will also cover how to model a property by modeling the getter of the ``Now`` property of the ``DateTime`` class as neutral.