
Generating fake security data with Python and faker-security

source link: https://snyk.io/blog/generating-fake-security-data-python-faker-security/

Michael Aquilina, April 25, 2022

Snyk recently open sourced our faker-security Python package to help anyone working with security data. In this blog post, we’ll briefly go over what this Python package is and how to use it. But first, we’ll get some context for how the factory_boy Python package can be used in combination with faker-security to improve your test-writing experience during development.

Note: Some knowledge of Python is helpful for getting the most out of this post.

Testing with Faker and FactoryBoy

Before diving into faker-security, it’s helpful to start with what FactoryBoy and Faker are and how we use them within Snyk.

Snyk believes strongly in the ability of automated tests to make our code maintainable. Tests allow us to iterate and develop features quickly, and confidently make changes to our code without fearing we may inadvertently break existing features in the process. 

Our commitment to testing drives us to find new ways to simplify the testing experience for the test writers and readers within our teams. Faker and factory_boy are two of our favorite packages for testing Python projects. Together, they generate fake instances of models we use in testing.

Faker is a Python package that allows you to generate fake data for many different kinds of fields, like usernames, dates, and URLs. FactoryBoy is another Python package that helps integrate Faker’s data generation into your code by defining factory classes.

What we love about FactoryBoy, in particular, is that it allows a test author to focus on pinning the data they care about within their tests, while leaving Faker to generate all the other data that the test does not care about. This greatly improves test readability by reducing the required lines of code and removing noise from fields you do not need to worry about.
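The idea of pinning only the fields a test cares about can be sketched without any third-party packages. The snippet below is a minimal illustration, not how factory_boy is implemented: the User dataclass, random_word, and make_user are hypothetical names invented for this example.

```python
import random
import string
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical stand-in for a real model class.
    first_name: str
    last_name: str
    username: str
    email: str


def random_word(length=8):
    """Return a random lowercase string to use as throwaway data."""
    return "".join(random.choices(string.ascii_lowercase, k=length))


def make_user(**overrides):
    """Build a User with random defaults, letting the caller pin any field."""
    username = random_word()
    fields = {
        "first_name": random_word().title(),
        "last_name": random_word().title(),
        "username": username,
        "email": f"{username}@example.com",
    }
    fields.update(overrides)  # pinned values win over generated ones
    return User(**fields)


user = make_user(first_name="Sherlock")
print(user.first_name)  # Sherlock
```

Only first_name is pinned here; every other field is filled with random data the test never has to mention.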

To see the difference in action, compare a test that’s written with factory_boy to one that isn’t in the following examples.

Without factories:

from django.contrib.auth.models import User

def test_correct_email_address():
    user = User(
        first_name="Sherlock",
        last_name="Holmes", 
        username="sherlock.holmes",
        email="[email protected]",
        is_staff=False,  # Django's built-in User has is_staff, not is_admin
    )

    assert has_valid_email(user) is True

With factories:

from tests.factories import UserFactory

def test_correct_email_address():
    user = UserFactory(email="[email protected]")
    assert has_valid_email(user) is True

The test that uses UserFactory checks the same behavior as the one that does not, but it is shorter, easier to read, and makes clear which fields matter to the test. The non-factory version is longer and leaves the reader guessing which fields are actually relevant. This is a fairly simple example, but the difference becomes even more pronounced as test complexity increases.

The UserFactory class can be defined in tests/factories.py once and re-used in all of your tests:

import factory
from django.contrib.auth.models import User
from factory.django import DjangoModelFactory


class UserFactory(DjangoModelFactory):
    class Meta:
        model = User

    username = factory.Faker("slug")
    first_name = factory.Faker("first_name")
    last_name = factory.Faker("last_name")
    email = factory.Faker("email")
    is_staff = False  # Django's built-in User has is_staff, not is_admin

When dealing with security data, we often need to generate data for security fields like CVSSv3 vectors and CVE identifiers. Faker does not provide this data by default, but it does allow you to add your own providers, which is exactly where faker-security comes into play.
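To make the shape of this data concrete, here is a rough standard-library sketch that produces a random CVSS v3.1 base vector and a CVE-style identifier. It is an illustration only and not the faker-security implementation: the function names, the year range, and the number range are assumptions chosen for the example.

```python
import random

# Allowed values for each CVSS v3.1 base metric, in vector-string order.
CVSS3_BASE_METRICS = {
    "AV": ["N", "A", "L", "P"],  # attack vector
    "AC": ["L", "H"],            # attack complexity
    "PR": ["N", "L", "H"],       # privileges required
    "UI": ["N", "R"],            # user interaction
    "S": ["U", "C"],             # scope
    "C": ["H", "L", "N"],        # confidentiality impact
    "I": ["H", "L", "N"],        # integrity impact
    "A": ["H", "L", "N"],        # availability impact
}


def random_cvss3_vector(rng=random):
    """Return a random CVSS v3.1 base vector string."""
    parts = [f"{metric}:{rng.choice(values)}"
             for metric, values in CVSS3_BASE_METRICS.items()]
    return "CVSS:3.1/" + "/".join(parts)


def random_cve_id(rng=random):
    """Return an identifier in the CVE-YYYY-NNNN format (assumed ranges)."""
    year = rng.randint(1999, 2022)
    number = rng.randint(1, 99999)
    return f"CVE-{year}-{number:04d}"


print(random_cvss3_vector())
print(random_cve_id())
```

Writing and maintaining helpers like these in every project is the kind of repetition a shared Faker provider is meant to remove.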

What is faker-security?

faker-security is a Python package that acts as a Faker provider, allowing you to randomly generate security-related data for your projects. Currently, faker-security supports data generation for:

  • CVSSv3 vectors
  • CVSSv2 vectors
  • semver versions
  • NPM semver version ranges

In the future, we hope to cover more generation methods and more types of version ranges, such as Maven's.
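For readers unfamiliar with the last two items, the following standard-library sketch shows roughly what a semver version and a simple npm-style range look like. The helper names and the operator list are assumptions made for illustration; the ranges faker-security generates may be richer than this.

```python
import random


def random_semver(rng=random):
    """Return a MAJOR.MINOR.PATCH version string."""
    return ".".join(str(rng.randint(0, 20)) for _ in range(3))


def random_npm_range(rng=random):
    """Return a simple npm-style range: one comparison operator plus a version."""
    operator = rng.choice(["^", "~", ">=", "<", "<="])
    return f"{operator}{random_semver(rng)}"


print(random_semver())
print(random_npm_range())
```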

Building on our previous examples, if we want to create a VulnerabilityFactory of some kind to generate fake data, we would define it as follows:

import factory
from factory.django import DjangoModelFactory
from faker_security.providers import SecurityProvider
from myproject.models import Vulnerability

factory.Faker.add_provider(SecurityProvider)


class VulnerabilityFactory(DjangoModelFactory):
    class Meta:
        model = Vulnerability

    cvss_v3_vector = factory.Faker("cvssv3")
    cve_id = factory.Faker("cve")
    cwe_id = factory.Faker("cwe")

How to use faker-security

faker-security can be installed via pip:

pip install faker-security

If you want to use it within your project, add it to your dependency file of choice. This is typically your project’s requirements.txt file. If you are using a higher-level package manager like poetry or pipenv, follow their instructions for adding new packages.

Once installed, you just need to configure Faker or factory_boy to make use of faker-security.

If you are running tests with pytest, we recommend setting up faker-security for factory_boy in your conftest.py file as follows:

import factory
from faker_security.providers import SecurityProvider


def pytest_configure():
    factory.Faker.add_provider(SecurityProvider)

Moving forward with faker-security 

Using factory_boy and Faker is a great way to simplify your tests, and with faker-security, you now have a quick and easy way to generate fake security data for all your projects!

We hope you find this package as useful as we do and would love to have you contribute! Please star our GitHub repo and send pull requests and contributions. Happy testing!
