
Django 3.2 - News on compressed fixtures and fixtures compression

source link: https://www.paulox.net/2021/04/06/django-32-news-on-compressed-fixtures-and-fixtures-compression/

© 2019 Paolo Melchiorre "Photo of a renaissance tower helical staircase in Palazzo Ducale - Urbino, Marche, Italy" instagram.com/pauloxnet

Management Commands

As reported in the documentation, the changes concern the management commands:

  • loaddata now supports fixtures stored in XZ archives (.xz) and LZMA archives (.lzma).

  • dumpdata now can compress data in the bz2, gz, lzma, or xz formats.

loaddata

The loaddata command searches for and loads the contents of the named fixture into the database.

Compressed fixtures

Django 3.2 added support for XZ archives (.xz) and LZMA archives (.lzma).

Fixtures may be compressed in zip, gz, bz2, lzma, or xz format.

For example, the command:

$ django-admin loaddata mydata.json

would look for any of mydata.json, mydata.json.zip, mydata.json.gz, mydata.json.bz2, mydata.json.lzma, or mydata.json.xz.

The first file contained within a compressed archive is used.
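As a minimal sketch of the lookup described above, the following standard-library snippet lists the candidate file names for a fixture and writes a small XZ-compressed JSON fixture that loaddata could then pick up. It does not use Django itself: the helper names (candidate_names, write_xz_fixture) and the "app.author" record are hypothetical and serve only as illustration.

```python
import json
import lzma
import tempfile
from pathlib import Path

# Suffixes taken from the list above; the empty string is the uncompressed file.
COMPRESSION_SUFFIXES = ["", ".zip", ".gz", ".bz2", ".lzma", ".xz"]


def candidate_names(fixture_name):
    """Return the file names that would be considered for a fixture name."""
    return [fixture_name + suffix for suffix in COMPRESSION_SUFFIXES]


def write_xz_fixture(directory, records):
    """Write records as an XZ-compressed JSON fixture and return its path."""
    path = Path(directory) / "mydata.json.xz"
    with lzma.open(path, "wt", encoding="utf-8") as handle:
        json.dump(records, handle)
    return path


# Hypothetical fixture content for a hypothetical "app.author" model.
records = [{"model": "app.author", "pk": 1, "fields": {"name": "Ada"}}]

with tempfile.TemporaryDirectory() as tmp:
    fixture_path = write_xz_fixture(tmp, records)
    # Read the archive back to check that the content survives the round trip.
    with lzma.open(fixture_path, "rt", encoding="utf-8") as handle:
        loaded = json.load(handle)

print(candidate_names("mydata.json"))
```

A file written this way, placed in a fixtures directory, is the kind of archive that django-admin loaddata mydata.json would find through the .xz candidate.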

dumpdata

The dumpdata command outputs all data in the database associated with some or all installed applications. The output of dumpdata can be used as input for loaddata.

Fixtures compression

Django 3.2 added support for dumping data directly to a compressed file.

The output file can be compressed with one of the bz2, gz, lzma, or xz formats by ending the filename with the corresponding extension.

For example, to output the data as a compressed JSON file:

$ django-admin dumpdata -o mydata.json.gz
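The idea of picking the compression format from the file extension can be sketched with the standard library alone. This is an illustration under stated assumptions, not Django's actual implementation: OPENERS, open_lzma and open_output are hypothetical names, and the sample payload is made up.

```python
import bz2
import gzip
import lzma
import os
import tempfile


def open_lzma(path, mode, **kwargs):
    """Open a legacy .lzma file; FORMAT_ALONE is the .lzma container format."""
    return lzma.open(path, mode, format=lzma.FORMAT_ALONE, **kwargs)


# Map each supported extension to a function that opens a text stream.
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".lzma": open_lzma,
    ".xz": lzma.open,
}


def open_output(path):
    """Open path for text writing, compressing according to its extension."""
    extension = os.path.splitext(path)[1]
    opener = OPENERS.get(extension, open)  # fall back to a plain text file
    return opener(path, "wt", encoding="utf-8")


# Hypothetical serialized data, repeated to make it compressible.
payload = '[{"model": "app.author", "pk": 1, "fields": {"name": "Ada"}}]' * 200

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    for name in ("mydata.json", "mydata.json.gz", "mydata.json.bz2",
                 "mydata.json.lzma", "mydata.json.xz"):
        path = os.path.join(tmp, name)
        with open_output(path) as stream:
            stream.write(payload)
        sizes[name] = os.path.getsize(path)

for name, size in sizes.items():
    print(f"{name}\t{size}")
```

Keeping the extension-to-opener mapping in one dictionary means the writing code stays the same whichever format the file name selects.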

Benchmarks

After developing the new fixtures compression feature, I ran benchmarks for all supported file formats on different databases, from small projects to larger ones.

The benchmarks were performed on my PC and are only examples of the relationship between the time, file size, memory and CPU usage needed to export data directly into different types of compressed files.
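For readers who want a rough comparison on their own machine, here is a minimal sketch that measures only elapsed time and output size when compressing an in-memory JSON payload with the standard library. It is not the method used for the benchmarks in this article (which exported real databases through dumpdata and also recorded memory and CPU usage), and the generated records are hypothetical sample data.

```python
import bz2
import gzip
import json
import lzma
import time

# Hypothetical sample data standing in for serialized fixtures.
records = [
    {"model": "app.item", "pk": pk, "fields": {"name": f"item {pk}"}}
    for pk in range(10_000)
]
payload = json.dumps(records).encode("utf-8")

COMPRESSORS = {
    "txt": lambda data: data,  # uncompressed baseline
    "gz": gzip.compress,
    "bz2": bz2.compress,
    "xz": lzma.compress,
}


def measure(data):
    """Return {format: (elapsed_seconds, output_size_in_bytes)}."""
    results = {}
    for name, compress in COMPRESSORS.items():
        start = time.perf_counter()
        output = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = (elapsed, len(output))
    return results


results = measure(payload)
print("type\ttime\tsize")
for name, (elapsed, size) in results.items():
    print(f"{name}\t{elapsed:.4f}\t{size}")
```

The absolute numbers depend on the machine and on how repetitive the data is, so they are only meaningful relative to the uncompressed baseline from the same run.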

System info

import os
import platform

print(f"Architecture:\t{platform.architecture()[0]}")
print(f"Machine type:\t{platform.machine()}")
print(f"System glibc:\t{platform.libc_ver()[1]}")
print(f"System memory:\t{os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')}")
print(f"System release:\t{platform.release()}")
print(f"System type:\t{platform.system()}")
print(f"Python impl.:\t{platform.python_implementation()}")
print(f"Python version:\t{platform.python_version()}")

Architecture:   64bit
Machine type:   x86_64
System glibc:   2.32
System memory:  33402449920
System release: 5.8.0-48-generic
System type:    Linux
Python impl.:   CPython
Python version: 3.8.6

Benchmark 01

type  time  memory  cpu  size
txt   0.75  70300   99   826
gz    0.66  70920   99   312
bz2   0.69  70372   99   351
xz    0.67  86832   99   336


Benchmark 02

type  time  memory  cpu  size
txt   0.67  70260   99   1202
gz    0.66  70868   99   501
bz2   0.66  70560   99   538
xz    0.68  86860   99   532


Benchmark 03

type  time  memory  cpu  size
txt   1.03  72856   98   872126
gz    1.08  72904   99   30446
bz2   1.21  79024   99   20664
xz    1.14  96608   99   23252


Benchmark 04

type  time  memory  cpu  size
txt   1.53  71304   98   2138437
gz    1.60  72004   98   257593
bz2   1.71  77732   98   198347
xz    2.42  107252  99   164072


Benchmark 05

type  time  memory  cpu  size
txt   2.10  74240   98   5252952
gz    2.25  74236   98   405580
bz2   2.69  80592   98   334556
xz    3.22  137004  99   238432


Benchmark 06

type  time   memory  cpu  size
txt   55.31  87012   73   12092981
gz    71.97  87200   71   845193
bz2   53.74  93372   74   688968
xz    73.19  180936  73   768812


Benchmark 07

type  time    memory  cpu  size
txt   118.74  86344   74   36035128
gz    183.11  86572   71   3936656
bz2   158.76  93272   73   2719186
xz    220.65  181636  73   2586748


Benchmark 08

type  time     memory  cpu  size
txt   532.92   89192   79   394846146
gz    711.72   89944   77   94789125
bz2   673.47   96284   78   73823620
xz    1217.50  184724  79   64908128


Conclusion

The benchmarks, which exported datasets of various sizes directly to compressed files, show that:

  • the xz format almost always produces the smallest files, at the cost of higher memory and CPU usage
  • the gz and bz2 formats almost always have execution times comparable to writing a plain uncompressed text file, while greatly reducing the space occupied
  • compared to the uncompressed text file, the compressed files are between 55% and 98% smaller
  • export times for compressed files are, in the worst case (xz), about double those of an uncompressed export and, in the best case (gz), about a tenth faster
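The percentages and ratios in this list can be rechecked from the benchmark figures. The sketch below copies the time and size columns from the eight benchmarks above and computes the range of space savings and the extreme time ratios against the uncompressed export; the variable names are only illustrative.

```python
# (time, size) per format, copied from the benchmark figures above.
BENCHMARKS = {
    "01": {"txt": (0.75, 826), "gz": (0.66, 312),
           "bz2": (0.69, 351), "xz": (0.67, 336)},
    "02": {"txt": (0.67, 1202), "gz": (0.66, 501),
           "bz2": (0.66, 538), "xz": (0.68, 532)},
    "03": {"txt": (1.03, 872126), "gz": (1.08, 30446),
           "bz2": (1.21, 20664), "xz": (1.14, 23252)},
    "04": {"txt": (1.53, 2138437), "gz": (1.60, 257593),
           "bz2": (1.71, 198347), "xz": (2.42, 164072)},
    "05": {"txt": (2.10, 5252952), "gz": (2.25, 405580),
           "bz2": (2.69, 334556), "xz": (3.22, 238432)},
    "06": {"txt": (55.31, 12092981), "gz": (71.97, 845193),
           "bz2": (53.74, 688968), "xz": (73.19, 768812)},
    "07": {"txt": (118.74, 36035128), "gz": (183.11, 3936656),
           "bz2": (158.76, 2719186), "xz": (220.65, 2586748)},
    "08": {"txt": (532.92, 394846146), "gz": (711.72, 94789125),
           "bz2": (673.47, 73823620), "xz": (1217.50, 64908128)},
}

space_gains = []  # percentage saved compared to the uncompressed file
time_ratios = {"gz": [], "bz2": [], "xz": []}  # compressed time / txt time

for rows in BENCHMARKS.values():
    txt_time, txt_size = rows["txt"]
    for fmt in time_ratios:
        fmt_time, fmt_size = rows[fmt]
        space_gains.append(100 * (1 - fmt_size / txt_size))
        time_ratios[fmt].append(fmt_time / txt_time)

min_gain, max_gain = min(space_gains), max(space_gains)
worst_xz = max(time_ratios["xz"])
best_gz = min(time_ratios["gz"])

print(f"space gain: {min_gain:.1f}% to {max_gain:.1f}%")
print(f"worst xz time ratio: {worst_xz:.2f}")
print(f"best gz time ratio: {best_gz:.2f}")
```

Rounded to whole percentages, the space gain spans 55% to 98%; the slowest xz export takes a little over twice the uncompressed time (benchmark 08), and the fastest gz export takes 0.88 times the uncompressed time (benchmark 01).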

Exporting fixtures directly to compressed files therefore greatly reduces the space occupied, at the cost of a small increase in the time and resources required to create them.

In addition, users can choose the best file type for their use case, opting for maximum compression (xz) or for greater portability (gz).

External links

  • PR #12871 Added tests for loaddata with gzip/bzip2 compressed fixtures.
  • Ticket #31552 Loading lzma compressed fixtures.
  • PR #12879 Fixed #31552 -- Added support for LZMA and XZ fixtures to loaddata.
  • Ticket #32291 Add support for fixtures compression in dumpdata.
  • PR #13797 Fixed #32291 -- Added fixtures compression support to dumpdata.
