Reproduce from scratch ====================== To recreate the ground-truth in our format you have to convert the annotations using the scirpt ``generate_ground_truth.py``. **N.B. You should have ``wget`` installed in your system, otherwise SMD dataset can’t be downloaded.** You can run the script with ``python 3``. You can also skip the already existing datasets by using the ``--blacklist`` and ``--whitelist`` argument. If you do this, their ground truth will not be added to the final archive, thus, remember to backup the previous one and to merge the archives. Generate misaligned data ------------------------ If you want, you can generate misaligned data using the ``--train`` and ``--misalign`` options of ``generate_ground_truth.py``. It will run ``alignment_stats.py``, which collects data about the datasets with real non-aligned scores and saves stats in ``_alignment_stats.pkl`` file in the ASMD module directory. Then, it runs ``generate_ground_truth.py`` using the collected statistics: it will generate misaligned data by using the same deviation distribution of the available non-aligned data. Note that misaligned data should be annotated as ``2`` in the ``ground_truth`` value of the dataset groups description (see :doc:`./index` ), otherwise no misaligned value will be added to the ``misaligned`` field. Moreover, the dataset group data should have `precise_alignment` or `broad_alignment` filled by the annotation conversion step, otherwise errors can raise during the misalignment procedure. For more info, see ``python -m asmd.generate_ground_truth -h``. A usual pipeline is: #. Generate music score data and other ground-truth except artificial one: ``python -m asmd.generate_ground_truth --normal`` #. Train a statistical model (can skip this): ``python -m asmd.generate_ground_truth --train`` #. Generate misalignment using the trained model (trains it if not available): ``python -m asmd.generate_ground_truth --misalign``