186 views
# Foss4g versioning workshop content [TOC] ## General presentation ### Goal of this workshop In this workshop we will explore some PostGIS based solution for versioning, discover their main characteristics, play a bit with them and explore what their DB schema can help us achieve. ### Work environment We will work on the following environment : - Pre-installed Virtualbox virtual machine - Linux Lubuntu 20.04 system The virtual machine is available here:  - https://share.oslandia.net/public/9a2ceb - sha1sum : a483d999a6a777324273534e55e7a17271d88554 On the system you will find the following tools pre-installed : - PostgreSQL 12 - PostGIS 3 - QGIS 3.20 As for the training platform, we use the platform setup by FOSS4G organizers for video-conferencing, screen sharing and chat. Do not hesitate to :  - Ask any question on the chat - Ask to go slower - Ask to explain an item again - Ask for a break Should you wish to have the keyboard in your own locale, you can run the following in a terminal (here to french, adapt to your locale name ): ```bash setxkbmap fr ``` ### Credentials - Ubuntu : oslandia / oslandia - PostgreSQL : postgres / oslandia (change this with `sudo -u postgres psql -c "alter user postgres password 'oslandia'"`) Obviously, do not use this virtual machine for production... ### Components - PostGIS : the spatial database, a PostgreSQL extension - QGIS : The famous desktop GIS # Workshop steps Part 1 - data into PostGIS - Download cadastral parcel data - Create a PostGIS database - Integrate the data - make a fresh copy of the batiments table (for each following part) Part 2 - Use the db manager to implement a simple versioning system - Generate versioning with the DB manager - Observe the new db structure - Make some changes on the dataset - Query the history - Use case: Revert one particular change - Use case: revert vandalism - Limitations and improvements Part 3 - Use pg_history_viewer - Get data - Installation - Configuration - Check the installation - Basic usage - Cancel a change - Filter changes - Limitations Part 4 - Use QGIS-versioning - Get data - Install plugin - Versionize - Checkout - Modify and push! - View history - Under the hood - Other features ## Part 1 - Data to PostGIS ### download data Download the data : - [Cadastral parcel data for Paris (75)](https://cadastre.data.gouv.fr/data/etalab-cadastre/2020-07-01/shp/departements/75/) ``` mkdir data cd data wget https://cadastre.data.gouv.fr/data/etalab-cadastre/2020-07-01/shp/departements/75/cadastre-75-batiments-shp.zip unzip cadastre-75-batiments-shp.zip ``` ### Create a PostGIS database Create a database and activate PostGIS extension. Create the database : ```shell=sh sudo -u postgres createdb cadastre psql -h localhost -U postgres -d cadastre ``` Create the extension : ```sql CREATE EXTENSION postgis; ``` ### Load the data Load the data into the database :  ```shell=sh # Check what is inside the data file ogrinfo batiments.shp -so -al # Load it into the database SHAPE_ENCODING="LATIN1" ogr2ogr PG:"host=localhost user=postgres dbname=cadastre password=oslandia" -nlt PROMOTE_TO_MULTI -lco FID=fid -f "PostgreSQL" ./batiments.shp "batiments" ``` ### Check loaded data - Open QGIS (from *Education* menu) - Add a PostGIS data source /!\ use "basic" authent, not configuration and check the "Store" checkboxes (some tools don't yet support configuration) - Open the buildings table - Zoom to the data - Check the attribute table ### Make a fresh copy ```sql create table batiments2(like batiments including all); ``` :::info Currently the db manager doesn't support initialisation on a non-empty dataset, we'll copy the data later. ::: ## Part 2 - Use the db manager to implement a simple versioning system ### Generate versioning with the DB manager - Open the db manager : Database > DB Manager - select the connexion - select the table - Table > Change logging :::info :bulb: you should version this script with git. ::: ### Observe the new db structure - PK has changed - time_start and time_end columns - 2 triggers - one rule - one view (with rules) The view is to be used each time you want a current view of the dataset. This view is editable, so you would normally use it instead of the table ### Make some changes on the dataset - Insert the data from the `batiment` table ```sql insert into batiments2 select * from batiments; ``` - add the view `batiment2_current` to your qgis project - edit a geometry and some properties with qgis, also add some features etc... ### Query the history - We might want to see what happened to a particular building. To get all the versions of one building (and display them in qgis) ```sql select * from batiments2 where fid=<one value>; ``` - We might want to inspect what has been modified in a window of time. Select all the building that has been modified today ```sql select count(*) from batiments2 where time_end > now()::date; ``` - See a snapshot of the data at a particular time For this you can use the function `batiments2_at_time`. Ex: ```sql select * from batiments2_at_time(<timestamp>); ``` But this function will prevent the use of index on the geometry! It might be better to query the table `batiments2` directly. ```sql SELECT * FROM "public"."batiments2" WHERE time_start <= (now()::timestamp - interval '1 day') and ( time_end is null or "time_end" > (now()::timestamp - interval '1 day') ) ``` - see a summary of changes per "commit" ```sql select time_end, user_role, count(*) from batiments2 group by time_end, user_role; ``` - see an activity summary (per 15 minute for instance) ```sql select count(*) filter (where fid is not null), time from generate_series((select min(time_end) from batiments2), now(), interval '15 minute') time left join batiments2 b on b.time_end >= time and b.time_end < time + interval '15 minute' group by time order by time asc; ``` or even better ```sql with counts as ( select count(*) filter (where fid is not null), time from generate_series((select min(time_end) from batiments2), now(), interval '15 minute') time left join batiments2 b on b.time_end >= time and b.time_end < time + interval '15 minute' group by time ) select time, repeat('■', cast( 20 * count / cast(max(count) over () as float) as integer) ) from counts; ``` ### Use case: Revert one particular change - Restore an old version of one building ```sql with previous_version as ( select * from batiments2 where fid = 18350 and time_end is not null order by time_end desc limit 1 ) update batiments2 b set wkb_geometry=old.wkb_geometry from previous_version old where b.fid=old.fid; ``` ### Use case: revert vandalism Let's first vandalise our dataset, for instance by altering 25% of geometry ```sql update batiments2_current set wkb_geometry=st_translate(wkb_geometry, random() * 100, random() * 100) where fid % 4 = 0; ``` Let's revert all the building that have been vandalized with this query: ```sql with to_be_reverted as ( select fid, wkb_geometry from batiments2 where time_end=(select max(time_end) from batiments2) ) update batiments2 b set wkb_geometry=t.wkb_geometry from to_be_reverted t where b.fid=t.fid; ``` ### Limitations and improvements The sql the DB Manager proposes can be improved in several ways: - The old PK behaviour is not completely copied: we can add a unique constraint on fid. - It does not work on existing dataset, cf https://github.com/qgis/QGIS/pull/44856 - unique constraint on old PK - remove the RULE on delete and use a trigger - remove the RULE on the view (by default this view is editable in supported version of postgresql) - range data types could be used for dates - currently, the `user_role` column is not reliably populated ## Part 3 - Use pg_history_viewer ### Get data Don't forget to make a fresh copy of your data! ```sql create table batiments3(like batiments including all); insert into batiments3 select * from batiments; ``` ### Installation This installation comes into 2 part: the qgis plugin, an a sql api. Steps: - Download and install "History viewer for a PostgreSQL base with audit triggers" from qgis plugin manager - Download the sql part ```bash wget https://gitlab.com/Oslandia/audit_trigger/-/raw/master/audit.sql ``` and execute it, from the client of your choice. With psql: ```bash psql -h localhost -d cadastre -U postgres -f ./audit.sql ``` ### Configuration Then activate it for a table: ```sql select audit.audit_table('public.batiments3'); ``` The table is now plugged into the audit system. Now configure the plugin in QGis: - Click on the plugin icon - configure the connection by clicking on the `...` - select `audit.loggued_actions` as Audit table - select `audit.replay_event` as replay function You can access this menu again in Plugins > Postgresql History Viewer > configuration ### Check the installation - display the currently known historic by clicking on the plugin icon (should be empty for now) - inspect the table batiments3. For instance from psql: ```sql \d batiments3 ``` What do you observe? :::info :bulb: <details><summary>answer</summary> The structure hasn't change a bit! But triggers on event has been added. </details> ::: - inspect the logging table ```sql \d audit.loggued_actions \d audit.loggued_relations select * from audit. ``` ### Basic usage - add the `batiments3` layer and make some changes to it (both to properties and geometries. Also remove and add features) - use the history view of the plugin ### Cancel a change - cancel the most recent change to the batiments3 table ```sql with event as ( select max(event_id) as event_id from audit.logged_actions where table_name = 'batiments3' and schema_name = 'public' ) select audit.rollback_event(event_id) from event ``` - Delete features with id 28104 and 28073. - make a bunch of other changes on other features - oh no! you've previously deleted the Invalides. Please cancel the most recent operations on these 2 features, without touching the other most recent events ```sql with events as ( select max(event_id) as event_id, row_data->'fid' from audit.logged_actions where row_data->'fid' in ('28104', '28073') group by row_data->'fid'; ) select audit.rollback_event(event_id) from events; ``` ### Filter changes - Use the filter capabilities of the plugin, display changes affecting one part of the Invalides only. - Use the filter to display changes on all buildings composing the Invalides (use "Data contains"). - Display top-10 most changed buildings in qgis. In each feature properties, put a list of names it ever had. Style the feature to color according to number of changes. <details> <summary>solution</summary> Add a sql layer (with db manager for instance, or create a view) using this query: ```sql with most_changed as ( select (row_data->'fid')::int as fid, array_agg(distinct row_data->'nom') as names, count(*) as change_count from audit.logged_actions group by row_data->'fid' limit 5 ) select m.fid, m.change_count, m.names, b.wkb_geometry from most_changed m left join batiments3 b on m.fid=b.fid order by change_count desc ; ``` And use the "styling panel" with the "graduated" style. Try editing more features to see the effect! </details> ### Limitations - no support for branches ## Part 4 - Use QGIS-versioning [QGIS-versioning](https://gitlab.com/Oslandia/qgis/qgis-versioning) is a Subversion-like solution, completly integrated in QGIS. Principle is that you have a versioned centralized data in your Postgres database (the trunk), and several working copies which lives in other Postgres databases or SQLite files. Then the working copies could be updated from trunk, or you could push your modifications from the working copies to the trunk. ### Get data Don't forget to make a fresh copy of your data! ```sql create table batiments5(like batiments including all); insert into batiments5 select * from batiments; ``` ### Install plugin Download and install it ```shell cd ~/.local/share/QGIS/QGIS3/profiles/default/python/plugins/ wget https://gitlab.com/Oslandia/qgis/qgis-versioning/-/archive/master/qgis-versioning-master.zip unzip qgis-versioning-master.zip ``` Then enable it from the plugin manager ![](https://pad.oslandia.net/uploads/upload_803fec3f36adfce0bfb51bc0c7a1fb37.png) > :warning: Be carefull to take the one with the icon without evaluations!! ### Versionize You have to create a group before to gather the layer you want to versioned ![](https://pad.oslandia.net/uploads/upload_73834a1a78c402e6a404f48dc5b9cf9b.png) If you select the group in the browser, must see this information in the toolbar ![](https://pad.oslandia.net/uploads/upload_927d533d3333cddbc3370f0125d35498.png) The group is still *unversioned*, select the ![](https://pad.oslandia.net/uploads/upload_87ed5ed0cb72d233678bb9e2f069dc77.png) button to versionize it. ### Checkout Now, we gonna checkout a working copy, on which we will made our modification before sending back in the master trunk. There is 3 possible working copy: - ![](https://pad.oslandia.net/uploads/upload_2ff0ec02550d4dabf83efee01cab1558.png) In a SQLite local file - ![](https://pad.oslandia.net/uploads/upload_e862bf5c0646d36c39c187941bfee57b.png) In a PostGIS local database - ![](https://pad.oslandia.net/uploads/upload_f0b9ff3f6a220a4be22e8b2d9ea7652e.png) In the same PostGIS database than the master one In this workshop, we will show the SQLite checkout. So select the group from the browser and click the ![](https://pad.oslandia.net/uploads/upload_2ff0ec02550d4dabf83efee01cab1558.png) button and follow instructions. A new SQLite file has been created and is displayed in the browser named as *working copy* ![](https://pad.oslandia.net/uploads/upload_3463bf9182d7fb1a2aa1e63715c3c4e9.png) ### Modify and push! Now modify the batiments FROM THE **WORKING COPY** group > :warning: You need to give unique fid value to these features (value must be greater or equal than 120000) ![](https://pad.oslandia.net/uploads/upload_4825223528b3a38b9c31ab9124dd062c.png) *Here for instance, there is 1 feature modified, 1 deleted and 2 added* Save your modifications like you will do with any layer and commit your modifications by selecting the ![](https://pad.oslandia.net/uploads/upload_843a179b952afa8f07ce2bcd4f94cf54.png) button If you refresh your master data, you see that there are up-to-date with your local modification now! ![](https://pad.oslandia.net/uploads/upload_7a8412dca05114e56728a13180831f2b.png) ### View history You can see revisions history by selecting the ![](https://pad.oslandia.net/uploads/upload_c186dbb84b5af57b6d627018f23b26f3.png) button ![](https://pad.oslandia.net/uploads/upload_4270ff0a2de2800cd3ba25113b687d66.png) Check the revisions checkbox and the *compare selection revisions* one and select *OK*. ![](https://pad.oslandia.net/uploads/upload_513a22169509d4d08e149652ae80e493.png) > **__NOTE__** As you can see, there is an issue, the deleted object is missing ### Under the hood Take a look of what happen in database ```sql cadastre=# \d revisions Table « public.revisions » Colonne | Type | Collationnement | NULL-able | Par défaut ------------+-----------------------------+-----------------+-----------+---------------------------------------- rev | integer | | not null | nextval('revisions_rev_seq'::regclass) commit_msg | character varying | | | branch | character varying | | | 'trunk'::character varying date | timestamp without time zone | | | CURRENT_TIMESTAMP author | character varying | | | ``` New table which contains the different revisions, meaning group of modifications (commit) that happened through time. ```sql cadastre=# \d batiments Table « public.batiments » Colonne | Type | Collationnement | NULL-able | Par défaut -----------------+-----------------------------+-----------------+-----------+-------------------------------------------------- fid | integer | | not null | nextval('batiments_fid_seq'::regclass) commune | character varying(5) | | | nom | character varying(80) | | | type | character varying(2) | | | created | date | | | updated | date | | | wkb_geometry | geometry(MultiPolygon,2154) | | | versioning_id | integer | | not null | nextval('batiments_versioning_id_seq'::regclass) trunk_rev_begin | integer | | | trunk_rev_end | integer | | | trunk_parent | integer | | | trunk_child | integer | | | ``` The folowing new column have been added to your *batiments* table - versioning_id : unique id in table for each version of row - trunk_rev_begin / trunk_rev_end : define the revisions interval for which this feature exists - trunk_parent : versioning_id of the feature existing just before this one - trunk_child : versioning_id of the feature existing just after this one There is a view to get the HEAD (current view) of your data: *public_trunk_rev_head.batiments* ##### Put it more simply ![](https://pad.oslandia.net/uploads/upload_20766cda1b1bdbcf6b67ef20e5c7c0cc.png) ### Other features - Can deal with branch - Can update from the trunk (~git rebase) - Can deal with foreign key and primary key - Suppress the original contraints and manage them with trigger - Can checkout only a subset of data (extent) More infos [here](https://qgis-versioning.readthedocs.io/en/latest/functionality.html) ## Part 5 - Q&A, followup, Other use cases ### Other tools Other tools we might explore: - pgversion - offline editing core plugins and qfield uses versioning techniques under the hood - [Kart](https://kartproject.org/) is a version management system for geopackages ### other use case: data validation workflow Sometimes, "versioning" word is used in a broader meaning or we might think we need versioning for a use case where a simpler solution exists. It can be tempting to implement a versioning or logging solution as a mean to inspect and ensure data quality, and be able to revert some changes. However, there are many ways to enforce data integrity and quality in PostgreSQL. Before attempting to use versioning, it's a good idea to check if triggers or constraints are not up to the task. It's always a good idea to use them anyway! - set up user and role ```sql create user a_user password 'a_user'; create user admin password 'admin'; ``` - As admin, create a GIS table, ensuring data are valid ```sql set role admin; create table my_gis_data ( id int generated always as identity, geom geometry(Polygon, 3857) not null check (st_isvalid(geom)), description text check (description is not null and description !~ '^\s*$'), created timestamptz default now() not null, updated timestamptz, author text default current_user not null, validated boolean default false ); ``` :::info :bulb: Check constraints go a long way ensuring data quality! For more complex rules, a `before` trigger can be used. ::: - ensure only admin can set `author`, `updated` and `validated` directly ```sql grant insert(geom, description, created), update(geom, description, created), select on table my_gis_data to a_user; ``` - write a `UPDATE` trigger to set updated to `current_timestamp` automatically. NOTE: admin should still be able to set it to whatever value. - with qgis (or psql), check that you can't insert a validated to `true` as a user. - insert some data to be validated by admin - with admin, flip the `validated` boolean to some of the data - with admin, create a table for production - then transfer newly validated data in production, leaving those unvalidated behind.