
Create indexes in django models and migration files #7568

Open
acwhite211 wants to merge 17 commits into main from issue-7482

Conversation

@acwhite211
Member

@acwhite211 acwhite211 commented Nov 25, 2025

Fixes #7482

Adds the indexes mentioned in the issue to the models and migration files. Some of them were already present in the existing models; the rest have been added.

The main downside I see with adding the tree field indexes is that writes might become too slow. With an index like taxon.name, write operations (INSERT, UPDATE, and DELETE) take longer because the index has to be updated along with the row, in exchange for faster reads. Some of our tree operations make bulk edits to tree record fields, like the 'Move' action in the tree viewer, so we'll want to be careful in our performance testing. We should test this on large databases with a big taxon tree to make sure both read and write performance are acceptable.
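
As a rough illustration of the read side of that trade-off, here is a minimal sketch using Python's built-in sqlite3 module rather than the MySQL/MariaDB database Specify runs on. The simplified taxon table and the index name idx_taxon_name are assumptions made for the example, not the actual schema or migration. It only compares the query plan for a lookup by name before and after the index exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified stand-in for the real taxon table.
conn.execute(
    "CREATE TABLE taxon (taxonid INTEGER PRIMARY KEY, name TEXT, fullname TEXT)"
)
conn.executemany(
    "INSERT INTO taxon (name, fullname) VALUES (?, ?)",
    [(f"name{i}", f"Genus name{i}") for i in range(1000)],
)

lookup = "SELECT taxonid, fullname FROM taxon WHERE name = ?"

# Without an index, the lookup has to scan the whole table.
plan_before = query_plan(conn, lookup, ("name42",))

# With the index, the lookup becomes an index search. From this point on,
# every INSERT, UPDATE, or DELETE on taxon also has to maintain the index.
conn.execute("CREATE INDEX idx_taxon_name ON taxon (name)")
plan_after = query_plan(conn, lookup, ("name42",))

print(plan_before)
print(plan_after)
```

The write-side cost is not measured here; that still needs the timing tests on large databases described above.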

I ran into a problem with the tree viewer timing out after running the index migrations. I solved it by rewriting the get_tree_rows() function to avoid the expensive grouped self-join on tree tables. Instead of joining child and synonym rows and collapsing them with GROUP BY, it now computes child counts and synonym lists with correlated subqueries, which preserves the same response shape while producing a faster query for taxon tree requests.
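
To make the shape of that rewrite concrete, here is a minimal sketch of the two query styles, again using sqlite3 and a made-up four-column taxon table. It is not the real get_tree_rows() code: the column set, the sample data, and the helper tree_rows() are assumptions for illustration only. The point is that both queries return the same rows, while the second avoids the child-by-synonym row multiplication that the grouped join has to collapse with DISTINCT and GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplified tree table: parentid links children to parents,
# acceptedid links a synonym to its accepted node.
conn.execute(
    "CREATE TABLE taxon ("
    " taxonid INTEGER PRIMARY KEY, parentid INTEGER, acceptedid INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?)",
    [
        (1, None, None, "Life"),
        (2, 1, None, "Animalia"),
        (3, 1, None, "Plantae"),
        (4, 2, None, "Chordata"),
        (5, 2, None, "Arthropoda"),
        (6, 1, 2, "Metazoa"),      # synonym of Animalia
        (7, 1, 3, "Vegetabilia"),  # synonym of Plantae
        (8, 1, 2, "Zoa"),          # second synonym of Animalia
    ],
)

# Old style: join children and synonyms, then collapse with GROUP BY.
# Each node yields (children x synonyms) rows before grouping.
GROUPED = """
SELECT t.taxonid, t.name,
       COUNT(DISTINCT c.taxonid) AS children,
       GROUP_CONCAT(DISTINCT s.name) AS synonyms
FROM taxon t
LEFT JOIN taxon c ON c.parentid = t.taxonid
LEFT JOIN taxon s ON s.acceptedid = t.taxonid
WHERE t.parentid = ?
GROUP BY t.taxonid, t.name
ORDER BY t.name
"""

# New style: one correlated subquery per computed column, no join or grouping.
CORRELATED = """
SELECT t.taxonid, t.name,
       (SELECT COUNT(*) FROM taxon c WHERE c.parentid = t.taxonid) AS children,
       (SELECT GROUP_CONCAT(s.name) FROM taxon s
         WHERE s.acceptedid = t.taxonid) AS synonyms
FROM taxon t
WHERE t.parentid = ?
ORDER BY t.name
"""


def tree_rows(sql, parent_id):
    """Run a query and normalise the synonym string into a sorted list."""
    rows = conn.execute(sql, (parent_id,)).fetchall()
    return [
        (taxonid, name, children, sorted(synonyms.split(",")) if synonyms else [])
        for taxonid, name, children, synonyms in rows
    ]


grouped = tree_rows(GROUPED, 1)
correlated = tree_rows(CORRELATED, 1)
print(grouped == correlated)
```

The synonym lists are sorted in Python because GROUP_CONCAT does not guarantee an order, so the comparison checks content rather than ordering.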

Indexed fields

| Table | Indexed field(s) |
| --- | --- |
| agentidentifier | identifier, identifiertype |
| agentspecialty | ordernumber, specialtyname |
| agentvariant | name |
| attachmentmetadata | name |
| author | ordernumber |
| collectionobject | name, projectnumber |
| collectionobjectgroup | guid, name |
| collectionobjectgrouptype | name |
| collectionobjectproperty | guid |
| collectionobjecttype | name |
| collectionreltype | name |
| exchangein | exchangeinnumber |
| exsiccataitem | number |
| geography | commonname, guid, highestchildnodenumber, nodenumber |
| geographytreedef | name |
| geographytreedefitem | name |
| geologictimeperiod | highestchildnodenumber, nodenumber |
| geologictimeperiodtreedef | name |
| geologictimeperiodtreedefitem | name |
| institutionnetwork | altname |
| latlonpolygon | name |
| lithostrat | highestchildnodenumber, nodenumber |
| lithostrattreedef | name |
| lithostrattreedefitem | name |
| locality | guid |
| materialsample | guid |
| morphbankview | viewname |
| otheridentifier | identifier |
| picklist | fieldname, filterfieldname, tablename |
| preparationproperty | guid |
| preptype | name |
| referencework | librarynumber |
| spauditlogfield | fieldname |
| specifyuser | name |
| spexportschema | schemaname |
| spexportschemaitem | fieldname |
| spexportschemaitemmapping | exportedfieldname |
| spexportschemamapping | mappingname |
| spfieldvaluedefault | fieldname, tablename |
| splocalecontainer | picklistname |
| splocalecontaineritem | picklistname, weblinkname |
| sppermission | name |
| spprincipal | name |
| spquery | contextname |
| spqueryfield | fieldname, formatname |
| spviewsetobj | filename |
| storage | highestchildnodenumber, nodenumber |
| storagetreedef | name |
| storagetreedefitem | name |
| taxon | cultivarname, groupnumber, highestchildnodenumber, nodenumber |
| taxontreedef | name |
| taxontreedefitem | name |
| tectonicunit | fullname, guid, highestchildnodenumber, name, nodenumber |
| tectonicunittreedef | name |
| tectonicunittreedefitem | name |
| voucherrelationship | vouchernumber |

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add relevant issue to release milestone
  • Add pr to documentation list
  • Add automated tests
  • Add a reverse migration if a migration is present in the PR

Testing instructions

  • See that the new migration step for adding the indexes completes successfully.
  • Use the QB on fields that have been indexed to see that they run correctly and in a timely manner.
  • Use the taxon tree viewer on a database with a large taxon tree. Try all of the tree operations to see that they run in a timely manner.

@github-project-automation github-project-automation Bot moved this to 📋Back Log in General Tester Board Nov 25, 2025
@acwhite211 acwhite211 added this to the 7.13.0 milestone Jan 7, 2026
@CarolineDenis CarolineDenis modified the milestones: 7.13.0, 7.12.1 Feb 19, 2026
@acwhite211 acwhite211 modified the milestones: 7.12.1, 7.13.0 Feb 19, 2026
@CarolineDenis CarolineDenis modified the milestones: 7.13.0, 7.12.1 Feb 19, 2026
@acwhite211 acwhite211 changed the title from "Create indexes in models and migration files" to "Create indexes in django models and migration files" Mar 13, 2026
@acwhite211 acwhite211 marked this pull request as ready for review March 13, 2026 19:29
@acwhite211 acwhite211 requested review from a team and grantfitzsimmons March 13, 2026 19:30
@acwhite211
Member Author

Let me know if anyone thinks of additional fields that would benefit from indexing.

Contributor

@alesan99 alesan99 left a comment

  • See that the new migration step for adding the indexes completes successfully.
  • Use the QB on fields that have been indexed to see that they run correctly and in a timely manner.
  • Use the taxon tree viewer on a database with a large taxon tree. Try all of the tree operations to see that they run in a timely manner.

Migration runs correctly on KUBirds and read operations are snappy 👍
Tried deleting, moves, merges, searching, and importing big trees.
I didn't notice any speed drops either when running this locally.

@alesan99 alesan99 requested a review from a team March 16, 2026 16:50
@acwhite211 acwhite211 requested a review from melton-jason March 18, 2026 18:50
Collaborator

@bhumikaguptaa bhumikaguptaa left a comment

  • See that the new migration step for adding the indexes completes successfully.
  • Use the QB on fields that have been indexed to see that they run correctly and in a timely manner.
  • Use the taxon tree viewer on a database with a large taxon tree. Try all of the tree operations to see that they run in a timely manner.

When I tried to run the MaterialSample table I got the following error:

Specify 7 Crash Report - 2026-03-18T18_28_32.334Z.txt

Link to DB: https://ojsmnh20251211-issue-7482.test.specifysystems.org/specify/query

@CarolineDenis CarolineDenis removed the request for review from melton-jason March 19, 2026 15:00
@acwhite211
Member Author

>   • See that the new migration step for adding the indexes completes successfully.
>   • Use the QB on fields that have been indexed to see that they run correctly and in a timely manner.
>   • Use the taxon tree viewer on a database with a large taxon tree. Try all of the tree operations to see that they run in a timely manner.
>
> When I tried to run the MaterialSample table I got the following error:
>
> Specify 7 Crash Report - 2026-03-18T18_28_32.334Z.txt
>
> Link to DB: https://ojsmnh20251211-issue-7482.test.specifysystems.org/specify/query

This looks to be caused by duplicate splocalecontainer records in that database. I added a commit to handle that case 👍

Collaborator

@bhumikaguptaa bhumikaguptaa left a comment

  • See that the new migration step for adding the indexes completes successfully.
  • Use the QB on fields that have been indexed to see that they run correctly and in a timely manner.
  • Use the taxon tree viewer on a database with a large taxon tree. Try all of the tree operations to see that they run in a timely manner.

--

It works as expected. I was able to query on indexed fields without any errors, including materialsample.

@gitguardian

gitguardian Bot commented Apr 8, 2026

✅ There are no secrets present in this pull request anymore.

If these secrets were true positive and are still valid, we highly recommend you to revoke them.
While these secrets were previously flagged, we no longer have a reference to the
specific commits where they were detected. Once a secret has been leaked into a git
repository, you should consider it compromised, even if it was deleted immediately.
Find here more information about risks.


@acwhite211
Member Author

I'm doing some more thorough testing to make sure that no action or process in Specify is significantly slowed down by re-indexing.

Member

@grantfitzsimmons grantfitzsimmons left a comment

  • See that the new migration step for adding the indexes completes successfully.
  • Use the QB on fields that have been indexed to see that they run correctly and in a timely manner.
  • Use the taxon tree viewer on a database with a large taxon tree. Try all of the tree operations to see that they run in a timely manner.

I did the same as @alesan99: made some really big tree moves and imports. I didn't notice a speed difference, but I'm waiting for your testing to be done, @acwhite211.


Labels

None yet

Projects

Status: 📋Back Log

Development

Successfully merging this pull request may close these issues.

Add missing field indexes

5 participants