It should not be this hard to run SQL queries on some data
I’m doing some simple analysis of some data at work in a spreadsheet with about 8,000 rows, essentially to decide which rows I care about and which I can ignore. My first attempt at this involved many iterations of “build a filter filter in LibreOffice Calc to match certain rows I know I can ignore, delete those rows, resetting the filter, repeat.”
After several hours of this I got myself down to about 4,000 rows remaining before realizing that some rows I’d decided to delete quite a ways earlier in the process, I really should have kept. Oh, well, time to start over.
This time I decided to be smart about it: instead of doing this analysis in Calc and deleting rows, let’s load the data into something I can run SQL queries on and write a query with lots of where clauses to exclude the rows I don’t want. This way, if I realize part-way in that I made a mistake, I can simply modify the query instead of having to start over.
My first thought for how to accomplish this was that we’re a Microsoft 365 shop, so I can just load the data into Microsoft Access. Apparently not: it’s not installed on my work laptop, not available in the Jamf self-service software install app, and not available to me for download (as far as I can tell) on the Microsoft 365 website, so either the M365 account my company gave me does not have a license for Access, or Microsoft has cunningly hidden the installer somewhere. I have now run into my first 🚧. Time to change course!
At this point I remembered that LibreOffice Database exists, i.e., LibreOffice has a free tool kind of like Microsoft Access. So I’m going to try to use that.
If I just wanted to run read-only SQL queries against the data in the spreadsheet, my problem would be solved here. It’s trivial to connect a LibreOffice Database to an existing ODS file and run SQL queries against it. But I also want to be able to _edit_ the underlying data, and ODS-backed LibreOffice databases are read-only. I’m therefore hoping there’s an easy way to copy a table from one LibreOffice database into another, so I can create a read-write database backed by HSQLDB, which is supported by LibreOffice Database, and do my querying and editing in that.
I do a bit of web searching, from which I conclude that the recommended way to do this is to create a LibreOffice Database connected to the spreadsheet, create a second Database connected to the HSQLDB backend, copy the table in the first database, and paste it into the second.
I create the Database connected to the spreadsheet with no trouble, but when I try to create and open the HSQL Database, I get a pop-up telling me I have to go install a JDK. 🚧#2
I go download and install OpenJDK, which requires some digging around to figure out the correct way to do that on macOS. The first time I do it it doesn’t work, it turns out because I should have installed the “aarch64” version on my silicon Mac rather than the “x64” version. D’oh! 🚧#3. I delete the x64 version I installed and replace it with the aarch64 version, and now I’m able to open the HSQLDB Database.
Now I attempt the table copy/paste maneuver, and I get errors at the point where LibreOffice is attempting to copy the data into the new table. I waste quite some time digging into what is causing these errors, only to discover that for some inexplicable reason LibreOffice is ignoring the data in the first column of source data and shifting all the data one column to the left, with the result that the data is being imported into the wrong columns, which means the data types of the columns and the data being imported into them don’t match, which is what’s causing the errors. 🚧#4
I don’t know how to work around this, but it seems like a pretty serious bug, so I should try to construct a simple test case and report it to the LibreOffice maintainers, right? Only when I attempt to do that, I run into a _different_ bug, which I can’t figure out how to work around. I report that one instead. 🚧#5. Since I can’t get past that bug, I can’t create a recipe for reproducing the other one, so I can’t report it. 🙁
My next thought is that maybe this problem is only in the HSQLDB backend, so I can install MariaDB on my Mac and use that as the backend instead of HSQLDB. I’ve never actually installed MariaDB on a Mac before, but it should be straightforward, right? Alas, the MariaDB install page only says how to install it on Linux and Windows, not Mac. 🚧#6. I’m sure if I had kept digging I could have found a way to install it on my Mac (heck, it’s probably available in Homebrew, which I forgot to check), but I just shrugged and figured what the heck, I’ll use PostgreSQL instead.
I found, downloaded, and ran the macOS PostgreSQL installer. During the install it said that database initialization failed but didn’t explain why. 🚧#7
I haven’t actually installed PostgreSQL in a long time, so it took me some digging to refresh my memory about how to initialize the database from the command line, but I figured it out. Then more digging to figure out how to create users and set their passwords, then some more digging to figure out how to give them permissions on the database. Permissions are quite arcane on PostgreSQL, so it took me quite a while to figure it out, and then I subsequently realized in hindsight that all that effort was pointless because `initdb` enables trust authentication for anybody connecting through localhost. 🚧#8
While I was in the middle of figuring out how to assign permissions I saw someone on Stack Overflow recommend using pgAdmin for that, so I figured what the heck, I’ll give that a try. My friends, I have a powerful MacBook Pro, and pgAdmin took several minutes to launch. Furthermore, after it launched, the user interface took anywhere from 3 to 10 seconds to respond to every single mouse click. I don’t know what tech pgAdmin is built out of under the hood, but wow, it is absurdly, unusably slow. Also, it turns out you can only use pgAdmin to assign permissions for existing objects; you can’t use it to assign default permissions, apparently. So all that effort was wasted. 🚧#9
Now that I’ve got the database working, I create a LibreOffice Database connected to it, and then I again attempt to do the table copy/paste from the ODS-connected Database. The entire process of doing that works as expected and seemingly runs to completion, but… the table is not created. LibreOffice is acting like nothing’s wrong but silently failing. I report that bug too. 🚧#10
“Maybe it will work if I create the table in the target database by hand, and then attempt to do the paste in ‘append the data’ mode, I say to myself. So I try that with a simple test case before I waste time recreating the entire schema of my actual data in the target database. It fails, because the primary key column created by LibreOffice isn’t auto-populating so the INSERT statement generated by LibreOffice fails because it’s not providing a value for that column. I reported that as well. 🚧#11
You get how absurd all this is, right?
Next thought: create the table with hand-written SQL with a proper PostgreSQL serial primary key, _then_ append the data from the source table. I try that out with my tiny test table, and lo and behold, it works. So I go ahead and do the same for the actual table schema and data, and, miraculously, that works as well, and I finally have all the data in a LibreOffice Database that I can query and edit. It only took me the majority of a work day. 🤦
But wait, there’s more. I continue to discover issues as I use LibreOffice Database to query and edit the data:
* I’m building complex queries. To make the queries readable as I work my way through the data, I’m inserting line breaks, SQL comments, etc. I don’t discover until the first time I close and later reopened one of these queries that when you close a query it strips out all the line breaks and SQL comments. Awesome! Reported as a bug. 😡 🚧#12
* When you edit a row in the query results and then rerun the query before clicking on a different row or clicking the data save button, your changes are silently discarded. This bug cost me about 45 minutes because when I realized it was happening I had to go back and re-review over a thousand rows of data to find the ones that hadn’t been saved properly. Reported as a bug. 🚧#13
It’s unbelievable how hard this was to do and how many barriers to success I had to get through.
### Share this:
* Email a link to a friend (Opens in new window) Email
* Share on LinkedIn (Opens in new window) LinkedIn
* Share on Reddit (Opens in new window) Reddit
* Share on Mastodon (Opens in new window) Mastodon
*
### _Related_