⚠️ This post links to an external website. ⚠️
I recently went on a parsing journey, and it led to yet another wonderful Dashbit library: NimbleParsec. It all started when my colleague Daniel Andrews began building MCP clients with Elixir. This approach opens up enormous potential, and I decided to write a tool for querying and retrieving data using the Delta Sharing protocol. This was new ground for me: I'm used to SQL and Excel, but Delta Sharing uses the column-based Apache Parquet data format to store data.
As part of this implementation, I needed to parse SQL-like predicate strings into structured Elixir data that I could use to filter Parquet files. For example, I needed to transform:
"project_id = 123"ā{:eq, "project_id", 123}"status != 'closed'"ā{:neq, "status", "closed"}"created_at >= '2024-01-01'"ā{:gte, "created_at", "2024-01-01"}These tuples could then be passed to the Explorer library to actually filter the data in the parquet files (for example,
DataFrame.filter(df, col("project_id") == 123)). So the question became: how best to parse these predicate strings?
continue reading on revelry.co
If this post was enjoyable or useful for you, please share it! If you have comments, questions, or feedback, you can reach me at my personal email. To get new posts, subscribe via the RSS feed.