Coginiti has implemented Retrieval Augmented Generation (RAG) for Coginiti Team and Enterprise customers to improve the quality of AI Assistant interactions. Retrieval Augmented Generation is a way to enhance the quality of responses from a large language model by supplying it with relevant domain knowledge that was not part of its original training data. What follows are some of the technical details about what Retrieval Augmented Generation is, how it works, and how we went about implementing it in Coginiti.
Coginiti’s AI Assistant
Coginiti’s AI Assistant today is a direct integration with the model services of a number of large language models, plus some proprietary prompting. Coginiti ships with support for the public APIs from OpenAI and Anthropic, along with cloud model services such as AWS Bedrock and Azure. Coginiti’s strategy is to support customer model choices the same way we support their choices around data platforms.
When a user submits a question to Coginiti’s AI Assistant, we enrich the session with some default prompts that add context to the user’s inquiry. These prompts include the name of the connected data platform, such as AWS Redshift or Databricks, which helps the connected language model generate the appropriate platform syntax when generating SQL. Coginiti also provides a map of the connected schema, including table names, column names, and primary key relationships. These prompts are included to increase the accuracy of the model response.
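To make that mechanism concrete, here is a minimal sketch of what this kind of context enrichment can look like. The function, prompt wording, and schema names are hypothetical illustrations, not Coginiti’s actual implementation:

```python
# Hypothetical sketch of enriching a chat session with platform and
# schema context before the user's question is sent to the model.
# All names and prompt text here are illustrative, not Coginiti's API.

def build_system_prompt(platform: str, schema_map: dict) -> str:
    """Assemble a system prompt naming the connected data platform and
    describing a map of its schema (table name -> list of columns)."""
    lines = [
        f"You are a SQL assistant. The connected platform is {platform}.",
        "Use that platform's SQL dialect. Known schema:",
    ]
    for table, columns in schema_map.items():
        lines.append(f"  {table}({', '.join(columns)})")
    return "\n".join(lines)

prompt = build_system_prompt(
    "AWS Redshift",
    {"orders": ["order_id", "customer_id", "order_date"],
     "customers": ["customer_id", "name"]},
)
print(prompt)
```

The assembled string would be prepended to the conversation as a system message, so every user question is answered with the platform dialect and schema already in context.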
We test these prompts and various database schemas across each of our supported model services for consistent results, though language models are nondeterministic, so performance can vary. Coginiti also enables users to create static custom prompts, so they can tailor the interaction with the model to their own needs. (Yes, you can make it talk like a pirate.)
Using Coginiti’s AI Assistant gives consistently good results for text-to-SQL generation, error explanations, performance recommendations, and general data guidance. However, we wanted the AI Assistant to be able to handle more complex questions about the business. To do that, the language model needs more exposure to information about the business. One way to approach this would be to fine-tune a model on information from the business, but fine-tuning is time consuming, expensive, and somewhat brittle. (If you want help with that though, reach out.) Retrieval Augmented Generation presents a less expensive and more flexible way to get context-relevant information to the model during the user interaction.
What is Retrieval Augmented Generation?
Retrieval Augmented Generation could also have been called Search Augmented Generation, because it involves searching over a collection of data for relevant results, then injecting those results into the context of the user’s chat session. The Coginiti product was well positioned for this kind of implementation because it ships with an embedded repository of relevant data in the form of our Analytics Catalog. The Analytics Catalog stores all your data and analytics assets, from critical transformation code and cleansing routines to query logic, along with relevant metadata in the form of documentation, comments, and tags. (Hint: if you are not enhancing your catalog assets with metadata today, you will want to start so that you can fully leverage the RAG capabilities.)
Coginiti already ships with a search engine, so enabling keyword search across catalog assets would have been easy, but keyword search is limiting. Traditional keyword search is like using a flashlight in the dark to find something specific: you might find items that are labeled correctly but miss related ones. In contrast, semantic search uses a broader beam to illuminate not just the exact words but also associated concepts and meanings. More powerful, then, is to use semantic search to improve search accuracy for users by capturing the meaning of search queries. For example, with keyword search, an analyst searching for “holiday sales trends” would receive a mix of all queries mentioning “holiday,” “sales,” or “trends,” which could include irrelevant information. With semantic search, the system understands the context of “holiday sales trends” and recognizes the analyst’s likely intent to analyze customer purchasing behavior during holiday seasons.
Semantic search algorithms create a vector space composed of embeddings of every entry in a context corpus, in our case, the Analytics Catalog. That’s a complicated way of saying that embeddings are a way of translating words or phrases into a language that computers can work with more easily. Just as each word in a dictionary has a definition, in the world of AI each word or phrase gets a unique numeric ‘code’ that captures its meaning based on how it’s used in the real world. When a user submits a question, the question is embedded into the same vector space. Semantically similar entries sit closer together in that space, so the closest embedding (or the ‘k’ closest embeddings) can be returned, providing the best results for the question. (For a deep read on embeddings see: Vicki Boykis on What are Embeddings)
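The nearest-neighbor mechanic can be sketched in a few lines. The three-dimensional vectors below are made up purely for illustration; real systems get hundreds of dimensions from an embedding model:

```python
import math

# Toy illustration of semantic search over embeddings. Real embeddings
# come from a model; these 3-dimensional vectors are invented to show
# the nearest-neighbor mechanics only.

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

corpus = {
    "holiday sales trends":    [0.9, 0.1, 0.2],
    "customer churn model":    [0.1, 0.8, 0.3],
    "seasonal revenue report": [0.8, 0.2, 0.3],
}

def top_k(query_vec, k=2):
    """Return the k corpus entries closest to the query embedding."""
    ranked = sorted(corpus.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# A query embedded near "holiday sales trends" also retrieves the
# semantically related "seasonal revenue report", despite sharing no
# keywords with it.
print(top_k([0.85, 0.15, 0.25]))
```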
Coginiti AI Assistant + RAG
Coginiti’s architecture presents a number of design constraints when it comes to implementing new services. Coginiti is deployed software, so any of its services needs to be small enough to run containerized on a single server yet able to scale out to serve thousands of users. Many of Coginiti’s customers deploy into highly secure environments, limiting our ability to call outside services. We performed our initial testing and proof of concept using OpenAI’s embedding API, but as a practical matter Coginiti cannot rely on such services because of their public nature. We needed to find an embeddings model that could run in a container and be relatively fast using CPUs rather than GPUs for compute. We ended up testing six model families:
| Model name | Token limit | Model size (GB) | Embedding dimensions (output vector size) | Time to embed 1024 tokens (seconds) | Time to process Analytics Catalog (seconds) | MTEB leaderboard rank (Retrieval) |
| --- | --- | --- | --- | --- | --- | --- |
| all-MiniLM-L6-v2 | 256 | 0.09 | 384 | 0.16 | ~6.5 | 51 |
| UAE-Large-V1 | 512 | 1.34 (0.33 GB quantized) | 1024 | 1.08 | 161 | 3 |
| bge-large-en-v1.5 | 512 | 1.34 | 1024 | 1.09 | ~200 | 4 |
| bge-base-en-v1.5 | 512 | 0.44 | 768 | 0.34 | 49.9 | 6 |
| bge-small-en-v1.5 | 512 | 0.11 | 384 | 0.24 | 34 | 10 |
| gte-large | 512 | 0.67 | 1024 | 1.08 | 155 | 7 |
| gte-base | 512 | 0.22 | 768 | 0.24 | 54 | 13 |
| gte-small | 512 | 0.07 | 384 | 0.20 | 37 | 20 |
| gte-tiny | 512 | 0.05 | 384 | 0.12 | 14.7 | 41 |
| udever-bloom-7b1 | 2048 | 28.8 | 4096 | N/A (couldn’t run it on CPU) | N/A | 21 |
| voyage-lite-01-instruct | 4096 | N/A (cloud service) | 1024 | N/A | N/A | 1 |
To create our embeddings, Coginiti selected the BGE-M3 embedding model series, a state-of-the-art multi-lingual and cross-lingual model. Multi-lingual and cross-lingual support is important because our embeddings consist of specialized SQL alongside metadata that might be written in a variety of languages. The model’s performance, thanks to its small size, acceptable token limit, and speed, was also a critical factor. The embedding model runs as a standalone service within the Coginiti stack, so no data is sent outside the application stack, and it gives us a consistent embeddings tool across all of our clients.
The embeddings for each catalog asset also need to be stored in a way that lets Coginiti easily perform a vector similarity search. There are a number of databases purpose-built as vector stores, Pinecone and Weaviate being two of the leading candidates. Using a standalone vector store, however, would have meant adding yet another service to our stack, which would add management complexity and grow the resource demands to run the product. Fortunately, Coginiti uses Postgres as a backend storage layer, which has an available extension, pgvector, for storing embeddings and performing vector similarity search directly within Postgres. Our testing showed that pgvector was more than adequate for our vector storage and search needs. As of this writing, pgvector is available with the managed Postgres service from every major cloud provider, and it’s easily installable for self-managed users.
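As a rough sketch of what this looks like in practice (the table and column names are illustrative, and the 384 dimensions match the smaller models tested above):

```sql
-- Illustrative sketch only: table and column names are hypothetical.
-- Enable the extension, store a 384-dimensional embedding per catalog
-- asset, and retrieve the k nearest assets by cosine distance.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE catalog_embeddings (
    asset_id  bigint PRIMARY KEY,
    embedding vector(384)
);

-- "<=>" is pgvector's cosine-distance operator; the literal vector
-- stands in for the embedded user question.
SELECT asset_id
FROM catalog_embeddings
ORDER BY embedding <=> '[0.12, 0.03, ...]'
LIMIT 5;
```

Because the search runs inside Postgres, the retrieval step adds no new service to the deployment; the five nearest assets can be injected straight into the model’s context.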
Conclusion
We are excited to release our implementation of retrieval augmented generation for Coginiti Team and Enterprise customers! Being able to combine domain specific customer data with the existing power of generative large language models is a powerful combination. Looking forward to learning with our customers how this improves their workflows and accuracy working with Coginiti’s AI Assistant. If you would like to give it a trial or see a demo, reach out to schedule a call.
We first released the Coginiti AI Assistant nine months ago with support for OpenAI’s GPT models. That initial release enabled data analysts and engineers using the Coginiti platform to ask any data-related question of the large language model, such as how to generate a query to fetch data or how to create a table. In some regards, this implementation merely moved the chat interface into the application so that users didn’t have to switch context to interact with the model. We improved the experience by preloading the user’s context with information about the connected data platform, so the language model would generate the correct platform syntax. We also inject the user’s database schema, table, view, and column names along with any associated relationships so that the language model can generate the correct semantics.
The AI Assistant implementation was completely optional for users, but for those who adopted it the initial response was generally positive. The large language models proved very capable of translating natural language questions into functional code, especially when organizations use good semantic naming practices in their data platform (e.g., dim_customer over dim_cust). These large language models weren’t just good at generating code; they were good at answering all kinds of data-related questions, from error explanation to helping analysts reason through a problem.
Six months ago, it was clear that large language models were becoming mostly a commodity service: widely used but not meaningfully differentiated from one another in their capabilities. We felt organizations should have the option to pick the large language models that best fit their needs. To enable this, we added support for Anthropic’s Claude models in Coginiti Pro, along with model services such as AWS Bedrock and Azure OpenAI. We want to support a wide array of models the same way we support a wide array of databases.
We also expanded the AI Assistant integration beyond the chat interface. Users can select code in the editor and ask for it to be optimized or explained, which is especially useful when working with code that was originally written by someone else. We also integrated the AI Assistant into Coginiti’s error handling: database error codes, especially for older database systems, can be cryptic and difficult to understand, so we send the user’s code and the resulting errors to the assistant for analysis. The AI Assistant is also integrated into Coginiti’s visual explain plan, enabling users to get a deeper understanding of database operations and optimization options. These integration improvements all smoothed the interaction with the AI Assistant.
Coginiti customers that have enabled support for large language models report 15-20% performance improvements for their teams. Far from replacing analysts or data engineers, large language models are helping them be better and more productive in their daily work. The AI Assistant is just that: an assistant that is there when the user needs it and disappears when they don’t. You might wonder, then, what’s next.
For the last couple of months we’ve been working on an implementation of Retrieval Augmented Generation (RAG) for Coginiti Team and Enterprise customers. RAG combines the retrieval capabilities of a search engine with the generative power of a large language model. This augmentation increases the accuracy and relevance of AI Assistant interactions by injecting more relevant domain information into the language model’s context. Where previously the large language model just had access to a user’s schema information, it will now have relevant code samples to work with as well. RAG is enabled in Coginiti by the fact that the product ships with a repository for domain-specific data in the form of our Analytics Catalog. Each catalog asset consists of code samples along with comments, tags, and documentation among its metadata. (Hint: if you are not enhancing your catalog assets with metadata today, you will want to start so that you can fully leverage the RAG capabilities.)
The integration of RAG into Coginiti’s AI Assistant represents a significant leap forward, expanding the large language model’s response generation capabilities through sourcing accurate and relevant information. The potential to transform data into actionable insights has never been more accessible. You will be able to start utilizing these new capabilities in the next release of Coginiti Team and Enterprise (24.03). Learn more about how you can experience Coginiti AI Assistant today by scheduling a demo.
Rubber duck debugging is a method of debugging code by articulating the problem aloud, even to something that can’t give you feedback, like a rubber duck. The very process of describing the problem in detail often illuminates the path to a solution. It’s a great practice when you’re not ready to bother a co-worker or you’re working remotely. Coginiti’s AI Assistant is the ultimate rubber duck debugging tool, though, because it talks back! Below are some sample prompts to help get you started.
- Describe the SQL Problem Clearly
  “I have a SQL query that’s supposed to [describe expected outcome], but instead it [describe the incorrect behavior or error]. Here’s the query: [insert SQL query]. Can you help me understand why it’s not working as expected?”
- Share Specific Error Messages
  “I’m getting a [specific error message] when I run the following SQL query: [insert SQL query]. What does this error mean, and how can I fix it?”
- Explain the Desired Outcome
  “I need to write a SQL query to [describe the task, such as ‘retrieve all records from Table A that…’]. I’ve tried [insert SQL query], but it’s not giving me the results I expect. What am I doing wrong?”
- Detail the Database Structure
  “My database has these tables: [list of tables, with key columns and relationships]. I’m trying to [describe the task or problem]. Here’s my current query: [insert SQL query]. How can I adjust this to work correctly?”
- Ask for Optimization Suggestions
  “This SQL query works, but it’s very slow: [insert SQL query]. How can I optimize it for better performance?”
- Request Explanation of SQL Concepts or Functions
  “Can you explain how [specific SQL function or concept, like JOINs, window functions, etc.] works? I’m trying to use it in this context: [brief description of context or problem].”
- Inquire About Best Practices
  “Is there a best practice for [specific task, like indexing, writing complex queries, etc.] in SQL? Here’s what I’m currently doing: [insert SQL query or description]. How can I improve?”
- Share Attempts and Ask for Feedback
  “I’ve attempted to solve this problem with the following query: [insert SQL query]. It’s not working as expected. Can you review it and suggest improvements?”
- Query Logical or Syntax Errors
  “I think there’s a logical/syntax error in my SQL query: [insert SQL query]. I’ve looked over it several times but can’t spot the issue. Can you help identify the error?”
- Discuss Performance and Scalability
  “I’m concerned about the scalability of my SQL query as the database grows: [insert SQL query]. Do you have any suggestions to ensure it remains efficient?”
When framing your prompt, the more detail you can provide, the better. When enabled, the AI Assistant has the context of your SQL dialect and current schema, but you can enhance this with examples of the data you’re working with or expect to receive. Such comprehensive prompts enable the AI Assistant to provide more accurate and helpful responses, facilitating an effective debugging session.
Coginiti continues to lead the way in enhancing data engineering and analysis workflows with its latest release. Our focus on collaboration, performance optimization, and seamless integration offers significant advancements for Coginiti users. Let’s dive into what’s new:
Introducing Projects for Enhanced Collaboration
The introduction of Projects marks a significant leap forward in collaborative data engineering and analytics. Projects allow you to group related analytic assets, streamline data workflows, and foster a shared development environment. Within Coginiti Team & Enterprise, projects can be shared among team members, enabling collective work on a unified platform. By assigning access levels such as Viewer and Editor, you gain precise control over how each team member interacts with the project. Benefits of project sharing include:
- Simplified Asset Distribution: Projects simplify the process of distributing analytics assets across the catalog, making it easier for teams to manage and reuse code efficiently.
- Enhanced Communication: By streamlining communication among team members, Projects accelerate code management and foster a dynamic, efficient collaboration environment.
Trino Visual Explain Plan
We are excited to announce the launch of the Visual Explain Plan for Trino, a powerful feature designed to demystify complex query performance and optimization strategies. This tool allows users to visualize and understand the intricacies of SQL query execution plans in Trino. With AI-generated explanations for individual nodes, users can now gain insights into query performance issues and optimization opportunities, enhancing the efficiency of data processing in Trino’s distributed SQL query environment.
Explore Trino Integration: Learn more about how Coginiti integrates with Trino and enhances query optimization.
Launching the Catalog API
To further streamline the integration of Coginiti with the enterprise data stack, we’re introducing the Catalog API. This new API provides endpoints for browsing a list of stored analytic assets or retrieving the code for any asset, including SQL and CoginitiScript. It’s designed to facilitate seamless integration with popular third-party tools, databases, and platforms, making it easier than ever to manage and access your analytics assets.
Key Features of the Catalog API:
- Browse and Retrieve Assets: Easily locate and access your stored analytics assets, streamlining the integration with your data stack.
- Execution API Integration: Use the Catalog API in conjunction with the Execution API to execute queries directly from your catalog, offering a more efficient way to process and analyze data.
These updates reinforce Coginiti’s commitment to empowering data teams with advanced tools and features that enhance productivity, foster collaboration, and deliver reliable data insights. Try these new capabilities for yourself and see how they can transform your data engineering and analytics workflows.
In the realm of data management, the emergence of SQL (Structured Query Language) marks a pivotal moment in the history of database systems. Conceived by Donald D. Chamberlin and Raymond F. Boyce at IBM in the early 1970s, SQL was initially developed as a part of a project aimed at exploiting the potential of Edgar F. Codd’s relational database model. This model proposed a way to structure data in tables with rows and columns, a concept that revolutionized how information was stored, retrieved, and manipulated in computer systems.
The standardization of SQL began with SQL-86, under the auspices of the American National Standards Institute (ANSI). This first standard laid the groundwork for SQL as a universal language for managing and querying data in relational database management systems (RDBMS). Over the years, SQL standards have evolved significantly, adapting to the changing needs of data storage, retrieval, and processing. These standards not only reflect the advancements in database technology but also shape the way data is managed across diverse applications and platforms.
Our journey through the evolution of SQL, from SQL-86 to the latest SQL:2023, is not just a chronicle of technical enhancements. It’s a testament to the enduring vision of its creators and the continuous efforts to make data more accessible and actionable. As we delve into each version of the SQL standard, we highlight the introduction and refinement of features that have become integral to modern data management practices. Whether you’re a database professional, a developer, or an enthusiast, understanding the history and progression of SQL standards offers valuable insights into the capabilities and versatility of this foundational technology.
SQL-86 – The Beginnings
The inaugural chapter in the history of standardized SQL begins with SQL-86, also formally recognized as SQL-87. This version marks the first endeavor to standardize the SQL language for relational database management systems, a step that was crucial for ensuring consistency and interoperability across different database systems.
SQL-86 was characterized by its simplicity and focus on the core aspects of SQL. The standard defined the fundamental framework of SQL, which included basic Data Definition Language (DDL) and Data Manipulation Language (DML) operations. DDL operations in SQL-86 encompassed the creation (CREATE TABLE) and deletion (DROP TABLE) of tables, forming the backbone of database structure management. Meanwhile, DML operations covered the essential functionalities for interacting with stored data: querying data (SELECT), inserting new rows (INSERT), updating existing data (UPDATE), and deleting data (DELETE).
In addition to these operations, SQL-86 introduced basic query functionalities. It supported simple WHERE clauses for data filtering, basic arithmetic and comparison operators, and some aggregate functions like COUNT, SUM, AVG, MIN, and MAX. These features provided the fundamental tools for data retrieval and basic analysis.
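That core toolkit is easy to exercise in any modern engine. Here is a quick illustration using SQLite from Python (SQLite’s dialect, not literal SQL-86 syntax, but the constructs shown all date from that first standard):

```python
import sqlite3

# A quick tour of the SQL-86-era core: CREATE TABLE, INSERT, SELECT with
# a WHERE clause, and aggregate functions, run against an in-memory
# SQLite database purely for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (order_id INTEGER, amount INTEGER)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 100), (2, 250), (3, 75)])

# Filter with WHERE, summarize with aggregates.
cur.execute("SELECT COUNT(*), SUM(amount), MIN(amount), MAX(amount) "
            "FROM orders WHERE amount > 50")
print(cur.fetchone())  # -> (3, 425, 75, 250)
```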
The data types available in SQL-86 were rudimentary, focusing on primary types like INTEGER, SMALLINT, and VARCHAR. This limited range reflected the primary needs of database systems at the time, prioritizing straightforward data storage and retrieval.
However, the simplicity of SQL-86 also meant that it had notable limitations. It lacked advanced features that would later become standard in SQL, such as JOIN operations, foreign key constraints, views, and complex transaction control. These limitations were a reflection of the early stage of relational database technology and the evolving understanding of what was needed in a database query language.
SQL-86 laid the groundwork for the future of SQL. It established a common language for databases at a time when the concept of a standardized query language was still novel. While it offered only a basic toolkit, it was the first crucial step towards developing SQL into the rich and versatile language it is today, capable of handling complex and varied data management tasks across countless applications and systems.
(Read the SQL-86 Standard in full)
SQL-89 – Early Enhancements
Only a few years after the establishment of the first SQL standard, the SQL-89 update was introduced. While SQL-89 was a relatively minor revision, it addressed critical aspects of database management that were not covered in the initial standard. These enhancements were pivotal in evolving SQL from a basic data manipulation language to a more robust tool for database administration and security.
One of the key additions in SQL-89 was the introduction of integrity constraints. These constraints are essential for maintaining the accuracy and reliability of data within a database. SQL-89 focused particularly on primary key constraints, which ensure that each row in a table is uniquely identified by its values in specified columns. This addition marked a significant step towards more sophisticated data integrity and relational data modeling.
Another major enhancement in SQL-89 was the inclusion of basic security features, specifically the GRANT and REVOKE statements. These commands provided database administrators with the ability to control access to data at a more granular level. The GRANT statement allowed permissions to be assigned to users or roles, enabling them to perform specific actions like selecting, inserting, updating, or deleting data on given tables. Conversely, the REVOKE statement was used to remove these permissions. This control over data access was crucial for managing data security and ensuring that only authorized users could perform certain operations on the database.
Although SQL-89 did not introduce a wide array of features, the additions it made were significant in strengthening the foundational aspects of SQL. The introduction of integrity constraints and basic security features laid the groundwork for more advanced functionalities that would be added in subsequent versions of the SQL standard. By addressing these fundamental needs, SQL-89 played a key role in the progression of SQL as a reliable and secure language for database management.
SQL-92 (SQL2) – The Major Leap
The release of SQL-92, also known as SQL2, marked a monumental moment in the evolution of SQL. This version represented not just an incremental update, but a major overhaul that significantly expanded the capabilities of the SQL language. SQL-92 set a new benchmark for database systems and laid the foundation for the modern SQL we use today.
One of the most significant enhancements in SQL-92 was the introduction of JOIN operations. This feature brought a fundamental change in how queries could be constructed, allowing for more complex data retrieval across multiple tables. JOIN operations, including INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN, enabled users to combine rows from two or more tables based on a related column between them. This capability was crucial for relational database operations, facilitating more intricate and meaningful data relationships and analysis.
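The difference between the two most common variants, INNER and LEFT OUTER JOIN, can be seen on two tiny tables (here run against an in-memory SQLite database for illustration):

```python
import sqlite3

# INNER vs LEFT OUTER JOIN, the SQL-92 additions described above,
# demonstrated on an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders    (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Boyce');
    INSERT INTO orders    VALUES (10, 1);
""")

# INNER JOIN keeps only customers with a matching order.
cur.execute("""SELECT c.name, o.order_id
               FROM customers c
               INNER JOIN orders o ON o.customer_id = c.customer_id
               ORDER BY c.customer_id""")
print(cur.fetchall())  # -> [('Ada', 10)]

# LEFT OUTER JOIN keeps every customer, padding missing orders with NULL.
cur.execute("""SELECT c.name, o.order_id
               FROM customers c
               LEFT OUTER JOIN orders o ON o.customer_id = c.customer_id
               ORDER BY c.customer_id""")
print(cur.fetchall())  # -> [('Ada', 10), ('Boyce', None)]
```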
Subqueries were another vital addition in SQL-92. These are queries nested within other queries, providing a powerful tool for creating more dynamic and flexible SQL statements. Subqueries enhanced the language’s capability to handle complex data retrieval scenarios, making SQL more versatile and effective in addressing diverse data manipulation needs.
SQL-92 also introduced a range of new data types to accommodate a broader spectrum of data representation. These included DATE, TIME, TIMESTAMP, INTERVAL, and various numeric and string types. The expansion of data types was a direct response to the growing complexity of data being stored and processed in databases, reflecting the evolving requirements of database applications.
Additionally, SQL-92 made significant strides in transaction control and data integrity. It introduced concepts like transaction isolation levels, constraints (such as CHECK, UNIQUE, FOREIGN KEY), and declarative referential integrity. These features provided more robust mechanisms for maintaining data consistency, integrity, and managing concurrent database operations.
The advancements in SQL-92 had a profound impact on the world of database systems. By significantly broadening the scope and functionality of SQL, it enabled more sophisticated data modeling, querying, and analysis. This standard was instrumental in solidifying SQL’s position as the predominant language for relational databases and set the stage for continued innovation in database technology. SQL-92’s comprehensive feature set transformed it into a more powerful tool, capable of handling the complex demands of modern data environments.
SQL:1999 (SQL3) – Expanding Beyond Traditional Models
SQL:1999, also known as SQL3, marked a pivotal advancement in the SQL standard, introducing a range of sophisticated features that significantly expanded the capabilities of SQL beyond the traditional relational database model.
- Introduction to Object-Relational Features: SQL:1999 brought in object-relational capabilities, blending object-oriented programming concepts with the established relational model. This integration allowed for more complex and natural data structures, including table inheritance and user-defined types (UDTs). These UDTs empowered users to create custom data types, tailoring database schemas to more closely align with the complexities of real-world data.
- Triggers, Recursive Queries, and Common Table Expressions (CTEs): The introduction of triggers was a major enhancement, enabling automated procedural actions in response to specific changes in the database, such as insertions, updates, or deletions. Recursive queries, a significant addition to SQL’s querying capabilities, allowed for the processing of hierarchical or nested data structures. Common Table Expressions (CTEs), introduced via the WITH clause, offered a more readable and flexible way to write complex queries. CTEs made it possible to define temporary result sets that could be referenced within a SELECT, INSERT, UPDATE, or DELETE statement.
- Advanced Analytical Functions – OLAP Extensions: SQL:1999 introduced several extensions to the GROUP BY clause, notably ROLLUP, CUBE, and GROUPING SETS. These extensions were integral to OLAP (Online Analytical Processing) functions, enhancing SQL’s data summarization and analysis capabilities. ROLLUP facilitated hierarchical data aggregation, allowing for cumulative totals at multiple levels. CUBE provided a means to generate the aggregate data for all possible combinations of a group of attributes, useful in multi-dimensional data analysis. GROUPING SETS offered a more flexible way to specify multiple grouping sets within a single query, enhancing the ability to produce comprehensive summary reports.
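Of these additions, recursive queries via CTEs are the easiest to demonstrate in a modern engine. A sketch using SQLite from Python (note that GROUPING SETS and the other OLAP extensions are not available in SQLite):

```python
import sqlite3

# A recursive common table expression, one of the SQL:1999 additions,
# walking a small org hierarchy in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES (1, 'Ada', NULL),
                                 (2, 'Boyce', 1),
                                 (3, 'Codd', 2);
""")

# Start from the root (no manager), then repeatedly join back to the
# CTE's own results to pull in each level of direct reports.
cur.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, chain.depth + 1
        FROM employees e JOIN chain ON e.manager_id = chain.id
    )
    SELECT name, depth FROM chain ORDER BY depth
""")
rows = cur.fetchall()
print(rows)  # -> [('Ada', 0), ('Boyce', 1), ('Codd', 2)]
```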
The advancements introduced in SQL:1999 significantly broadened the scope of SQL. By incorporating features that aligned SQL with contemporary programming paradigms and enhancing its analytical capabilities, this version of the standard was crucial in adapting SQL to the increasingly complex demands of modern data environments. SQL:1999’s rich feature set established a solid foundation for further developments in the SQL language, paving the way for future innovations.
SQL:2003 – XML Integration
SQL:2003 marked another significant milestone in the evolution of SQL, introducing features that further bridged the gap between traditional relational databases and the emerging needs of modern data processing. This standard is particularly notable for its integration of XML, reflecting the growing importance of this format in data interchange and web services.
- Introduction of SQL/XML Features: The most prominent addition in SQL:2003 was the integration of XML. The standard introduced the XML data type, enabling the storage and manipulation of XML documents within SQL databases. This feature was a direct response to the widespread adoption of XML in various applications and the need for relational databases to handle XML data efficiently. SQL:2003 included several functions and operators for working with XML data, such as XML parsing and serialization. These tools allowed for the conversion between XML and relational data, making it easier to integrate data from different sources.
- Sequence Generators and Identity Columns: SQL:2003 introduced sequence generators, which provided a standardized way to generate unique, sequential values, often used for primary key generation. This feature was crucial for ensuring data integrity and consistency, especially in environments with high volumes of data insertion. Enhancements to identity columns were also made, offering more control and flexibility in their usage. Identity columns automatically generate unique values when new rows are added, streamlining data entry processes.
- Extension of Window Functions: Another key feature in SQL:2003 was the expansion of window functions. These functions allow for the performance of calculations across sets of rows related to the current row, which is particularly useful for complex analytical tasks. SQL:2003 introduced new window functions and extended the capabilities of existing ones, allowing for more sophisticated data analysis and reporting. This enhancement was significant for businesses and analysts who rely on SQL for data-driven decision-making.
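To make the window-function idea concrete, here is a minimal sketch run through Python's built-in sqlite3 module (SQLite 3.25+ implements the standard OVER/PARTITION BY syntax); the table and column names are invented for illustration:

```python
import sqlite3

# Rank sales within each region using a window function of the kind
# standardized in the SQL:2003 era. Schema and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100), ("east", 300), ("west", 200), ("west", 50)],
)

# RANK() is computed per region partition, ordered by amount descending.
rows = conn.execute("""
    SELECT region, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM sales
    ORDER BY region, rnk
""").fetchall()

for region, amount, rnk in rows:
    print(region, amount, rnk)
```

Unlike a GROUP BY aggregate, the window function keeps every input row while still computing across the partition, which is what makes it so useful for ranking and running-total style analysis.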
The introduction of SQL/XML and other advanced features in SQL:2003 was a significant step in aligning SQL with the evolving trends in data management. By accommodating the growing use of XML and enhancing analytical capabilities, SQL:2003 not only expanded the versatility of SQL but also ensured its continued relevance in a rapidly changing technological landscape. The standard represented a key development in the ongoing journey of SQL, reinforcing its position as a comprehensive tool for modern database systems.
SQL:2006 and SQL:2008 – Refinements and JSON Introduction
The progression of SQL continued with the updates in SQL:2006 and SQL:2008. While these revisions were not as extensive as their predecessors, they introduced specific refinements and features that addressed emerging trends and needs in database technology.
- Enhancements in SQL:2006: SQL:2006 continued the integration of XML initiated in SQL:2003, further solidifying the relationship between SQL and XML. It focused on enhancing the functionality and ease of use of XML within SQL databases, a reflection of the ongoing importance of XML in data exchange and web services. This version of the standard introduced additional capabilities for querying and manipulating XML data. The enhancements were aimed at improving the efficiency and flexibility of handling XML, ensuring that SQL databases could seamlessly work with XML as part of their core functionality.
- Key Features in SQL:2008: SQL:2008, while considered a minor revision, introduced notable features that added to SQL’s robustness and ease of use. The introduction of INSTEAD OF triggers in SQL:2008 was a significant enhancement. These triggers allowed for the specification of custom actions to be executed in place of standard operations (like insert, update, or delete) on views, providing more control over data manipulation and enhancing the language’s flexibility. Another important addition was the TRUNCATE TABLE statement. This feature provided a more efficient way to delete all rows from a table, particularly useful for large tables where deleting rows individually would be time-consuming and resource-intensive.
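An INSTEAD OF trigger can be sketched with Python's sqlite3 module, since SQLite supports the same construct on views; all names here are illustrative, not from any particular product:

```python
import sqlite3

# A view is not directly writable; an INSTEAD OF trigger (standardized in
# SQL:2008) redirects the attempted INSERT to the underlying base table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE VIEW people_v AS SELECT full_name FROM people;

    CREATE TRIGGER people_v_insert INSTEAD OF INSERT ON people_v
    BEGIN
        INSERT INTO people (full_name) VALUES (NEW.full_name);
    END;
""")

# Inserting through the view fires the trigger instead of failing.
conn.execute("INSERT INTO people_v (full_name) VALUES ('Ada Lovelace')")
count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(count)  # the row landed in the base table
```

On platforms that implement SQL:2008's TRUNCATE TABLE, `TRUNCATE TABLE people;` would then empty the base table in one efficient statement; SQLite itself does not support TRUNCATE and relies on an unqualified DELETE instead.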
Both SQL:2006 and SQL:2008, through their focused updates, demonstrated the ongoing commitment to refining and expanding SQL’s capabilities. These revisions ensured that SQL remained adaptable and relevant, capable of meeting the evolving requirements of database management systems. The inclusion of enhanced XML features in SQL:2006 and the introduction of practical functionalities like INSTEAD OF triggers and TRUNCATE TABLE in SQL:2008 were crucial steps in the continuous development of SQL, maintaining its status as a powerful and versatile tool in data management.
SQL:2011 and SQL:2016 – Modern Features and Enhancements
The evolution of SQL continued with substantial updates in SQL:2011 and SQL:2016, focusing on modern data management needs and integrating emerging data formats.
- SQL:2011 – Temporal Data and Enhanced Indexing: SQL:2011 introduced support for temporal data, allowing for the management and querying of data across different time dimensions. This feature was vital for applications requiring historical data analysis, auditing, and time-based reporting. Enhanced indexing in SQL:2011 improved the performance and efficiency of queries, particularly critical in large and complex databases. These new indexing techniques optimized data retrieval and storage, addressing scalability and performance challenges.
- SQL:2016 – Introduction of JSON and Advanced Analytics: A hallmark of SQL:2016 was the introduction of JSON data handling. This feature catered to the widespread use of JSON as a data interchange format, especially in web applications. SQL:2016 provided a set of functions and operators for efficient storage, manipulation, and querying of JSON data. Key aspects of the JSON syntax included:
- JSON_VALUE and JSON_QUERY for extracting data from JSON strings.
- JSON_OBJECT and JSON_ARRAY for creating JSON structures.
- The IS JSON predicate to verify if a string contains valid JSON.
- A standardized path language for JSON to specify locations within JSON data.
- Advanced analytics in SQL:2016 were bolstered by new and enhanced analytical functions, empowering more sophisticated data analysis and business intelligence capabilities.
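The spirit of the SQL:2016 JSON functions can be tried through Python's sqlite3 module. Note the hedge: SQLite's JSON support uses its own function names (json_extract, json_valid) rather than the standard's JSON_VALUE and IS JSON, but the path syntax and intent are the same; the schema below is invented for the example:

```python
import json
import sqlite3

# Store a JSON document in a text column and query it with JSON functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (payload TEXT)")
conn.execute(
    "INSERT INTO events VALUES (?)",
    (json.dumps({"user": "alice", "action": "login"}),),
)

# Extract a scalar via a JSON path expression -- the analogue of the
# standard's JSON_VALUE(payload, '$.user').
user = conn.execute(
    "SELECT json_extract(payload, '$.user') FROM events"
).fetchone()[0]

# Validity check -- the analogue of the standard's IS JSON predicate.
valid = conn.execute("SELECT json_valid('not json')").fetchone()[0]

print(user, valid)
```

The path string `'$.user'` is an instance of the standardized JSON path language mentioned above: `$` denotes the document root and `.user` selects a member by key.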
SQL:2011 and SQL:2016 collectively enriched SQL’s feature set, ensuring its adaptability and relevance in the evolving landscape of data management. The introduction of temporal data support, advanced indexing techniques, and JSON data handling, along with improvements in analytical functions, underscored SQL’s role as a comprehensive tool for modern database systems. These versions marked significant strides in SQL’s development, catering to the complex demands of contemporary data environments.
SQL:2023 – The Latest Frontier
SQL:2023 emerges as the latest and most contemporary iteration of the SQL standard, encapsulating a series of enhancements and features that align with the latest trends and demands in data management. This version of SQL reflects a significant leap forward, introducing key updates that cater to both the evolving landscape of database technologies and the diverse needs of users.
- Introduction of Property Graph Queries: One of the standout features in SQL:2023 is the addition of Property Graph Queries (SQL/PGQ). This new functionality significantly reduces the gap between relational database management systems (RDBMS) and graph database systems. SQL/PGQ allows users to query data in relational tables as if it were stored in a graph database. This is particularly beneficial for applications dealing with complex relationships and interconnected data sets, such as social networks, recommendation systems, and network analysis.
- Enhanced JSON Support: Building on the JSON functionalities introduced in SQL:2016, SQL:2023 further enhances JSON data handling. It introduces a native JSON data type, simplifying the storage and manipulation of JSON within SQL databases. The enhanced JSON support includes operations like JSON_SERIALIZE, JSON_SCALAR, and IS JSON, along with additional JSON-related syntax, providing more comprehensive tools for working with JSON data. This enhancement is a response to the continued prevalence of JSON as a data interchange format, especially in web-based and networked applications.
- Simplified Data Handling and Other Improvements: SQL:2023 simplifies several aspects of data handling, making the language more accessible and easier to use. This includes allowing for more intuitive and simplified syntax in certain operations, aiming to enhance the efficiency of writing and maintaining SQL code. Other improvements include new functionalities and enhancements to existing features, further bolstering SQL’s capabilities in various domains of data processing and analysis.
SQL:2023 represents a significant milestone in the ongoing development of SQL. By introducing Property Graph Queries and enhancing JSON support, this version of SQL not only embraces contemporary data modeling concepts but also broadens its applicability in various complex data scenarios. The simplifications and improvements in data handling reflect a commitment to evolving SQL in line with user needs and technological advancements. As the latest standard, SQL:2023 demonstrates SQL’s enduring adaptability and its crucial role in the ever-changing world of data management and analysis.
The Ever-Evolving Landscape of SQL
As we survey the journey of SQL from its inception in SQL-86 to the latest advancements in SQL:2023, it’s evident that SQL has undergone a remarkable evolution. Each iteration of the SQL standard has not only responded to the emerging challenges of data management but has also proactively shaped the way we interact with and think about data. From its early days of basic data manipulation and querying to the sophisticated handling of JSON and graph data, SQL has consistently adapted to meet the needs of an ever-changing technological landscape.
The progression from simple table operations to complex object-relational features, advanced analytical capabilities, and now, to the integration of graph-based queries and enhanced JSON support, underscores SQL’s resilience and versatility. This journey reflects a continuous endeavor to provide more powerful, efficient, and intuitive tools for data professionals across various industries.
Looking to the future, SQL standards are likely to continue this trajectory of innovation and adaptation. Potential areas of development could include the integration of vector indexes, which would enhance SQL’s capabilities in handling large-scale, high-dimensional data, a common requirement in fields like machine learning and data science. Additionally, there might be a greater emphasis on support for measures, further refining SQL’s analytical and business intelligence functionalities.
As data continues to grow in volume, variety, and complexity, the role of SQL in managing and deriving value from this data becomes increasingly crucial. Future standards will likely focus on making SQL even more capable, efficient, and aligned with the latest developments in database technology and data analysis methodologies.
The story of SQL is one of continual growth and adaptation, driven by the relentless pace of technological innovation and the ever-expanding frontiers of data management. As we look ahead, we can anticipate that SQL will persist in evolving, remaining a fundamental and indispensable tool in the world of data management.
Experience the Power of SQL with Coginiti
As we reflect on the dynamic and ever-evolving world of SQL, it’s clear that having the right tools to navigate and harness its capabilities is crucial for anyone working with data. Whether you are a database professional, a developer, or a data analyst, the power of SQL in data management and analytics cannot be overstated.
We invite you to experience SQL’s robustness and versatility with Coginiti, our best-in-class data analytics tool. Coginiti treats SQL as a first-class language, providing an intuitive and powerful platform that empowers you to unlock the full potential of your data.
With Coginiti, you can:
- Seamlessly integrate with various SQL databases.
- Perform complex data analysis with an intuitive interface.
- Leverage advanced features that align with the latest SQL standards.
Don’t just read about the evolution of SQL – be a part of it. Download Coginiti today and step into a world where data analytics is made efficient, insightful, and accessible. Embrace the power of SQL and transform the way you interact with data.
Download Coginiti and elevate your data analytics experience.
We’re thrilled to announce Coginiti 23.12, bringing even more advancements to your data engineering and analysis workflows. This release is packed with AI enhancements, CoginitiScript updates, visual explain plan support for Snowflake, and more. Here’s what’s new:
AI Assistant: Set Custom Instructions for Tailored Responses
In Coginiti 23.12, the AI Assistant gets even smarter. You can now set custom instructions for the AI to follow, ensuring responses align perfectly with your specific needs. For example, you can instruct the AI Assistant to provide two code examples with each response, greatly enhancing the utility and precision of its assistance in your SQL query development and optimization efforts.
New AI Models: Anthropic Claude 2.1 and OpenAI ChatGPT 4 Turbo
Our commitment to providing support for cutting-edge AI models continues with new integration for Anthropic Claude 2.1 and OpenAI ChatGPT 4 Turbo. Claude 2.1 significantly expands the context window to 200,000 tokens (about 150,000 words), allowing for deeper and more comprehensive AI interactions. This means more complex, multi-faceted queries can be handled with ease, bringing unprecedented depth and breadth to your data analysis.
CoginitiScript Enhancements: Time Package and Length Function
CoginitiScript now includes a new Time package and a Length function. These additions provide more flexibility and efficiency in handling time-based data and measuring string lengths within your scripts. Whether it’s scheduling tasks, analyzing time-sensitive data, or manipulating string data, these updates make CoginitiScript an even more powerful tool in your analytics arsenal.
Here’s an example use case from our time package documentation, which shows a CoginitiScript publication setting a dynamic table name based on the current date. This can be a useful approach to versioning your data.
#+import "std/time"
#+src sql Foo()
#+meta {
  :publication {
    :type "table",
    :name "published_table_" + time.Format(time.Now(), time.IsoBasicDate)
  }
}
#+begin
SELECT 1 AS id, 'John' AS first_name, 'Doe' AS last_name
#+end
Visual Explain Plan for Snowflake
Understanding the performance of SQL queries in Snowflake just got easier. Our Visual Explain Plan interface is designed to demystify query execution. You can now visualize and optimize your Snowflake SQL queries. This tool is invaluable for identifying performance bottlenecks and improving the efficiency of your data processing.
Execution API: Arrow Stream Result Set Output
The Execution API now supports Arrow Stream result set output, offering a more efficient way to handle large datasets. This enhancement means faster and more streamlined data processing, enabling you to handle extensive data sets with increased performance and reduced latency.
Object Store File Browser: Enhanced Navigation
Navigating your object store files becomes more effortless with our updated file browser. A new context menu option allows you to quickly copy absolute and relative paths to objects, streamlining the process of referencing and accessing data. This update is particularly useful for those who regularly work with large data sets or need to share data paths with team members.
Dive into the latest version of Coginiti and discover how these new features can revolutionize your data engineering and analytics workflow. Not yet a Coginiti user? Download Coginiti 23.12 today and experience the future of data analysis.
We’re excited to announce the release of Coginiti 23.11 with significant enhancements designed to enrich your data engineering and analysis experience. This update includes new AI integration functionality, the introduction of Query Tags in CoginitiScript, Explain Plan support for Trino, and more. Here’s a rundown of what’s new:
AI Assistant Upgrades for Smoother Interactions
This release features several improvements to the AI Assistant designed to boost your productivity even more, especially when dealing with complex SQL queries or troubleshooting.
- You can now edit your previous prompts, allowing for more dynamic AI Assistant conversations. This capability makes the AI Assistant more flexible to accommodate iterative prompts and context.
- You can now regenerate AI responses, empowering users to quickly explore alternate insights to enrich the query optimization process.
- We’ve enhanced AI response formatting for better readability and clarity. This is particularly helpful when requesting an organized response with many levels of detail, as is the case when using AI to explain a complex query.
Visual Explain Plan: AI-Generated Node Explanations
Understanding the intricacies of SQL execution plans is now more straightforward with our updated Visual Explain Plan. Users can access AI-generated explanations for individual nodes, demystifying complex query performance and optimization strategies. This makes it easier than ever to optimize your SQL queries and improve database performance.
Introducing Query Tags in CoginitiScript
CoginitiScript now supports Query Tags to add descriptive labels to your SQL queries, enhancing your ability to organize, track, and manage SQL queries. Query Tags allow you to annotate your SQL queries with key-value pairs, adding a layer of descriptive data. With Query Tags, you bring more structure and clarity to your data workflows. Such tags are useful for a variety of business functions, such as meeting audit and compliance requirements or supporting cost allocation reporting.
Here’s a practical example showcasing Query Tags within a CoginitiScript block:
#+src sql Main()
#+meta {
  :query_tags {
    :project "Segmentation Analysis",
    :quarter_id 4,
    :is_final true
  }
}
#+begin
[SQL query here]
#+end
CoginitiScript Query Tags can be used across database platforms. Whether you’re querying data in Redshift, Snowflake, or any other database, CoginitiScript automatically adjusts to store query tags metadata using the best available method for each platform. This means that your Query Tags are always integrated in the most efficient way possible, tailored to the unique strengths and features of whichever database you’re using. It’s like having a smart assistant who knows the best way to label and track your work, no matter which tool you’re using.
Refer to our Query Tags documentation for further details.
Explain Plan Support for Trino
With the introduction of Explain Plan support for Trino, users can delve deeper into their query performance, gaining valuable insights to fine-tune their SQL queries for enhanced efficiency and faster data processing in Trino’s distributed SQL query environment.
Drag & Drop Files into Object Store File Explorer
Managing files in an Object Store has never been easier. The latest update adds drag and drop functionality in the Object Store File Explorer, simplifying the process of uploading files and making data management more intuitive.
Coginiti Premium is Now Coginiti Team
We’re evolving our branding to better reflect our commitment to teamwork and collaboration. Coginiti Premium is now Coginiti Team, a change that signifies our focus on providing tools and features that enhance teamwork and facilitate shared data workflows.
The newly branded Coginiti Team emphasizes collaboration and efficiency with features like in-context comment threads on shared code, fostering teamwork through interactive code reviews and discussions. It enhances workflow management with email notifications for scheduled code blocks, keeping teams synchronized on project updates. Additionally, the execution API enables seamless integration with third-party tools for collaborative data exploration, while automated data quality tests ensure consistent accuracy and reliability of data, crucial for collaborative data projects.
—
We’re excited to see how these new capabilities enhance your data engineering and analysis experience. Not yet a Coginiti user? Download the latest version and transform the way you interact with data today!
In the dynamic world of Artificial Intelligence, this weekend marked a significant moment with notable leadership changes at OpenAI. These developments have sparked discussions and speculations about the future trajectory of the company and its widely-used services. As we closely monitor these events at Coginiti, our commitment remains steadfast: ensuring our customers have access to a broad spectrum of AI options.
A few months back, a purported leak from a Google researcher shed light on an essential truth in the AI industry: the inherent capabilities of Large Language Models (LLMs) like Anthropic Claude, OpenAI’s GPT-4, Google’s PaLM, and Meta’s Llama, are not unique competitive advantages. These models, while advanced, exhibit similar performance levels in most business applications. This revelation is crucial for businesses and AI practitioners alike, suggesting that the true value of these technologies lies not solely in their standalone abilities.
The key differentiator, as we’ve come to understand, is in the implementation strategies of these models. How they are integrated into products, the sophistication of their prompting mechanisms, and the seamless incorporation of proprietary data are where the real competitive edge lies. It’s this nuanced application of AI that transforms a generic tool into a powerful business asset.
Looking ahead, it’s evident that the future of AI is not about relying on a single, all-encompassing model. Rather, it’s about embracing model flexibility and prioritizing customer choice. This approach not only caters to the diverse needs of businesses but also fosters an environment of innovation and continuous improvement in AI technologies.
At Coginiti, we remain committed to this vision. As the AI landscape evolves, so will our strategies and offerings, always with the goal of providing our clients with the best tools and insights to succeed in an increasingly AI-driven world.
We’re excited to announce the release of Coginiti 23.10, offering smarter and more efficient ways to handle your data. This release introduces the Execution API for more versatile data workflows, broader Enterprise AI Assistant integration options, and innovations in CoginitiScript and Explain Plan Visualization.
Coginiti Execution API: Your Gateway to Efficient Data Workflows
Our new Execution API provides an efficient way to access data sets defined in your shared analytics catalog. Just connect and retrieve — it’s that simple. This capability not only lets you explore and analyze your data with popular business tools like Power BI, Excel, and Jupyter Notebooks, but also opens the door to automated workflows. You can now use popular orchestration tools such as Dagster or Kestra to trigger CoginitiScript publications or run data quality tests, ensuring both data consistency and reliability.
Explore additional integration opportunities.
New Enterprise AI Assistant Integration Options
The Coginiti AI Assistant empowers data professionals of all skill levels to write, understand, and optimize SQL. With the launch of our new enterprise AI Assistant integration options, available for Coginiti Team and Enterprise users, you no longer have to worry about the external handling of your invaluable internal data.
We’re excited to offer straightforward integration with industry-leading, enterprise-secure AI models from Azure OpenAI and Amazon Bedrock. These platforms are renowned not just for their cutting-edge AI models but also for their steadfast commitment to data security.
Coginiti administrators can easily set up and configure the appropriate enterprise AI integration within Global Preferences, which is then automatically rolled out to all users.
Incremental Publication with CoginitiScript
Our new Incremental Publication feature provides a more efficient approach to materializing your data with CoginitiScript. After setting the :incremental type property to ‘append’, you can employ built-in logic blocks to filter exactly which new rows should be added, making your data publishing more intelligent and less resource intensive.
See our support documentation for more details and sample code using incremental publications with CoginitiScript.
Explain Plan Node Explanation Sidebar
Need some help understanding nodes within the new Explain Plan Visualization? Now when you click on a node in the explain plan visualization, a sidebar will open up offering insights into the node’s purpose. This is perfect when you need a quick refresher (or introduction) into the finer details of what each node represents.
Explain Plan Visualization is currently available for Redshift, with additional platform support coming soon.
Your input drives our innovation, and we’re eager to see how these enhancements will help you drive better data insights. If you haven’t tried Coginiti yet, now’s the time to start your journey!