
El Carro extends the flexibility and choices for Oracle databases on Kubernetes

Thursday, September 2, 2021

When we released El Carro, our goal was to provide the best possible experience for running Oracle databases on Kubernetes with the help of our operator. Today, we want to take a closer look at how that works. The diagram below shows the high-level architecture of a database that is managed by El Carro. At the core is the actual database instance with its background processes, which run in a single container that contains the Oracle installation. So how does this container image get created and what goes into it? The image itself is essentially a snapshot of a filesystem that contains an operating system, packages and other software, and custom scripts. Specifically for El Carro, an image is made up of a base OS, required packages, and an Oracle database installation. The image must be stored on a container registry that is accessible by the Kubernetes cluster, and El Carro expects the Oracle binaries to be installed in certain paths, or symbolic links to be created to those locations.

Architecture Diagram showing the operator controlling the db container.

Initially, El Carro worked with 12c for Enterprise Edition and 18c for Express Edition. And while 12c is still popular with many users, its extended support ended this summer. So the first piece of news is that we have added support for 19c, Oracle's long-term support release. The choice should be easy for any new database deployment, but the options don't end there.

We know that DBAs have different preferences for how and where software gets installed, and we believe that making different options available will ultimately empower users. With the exception of Express Edition, Oracle licenses do not grant the right to redistribute the database software, which prevents the community from providing a public container registry with usable images. Instead, each user has to build their own image based on binaries they download from Oracle themselves, under their own license agreement with Oracle. All of the other containers used by the El Carro operator use open source software and are made available on our public registry, so you do not have to build and host them yourself.

Option 1 - Use El Carro to build your own image with GCP

If you are using GCP, we have an easy way for you to create custom images: you upload Oracle binaries and patches to your own GCS bucket and start a Cloud Build job that creates the container image for you and uploads it to your own private container registry. A single build script and serverless cloud services take care of the whole process, so you don't have to worry about building locally and moving images across the internet. In addition to creating seeded images (see below), this method also allows you to build containers with Oracle patches such as Release Update Revisions (RURs).

Diagram of container image build pipeline where a cloud build job reads installation files from GCS and writes finished images to GCR.
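As a rough sketch, that workflow looks like the following. The bucket, installer file, build config, and substitution variables below are illustrative placeholders, not El Carro's actual interface; consult the El Carro documentation for the real script and parameter names.

# Hypothetical names throughout; adjust to your environment.
gsutil mb gs://my-oracle-binaries
gsutil cp LINUX.X64_193000_db_home.zip gs://my-oracle-binaries/
# Submit the Cloud Build job that assembles the image and pushes it to your registry.
gcloud builds submit --config=image_build.yaml \
  --substitutions=_GCS_PATH=gs://my-oracle-binaries,_DB_VERSION=19.3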

Option 2 - Use El Carro to build your own image locally

You can also use the same Dockerfile and build process from Option 1—but without Google Cloud. Download Oracle installers and patches locally or to a VM used for the builds—then start a script that invokes Docker and builds the image on that machine. Lastly, tag and push the container image to a container registry of your choice. You will have to do a few more steps yourself if you don’t use Cloud Build, but you get the same image and customization options as with Option 1.
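The tail end of that process might look like the following, assuming the build script has already produced a local image; the image and registry names are placeholders:

# Tag the locally built image and push it to a registry of your choice.
docker tag oracle-database:19.3-ee gcr.io/my-project/oracle-database:19.3-ee
docker push gcr.io/my-project/oracle-database:19.3-ee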

Option 3 - Use Oracle build scripts to build your own image

Oracle also maintains an open source repository of scripts to build container images with their database. Maybe you are already using those images with Docker or Kubernetes, or perhaps you prefer Oracle's own build method over ours. We recently added functionality to El Carro to make sure that the resulting images work just as well as the ones that El Carro can build for you.

Option 4 - Use Oracle’s Container Registry directly

There is a way to avoid building your own images: the Oracle Container Registry (OCR) contains pre-built images that can be used with our Kubernetes operator directly and without modification. But since Oracle's registry can only be accessed by customers, it is password protected. After accepting Oracle's license conditions, you can either copy the images to your own registry or configure OCR as a private repository in Kubernetes.
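One common way to set up the private-repository route is to log in to OCR and store the credentials in a Kubernetes image pull secret; the secret name here is a placeholder:

docker login container-registry.oracle.com
kubectl create secret docker-registry oracle-registry-secret \
  --docker-server=container-registry.oracle.com \
  --docker-username=<your-oracle-account> \
  --docker-password=<your-password>

The secret can then be referenced through imagePullSecrets in the pod spec, or attached to the service account that runs the database pods.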

The Power of Seeding

Aside from the installation, it is the creation of the database itself that takes the longest time in the initial provisioning process, and it is often a frustrating wait before you can log in and use your database for the first time. To reduce this wait, the first two options allow you to build a pre-seeded database image that already contains a snapshot of a created and configured database. That moves the initialization step into the container build process and minimizes the startup time of new database instances.

Aside from the wait time, relying on a seeded image (i.e., including an empty database in the image) can provide consistency in configuration options if the same image is used in multiple deployments.

|                          | Option 1 - El Carro on GCP | Option 2 - El Carro local build | Option 3 - Oracle local build | Option 4 - Oracle Container Registry |
|--------------------------|----------------------------|---------------------------------|-------------------------------|--------------------------------------|
| Versions                 | 12c, 18c, 19c              | 12c, 18c, 19c                   | 12c, 18c, 19c                 | 19c                                  |
| Editions                 | XE, EE                     | XE, EE                          | EE                            | EE                                   |
| Patches/Updates          | yes                        | yes                             | no                            | no                                   |
| Seeded Images            | yes                        | yes                             | no                            | no                                   |
| Automatic build pipeline | yes                        | no                              | no                            | n/a                                  |

Conclusion

We believe in an open cloud approach and in empowering users with choice and flexibility. In the context of running Oracle databases on Kubernetes, that means you get to choose your database container images. El Carro provides build scripts that allow you not only to customize containers but also to increase security and robustness by baking patches and updates into the container image. Seeding container images with a database further reduces deployment time by avoiding this step on first startup, which is especially useful in environments that create many databases, such as automated test pipelines.

Other users, however, may feel more comfortable receiving support when they use Oracle's pre-built images from its registry.

The choice is yours. Just know that El Carro is here to help you modernize your Oracle database workloads with Kubernetes. And if you have any other feature requests or choices that matter to you, let us know by filing an issue on GitHub.

By Bjoern Rost, Product Manager and Ash Gbadamassi, Software Engineer – Cloud Databases

MySQL to Cloud Spanner via HarbourBridge

Tuesday, September 22, 2020



Today we’re announcing that HarbourBridge—an open source toolkit that automates much of the manual work of evaluating and assessing Cloud Spanner—supports migrations from MySQL, in addition to existing support for PostgreSQL. This provides a zero-configuration path for MySQL users to try out Cloud Spanner. HarbourBridge bootstraps early stages of migration, and helps get you to the meaty issues as quickly as possible.

Core capabilities

At its core, HarbourBridge provides an automated workflow for loading the contents of an existing MySQL or PostgreSQL database into Spanner. It requires zero configuration—no manifests or data maps to write. Instead, it imports the source database, builds a Spanner schema, creates a new Spanner database populated with data from the source database, and generates a detailed assessment report. HarbourBridge can either import dump files (from mysqldump or pg_dump) or directly connect to the source database. It is intended for loading databases up to a few tens of GB for evaluation purposes, not full-scale migrations.

Bootstrap early-stage migration

HarbourBridge bootstraps early-stage migration to Spanner by using an existing MySQL or PostgreSQL source database to quickly get you running on Spanner. It generates an assessment report with an overall migration-fitness score for Spanner, a table-by-table analysis of type mappings, and a list of features used in the source database that aren't supported by Spanner.

View HarbourBridge as a way to get up and running quickly, so you can focus on critical things like tuning performance and getting the most out of Spanner. You will need to tweak and enhance what HarbourBridge produces—more on that later.

Getting started

HarbourBridge can be used with the Cloud Spanner Emulator, or directly with a Cloud Spanner instance. The Emulator is a local, in-memory emulation of Spanner that implements the same APIs as Cloud Spanner’s production service, and allows you to try out Spanner’s functionality without creating a GCP Project. The HarbourBridge README contains a step-by-step quick-start guide for using the tool with a Cloud Spanner instance.

Together, HarbourBridge and the Cloud Spanner Emulator provide a lightweight, open source toolchain to experiment with Cloud Spanner. Moreover, when you want to proceed to performance testing and tuning, switching to a production Cloud Spanner instance is a simple configuration change.

To get started on using HarbourBridge with the Emulator, follow the Emulator instructions. In particular, start the Emulator using Docker and configure the SPANNER_EMULATOR_HOST environment variable (this tells the Cloud Spanner Client libraries to use the Emulator).
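For example, with Docker installed, starting the Emulator and pointing the client libraries at it can be as simple as the following (the image name and ports come from the Emulator's public documentation; the gRPC endpoint listens on 9010):

docker pull gcr.io/cloud-spanner-emulator/emulator
docker run -d -p 9010:9010 -p 9020:9020 gcr.io/cloud-spanner-emulator/emulator
export SPANNER_EMULATOR_HOST=localhost:9010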

Next, install Go and configure the GOPATH environment variable if they are not already part of your environment. Now you can download and install HarbourBridge using
 
GO111MODULE=on \
go get github.com/cloudspannerecosystem/harbourbridge

It should be installed as $GOPATH/bin/harbourbridge. To use HarbourBridge on a MySQL database, run mysqldump and pipe its output to HarbourBridge:

mysqldump <opts> db | $GOPATH/bin/harbourbridge -driver=mysqldump

where <opts> are the standard options you pass to mysqldump or mysql to specify host, port, etc., and db is the name of the database to dump.
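For example, a concrete MySQL invocation might look like this, with placeholder host, user, and database names:

mysqldump --host=localhost --user=root --password mydb | $GOPATH/bin/harbourbridge -driver=mysqldump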
Similarly, to use HarbourBridge on a PostgreSQL database, run

 pg_dump <opts> db | $GOPATH/bin/harbourbridge -driver=pg_dump

See the Troubleshooting guide if you run into any issues. In addition to creating a new Spanner database with data from the source database, HarbourBridge also generates a schema file, the assessment report, and a bad data file (if any data is dropped). See Files generated by HarbourBridge.

Sample dump files

If you don’t have ready access to a MySQL or PostgreSQL database, the HarbourBridge GitHub repository has some samples. The files cart.mysqldump and cart.pg_dump contain mysqldump and pg_dump output for a very basic shopping cart application (just two tables, one for products and one for user carts). The files singers.mysqldump and singers.pg_dump contain mysqldump and pg_dump output for a version of the Cloud Spanner singers example. To use HarbourBridge on cart.mysqldump, download the file locally and run

$GOPATH/bin/harbourbridge -driver=mysqldump < cart.mysqldump

Next steps

The schema created by HarbourBridge provides a starting point for evaluation of Spanner. While it preserves much of the core structure of your MySQL or PostgreSQL schema, data types are mapped to the types supported by Spanner, and unsupported features are dropped, e.g., functions, sequences, procedures, triggers, and views. See the assessment report as well as HarbourBridge's Schema conversion documentation for details.

To test Spanner’s performance, you will need to switch from the Emulator to a Cloud Spanner instance. The HarbourBridge quick-start guide provides details of how to set up a Cloud Spanner instance. To have HarbourBridge use your Cloud Spanner instance instead of the Emulator, simply unset the SPANNER_EMULATOR_HOST environment variable (see the Emulator documentation for context).
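Concretely, that switch might look like the following; the instance name and regional config are placeholders:

gcloud spanner instances create hb-eval --config=regional-us-central1 \
  --description="HarbourBridge evaluation" --nodes=1
unset SPANNER_EMULATOR_HOST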

To optimize your Spanner performance, carefully review choices of primary keys and indexes—see Keys and indexes. Note that HarbourBridge preserves primary keys from the source database but drops all other indexes. This means that the out-of-the-box performance you get from the schema created by HarbourBridge can be significantly impacted. If this is the case, add appropriate Secondary indexes. In addition, consider using Interleaved tables to optimize table layout and improve the performance of joins.
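As a sketch, an index dropped during conversion could be recreated with a DDL update; the database, table, and column names here are hypothetical:

gcloud spanner databases ddl update mydb --instance=hb-eval \
  --ddl='CREATE INDEX SingersByName ON Singers(Name)'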

Recap

HarbourBridge is an open source toolkit for evaluating and assessing Cloud Spanner using an existing MySQL or PostgreSQL database. It automates many of the manual steps so that you can quickly get to important design, evaluation, and performance issues, such as refining the choice of primary keys, tuning indexes, and other optimizations.

We encourage you to try out HarbourBridge, send feedback, file issues, fork and modify the codebase, and send PRs for fixes and new functionality. We have big plans for HarbourBridge, including the addition of user-guided schema conversion (to customize type mappings and provide a guided exploration of indexing, primary key choices, and use of interleaved tables), as well as support for more databases. HarbourBridge is part of the Cloud Spanner Ecosystem, owned and maintained by the Cloud Spanner user community. It is not officially supported by Google as part of Cloud Spanner.

By Nevin Heintze, Cloud Spanner

JanusGraph connects the past and future of Titan

Thursday, January 12, 2017

We are thrilled to collaborate with a group of individuals and companies, including Expero, GRAKN.AI, Hortonworks and IBM, in launching a new project — JanusGraph — under The Linux Foundation to advance the state-of-the-art in distributed graph computation.

JanusGraph is a fork of the popular open source project Titan, originally released in 2012 by Aurelius, and subsequently acquired by DataStax. Titan has been widely adopted for large-scale distributed graph computation and many users have contributed to its ongoing development, which has slowed down as of late: there have been no Titan releases since the 1.0 release in September 2015, and the repository has seen no updates since June 2016.

This new project will reinvigorate development of the distributed graph system to add new functionality, improve performance and scalability, and maintain a variety of storage backends.

The name "Janus" comes from the name of a Roman god who looks simultaneously into the past to the Titans (divine beings from Greek mythology) as well as into the future.

All are welcome to participate in the JanusGraph project, whether by contributing features or bug fixes, filing feature requests and bug reports, improving the documentation, or helping shape the product roadmap with use cases.

Get involved by taking a look at our website and browsing the code on GitHub.

We look forward to hearing from you!

By Misha Brukman, Google Cloud Platform

Lovefield: a powerful JavaScript SQL-like database query engine for the web

Monday, November 17, 2014

Today we are announcing the release of a powerful library to be added to the arsenal of every web developer's toolbox. Since WebSQL standardization efforts ceased in 2010, there has been no cross-browser relational database solution for web clients. Existing persistence solutions such as IndexedDB and LocalStorage fall under the category of object-oriented storage and therefore lack traditional relational database features.


Lovefield is finally closing that gap by providing a feature-rich database query engine built using IndexedDB as a backend. It provides an intuitive SQL-like declarative syntax that developers can pick up with minimal effort. Its declarative form provides immunity to SQL injection attacks, since there is no query parsing involved. The feature list includes:


  • select, insert, update, delete queries
  • atomicity with intuitive transaction semantics (unlike IndexedDB’s surprising auto-commit behavior)
  • integrity constraint checks (primary key, unique, nullable/not-nullable)
  • aggregators (count, min, max, sum, avg, stddev, distinct)
  • "group by" for select queries
  • multi-table join
  • easier schema upgrade mechanism than IndexedDB
  • cross-browser support (Chrome, Firefox, IE10)


On the performance front, Lovefield includes a query optimizer which evaluates different execution plans and picks the most promising one. We are confident that current performance will satisfy the majority of use cases (fewer than 50k rows), and we plan to further improve performance for larger datasets in the near future.


Lovefield’s vision is captured in this specification document, and we are working to provide more exciting features, such as foreign keys, cascaded delete/update, self-table join, and observers/data-binding, in the near future.


Lovefield is already successfully powering a few Google services, including the Google Play Movies Chrome app. With this open source release we hope to enable the development of data-rich applications and to attract interest and feedback from developers, which will allow us to better understand how to move forward.


By Demetrios Papadopoulos, Chrome team

Cayley: graphs in Go

Wednesday, June 25, 2014


Four years ago this July, Google acquired Metaweb, bringing Freebase and linked open data to Google. It’s been astounding to watch the growth of the Knowledge Graph and how it has improved Google search to delight users every day.

When I moved to New York last year, I saw just how far the concepts of Freebase and its data had spread through Google’s worldwide offices. I began to wonder how the concepts would advance if developers everywhere could work with similar tools. However, there wasn’t a graph available that was fast, free, and easy to get started working with.

With the Freebase data already public and universally accessible, it was time to make it useful, and that meant writing some code as a side project.

So today we are excited to release Cayley, an open source graph database.

Cayley is a spiritual successor to graphd; it shares a similar query strategy for speed. While not an exact replica of its predecessor, it brings its own features to the table:
  • RESTful API
  • multiple (modular) backend stores, such as LevelDB and MongoDB
  • multiple (modular) query languages
  • easy to get started
  • simple to build on top of as a library
  • and, of course, open source

Cayley is written in Go, which was a natural choice: for a backend service that depends on speed and concurrent access, Go seemed like a good fit. Go did not disappoint; with a fantastic standard library and easy access to open source libraries from the community, the necessary building blocks were already there. Combined with Go's effective concurrency patterns, building a successor whose performance is competitive with the C-based graphd became a reality.

To get a sense of Cayley, check out the I/O Bytes video we created where we “Build A Small Knowledge Graph”. The video includes a quick introduction to graph stores as well as an example of processing Freebase and Schema.org linked data.


You can also check out the demo dataset in a live instance running on Google App Engine. It’s running with the sample dataset in the repository — 30,000 movies and their actors, roles, and directors using Freebase film schema. For a more-than-trivial query, try running the following code, both as a query and as a visualization; what you’ll see is the neighborhood of the given actor and how the actors who co-star with that actor interact with each other:

// Morphism from an actor node to the films they starred in,
// via their performance nodes.
costar = g.M().In("/film/performance/actor").In("/film/film/starring")

// Tag the named actor as "source", follow the costar path out to
// their films, then walk it in reverse to tag each co-star as "target".
function getCostars(x) {
  return g.V(x).As("source").In("name")
          .Follow(costar).FollowR(costar)
          .Out("name").As("target")
}

// Emit the co-star pairs for the primary actor, then the pairs
// among the co-stars themselves.
function getActorNeighborhood(primary_actor) {
  actors = getCostars(primary_actor).TagArray()
  seen = {}
  for (a in actors) {
    g.Emit(actors[a])
    seen[actors[a].target] = true
  }
  seen[primary_actor] = false
  actor_list = []
  for (actor in seen) {
    if (seen[actor]) {
      actor_list.push(actor)
    }
  }
  // Keep only co-star pairs where both endpoints are in the
  // neighborhood, emitting each unordered pair once.
  getCostars(actor_list).Intersect(g.V(actor_list)).ForEach(function(d) {
    if (d.source < d.target) {
      g.Emit(d)
    }
  })
}

getActorNeighborhood("Humphrey Bogart")

To get involved, check out the project on GitHub and join the mailing list. But most importantly, have fun building your own graphs!

By Barak Michener, Software Engineer, Knowledge NYC

Welcoming MariaDB 10.0.5

Thursday, November 7, 2013

MariaDB is a community-developed fork of MySQL, a relational database management system for developers looking for a robust, scalable, and reliable SQL server. Its current version is based on MySQL 5.5 and provides powerful multi-source replication for data warehouses, subquery optimizations that improve performance, and more reliable replication through global transaction IDs.

Today, the MariaDB team is releasing MariaDB 10.0.5, which includes parallel slave replication threads, a feature sponsored by Google. Parallel replication can remove bottlenecks in replicated configurations, which is crucial for keeping systems moving quickly as storage speeds increase.

Internally at Google, we’ve already deployed MariaDB 10.0 to our non-production MySQL instances to help report bugs and work with the MariaDB team to test their fixes. This release takes the MariaDB 10.0 branch from alpha to beta status, where the team will shift focus from feature development to stabilization and bug fixes.

Google’s adoption and support of MariaDB doesn’t affect Google Cloud Platform’s Cloud SQL offering for developers.

Congratulations and thank you to everyone who has worked hard to get here!

By Ian Gulliver, Site Reliability Manager

LevelDB: A Fast Persistent Key-Value Store

Wednesday, July 27, 2011

LevelDB is a fast key-value storage engine written at Google that provides an ordered mapping from string keys to string values. We are pleased to announce that we are open sourcing LevelDB under a BSD-style license.

LevelDB is a C++ library that can be used in many contexts. For example, LevelDB may be used by a web browser to store a cache of recently accessed web pages, or by an operating system to store the list of installed packages and package dependencies, or by an application to store user preference settings. We designed LevelDB to also be useful as a building block for higher-level storage systems. Upcoming versions of the Chrome browser include an implementation of the IndexedDB HTML5 API that is built on top of LevelDB. Google's Bigtable manages millions of tablets where the contents of a particular tablet are represented by a precursor to LevelDB. The Riak distributed database has added support for using LevelDB for its per-node storage.

We structured LevelDB to have very few dependencies, so it can be easily ported to new systems; it has already been ported to a variety of Unix-based systems, Mac OS X, Windows, and Android.

LevelDB has good performance across a wide variety of workloads; we have put together a benchmark comparing its performance to SQLite and Kyoto Cabinet. The Riak team has compared LevelDB’s performance to InnoDB. A significant difference from similar systems like SQLite and Kyoto Cabinet is that LevelDB is optimized for batch updates that modify many keys scattered across a large key space. This is an important requirement for efficiently updating an inverted index that does not fit in memory.

LevelDB is available on Google Code; we hope you’ll find it useful for your projects.

By Jeff Dean and Sanjay Ghemawat; Google Fellows

Announcing Google Refine 2.0, a power tool for data wranglers

Wednesday, November 10, 2010


Our acquisition of Metaweb back in July also brought along Freebase Gridworks, an open source software project for cleaning and enhancing entire data sets. Today we’re announcing that the project has been renamed to Google Refine and version 2.0 is now available.

Google Refine is a power tool for working with messy data sets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases. Version 2.0 introduces a new extensions architecture, a reconciliation framework for linking records to other databases (like Freebase), and a ton of new transformation commands and expressions.

Freebase Gridworks 1.0 has already been well received by the data journalism and open government data communities (you can read how the Chicago Tribune, ProPublica and data.gov.uk have used it) and we are very excited by what they and others will be able to do with this new release. To learn more about what you can do with Google Refine 2.0, watch the following screencasts:

  • http://www.youtube.com/watch?v=yNccGtn3Wb0 (7 min)
  • http://www.youtube.com/watch?v=45EnWK-fE9k (9 min)
  • http://www.youtube.com/watch?v=m5ER2qRH1OQ (6 min)

The project is open source and its code and downloads are available here. Changes from version 1.1 to 2.0 are listed here.

Acre, an open source platform for building Freebase apps

Wednesday, August 25, 2010

Freebase is an open, Creative Commons licensed repository of structured data that contains information about 12 million real-world entities including people, places, films, books, events, businesses, and almost any other thing you can imagine. Our graph database has about 400 million facts and connections between entities, and all of it is accessible via our REST API. Freebase was acquired by Google last month, and one thing we knew would happen was that Freebase would become “even more open.”

We first launched Acre, the hosted, server-side JavaScript platform behind Freebase Apps, just over a year ago. Since then it's become more and more important to us and to the Freebase community. Not only are all kinds of individual developers and businesses using Acre to build apps and integrate Freebase data into their own platforms, but we've also recently announced our intention to develop the Freebase.com site on the platform, too.

Until now, Acre development has always been tied to Freebase.com, meaning that you needed to develop your Acre apps on our server, using our app editor. But we know that most software developers prefer to use their own native development environments: their favourite text editor, version control system, and so on. So lately we've been working on ways to make Acre work with source code that's not stored in Freebase.

Last week we announced that we're releasing the Acre platform as open source software. This means that you can run Acre on your own machine, pulling templates and other files from your local disk and using your own development environment. While Acre still has close ties to Freebase (such as API hooks for easily making Freebase queries), this also means that you'll be able to develop standalone, non-Freebase apps using the platform if you want. And, by running Acre on your own platform, you can avoid the resource limitations that are necessary in a shared environment.

If you're interested in server-side JavaScript platforms, you may also be interested in some of the technical details of Acre.
  • Acre is based on Rhino, Mozilla's implementation of JavaScript in Java. (In fact, "Acre" stands for "A Crash of Rhinos Evaluating.") Acre, by default, uses the Jetty servlet engine as its HTTP server, but can be run in any servlet container.
  • Acre includes a module system that supports high-latency source retrieval using extensive caching. Although Acre was originally designed to fetch data only from Freebase itself, it can also fetch data from disk and will support a wider range of require() options such as WebDAV.
  • Acre is capable of running on Google AppEngine, with support for the Keystore and for synchronous and asynchronous HTTP requests. Soon, Freebase's own Acre installation will run on AppEngine.
Please download Acre and try it out, and let us know what you think! You might also like to look at some of our other open source releases, like freebase-python (a Python library for working with the Freebase API) or freebase-suggest (a jQuery plugin that makes it easy to have your users select Freebase topics based on any criteria). For more information about Freebase and our open source efforts, see the Freebase wiki or post to the freebase-discuss mailing list.

By Kirrily Robert, Freebase Team

Google Releases More Patches for MySQL

Monday, September 8, 2008



Did you know that Google uses MySQL as part of its Ads system? As you can imagine, we demand a lot from this Open Source code base and so we have spent a fair amount of time enhancing it to work better in our massively scaled environment. In the past, we have published several patches and today we have a few more to offer. We expect several of these features to be merged into a future official MySQL release, and one of them, semi-synchronous replication, is already available as a MySQL feature preview.

All of the features in the patch are described on our project wiki. The features include:
  • enhancements and bug fixes for features from the previous patch

  • changes to make InnoDB run faster on multi-core servers

  • changes to display mutex contention statistics

  • changes to monitor and rate-limit activity by database account and client IP
We are publishing several patches (see the sketch after this list for applying them):
  • a patch for MySQL 5.0.37 with all of our changes

  • a patch for MySQL 5.1.26 with the changes for mutex contention statistics

  • a patch for MySQL 5.0.67 to make InnoDB run faster on multi-core servers
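As a rough sketch, applying one of these patches to a source tree follows the usual patch workflow; the file names and -p level below are illustrative, so check the patch header for the actual paths:

tar xzf mysql-5.0.37.tar.gz
cd mysql-5.0.37
patch -p1 < ../google-mysql-5.0.37.patch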
We hope these features we've Open Sourced will be useful to other developers. Check out the code and let us know what you think. We'd love to hear from you and answer any questions you might have in our Google MySQL Tools Discussion Group.