程式扎記: DB

顯示具有 DB 標籤的文章。顯示所有文章

2018年11月28日星期三

[ 常見問題 ] How to compare dates in datetime fields in Postgresql?

Source From Here
Question
How to make below query right while the column update_date is of type as "date/time"?

view plaincopy to clipboardprint?
select * from table where update_date >= '2013-05-03' AND update_date <= '2013-05-03'  

How-To
One example as below:

view plaincopy to clipboardprint?
SELECT *  
FROM table  
WHERE update_date >= '2013-05-03'::date  
AND update_date < ('2013-05-03'::date + '1 day'::interval);  

Supplement
* PostgreSQL Doc - Date/Time Functions and Operators

[ 文章收集 ] PostgreSQL 筆記 — 索引

Source From Here
PostgreSQL 筆記 — 索引
在資料庫當中，為了改進查詢效率，在資料量大的時候，可以透過建立索引讓資料庫加速查詢效率。本篇文章將會探討 PostgreSQL 的 index。雖然在大部分的情況，直接使用 CREATE INDEX 的語法已經能夠應付常見的開發場景，但是在 PostgreSQL 當中，其實還提供了許多不同的 index types 使用。

索引的基本原理
資料庫最重要的就是查詢資料，當資料存儲在硬碟時就要考慮如何處理 I/O 的效能問題。外接硬碟因為其本身硬體的限制，會比直接在記憶體讀取耗費更多的時間。在資料庫當中，如果沒有對資料表建立索引，將會使用 sequential scans 來查詢資料。在資料數量小的時候並沒有太明顯的效能差異，甚至還比建立索引還快，不過當資料量越來越大，sequential scans 很有可能造成讀取過慢的問題。

索引如何解決 sequential scans 過慢的問題
透過額外在硬碟建立一張索引表，我們可以透過索引表當作目錄，資料庫在查詢時，就會用給定的索引表尋找相對應的索引。再透過由索引指向實體資料，連結到硬碟當中的實體位址。

B Tree
B Tree 是一個專門為硬碟優化的平衡樹，跟一般的二元樹不同，B Tree 的樹可以有很多分支，有效減少樹的深度。B tree 的原理需要比較長的篇幅介紹，所以這邊省略介紹。

PostgreSQL 中的索引

建立索引
在 PostgreSQL 中可以透過 SQL 語法來建立索引:

view plaincopy to clipboardprint?
CREATE INDEX index_name ON table_name (column_name) USING method;  

在 PostgreSQL 中，如果沒有特別指定演算法，預設使用 B-Tree 來建立索引。也可以針對某個 column 做索引。也可以一次對多個 column 建立索引:

# CREATE INDEX index_name ON table_name (col1, col2);
"Index Scan using index_subscribers_on_email on subscribers (cost=0.28..8.29 rows=1 width=76)"

用 EXPLAIN 來觀察 query 的效能。發現透過 B-Tree 的方式成功減少了查詢時間。
注意：

* 使用 CREATE INDEX 建立索引時會鎖住整張表，在資料量大時可能需要耗費不少時間。
* 建立索引時我們可以加入 CREATE INDEX CONCURRENTLY 來確保資料表不會被 lock 住。不過為了確保索引表的完整性，CONCURRENTLY 需要花更多的時間來建立索引。
When this option is used, PostgreSQL will build the index without taking any locks that prevent concurrent inserts, updates, or deletes on the table; whereas a standard index build locks out writes (but not reads) on the table until it’s done. There are several caveats to be aware of when using this option — see Building Indexes Concurrently.

source

恰如其分的 index
為了解釋資料庫使用 index 的時機，我們先對某個只有兩筆資料的 table 下 index:

# CREATE INDEX index_users_on_email ON users (email);
# EXPLAIN SELECT email FROM users WHERE email = '[email protected]';
"Seq Scan on users (cost=0.00..1.02 rows=1 width=32)"
" Filter: ((email)::text = '[email protected]'::text)"

對於資料量小的資料，儘管建立了索引，資料庫仍然會使用 sequential scan 查詢。這是因為 Random I/O 的花費會比循序查詢的方式還高，因此如果資料量小時，就算建立 index 也不會增加查詢效能，反而浪費了硬碟空間。

index size
對一個具有 1300 多筆的資料建立 index:

# CREATE INDEX index_subscribers_on_email on subscribers (email);
# SELECT pg_relation_size('index_subscibers_on_email')/1024 || 'K' AS size; //80K

透過 pg_relation_size 拿取相關 relation 的大小。刪除 index 可使用 SQL 如 DROP INDEX IF EXISTS index_name. 為了確保索引的完整性，在建立索引時，PostgreSQL 會將整張表做 exclusive lock，建立完索引後才 release。因此對頻繁查改的資料表來說很有可能造成服務中斷，對於筆數相當多的資料來說，建立索引有可能花上幾分鐘或是幾十分鐘。

維護 index
首先先在資料庫刪除幾筆資料，對於資料庫來說，預設刪除的方式並不是真正從硬碟當中釋放空間，而是標上類似「已刪除」的標記。對於索引來說，刪除資料並不會減少索引大小:

# DELETE FROM subscribers WHERE id >1350;
# SELECT pg_relation_size('index_subscibers_on_email')/1024 || 'K' AS size; // 80K

這時可以透過 REINDEX 的方式重新建立索引。REINDEX INDEX index_name:

# REINDEX INDEX index_subscibers_on_email;
# SELECT pg_relation_size('index_subscibers_on_email')/1024 || 'K' AS size; // 72K

重建索引會把無用的索引刪除，索引表的大小減少了。從上面簡單的測試可以得知，對於增刪頻繁的資料表來說，定期重建索引可以減少索引表的大小，如果需要刪除大量的資料，也盡量重建索引來維護索引表的大小，否則可能造成硬碟空間的浪費。不過 REINDEX 也會造成鎖表，如果不允許服務維修或暫停的話，可以直接用 CREATE INDEX CONCURRENTLY 的方式重新建立一個全新的索引，再把舊的索引刪除，並且重新命名索引。這會比 REINDEX 耗費更久時間，而且也比較麻煩一些，不過可以確保資料表不會被鎖住。

對特定資料做索引
有些時候我們只會頻繁查詢某些特定條件的資料，這時候就不一定要對所有資料下索引。透過 CREATE INDEX index_name ON table_name WHERE ... 的方式，我們可以針對需要的資料做索引。

索引排序
在建立索引時如果沒有特別宣告排序，預設會使用 ASC 來建立索引。不過在某些使用情境下，我們可能更常用 DESC 的方式來排序，例如排名、文章發佈日期等等，這時使用 DESC 建立索引更有效率。

view plaincopy to clipboardprint?
CREATE INDEX index_articles_on_published_date ON articles (published_date DESC);  

重建索引
B-Tree索引只有在全部清空索引 page 才會被重複利用，所以在資料庫刪除某個值並不會減少索引表的大小。原本的資料為 1400 筆，刪除 50 筆，索引大小沒有變化。

# DELETE FROM subscribers WHERE id > 1350;
# SELECT pg_relation_size('index_subscibers_on_email')/1024 || 'K' AS size; // 80K

重新建立索引

# REINDEX INDEX index_subscibers_on_email;
# SELECT pg_relation_size('index_subscibers_on_email')/1024 || 'K' AS size;
72K

Postgre 當中的 index type
上面介紹完了 postgre index 與使用方式，接下來介紹 Postgre 當中常見的 index type。如果在建立 index 時沒有特別指定使用的方法，PostgreSQL 會使用 B Tree 來建立索引。

Btree
B Tree 支援下列幾種查詢操作：

<
<=
=
>=
>

GIN(Generalized Inverted Index)

GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items.

GIN 被用在全文搜尋或是或是像 Array 這種可能會被。如同字面上的意思，倒排索引，就是利用 Value 當作索引值，再去找相對應出現的索引。例如：

view plaincopy to clipboardprint?
this is cat  
this is an apple.  
cat meows.  
// build inverted index  
"this": {(0, 0), (1,0)}  
"is": {(0,1), (1,1)}  
"an": {(1,2)}  
"apple: {(1,3)}  
"cat": {(0,2), (2,0)}  
"meows": {(2,1)}  

透過下面的索引表，如果我們想要搜尋 cat 這個關鍵字，就可以從 key 為 cat 的 value 找到對應的索引。發現 cat 出現在第一個句子的第二個字以及第三個句子的第一個字。

適用場景：

* 全文搜尋
* 陣列

在 PostgreSQL 中使用 GIN：

view plaincopy to clipboardprint?
CREATE INDEX index_on_document on sentences using gin(document);  

GiST(Generalized Search Tree)
GiST 其實更算是一種通用的 interface，我們可以使用 GiST 來定義自己的 index 實現方式（B-Tree, R-Tree 等）。一般來說比較少使用到。但像是空間結構的資料，用 B-Tree 可能較難加速像是包含、相鄰、相交的搜索，因此像是 PostGIS 就是使用 GiST 來建立對空間結構查詢效能較好的索引。

適用場景：

* 自己實現 index 接口
* 空間結構
* 使用 B-Tree 可能無法符合使用場景的資料結構

BRIN(Block Range Indexes)
跟 B-Tree 對每一行做索引不同，BRIN 會對一段 range 做索引，雖然性能方面上仍然是 B-TREE 勝，但是 size 比 B-Tree 小很多。

如果資料常常是以範圍查詢，例如 log 檔，某個範圍的交易訂單、出帳資料等等，這些資料量通常相當驚人，用 B-Tree 建立索引可能會耗費不小的硬碟空間。不過這些資料通常比較少使用精確搜尋，而是用像範圍搜尋的方式來批次處理這些資料，這時候使用 BRIN 可以得到還不錯的效能，同時節省了相當大的空間。

適用場景：

* Log 檔分析
* 交易訂單分析
* 資料量大，常用範圍搜尋

Bloom
Bloom Filter 是為了解決 Hash table 的空間以及查詢時間，算法本身的原理可以相當快速地判斷某個元素是否有出現在集合，但會有 false positive 的情形出現，因此在 PostgreSQL 當中需要做二次檢查。

view plaincopy to clipboardprint?
CREATE INDEX bloom_index_event ON events USING bloom (startTime, endTime, name, gifts)  
WITH (length=80, startTime=2, endTime=2, name=4, gifts=2);  

length 為每個 signature 的長度，預設為 80，最長為 4096。後面的參數為對應的 column name 會被 mapped 到幾個 bits，最少為 2，最多為 4096。

適用場景：

對多個 column 的精確搜尋 SELECT * FROM events WHERE startTime=20171111 and endTime=20171231 and name=christmas and gifts="gift_special"，透過 bloom filter 可以快速濾掉不在集合裡的結果。

總結
一般來說，B-Tree 幾乎能夠處理大部分的使用情境。但 PostgreSQL 有豐富的 index 類型可供使用，讓資料能夠更容易地被查詢、索引，也給開發者更高的彈性。這是我相當喜歡 PostgreSQL 的一大原因之一。這次介紹的 index types，並沒有很深入探討背後的實作原理，但有興趣的讀者可以自行尋找相關的資源閱讀。簡單整理重點如下:

1. 在 PostgreSQL 中有數種下 index 的演算法可供使用，每個演算法都有適合的使用情景。

2. 使用 CREATE INDEX 時會鎖住整張表，在為大量資料建立索引時很可能會造成影響。可以用 CREATE INDEX CONCURRENTLY 解決，但需要花更多時間建立索引。

3. 若資料頻繁地更新、刪除，會導致冗餘的 index。盡可能地固定使用 REINDEX 重新建立索引減少硬碟空間的使用。同時注意 REINDEX 會造成鎖表。可以再次建立 index，再把原本的 index 名稱刪除來避免鎖表

4. CREATE UNIQUE INDEX 可透過 unique 來確保值的唯一性

5. INDEX 可以透過 WHERE 條件來對特定的 column 下索引，針對特定使用情景。

6. INDEX 可以透過排序（預設為 ASC），來符合特定的使用情景（文章發佈日期等等）

7. index 並不是銀彈，可以發現在小資料當中就算下 index 仍然會使用 seq scan，這是因為 Random I/O 的代價遠高於 seq can。因此對於固定且數量不多的資料（ex: 縣市、郵遞區號），建立索引並不一定能增加查詢時間，反而佔據了不必要的硬碟空間。

Supplement
* Tutorialspoint - PostgreSQL - INDEXES

2017年11月30日星期四

[ 文章收集 ] How To Install and Use PostgreSQL on CentOS 7

Source From Here
Introduction
Relational database management systems are a key component of many web sites and applications. They provide a structured way to store, organize, and access information. PostgreSQL, or Postgres, is a relational database management system that provides an implementation of the SQL querying language. It is a popular choice for many small and large projects and has the advantage of being standards-compliant and having many advanced features like reliable transactions and concurrency without read locks.

In this guide, we will demonstrate how to install Postgres on CentOS 7 and go over some basic ways to use it.

Installation
CentOS's default repositories contain Postgres packages, so we can install them without a hassle using the yum package system. Below will install the postgresql-server package and the "contrib" package, that adds some additional utilities and functionality:

# sudo yum install postgresql-server postgresql-contrib

Now that our software is installed, we have to perform a few steps before we can use it. First is to create a new PostgreSQL database cluster:

# sudo postgresql-setup initdb

By default, PostgreSQL does not allow password authentication. We will change that by editing its host-based authentication (HBA) configuration. Open the HBA configuration with your favorite text editor. We will use vi:

# sudo vi /var/lib/pgsql/data/pg_hba.conf

Find the lines that looks like this, near the bottom of the file:
- pg_hba.conf excerpt (original)

view plaincopy to clipboardprint?
...  
host    all             all             127.0.0.1/32            ident  
host    all             all             ::1/128                 ident  

Then replace "ident" with "md5", so they look like this:
- pg_hba.conf excerpt (updated)

view plaincopy to clipboardprint?
...  
host    all             all             127.0.0.1/32            md5  
host    all             all             ::1/128                 md5  

Save and exit. PostgreSQL is now configured to allow password authentication. Now start and enable PostgreSQL:

# sudo systemctl start postgresql
# sudo systemctl enable postgresql

PostgreSQL is now ready to be used. We can go over how it works and how it may be different from similar database management systems you may have used.

Using PostgreSQL Roles and Databases
By default, Postgres uses a concept called "roles" to aid in authentication and authorization. These are, in some ways, similar to regular Unix-style accounts, but Postgres does not distinguish between users and groups and instead prefers the more flexible term "role". Upon installation Postgres is set up to use "ident" authentication, meaning that it associates Postgres roles with a matching Unix/Linux system account. If a Postgres role exists, it can be signed in by logging into the associated Linux system account.

The installation procedure created a user account called postgres that is associated with the default Postgres role. In order to use Postgres, we'll need to log into that account. You can do that by typing:

# sudo -i -u postgres

You will be asked for your normal user password and then will be given a shell prompt for the postgres user. You can get a Postgres prompt immediately by typing:

# psql

You will be auto-logged in and will be able to interact with the database management system right away. However, we're going to explain a little bit about how to use other roles and databases so that you have some flexibility as to which user and database you wish to work with. Exit out of the PostgreSQL prompt by typing:

postgres=# \q

You should now be back in the postgres user command prompt.

Create a New Role
From the postgres Linux account, you have the ability to log into the database system. However, we're also going to demonstrate how to create additional roles. The postgres Linux account, being associated with the Postgres administrative role, has access to some utilities to create users and databases.

We can create a new role by typing:

-bash-4.2$ createuser --interactive
Enter name of role to add: john
Shall the new role be a superuser? (y/n) y

This basically is an interactive shell script that calls the correct Postgres commands to create a user to your specifications. It will only ask you two questions: the name of the role and whether it should be a superuser. You can get more control by passing some additional flags. Check out the options by looking at the man page:

-bash-4.2$ man createuser

Create a New Database
The way that Postgres is set up by default (authenticating roles that are requested by matching system accounts) also comes with the assumption that a matching database will exist for the role to connect to. So if I have a user called test1, that role will attempt to connect to a database called test1 by default.

You can create the appropriate database by simply calling this command as the postgres user:

-bash-4.2$ createdb test1

Connect to Postgres with the New User
Let's assume that you have a Linux system account called test1 (you can create one by typing: sudo adduser test1), and that you have created a Postgres role and database also called test1. You can change to the Linux system account by typing:

# sudo -i -u test1

You can then connect to the test1 database as the test1 Postgres role by typing:

# psql

This will log in automatically assuming that all of the components have been configured.

If you want your user to connect to a different database, you can do so by specifying the database like this (make sure you use \q to be back to the command prompt):

$ psql -d postgres
psql (9.2.23)
Type "help" for help.

postgres=#

You can get information about the Postgres user you're logged in as and the database you're currently connected to by typing:

postgres=# \conninfo
You are connected to database "postgres" as user "test1" via socket in "/var/run/postgresql" at port "5432".

This can help remind you of your current settings if you are connecting to non-default databases or with non-default users.

Create and Delete Tables
Now that you know how to connect to the PostgreSQL database system, we will start to go over how to complete some basic tasks. First, let's create a table to store some data. Let's create a table that describes playground equipment. The basic syntax for this command is something like this:

view plaincopy to clipboardprint?
CREATE TABLE table_name (  
    column_name1 col_type (field_length) column_constraints,  
    column_name2 col_type (field_length),  
    column_name3 col_type (field_length)  
);  

As you can see, we give the table a name, and then define the columns that we want, as well as the column type and the max length of the field data. We can also optionally add table constraints for each column. You can learn more about how to create and manage tables in Postgres here.

For our purposes, we're going to create a simple table like this:

postgres=# CREATE TABLE playground (
postgres(# equip_id serial PRIMARY KEY,
postgres(# type varchar (50) NOT NULL,
postgres(# color varchar (25) NOT NULL,
postgres(# location varchar(25) check (location in ('north', 'south', 'west', 'east', 'northeast', 'southeast', 'southwest', 'northwest')),
postgres(# install_date date
postgres(# );

We have made a playground table that inventories the equipment that we have. This starts with an equipment ID, which is of the serial type. This data type is an auto-incrementing integer. We have given this column the constraint of primary key which means that the values must be unique and not null; For two of our columns, we have not given a field length. This is because some column types don't require a set length because the length is implied by the type.

We then give columns for the equipment type and color, each of which cannot be empty. We then create a location column and create a constraint that requires the value to be one of eight possible values. The last column is a date column that records the date that we installed the equipment. We can see our new table by typing this:

postgres=# \d

view plain copy to clipboard print ?

                  List of relations

Schema |          Name           |   Type   | Owner

--------+-------------------------+----------+-------

public | playground              | table    | john

public | playground_equip_id_seq | sequence | john

(2 rows)

As you can see, we have our playground table, but we also have something called playground_equip_id_seq that is of the type sequence. This is a representation of the "serial" type we gave our equip_id column. This keeps track of the next number in the sequence. If you want to see just the table, you can type:

postgres=# \dt

Add, Query, and Delete Data in a Table
Now that we have a table created, we can insert some data into it. Let's add a slide and a swing. We do this by calling the table we're wanting to add to, naming the columns and then providing data for each column. Our slide and swing could be added like this:

postgres=# INSERT INTO playground (type, color, location, install_date) VALUES ('slide', 'blue', 'south', '2014-04-28');
INSERT 0 1
postgres=# INSERT INTO playground (type, color, location, install_date) VALUES ('swing', 'yellow', 'northwest', '2010-08-16');
INSERT 0 1

You should notice a few things. First, keep in mind that the column names should not be quoted, but the column values that you're entering do need quotes. Another thing to keep in mind is that we do not enter a value for the equip_idcolumn. This is because this is auto-generated whenever a new row in the table is created.

We can then get back the information we've added by typing:

# SELECT * FROM playground;

view plain copy to clipboard print ?

equip_id | type  | color  | location  | install_date

----------+-------+--------+-----------+--------------

        1 | slide | blue   | south     | 2014-04-28

        2 | swing | yellow | northwest | 2010-08-16

(2 rows)

Here, you can see that our equip_id has been filled in successfully and that all of our other data has been organized correctly. If our slide breaks and we remove it from the playground, we can also remove the row from our table by typing:

postgres=# DELETE FROM playground WHERE type = 'slide';
DELETE 1

If we query our table again, we will see our slide is no longer a part of the table:

postgres=# SELECT * FROM playground;

view plain copy to clipboard print ?

equip_id | type  | color  | location  | install_date

---------+-------+--------+-----------+--------------

       2 | swing | yellow | northwest | 2010-08-16

1 row)

How To Add and Delete Columns from a Table
If we want to modify a table after it has been created to add an additional column, we can do that easily. We can add a column to show the last maintenance visit for each piece of equipment by typing:

postgres=# ALTER TABLE playground ADD last_maint date;
ALTER TABLE

If you view your table information again, you will see the new column has been added (but no data has been entered):

postgres=# SELECT * FROM playground;

view plain copy to clipboard print ?

equip_id | type  | color  | location  | install_date | last_maint

---------+-------+--------+-----------+--------------+------------

       2 | swing | yellow | northwest | 2010-08-16   |

1 row)

We can delete a column just as easily. If we find that our work crew uses a separate tool to keep track of maintenance history, we can get rid of the column here by typing:

postgres=# ALTER TABLE playground DROP last_maint;
ALTER TABLE

How To Update Data in a Table
We know how to add records to a table and how to delete them, but we haven't covered how to modify existing entries yet.

You can update the values of an existing entry by querying for the record you want and setting the column to the value you wish to use. We can query for the "swing" record (this will match every swing in our table) and change its color to "red". This could be useful if we gave it a paint job:

postgres=# UPDATE playground SET color = 'red' WHERE type = 'swing';
UPDATE 1

We can verify that the operation was successful by querying our data again:

postgres=# SELECT * FROM playground;

view plain copy to clipboard print ?

equip_id | type  | color | location  | install_date

---------+-------+-------+-----------+--------------

       2 | swing | red   | northwest | 2010-08-16

1 row)

As you can see, our swing is now registered as being red.

Conclusion
You are now set up with PostgreSQL on your CentOS 7 server. However, there is still much more to learn with Postgres. Although many of them were written with Ubuntu in mind, these tutorials should be helpful in learning more about PostgreSQL:
* A comparison of relational database management systems
* Learn how to create and manage tables with Postgres
* Get better at managing roles and permissions
* Craft queries with Postgres with Select
* Install phpPgAdmin to administer databases from a web interface on Ubuntu
* Learn how to secure PostgreSQL on Ubuntu
* Set up master-slave replication with Postgres on Ubuntu
* Learn how to backup a Postgres database on Ubuntu

訂閱：意見 (Atom)

程式扎記

標籤

2018年11月28日星期三

[ 常見問題 ] How to compare dates in datetime fields in Postgresql?

[ 文章收集 ] PostgreSQL 筆記 — 索引

2017年11月30日星期四

[ 文章收集 ] How To Install and Use PostgreSQL on CentOS 7

[Git 常見問題] error: The following untracked working tree files would be overwritten by merge

檢舉濫用情形

學習筆記

標籤

2018年11月28日 星期三

[ 常見問題 ] How to compare dates in datetime fields in Postgresql?

[ 文章收集 ] PostgreSQL 筆記 — 索引

2017年11月30日 星期四

[ 文章收集 ] How To Install and Use PostgreSQL on CentOS 7

[Git 常見問題] error: The following untracked working tree files would be overwritten by merge

檢舉濫用情形

學習筆記

2018年11月28日星期三

2017年11月30日星期四