Skip to content

added: sync standby name, additional named replication slot#410

Open
ghost wants to merge 1 commit into
masterfrom
unknown repository
Open

added: sync standby name, additional named replication slot#410
ghost wants to merge 1 commit into
masterfrom
unknown repository

Conversation

@ghost

@ghost ghost commented Jan 15, 2018

Copy link
Copy Markdown

This pull request relates to the stream #330
The ideas of the 2 new parameters:

  1. Implemented "AdditionalSyncStandbyNames" parameter;
    • configure additional value for the "synchronous_standby_names" parameter
  2. Implemented "AdditionalReplicationSlotName";
    • provide additional replication slot with specified configurable name

The additional synchronous name will be used in 2 nodes synchronous stolon cluster. If the master or standby fails then postgres will start using additional synchronous name for the synchronous replication, and therefore the stolon cluster (now it consists of the one survived single node) will continue operate.
As the value for "AdditionalSyncStandbyNames" parameter we want to use PgBarman's application name and since it requires exact replication slot we have implemented the "AdditionalReplicationSlotName" parameter
From the consistency point of view the schema seems to be ok because in a moment of a failure the PgBarman will be engaged as the synchronous destination and the data will be saved by the additional synchronous standby (e.g. PgBarman).

The pull request is the continue of the discussion: #330

@Vadim0908

Vadim0908 commented Jan 15, 2018

Copy link
Copy Markdown

This patch allows us to use additional replication slots for backup (RPO=0). For example, we can implement two options for working with a Barman.

Classic Stolon Architecture and Barman Backup

stolon barman sr 3 nodes

Barman offers 2 options for backup and archiving. Traditionally, Barman has supported standard WAL file shipping through PostgreSQL’s archive_command (usually via rsync/SSH). With this method, WAL files are archived only when PostgreSQL switches to a new WAL file. To keep it simple, this normally happens every 16MB worth of data changes.
Barman 1.6.0 introduces streaming of WAL files for PostgreSQL servers 9.2 or higher, as an additional method for transactional log archiving, through pg_receivexlog. WAL streaming is able to reduce the risk of data loss, bringing RPO down to near zero values.
Barman 2.0 introduces support for replication slots with PostgreSQL servers 9.4 or above, therefore allowing WAL streaming-only configurations. Moreover, you can now add Barman as a synchronous WAL receiver in your PostgreSQL 9.5 (or higher) cluster, and achieve zero data loss (RPO=0).
Thus, the traditional archiving of WAL logs can lead to data loss in the event of an accident. Therefore, we believe that we need to use both approaches for archiving WA-logs.

Synchronous WAL streaming

This feature is available only from PostgreSQL 9.5 and above. Barman can also reduce the Recovery Point Objective to zero, by collecting the transaction WAL files like a synchronous standby server would.

Stolon Architecture with 2 Nodes and Barman as 3rd Nodes

stolon barman sr 2 nodes

@sgotti sgotti left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saranhada @Vadim0908 thanks for you PR and detailed explanation. Comments inline. There's also the need for some integration tests.

But I don't consider this as a fix for #330:
This PR will just add additional sync standbys but, if the internal good standbys are less then minSynchronousStandbys, than a fakesyncstandby will be added anyway. And I won't change this behavior because if you define minSynchronousStandbys stolon must ensure that there're at least minSynchronousStandbys synchronous stanbys and these stanbys must be controlled by stolon (not external).

The fix for #330 will be the ability to not make stolon add fakesyncstanbys when there less than minSynchronousStandbys good keepers. I haven't tests this and it should be better discussed but probably just defining minSynchronousStandbys to 0 will make this happen without any code change (or if not a PR, different than this, should be submitted to discuss this).

Comment thread cmd/keeper/keeper.go
log.Errorw("failed to create replication slot", "slotName", backupSlotName, zap.Error(err))
return err
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't this drop and recreate the replication slot at every loop?

Comment thread cmd/keeper/keeper.go Outdated
// warning reloading the parameters) with postgresql < 9.6
if len(synchronousStandbys) > 1 {
parameters["synchronous_standby_names"] = fmt.Sprintf("%d (%s)", len(synchronousStandbys), strings.Join(synchronousStandbys, ","))
if syncStandbyNumber > 1 {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably here there's some logic that needs to be detailed in a comment. If I got it right you want to define the additionalsyncstandbynames only if there's at least one defined synchronous standby?

I won't do all of this here. This is something that should be handled by the sentinel. So, inside the sentinel I'll just add the additional synchronous standby in db.Spec.SynchronousStandby.

Comment thread doc/cluster_spec.md Outdated
| synchronousReplication | use synchronous replication between the master and its standbys | no | bool | false |
| minSynchronousStandbys | minimum number of required synchronous standbys when synchronous replication is enabled (only set this to a value > 1 when using PostgreSQL >= 9.6) | no | uint16 | 1 |
| maxSynchronousStandbys | maximum number of required synchronous standbys when synchronous replication is enabled (only set this to a value > 1 when using PostgreSQL >= 9.6) | no | uint16 | 1 |
| additionalSyncStandbyNames| additional names for possible standbys not included into stolon cluster, but used as replication destinations | mo | string | null |

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do this an []string or you'll have to explicitly define that they should be comma separated and parse/validate them.

Comment thread doc/cluster_spec.md Outdated
| maxSynchronousStandbys | maximum number of required synchronous standbys when synchronous replication is enabled (only set this to a value > 1 when using PostgreSQL >= 9.6) | no | uint16 | 1 |
| additionalSyncStandbyNames| additional names for possible standbys not included into stolon cluster, but used as replication destinations | mo | string | null |
| additionalWalSenders | number of additional wal_senders in addition to the ones internally defined by stolon, useful to provide enough wal senders for external standbys (changing this value requires an instance restart) | no | uint16 | 5 |
| additionalReplicationSlotName | name of the additional replication slot which is always created by the master postgres instance | no | string | null |

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot be this a []string as above so users can define multiple additional replication slots?

Comment thread pkg/cluster/cluster.go Outdated
}
if s.AdditionalReplicationSlotName == nil {
s.AdditionalReplicationSlotName = StringP(DefaultReplicationSlotName)
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After making them a []string the values should be validated to check that they're valid sync standby names and replication slot names.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AdditionalReplicationSlotName - is the name of a single replication slot, not an array

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I was asking to make it an array so multiple additional replication slots can be defined like done for additionalSyncStandbyNames.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simon, in our task we need one definite replication slot. But if you insist we can extend the basic idea. Do you?

@sgotti sgotti Jan 19, 2018

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saranhada yes, since we are providing a new option to define an additional replication slot lets make it useful for everyone giving the ability to define N additional replication slots if needed.

Comment thread cmd/sentinel/sentinel.go Outdated
db.Spec.FollowConfig.StandbySettings = clusterSpec.StandbySettings
}
db.Spec.AdditionalWalSenders = *clusterSpec.AdditionalWalSenders
db.Spec.AdditionalSyncStandbyNames = *clusterSpec.AdditionalSyncStandbyNames

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove this and just add the additional synchronous standbys in db.Spec.SynchronousStandby.

Comment thread pkg/cluster/cluster.go Outdated
// addition to the ones internally defined by stolon
AdditionalWalSenders uint16 `json:"additionalWalSenders"`

AdditionalSyncStandbyNames string `json:"additionalSyncStandbyNames"`

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this an []string.

Comment thread pkg/cluster/cluster.go Outdated
// addition to the ones internally defined by stolon
AdditionalWalSenders *uint16 `json:"additionalWalSenders"`

AdditionalSyncStandbyNames *string `json:"additionalSyncStandbyNames"`

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this an []string.

Comment thread pkg/cluster/cluster.go Outdated
AdditionalWalSenders *uint16 `json:"additionalWalSenders"`

AdditionalSyncStandbyNames *string `json:"additionalSyncStandbyNames"`
AdditionalReplicationSlotNames *string `json:"additionalReplicationSlotNames"`

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this an []string.

Comment thread cmd/keeper/keeper.go Outdated
synchronousStandbys = append(synchronousStandbys, syncStandByName)
}
}

@sgotti sgotti Jan 24, 2018

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly can't understand the logic here. As I tried to explain in my previous comment: https://github.com/sorintlab/stolon/pull/410/files#r161692823 just drop this and let the sentinel add the AdditionalSyncStandbyNames to the db.Spec.SynchronousStandbys.

Comment thread cmd/sentinel/sentinel.go Outdated
db.Spec.FollowConfig.StandbySettings = clusterSpec.StandbySettings
}
db.Spec.AdditionalWalSenders = *clusterSpec.AdditionalWalSenders
db.Spec.AdditionalSyncStandbyNames = clusterSpec.AdditionalSyncStandbyNames

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't define db.Spec.AdditionalSyncStandbyNames but just add the additional synchronous standbys to db.Spec.SynchronousStandby.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sgotti We want these additional standbys not to be taken into account here keeper.go:303

		parameters["synchronous_standby_names"] = fmt.Sprintf("%d (%s)", syncStandbyNumber, strings.Join(synchronousStandbys, ","))
	} else {
		parameters["synchronous_standby_names"] = strings.Join(synchronousStandbys, ",")
	}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saranhada I don't understand why. As the name states they are AdditionalSyncStandbyNames so they should be added to the list of "synchronous_standby_names" no?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saranhada If you want them only as a "fallback" standby when one or more keeper standby isn't working then that's another story and it should be handled differently since this will break the current concistency logic that uses a fake standby as this change will skip it.

As I explained previously these are 2 distinct changes. One is to set additional sync standbys (not replacement standbys) that will be counted in the number of requires sync standbys. Another is to disable sync repl when you don't have enough standbys.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sgotti if I add the additional standby names into db.Spec.SynchronousStandby then the provided piece of will be generated like (for instance) = "2 (stolon_fdg4gs, add_name)", and this is what I want to avoid. We want to implement additional synchronous standbys for needs of backups as it was mentioned on the schema (before). It is supposed that PgBarman will serve the required needs. Thus we add PgBarman into 2 nodes stolon cluster with enabled sync replication. In a moment the master fails the standby keeper will be promoted into new Master with sync replication still enabled. The single stolon node will still operate and the PgBarman will be used as sync standby destination.

Later when a new 2nd node of stolon is restored/created the cluster will be switched to use it. But PgBarman will serve the needs of sync backup.

@sgotti sgotti Jan 24, 2018

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered here: #410 (comment)

@sgotti

sgotti commented Jan 24, 2018

Copy link
Copy Markdown
Member

@saranhada Looks like you have two requirements:

a. You don't want that a primary blocks when there are less than MinSynchronousStandbys working standbys AND you don't want to just fallback to async repl but instead you want to send wal records to an external standby that will become Synchronous ONLY when the above condition happen.

Note that this isn't #330 (deactivate synchronous replication for cluster consisting of 2 nodes in case the slave fails).

and

b. You want an external synchronous standby for backups with rpo = 0 but only when there are less than MinSynchronousStandbys working standbys.

But I'm quite sure they can be summed in only one requirement:

  • Add an option to define N Fallback synchronous standbys that will be added only when there are less than MinSynchronousStandbys working standbys.

The main issue with your implementation is that you can't control all of this using the FIRST method of the sync_standby_names parameters because it's acting outside stolon control (switching sync standbys without the sentinel knowing this). Let me explain:

This PR will define for the primary instance something like this: 1 (stolon_fdg4gs, add_name).

This means that if the standby keeper stolon_fdg4gs fails, the master instead of blocking will continue committing transactions since it can send data to pgBarman (or any other external pg instance defined by add_name).

If in a small time window a standby dies then the primary dies and the failed standby comes back before the sentinel notices this, it may happen that the standby is elected as master also if it's not in sync with the dead primary causing the loss of all the changes contained in the missed wal records.

That's the reason why we are forced to have FIRST value equal to the total number sync standbys. Will open a PR to document it better.

That's the reason why I was asking to just add the AdditionalSyncStandbys to db.Spec.SynchronousStandbys. But obviously this will block your primary if just one of the external sync standbys are down and you don't want this.

Proposed implementation

So the starting point is that it must be the sentinel to control all of this. And let's change the option name to some more clear like ExternalFallbackSyncStandbys.

A possible solution is to handle all of this in the sentinel (like we already do adding a fakestandby when there aren't enough keeper to force the primary to block): if the sentinel detects that the good standbys are less than MinSynchronousStandbys then add one or more ExternalFallbackSyncStandbys to reach MinSynchronousStandbys.

So, in a two node cluster, you'll have MinSynchronousStandbys = 1, if the standbys dies the sentinel will notice this, remove it from the db.Spec.SynchronousStandbys and add the first entry in ExternalFallbackSyncStandbys. In this way the primary will unblock. The above problem won't happen because the external standby will be added by the sentinel only if it detects that the internal standbys is dead. If the dead standby comes back it'll be readded to the list of sync stanbys and the external stanby will be removed.

Of course the switch isn't instantaneus as if it's managed by the postgres instance but will permit the cluster to unblock after some seconds/minutes.

@ghost

ghost commented Jan 26, 2018

Copy link
Copy Markdown
Author

@sgotti I'm agree the name ExternalFallbackSyncStandbys sounds good. It really reflects the idea of the backup synchronous standbys.

Also I have reviewed the stolon source code again, of course I'm not an deep expert in it.

You proposed to migrate the logic into sentinel. I think I see a piece of code can be responsible for the creation of the required configuration. It seems to start here sentinel.go:952 =>

if !masterOK {
log.Infow("trying to find a new master to replace failed master")
.....
newMasterDB.Spec.SynchronousStandbys = []string{fakeStandbyName}

I have the following proposal:

  • If the ExternalFallbackSyncStandbys parameter is not empty then just take its values and put into the "newMasterDB.Spec.SynchronousStandbys" list.

How do you think?

@sgotti

sgotti commented Jan 26, 2018

Copy link
Copy Markdown
Member

@saranhada Without testing a start to see if it works or there're some corner cases could be:

  • Before

    // If there're not enough real synchronous standbys add a fake synchronous standby because we have to be strict and make the master block transactions until MaxSynchronousStandbys real standbys are available
    add a check if len(synchronousStandbys) < int(*clusterSpec.MinSynchronousStandbys) { that will add N ExternalFallbackSyncStandbys to to reach MinSynchronousStandbys.

  • Validate in clusterspec.Validate() that the provided ExternalFallbackSyncStandbys are not stolon names (use IsStolonName in

    func IsStolonName(name string) bool {
    ) to not clash with internal stolon names.

  • I'll add some tests to see that all of this works to both sentinel/sentinel_test.go and integration tests in tests/integration

@ghost

ghost commented Jan 27, 2018

Copy link
Copy Markdown
Author

@sgotti
Your suggestion:
add a check if len(synchronousStandbys) < int(*clusterSpec.MinSynchronousStandbys) { that will add N ExternalFallbackSyncStandbys to to reach MinSynchronousStandbys.
will not work.

Please see the code in keeper.go:275:

// Setup synchronous replication
if db.Spec.SynchronousReplication && len(db.Spec.SynchronousStandbys) > 0 {
synchronousStandbys := []string{}
for _, synchronousStandby := range db.Spec.SynchronousStandbys {
synchronousStandbys = append(synchronousStandbys, common.StolonName(synchronousStandby))
}

Every sync name which we add in sentinel will be prefixed by "stolon_" in keeper.

I think we should implement the mentioned by you logic (add required the insufficient number of sync standby names) in keeper and take into consideration the MinSynchronousStandbys & MaxSynchronousStandbys

@sgotti

sgotti commented Jan 29, 2018

Copy link
Copy Markdown
Member

Every sync name which we add in sentinel will be prefixed by "stolon_" in keeper.

So lets remove this from the keeper and make the sentinel to generate a StolonName. The keeper will put the provided synchronous standby names as provided in the db spec without any change.

Also remove all the prefix checks and removal in the keeper:

if !common.IsStolonName(n) {

to report them as is (so also the external stanbys are reported).

Perhaps there's something else to change in the sentinel. Let's see after these changes.

I think we should implement the mentioned by you logic (add required the insufficient number of sync standby names) in keeper and take into consideration the MinSynchronousStandbys & MaxSynchronousStandbys

The keeper just tries to converge to the clusterdata db spec and for this reason must not implement any cluster logic. The clusterdata db spec is calculated by the sentinel so all of this logic pertains to the sentinel.

@sgotti

sgotti commented Jan 31, 2018

Copy link
Copy Markdown
Member

@saranhada I rebased you branch on master and changed some parts to make it work. You can find it here (just one commit): https://github.com/sgotti/stolon/tree/saranhada-ADD_PARAMS_SYNC

  • In the end, instead of keeping both internal and external standbys in db.Spec.SynchronousStanbys, I preferred to add a db.Spec.ExternalSynchronousStandbys since these are two different things. Se below.
  • Added some comments to clarify that db.Spec.SynchronousStanbys is a list of internal db UIDs (not generic standby names) while db.Spec.ExternalSynchronousStandbys is a list of external standby names.
  • For the above reason the fakestandbyname will now be added to db.Spec.ExternalSynchronousStandbys

Please take a look.

You can just take my commit and put it in your branch and force push to update this PR (no need to open a new PR).

@marct83

marct83 commented Jan 31, 2018

Copy link
Copy Markdown

This is completely unrelated to this issue....but does Barman work with Stolon and is configuration any different than a regular Barman Postgres setup?

@sgotti

sgotti commented Jan 31, 2018

Copy link
Copy Markdown
Member

This is completely unrelated to this issue....but does Barman work with Stolon and is configuration any different than a regular Barman Postgres setup?

@marct83 can you please ask this question on the gitter channel so more people can help you and we don't pollute this PR? Thanks.

@marct83

marct83 commented Jan 31, 2018

Copy link
Copy Markdown

No problem. Delete my comments. Thanks

@sgotti

sgotti commented Feb 5, 2018

Copy link
Copy Markdown
Member

@saranhada I saw you imported my commit but also added another commit (2f25a26) that is adding unneeded things (ExternalFallbackSyncStandbys is not used and need).

You said on gitter that you completed manual tests but before this PR can be merged we need some automatic tests. We need some unit tests in cmd/sentinel/sentinel.go and an integration tests in tests/integration/ha_test.go. For this last one a possible implementation idea is to use the Standby cluster feature to set up a stanby cluster as a synchronous standby (it'll be the same as you barman instance but we can reuse the code used in other tests) and do all the thing you probably did manually.

If you have some problems adding them I can do it for you in the next days when I have some time.

@ghost

ghost commented Feb 7, 2018

Copy link
Copy Markdown
Author

@sgotti sorry for the late response.
Can you please confirm now that the created code is OK? ...and I should created auto-tests.
If not then I'd prefer to accomplish with the code/logic and after that to create tests. I'm absolutely agree that I have to create the tests as well.

Comment thread pkg/cluster/cluster.go Outdated
}

if len(s.AdditionalReplicationSlotNames) > 0 {
rp := regexp.MustCompile("[_0-9a-z]+")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread pkg/cluster/cluster.go Outdated
symbolsNameFilter := regexp.MustCompile("[_0-9]")
if len(s.ExternalFallbackSyncStandbys) > 0 {
correctSymbolsNameFilter := regexp.MustCompile("[_0-9A-Za-z]")
for _, standbyName := range s.ExternalFallbackSyncStandbys {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put this in its own function and add some unit tests to it?

@sgotti

sgotti commented Feb 8, 2018

Copy link
Copy Markdown
Member

@saranhada The logic related to the FallbackExternalSyncStandbys looks good but I'm quite sure there'll be bugs that we could catch only if we write tests to cover the changes. As I explained before I don't understand why you added commit 2f25a26

Additionally:

  • The part related to creating additional replication slot is a different feature than the FallbackExternalSyncStanbys and should be split in its own PR. I haven't reviewed it because it's difficutl to see if it does what it's meant to do without some tests. Can you open a new PR for it so I can review it separately from this?

  • Added some comment for the new fields validation.

@ghost

ghost commented Feb 8, 2018

Copy link
Copy Markdown
Author

@sgotti do you suggest to split the PR into the following 2?

  1. Additional replication slots
  2. Fallback sync standbys

Regarding option 2 I understand the integration tests are required. But how about option 1? DO you require it? I think unit tests will be enough.

@sgotti

sgotti commented Feb 8, 2018

Copy link
Copy Markdown
Member

@sgotti do you suggest to split the PR into the following 2?

  1. Additional replication slots
  2. Fallback sync standbys

yes.

Regarding option 2 I understand the integration tests are required. But how about option 1? DO you require it? I think unit tests will be enough.

Now the current updateReplSlots is covered by the current integration tests. In its current shape we cannot cover it with just an unit test since to test it we need a running keeper and postgres instance. To make an unit test you should split it in two part, one that does the get and the drop/create on the postgres instance and a function than given the current replication slots and the desired slots will return a list of slots to drop and a list of slots to create (so an unit test can be created on that function).

But I'd say to just start splitting it in its own PR, then let's see which kind of tests are needed. I could write something myself to let you see how this test could be done.

@ghost ghost closed this Feb 8, 2018
@ghost ghost deleted the ADD_PARAMS_SYNC branch February 8, 2018 20:52
@ghost ghost mentioned this pull request Feb 8, 2018
@ghost

ghost commented Feb 10, 2018

Copy link
Copy Markdown
Author

Reopened

@ghost ghost reopened this Feb 10, 2018
@ghost ghost closed this Feb 10, 2018
@ghost ghost reopened this Feb 10, 2018
@ghost ghost closed this Feb 28, 2018
@ghost ghost reopened this Feb 28, 2018
Comment thread cmd/keeper/cmd/keeper.go Outdated
for _, synchronousStandby := range db.Spec.SynchronousStandbys {
synchronousStandbys = append(synchronousStandbys, common.StolonName(synchronousStandby))
}
numberSyncStandbys := len(synchronousStandbys)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't needed.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not do it then then the following code will result in forming of incorrect "synchronous_standby_names", it will will be like "2 (standby1, standby2)":
for _, synchronousStandby := range db.Spec.ExternalSynchronousStandbys {
synchronousStandbys = append(synchronousStandbys, synchronousStandby)
}
By saving the size of synchronousStandbys we are avoiding the for compilation of the incorrect value

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 (standby1, standby2) is correct. As the comment below in the code says all the provided synchronous standbys must be synchronous. This is what I explained previously here: #410 (comment)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sgotti I'm just wanted to notify you that we are currently considering your point of view. Will return to you in a couple of days....

Comment thread cmd/keeper/cmd/keeper.go Outdated
// master. And choosing the non synchronous one will cause the loss of
// the transactions contained in the wal records not transmitted.
if len(synchronousStandbys) > 1 {
if numberSyncStandbys > 1 {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't needed.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the previous comment

Comment thread cmd/sentinel/cmd/sentinel.go Outdated
}
func (p dbSlice) Swap(i, j int) {
p[i], p[j] = p[j], p[i]
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't reformat this.

log.Debugf("synchronousStandbys not changed", "masterDB", masterDB.UID, "prevSynchronousStandbys", prevSynchronousStandbys, "synchronousStandbys", synchronousStandbys)
}

// If there're not enough real synchronous standbys add a fake synchronous standby because we have to be strict and make the master block transactions until MinSynchronousStandbys real standbys are available

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment shouldn't be removed.

Comment thread doc/cluster_spec.md Outdated
| maxSynchronousStandbys | maximum number of required synchronous standbys when synchronous replication is enabled (only set this to a value > 1 when using PostgreSQL >= 9.6) | no | uint16 | 1 |
| additionalWalSenders | number of additional wal_senders in addition to the ones internally defined by stolon, useful to provide enough wal senders for external standbys (changing this value requires an instance restart) | no | uint16 | 5 |
| additionalMasterReplicationSlots | a list of additional replication slots to be created on the master postgres instance. Replication slots not defined here will be dropped from the master instance (i.e. manually created replication slots will be removed). | no | []string | null |
| externalFallbackSyncStandbys | a list of additional names for possible standbys not included into stolon cluster, but used as replication destinations. | no | []string | null |

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description doesn't look very clear. I'll prefer something more detailed and also document this in https://github.com/sorintlab/stolon/blob/master/doc/syncrepl.md

Comment thread pkg/cluster/cluster.go Outdated
}

func validateExternalSyncStandby(externalSyncStandbyName string) error {
symbolsNameFilter := regexp.MustCompile("[_0-9]")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add an unit test to this function.

Comment thread pkg/cluster/cluster.go
if filteredSymbolsOnlyName := symbolsNameFilter.ReplaceAllString(externalSyncStandbyName, ""); len(filteredSymbolsOnlyName) == 0 {
return fmt.Errorf("external fallback synchronous standby name \"%s\" has an incorrect format, every name should start with a symbol", externalSyncStandbyName)
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the above code look too much complicated. The regexp should already catch trailing spaces and \n.

@ghost

ghost commented Mar 29, 2018

Copy link
Copy Markdown
Author

@sgotti I fixed your last comment. I'm agree with your logic. But there is one question left. Lets consider a stolon cluster:

  1. 2 keeper nodes sync (master - slave)
  2. 1 barman node connected as "externalFallbackSyncStandbys"
  3. minSynchronousStandbys = 1

In this cluster I kill "standby keeper", but I see that master keeper doesn't rebuild synchronous_standby_names attribute. At leaves it as untouched.
I need the "synchronous_standby_names" to be set into "barman".

May I ask you to point a piece code which is responsible for the specified scenario?

@sgotti

sgotti commented Mar 30, 2018

Copy link
Copy Markdown
Member

May I ask you to point a piece code which is responsible for the specified scenario?

Perhaps I lost something but wasn't the purpose of this patch to do this? And this should be done here: https://github.com/sorintlab/stolon/pull/410/files#diff-9961d4546ef09e0152c558e820699523R1324

When the sentinel declares the standby as not good it'll catch this code and add an external synchronous standby.

@ghost

ghost commented Jul 5, 2018

Copy link
Copy Markdown
Author

Hi @sgotti,

can we continue the conversation on this PR?

@sgotti

sgotti commented Jul 5, 2018

Copy link
Copy Markdown
Member

@saranhada yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants