| 07/14/11
|
#43686
|
ConcurrentModificationException observed while iterating over System Properties
|
6.5
|
closed
|
ConcurrentModificationException with modification of System Properties during initialization of GemFire Cache.
|
Adding or modifying System Properties during initialization of the GemFire Cache could sometimes result in a ConcurrentModificationException. This issue is now fixed.
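Application code that iterates over system properties while other threads may still be setting them can avoid the same class of failure by iterating over a snapshot rather than the live table. The following is a minimal sketch of that pattern, not the GemFire fix itself; the class name PropertySnapshot and the property key example.key are hypothetical.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class PropertySnapshot {
    // Copy the current system properties into a private map.
    // Properties.stringPropertyNames() returns a set that is not backed by
    // the live Properties object, so later changes do not affect this loop.
    static Map<String, String> snapshot() {
        Properties live = System.getProperties();
        Map<String, String> copy = new TreeMap<>();
        for (String name : live.stringPropertyNames()) {
            String value = live.getProperty(name);
            if (value != null) { // the key may have been removed meanwhile
                copy.put(name, value);
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        System.setProperty("example.key", "before");
        Map<String, String> copy = snapshot();
        // Changing the live properties does not disturb the snapshot.
        System.setProperty("example.key", "after");
        System.out.println(copy.get("example.key"));
    }
}
```

The snapshot costs one copy per call, which is usually acceptable for start-up code that reads properties once.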
|
|
| 07/06/11
|
#43659
|
concurrent ops on PR hang while looking for node managing the bucket
|
6.5
|
closed
|
Hang after rebalancing in rare cases
|
In extremely rare cases, after a rebalance members can get stuck in the PartitionedRegion.putInBucket method.
|
|
| 07/05/11
|
#43652
|
jgroups messages pile up in receivers when senders fail to satisfy retransmission requests
|
|
closed
|
OutOfMemoryException with large number of messages in the NakReceiverWindow of the JGroups NAKACK protocol
|
A problem in JGroups message retransmission may cause members of the distributed cache to accumulate messages and eventually run out of memory.
This affects both multicast and non-multicast configurations of the product.
The problem is caused by faulty size calculations in JGroups during the bundling of messages for retransmission.
|
Decrease the udp-fragment-size setting in gemfire.properties to 40000 bytes.
|
| 06/29/11
|
#43630
|
BucketMovedException should always be thrown before lastResult is called
|
6.5
|
closed
|
BucketMovedException is thrown after all results are collected or after the last result is sent.
|
During function execution on a partitioned region, GemFire checks that the local buckets are available. If the local buckets are not available on the datastore, a BucketMovedException is thrown. Performing this check before function execution starts makes sense, but performing it after function execution is complete and the last result has already been sent does not. The check should instead run before the last result is sent.
|
|
| 06/28/11
|
#43626
|
all threads stuck with ops, online backup and rebalancing (no HA)
|
|
closed
|
Hang while performing online backup and rebalance simultaneously
|
In rare cases, performing an online backup at the same time as a rebalance can result in a hang.
|
Don't invoke the backup command while rebalancing the system.
|
| 06/28/11
|
#43625
|
Improve warning message for "redundancy not satisfied"
|
|
closed
|
Warning message about 'partition regions redundancy is not satisfied' needs improvement
|
If there are not enough datastores available when GemFire tries to create a redundant bucket, a warning message is generated. This warning message needs to be changed so that it is more readable.
|
|
| 06/24/11
|
#43620
|
when DAE happened during online compaction, files (init file, lock file) used in diskstore are not closed
|
|
closed
|
An exception during online compaction leaves some files open
|
If an exception occurs during online compaction, control passes to DSI.handleDiskAccessException() and then to DSI.close(), which is expected to close the oplog, then the file lock, then the disk init file, in that order. Because a CancelException might be thrown in DSI.stopAsyncFlusher, close() did not close everything.
This causes problems on Windows.
|
|
| 06/24/11
|
#43619
|
offline compaction left krf open for empty child oplog
|
|
closed
|
offline compaction left krf open for empty child oplog
|
Offline compaction opens a krf for the child oplog. If the child contains data, the krf is closed. If the child is empty, however, the krf was left open.
|
|
| 06/23/11
|
#43613
|
Chance of missing events when members crash while rebalancing a persistent PR.
|
|
closed
|
Chance of missing events when members crash while rebalancing a persistent PR.
|
In rare cases, if a number of members equal to the redundancy-level crash while rebalancing a persistent PR, it is possible some updates will be missing when the persistent PR is recovered.
|
|
| 06/22/11
|
#43601
|
Client doesn't get Cq close event when other client destroy concerned Region
|
|
closed
|
CQ Destroy Event to clients after Region Destroy
|
This has been fixed in GemFire 6.6. The client now receives the appropriate CQ information to close/unregister the CQ on the client if the CQ was registered on a PartitionedRegion on the cache server.
|
|
| 06/20/11
|
#43597
|
NPE in index maintenance when the indexed expression have same values for a regionEntry
|
|
closed
|
NullPointerException encountered in specific cases during index maintenance
|
NullPointerException encountered in specific cases, during index maintenance while doing an update or destroy operation on the entries of the region.
|
|
| 06/16/11
|
#43583
|
PR initialization hangs waiting for response of StateFlushOperation (which targets a departed member)
|
5.1
|
closed
|
hang in StateFlushOperation.flush waiting for replies while targeting a process that is no longer there
|
It is possible for a member of the peer to peer cache to hang attempting to create a Region. The hang will show the process stuck in StateFlushOperation.flush() waiting for a message from a member that recently left the distributed system. The 15 second warning will show that it is targeting that member and that it is waiting for replies from other members.
|
|
| 06/16/11
|
#43578
|
PDX test generates DiskAccessException while re-initializing in end-task
|
|
closed
|
An IOException causes creation of an empty krf
|
If the disk is out of space, or any other IOException occurs while creating a krf, an empty krf file is created. The fix is to remove these krf files when an IOException occurs.
|
|
| 06/15/11
|
#43575
|
Internal GemFire Error in ProcessorKeeper21 when we increment id and it wraps around.
|
6.5
|
closed
|
InternalGemFireError after large number of distribution messages
|
An InternalGemFireError that may occur when more than 2,147,483,647 messages are sent to peer members is fixed in this release.
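The limit of 2,147,483,647 is Integer.MAX_VALUE, the point at which a Java int counter overflows to a negative value. The sketch below illustrates one way an id counter can wrap safely back to a positive value; it is an assumption-based illustration, not the code used in ProcessorKeeper21, and the class name WrappingIdGenerator is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class WrappingIdGenerator {
    private final AtomicInteger next;

    WrappingIdGenerator(int start) {
        this.next = new AtomicInteger(start);
    }

    // Returns the current id and advances the counter. After
    // Integer.MAX_VALUE the sequence restarts at 1 instead of
    // overflowing to a negative value.
    int nextId() {
        return next.getAndUpdate(id -> id == Integer.MAX_VALUE ? 1 : id + 1);
    }

    public static void main(String[] args) {
        // Plain int arithmetic silently overflows to Integer.MIN_VALUE.
        System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE);
        WrappingIdGenerator ids = new WrappingIdGenerator(Integer.MAX_VALUE);
        System.out.println(ids.nextId()); // the last positive id
        System.out.println(ids.nextId()); // wrapped back to 1
    }
}
```

A wrapping counter like this assumes that ids issued before the wrap are no longer in use by the time they are reissued.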
|
|
| 06/14/11
|
#43570
|
Persistent parent PR can be rebalanced before child PR is recovered from disk
|
|
closed
|
Rebalancing parent persistent colocated PR before recreating child leads to unrecoverable region.
|
If two persistent partitioned regions are colocated, it is possible to recreate only the parent partitioned region and rebalance it. At that point, the child regions persistent data is no longer colocated with the parent data, making the child region unrecoverable.
|
Make sure to recreate all persistent colocated PRs before rebalancing the system.
|
| 06/10/11
|
#43553
|
Fire and forget function execution from client need to wait for reply(an exception) so that connection will be properly released
|
6.5
|
closed
|
A client with only one server connection should receive AllConnectionsInUseException rather than a connection timeout when performing any operation after a fire-and-forget function execution.
|
When a client is configured with only one server connection and an operation is executed on the same connection on which a long-running fire-and-forget function execution is already in progress, the client should get AllConnectionsInUseException instead of a connection timeout.
|
|
| 06/08/11
|
#43544
|
Unexpected PartitionOfflineException when dataStore recycled (with fixedPartitioning)
|
6.5
|
closed
|
PartitionOfflineException after fewer than redundancy-level members crash
|
In rare cases, a member crashing during initial bucket creation can result in a PartitionOfflineException when doing operations on a partitioned region, even though fewer members crashed than the redundancy-level of the partitioned region.
|
|
| 06/03/11
|
#43519
|
Error using Functional Index on attribute of type Short
|
6.5
|
closed
|
Index on a field of type Short.
|
A ClassCastException is thrown when a delete operation is performed on a region that has an index created on a field of type Short.
|
|
| 06/02/11
|
#43513
|
A function execution (with no result expected) for lifetime of the server blocks a connection on a Cacheserver while letting client to use same connection for some other task
|
6.5
|
closed
|
A client with only one server connection experiences a connection timeout when performing an operation after a fire-and-forget function execution
|
When a client is configured with only one server connection and an operation is executed on the same connection on which a long-running fire-and-forget function execution is already in progress, the client gets a connection timeout exception because that connection is busy on the server side with the long-running function execution.
|
Set the minimum number of connections to more than 1.
|
| 06/01/11
|
#43495
|
The gateway socket-read-timeout setting is silently ignored, and 0 is used instead
|
|
closed
|
The gateway socket-read-timeout setting is silently ignored, and 0 is used instead
|
The gateway socket-read-timeout setting should not be used because the system cannot honor it. The time taken to process a batch of messages may vary, and timing out can cause spurious failures.
|
|
| 05/31/11
|
#43489
|
RegionDestroyedException on PR bucket during commit while rebalancing is moving buckets
|
|
closed
|
RegionDestroyedException on commit
|
A possible RegionDestroyedException on transaction commit when rebalancing is in progress has been fixed in this release.
|
|
| 05/26/11
|
#43483
|
TX destroy (in server hosting PR) not distributed to 1 of 3 edge clients
|
6.5
|
closed
|
Transactional invalidates/destroys not delivered to all clients
|
Transactional invalidates/destroys on partitioned regions may not be delivered to all clients with registered interest when rebalancing is in progress. This has been fixed in this release.
|
|
| 05/23/11
|
#43452
|
hang during disconnect while shutting down vms
|
|
closed
|
hang during disconnect while shutting down vms with DistributionManager.waitForThreadsToStop() blocked
|
The distributed system may hang during disconnect. Thread dumps will show the disconnecting thread blocked in DistributionManager.waitForThreadsToStop() and will show a MemberInvoker thread. This was caused by a faulty time-interval calculation and failure to interrupt the MemberInvoker thread.
|
|
| 05/16/11
|
#43414
|
Disk conversion tool does not allow conversion from 5.8 when no WAN queue is present
|
|
closed
|
Conversion from gemfire 5.8 to gemfire 6.5 persistence files fails
|
Using the disk conversion utility to convert gemfire 5.8 disk files to gemfire 6.5 format fails with the error:
"Gemfire 5.8 is not supported by this tool."
|
Get the latest version of the conversion tool. Note that 5.8 persistent WAN queues are still not supported.
|
| 05/11/11
|
#43371
|
Transaction RemoteGetMessage can go to a region that has not finished GII
|
|
closed
|
Incorrect results for transactional operations during PR initialization
|
When transactional operations are performed on a PR which is being initialized/rebalanced, incorrect results may be returned if data is still being fetched from another member. This has been fixed in this release.
|
|
| 05/11/11
|
#43363
|
Bad Message : Current Connection count of 2 is greater than the 800 Max
|
6.5
|
closed
|
Confusing message Connection count 2 is greater than 800
|
This has been fixed in the GemFire 6.5 maintenance branch (r32163). The confusing message has been replaced by a meaningful one.
|
|
| 05/09/11
|
#43343
|
Cache Port Configuration Conflict leads to hang of gemfire shut-down-all
|
6.5
|
closed
|
Customer wants to remove a problem node from the DS when shut-down-all hangs
|
The hang is caused by a misconfigured node, which blocks shut-down-all. The fix is to always call disconnect when an exception occurs.
|
|
| 05/06/11
|
#43324
|
entry available through keySetOnServer() prior to completion of create via putIfAbsent()
|
6.5
|
closed
|
replay of operations after failover returns incorrect result
|
If a server fails during execution of an operation, clients using that server retry the operation on another server. If the operation had already been applied and distributed to other servers before the failure, the retry can return an incorrect result. This affects all operations that return a result based on server state, such as create(), putIfAbsent(), remove(K,V) and replace(K,V,V).
Another side effect of this behavior is that other clients may see the change before it completes on the client that initiated the operation.
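The effect of a replayed operation can be illustrated with a plain ConcurrentMap standing in for the replicated server state. This is only a sketch of the failure mode described above, with no GemFire classes involved; the class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ReplayedPutIfAbsent {
    // Applies putIfAbsent twice with the same arguments, the way a client
    // retry would after the first reply is lost in a server failure.
    // Element 0 is the reply the client never saw; element 1 is the reply
    // it receives from the retry.
    static String[] applyThenRetry(ConcurrentMap<String, String> serverState,
                                   String key, String value) {
        String first = serverState.putIfAbsent(key, value);
        String retry = serverState.putIfAbsent(key, value);
        return new String[] { first, retry };
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> serverState = new ConcurrentHashMap<>();
        String[] replies = applyThenRetry(serverState, "k", "v");
        System.out.println(replies[0]); // null: the create actually succeeded
        System.out.println(replies[1]); // v: the retry looks like a failed create
    }
}
```

The client only ever sees the second reply, so it concludes that another writer created the entry even though its own first attempt did.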
|
|
| 05/03/11
|
#43279
|
order by support with PR queries.
|
|
closed
|
ORDER BY support in OQL query.
|
ORDER BY support has been added to queries executed on partitioned regions. When LIMIT and ORDER BY are used together, the query engine now applies ORDER BY first and then LIMIT. If indexes are present, the ORDER BY clause uses the natural ordering of the index.
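The order in which ORDER BY and LIMIT are applied changes the result set. The sketch below shows the difference using Java streams on an in-memory list; it illustrates the semantics only and is not the query engine's implementation. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class OrderThenLimit {
    // Sort first, then truncate: yields the n smallest values overall.
    static List<Integer> orderThenLimit(List<Integer> rows, int n) {
        return rows.stream().sorted().limit(n).collect(Collectors.toList());
    }

    // Truncate first, then sort: yields only the first n rows, sorted.
    static List<Integer> limitThenOrder(List<Integer> rows, int n) {
        return rows.stream().limit(n).sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(5, 3, 9, 1, 7);
        System.out.println(orderThenLimit(rows, 2)); // [1, 3]
        System.out.println(limitThenOrder(rows, 2)); // [3, 5]
    }
}
```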
|
|
| 05/02/11
|
#43264
|
dead lock found by ConcurrentRegionOperationJUnitTest
|
|
closed
|
deadlock in oplog force rolling
|
When forceRoll is called from two threads, both can reach switchOplog() at the same time.
The fix is to move the closing of the oplog out of the synchronized section.
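The general pattern behind this kind of fix is to swap the shared reference while holding the lock and to close the old resource only after the lock is released. The sketch below shows that pattern with a generic Closeable; it is not the Oplog code, and the class name LogRoller is hypothetical.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

final class LogRoller {
    private final Object lock = new Object();
    private Closeable current;

    LogRoller(Closeable initial) {
        this.current = initial;
    }

    // Swap in the replacement while holding the lock, but close the old
    // resource only after the lock has been released, so that close()
    // can never block other threads that are waiting to roll.
    void roll(Closeable replacement) {
        Closeable old;
        synchronized (lock) {
            old = current;
            current = replacement;
        }
        try {
            old.close(); // potentially slow or lock-taking work, done unlocked
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LogRoller roller = new LogRoller(() -> System.out.println("closed old log"));
        roller.roll(() -> { });
    }
}
```

Keeping the critical section down to the reference swap means no thread holds the lock while waiting on I/O or on another lock taken inside close().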
|
|
| 04/29/11
|
#43255
|
PR recovery issues with overflow
|
6.5
|
closed
|
NPE in PR recovery when there's eviction definition
|
This is the same bug as #42614. Merge the fix into 6.5 maint.
|
|
| 04/28/11
|
#43247
|
durable clients miss destroy/invalidate events on reconnect
|
6.5
|
closed
|
Change in the order of method calls for reconnecting Durable Clients
|
The order of method calls on durable client reconnection needs to be changed. Before version 6.6, the recommendation was to call interest registration after the Cache readyForEvents method.
Program your durable client's reconnection to:
1. Connect, initialize the client cache, regions, any cache listeners, and create and execute any durable continuous queries.
2. Run all interest registration calls.
3. Call ClientCache.readyForEvents so the server will replay stored events. If the ready message is sent earlier, the client may lose events.
ClientCache clientCache = new ClientCacheFactory().create();
// Here, create regions, listeners, and CQs that are not defined in the cache.xml . . .
// Here, run all register interest calls before doing anything else
clientCache.readyForEvents();
Modify your durable client code accordingly. This helps prevent occasionally missed destroy/invalidate events on reconnect.
|
|
| 04/25/11
|
#43223
|
Index usage causes values to be returned rather than keys
|
6.5
|
closed
|
index usage causes values to be returned rather than keys
|
When a query to fetch keys was executed on an overflow region and the query used an index, it returned the associated value instead of the key.
Example: select key.ID from /region.keys key where key.ID = 1 returned the ID from the value instead of from the key.
|
|
| 04/22/11
|
#43212
|
Index Creation on Overflow Regions is too limited
|
6.5
|
closed
|
Index creation with Method Invocation in Index Expression.
|
This has been fixed in GemFire 6.6. An index expression can now contain a method call on an identifier that corresponds to a data object in the region. For example:
<region name="test">
  <region-attributes disk-store-name="sample" data-policy="persistent-replicate" id="sample" enable-gateway="false" statistics-enabled="true">
    <eviction-attributes>
      <lru-entry-count maximum="100" action="overflow-to-disk"/>
    </eviction-attributes>
  </region-attributes>
  <index name="sample">
    <functional from-clause="/test.keys k"
                expression="k.getValue()"/>
  </index>
</region>
|
|
| 04/21/11
|
#43200
|
InternalGemFireException: java.io.NotSerializableException: java.util.HashMap$KeySet thrown during gii exchange of FilterInfo
|
6.5
|
closed
|
InternalGemFireException: java.io.NotSerializableException: java.util.HashMap$KeySet thrown during message dispatcher initialization
|
When a server cache is initializing the message queue and message dispatcher for a client it may throw an InternalGemFireException. This is caused by an attempt to serialize a KeySet view of interest-registration information. These view objects are not serializable.
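The underlying Java behavior can be reproduced without GemFire: the view returned by HashMap.keySet() does not implement Serializable, while a HashSet copy of the same keys does. The sketch below demonstrates this; copying the keys is one common remedy and is shown as an illustration, not as the change made in the product. The class name and the sample map contents are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

final class KeySetSerialization {
    // Returns true if the object can be written with Java serialization.
    static boolean canSerialize(Object candidate) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (IOException e) { // includes NotSerializableException
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> interests = new HashMap<>();
        interests.put("region-a", "ALL_KEYS");
        // The key-set view is an inner class that is not Serializable.
        System.out.println(canSerialize(interests.keySet()));
        // A HashSet copy of the same keys is Serializable.
        System.out.println(canSerialize(new HashSet<>(interests.keySet())));
    }
}
```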
|
|
| 04/21/11
|
#43196
|
gemfire encrypt-password -help produces NPE
|
6.5
|
closed
|
gemfire encrypt-password -help produces NPE
|
Now this script does not produce a NullPointerException no matter what the arguments are. If the arguments are wrong then it displays the proper usage pattern.
|
|
| 04/20/11
|
#43189
|
Nullpointer in query at nomura
|
|
closed
|
NPE while evaluating dependencies in OQL
|
The query engine could throw NPE while evaluating the dependency.
|
|
| 04/19/11
|
#43182
|
compaction (both online and offline) will cause krf gone
|
|
closed
|
compaction (both online and offline) will cause krf gone
|
This was a missing feature. The fix adds creation of the krf when the cache is closed.
|
|
| 04/18/11
|
#43176
|
Events from transactions involving empty regions not sent to clients
|
6.5
|
closed
|
Transactional events not delivered to clients for empty regions
|
Transactions on a server did not transmit all event updates to clients with registered interest if the last region in the transaction was empty. This has been fixed in this release.
|
|
| 04/18/11
|
#43174
|
ConcurrentModificationException during putAll
|
|
closed
|
ConcurrentModificationException updating PR on cache server
|
In rare cases, updating a partitioned region on a member that is also a cache server can result in a ConcurrentModificationException.
|
|
| 04/14/11
|
#43156
|
Transactional Function Execution fails with TransactionDataNotColocated when servers recycled
|
6.5
|
closed
|
Wrong exception is thrown when servers are recycled while doing transactional function execution
|
When servers are recycled during transactional function execution and there is a mismatch between the member from the transaction state and the member calculated using the key, TransactionDataNotColocatedException is thrown. TransactionDataRebalancedException should be thrown instead.
|
|
| 04/14/11
|
#43153
|
Admin API can fail with certain containers
|
6.5
|
closed
|
Admin API or JMX Agent may fail if any GemFire member cannot locate its gemfire.jar
|
GemFire attempts to find its gemfire.jar for response to monitoring attempts by the Admin API and JMX Agent. The following three locations are searched:
1) getProtectionDomain().getCodeSource().getLocation()
2) Searches "java.class.path" for gemfire.jar
3) Searches "sun.boot.class.path" for gemfire.jar
If a JVM or container environment does not return a URL in #1 that can be used to open a stream, then the Admin API or JMX Agent may hang or fail if the gemfire.jar cannot be found on either the "java.class.path" or "sun.boot.class.path".
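The three lookups listed above can be approximated in plain Java as shown below, which may help when diagnosing why a container cannot locate the jar. This is a diagnostic sketch, not GemFire's own lookup code; the class and method names are hypothetical. Note that the "sun.boot.class.path" property is absent on newer JVMs, so the sketch treats a missing property as "not found".

```java
import java.io.File;
import java.net.URL;
import java.security.CodeSource;
import java.util.regex.Pattern;

final class JarLocator {
    // Lookup 1: the location the class was loaded from, or null when the
    // JVM or container does not expose a code source.
    static URL codeSourceLocation(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }

    // Lookups 2 and 3: the first entry of a class-path string whose file
    // name equals jarName, or null when no entry matches.
    static String findOnPath(String classPath, String separator, String jarName) {
        if (classPath == null) {
            return null; // property not set in this JVM
        }
        for (String entry : classPath.split(Pattern.quote(separator))) {
            if (new File(entry).getName().equals(jarName)) {
                return entry;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(codeSourceLocation(JarLocator.class));
        System.out.println(findOnPath(System.getProperty("java.class.path"),
                File.pathSeparator, "gemfire.jar"));
        System.out.println(findOnPath(System.getProperty("sun.boot.class.path"),
                File.pathSeparator, "gemfire.jar"));
    }
}
```

If all three lookups print null in the affected environment, the workaround for this entry applies.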
|
Place the gemfire.jar on "java.class.path" or "sun.boot.class.path".
|
| 04/14/11
|
#43151
|
NullPointerException in DataSerializer.readRegion while disconnecting from DS
|
|
closed
|
NullPointerException from DataSerializer.readRegion
|
It is possible for DataSerializer.readRegion to throw a NullPointerException. This only happens when it is called while the Cache is being closed.
|
|
| 04/12/11
|
#43126
|
FunctionService.onRegion(region).execute("functionId") throws Exception if the Function is not with default attributes on server
|
6.5
|
closed
|
FunctionService.onRegion(region).execute("functionId") throws FunctionException if the Function is not with default attributes on server
|
If a client wants to execute a function on the server by providing a functionId, the function has to be registered with default attributes on the server. The other client APIs that take function attributes as parameters require the user to provide the attributes on the client side each time the function is executed on the server, even if the function is already registered on the server.
|
|
| 04/11/11
|
#43108
|
CacheClosedException (without Caused by: ForcedDisconnectException) caught during network partition
|
6.5
|
closed
|
a process that is forced from membership does not always see the reason in the "cause" of CacheClosedExceptions
|
A number of parts of the product were found to be throwing CacheClosedException without setting the cause of the closure.
|
|
| 04/11/11
|
#43106
|
Security log will not roll on log-file-size-limit
|
|
closed
|
SecurityLogWriter is not rolled by log-file-size-limit
|
log-file-size-limit should trigger rolling of both ManagerLogWriter and SecurityManagerLogWriter. The logging code has been re-architected to fix this.
|
|
| 04/08/11
|
#43101
|
NPE thrown by PartitionManager.prCheck() while creating primary bucket
|
6.5
|
closed
|
NPE thrown by PartitionManager.prCheck() while creating primary bucket
|
GemFire now checks whether the partitioned region has been created, rather than throwing a NullPointerException.
|
|
| 04/07/11
|
#43094
|
NPE in Oplog.recoverCrf
|
|
closed
|
NPE in Oplog.recoverCrf
|
Root cause: thread-1 is recovering a disk store while thread-2 is using the same disk store to create a disk region.
The disk store (dsi) was added to GemFireCache before recovery ran. Recovery should finish first, and only then should the disk store be added to the cache.
|
|
| 04/05/11
|
#43082
|
6.5 edge clients cannot communicate with 6.6 bridgeServers: "IOException: Unknown String header 0" thrown from InternalDataSerializer()
|
6.5
|
closed
|
6.5 edge clients cannot communicate with 6.6 bridgeServers due to change in ServerLocation class
|
Some new attributes were added to ServerLocation which older clients could not read. Because of this clients cannot connect to the locator as well as servers.
|
|
| 04/05/11
|
#43081
|
Region#get() does not discover JTA transaction
|
|
closed
|
No CommitConflicts in read only JTA transactions
|
JTA TransactionManager now detects conflicts when an entry is read in one transaction (but not modified) while being modified in another transaction concurrently.
|
|
| 04/01/11
|
#43064
|
CS_TX: unexpected load in bridgeServer when edge client iterates over Region.values() collection
|
6.5
|
closed
|
values() iterator invoking cache loader
|
Iterating over values in a region no longer causes the loader to be invoked for an invalidated key.
|
|
| 03/31/11
|
#43063
|
persistent txs do not persist entry destroys
|
6.5
|
closed
|
Transactional entry destroys are not correctly persisted and may cause the disk to be corrupted
|
Transactional entry destroys are not correctly persisted and may cause the disk to be corrupted. This will only happen if -Dgemfire.ALLOW_PERSISTENT_TRANSACTIONS=true is used.
The transactional destroys will not be written to disk and, in some cases, leave the disk store in a state in which it always throws an IllegalArgumentException during recovery.
|
|
| 03/31/11
|
#43060
|
Creating buckets in child colocated region throws exception if accessor does not have the child region
|
6.0
|
closed
|
Accessors must have all colocated regions
|
If a colocated region is created on all of the data stores that host the parent partitioned region, but an accessor has only the parent region, bucket creation will fail in the data stores.
|
Create the child PR in the accessors as well.
|
| 03/29/11
|
#43038
|
Client requires dataSerializers even if not required to work with the cache data
|
6.5
|
closed
|
Data serializer and instantiator classes are no longer eagerly loaded by the GemFire client.
|
Prior to this release, when a client connected to a server, the server would send all custom data serializers and instantiators that might be required for object deserialization and the client would load all of the classes. If any classes were missing from the client CLASSPATH, the connection attempt would throw a NoSubscriptionServersAvailableException.
Now, classes are not eagerly loaded. GemFire loads each class as needed, the first time GemFire needs to deserialize the object. If GemFire does not find the class it needs to load, the deserialization attempt throws a ClassNotFoundException.
|
|
| 03/28/11
|
#43027
|
losingSide VM does not process afterRegionDestroyed (FORCED_DISCONNECT) in timely fashion after forceful disconnect
|
6.5
|
closed
|
member is slow to process a forced-disconnect
|
When network-partition-detection is enabled a member will shut itself down if it is unable to periodically contact a locator process. When this type of shutdown occurs it is possible for the shutdown sequence to stall for a period of time. This was caused by a stuck timer task.
|
|
| 03/28/11
|
#43024
|
ConcurrentModificationException thrown while iterating over ServerBucketProfiles in PartitionedRegion.virtualPut
|
6.5
|
closed
|
In pr single hop mode, ConcurrentModificationException thrown while iterating over ServerBucketProfiles
|
In pr single hop mode, ConcurrentModificationException thrown while iterating over ServerBucketProfiles
|
|
| 03/21/11
|
#42979
|
Shared PR meta data can prevent start-up of nodes
|
6.5
|
closed
|
Changing PR attributes and restarting some members without restarting DS results in IllegalStateException
|
Stopping some of the members that host a PR while leaving other members that host the same PR running, then changing the attributes of the PR and restarting the stopped members, can result in an error about incompatible region attributes.
|
Stop all members when changing PR attributes.
|
| 03/18/11
|
#42966
|
Wan events not received when some events are for region that does not exist
|
|
closed
|
The WAN link appears broken or hung if a gateway-enabled region is not defined on both WAN sites
|
If a gateway-enabled region is not defined on all WAN sites, it appears as if the WAN link is broken or hung.
|
To work around this issue in releases before 6.6, each gateway-enabled region must be defined on all WAN sites.
|
| 03/17/11
|
#42947
|
NullPointerException while sending Exception during function execution in P2P case
|
6.5
|
closed
|
In P2P, if an exception is sent from function execution using ResultSender#sendException, a NullPointerException is observed
|
In peer-to-peer function execution, if an exception is sent from the function using ResultSender#sendException, a NullPointerException is observed. Ideally, the exception being sent should be added to the ResultCollector.
|
|
| 03/16/11
|
#42944
|
containsKey on a PR should not throw an exception when the bucket does not exist
|
6.5
|
closed
|
containsKey/containsValueForKey on PR throws exception
|
containsKey/containsValueForKey for a PR no longer throws an exception when the bucket for the key does not exist.
|
|
| 03/15/11
|
#42937
|
Functions sending multiple lastresults cause hang in Execution.execute()
|
|
closed
|
Functions sending multiple lastresults cause hang in Execution.execute()
|
Although the contract is to send lastResult only once, if a Function sends results by calling ResultSender.lastResult() multiple times, it causes a hang in FunctionStreamingResultCollector.
|
|
| 03/14/11
|
#42927
|
JTA: CommitConflictExceptions are wrapped with extraneous text
|
6.5
|
closed
|
Extraneous text in JTA error messages
|
A number of exceptions thrown by the GemFire JTA manager have extraneous text in the form "TransactionManagerImpl::operation::". This text has been removed.
|
|
| 03/10/11
|
#42921
|
pr-single-hop-enabled="true" and server-group do not work together
|
6.5
|
closed
|
Client ignores server-groups when pr-single-hop is enabled.
|
Client using pr-single-hop acquires the PartitionedRegion meta-data and creates the connections directly to the nodes with the buckets to provide single-hop access even if the node is not in the server-group the client is connected to.
|
|
| 03/10/11
|
#42920
|
isOriginRemote flag on cacheWriter event in datastore has incorrect value
|
6.5
|
closed
|
Incorrect value of isOriginRemote flag in PR cache writer
|
When a client is connected to a datastore for a partitioned region, the events on the cacheWriter now have isOriginRemote set to true.
|
|
| 03/09/11
|
#42911
|
OSProcess.bgexec in pure java mode depends on Bash shell
|
6.0
|
closed
|
OSProcess.bgexec in pure java mode depends on Bash shell
|
A system property, -Dgemfire.commandShell, has been added to pass the shell name. OSProcess.exec uses this shell name rather than assuming the bash shell.
|
|
| 03/09/11
|
#42903
|
gemfire encrypt-password produces non-readable output
|
|
closed
|
gemfire encrypt-password utility produces non-readable output
|
The gemfire encrypt-password utility now supports base-64 encoding, so the output is in readable form.
|
|
| 03/08/11
|
#42898
|
Enhancing LIKE predicate to support regEx
|
|
closed
|
Enhancements to OQL LIKE predicate.
|
The LIKE predicate has been enhanced to support the special characters % and _ anywhere in the matching string. Previously, only % at the end of the string was supported.
This is implemented using Java regular expressions.
Index usage with LIKE is disabled.
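A LIKE pattern can be mapped onto a Java regular expression by translating % to ".*" and _ to "." and quoting everything else. The sketch below shows one such translation; it is an illustration under that assumption and not the query engine's actual implementation. Escape sequences for literal % or _ are not handled, and the class name LikePattern is hypothetical.

```java
import java.util.regex.Pattern;

final class LikePattern {
    // Translate a LIKE pattern to a regex: % matches any run of characters,
    // _ matches exactly one character, and all other text is quoted so it
    // is matched literally.
    static Pattern toRegex(String like) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%' || c == '_') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    static boolean matches(String like, String value) {
        return toRegex(like).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("a%c", "abbbc")); // true
        System.out.println(matches("a_c", "abbc"));  // false
        System.out.println(matches("a.c", "abc"));   // false: '.' is literal
    }
}
```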
|
|
| 03/08/11
|
#42897
|
Client Index not updated by initial register interest in GFE6.5.1.4
|
6.5
|
closed
|
Client Index update after register-interest
|
This has been fixed in GemFire 6.6. If an index is created declaratively in a client cache using cache.xml and registerInterest is then called from the client cache, the index is updated accordingly.
|
|
| 03/07/11
|
#42890
|
Java level deadlock reported by PRFunctionStreamingResultCollector related to com.gemstone.gemfire.distributed.internal.membership.InternalDistributedMember during memberDeparture
|
6.5
|
closed
|
FunctionExecution on PR could deadlock in HA scenario
|
When a node goes down, PRFunctionStreamingResultCollector.getResult() could deadlock while handling memberDeparted from two different code paths: one as a membership listener and the other from preWait() while waiting for the result.
|
|
| 03/06/11
|
#42882
|
Wan queue events not received after disk file conversion of wan queue
|
|
closed
|
WAN configuration issue
|
Before running the disk conversion tool, the region lists in the pre-6.5 and 6.5 cache.xml files should match. Do not remove any regions in the new XML files (adding new regions is allowed). Otherwise, some gateway events might be directed to the removed regions and cause an exception.
|
|
| 03/06/11
|
#42881
|
Unknown header byte while converting old version wan queue to 6.6
|
|
closed
|
Disk file conversion failed on GemFire 5.8
|
5.8 is a special version: its GatewayEventImpl does not contain _createTime, while 5.7, 6.0, and 6.5 have it.
If the detected old GemFire version is 5.8 and it has a WAN configuration, the tool exits.
|
|
| 03/04/11
|
#42877
|
Loading of cache-xml-file as a resource fails if OS path.separator is not '/'
|
3.0
|
closed
|
Windows fails to load cache-xml-file as resource if contained in non-default Java package
|
Specifying cache-xml-file will fail on Windows (or any OS with a file.separator different from '/') if the value points to a classpath resource rather than an actual file.
java.lang.ClassLoader#getResource(String) specifies that '/' must be used as the separator regardless of the OS file.separator system property.
Internally, GemFire stores the value of cache-xml-file in an instance of java.io.File, which changes '/' characters to the OS file.separator character. Then, when the cache is created, an exception similar to the following is thrown:
com.gemstone.gemfire.cache.CacheXmlException: Declarative Cache XML file/resource "com\example\cache.xml" does not exist.
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.getCacheXmlURL(GemFireCacheImpl.java:583)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:622)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.init(GemFireCacheImpl.java:533)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:403)
at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:178)
at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:223)
|
Try to use cache-xml-file only to specify an actual OS file rather than a resource.
<p>
Another alternative is for the application to load the desired cache.xml as a resource and then feed that into com.gemstone.gemfire.cache.Cache#loadCacheXml(InputStream). Example:
<p>
Cache cache = new CacheFactory().create();<br>
URL url = getClass().getClassLoader().getResource("com/example/cache.xml");<br>
cache.loadCacheXml(url.openStream());<br>
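<p>
The separator mismatch itself can be illustrated without GemFire. The sketch below is a hypothetical helper (the class and method names are illustrative and not part of the GemFire API) that converts a name that has passed through java.io.File back into the '/'-separated form that ClassLoader#getResource(String) expects. The separator is passed in explicitly so the result is the same on every OS; a caller would normally pass File.separatorChar.

```java
public class ResourceNames {
    /**
     * Converts a path that uses the given separator into a '/'-separated
     * classpath resource name. Hypothetical helper, for illustration only.
     */
    static String toResourceName(String path, char separator) {
        return path.replace(separator, '/');
    }

    public static void main(String[] args) {
        // On Windows, new File("com/example/cache.xml").getPath() yields
        // "com\example\cache.xml"; normalise it before calling getResource.
        String name = toResourceName("com\\example\\cache.xml", '\\');
        System.out.println(name);
    }
}
```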
|
| 03/03/11
|
#42874
|
Uncaught exception in thread Thread[Idle OplogCompactor,5,Oplog Compactor Thread Group]: NPE in BucketRegion.updateCounter()
|
6.5
|
closed
|
NPE in BucketRegion.updateCounter()
|
A race condition occurs when the BucketRegion constructor is still calling super() and LRU eviction has already been triggered to use the BucketRegion object.
|
|
| 03/03/11
|
#42872
|
InternalGemFireError thrown from ProxyBucketRegion.setHosting() when asserting this.realBucket != null
|
6.5
|
closed
|
InternalGemFireError after concurrently destroying and recreating a PR in multiple members
|
Calling Region.destroyRegion in one member and at the same time calling Region.localDestroyRegion in another member, and then recreating the region, can in rare cases result in an InternalGemFireError.
|
|
| 03/02/11
|
#42865
|
jgroups messages pile up in the NAKACK sent-messages collection when security is enabled
|
6.0
|
closed
|
memory leak in JGroups
|
A flaw in one of the JGroups messaging protocols causes it to retain messages that should be garbage-collected. This shows up in the "distribution stats" jgNAKACKSentMessages statistic as a steady increase in the number of messages sent by that protocol that are being retained for retransmission.
|
|
| 03/01/11
|
#42857
|
LicenseException: Could not find a license occurs in container environments
|
1.0
|
closed
|
License validation fails if gemfire.jar cannot be found
|
License validation fails if the gemfire.jar cannot be located. GemFire licensing attempts to find the gemfire.jar for license validation in the following three locations:
1) getProtectionDomain().getCodeSource().getLocation()
2) Searches "java.class.path" for gemfire.jar
3) Searches "sun.boot.class.path" for gemfire.jar
If a JVM or container environment does not return a URL in #1 that can be used to open a stream, then a LicenseException will be thrown if the gemfire.jar cannot be found on either the "java.class.path" or "sun.boot.class.path".
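The lookup order above can be sketched in plain Java. This is only an illustration of the three steps, not GemFire's licensing code; the class JarLocator and the method findOnClassPath are hypothetical names. Note that "sun.boot.class.path" may be unset on some JVMs, so the helper tolerates a null class path.

```java
import java.io.File;
import java.security.CodeSource;
import java.util.regex.Pattern;

public class JarLocator {
    /**
     * Returns the first class path entry whose file name equals jarName,
     * or null if there is none. Hypothetical helper, for illustration only.
     */
    static String findOnClassPath(String classPath, String pathSeparator, String jarName) {
        if (classPath == null) {
            return null;
        }
        for (String entry : classPath.split(Pattern.quote(pathSeparator))) {
            if (new File(entry).getName().equals(jarName)) {
                return entry;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Step 1: ask the JVM where this class was loaded from (may be null).
        CodeSource source = JarLocator.class.getProtectionDomain().getCodeSource();
        System.out.println("code source: " + (source == null ? null : source.getLocation()));

        // Steps 2 and 3: fall back to scanning the class path properties.
        for (String property : new String[] {"java.class.path", "sun.boot.class.path"}) {
            String found = findOnClassPath(
                    System.getProperty(property), File.pathSeparator, "gemfire.jar");
            System.out.println(property + ": " + found);
        }
    }
}
```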
|
Place the gemfire.jar on "java.class.path" or "sun.boot.class.path".
|
| 02/28/11
|
#42850
|
Reading stats from JMX not working as needed in customer environment
|
6.5
|
closed
|
ObjectName for StatisticResource MBean includes Statistic Type for improved readability
|
ObjectName for StatisticResource MBean now includes Statistic Type for improved readability through Tools like JConsole.
|
|
| 02/25/11
|
#42830
|
delta object disappears from region after transactional update
|
|
closed
|
Incorrect value for transactional get with Delta
|
A transaction on an empty member (peer) that uses delta may not return the correct value for transactional get operations. This has been fixed in this release.
|
|
| 02/23/11
|
#42815
|
ClassCastException executing a function on a PR in a loner member
|
|
closed
|
Client experiences ClassCastException while executing a function on a loner member
|
When a function is executed on a loner server (mcast-port set to 0 and locators set to the empty string), the client gets a ClassCastException.
|
Servers are rarely configured as a LonerDistributedSystem. Starting a locator is sufficient to avoid the problem.
|
| 02/22/11
|
#42807
|
Index Creation fails with overflow region error on CACHING_PROXY client
|
|
closed
|
Index creation in client using CACHING_PROXY
|
When an index is created on an overflow region on the client (using CACHING_PROXY), an exception is thrown.
|
|
| 02/22/11
|
#42804
|
CQ query fails using: select * from /exampleRegion where get('key1') = '2'
|
6.5
|
closed
|
Query Execution without Alias.
|
This has been fixed in GemFire 6.6. An alias is no longer required to call a method on region values. For example, the following query now succeeds:
select * from /exampleRegion where get('key1') = '2'
|
|
| 02/10/11
|
#42751
|
Disk Stores errors when using Japanese Locale
|
6.5
|
closed
|
Non-English locales can cause disk store errors
|
Non-English locales can cause disk store recovery to fail.
|
Specify -Duser.language=en on the JVM command line.
|
| 02/09/11
|
#42747
|
UserTransaction#getStatus() returns wrong status
|
6.5
|
closed
|
UserTransaction#getStatus() throws Exception
|
UserTransaction#getStatus() now returns Status.STATUS_NO_TRANSACTION when there is no active transaction.
|
|
| 02/08/11
|
#42738
|
AssertionError: value in RegionEntry should not be INVALID
|
|
closed
|
Query using Async index maintenance throws assertion error with INVALID value.
|
With asynchronous index maintenance, a query could throw an assertion error reporting an INVALID value if the entry being evaluated by the query engine was concurrently modified.
|
|
| 02/07/11
|
#42735
|
edge create not replicated to all servers during server HA; not recovered from remaining servers during subsequent gii at startup
|
6.5
|
closed
|
events not properly distributed after failed concurrent-map operations from client cache
|
When a replace(K,V,V) or putIfAbsent(K,V) operation failed it would sometimes leave a phantom entry in the cache that interfered with subsequent operations on the same key. This would sometimes cause operations to fail to propagate from one server to another.
|
|
| 02/04/11
|
#42725
|
Disk file converter fails with ClassCastException
|
|
closed
|
Disk file converter fails with ClassCastException
|
The converter did not account for XML in the new format, which contains disk-store-name instead of overflow-directory. The XSL was changed to handle this case.
|
|
| 02/04/11
|
#42724
|
disk file converter fails with non-existing disk dir
|
|
closed
|
disk file converter fails with non-existing disk dir
|
Caused by a gateway-hub that defines two gateway queues, a case that was not handled before.
|
|
| 02/02/11
|
#42713
|
The client metadata is incorrect on data recovered from disk
|
6.5
|
closed
|
In pr single hop mode, client metadata can be incorrect on data recovered from disk
|
A client may get different metadata depending on which server it connects to. This issue reproduces only when the server is started after region creation.
|
|
| 02/02/11
|
#42708
|
remove(K,V) returns true even though key (K) does not exist
|
6.5
|
closed
|
remove(K,V) returns true even though key (K) does not exist
|
When a region was being concurrently destroyed a remove(K,V) operation might return true instead of throwing a RegionDestroyedException.
|
|
| 01/27/11
|
#42666
|
client getAll shouldn't add entries to local cache if entry doesn't exist on server
|
5.7
|
closed
|
Client getAll adds entries to the local cache even if they don't exist on the server
|
Client getAll adds entries to the local cache even if they don't exist on the server. If an entry is later created for one of those keys, the client throws EntryExistsException.
|
|
| 01/24/11
|
#42641
|
PR does not support containsValue(Object)
|
6.5
|
closed
|
containsValue(Object) for PartitionedRegion throws UnsupportedOperationException
|
PartitionedRegion now supports containsValue(Object).
|
|
| 01/24/11
|
#42638
|
replace(K,V,V) returns true and results in afterCreate event when oldValue does not match
|
6.5
|
closed
|
replace(K,V,V) returns true and results in afterCreate event when oldValue does not match
|
The ConcurrentMap replace(K,V,V) method sometimes applies the operation in a client cache when it shouldn't. A race condition with concurrent operations in the same VM can cause the operation to change into a create() and be applied both in the client and the server.
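For reference, the java.util.concurrent.ConcurrentMap contract that replace(K,V,V) is expected to honor can be shown with a plain ConcurrentHashMap. This is a minimal sketch that uses ConcurrentHashMap as a stand-in for a GemFire region; the class name and sample keys are illustrative only.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ReplaceContract {
    /** Builds a one-entry map used to demonstrate the contract. */
    static ConcurrentMap<String, String> sample() {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<String, String>();
        map.put("key1", "current");
        return map;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> map = sample();

        // oldValue does not match: nothing changes and false is returned.
        boolean mismatch = map.replace("key1", "stale", "new");
        System.out.println(mismatch + " " + map.get("key1"));

        // Key is absent: replace must not create the entry.
        boolean missing = map.replace("absent", "stale", "new");
        System.out.println(missing + " " + map.containsKey("absent"));

        // oldValue matches: the value is swapped and true is returned.
        boolean match = map.replace("key1", "current", "new");
        System.out.println(match + " " + map.get("key1"));
    }
}
```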
|
|
| 01/21/11
|
#42633
|
online backup produces no backup files after converting disk files and creating regions with older version xml
|
|
closed
|
Persistent regions with disk-write-attributes not backed up.
|
Persistent files created by regions that use the deprecated disk-write-attributes rather than the disk-store attribute are not backed up.
|
|
| 01/17/11
|
#42614
|
Missing buckets in persistence tests
|
|
closed
|
NPE in PR recovery when eviction is defined
|
There are two places that hold the eviction controller's stats object: one comes from the RegionAttributes, the other from the DiskStore.
Creating the PR sets the former but not the latter. Recovering from the DiskStore sets the latter but not the former.
Creating a BR checks the former, while creating the PR itself checks the latter.
In the recovery case only the latter is set, even when the region attributes define eviction, because the stats are assumed to have come from the DSI. BR creation, however, checks only the former.
The fix is to keep using the EvictionController stats held in the PR's RegionAttributes.
|
|
| 01/13/11
|
#42608
|
Issuing multiple "agent start" commands not handled properly 7296
|
6.5
|
closed
|
Unexpected behavior for multiple invocations of the 'agent start' command.
|
When the 'agent start' command was executed multiple times from the same working directory, the JMX Admin Agent launcher could not read the status correctly and started a new process that later failed. This is now fixed, and 'agent status' reports the correct status.
The JMX Admin Agent launcher creates a status file (default name: .agent.ser). The binary format of this file has changed, so delete or move any older status file from the JMX Admin Agent's working directory.
|
|
| 01/12/11
|
#42602
|
'GemFireConfigException: Unable to contact a Locator service' after getting a ForcedDisconnectException in locator logs.
|
6.5
|
closed
|
Over-aggressive suspect processing forces member from the system
|
It is possible for a member to be forcibly removed from the system if it does not accept tcp/ip connections in a reasonable amount of time and is not able to respond to wellness queries quickly enough. This was caused by an error in a timing calculation in tcp/ip connection formation. The faulty calculation takes place after the ack-wait-timeout period elapses (15 seconds) and it causes excessive suspect processing to be initiated on the unresponsive member.
|
|
| 01/07/11
|
#42585
|
NullPointerException found in JGroupMembershipManager.suspectMember
|
6.5
|
closed
|
NullPointerException from JGroupMembershipManager.suspectMember() during initialization
|
When network-partition-detection is enabled it is possible during initialization of the distributed system that a NullPointerException will be thrown, resulting in a SystemConnectException and failure to connect to the distributed system. This was caused by a race condition in the membership view installation system.
|
|
| 01/05/11
|
#42573
|
No buffer exception while executing a function
|
|
closed
|
No buffer exception while executing a function on server in selector mode
|
When a BridgeServer's maxThreads is greater than 0 (that is, selector mode is enabled) and PR single hop is disabled on the client, function execution can throw an IOException saying that no buffer is available.
|
|
| 01/04/11
|
#42567
|
EXCEPTION_ACCESS_VIOLATION for java.net.Inet6AddressImpl.lookupAllHostAddr with hitachi JVM 1.5.0_11-b03-CDK0850
|
6.5
|
closed
|
|
|
|
| 01/04/11
|
#42566
|
Can't bind RMI Registry to start on a specific IP address on a multi NIC host.
|
6.5
|
closed
|
Can't bind RMI Registry to start on a specific IP address on a multi NIC host.
|
On a host with multiple network interfaces, the RMI Registry started by the JMX Agent is bound to all network interfaces, and the user currently cannot bind it to a specific network interface or IP address.
|
Use different rmi-ports for different agents.
|
| 01/03/11
|
#42563
|
security logging will cause problems with rolling logs
|
|
closed
|
security logging will cause problems with rolling logs
|
Re-architected ManagerLogWriter and GemfireStatSampler to use instance attributes instead of static attributes, and moved the archive-related attributes and methods from ManagerLogWriter into GemfireStatSampler.
|
|
| 01/03/11
|
#42562
|
locator rolling logs never cleaned up
|
|
closed
|
Locators configured for log rolling never clean up child logs
|
If a locator is configured to roll its logs then it will never remove any of its child logs.
|
|
| 12/30/10
|
#42555
|
'InternalGemFireException: unexpected exception on member null' caused by 'java.io.IOException: Could not create directory' while creating online backup.
|
6.5
|
closed
|
Could not create directory error during backup.
|
Due to a JDK issue, in rare cases having multiple members back up to a directory that does not yet exist can result in "java.io.IOException: Could not create directory".
|
Create the backup directory before invoking backup, or reattempt the backup after receiving a failure.
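One way to create the directory ahead of time from Java is sketched below. It uses java.nio.file.Files.createDirectories (Java 7 or later), which does not fail when the directory already exists. The class name, the temporary base directory, and the dated subdirectory are illustrative assumptions; a real deployment would use its own backup location.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BackupDirs {
    /**
     * Creates a nested backup directory under a fresh temporary directory
     * and returns it. Calling createDirectories twice shows that the call
     * is safe to repeat when the directory already exists.
     */
    static Path demo() {
        try {
            Path base = Files.createTempDirectory("backup-example");
            Path target = base.resolve("backups").resolve("2010-12-30");
            Files.createDirectories(target); // creates missing parents too
            Files.createDirectories(target); // already exists: no exception
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path target = demo();
        System.out.println(target + " exists: " + Files.isDirectory(target));
    }
}
```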
|
| 12/23/10
|
#42548
|
Timeout on client shutdown while waiting for responses from departed members
|
6.5
|
closed
|
cache hangs when it should have automatically closed
|
If a member becomes unresponsive for a long period of time, perhaps several minutes, it may be forcibly removed from the distributed system but never realize that this has happened and end up hanging waiting for responses to messages from members that are either gone or are ignoring it.
This was a flaw in the membership system that allowed a shunned member to temporarily rejoin the distributed system. When a 6 minute timeout period elapsed the member was again purged from the system but was never told that this was happening and ended up hanging.
|
|
| 12/15/10
|
#42529
|
prSingleHop client meta data not always cleaned up
|
6.5
|
closed
|
prSingleHop client metadata is not cleaned up when a server goes down
|
prSingleHop relies on a byte received in the reply to a cache operation to determine whether a hop has taken place, and then fetches metadata. Although this ensures that a server that goes down is eventually removed, the Ping operation can also be used to detect and remove a server that has gone down.
|
|
| 12/15/10
|
#42527
|
OOME when statistics are not enabled with ConcurrentLinkedQueue
|
6.0
|
closed
|
Disabling statistic sampling can cause a memory leak
|
If statistic-sampling-enabled is set to false or the sampling interval is set to a very large value then GemFire can leak memory when writing statistics. The leaked memory is released every time a sample is taken. The more threads you have writing statistics the worse the leak.
|
Do one of the following to work around this memory leak:
1. Set statistic-sampling-enabled=true and the statistic-sample-rate to a reasonable value in gemfire.properties. The leaked memory is released every time a sample is taken.
2. Set -Dgemfire.STRIPED_STATS_DISABLED=true. This will cause more thread contention when writing statistics but avoids the memory leak in the striped statistics implementation.
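As a sketch of the first workaround, the two properties can also be assembled programmatically before they are handed to the distributed system. Only java.util.Properties is used here; the class name is hypothetical, and the sample rate of 1000 (assumed to be in milliseconds) is an illustrative value, not a recommendation from the issue.

```java
import java.util.Properties;

public class StatSamplingProps {
    /**
     * Builds properties that keep statistic sampling enabled so that the
     * memory held by striped statistics is released on every sample.
     */
    static Properties samplingEnabled(int sampleRate) {
        Properties props = new Properties();
        props.setProperty("statistic-sampling-enabled", "true");
        props.setProperty("statistic-sample-rate", Integer.toString(sampleRate));
        return props;
    }

    public static void main(String[] args) {
        // 1000 is an assumed, illustrative sample rate.
        Properties props = samplingEnabled(1000);
        props.list(System.out);
    }
}
```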
|
| 12/14/10
|
#42523
|
Disk file converter hangs if old version xml given to tool is really a current version xml file
|
|
closed
|
Disk file converter hangs if old version xml given to tool is really a current version xml file
|
The loadsnapshot step now checks the GemFire version and exits if the version in use is 6.5.
|
|
| 12/03/10
|
#42495
|
Disk directory created with online backup does not pass the disk validator
|
|
closed
|
Unable to restore from backup files in rare cases
|
In rare cases, backing up a system using the gemfire backup tool and recovering from the backup can result in the following error: "java.lang.IllegalStateException: The following required files could not be found"
|
Validate backup files using the gemfire validate-disk-store command. If you receive this exception, perform another backup.
|
| 12/02/10
|
#42490
|
commit that is not permitted in function causes txState to be corrupted
|
|
closed
|
corrupted transaction state in transactional function
|
A commit call in a transactional function (when the transaction was started on a remote node) no longer leaves the transaction in a corrupted state.
|
|
| 12/02/10
|
#42484
|
if JTA afterCompletion is called from a different thread the GemFire transaction remains visible to the original thread
|
|
closed
|
JTA afterCompletion call
|
If JTA afterCompletion is called from a different thread, the GemFire transaction remains visible to the original thread.
|
|
| 11/19/10
|
#42470
|
multiple instances of FunctionServiceStats on client with multiple threads and on servers
|
|
closed
|
Multiple instances of FunctionServiceStats and FunctionStats for each function have been observed in client stats.
|
There should be only one instance of FunctionServiceStats per cache and only one instance of FunctionStats per function. Due to a race condition in the product, multiple instances of FunctionServiceStats and FunctionStats were observed in client stats.
|
|
| 11/16/10
|
#42463
|
prSingleHop from client fails if hashCode is negative
|
|
closed
|
Client partition region single hop may need to do multiple hops
|
Client partition region single hop may need to do multiple hops even though it should only need to do a single hop. This happens if the hashCode method on your keys returns a value less than zero.
|
Implement a hashCode that always returns a non-negative value.
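A minimal sketch of such a key class follows. OrderKey and its fields are hypothetical. The sign bit is cleared with a mask rather than Math.abs, because Math.abs(Integer.MIN_VALUE) is still negative.

```java
public class OrderKey {
    private final String customerId;
    private final int orderNumber;

    public OrderKey(String customerId, int orderNumber) {
        this.customerId = customerId;
        this.orderNumber = orderNumber;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof OrderKey)) {
            return false;
        }
        OrderKey that = (OrderKey) other;
        return orderNumber == that.orderNumber && customerId.equals(that.customerId);
    }

    @Override
    public int hashCode() {
        int raw = 31 * customerId.hashCode() + orderNumber;
        // Clear the sign bit so the result is never less than zero.
        return raw & 0x7fffffff;
    }

    public static void main(String[] args) {
        // The raw hash of this key is -5; the masked hash is non-negative.
        System.out.println(new OrderKey("", -5).hashCode() >= 0);
    }
}
```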
|
| 11/08/10
|
#42451
|
operations on persistent regions may fail with spurious out of disk exceptions
|
6.0
|
closed
|
Unexpected DiskAccessException when disk space runs low
|
You may see a DiskAccessException telling you that all disk dirs are full. This should only happen if rolling is not enabled. When rolling is enabled you should only get warnings about the disk dirs being full. But in some cases it will happen even though rolling is enabled.
|
Set the dir-size to be a value much larger than 10G.
|
| 11/05/10
|
#42449
|
GemFire considers link local addresses when checking to see if members are on the same host
|
|
closed
|
Presence of link local address impairs redundancy satisfaction
|
When using the gemfire.EnforceUniqueHostAllocation flag, GemFire will not place redundant copies of data on the same host. GemFire uses the IP addresses of each host to determine which host a member is running on.
Certain backup tools create the same link local address on every machine. The presence of this address caused GemFire to consider every member to be on the same host.
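The kind of check that distinguishes such addresses can be sketched with the standard java.net.InetAddress API. This is an illustration only, not GemFire's host-comparison code; the class name and sample addresses are assumptions.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LinkLocalFilter {
    /**
     * Returns true if the given IP literal is a link local address
     * (169.254.0.0/16 for IPv4, fe80::/10 for IPv6). Such addresses can be
     * identical on every machine, so they say nothing about host identity.
     */
    static boolean isLinkLocal(String ipLiteral) {
        try {
            // An IP literal is parsed directly; no name lookup is performed.
            return InetAddress.getByName(ipLiteral).isLinkLocalAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        for (String address : new String[] {"169.254.10.20", "192.168.1.5", "fe80::1"}) {
            System.out.println(address + " link local: " + isLinkLocal(address));
        }
    }
}
```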
|
|
| 11/05/10
|
#42448
|
putIfAbsent followed by invalidate on PR results in invalid entry which can't be GIId
|
|
closed
|
Create with null value on persistent PR results in lost invalid entry
|
Doing a create with a null value (to create an entry in the invalid state) on a persistent partitioned region while a member is offline can result in a situation where the offline member will not mark the entry as invalid when it recovers.
|
|
| 10/25/10
|
#42435
|
Inconsistent state in colocated PR when a non persistent PR is colocated with a persistent PR
|
|
closed
|
If a user colocates a non persistent PR with a persistent PR, there are certain cases where we can end up creating the colocated buckets in only some members
|
If a user colocates a non persistent PR with a persistent PR, there are certain cases where we can end up creating the colocated buckets in only some members. This can lead to hangs if the missing colocated bucket should be primary.
The issue is caused because when we recover the parent bucket, we also try to create the colocated buckets. However, the colocated bucket creation will fail (as designed) if the colocated region is not created on all the nodes that host the parent region. The issue is that the very last member to create the colocated region will actually succeed in creating colocated buckets, but other members that host the parent bucket didn't create those buckets.
|
|
| 10/25/10
|
#42434
|
Parent PR with a colocated child PR may fail to restore redundancy
|
|
closed
|
With colocation, bucket redundancy is not satisfied for colocated regions, and later operations on missing buckets can hang.
|
If the parent region is created and populated (so that its buckets are created), and the child region is then created but not populated (so that no buckets are created), the child region will not have buckets corresponding to the parent region's buckets even though colocation is complete.
|
|
| 10/25/10
|
#42433
|
getAll does not update last accessed time
|
5.7
|
closed
|
A getAll done from a client will not update the last access time of the entries it reads on the server
|
A getAll done from a client will not update the last access time of the entries it reads on the server.
|
|
| 10/24/10
|
#42429
|
Hang in waitForPrimaryMember when EndBucket message overlaps with a node going down
|
6.5
|
closed
|
|
|
|
| 10/23/10
|
#42427
|
Unable to connect to DS on laptop not on any network
|
|
closed
|
unable to run GemFire when not connected to a network
|
GemFire fails to connect on startup if the machine's NIC is not connected to a network. The cause of the exception will be similar to this:
com.gemstone.gemfire.IncompatibleSystemException: Peer localhost:4195/4192 has no network interfaces
This is caused by the operating system turning off external addresses when there is no cable connected to the NIC.
|
|
| 10/22/10
|
#42424
|
In 4 node serial wan topologies, events can bounce between sites
|
|
closed
|
Gateway receivers are now added to the GatewayEvent CallbackArgument as the event is replicated across the wan
|
Gateway receivers are now added to the GatewayEvent CallbackArgument as the event is replicated across the WAN. The callback argument keeps track of which receivers have applied the event, so the event is not sent back to those receivers.
|
|
| 10/21/10
|
#42419
|
backed up disk stores map contains null key instead of member; cannot restore backup files
|
|
closed
|
Backup from admin API may miss the local member's files
|
Invoking AdminDistributedSystem.backupAllMembers from within a member that has a disk store can result in not backing up the member, and listing the member as null in the BackupStatus.
|
Don't invoke backup from within a member that has a disk store.
|
| 10/21/10
|
#42418
|
Online backup run from command line tools sometimes reports disk dirs both offline and backed up
|
|
closed
|
Online backup run from command line tools sometimes reports disk dirs both offline and backed up
|
When getMissingPersistentMembers is called, some members may still be creating or recovering regions, yet by the time FinishBackupRequest.send runs, the regions on those members are ready. Logically, the members in successfulMembers should override the earlier missingMembers.
|
|
| 10/17/10
|
#42410
|
Expiration of remote entries is not correctly working
|
6.5
|
closed
|
Remote gets do not update last access time in some cases allowing expiration to still occur
|
In some cases a remote get of a region entry will not update the entry's last access time. This can cause the entry to expire even though it has been recently read.
|
|
| 10/14/10
|
#42408
|
Memory leak in PartitionedRegion entry expiry
|
|
closed
|
EntryExpiryTask gets added to PartitionedRegion entryExpiryTasks causing Memory leak
|
For a PartitionedRegion, expiration of entries is managed at the BucketRegion level. The entryExpiryMap is per bucket, and entryExpiryTasks are added to it. entryExpiryTasks should not be added to the entryExpiryMap of the PartitionedRegion, since the entry exists in the BucketRegion and not in the PartitionedRegion. Doing so causes an EntryNotFoundException when the entryExpiryTask is scheduled for expiry.
|
|
| 10/06/10
|
#42394
|
PR distribution advisor issue: wrong stub used for communications after new member uses same direct-channel port as old (crashed) member
|
6.5
|
closed
|
hang trying to communicate with a departed member
|
If the DistributedMember identity of a new member happens to be the same as that of an old member it is possible that partitioned region operations will hang trying to communicate with the old member. This was caused by a flaw that allowed the old communication information to be retained.
|
|
| 10/01/10
|
#42382
|
Feature gap: overflow regions do not allow indexes
|
|
closed
|
Index support on overflow region.
|
Applications can create indexes on overflow regions, provided that the index expressions satisfy the compact-range index requirements.
|
|
| 09/27/10
|
#42369
|
hang in messaging layer if toData throws an exception
|
6.0
|
closed
|
Partitioned region operations may hang if toData throws an exception
|
If your serialization code that implements DataSerializable.toData throws an exception, it can cause some partitioned region operations to hang. This is because the product keeps retrying the operation, assuming the failure is caused by a transient network error.
|
|
| 09/24/10
|
#42358
|
Static reference to Refresh Timer MBean in MBeanUtil remains even after the agent is stopped
|
6.5
|
closed
|
GemFire Admin Agent should not be restarted in the same JVM process
|
If the agent is restarted in the same JVM process, auto-refresh for SystemMember, CacheVm & StatisticResource MBeans does not happen and hence GemFire statistics & other attributes of these MBeans are not refreshed.
There is no known work-around yet. The Agent JVM process should be stopped and the Admin Agent should be started in a new JVM process.
|
|
| 09/17/10
|
#42346
|
CachePerfStats regions stat count is incremented for internal region.
|
|
closed
|
Regions statistic value may be larger than it should be
|
The CachePerfStats "regions" statistic may have a larger value than is correct. This is because some product features use internal regions, and the statistic was also being incremented for those internal regions.
|
|
| 09/17/10
|
#42343
|
the instance name given to PartitionedRegionStats is too verbose
|
|
closed
|
The PartitionedRegionStats instance name is too long
|
The PartitionedRegionStats instance name is "Partitioned Region " + fullRegionName + " Statistics". It should just be fullRegionName.
|
|
| 09/15/10
|
#42334
|
NPE in PartitionRegionHelper methods.
|
6.0
|
closed
|
PartitionRegionHelper's getLocalData and getLocalPrimaryData throw NullPointerException
|
A null check is missing for the region instance passed to PartitionRegionHelper#getLocalData and PartitionRegionHelper#getLocalPrimaryData. If null is passed instead of an actual region object, IllegalArgumentException should be thrown.
|
|
| 09/03/10
|
#42312
|
ClientHealthMonitor.getStatusForAllClients() should ignore connections that are not for clients.
|
6.5
|
closed
|
For a Gateway Hub, GemFire monitoring module includes Gateways from remote site as clients.
|
While retrieving information about the clients of a member that hosts a Gateway Hub, the GemFire Admin module also includes Gateways from the remote site as clients of the member. This issue is limited to monitoring.
|
|
| 09/02/10
|
#42309
|
Server swallows exception during cq execution
|
6.0
|
closed
|
Failure message with CQ execution is not reported at client.
|
The client does not report the cause of a failure that happens during CQ execution, even though the server log shows the actual error message.
|
The user has to look into the server log to see the error message.
|
| 08/30/10
|
#42296
|
NPE thrown from LocalRegion.serverPut() on replace(K,V,V), expected CacheClosedException
|
6.0
|
closed
|
NullPointerException thrown by replace(K,V,V) when the cache is closed
|
The replace(K,V,V) operation throws a null pointer exception when attempted on a closed cache instead of throwing a CacheClosedException.
|
|
| 08/19/10
|
#42281
|
Hang in waitForPrimary in inserts on child region when it is created after inserts on parent
|
6.5
|
closed
|
No colocation if parent PR populated before creating child PR
|
If the parent partitioned region is populated before the child partitioned regions are created, populating the child partitioned regions does not colocate their buckets with the parent partitioned region's buckets.
|
|
| 08/18/10
|
#42280
|
Edge client stops getting events causing test to fail with data inconsistency
|
|
closed
|
client cache stops getting updates after server failure
|
If server redundancy is lost and new servers start at the same time that a client's sole server is shutting down, the new servers may fail to recover subscription information for the client and stop sending it updates.
|
|
| 08/11/10
|
#42265
|
PR get ops should be executed in P2P reader threads to avoid contention on VMThinDiskLRURegionEntry (see AbstractDiskLRURegionEntry.setBits())
|
6.0
|
closed
|
Hang with concurrent PR operations
|
With a partitioned region with persistence or overflow configured, and using conserve-sockets false, several concurrent operations on different members can cause a hang in very rare cases.
|
|
| 08/11/10
|
#42264
|
socket-lease-time has no impact due to typo p2p.idleConnectionTime[out]
|
6.0
|
closed
|
Connections continue to be closed even when socket-lease-time="0"
|
Connections continue to be closed even when socket-lease-time="0"
|
|
| 08/11/10
|
#42261
|
DistributedSystem disconnect hang after NPE reported by VERIFY_SUSPECT.stop()
|
6.0
|
closed
|
Hang during shutdown
|
In rare circumstances we have seen tests hang during shutdown after throwing a NullPointerException in VERIFY_SUSPECT.stop().
|
Killing the process that threw the exception will resolve the hang.
|
| 08/10/10
|
#42253
|
InternalGemFireError from UpdateOperation$UpdateMessage.setNewValueInEvent
|
|
closed
|
InternalGemFireError reported by UpdateOperation$UpdateMessage.setNewValueInEvent
|
*Fixed in 6.6*
This issue arises when mixed object types (some implementing Delta and others not) are used for updating keys in a region. The documentation already states that such updates are not supported and that a ClassCastException is thrown if they are seen.
The documentation could further note that such usage may cause other problems, including possible loss or corruption of data.
Currently, a server that receives such a delta update from a client does not throw the exception itself, but when it distributes the delta update to its peers, those peers may throw the exception back to it, and the server may in turn pass it back to the client.
In the server-to-client path, the client does not throw the ClassCastException back to the application but simply logs it as a warning.
|
|
| 08/05/10
|
#42244
|
primary HA region queues are not balanced
|
|
closed
|
Primary client subscription queues are not balanced
|
While client subscription queues were balanced fairly across servers, primaries were not properly balanced, causing performance problems. Now, primary queues are much more balanced.
|
|
| 08/05/10
|
#42241
|
Unexpected ServerOperationException caused by CacheClosedException
|
|
closed
|
PutAll partial result behavior
|
A partial result no longer returns a ServerOperationException caused by CacheClosedException to the user application. Instead, the user application gets the CancelException directly.
|
|
| 08/02/10
|
#42221
|
replace(K,V) from client did not put new value in client
|
|
closed
|
replace(K,V) from client does not put V in client
|
When an entry exists on the server with key K and value null, a replace(K,V) from the client replaces the value on the server but does not put the new value V in the client cache. However, a get(K) from the client will fetch the value from the server.
|
|
| 07/19/10
|
#42153
|
destroying region hung waiting for replies on vm waiting for destroy lock
|
|
closed
|
Hang creating a region with concurrent region destroy
|
If a member is creating a region while other members are doing a distributed destroy of the same region, that member could hang while creating the region in rare cases.
|
|
| 07/15/10
|
#42139
|
Proctor logging should not be Fine Level when hitting low memory thresholds
|
6.5
|
closed
|
Log level needs to be raised from fine to warning when hitting low memory thresholds.
|
When memory is chronically low, this should be logged at warning level rather than fine level.
|
|
| 06/30/10
|
#42103
|
Gateway shutdown hang: GemFireCache.close -> GemFireCache.stopServers => GatewayImpl.stop => PoolImpl.acquireConnection
|
6.0
|
closed
|
Hang closing a gateway during network partition
|
If a gateway is closed before the gateway has established a connection to the remote side, closing the gateway may hang if a network partition occurs.
|
|
| 06/28/10
|
#42091
|
LinuxSystemStats.processes statistic is incorrect
|
|
closed
|
The LinuxSystemStats "processes" may be incorrect on RedHat
|
The LinuxSystemStats "processes" may be incorrect on certain versions of RedHat.
|
|
| 06/28/10
|
#42087
|
NullPointerException for DistributionManager.getChannelId
|
6.0
|
closed
|
NullPointerException thrown by DistributedSystem.connect()
|
It is remotely possible that DistributedSystem.connect() will throw a NullPointerException in DistributionManager.getChannelId(). This can happen when enable-network-partition-detection has been enabled and the connection attempt succeeds but is immediately disconnected by a network partition event.
|
|
| 06/22/10
|
#42076
|
member hangs in DistributedSystem.connect() [ClientGmsImpl.findInitialMembers] during network partition
|
6.0
|
closed
|
Hang during DistributedSystem.connect()
|
It is remotely possible for DistributedSystem.connect() to hang in ClientGmsImpl.findInitialMembers(). Thread dumps will show another thread named "UDP ucast receiver" blocked in PingWaiter.getPossibleCoordinator().
This can happen if the system is attempting to connect to a locator that was running on a machine that crashed during the connection attempt.
|
|
| 06/14/10
|
#42058
|
DiskAccessException while creating diskStore caused by java.io.IOException: Input/output error
|
|
closed
|
Input/output error when creating a disk store on NFS mount
|
When persisting to an NFS mount on RedHat 5, the following error occasionally occurs while creating the persistent store: java.io.IOException: Input/output error.
|
|
| 05/07/10
|
#41957
|
Interaction between registerInterest and eviction produces incorrect number of entries in region
|
|
closed
|
Incorrect number of entries in client region after registerInterest
|
If a client region is configured with eviction, the eviction stats can be inaccurate after a call to registerInterest with InterestResultPolicy.KEYS_VALUES. This will result in evicting the wrong number of entries.
|
|
| 05/04/10
|
#41941
|
when a cached object changes from serialized to deserialized its size is not updated
|
|
closed
|
ObjectSizer not consulted when deserializing objects
|
When using memory-size-based eviction, an object sizer can be provided to ensure that gemfire accurately calculates the memory usage of each object.
This object sizer is not being consulted in certain cases when gemfire has the serialized form available for the object and then later deserializes it. Instead, gemfire remembers the serialized size.
This can lead to inaccuracy in when gemfire performs memory-based eviction.
|
|
| 04/29/10
|
#41921
|
Distributed deadlock when gateway startup is concurrent with ops and conserve-sockets=true
|
|
closed
|
Distributed deadlock when gateway startup is concurrent with ops and conserve-sockets=true
|
In rare circumstances, startup of a gateway could hang when cache operations are concurrently occurring. This can only happen if the gemfire property conserve-sockets=true is set. This race condition has been fixed.
|
|
| 04/28/10
|
#41917
|
transportUdp: peer hangs in Flow control (replenishments) processing
|
6.0
|
closed
|
hang in FC flow control protocol
|
It is possible under heavy load with disable-tcp=true for the system to lose messages. This sometimes manifests as a hang in the com.gemstone.org.jgroups.protocols.FC protocol. The problem is caused by flaws in UDP message dispatching.
|
|
| 04/26/10
|
#41889
|
entry operations hang in waitForReplies from surviving side when network dropped (network partition tests)
|
6.0
|
closed
|
operations hang waiting for replies from crashed machines with enable-network-partition-detection=true and IBM JVM
|
Using the IBM 1.5 JVM we have found that invoking Thread.isAlive() or Thread.isDead() on a thread that is reading on a socket connected to a machine that has crashed can hang. This causes operations to block until the OS keepalive timeout expires. We have removed these checks when running in an IBM JVM when network-partition-detection is enabled.
|
|
| 04/20/10
|
#41865
|
hang in BucketAdvisor.releasePrimaryLock waiting for replies from member that was previously shutdown
|
|
closed
|
Hang closing a partitioned region
|
If a member crashes while another member is closing a partitioned region or closing a cache containing a partitioned region, the member doing the close may hang while performing the close operation in rare cases.
|
|
| 04/20/10
|
#41857
|
JMX Agent can't use the RMI Registry that is already running.
|
6.0
|
closed
|
JMX Admin Agent can now use external RMI Registry when rmi-registry-enabled is set to false.
|
JMX Admin Agent boolean property rmi-registry-enabled indicates whether it should start the RMI Registry or use an external RMI Registry. Default value for this is true and the RMI Registry is started by the JMX Admin Agent. When this property is set to false, the JMX Admin Agent can now use the external RMI Registry.
|
|
| 04/20/10
|
#41855
|
While starting JMX Agent, there should be a way to configure the RMI Connector Server port.
|
6.0
|
closed
|
Well-defined ports should be configurable for the Agent.
|
Additional properties are now available to define well-known ports:
(1) rmi-server-port: The port on which the RMI Connector Server should start.
(2) membership-port-range: The allowed range of UDP ports for use in forming a unique membership identifier. This range is given as two numbers separated by a minus sign.
(3) tcp-port: The TCP/IP port number to use in the agent's distributed system.
These properties are useful for starting the agent behind a firewall.
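As a rough illustration of the "two numbers separated by a minus sign" format, the sketch below parses and validates such a range. The class `PortRange` and the sample value `20000-20100` are hypothetical and are not part of the GemFire API; the product's own parsing may differ.

```java
// Hypothetical helper for illustration only; not a GemFire class.
final class PortRange {
    final int low;
    final int high;

    PortRange(int low, int high) {
        this.low = low;
        this.high = high;
    }

    // Parses a value of the form "<low>-<high>", e.g. "20000-20100".
    static PortRange parse(String value) {
        String[] parts = value.trim().split("-");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <low>-<high> but got: " + value);
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        int low = Integer.parseInt(parts[0].trim());
        int high = Integer.parseInt(parts[1].trim());
        if (low < 0 || high > 65535 || low > high) {
            throw new IllegalArgumentException("invalid port range: " + value);
        }
        return new PortRange(low, high);
    }

    // True if the given port falls inside the range (bounds inclusive).
    boolean contains(int port) {
        return port >= low && port <= high;
    }

    public static void main(String[] args) {
        PortRange range = PortRange.parse("20000-20100");
        System.out.println(range.low + " .. " + range.high);
    }
}
```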
|
|
| 04/19/10
|
#41850
|
giiWhileMultiplePublishing fails when 3 of 10 (replicated) members do not have the entire keySet
|
6.0
|
closed
|
message loss with disable-tcp=true
|
It is possible for UDP messages to be lost under heavy load. This is caused by faults in the UDP unicast dispatching code.
|
|
| 04/16/10
|
#41829
|
Constraints on valid characters for use in region names should include OQL query string constraints
|
6.0
|
closed
|
Querying on region with special characters
|
Queries that refer to a region whose name contains special characters (which are allowed in region names) were not supported. Support was added in 6.5.
|
|
| 03/31/10
|
#41739
|
shutDownAllMembers() appears to disconnect admin vm
|
|
closed
|
ShutDownAll assumptions
|
ShutDownAll only shuts down members that have a cache. Locators and admin members are not shut down.
|
|
| 03/29/10
|
#41726
|
NPE generated in DataSerializer.readClass if getContextClassLoader returns null
|
6.0
|
closed
|
NPE in DataSerializer when using GemFire as an OSGi bundle
|
When GemFire is used as an OSGi bundle, an NPE is thrown (visible only in fine-level logs).
|
|
| 03/22/10
|
#41705
|
Assertion thrown from RegionAdvisor.getBucket() during bucket recovery
|
6.0
|
closed
|
Assertion error during bucket recovery
|
After a new member joins, an assertion error could be thrown when we try to restore the redundant copy for a colocated partitioned region.
|
|
| 03/17/10
|
#41686
|
async disk region leaks memory
|
5.7
|
closed
|
Memory leak with async persistence or overflow
|
With a region configured with asynchronous persistence or overflow, the disk region may create and retain many byte buffers while getting an initial image from another peer. After this point the byte buffers are not released, resulting in excessive memory usage.
|
|
| 03/12/10
|
#41671
|
ConcurrentModificationException thrown while iterating over DistributedRegion.getHeapThresholdReachedMembers HashMap
|
6.0
|
closed
|
ConcurrentModificationException in DistributedRegion.getHeapThresholdReachedMembers()
|
A ConcurrentModificationException may be thrown when a remote member exceeds the critical memory threshold.
|
|
| 03/10/10
|
#41663
|
tests with concurrent region (region create, region destroy) operations fail with OOME
|
6.0
|
closed
|
EventTracker memory leak
|
A small flaw in Region destruction causes the cache to retain references to EventTracker objects that should otherwise be discarded.
EventTracker objects record information about which events have been applied to a cache Region. It could impact an application that has a high thread count across the distributed system and which performs a lot of Region destruction operations.
|
|
| 03/06/10
|
#41628
|
Need to remove the use of BlowFishJ from GemFire
|
|
closed
|
GemFire uses BlowFishJ
|
GemFire no longer uses BlowFishJ; it has been replaced by the JDK-supported Blowfish algorithm.
|
|
| 02/22/10
|
#41568
|
test hangs while creating cache with ipv6
|
|
closed
|
Hang creating a connection while admin console is running
|
If there is an admin console running that is receiving alerts from the gemfire members, and a newly created VM can't connect to the admin console within p2p.handshakeTimeoutMS (60 seconds), the member in trouble could hang during the DistributedSystem.connect call.
|
|
| 02/11/10
|
#41553
|
Support to include keys as part of the CQ result set.
|
|
closed
|
CQ Results to include keys.
|
When a CQ is executed with the "executeWithInitialResults" option, the returned result set does not contain the keys. This makes it harder to correlate the result set with the CQ events generated in later stages, since each CQ event includes the key on which the update happened.
|
|
| 02/09/10
|
#41539
|
Shutdown timeout with Distributed system shutdown hook waiting for responses to UpdateAttributes requests (from departed members)
|
6.0
|
closed
|
hang during shutdown waiting for responses from departed members
|
It is possible for the product to hang during shutdown, issuing a warning message that it has not received responses to a message from members that have shut down. This is caused by early termination of notification of membership changes in some parts of the product.
|
|
| 02/09/10
|
#41538
|
unexpected afterRemoteRegionCrash event refers to vm that should be healthy
|
|
closed
|
gemfire deadlocks and is kicked out of distributed system
|
It is possible for gemfire to hang while attempting to send an alert to a member that is no longer there. The code sending the alert holds a lock that prevents the member from being able to respond to failure-detection probes or membership changes.
|
|
| 01/25/10
|
#41509
|
ClassCastException running an OQL query
|
|
closed
|
ClassCastException running an OQL query
|
This is fixed in gemfire57_hotfix and is ported to GemFire 6.5.
|
|
| 01/25/10
|
#41508
|
hang creating region when peer logs that "Peer has disappeared from view"
|
6.0
|
closed
|
hang attempting to connect to departed member
|
If a new member happens to reuse the peer-to-peer port number of a recently departed member it is possible that the product will hang trying to communicate with the departed member after logging "Peer has disappeared from view". This is due to a bookkeeping error in membership management.
|
|
| 01/15/10
|
#41482
|
Managed Resources related to regions are not removed even after the region is destroyed/removed/lost.
|
6.0
|
closed
|
Clean up managed resources in Agent created for regions in the Cache
|
Managed resources are created in Agent for regions in the cache in a member of a distributed system. These are now removed when a region gets destroyed. Also there are four new notifications available for JMX clients through JMX on the MBeans - SystemMember and CacheVm. The notifications are:
(1)gemfire.distributedsystem.cache.created - Creation of a cache on a member
(2)gemfire.distributedsystem.cache.closed - Closure of a cache on a member
(3)gemfire.distributedsystem.cache.region.created - Creation of a region in a cache on a member
(4)gemfire.distributedsystem.cache.region.lost - Removal of a region from a cache on a member
|
|
| 01/13/10
|
#41473
|
Members should send notifications for changes in the set of clients.
|
6.0
|
closed
|
JMX Notifications for GemFire cache client connections are now sent by JMX Admin Agent to JMX Clients
|
GemFire Client membership information for SystemMember & CacheVm MBeans is available through these notifications:
(1) gemfire.distributedsystem.cache.client.joined - When a cache client connects with a cache server
(2) gemfire.distributedsystem.cache.client.left - When a cache client gets disconnected gracefully from a cache server
(3) gemfire.distributedsystem.cache.client.crashed - When a cache client crashes and/or abruptly loses its connection with a cache server
|
|
| 01/12/10
|
#41468
|
Data consistency between CQ Result Set and the region data.
|
6.0
|
closed
|
Data consistency between CQ Result Set and the region data.
|
When a CQ is executed using the executeWithInitialResults option, the CQ can miss events that are applied while the result set is being sent to the client. This is fixed in 6.5 by queuing events that occur during CQ execution on the client and replaying them once the CQ is completely initialized.
NOTE: A change may already be reflected in the result set and still be delivered to the CQ listener, resulting in a duplicate event. The client application needs to manage such duplicate events (either ignoring the event or applying it again to the result set).
|
|
| 12/10/09
|
#41402
|
data inconsistency PR datastore with functionExecution HA re-execution
|
6.0
|
closed
|
Executing write operations on a cache inside a function can cause inconsistency when redundant copies is greater than 1
|
This issue occurs when the primary on which the function has partially executed is killed, in the following sequence:
1> It has distributed the operation to one of the two secondaries.
2> The secondary that received the operation becomes the primary.
3> Re-execution of the function happens on the new primary.
The product executes the function on a thread pool. When a cache operation such as destroy is performed in the function body, it goes through the normal process of generating an event id based on the member, thread and sequence id on that node. When the retry comes in, it happens on a different node, on a different thread, and has a different sequence id. So there is no way to detect this as a re-execution of a previous function.
This is actually no different from the case where we put data into a region on a peer which is the primary and kill it. The redundant nodes will be inconsistent and there is nothing that can be done.
Prudent practices:
1> Use a redundancy level of 1 for the partitioned region
2> If 1 is not feasible, use a transaction if you need all-or-nothing behavior for cache operations running inside a function
|
|
| 11/24/09
|
#41357
|
Shutdown hang with ConcurrentModificationException thrown from LogWriterImpl.cleanUpThreadGroups during InternalDistributedSystem disconnect
|
6.0
|
closed
|
DistributedSystem disconnect throws ConcurrentModificationException
|
During shutdown it is possible for DistributedSystem.disconnect() to throw a ConcurrentModificationException. This can happen if an administrative member is disconnecting at the same time. The exception is thrown from LogWriterImpl.cleanUpThreadGroups().
|
|
| 11/24/09
|
#41355
|
SystemConnectException: Unable to become coordinator of existing group because no view responses were received
|
6.0
|
closed
|
locator startup fails
|
When authorization is used or enable-network-partition-detection is enabled it is possible for locator startup to fail with the message "Unable to become coordinator of existing group because no view responses were received".
|
|
| 11/18/09
|
#41323
|
peer PR member misses destroy (while performing bucket gii) during rebalancing
|
6.0
|
closed
|
Missing CQ event when bucket re-balance in progress
|
This is a missing-event issue. It was first seen in the eventFilterOpt branch and is fixed in the 6.5 release.
|
|
| 11/16/09
|
#41306
|
Unexpected DiskAccessException, Data for diskEntry could not be obtained from Disk.
|
|
closed
|
DiskAccessException when applying ConcurrentMap operations to a region
|
When processing a ConcurrentMap operation GemFire may throw a DiskAccessException. The stack will show that the product is attempting to read information from the disk but that the data could not be located:
com.gemstone.gemfire.cache.DiskAccessException: For Region: /testRegion: Data for DiskEntry having DiskId as Oplog ID = -1; Offset in Oplog = 832485; Value Length = 23; UserBits is = 1 could not be obtained from Disk. A clear operation may have deleted the oplogs
|
|
| 11/11/09
|
#41283
|
Getting a server's PR entry from a client doesn't update its lastAccessedTime
|
|
closed
|
lastAccessedTime on an entry does not reflect when the entry was accessed last from any client in the system
|
This is a trade-off that is unlikely ever to change. In order to scale gets, we allow gets to be satisfied from primary or secondary data stores. The lastAccessedTime is maintained locally on each store, so it is likely that key X has been fetched on a secondary recently but has idle-timed out on the primary due to load balancing.
We do ensure that when an entry expires on a primary, it is removed from the entire system.
|
|
| 11/11/09
|
#41280
|
createCQfetchInitialResult fails, Caused by: NPE from CqService.executeCq()
|
6.0
|
closed
|
NPE with CQ Execution
|
Reported when CQ is executed. One cause of this bug was unsynchronized code that establishes the identity of a client based on its first connection's port. Fixed in GemFire 6.5 release.
|
|
| 11/05/09
|
#41269
|
Unexpected replies processed in bridge servers
|
6.0
|
closed
|
warning messages in logs about unexpected replies
|
When using Delta, if one of the members has a region with DataPolicy EMPTY, the following warning message is logged "Received reply from member <memberId> but was not expecting one."
|
|
| 10/28/09
|
#41248
|
locator fails to start with GemFireConfigException
|
|
closed
|
locator fails to start with GemFireConfigException
|
If the system property gemfire.locators is used to configure the locators setting and the property doesn't include the locator being started, startup will fail with a GemFireConfigException:
{{{
com.gemstone.gemfire.GemFireConfigException: Unable to contact a Locator service. Operation either timed out or Locator does not exist. Configured list of locators is "[frodo:15964]".
at com.gemstone.org.jgroups.protocols.TCPGOSSIP.sendGetMembersRequest(TCPGOSSIP.java:183)
at com.gemstone.org.jgroups.protocols.PingSender.run(PingSender.java:82)
at java.lang.Thread.run(Thread.java:619)
}}}
As a workaround, make sure that the gemfire.locators property includes the locator being started.
|
|
| 10/13/09
|
#41206
|
ClassNotFoundException when DataSerializer attempts to deserialize an object array that has an array component type
|
|
closed
|
Deserializing a multidimensional array fails
|
If you serialize an array whose component type is itself an array (a multidimensional array) with DataSerializer, gemfire will throw a ClassNotFoundException when deserializing the array.
|
|
| 10/05/09
|
#41188
|
GII recipient could incorrectly ignore an event because it is marked as a possible duplicate
|
|
closed
|
Crash while creating a replicate region could result in a lost update
|
In rare cases an update may be lost if one cache server is creating a replicated region and another cache server with the same region crashes while applying an update from a client. After the crash, the cache server that just created the region may miss the update.
|
|
| 09/30/09
|
#41163
|
A large number of the following AdminException are seen in the Agent logfiles ...
|
6.0
|
closed
|
Few occurrences of AdminException about failure to refresh statistics
|
There could be a few occurrences of "AdminException: Failed to refresh statistics". These can be ignored if, around the same time, there is a log statement such as: "Processing client membership event from <Server_Id> for client with id: <Client_Id> running on host: <Client_Host>". These exceptions can appear in logs until the clean-up event received from the server is processed completely.
|
|
| 09/28/09
|
#41160
|
closing cache hangs waiting for replies from vm making no attempt to respond
|
|
closed
|
hang during shutdown with disable-tcp=true
|
It is possible for the product to hang during shutdown when disable-tcp=true. This is caused by faults in the UDP unicast dispatching code.
|
|
| 09/25/09
|
#41155
|
Client id is not random enough (getting duplicates)
|
|
closed
|
Duplicate client cache ID
|
It is possible for two client caches to use the same membership ID, causing servers to become confused and mis-deliver events. The caches must be running on the same machine for this to happen.
|
|
| 09/21/09
|
#41136
|
Eviction is not evicting the least recently used entries for normal regions
|
6.0
|
closed
|
Entry other than the least recently used was evicted
|
Eviction does not always evict the least recently used entry.
|
|
| 09/21/09
|
#41131
|
Hang with mix of gets and puts using same key with Partitioned region
|
|
closed
|
Hang with concurrent operations on the same key with statistics enabled
|
In rare cases, concurrent operations on the same key in partitioned region can result in a hang if statistics are enabled.
|
|
| 09/17/09
|
#41117
|
InternalGemFireError: Assert thrown from partitioned.DestroyMessage during PR invalidate region
|
6.0
|
closed
|
invalidateRegion() is not supported for PartitionedRegions
|
PartitionedRegion now supports invalidateRegion() operation.
|
|
| 09/14/09
|
#41097
|
EnforceUniqueHostStorageAllocation flag prevents moving a bucket between two VMs on the same host
|
6.0
|
closed
|
gemfire.EnforceUniqueHostStorageAllocation setting has an unintended impact on partitioned region rebalancing
|
Setting gemfire.EnforceUniqueHostStorageAllocation prevents buckets from moving from one VM to another on the same host during a rebalance operation.
|
|
| 09/14/09
|
#41096
|
Enabling both eviction and expiration in a partitioned region leaves entries in the cache.
|
6.0
|
closed
|
Partition Region eviction may prevent entries from expiring
|
In prior versions of GemFire, entries would not get expired on partition region secondaries. This would occur if eviction of an entry in a partition region primary occurred before expiration, and the eviction action was "LOCAL_DESTROY".
|
|
| 09/14/09
|
#41093
|
Transactional entry-create in region destroyed within same transaction is unexpectedly processed by CacheListener and TransactionListener.
|
6.0
|
closed
|
Transactional load does not cause conflict
|
A load done to satisfy a get operation does not cause a CommitConflictException even though the same entry is modified by another thread.
|
|
| 09/14/09
|
#41091
|
Missing primary detected after member forcefully disconnected from DS (underlying InternalGemFireError: Trying to clear a bucket region that was not destroyed)
|
6.0
|
closed
|
Redundancy not satisfied after network partition
|
If network partition detection is enabled, in rare cases gemfire can fail to restore redundancy after the partition.
|
|
| 09/11/09
|
#41085
|
The LRU list can get into a state where it won't clean up properly
|
5.7
|
closed
|
Memory leak with LRU eviction
|
In a region with eviction configured, if eviction never actually occurs, but many destroy operations are performed on the region, some metadata for the destroyed entries may be retained, resulting in excess heap usage.
This can also affect gemfire gateways.
|
|
| 09/11/09
|
#41084
|
GatewayEventImpl.isUpdate uses a transient variable in its determination
|
5.7
|
closed
|
memory leak can occur in gateway queues that enable conflation
|
In versions prior to 6.5, a memory leak could occur in VMs containing gateway queues that have conflation enabled and whose events overflow to disk.
This leak was fixed in 6.5.
|
|
| 09/08/09
|
#41076
|
Memory leak of EntryExpiryTasks in BucketRegion.pendingSecondaryExpires
|
|
closed
|
Memory leak in partition region secondaries
|
When an entry in a partition region secondary is destroyed, the expiration task associated with the entry is not released until the secondary switches to being the primary.
|
|
| 09/08/09
|
#41075
|
accessor vms hang in waitForPrimaryMember after a dataStore is forcefully disconnected from the DS
|
6.0
|
closed
|
hang caused by alert listener notification
|
It is possible for GemFire to deadlock trying to notify an admin member of an alert. Thread dumps will show a thread in ManagerLogWriter.notifyAlertListeners() with other threads waiting to lock the membership view.
|
|
| 09/07/09
|
#41060
|
JMX operation SystemMemberCache.getRegionSnapshot fails completely if creating snapshot for even one of the regions fails.
|
6.0
|
closed
|
Occurrence of an exception in admin agent while retrieving region information would prevent the retrieval of region information for other regions on the member.
|
The admin agent logs failures encountered while retrieving information about regions in a cache, and continues with the retrieval of information of the other regions on the member. In versions of GemFire Enterprise prior to 6.5, this failure would prevent the admin agent from retrieving information about all regions present in a cache. This behavior was most commonly seen when invoking the SystemMemberCache.getRegionSnapshot MBean operation.
|
|
| 09/04/09
|
#41058
|
Missing requirement that agent needs instantiators on its classpath
|
5.7
|
closed
|
Admin Agent in GFE 5.7 or older needs instantiators on its classpath
|
This is specific to GemFire versions 5.7 & older.
Consider the case where (1) the application has custom classes that it uses to store data in the cache, and (2) these custom classes implement the DataSerializable interface and provide their own instantiator. When the JMX Agent starts, it requires the application-specific classes on its classpath. The exception is when the agent is started before the cacheserver(s), in which case a warning is logged about the absence of the application-specific classes (in the agent classpath) but agent operations continue.
|
|
| 08/30/09
|
#41025
|
hasDelta/toDelta are invoked on the client side even if Delta Propagation property is turned off
|
6.0
|
closed
|
Client sends delta even if delta-propagation=false in the distributed system
|
Clients have no knowledge of whether delta-propagation is turned on or off on the server, and attempt to send deltas during updates. This does not cause any data errors. The server handles the incoming delta bytes and does not propagate the update as a delta.
|
|
| 08/27/09
|
#41022
|
Cacheserver ignores log-file property from gemfire.properties file
|
6.0
|
closed
|
Cacheserver script ignores log-file property
|
When starting a cache server using the cacheserver script, the log-file property in gemfire.properties was being ignored. Now, the search order for the log-file property is: (1) the command-line argument, (2) gemfire.properties, (3) the cacheserver.log default.
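The precedence described above can be sketched as a simple fall-through lookup. This is an illustrative model only: the class `LogFileResolver` and its method are hypothetical and do not reflect how the cacheserver script is actually implemented.

```java
import java.util.Properties;

// Hypothetical model of the documented lookup order; not GemFire code.
final class LogFileResolver {
    static final String DEFAULT_LOG_FILE = "cacheserver.log";

    // Returns the first non-blank value found, in this order:
    // command-line argument, gemfire.properties entry, built-in default.
    static String resolve(String commandLineValue, Properties gemfireProperties) {
        if (commandLineValue != null && !commandLineValue.trim().isEmpty()) {
            return commandLineValue.trim();
        }
        String fromFile = gemfireProperties.getProperty("log-file");
        if (fromFile != null && !fromFile.trim().isEmpty()) {
            return fromFile.trim();
        }
        return DEFAULT_LOG_FILE;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("log-file", "server1.log");
        // No command-line value, so the properties file entry is used.
        System.out.println(resolve(null, props));
    }
}
```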
|
|
| 08/27/09
|
#41021
|
New EventTrackers are not tracked properly by the ExpiryTask
|
6.0
|
closed
|
A memory leak involving event trackers.
|
The cache uses event trackers to ensure that we can detect duplicates coming in from a single thread (events that may have been retransmitted due to primary servers going down). These trackers are supposed to expire after a specified idle timeout period. In 6.0, the expiration task was not removing these event trackers, leading to a memory leak. This is an issue for long-running systems where publishing threads keep changing over the lifetime of the system. This has been addressed in 6.5.
|
|
| 08/26/09
|
#41014
|
JMX Agent error reading mcast-port property
|
6.0
|
closed
|
Leading and trailing whitespace in property values would prevent a cache server or agent process from starting.
|
Leading or trailing spaces in the values in gemfire.properties or the agent's properties file could result in an exception that prevented the process from launching. Now all values are trimmed of leading and trailing white space.
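To see why untrimmed values are a problem, note that the JDK's `java.util.Properties.load` keeps trailing whitespace in a value, so a numeric setting such as `mcast-port=10334 ` fails to parse as an integer. The sketch below shows one way to trim every value after loading; the class `TrimmedProperties` is hypothetical and is not the product's actual fix.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration; not GemFire code.
final class TrimmedProperties {

    // Loads properties from text and trims leading/trailing whitespace from every value.
    static Properties loadTrimmed(String text) {
        Properties raw = new Properties();
        try {
            raw.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Properties trimmed = new Properties();
        for (String name : raw.stringPropertyNames()) {
            trimmed.setProperty(name, raw.getProperty(name).trim());
        }
        return trimmed;
    }

    public static void main(String[] args) {
        // The value below carries trailing spaces, which Properties.load preserves.
        Properties props = loadTrimmed("mcast-port=10334   \n");
        // After trimming, the value parses cleanly as an integer.
        System.out.println(Integer.parseInt(props.getProperty("mcast-port")));
    }
}
```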
|
|
| 08/21/09
|
#41001
|
DataSerializer.register throws the wrong exception
|
6.0
|
closed
|
DataSerializer.register throws incorrect exception type
|
If the id specified for a DataSerializer type clashes with that of a type already registered with the data serialization framework, GFE throws an IllegalArgumentException instead of an IllegalStateException as documented. The exception message, though, correctly describes the reason for the exception and names the class that is already registered.
|
|
| 08/19/09
|
#40996
|
PartitionedRegion#getEntry can access an entry before it is created
|
|
closed
|
Early escape of Region.Entry from CacheWriter
|
It was possible for the cache writer to get a reference to a Region.Entry before it was initialized. A call to getEntry now returns null.
|
|
| 08/18/09
|
#40985
|
Possible infinite loop in GrantorRequestProcessor.startElderCall
|
|
closed
|
Hang while closing a global region
|
In rare conditions, closing a global region could result in a hang. This may cause other members to hang trying to lock entries while updating them.
|
|
| 08/17/09
|
#40977
|
Entries are lost in PartitionedRegions by cycling dataStore VMs
|
6.0
|
closed
|
During HA event, destroy operation failed with EntryNotFoundException
|
When a destroy operation is done on a PartitionedRegion and the primary member for that key crashes, an EntryNotFoundException may be thrown.
|
|
| 08/11/09
|
#40955
|
Iterating on PR local data invokes PartitionResolver
|
|
closed
|
Improvements to partition resolver
|
PartitionResolver is now invoked only once per operation. Iterating over local data does not invoke the resolver. Iterators from peer accessors do not invoke the resolver in the accessor.
|
|
| 08/11/09
|
#40953
|
Region javadoc for putAll states it is unsupported on PR
|
5.7
|
closed
|
Region javadoc for putAll states it is unsupported on PartitionedRegions
|
The javadoc for putAll states:
{{{
throws UnsupportedOperationException If the region is a partitioned region
}}}
This is a mistake; putAll has been supported on all region types since the GemFire 5.7 release.
|
Customers using GemFire 5.7 or later are encouraged to use putAll on partitioned regions.
|
| 08/06/09
|
#40943
|
Rebalancing colocated regions moves fewer buckets than expected
|
6.0
|
closed
|
Rebalancing colocated regions moves less data than expected
|
Due to a bug in the rebalancing algorithm, gemfire does not move data during a rebalance even though it appears there is space for the data.
This bug only appears when using colocated regions. Gemfire is erroneously comparing the total size of data to be moved for all of the colocated regions with the local-max-memory setting of each individual region. If the total amount of data is greater than the remaining capacity of the region, gemfire will not move the data.
|
Increase the local-max-memory of all of the regions.
|
| 08/04/09
|
#40932
|
GemFire cannot serialize a String whose logical length is < 0xFFFF, but whose utf-8 encoded length is > 0xFFFF
|
|
closed
|
GemFire cannot serialize a String whose logical length is < 0xFFFF, but whose utf-8 encoded length is > 0xFFFF
|
If you have a string with some multibyte characters that is less than 0xFFFF characters long, but will be more than 0xFFFF bytes when serialized using UTF-8, a UTFDataFormatException is thrown when serializing the string with gemfire.
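The same size limit can be observed with the JDK alone, since `java.io.DataOutputStream.writeUTF` rejects strings whose encoded form exceeds 0xFFFF bytes. The sketch below uses only JDK classes and does not call GemFire's DataSerializer; it is assumed here, not confirmed by the source, that the failure described above stems from the same limit. The class `UtfLengthDemo` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration using only JDK classes; not GemFire code.
final class UtfLengthDemo {

    // Builds a string consisting of 'count' copies of the character c.
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // True if writeUTF accepts the string, false if it exceeds the 0xFFFF-byte limit.
    static boolean fitsInWriteUTF(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The euro sign needs 3 bytes in UTF-8, so 30000 characters encode to 90000 bytes.
        String s = repeat('\u20AC', 30000);
        System.out.println("characters:    " + s.length());
        System.out.println("encoded bytes: " + s.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("fits writeUTF: " + fitsInWriteUTF(s));
    }
}
```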
|
|
| 07/29/09
|
#40916
|
IllegalArgumentException thrown if multiple regions configured using same EvictionAttributes
|
5.0
|
closed
|
IllegalArgumentException thrown if multiple regions configured using same EvictionAttributes
|
If a single instance of EvictionAttributes was shared among multiple region creations, an IllegalArgumentException was thrown. This is now fixed.
|
|
| 07/24/09
|
#40906
|
socket-buffer-size can not exceed 16,777,215
|
5.7
|
closed
|
GemFire API allows socket-buffer-size to be configured to values greater than Java allows.
|
Setting the "socket-buffer-size" to a value greater than 16,777,215 will trigger an exception:
{{{
java.lang.IllegalStateException: tcp message exceeded max size of 16,777,215
}}}
|
Do not set the "socket-buffer-size" to a value greater than 16,777,215.
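One way to follow this advice is to validate the value before it is placed into the configuration. The helper below is a hypothetical sketch (the class and method names are not part of GemFire); only the limit of 16,777,215 comes from the note above.

```java
/** Hypothetical guard for the socket-buffer-size limit described above. */
final class SocketBufferSizeCheck {

    /** Largest accepted value according to the note above: 16,777,215 (0xFFFFFF). */
    static final int MAX_SOCKET_BUFFER_SIZE = 0xFFFFFF;

    /** Returns the requested size unchanged, or throws if it exceeds the limit. */
    static int requireValid(int requested) {
        if (requested > MAX_SOCKET_BUFFER_SIZE) {
            throw new IllegalArgumentException(
                "socket-buffer-size " + requested + " exceeds " + MAX_SOCKET_BUFFER_SIZE);
        }
        return requested;
    }
}
```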
|
| 07/17/09
|
#40886
|
MulticastSocket.setInterface call fails on Windows Server 2008
|
|
closed
|
GemFire cannot create a multicast socket on Windows Server 2008, Windows Vista, or Windows 7
|
Due to complications related to JGroups bug JGRP-777, GemFire throws an exception whose root cause states "An operation was attempted on something that is not a socket" when configured to use multicast for membership discovery on Windows Server 2008.
{{{
Caused by: java.net.SocketException: An operation was attempted on something that is not a socket
at java.net.PlainDatagramSocketImpl.socketSetOption(Native Method)
at java.net.PlainDatagramSocketImpl.setOption(PlainDatagramSocketImpl.java:299)
at java.net.MulticastSocket.setInterface(MulticastSocket.java:420)
at com.gemstone.org.jgroups.protocols.UDP.createSockets(UDP.java:631)
at com.gemstone.org.jgroups.protocols.UDP.start(UDP.java:502)
at com.gemstone.org.jgroups.stack.Protocol.handleSpecialDownEvent(Protocol.java:874)
... 78 more
}}}
|
Use locators instead of multicast for discovery.
|
| 07/16/09
|
#40883
|
fromDelta called twice for same update with CQs and HA
|
|
closed
|
Delta callback method fromDelta() may get invoked twice for same update
|
*Fixed in 6.6*
With the delta propagation feature, a VM may invoke fromDelta() twice on its value for the same event. The second invocation may generate a new value that differs from the actual value that triggered the event.
To avoid this, write the Delta implementation so that a second invocation of fromDelta() for the same event is a no-op.
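One way to make a repeated fromDelta() call harmless is to carry a version number inside the delta and skip any delta that has already been applied. The sketch below is an assumption-laden illustration: VersionedCounter is a hypothetical class that mirrors the toDelta/fromDelta method shape using only JDK types, and it does not implement the GemFire Delta interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical value whose deltas carry a version so that re-applying one is a no-op. */
final class VersionedCounter {
    private long version;          // number of updates reflected in 'total'
    private int total;             // the value itself
    private int pendingIncrement;  // change not yet written as a delta

    void add(int n) {
        total += n;
        pendingIncrement += n;
        version++;
    }

    /** Writes the version this delta leads to, followed by the increment. */
    void toDelta(DataOutput out) throws IOException {
        out.writeLong(version);
        out.writeInt(pendingIncrement);
        pendingIncrement = 0;
    }

    /** Applies the delta unless its version has already been applied. */
    void fromDelta(DataInput in) throws IOException {
        long deltaVersion = in.readLong();
        int increment = in.readInt();
        if (deltaVersion <= version) {
            return; // same event delivered again: ignore it
        }
        total += increment;
        version = deltaVersion;
    }

    int total() {
        return total;
    }

    /** Convenience wrapper: serializes the pending delta to a byte array. */
    byte[] deltaBytes() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            toDelta(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience wrapper: applies a delta held in a byte array. */
    void applyDelta(byte[] delta) {
        try {
            fromDelta(new DataInputStream(new ByteArrayInputStream(delta)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Applying the same delta bytes twice to a receiver leaves its total unchanged after the first application, because the second call sees a version it has already reached.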
|
|
| 07/16/09
|
#40882
|
In RemoteGfManagerAgent, exceptions occurred while connecting to the DS and handling joined members should be handled properly.
|
5.8
|
closed
|
Before failing to connect to the distributed system because of missing license information on a member, the agent should try every member of the distributed system
|
If there is more than one member running and the agent fails to retrieve license information for the distributed system from the first member, the agent tries the next member. In addition, failure to retrieve the license information from one of the members is now logged at both the member and the agent.
|
|
| 07/13/09
|
#40872
|
Incorrect mbean descriptor in JMX AdminAgent
|
6.0
|
closed
|
Incorrect descriptor JMX operation SystemMember.manageStat removed
|
Removed non-existing operation descriptor manageStat that was described for SystemMember MBean.
|
|
| 06/26/09
|
#40835
|
Locators fail to start on Windows in Pure Java Mode
|
5.7
|
closed
|
Locators fail to start on Windows in Pure Java Mode
|
A locator cannot be started in pure Java Mode by using the following command-line:
gemfire start-locator -port=8888
The locator.log has the following message:
'" true true "' is not a valid IP address for this machine.
|
Use the following command to work around the issue by specifying values for the bind address, hostname for clients, and logfile.
gemfire start-locator -address=%bindaddr% -hostname-for-clients=locator_%bindaddr% -Dgemfire.log-file=%logfile%
where bindaddr is a suitable bind address for the machine and logfile is any filename other than "locator.log"
|
| 06/16/09
|
#40808
|
CQ doesn't send update events in case of eviction (overflow to disk).
|
|
closed
|
CQ Events with update on evicted value
|
When an update happens on a region entry whose value has been written to disk, the CQ applies the query condition only to the new value; because the old value is not available in that case, the query condition is not applied to it. The issue is seen only if the event was not cached before. Fixed in 6.5.
|
|
| 06/12/09
|
#40797
|
Events due to eviction on PR are not firing
|
|
closed
|
CacheListener Events due to eviction on PartitionedRegions do not get invoked
|
This bug impacts a region configured to be a PartitionedRegion with a listener and eviction. The expected behavior is that a listener would invoke the void afterDestroy(EntryEvent e) method whenever an entry was evicted from the cache. While eviction does take place, the listener event is not triggered. All other listener events do behave correctly though.
|
Use Distributed Regions with a manual partitioning scheme.
|
| 06/11/09
|
#40790
|
region-time-to-live and region-idle-time have not been implemented for PR
|
|
closed
|
'region-time-to-live' and 'region-idle-time' attributes have no effect on Partitioned Regions
|
Distributed regions support 'region-time-to-live' and 'region-idle-time' expiration attributes for their entries. These expiration attributes are not supported in partitioned regions and are ignored.
|
|
| 06/03/09
|
#40757
|
HAClientQueues (not persistent) do not get deleted when client disconnects
|
5.7
|
closed
|
Client queues on server may cause the server to lock up or run out of memory
|
In cases where a client disconnected soon after connecting to a server, the client's queue was not cleaned up. If this happened frequently, these queues would cause the server to run out of memory, or would fill up with events, causing the server to lock up while trying to insert events into the queue. This has been fixed in GemFire 6.1.
|
|
| 06/02/09
|
#40751
|
A RuntimeException from a user's toData method causes a hang
|
6.0
|
closed
|
A RuntimeException from a user's toData method can cause a distributed member to hang
|
If a runtime exception is thrown from the toData method of a user's DataSerializable object during a distributed put, GemFire hangs.
|
Code toData methods defensively to catch RuntimeException and handle it in an alternate way.
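A defensive toData might look like the sketch below. It assumes the usual toData(DataOutput) method shape but uses only JDK types; DefensiveTrade, its fields, and the choice of an empty string as the fallback are all hypothetical. The risky computation is done before anything is written, so a failure cannot leave a half-written record.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical value class whose toData never lets a RuntimeException escape. */
final class DefensiveTrade {
    private final String id;
    private final String note; // may be null

    DefensiveTrade(String id, String note) {
        this.id = id;
        this.note = note;
    }

    void toData(DataOutput out) throws IOException {
        String normalizedNote;
        try {
            // Risky step: throws NullPointerException when note is null.
            normalizedNote = note.trim();
        } catch (RuntimeException e) {
            // Alternate handling: fall back to an empty note instead of propagating.
            normalizedNote = "";
        }
        // Only write once every value is known to be available.
        out.writeUTF(id);
        out.writeUTF(normalizedNote);
    }

    /** Convenience wrapper: runs toData against an in-memory buffer. */
    byte[] serialize() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            toData(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```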
|
| 06/02/09
|
#40749
|
Using multiple GII providers with a persistent region can resurrect destroyed entries
|
6.0
|
closed
|
Using multiple GII providers with a persistent region can resurrect destroyed entries
|
When more than one member with the 'provider' attribute set to true is present, a new member coming up does a union GII from all of the providers in addition to what is on disk. The result is that if there are entries on disk which have been destroyed in the providers, the new member will resurrect those destroyed entries.
|
|
| 05/27/09
|
#40735
|
Disk recovery fails if using -Duser.language=ja
|
6.0
|
closed
|
Disk Regions do not function correctly if the locale's language is "ja", such as when -Duser.language=ja
|
Due to an error in how the filename's prefix is handled by the localization code, GemFire will fail to find a disk persistence file even if it exists at the path specified by the user's configuration. The code works correctly for all user languages except Japanese ("ja").
|
Setting the java system property user.language to English via the command line will avoid this problem.
java -Duser.language=en ...
|
| 05/27/09
|
#40731
|
gateways are limited to 10G of persistence/overflow
|
6.0
|
closed
|
gateways are limited to 10G of persistence/overflow
|
In 6.0 gateways were changed to no longer roll oplogs. Gateways always have a single directory whose dir-size is the default of 10G. Note that dir-size only applies to oplogs but that is all a gateway has now since it never rolls. Once the oplogs on a gateway reach 10G the next write will fail with an out of disk space error.
|
|
| 05/21/09
|
#40722
|
Partitioned Region expiration does not distribute events
|
5.8
|
closed
|
Destroy and invalidate events not sent to clients or cache listeners in a partition region
|
When a Region with DataPolicy.PARTITION is configured with expiration enabled and with ExpirationAction set to either DESTROY or INVALIDATE, an AFTER_DESTROY or AFTER_INVALIDATE event is not sent to cache clients or CacheListeners.
|
|
| 05/21/09
|
#40718
|
HeapLRU with ObjectSizer will expose CachedDeserializable instances to user code
|
6.0
|
closed
|
Configuring HeapLRU with an ObjectSizer exposes CachedDeserializable instances to application code
|
If you configure a HeapLRU and an ObjectSizer for it, GemFire will mistakenly pass instances of its internal CachedDeserializable class to the customer's implementation of ObjectSizer.sizeof(Object).
|
Customers can work around this bug by adding the following code to any implementation of ObjectSizer.
{{{
import com.gemstone.gemfire.cache.util.ObjectSizer;
import com.gemstone.gemfire.internal.cache.lru.Sizeable;

public class MyObjectSizer implements ObjectSizer {
    public int sizeof(Object o) {
        // GemFire-internal wrappers report their own size
        if (o instanceof Sizeable) {
            return ((Sizeable) o).getSizeInBytes();
        }
        // customer's sizeof code goes here
    }
}
}}}
|
| 05/20/09
|
#40714
|
Registering a function on a Java client changes the behavior when executing an instance of the function
|
6.0
|
closed
|
Incorrect function may be executed in Execution.execute(Function f) API
|
In prior versions of GemFire, the Execution.execute(Function f) API resulted in the execution of a function other than the one supplied as a parameter if the ID of this instance matched that of a function already registered on the server. The registered function was executed instead.
|
|
| 05/20/09
|
#40713
|
LIFO Eviction APIs should not be visible to customers
|
5.7
|
closed
|
LIFO Eviction APIs should not be part of the public API
|
The following methods and constants were unintentionally exposed as part of the GemFire API. They are not intended for customer use and should be considered strongly deprecated.
{{{
Package: com.gemstone.gemfire.cache
EvictionAttributes#createLIFOEntryAttributes
EvictionAttributes#createLIFOMemoryAttributes
EvictionAlgorithm.LIFO_ENTRY
EvictionAlgorithm.LIFO_MEMORY
EvictionAlgorithm#isLIFOEntry
EvictionAlgorithm#isLIFOMemory
EvictionAlgorithm#isLIFO
}}}
|
Do not write code that makes use of these methods or constants.
|
| 05/06/09
|
#40674
|
FunctionService.onServers() does not execute on all servers but on the servers the pool is currently connected to
|
6.0
|
closed
|
FunctionService.onServers() API may not execute on all servers in a pool
|
In prior versions of GemFire, the FunctionService.onServers('poolName') API did not ensure that the function was executed on all servers configured in the pool. It is possible that at the time the function execution is initiated, the pool may not have an active connection to one or more of its servers. GemFire 6.1 fixes this and ensures that connections to all servers configured in the pool are active. If an attempt to create a connection fails, the function execution fails.
|
|
| 05/05/09
|
#40668
|
ResultsBag fromData() throws NPE.
|
6.0
|
closed
|
NPE in ResultsBag fromData()
|
A NullPointerException was thrown when ResultsBag.fromData() was called. This is fixed in 6.5 and has also been ported to the gemfire601_maint branch.
|
|
| 05/04/09
|
#40666
|
During start up, a process may try to connect to other processes even after it knew that those processes were gone
|
|
closed
|
DistributedSystem attempts to connect to members that have left
|
It is possible that the DistributedSystem will become confused and attempt to connect to members that have left the system while it was starting up. When this happens you will see the departed members admitted into membership in "P2P message reader" threads. This happens when the departing members see the new member and connect to it, causing them to be "surprise members" to the new process.
|
|
| 04/28/09
|
#40648
|
oplog rolling fails reading with Bad file descriptor
|
6.0
|
closed
|
Oplog roller fails with "Bad file descriptor"
|
If oplog rolling is enabled and overflow to disk is configured then a small race condition exists in which the roller may fail causing the region to be closed. The following is an example failure:
{{{
[info 2009/04/28 10:52:22.800 PDT <main> tid=0x1] Closing oplog early since it is empty. It is for region /myReg and has oplog#22
[error 2009/04/28 10:52:22.800 PDT <OplogRoller /myReg for oplog 22> tid=0xf] A DiskAccessException has occurred while writing to the disk for region /myReg. The region will be closed.
com.gemstone.gemfire.cache.DiskAccessException: For Region: /myReg: Failed reading from "/export/jade1b1/users/darrel/gfbuild/BACKUP_myReg_22".
oplogID = 22
Offset being read=10,300,824 Current Oplog Size=10,400,832 Actual File Size =10,400,832 IS ASYNCH MODE =false IS ASYNCH WRITER ALIVE=false, caused by java.io.IOException: Bad file descriptor
at com.gemstone.gemfire.internal.cache.Oplog.basicGetForRoller(Oplog.java:3727)
at com.gemstone.gemfire.internal.cache.Oplog.getBytesAndBitsForSwitchingEntry(Oplog.java:2356)
at com.gemstone.gemfire.internal.cache.ComplexDiskRegion$OplogRoller.rollBackup(ComplexDiskRegion.java:919)
at com.gemstone.gemfire.internal.cache.ComplexDiskRegion$OplogRoller.roll(ComplexDiskRegion.java:1157)
at com.gemstone.gemfire.internal.cache.ComplexDiskRegion$OplogRoller.run(ComplexDiskRegion.java:1215)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Bad file descriptor
at java.io.RandomAccessFile.seek(Native Method)
at com.gemstone.gemfire.internal.cache.Oplog.basicGetForRoller(Oplog.java:3694)
}}}
|
Setting the system property "gemfire.disk.KEEP_EMPTY_OPLOGS" to "true" will prevent this bug.
|
| 04/24/09
|
#40642
|
HeapLRUStatistics.heapUsage does not represent the amount of heap currently in use (in bytes)
|
6.0
|
closed
|
HeapLRUStatistics.heapUsage stat removed
|
The HeapLRUStatistics.heapUsage stat has been removed. Refer to the ResourceManager stats instead.
|
|
| 04/23/09
|
#40635
|
socket/thread leak with conserve-sockets=false
|
|
closed
|
Thread and Socket leak when conserve-sockets=false
|
When configured with conserve-sockets=false, GemFire may accumulate idle threads that have names similar to this:
P2P message reader for ent(42524):2331/2296 SHARED=true ORDERED=false UID=1371
These threads and the sockets they are reading from are created to transmit message replies. They may accumulate if they were created for sole use by a particular thread and that thread no longer exists.
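To see whether such threads are accumulating, the live threads of a JVM can be counted by name prefix. The sketch below uses only java.lang.Thread and must run inside the member's JVM to be useful; the class name is hypothetical, and the prefix is taken from the thread name shown above.

```java
/** Counts live threads in the current JVM whose names start with a given prefix. */
final class ReaderThreadCount {

    static long countThreadsWithPrefix(String prefix) {
        long count = 0;
        // getAllStackTraces() returns a snapshot keyed by every live thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }
}
```

Sampling ReaderThreadCount.countThreadsWithPrefix("P2P message reader") periodically and watching for steady growth is one way to confirm the leak; a thread dump gives the same information without code.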
|
|
| 04/22/09
|
#40632
|
PR expiration with localDestroy fails with InternalGemFireError
|
|
closed
|
Using localDestroy as the expiration action for a PR throws InternalGemFireError
|
Setting the expiration action of localDestroy on a partitioned region causes an InternalGemFireError to be logged. No expiration happens.
Starting with version 6.0, setting the expiration action to localDestroy will throw an error on region creation. Use the destroy action instead.
|
Do not use localDestroy; use destroy instead. This expires all copies of the entry.
|
| 04/16/09
|
#40603
|
suspect strings: ClassCastException thrown from EvictionAttributesImpl.fromData() => ObjectInputStream.defaultReadFields()
|
6.0
|
closed
|
ClassCastException thrown from EvictionAttributesImpl
|
A ClassCastException is thrown from EvictionAttributesImpl. This is a JDK issue reported against 6.0. It has been addressed in 6.5 by moving to a later JDK version.
|
|
| 04/08/09
|
#40551
|
RegionMembershipListener.initialMembers is not invoked when added using AttributesMutator
|
|
closed
|
A RegionMembershipListener added after a Region is created does not have its initialMembers() method invoked
|
If you add a RegionMembershipListener cache listener to a Region after the Region has been created, the listener will never have its initialMembers() method invoked. Only listeners added through cache.xml or through RegionAttributes at the time the Region is created will have their initialMembers() method invoked.
|
|
| 04/07/09
|
#40545
|
gii receives no response from source vm
|
6.0
|
closed
|
hang creating region with disable-tcp set to true
|
A bug in the startup code in the fragmentation protocol used for UDP messaging was found to cause a hang in region creation when the distributed system property disable-tcp is set to true. The hang is caused by a race condition that causes the member that is creating the region to ignore a message from a member that has been selected to send the contents of the region.
|
|
| 04/06/09
|
#40523
|
InternalGemFireException: While calling refresh() causedBy: javax.management.InstanceNotFoundException
|
6.0
|
closed
|
InternalGemFireException received when invoking SystemMemberCache.getRegion(..) JMX API on the AdminAgent on IBM J9 JVM
|
This is caused by a known issue in the IBM JVM. It may not occur consistently. The solution is to turn off JIT compilation for RegionStatisticsResponse.create().
|
Turn off JIT compilation for com.gemstone.gemfire.internal.admin.remote.RegionStatisticsResponse.create()
|
| 04/03/09
|
#40509
|
There is no error given when we try starting the agent specifying an incorrect path for its property-file.
|
6.0
|
closed
|
Admin agent would silently apply default properties if it could not find its properties file.
|
Admin agent used to silently apply default properties if it could not find its properties file. Now the agent adds a log entry when it applies default values for its configuration properties. The logged string is: "Using default configuration because property file was not found".
|
|
| 04/01/09
|
#40500
|
JMX Agent startup fails with ipv6 enabled
|
6.0
|
closed
|
JMX agent fails to start when using IPv6
|
This problem occurs when using the default rmi-bind-address, "localhost", and IPv6 on a machine where a call to java.net.InetAddress.getLocalHost() returns an IPv6 link-local address. This is primarily a Windows issue because the IPv6 implementation requires a link-local address to also be created when configuring a machine to support IPv6, and the order in which these are created varies from machine to machine.
This error will manifest as an AgentImpl$StartupException.
{{{
A quick synopsis of the stack is provided below:
com.gemstone.gemfire.admin.jmx.internal.AgentImpl$StartupException: Failed to start RMI service
at com.gemstone.gemfire.admin.jmx.internal.AgentImpl.startRMIConnectorServer(AgentImpl.java:1141)
at com.gemstone.gemfire.admin.jmx.internal.AgentImpl.start(AgentImpl.java:263)
at hydra.AgentHelper.startAgent(AgentHelper.java:129)
at admin.AdminTest.startAgentTask(AdminTest.java:120)
...
Caused by: java.io.IOException: Cannot bind to URL [rmi://:26120/jmxconnector]: javax.naming.NoPermissionException [Root exception is java.rmi.ServerException: RemoteException occurred in server thread; nested exception is:
java.rmi.AccessException: Registry.Registry.bind disallowed; origin /fe80:0:0:0:21a:a0ff:fe27:ddbe is non-local host]
...
Caused by: javax.naming.NoPermissionException [Root exception is java.rmi.ServerException: RemoteException occurred in server thread; nested exception is:
...
Caused by: java.rmi.ServerException: RemoteException occurred in server thread; nested exception is:
java.rmi.AccessException: Registry.Registry.bind disallowed; origin /fe80:0:0:0:21a:a0ff:fe27:ddbe is non-local host
...
Caused by: java.rmi.AccessException: Registry.Registry.bind disallowed; origin /fe80:0:0:0:21a:a0ff:fe27:ddbe is non-local host
}}}
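Whether a machine is affected can be checked with the JDK before starting the agent. The sketch below relies on java.net.InetAddress.isLinkLocalAddress(); the class and method names are hypothetical, and the sample address is the one from the stack trace above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Checks whether an address, or the local host's address, is link-local. */
final class LinkLocalCheck {

    /** Returns true if the given literal address or host name resolves to a link-local address. */
    static boolean isLinkLocal(String literalOrHost) {
        try {
            return InetAddress.getByName(literalOrHost).isLinkLocalAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve " + literalOrHost, e);
        }
    }

    /** Returns true if InetAddress.getLocalHost() yields a link-local address on this machine. */
    static boolean localHostIsLinkLocal() {
        try {
            return InetAddress.getLocalHost().isLinkLocalAddress();
        } catch (UnknownHostException e) {
            throw new IllegalStateException("Cannot resolve local host", e);
        }
    }
}
```

If localHostIsLinkLocal() returns true, the default rmi-bind-address is likely to hit the failure shown above, and one of the workarounds is needed.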
|
Specify an RMI bind address using the rmi-bind-address property:
./agent start rmi-bind-address=<ipv6 address>
or in a gemfire.properties file
rmi-bind-address=<ipv6 address>
Second workaround:
Edit the Windows hosts file, usually located at c:\WINDOWS\system32\drivers\etc\hosts, to map a literal address to the hostname. Note that entries are required for both IPv4 and IPv6 on machines that support both protocols (even for non-GemFire use).
Create two entries:
[ipv4 literal] [fully qualified host] [optional short hostname]
[ipv6 literal] [fully qualified host] [optional short hostname]
Example:
15.168.12.81 mymachine.gemstone.com mymachine
fdf0:7c6f:eda8:9449::19 mymachine.gemstone.com mymachine
|
| 03/30/09
|
#40475
|
Distribution Locator Properties section in GFE SysAdminGuide might be confusing
|
6.0
|
closed
|
Sys Admin Guide has incorrect Distribution Locator syntax
|
System Administrator’s Guide -> chapter 8 -> section 'Distribution Locator Properties': The table of properties and the example below it incorrectly state the properties required to use locators.
The locators property should be configured as:
locators=host1[port1],host2[port2]
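The host[port] form can be checked mechanically. The parser below is a hypothetical helper, not a GemFire API; it accepts only the bracketed syntax shown above and rejects other forms such as host:port. The host names and port numbers in the usage note are example values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical validator for a locators value of the form host1[port1],host2[port2]. */
final class LocatorList {

    // A host (no brackets or commas) followed by a numeric port in square brackets.
    private static final Pattern ENTRY = Pattern.compile("([^\\[\\],]+)\\[(\\d+)\\]");

    /** Returns each entry as "host:port", or throws if an entry is not in host[port] form. */
    static List<String> parse(String locators) {
        List<String> result = new ArrayList<>();
        for (String entry : locators.split(",")) {
            Matcher m = ENTRY.matcher(entry.trim());
            if (!m.matches()) {
                throw new IllegalArgumentException("Expected host[port], got: " + entry);
            }
            result.add(m.group(1) + ":" + Integer.parseInt(m.group(2)));
        }
        return result;
    }
}
```

For example, LocatorList.parse("host1[10334],host2[10335]") yields the two entries host1:10334 and host2:10335, while a value written as host1:10334 is rejected.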
|
|
| 03/27/09
|
#40472
|
java.net.SocketException: Address family not supported by protocol family: bind encountered while starting bridge server
|
6.0
|
closed
|
java.net.SocketException: Address family not supported by protocol family: bind encountered while starting bridge server
|
When starting a GemFire cache server under Microsoft Windows, GemFire throws an exception when it tries to bind a server socket to an IPv6 address.
{{{
java.net.SocketException: Address family not supported by protocol family: bind
at sun.nio.ch.Net.bind(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at com.gemstone.gemfire.internal.cache.tier.sockets.AcceptorImpl.<init>(AcceptorImpl.java:336)
at com.gemstone.gemfire.internal.cache.BridgeServerImpl.start(BridgeServerImpl.java:276)
}}}
This is caused by a JVM bug, #6230761, that causes Java "New I/O" sockets to not work with IPv6 on Microsoft Windows machines.
GemFire 6.0 detects this condition and automatically sets max-threads to zero after issuing this warning:
{{{
Ignoring max-threads setting and using zero instead due to Java bug 6230761: NIO does not work with IPv6 on Windows. See GemFire bug #40472
}}}
|
To work around this problem, disable the thread pool in the GemFire server by setting max-threads to zero.
|
| 03/25/09
|
#40461
|
Suspect string DiskAccessException caused by ArrayIndexOutOfBoundsException
|
6.0
|
closed
|
com.gemstone.gemfire.cache.DiskAccessException thrown when using persistent regions
|
Previous versions of GemFire (6.0 and earlier) occasionally threw an ArrayIndexOutOfBoundsException wrapped as a DiskAccessException. This came from the JDBM code that was used in conjunction with transaction logging in the persistence layer.
The use of JDBM has been completely removed in 6.5.
|
|
| 03/23/09
|
#40442
|
PartitionedRegion ops hang in waitForPrimary member after NPE thrown from BucketAdvisor.sendProfileUpdate()
|
6.0
|
closed
|
NullPointerException from Thread.holdsLock with JRockit
|
With the JRockit VM, we have on rare occasions seen NullPointerExceptions from the java.lang.Thread.holdsLock method.
|
|
| 03/12/09
|
#40390
|
Stats sampling should occur implicitly when a JMX client connects to the AdminAgent
|
5.7
|
closed
|
On start up, JMX Admin Agent now immediately connects to the GemFire Distributed System and initializes Member & Statistics MBeans
|
The default value of the JMX Admin Agent boolean property 'auto-connect' has changed to 'true'. Hence, on start up, the JMX Admin Agent now immediately connects to the GemFire Distributed System and initializes MBeans for existing GemFire members. In addition, when a Member MBean is initialized, its associated Statistics MBeans are also initialized.
|
|
| 03/12/09
|
#40389
|
PR eviction to disk degrades with number of buckets
|
|
closed
|
6.5 oplog new design
|
The oplog design in 6.5 resolved this issue. All buckets now share the same oplog file.
|
|
| 03/10/09
|
#40369
|
Hang while creating region during StateFlushOperation.flush
|
6.0
|
closed
|
hang creating a region with scope distributed-no-ack and using disable-tcp=true
|
It is possible for GemFire to hang when attempting to create a Region if the distributed system property "disable-tcp" is set to true and the distribution scope of the region is "distributed-no-ack".
|
|
| 03/08/09
|
#40360
|
FileNotFoundException is logged for /tmp/agent.ser while running the Agent test.
|
6.0
|
closed
|
Failure to persist updated agent configuration causes FileNotFoundException
|
Failure to persist agent configuration information causes the following warning to be logged without terminating the agent: "Encountered a java.io.FileNotFoundException while saving StatAlertDefinitions." All changes to the configuration are lost. An attribute 'canPersistStatAlertDefs' for AdminDistributedSystem MBean indicates whether the information could be persisted or not.
|
Verify that the current working directory (or the directory given with the -dir option) has full write permissions for the user launching the agent. The boolean attribute 'canPersistStatAlertDefs' of the AdminDistributedSystem MBean indicates whether the working directory has full write permissions for the user launching the agent.
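The directory check can be scripted with java.nio.file before the agent is launched. This is a minimal sketch with a hypothetical class name; it tests only that the path is an existing directory the current user can write to, which may be narrower than "full write permissions" on some file systems.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-flight check for the agent's working directory. */
final class AgentDirCheck {

    /** Returns true if the path is an existing directory that the current user can write to. */
    static boolean canWriteTo(String dir) {
        Path path = Paths.get(dir);
        return Files.isDirectory(path) && Files.isWritable(path);
    }
}
```

Calling AgentDirCheck.canWriteTo(System.getProperty("user.dir")) as the user who will launch the agent checks the current working directory.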
|
| 03/04/09
|
#40350
|
SIGSEGV in CacheClientProxy with SUN JRE 1.6.0_10
|
|
closed
|
SIGSEGV in CacheClientProxy with SUN JRE 1.6.0_10
|
SIGSEGV in CacheClientProxy with SUN JRE 1.6.0_10.
This was observed with 6.0. In 6.5, a later version of the JDK is used.
|
|
| 02/26/09
|
#40324
|
NullPointerException in CacheClientProxy.processMessage
|
|
closed
|
NullPointerException in cache server during spike in data operations
|
In very rare instances, a cache server would encounter a NullPointerException due to a race.
|
|
| 02/16/09
|
#40250
|
If the roller is active at the time of region.close, it can end up writing a dummy byte and thus lose the original value
|
|
closed
|
Closing a persistent region results in a missing value
|
In rare cases, closing a persistent region can lead to a single value in the persistent data being lost.
|
|
| 02/15/09
|
#40243
|
Test fails with Timeout during netsearch/netload/netwrite (IllegalMonitorStateException during pushing message )
|
6.0
|
closed
|
IllegalMonitorStateException with JDK 1.6
|
If you encounter IllegalMonitorStateExceptions while using GemFire with Sun's implementation of JDK 1.6, we advise using the VM option
{{{
-XX:+UseHeavyMonitors
}}}
|
|
| 02/09/09
|
#40198
|
BridgeServer with SELECTOR enabled shutdown timeout
|
6.0
|
closed
|
GemFire hangs during attempt to close the cache
|
Running on Microsoft Windows with the JRockit JVM, we have seen GemFire hang when an attempt is made to close the cache in a server VM. The hung thread will have a stack similar to this:
{{{
-- Blocked trying to get lock: java/lang/Object@0x048F4128[thin lock]
at jrockit/vm/Threads.sleep(I)V(Native Method)
at jrockit/vm/Locks.waitForThinRelease(Locks.java:1209)[optimized]
at jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1342)[optimized]
at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1259)[optimized]
at jrockit/vm/Locks.monitorEnter(Locks.java:2439)[optimized]
at sun/nio/ch/WindowsSelectorImpl.wakeup(WindowsSelectorImpl.java:75)
at com/gemstone/gemfire/internal/cache/tier/sockets/AcceptorImpl.close(AcceptorImpl.java:1548)
^-- Holding lock: java/lang/Object@0x048F3EB8[thin lock]
at com/gemstone/gemfire/internal/cache/BridgeServerImpl.stop(BridgeServerImpl.java:351)
^-- Holding lock: com/gemstone/gemfire/internal/cache/BridgeServerImpl@0x04F329A8[thin lock]
at com/gemstone/gemfire/internal/cache/GemFireCache.stopServers(GemFireCache.java:1118)
^-- Holding lock: java/lang/Object@0x04981AD0[thin lock]
at com/gemstone/gemfire/internal/cache/GemFireCache.close(GemFireCache.java:913)
^-- Holding lock: java/lang/Class@0x0436A180[recursive]
at com/gemstone/gemfire/internal/cache/GemFireCache.close(GemFireCache.java:793)
}}}
This is due to a flaw in JRockit's implementation of NIO socket selectors. GemFire v6.0 detects the use of JRockit on Windows and disables the use of NIO socket selectors after issuing this warning:
Ignoring max-threads setting and using zero instead due to JRockit NIO bugs. See GemFire bug #40198
|
|
| 02/09/09
|
#40196
|
executeCqOnRedundantsAndPrimary throws CQException "Failed to execute the CQ ... Error from last server: Primary discovery failed"
|
6.0
|
closed
|
Error while executing CQ.
|
This was happening due to multiple threads accessing the same CQ. This is fixed in GemFire 6.0.
|
|
| 02/04/09
|
#40159
|
Hang in MapInterfaceTest.testBlockGlobalScopeInSingleVM
|
6.0
|
closed
|
Distributed lock requests fail to timeout
|
Lock requests may fail to timeout under certain conditions. A thread requesting a distributed lock may continue waiting beyond the configured lock-timeout or specified waitTimeMillis.
This should be a temporary condition and the thread will eventually either acquire the lock after waiting longer than it should or it will timeout later than it should.
The most likely condition leading to this is lock requests, or Global Region puts, initiated while locking is suspended or while the Global Region is initializing (get initial image) in any member of the distributed system.
|
|
| 02/02/09
|
#40147
|
Serialization types should be registerable via cache.xml declaration
|
5.7
|
closed
|
DataSerializable types have to be programmatically registered with the GemFire server cluster
|
In prior versions of GemFire, users were required to register types programmatically by defining a static initializer block on each VM that supplied the type of the class being registered.
Starting with GemFire 6.0, types can be defined declaratively in the cache.xml file using the following syntax.
{{{
<serialization-registration>
<serializer>
<class-name>com.gemstone.util.MySerializer</class-name>
</serializer>
<instantiator id="101">
<class-name>com.gemstone.util.DateTest</class-name>
</instantiator>
<instantiator id="102">
<class-name>com.gemstone.util.IndexMap</class-name>
</instantiator>
</serialization-registration>
}}}
|
|
| 01/28/09
|
#40129
|
lastModifiedTime from an empty region is 0
|
5.7
|
closed
|
Expiration is broken when actions originate on a region with DataPolicy.EMPTY
|
Prior to 6.0, the lastModifiedTime (used for calculating expiration time for an entry in a region) was being set to 0 if the entry was modified from a VM that had the region with a data policy set to DataPolicy.EMPTY, causing incorrect expiration behavior for the entry.
In 6.0, the lastModifiedTime is propagated from the accessing node and applied correctly across the system.
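A small arithmetic sketch shows why a zero timestamp matters. It assumes that an entry expires once the current time reaches lastModifiedTime plus the timeout; that formula, the class name, and the sample numbers are illustrative assumptions, not GemFire internals.

```java
/** Hypothetical expiration arithmetic: expiry time = last modified time + timeout. */
final class ExpirationMath {

    /** Returns true if an entry with the given last-modified time has reached its timeout. */
    static boolean isExpired(long lastModifiedMillis, long timeoutSeconds, long nowMillis) {
        return nowMillis >= lastModifiedMillis + timeoutSeconds * 1000L;
    }
}
```

Under that assumption, an entry whose lastModifiedTime is 0 appears to have been modified at the epoch, so any realistic timeout has already elapsed, while an entry stamped one second ago with a 60-second timeout is still live.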
|
|
| 01/24/09
|
#40109
|
OOME in parReg/parRegCreateDestroy
|
6.0
|
closed
|
Server could run out of memory during rebalancing
|
In prior versions of GemFire, creation and destruction of Partitioned Regions could eventually lead to the server running out of memory. This was most likely to occur during intensive rebalancing operations on the partitioned regions. This has been fixed in GemFire 6.1.
|
|
| 01/23/09
|
#40105
|
CacheClientProxy stats leak
|
5.7
|
closed
|
Garbage CacheClientProxy stats building up on the server
|
Killing a client does not clean up the CacheClientProxy stats for that client on the server side. Over time, these stat objects take up memory and CPU.
|
|
| 01/20/09
|
#40082
|
JMX Agent startup should ignore any gemfire.properties present in the path
|
6.0
|
closed
|
Conflicting properties in gemfire.properties and agent's properties file could prevent the admin agent from functioning properly
|
The agent now uses only the properties listed in its own properties file (default name: agent.properties or specified through property-file=<my agent's property filename>) and ignores the gemfire.properties file that may exist in either of:
(1) The current directory, or
(2) user home directory, or
(3) the class path.
|
|
| 01/19/09
|
#40078
|
Region.keySetOnServer() has unexpected behavior when server regions are DataPolicy.NORMAL/EMPTY mix
|
5.0
|
closed
|
Region.keySetOnServer and containsKeyOnServer can provide inconsistent results if server regions are not replicated or partitioned
|
Client calls to keySetOnServer and containsKeyOnServer can return incomplete or inconsistent results if your server regions are not configured as partitioned, replicated or empty. Normal and mixed (replicated, normal, empty) server region configurations give inconsistent results since they allow different data on different servers. There is no additional messaging on the servers, so no union of keys across servers or checking other servers for the key in question occurs.
|
|
| 01/16/09
|
#40057
|
Assertion error while creating bucket in region.(Test:parReg/event/concParRegEvent.conf)
|
6.0
|
closed
|
InternalGemFireError thrown when putting a value into a partitioned region
|
When calling Region.put(Object) on a Partitioned Region, it is possible that the region will throw an InternalGemFireError stating "Did not finish sending image, but region, cache, and DS are alive."
This is caused by a faulty termination check in one of GemFire's data replication algorithms.
|
|
| 01/07/09
|
#40011
|
EnforceUniqueHostStorageAllocation allows bucket copies on the same host
|
6.0
|
closed
|
Two copies of a bucket on the same host with EnforceUniqueHostStorageAllocation
|
There is a small window where setting the EnforceUniqueHostStorageAllocation flag fails to prevent two copies of a bucket from ending up on the same host. This can occur when a rebalance operation is performed simultaneously with the first update to the bucket.
|
|
| 12/18/08
|
#39943
|
New vm unable to contact locator
|
6.0
|
closed
|
GemFireConfigException states that no Locators could be contacted
|
A GemFireConfigException with the text
{{{
Unable to contact a Locator service. Operation either timed out or Locator does not exist. Configured list of locators is
}}}
(followed by a list of the configured locators) may be thrown even though the locators were up and reported that the VM had contacted them correctly.
The problem is caused by a race condition between two threads in JGroups startup code.
|
|
| 12/17/08
|
#39931
|
hang creating region when peer logs that "Peer has disappeared from view"
|
|
closed
|
Hang creating region when peer logs that "Peer has disappeared from view"
|
A VM logs that it did not receive all of the expected startup responses within 15 seconds, and then hangs trying to create a Region. Another VM logs that it failed to send a Startup response to the hung VM because it had "disappeared from view".
The hang is caused by a race condition in the other VM that causes it to incorrectly shun the new VM.
|
|
| 12/17/08
|
#39930
|
ConcurrentModificationException during shutdown
|
6.0
|
closed
|
ConcurrentModificationException thrown by DistributedSystem.disconnect()
|
Under rare circumstances, it is possible for DistributedSystem.disconnect() to throw a ConcurrentModificationException. The property disable-tcp must be set to true for this to happen, and another vm must be starting up concurrently.
|
|
| 12/15/08
|
#39925
|
primary balancing after VM recycled not yet implemented
|
6.0
|
closed
|
Primary buckets not balanced after recovery
|
If a member hosting a partitioned region crashes and is subsequently restarted, it will not receive any primary buckets. This can lead to an imbalance in load across the members.
|
|
| 11/26/08
|
#39859
|
Hang in JChannel.disconnect()
|
|
closed
|
Hang in DistributedSystem.disconnect() waiting for JGroups to disconnect
|
In very rare circumstances, the DistributedSystem.disconnect() method may hang trying to shut down the JGroups membership stack. This is due to a defect in the JGroups Promise class, and has been fixed in GemFire v6.0.
|
|
| 11/07/08
|
#39800
|
Query Authorization needs a (public) mechanism to modify SelectResults
|
5.7
|
closed
|
Query Authorization needs a (public) mechanism to modify SelectResults
|
There is no way for a user to modify the query results using public classes when "isModifiable" returns false. Currently there is no mechanism in GemFire that allows a post-operation security callback to modify the query result being sent to the client if the result is unmodifiable.
|
|
| 11/07/08
|
#39799
|
IllegalThreadStateException thrown by JGroups JChannel when network dropped during DistributedSystem.connect()
|
6.0
|
closed
|
IllegalThreadStateException thrown during DistributedSystem.connect()
|
When attempting to connect to GemFire with DistributedSystem.connect(), in rare circumstances the method may throw an IllegalThreadStateException. We have observed this happening when enable-network-partition-detection is enabled in the distributed system properties and a network partition occurs during the connection attempt.
|
|
| 11/03/08
|
#39772
|
Queues are filling up and not draining in WAN tests
|
|
closed
|
WAN Gateways May Not Initialize Correctly
|
There is a race condition when starting a gateway that may cause a primary gateway to never process any incoming events. This can be confirmed by identifying messages in the logs indicating that the gateway queues are not draining.
|
Stop and restart the gateway.
|
| 10/29/08
|
#39760
|
BucketAdvisor fails assertion in Loner because of DummyExecutor
|
5.7
|
closed
|
Partitioned Regions are not supported for loner members
|
A loner member (a GemFire connection defined by an mcast-port of zero and no locators) should not use Partitioned Regions. Use a Local Region instead.
Versions of GemFire prior to 6.0 may throw unexpected InternalGemFireErrors if attempting to use a Partitioned Region in a Loner, especially with redundancy > 0. GemFire 6.0 will allow this, but it's not a practical configuration except for testing purposes.
{{{
Assertion error creating bucket in region
com.gemstone.gemfire.InternalGemFireError: Attempting to sendProfileUpdate while synchronized may result in deadlock
at com.gemstone.gemfire.internal.Assert.throwError(Assert.java:75)
at com.gemstone.gemfire.internal.Assert.assertTrue(Assert.java:93)
at com.gemstone.gemfire.internal.cache.BucketAdvisor.sendProfileUpdate(BucketAdvisor.java:808)
at com.gemstone.gemfire.internal.cache.BucketAdvisor.acquiredPrimaryLock(BucketAdvisor.java:579)
at com.gemstone.gemfire.internal.cache.BucketAdvisor$VolunteeringDelegate.doVolunteerForPrimary(BucketAdvisor.java:1443)
at com.gemstone.gemfire.internal.cache.BucketAdvisor$5.run(BucketAdvisor.java:1398)
at com.gemstone.gemfire.internal.cache.BucketAdvisor$6.run(BucketAdvisor.java:1645)
at com.gemstone.gemfire.distributed.internal.LonerDistributionManager$DummyExecutor.execute(LonerDistributionManager.java:441)
at com.gemstone.gemfire.internal.cache.BucketAdvisor$VolunteeringDelegate.execute(BucketAdvisor.java:1600)
at com.gemstone.gemfire.internal.cache.BucketAdvisor$VolunteeringDelegate.volunteerForPrimary(BucketAdvisor.java:1396)
at com.gemstone.gemfire.internal.cache.BucketAdvisor.volunteerForPrimary(BucketAdvisor.java:541)
}}}
|
Loners should use Local Regions. Partitioned Regions should only be used by a distributed system of two or more members.
|
| 10/27/08
|
#39753
|
JVM version issue for AIX
|
5.7
|
closed
|
AIX JVM 1.6 version issue
|
The BlockingHARegionJUnitTest fails on this JVM for two reasons:
1) The JVM is very slow: 30 seconds is not enough to feed 20000 entries,
although it is enough on the 1.5 JVM and the new 1.6 JVM.
2) The total region size exceeds 20000. With the region capacity set to 10000,
the region should contain at most 20000 entries.
The 1.5 JVM and the new 1.6 JVM do not have these problems.
The root cause is a problem in the AIX JVM version that was used. It is:
java version "1.6.0-internal"
Java(TM) SE Runtime Environment (build pap3260-20070819_01)
IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 AIX ppc-32 jvmap3260-20070817_13537
(JIT enabled)
J9VM - 20070817_013537_bHdSMR
JIT - dev_20070817_1300
GC - 20070815_AA)
The old workable 1.5 JVM is:
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pap32dev-20071008 (SR6))
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 AIX ppc-32 j9vmap3223-20071007 (JIT
enabled)
J9VM - 20071004_14218_bHdSMR
JIT - 20070820_1846ifx1_r8
GC - 200708_10)
JCL - 20071008
The new workable 1.6 JVM is:
java version "1.6.0"
Java(TM) SE Runtime Environment (build pap3260sr2-20080818_01(SR2))
IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 AIX ppc-32 jvmap3260-20080816_22093
(JIT enabled, AOT enabled)
J9VM - 20080816_022093_bHdSMr
JIT - r9_20080721_1330ifx2
GC - 20080724_AA)
JCL - 20080808_02
|
|
| 10/22/08
|
#39738
|
load may be invoked more than once for a single get
|
|
closed
|
Load may be invoked more than once for a single get
|
In versions prior to 6.0, if the loader returned null, it would get invoked a second time.
Starting with 6.0, a return value of null is considered a successful invocation of the loader.
The public javadocs on load now state this:
{{{
@return the value supplied for this key, or null if no value can be
supplied. A local loader will always be invoked if one exists.
Otherwise one remote loader is invoked.
Returning <code>null</code> causes
{@link Region#get(Object, Object)} to return <code>null</code>.
}}}
|
|
| 10/17/08
|
#39724
|
unnecessary credential verification being performed every 10 seconds
|
5.5
|
closed
|
Unnecessary credential verification being performed every 10 seconds
|
GemFire periodically retransmits membership information to all members of the distributed system. There is a flaw in the product that currently causes re-verification of security credentials when this happens. The retransmission period is based on the member-timeout setting of the distributed system and is currently set at twice the member-timeout interval.
|
|
| 10/16/08
|
#39723
|
Test hangs in CqService.closeNonDurableClientCqs() during shutdown
|
6.0
|
closed
|
resolved JVM issue
|
Same bug as 40490, 39130, 40243. It's been identified as a JVM issue and has been fixed in 1.6.0_14 and later.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6699669
Confirmed with Dick: we have advised customers to use 1.6.0.17, so the problem will not be seen.
|
|
| 09/26/08
|
#39656
|
New member incorrectly shunned
|
5.7
|
closed
|
New member is incorrectly shunned by other members of the distributed system
|
When a new member starts up and attempts to connect to the distributed system, it may hang trying to create tcp/ip connections to existing members of the system. This can happen if the new member uses the same UDP membership port as a recently departed member on the same machine.
GemFire uses this UDP port and host address to identify members of the distributed system. When a member leaves the distributed system, it is shunned for a short period of time to prevent inappropriate communications from taking place. If this UDP port is reused, as can happen on some operating systems (Windows) more easily than others (*nix), the new member that is reusing the port will be incorrectly shunned by other members.
|
Restart the application
|
| 09/26/08
|
#39654
|
Client throws an exception if it encounters UNDEFINED in query results
|
5.7
|
closed
|
Client exception with UNDEFINED value in query results
|
This could happen when a compiled select encountered a null or undefined value. This is fixed in the GemFire 6.0 release.
|
|
| 09/22/08
|
#39632
|
losingSide VM does not process afterRegionDestroyed (FORCED_DISCONNECT) event and hangs in destroy operation after networkPartition
|
5.7
|
closed
|
Network partition with a gateway enabled can result in hang
|
If a network partition occurs in a site with a gateway, the gateway member may hang trying to process events.
|
|
| 09/19/08
|
#39624
|
bloom-vm failure with ServerConnectivityException: Pool unexpected socket timed out on client
|
5.7
|
closed
|
Unexpected socket timed out on client with 1.6.0_5, 1.6.0_7
|
Due to a bug in Java, using Sun JDK 1.6.0_5 or 1.6.0_7 and configuring the bridge server's max-threads setting to something other than 0 can result in the client seeing this error: "ServerConnectivityException: Pool unexpected socket timed out on client".
|
Set this system property to true to work around the issue, or upgrade to a later JDK.
-DCacheServer.NIO_SELECTOR_WORKAROUND=true
|
| 09/18/08
|
#39618
|
Updates can be lost with WAN Gateway failover in mlRioWithConflation
|
5.7
|
closed
|
Updates can be lost during WAN Gateway failover when conflation is enabled
|
With conflation enabled on a WAN gateway, if the primary gateway fails on the sending side, there is a small window where an event that occurs on the sending side can fail to be transmitted to the receiving side.
|
|
| 09/10/08
|
#39582
|
Need API for localPut on client
|
|
closed
|
Client side localPut API support.
|
After further discussions, and given that we plan to simplify our region interfaces in the future to allow client only operations using the same API set that we have today, we decided to shelve this feature request.
|
|
| 09/09/08
|
#39578
|
getInitialImage misses a concurrent operation
|
5.7
|
closed
|
New replicate region inconsistent with other replicates when transactions are being performed
|
When transactions are being performed on a replicate region and another cache creates a new replicate of the region, the new replicate may miss operations performed in the transaction.
|
There is no workaround. This bug is fixed in GemFire v6.0
|
| 07/29/08
|
#39338
|
getInitialImage test fails when multiple VMs miss a create event
|
5.7
|
closed
|
Multicast may deliver no-ack events out of order
|
When using multicast for message distribution with Regions having distributed-no-ack scope, operations may be applied out of order in other VMs. This is caused by a race condition between the multicast and unicast reader threads when multicast retransmissions are performed.
|
Use distributed-ack scope, or do not use multicast for distribution.
|
| 07/28/08
|
#39329
|
Java-level deadlock in InternalDistributedSystem.disconnect
|
5.7
|
closed
|
Java-level deadlock in InternalDistributedSystem.disconnect
|
Although rare, it is possible to encounter a Java-level deadlock while calling DistributedSystem.disconnect().
|
|
| 07/26/08
|
#39323
|
JMX agent command line doesn't start agent
|
|
closed
|
JMX agent command line fails silently
|
The JMX agent launcher does not correctly detect and report problems
in starting the agent. For instance, if one of the TCP/IP ports
is in use by another process, the agent will not start the service
on that port but will launch without reporting any problems.
|
Examine the agent.log file to see if there were any problems in launching the agent.
|
| 07/23/08
|
#39310
|
NPE thrown from IndexCreationMessage.operateOnPartitionedRegion
|
5.7
|
closed
|
NullPointerException may occur when creating an index on a partitioned region
|
A NullPointerException may sometimes occur when an index is being created on a partitioned region and a separate thread is removing the same index at approximately the same time. If this occurs, the NullPointerException can be safely ignored.
|
|
| 07/23/08
|
#39308
|
Reinitializing vms get tangled up trying to create indexes
|
5.7
|
closed
|
Hang with index creation on a partitioned region
|
Index creation could cause a deadlock between threads in two different VMs of a distributed system that host the same partitioned region, because the synchronization code locks the same object while processing the request and the response exchanged between the VMs.
|
|
| 07/22/08
|
#39298
|
async writer thread will cause puts to lock forever if it exits
|
|
closed
|
Puts could hang when using asynchronous disk persistence
|
In versions prior to 6.0, puts would hang if the disk persistence mechanism encountered an I/O error that caused the disk writer thread to exit prematurely.
This has been addressed in 6.0 and the disk writer thread does not exit prematurely on encountering any errors. It logs an exception which causes the region to be closed allowing other threads and members to continue.
|
|
| 07/01/08
|
#39174
|
Two VMs using different mcast addresses still discover each other
|
5.7
|
closed
|
Distributed systems with different multicast addresses find each other and join the same group
|
Due to the way Linux interprets RFC 1112, multicast sockets using the same port will receive datagrams from each other even if using different multicast addresses.
{{{
See these links for more information:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=231899
http://bugs.sun.com/bugdatabase/view_bug.do;:YfiG?bug_id=4701650
http://www.uwsg.iu.edu/hypermail/linux/net/0211.1/0003.html
}}}
|
Make sure to select different multicast ports for different distributed systems to keep them isolated from one another.
|
| 06/25/08
|
#39144
|
The OQL TO_DATE function does not support minutes properly
|
5.5
|
closed
|
OQL TO_DATE function incorrectly processed the MM formatting token
|
In versions prior to 6.0, the OQL engine does not distinguish between the formatting strings for month and minutes (MM and mm respectively).
In 6.0, this has been addressed.
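The month/minutes distinction can be illustrated with java.text.SimpleDateFormat, whose pattern letters use the same MM and mm convention. This is a minimal sketch only: it assumes the OQL formatting tokens follow SimpleDateFormat semantics, and the class and method names (MonthVersusMinutes, monthAndMinute) are made up for the illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Illustrative helper (not part of GemFire): shows how "MM" and "mm" differ.
final class MonthVersusMinutes {
    private MonthVersusMinutes() {}

    // Parses text with the given pattern and returns {month (1-12), minute}.
    static int[] monthAndMinute(String pattern, String text) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(utc);
        Calendar calendar = new GregorianCalendar(utc);
        try {
            calendar.setTime(format.parse(text));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + text + "'", e);
        }
        // Calendar.MONTH is zero-based, so add 1 for a human-readable month.
        return new int[] { calendar.get(Calendar.MONTH) + 1, calendar.get(Calendar.MINUTE) };
    }

    public static void main(String[] args) {
        // "MM" treats 07 as the month; the minute field is left unset.
        int[] asMonth = monthAndMinute("yyyy-MM-dd", "2008-07-15");
        // "mm" treats 07 as the minutes; the month field is left unset.
        int[] asMinute = monthAndMinute("yyyy-mm-dd", "2008-07-15");
        System.out.println("MM: month=" + asMonth[0] + " minute=" + asMonth[1]);
        System.out.println("mm: month=" + asMinute[0] + " minute=" + asMinute[1]);
    }
}
```

An engine that treats the two tokens as the same thing silently puts the value in the wrong field, which is why queries that filter on minutes were affected.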
|
|
| 05/29/08
|
#39011
|
Redundant buckets should always be on different host when possible
|
5.5
|
closed
|
Redundant copies of data should always be on different hosts when possible
|
GemFire tries to locate redundant copies of data on different physical hosts to protect the system from process failure as well as machine failure. In situations where multiple hosts are not available, redundant copies may be colocated on the same machine, protecting the system against process failure but not machine failure.
|
|
| 05/23/08
|
#38991
|
Instantiators are not sent from server to client when client connects.
|
5.5
|
closed
|
Clients did not receive instantiators already registered on the server
|
Instantiators enable optimization of the deserialization of DataSerializable types. In prior versions of GemFire, a client connecting to a server may not always receive the instantiators already registered on the server.
In GemFire 6.1, these registered instantiators are sent by the server to the client during the connection setup.
|
|
| 04/25/08
|
#38843
|
Conflicting transaction can proceed if both the transaction manager and grantor crash
|
5.5.1
|
closed
|
Conflicting transaction can proceed if the transaction manager crashes while distributing the commit
|
If the transaction manager (the member performing the transaction)
crashes while transaction participants are in the process of applying
the commit, then it's possible for a new transaction to begin and commit
with key conflicts that are not detected.
|
There is a workaround for application members. This bug can be prevented by adding a method call in members with regions that are involved in transactions. After creating the GemFire Cache, make this call:
com.gemstone.gemfire.internal.cache.locks.TXLockService.createDTLS();
This only needs to be done once for any Cache instance.
|
| 04/15/08
|
#38782
|
Installer throws FileNotFoundException when it is run from a directory with spaces
|
5.5.0
|
closed
|
GemFire Installation fails with java.util.zip.ZipException when run from a directory with spaces in it
|
The GemFire installer does not correctly handle spaces in the name of the directory which contains the installer itself. Note this is literally the directory that the installer is in, not the directory the user selected as the destination.
The failure comes with a stack trace like this:
The system cannot find the path specified
Exception in thread "main" java.util.zip.ZipException: The system cannot find the path specified
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:203)
at java.util.zip.ZipFile.<init>(ZipFile.java:234)
at ZipSelfExtractor.extract(ZipSelfExtractor.java:99)
at ZipSelfExtractor.main(ZipSelfExtractor.java:34)
|
Move the GemFire installer jar into a directory without spaces and rerun it.
|
| 04/11/08
|
#38776
|
thin clients get unexpected nulls from bridge server
|
5.5
|
closed
|
gets begin to return null for keys that are known to be in the cache
|
The symptom is that a client connected to a bridge server performs puts and gets, and after about 90 seconds all gets start returning null even though data for those keys has already been put into the cache. 90 seconds is roughly how long it takes for HotSpot to begin optimizing the ConnectionProxyImpl class in GemFire, at which point the problem manifests.
First verify that this isn't simply a case of your eviction policy causing your data to be evicted before you do a get.
The cause is a JVM optimization in Sun's 1.6.0_4 JRE and later versions.
To identify this bug:
Start your JVM with -Xint to force the VM to run in interpreted mode (no HotSpot compilation will occur). You should then be able to run a series of puts and gets for a sustained period (5-10 minutes ought to do it) without getting errant nulls back as values.
|
Use the 1.6.0_3 JRE or earlier; the optimization is not present in these JRE versions.
Or you can use a .hotspot file to prevent compilation of the problematic method. See Sun's documentation for more detailed information on using a file to control the HotSpot compilation.
Add this to your java command line: -XX:CompileCommandFile=someFile.txt
Then inside someFile.txt add this single line:
exclude com/gemstone/gemfire/internal/cache/tier/sockets/ConnectionImpl getObject
|
| 04/11/08
|
#38773
|
Missing CQ event (no HA)
|
|
closed
|
Missing CQ event during GII
|
This could happen when events are destroyed while secondary buckets are being created. The key may then be missing from the GII, and if the same secondary becomes primary and the event is re-routed, CQ processing does not find the value and fails. In 6.5, a change was made so that events are tracked and flushed during GII.
|
|
| 03/31/08
|
#38719
|
RegionMembershipListener doesn't work for PR
|
6.0
|
closed
|
Partitioned Regions do not fire RegionMembershipListener events
|
If a RegionMembershipListener is added to a Partitioned Region, the following methods do not fire for the listener:
initialMembers
afterRemoteRegionCreate
afterRemoteRegionDeparture
afterRemoteRegionCrash
|
|
| 03/27/08
|
#38709
|
Hitachi: HAClientQueue tries to participate in transaction, fails.
|
5.1
|
closed
|
NullPointerException received during transaction commit on servers
|
The configurations that could produce this exception are:
1) A client either a) registers interest in a region or b) creates a continuous query (aka CQ) with the region name in the query, both of which require the client property establishCallbackConnection=true
2) A server, to which the previously mentioned client is connected, performs an operation in a transaction that matches a) a region the client is interested in and b) matches the interest or CQ conditions the client has expressed.
3) The above transaction commits (versus rollback).
The transaction can be initiated as a JTA transaction or a GemFire transaction. If the above configuration is met, the thread committing the transaction will receive a NullPointerException with a stack similar to the following:
[severe 2008/03/25 16:39:49.471 PDS <Thread-4> nid=0x5f1ba8] CacheClientProxy[identity(client1(:loner):1:6364ecbb:ClientName1,connection=2); port=4623; primary=true]: Exception occurred while attempting to add message to queue
java.lang.NullPointerException
at com.gemstone.gemfire.internal.jta.TransactionImpl.registerSynchronization(TransactionImpl.java:197)
at com.gemstone.gemfire.internal.cache.LocalRegion.getJTAEnlistedTX(LocalRegion.java:5173)
at com.gemstone.gemfire.internal.cache.LocalRegion.put(LocalRegion.java:1098)
at com.gemstone.gemfire.internal.cache.AbstractRegion.put(AbstractRegion.java:188)
at com.gemstone.gemfire.internal.cache.ha.HARegionQueue.put(HARegionQueue.java:386)
at com.gemstone.gemfire.internal.cache.tier.sockets.CacheClientProxy$MessageDispatcher.enqueueMessage(CacheClientProxy.java:1724)
at com.gemstone.gemfire.internal.cache.tier.sockets.CacheClientProxy.processMessage(CacheClientProxy.java:674)
at com.gemstone.gemfire.internal.cache.tier.sockets.CacheClientNotifier.deliver(CacheClientNotifier.java:693)
at com.gemstone.gemfire.internal.cache.tier.sockets.CacheClientNotifier.notifyClients(CacheClientNotifier.java:376)
at com.gemstone.gemfire.internal.cache.BridgeServerImpl.notifyClients(BridgeServerImpl.java:257)
at com.gemstone.gemfire.internal.cache.LocalRegion.notifyBridgeClients(LocalRegion.java:3750)
at com.gemstone.gemfire.internal.cache.LocalRegion.invokePutCallbacks(LocalRegion.java:3716)
If this occurs, the transaction will have been partially applied to the local heap. It will not, however, have been distributed to other VMs that would have received the transaction updates.
The cause of this failure is the internal use of regions to deliver interest and continuous query data to the client, particularly in the face of server failures (aka highly available or HA).
|
Avoid transactions on a Bridge Server.
|
| 02/29/08
|
#38555
|
Many EOF errors in cache.tier.sockets.HandShake
|
5.5
|
closed
|
Servers and clients may report each other's failures incorrectly
|
If a server crashes unexpectedly, a client that it was connected to may report the failure in a number of misleading ways, including indicating a corrupted message stream from that process.
Likewise, if a client crashes unexpectedly, a server that it was connected to may report the failure in a number of misleading ways, including indicating a corrupted message stream from that process.
|
Ignore these messages in the log.
|
| 02/28/08
|
#38548
|
memberCrashed is invoked when a new endpoint is added to a BridgeWriter
|
|
closed
|
memberCrashed is invoked when a new endpoint is added to a BridgeWriter
|
When BridgeWriter.addEndPoint() is invoked to add a new endpoint to the bridge writer the memberCrashed method is invoked on the BridgeMembershipListener with the new endpoint even if the new endpoint is actually available. It should invoke the memberJoined method as soon as the endpoint is actually live.
|
|
| 01/31/08
|
#38344
|
jmx test failure: could not getDistributedSystem during initialize -- Caused by java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.NameNotFoundException: jmxconnector
|
5.1
|
closed
|
GemFire agent command fails with RMI Naming errors on Windows Server with IPv6 enabled
|
See Sun Microsystems' bug on this issue:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6301779
An excerpt from that URL:
"The problem is caused by the fact that Java does not handle IPv6 link-local addresses correctly. The reason this problem is only seen on amd64 is to do with the IPv6 default setup on Windows 2003 Server - it maps link-local addresses to interfaces so that a call to InetAddress.getAllByName() on W2003S will return link-local addresses. (No link-local addresses are returned on XP)."
In GemFire this problem manifests as either a SocketBindException or a
java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.NameNotFoundException: jmxconnector.
|
workaround:
Use JRE 1.6.0 or higher
or
Add a line like this to C:\WINDOWS\system32\drivers\etc\hosts
fdf0:76cf::affd:9449:18 yourname.gemstone.com yourname
Where "fdf0:76cf::affd:9449:18" is a global IPv6 address for the machine named "yourname"
You only need to add this to the hosts file on the machine "yourname". You do not have to add an entry for "yourname" to each machine on your network.
|
| 01/30/08
|
#38330
|
PartitionedRegion tests can hang with threads in gemfire/internal/util/IdentityHash.index() [ IBM VM ]
|
5.1.0.4
|
closed
|
Entry operations on PartitionedRegions can hang in IdentityHash.index() during DataSerializer.write() with IBM 1.5 VM
|
With the IBM 1.5.0 VM, entry operations on PartitionedRegions can hang in IdentityHash.index() during DataSerializer.write().
This is extremely rare and is a suspected JIT issue with the IBM VM.
|
|
| 01/28/08
|
#38294
|
Vestigial instances of Timer prevent WAR undeploy
|
5.0
|
closed
|
gemfire.jar does not correctly undeploy from an EJB server
|
Once an EJB application server is connected to a distributed system, it may not be able to correctly undeploy gemfire.jar.
|
If possible, try to configure your application server so that it does not attempt to undeploy the GemFire application.
|
| 01/20/08
|
#38235
|
Issue in CountDownLatch.await while creating disk region in diskRegionRecoveryAfterVmCrash.conf test
|
|
closed
|
Cache member hangs while creating a region
|
Under certain high availability conditions, a cache member may hang while attempting to recover a region from another member that has crashed.
In order for this to happen, a cache member needs to be creating a local copy of a region at the same time that another cache member crashes.
|
Kill and restart the hung cache member.
|
| 01/14/08
|
#38193
|
user defined DataSerializer instances need client server support
|
5.0, 5.1
|
closed
|
Newly registered DataSerializer not recognized on cache server and clients
|
Registration of a DataSerializer on a node with GemFire's data serialization framework was only propagated to other peer servers. It did not get propagated to clients. If the registration was done on a client, it was not sent to the cache servers. Registrations are now propagated to all cache servers and clients.
|
|
| 01/11/08
|
#38188
|
Transactions encapsulating multiple regions fail LRU eviction on recipient members
|
5.1
|
closed
|
Transactions that include multiple regions cause LRU problems in remote caches
|
This problem occurs when a GemFire transaction includes many Regions, like this:
CacheTransactionManager txmgr = cache.getCacheTransactionManager();
txmgr.begin();
region1.put("a", "one");
region2.put("b", "two");
region3.put("c", "three");
txmgr.commit();
and two or more of the regions have LRU eviction configured in VMs that are remote to the VM where the transaction originates. In this scenario, the LRU mechanism in the remote VMs does not consistently evict the proper number of entries. The problem does not affect eviction in the VM where the transaction originates.
|
Only include a single region in a transaction or only have one region be configured with LRU behavior.
|
| 01/10/08
|
#38180
|
DLockTokens objects are not removed when the lock is released
|
5.1
|
closed
|
DistributedLockService does not remove resources for tracking locks
|
The DistributedLockService does not free up resources related to tracking locks. This also affects Global Regions, Partitioned Regions, and Gateway Hubs.
Calls to DistributedLockService.freeResources(Object) do nothing, thus introducing a memory leak for each distributed lock that is acquired.
|
The only workaround is to destroy the DistributedLockService. Destroying the DistributedLockService frees up all memory used to track locks.
DistributedLockServices that are explicitly created and used must be destroyed to free up resources for all locks.
For Global Regions, the Global region itself must be locally destroyed to free up all locking resources created for each key.
For Partitioned Regions, the Cache must be closed to free up locking resources.
For Gateway Hubs, the DistributedSystem must be disconnected to free up locking resources.
|
| 01/07/08
|
#38152
|
AssertionError: InitialImageOperation$RequestImageMessage ... Did not finish sending message, but didn't throw RegionDestroyed or CacheClosedException
|
|
closed
|
Failed initial image creation may throw AssertionError
|
If you close your cache while initializing the data in a distributed region, you may end up with a faulty AssertionError in the system logs.
|
Ignore this assertion error. It is harmless.
|
| 11/25/07
|
#38013
|
PR regions do deserialization on remote bucket during get causing NoClassDefFoundError
|
5.1
|
closed
|
Partitioned region puts throw NoClassDefFoundError on remote partitioned region members if the value class is not on the classpath
|
A partitioned region put will fail with NoClassDefFoundError if the value Object's class is not on the classpath of every member that configures data storage for that partitioned region. The only members that should require the class are those that need the value in object form (for example the member that actually does a get to read the value or the member with a CacheListener that calls getNewValue).
|
Add the value Object's class to all members that define the partitioned region.
|
| 11/21/07
|
#38011
|
memory leak when conserve-sockets false
|
5.0, 5.1
|
closed
|
conserve-sockets=false may run out of sockets
|
It is possible to see a member run out of sockets when using conserve-sockets=false. This can be caused by threads that own their own sockets having a short lifetime and new threads being created quickly that also own their own sockets.
|
Call DistributedSystem.releaseThreadsSockets before a thread's life comes to an end. This can be done from a finally block on the thread's run method.
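The finally-block pattern described in this workaround might look like the following sketch. To keep it self-contained, the release call is passed in as a Runnable; SocketReleasingTask is a hypothetical wrapper, and in a GemFire application the hook would be the DistributedSystem.releaseThreadsSockets call named above.

```java
// Hypothetical wrapper (not part of GemFire): guarantees a cleanup hook runs
// when the thread's work ends, whether it completes normally or throws.
final class SocketReleasingTask implements Runnable {
    private final Runnable work;
    private final Runnable releaseSockets;

    SocketReleasingTask(Runnable work, Runnable releaseSockets) {
        this.work = work;
        this.releaseSockets = releaseSockets;
    }

    @Override
    public void run() {
        try {
            work.run();
        } finally {
            // Assumption: in a GemFire member this hook would invoke
            // DistributedSystem.releaseThreadsSockets for the current thread.
            releaseSockets.run();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> System.out.println("doing cache operations");
        Runnable release = () -> System.out.println("releasing this thread's sockets");
        Thread shortLived = new Thread(new SocketReleasingTask(work, release));
        shortLived.start();
        shortLived.join();
    }
}
```

Because the hook runs in the finally block of the thread's own run method, it executes on the thread that owns the sockets, even when the work ends with an exception.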
|
| 11/01/07
|
#37942
|
OutOfMemoryError Causes Distributed System Failure
|
5.1
|
closed
|
Improper handling of instances of Java VirtualMachineError
|
When a Java virtual machine sends an instance of VirtualMachineError to a thread, it has indicated that it has broken the fundamental programming contract and can no longer be trusted.
The most common instance of this is OutOfMemoryError, which will be sent to one Thread somewhere in the JVM. All other Threads are effectively suspended at their next attempt to allocate memory until either a) enough memory becomes available, or b) the original thread that was signaled disappears.
In prior versions, GemFire did not properly handle VirtualMachineErrors. This improper handling manifested in numerous bugs in the system.
GemFire now has a cooperative mechanism by which a cache member can reliably recuse itself from the distributed system when a VirtualMachineError occurs. Notice, however, that in order for this to be reliable, your applications must also correctly trap and signal VirtualMachineError when they are thrown.
See the Javadocs for SystemFailure for details on this new API.
|
|
| 10/25/07
|
#37905
|
data-policy="partition" is insufficient, <partition-attributes/> is required to create PR
|
5.1
|
closed
|
Partition region creation requires a partition-attributes element or a PartitionAttributes setting in the API
|
Setting the region data-policy to 'PARTITION' should cause a region to be created as a partitioned region, but it doesn't. The data-policy setting is accurately reported, but this setting does not cause the region to partition its data.
For the region to be created as a partitioned region, the region attributes must have a partition-attributes element in the cache.xml or a PartitionAttributes setting through the API. You do not need to set any non-default partition attribute values; the presence of the partition attributes is enough.
|
In the xml, add a partition-attributes element to the definition of the region, even if the element is empty.
In the API set the partition attributes through the region AttributesFactory setPartitionAttributes method, even if you just pass it a default PartitionAttributes instance.
|
| 10/08/07
|
#37821
|
Hang in shutdown while deleting file
|
6.0
|
closed
|
Hang while deleting the oplog during shutdown
|
While shutting down, the system can hang while deleting the oplog. During the hang, CPU usage appears to be at 100%. The stack trace suggests a JVM-related issue. This part of the code has been removed in the 6.5 code base.
|
|
| 10/05/07
|
#37819
|
throughput decreases as number of buckets increases
|
GFD
|
closed
|
Partitioned Region read and write throughput decreases as the number of buckets increases
|
For a given partitioned region, the larger the value for the totalNumBuckets attribute (setTotalNumBuckets), the smaller the throughput for create and get operations. During testing with 100 VMs participating in the partitioned region, 50 which store data, 50 which do not (setLocalMaxMemory to 0), the most dramatic change occurred when the totalNumBuckets attribute exceeded 499 buckets.
|
Use fewer than 499 buckets; however, only testing will truly indicate the proper values.
|
| 10/03/07
|
#37803
|
Installation Paths with spaces will cause the Native Client msi to error on some systems
|
5.1 GA
|
closed
|
Installation paths with spaces prevent the Native Client from installing correctly
|
While installing the Native Client on Microsoft Windows, if a path is specified that contains spaces, for example "C:\Program Files\GemStone\GemFire", the msi installer that is invoked from setupWin32_gf51.exe will fail causing a dialog box that details the msi command line syntax to appear. After dismissing this dialog the installation will continue and appear to have succeeded.
The GemFire installation itself is OK, but the Native Client installation is not: only the native_client.msi and a few html files are installed for the Native Client.
|
Uninstall the product to clean up the system from the failed install. Then reinstall the product into a path without spaces.
|
| 10/02/07
|
#37795
|
partitioned region buckets are not balanced
|
5.1
|
closed
|
Partitioned Region data storage is skewed
|
When quickly loading data into a partitioned region, the number of buckets from one data store to the next may vary as much as 100%. Due to the seemingly random allocation of buckets this requires that all VMs for the partitioned region have up to two times the required memory for actual storage. Increasing the maximum number of buckets exaggerates the problem.
|
There are two ways to potentially work around this problem:
1) Artificially slow the rate of data loaded into a partitioned region.
2) Use the PartitionedRegionStats bucketCount statistic to detect imbalance from VM to VM. For each VM with the worst imbalance, introduce a new VM to the partitioned region and then shut down the offending VM.
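The second workaround depends on comparing bucketCount across VMs. The following is a minimal sketch of that comparison, assuming you have already collected each member's bucketCount statistic into a map keyed by member name. The class BucketImbalance, its methods, and the map layout are hypothetical and are not part of the GemFire API.

```java
import java.util.Map;

/** Sketch only: finds the most heavily loaded member from collected bucket counts. */
final class BucketImbalance {

    /** Returns the member holding the most buckets, or null if no counts were given. */
    static String mostLoadedMember(Map<String, Integer> bucketCounts) {
        String worst = null;
        int max = -1;
        for (Map.Entry<String, Integer> e : bucketCounts.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                worst = e.getKey();
            }
        }
        return worst;
    }

    /** Returns the difference between the largest and smallest bucket count (0 if empty). */
    static int spread(Map<String, Integer> bucketCounts) {
        if (bucketCounts.isEmpty()) {
            return 0;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int count : bucketCounts.values()) {
            min = Math.min(min, count);
            max = Math.max(max, count);
        }
        return max - min;
    }
}
```

What spread is large enough to justify replacing a VM is a judgment call that only testing in your own configuration can settle.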
|
| 10/01/07
|
#37779
|
member wrongly evicted by failure-detection does not recognize membership changes
|
5.1
|
closed
|
Member that is kicked out of the distributed system may not realize it and continue to operate, eventually causing hangs
|
If you are using the gemfire.useFD or gemfire.FD_TIMEOUT system properties to select the alternative GemFire UDP-heartbeat failure detection mechanism, a member can be forcibly disconnected from the distributed system if it does not respond quickly enough to "are you alive" messages. The member-timeout and gemfire.FD_TIMEOUT settings control this disconnect timeout. In version 5.1.x of GemFire, the disconnected member does not realize that it has been kicked out of the system and continues to try to operate. Eventually other members may hang.
We have only observed this with the alternate failure detection mechanism and only under significant CPU load. However, setting a short member-timeout period may exacerbate the problem and cause it to happen more easily.
|
Set a reasonably long member-timeout period when using gemfire.useFD, or set the timeout period with the deprecated gemfire.FD_TIMEOUT system property.
|
| 10/01/07
|
#37772
|
Region recovery from disk fails with "DiskAccessException: Failed loading keys from <diskReg dirs>, Caused by: java.io.EOFException"
|
5.1
|
closed
|
Region recovery from disk fails with "java.lang.Error: CRITICAL: page header magic for block *** not OK 0"
|
When switching out files for repair, this exception may disrupt recovery from disk. Switching is done when a JDBM exception has been encountered at least once already.
|
|
| 09/25/07
|
#37743
|
Region destroy/close does not close LRUStatistics
|
5.0, 5.1
|
closed
|
Eviction regions with short life span have unexpected memory and cpu consumption
|
Region close, localDestroyRegion, and destroyRegion on a region with eviction configured will not close the LRUStatistics object. If a large number of region destroys are done, this can cause the statistic sampler to consume an entire CPU and the unclosed statistic object to consume around 100 bytes of memory.
|
To prevent the memory leak, avoid giving your LRU regions a short life span. You can do this by using region clear instead of a destroy followed by a create.
To prevent the CPU consumption, you can disable statistic sampling.
|
| 09/24/07
|
#37736
|
Unusually high number of eviction failures in trunk build 132
|
5.1
|
closed
|
LRU Region eviction may happen early or late
|
The LRU limit is not strictly complied with when doing evictions. Evictions might be done slightly early (causing less space to be used than was specified) or slightly late (causing more space to be used than was specified).
|
You can set -Dgemfire.STRIPED_STATS_DISABLED=true to get the older version of statistics that causes strict compliance to the eviction limit.
|
| 09/23/07
|
#37727
|
Hang in waitForRegionCreateEvent of newly restarted VM during shutdown
|
5.1
|
closed
|
Hang during Cache close
|
In a client/server high-availability test that repeatedly destroyed and created Regions and Caches in multiple VMs, we experienced a hang in a server VM. The server was in the process of exiting, and the GemFire shutdown hook was attempting to close the Cache. A stack dump (kill -QUIT) showed the hung thread was waiting on initialization of a Region, but no other threads were involved with creating a Region.
"vm_3_thr_3_bridge1_hs20c_11833" daemon prio=1 tid=0x085ab338 nid=0x2d24 in Object.wait() [0x5f0ce000..0x5f0ce5f0]
at java.lang.Object.wait(Native Method)
- waiting on <0x58cc26d8> (a com.gemstone.bp.edu.emory.mathcs.backport.java.util.concurrent.CountDownLatch)
at java.lang.Object.wait(Object.java:432)
at com.gemstone.bp.edu.emory.mathcs.backport.java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:364)
at com.gemstone.bp.edu.emory.mathcs.backport.java.util.concurrent.CountDownLatch.await(CountDownLatch.java:234)
- locked <0x58cc26d8> (a com.gemstone.bp.edu.emory.mathcs.backport.java.util.concurrent.CountDownLatch)
at com.gemstone.gemfire.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:53)
at com.gemstone.gemfire.internal.cache.LocalRegion.waitOnInitialization(LocalRegion.java:3029)
at com.gemstone.gemfire.internal.cache.LocalRegion.waitForRegionCreateEvent(LocalRegion.java:1633)
at com.gemstone.gemfire.internal.cache.LocalRegion.dispatchEvent(LocalRegion.java:5290)
at com.gemstone.gemfire.internal.cache.LocalRegion.dispatchListenerEvent(LocalRegion.java:4240)
at com.gemstone.gemfire.internal.cache.LocalRegion.sendPendingRegionDestroyEvents(LocalRegion.java:4476)
at com.gemstone.gemfire.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:3868)
at com.gemstone.gemfire.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1250)
at com.gemstone.gemfire.internal.cache.LocalRegion.handleCacheClose(LocalRegion.java:4515)
at com.gemstone.gemfire.internal.cache.DistributedRegion.handleCacheClose(DistributedRegion.java:1700)
at com.gemstone.gemfire.internal.cache.GemFireCache.close(GemFireCache.java:581)
- locked <0x470b5ef8> (a java.lang.Class)
- locked <0x4b1cfe68> (a com.gemstone.gemfire.internal.cache.GemFireCache)
at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.doDisconnects(InternalDistributedSystem.java:773)
at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:904)
at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:668)
at com.gemstone.gemfire.distributed.DistributedSystem.disconnect(DistributedSystem.java:960)
at hydra.RemoteTestModule$2.run(RemoteTestModule.java:372)
|
No workaround
|
| 09/21/07
|
#37718
|
sudden heap growth in multicast smoke performance test
|
5.0
|
closed
|
Multicast retransmissions cause a slow memory leak
|
When using distribution scopes of DISTRIBUTED_ACK or GLOBAL with multicast-enabled=true, it is possible (though unlikely) that a VM will experience a memory leak. The leak is caused by multicast retransmission logic and can cause the VM to run out of heap space.
|
Change your configuration to use TCP instead of multicast
|
| 09/18/07
|
#37692
|
local scope persistent regions do not allow register interest
|
5.0
|
closed
|
CacheWriterException thrown from registerInterest on local persistent replicates
|
When the registerInterest method is called on a region with local scope and persistence enabled it will always throw a CacheWriterException with the message "Interest registration not supported on replicated regions".
|
|
| 09/16/07
|
#37657
|
Assertion: Commit data for TXLockId not found; expected values not distributed to all peers
|
5.1
|
closed
|
Severe log messages indicating transaction failures
|
A VM configured with conserve-sockets=false which originates a transaction may cause severe log messages in a receiving VM similar to the following:
Uncaught exception processing CommitProcessForLockIdMessage@17373340 lockId=TXLockId: newton(18461):40211/45363-2
java.lang.AssertionError: Commit data for TXLockId: TXLockId: newton(18461):4021
An indicator of problem on the sending VM is the occurrence of warning messages starting with the text: "Attempting TCP/IP reconnect to"
Regardless of the conserve-sockets setting, this failure should not occur when the transaction contains only Scope.DISTRIBUTED_NO_ACK regions.
|
Avoid mixing transactions and conserve-sockets false in the same VM.
|
| 09/05/07
|
#37563
|
PR put fails with AssertionError
|
5.1
|
closed
|
Calling getRegion on RegionExistsException returns partially initialized region.
|
If you create a root region, catch RegionExistsException, and then call the getRegion method on that exception, the region returned may not yet be initialized.
|
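The workaround is to wait for initialization before you use the region returned by getRegion():
import com.gemstone.gemfire.cache.RegionExistsException;
import com.gemstone.gemfire.internal.cache.LocalRegion;
try {
  // create the root region here
} catch (RegionExistsException ex) {
  LocalRegion lr = (LocalRegion) ex.getRegion();
  // block until the existing region has finished initializing
  lr.waitOnInitialization();
  // it is now ok to use the region returned by getRegion
}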
|
| 09/05/07
|
#37562
|
DistributedSystem.connect() fails to return existing system
|
5.0
|
closed
|
DistributedSystem.connect() fails to return existing system
|
Calling DistributedSystem.connect() can result in the exception
java.lang.IllegalStateException: A connection to a distributed
system already exists in this VM. It has the following
configuration:
followed by the configuration.
This bug is caused by the mcast-flow-control setting not being properly handled when comparing the properties passed to the connect method (or provided in gemfire.properties) with the properties already held in existing system(s).
|
No workaround except to remove the mcast-flow-control setting from the properties.
This bug is fixed in GemFire v5.1.
|
| 08/31/07
|
#37549
|
split-brain in partitioned region: same partitioned region with multiple prId identifiers
|
6.0
|
closed
|
Split brain in partitioned regions
|
There is a rare race condition that can occur in assigning an internal identifier to a partitioned region. The condition causes the system to assign more than one identifier to a single partitioned region, with some processes using one identifier and some using another. Because of this, the processes with one identifier do not recognize operations performed on the Region by the processes using the other identifier and vice-versa.
We have not been able to isolate the cause of this race condition. It occurs very rarely and appears to happen when many processes attempt to initialize at the same time.
We have added a distributed consistency check that verifies that the correct internal identifier is being used. If the consistency check fails, you will see a warning message in one of two forms:
node(processID)memberID is using PRID 1 for regionName but this process maps that PRID to 2
node(processId)memberID is using PRID 1 for regionName but this process is using PRID 2
|
|
| 08/15/07
|
#37388
|
ArrayIndexOutOfBoundsException when log-disk-space-limit is set
|
5.0.1
|
closed
|
ArrayIndexOutOfBoundsException when log rolling enabled and a log-disk-space-limit configured
|
When log rolling is enabled and a log-disk-space-limit is configured then the code that checks the disk space limit may throw an ArrayIndexOutOfBoundsException. An example stack follows:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
at com.gemstone.gemfire.internal.ManagerLogWriter.checkDiskSpace(ManagerLogWriter.java:440)
at com.gemstone.gemfire.internal.ManagerLogWriter.checkDiskSpace(ManagerLogWriter.java:452)
at com.gemstone.gemfire.internal.ManagerLogWriter.switchLogs(ManagerLogWriter.java:213)
at com.gemstone.gemfire.internal.ManagerLogWriter.rollLog(ManagerLogWriter.java:457)
at com.gemstone.gemfire.internal.ManagerLogWriter.put(ManagerLogWriter.java:496)
at com.
|
Since rolling logs also leaks file descriptors you should disable log rolling in 5.0.1 by setting log-file-size-limit to zero.
If you are willing to live with the file descriptor leak then you can work around this ArrayIndexOutOfBoundsException by setting log-disk-space-limit to zero.
|
| 07/25/07
|
#37229
|
put from client into PR region fails with IMQException returned from cacheserver
|
|
closed
|
IMQException while doing put in PR.
|
put from client into PR region fails with IMQException returned from cacheserver.
Reason:
For a query to work correctly, it must have a real Java object (POJO) to work with. This poses a problem for any kind of remote query, that is, a query sent from one VM to another. The issue arises when the remote VM, whose object storage may be in serialized form (true for Partitioned Regions as well as Cache/Bridge Servers), needs to deserialize the stored object into a POJO. If the class can't be deserialized, the query fails. The user therefore needs to know the steps that allow successful deserialization in order to avoid the problem described in this bug.
|
|
| 06/21/07
|
#37004
|
For bucket id 26, expected 2 members in primary list, but found 3
|
prFeb07
|
closed
|
Partitioned Region meta data may contain incorrect information after VM failures
|
For a given Partitioned Region, if participating VMs have failed either through network problems, hardware failures, or software crashes, Partitioned Region meta data for a given participant may contain incorrect information for one or more buckets. The result of such incorrect information is potentially slower access to the information in that bucket. The higher the redundantCopies setting the greater the potential to become incorrect. The redundantCopies setting 0 does not suffer from this issue.
|
|
| 06/20/07
|
#36990
|
non-zero log-file-size-limit causes file descriptors to leak
|
5.0
|
closed
|
Non-zero log-file-size-limit causes file descriptors to leak
|
Configuring GemFire to roll log files by setting log-file-size-limit to something other than 0 can leak a file descriptor every time GemFire rolls the log file.
|
|
| 06/14/07
|
#36975
|
CacheTransactionManager can refer to a closed DistributedSystem
|
all
|
closed
|
Cache Transaction Manager may refer to closed distributed system
|
If you close your distributed system and then create a new one, your cache transaction manager may attempt to use the old (closed) distributed system. Transactions may fail or erroneously appear to succeed.
|
If you use transactions, do not close your distributed system after creating it. Exit the JVM if you need to create a new distributed system.
|
| 06/05/07
|
#36921
|
poor get performance for partition region
|
|
closed
|
Partitioned Region get performance degraded
|
Performing a get() on a Partitioned Region is 3x slower than in release 5.0.1.
|
|
| 04/25/07
|
#36688
|
GemFire transaction svc doesn't do proper write-write conflict detection
|
5.0.1
|
closed
|
write-write conflicts not always detected
|
If a key is read in one transaction, another transaction modifies the key, and finally the first transaction modifies the key, the conflict is not detected and the transaction is committed.
|
|
| 04/10/07
|
#36597
|
Java-level deadlock in InternalDistributedLockService.checkLockGrantorInfo leads to stuck lock and hung message reader thread
|
5.0
|
closed
|
Java deadlock in DistributedLockService can lead to stuck lock and hung message reader
|
Destroying a DistributedLockService while there are pending lock requests still active can result in those pending locks becoming stuck and unavailable system-wide until the VM that requested such a lock disconnects. In addition, the VM may quit processing messages sent by the member from which it was acquiring the lock remotely.
This affects all features that use DLS. For example, Global Regions must lock the key in order to put or destroy the cache entry. Any calls to do so for a key that has a stuck lock would then hang until the VM that caused the problem disconnects from the system. In general, when you close or destroy a feature that uses a DistributedLockService, then that DistributedLockService is destroyed.
|
The workaround is to destroy the DLS when there are no threads actively trying to acquire locks.
|
| 03/24/07
|
#36512
|
GemFireCache.close is not thread safe
|
5.0.1
|
closed
|
GemFireCache.close is not thread safe
|
If one thread attempts to create a new cache while another thread is closing the old cache, one or more static resources may be nulled out, left in an unknown or incorrect state, or never cleaned up.
|
Use the same thread to close the old cache and create the new cache.
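One way to guarantee that a single thread performs both steps is to funnel them through a single-thread executor. The following is a minimal sketch of that idea. The CacheLifecycle class is hypothetical, and the close and create steps are passed in as callbacks so that the example does not depend on the GemFire API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Sketch only: serializes cache close and create on one dedicated thread. */
final class CacheLifecycle {
    private final ExecutorService lifecycleThread = Executors.newSingleThreadExecutor();

    /** Closes the old cache, then creates the new one, both on the lifecycle thread. */
    <T> T restart(Runnable closeOld, Supplier<T> createNew) {
        Callable<T> job = () -> {
            closeOld.run();
            return createNew.get();
        };
        try {
            return lifecycleThread.submit(job).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while restarting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("restart failed", e.getCause());
        }
    }

    /** Stops the lifecycle thread so that the JVM can exit. */
    void shutdown() {
        lifecycleThread.shutdown();
    }
}
```

Because the executor has exactly one worker, concurrent callers of restart are queued rather than interleaved, which is the property the workaround asks for.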
|
| 03/20/07
|
#36483
|
PR-HA test hangs while waiting to connect to killed VM
|
5.0.1
|
closed
|
System deadlock during conditions of extreme membership volatility
|
Under certain conditions with volatile membership changes (cache members departing under busy conditions), there is a potential for system deadlock.
The confused cache member will have a message similar to the following in its logs:
[warning 2007/03/19 22:14:47.635 PDT gemfire3_huey_22603 <vm_7_thr_9_client3_huey_22603> nid=0x1a] Error sending message to huey(22596):56886/48525 (will reattempt): java.net.ConnectException: Connection refused
|
The best solution is to avoid conditions of extreme membership volatility (cache members arriving and departing with great frequency).
If this condition is detected in a running system, the deadlock can be safely broken by killing the hung cache member.
|
| 03/16/07
|
#36475
|
StateFlushOperation may hang with Global scoped regions
|
5.0.1
|
closed
|
New replicate in region with global scope can cause system hang
|
If a region has global scope, it is possible for a new replicate to
cause a hang in the distributed system. Operations on regions with
global scope are not performed in token mode but are put in the
waiting thread pool until the region they're modifying is done with
getInitialImage. StateFlushOperation will invade other VMs and wait
for these messages to finish being processed before allowing the
getInitialImage to complete.
|
Not applicable.
|
| 03/14/07
|
#36461
|
BridgeClient receives BridgeWriterException: InterruptedException on region.get() with server in the process of shutting down (due to InterruptedException/shutdown in progress issues)
|
5.0.1
|
closed
|
Cache member shutdown is not reliable
|
Under certain circumstances, especially if there are outstanding operations in a cache member, there is a possibility that the cache member will hang (not completely exit) during shutdown processing.
|
If a cache member does not completely exit, it is safe to directly kill its process using operating system tools (kill -9 in Solaris or Linux, or the task manager in Windows).
|
| 03/02/07
|
#36421
|
Query shortcut on Region doesn't use index
|
5.0.1
|
closed
|
Region.query shortcut method does not use Indexes
|
The query shortcut method in the Region interface does not make use of indexing. Also the QueryService Query instances do not use indexing if the region is passed in as a parameter to the query.
|
Use Query instances obtained from the QueryService and reference regions by full path rather than by passing them in as parameters.
|
| 03/02/07
|
#36420
|
NPE reported from GrantorRequestProcessor.startElderCall()
|
5.0.1
|
closed
|
NPE reported from DLockRequestProcessor
|
The NullPointerException is caused by an assertion error. Lock grants that arrive after the lock service is destroyed must be released to prevent a stuck lock. This NPE causes the associated lock to remain stuck until the VM's Distributed System connection closes.
This is the error output to the logs:
[severe 2007/03/02 12:45:43.536 PST gemfire5_newton_24123 nid=0x75407bb0] Uncaught exception processing DLockRequestProcessor.DLockResponseMessage responding GRANT; serviceName=Partitioned Region Lock Service; objectName=#partitionedRegion; responseCode=0; keyIfFailed=null; leaseExpireTime=9223372036854775807; processorId=807; lockId=807
java.lang.NullPointerException
at com.gemstone.gemfire.distributed.internal.locks.GrantorRequestProcessor.startElderCall(GrantorRequestProcessor.java:209)
|
|
| 02/27/07
|
#36406
|
List of departed members grows without bound inside of VMs
|
5.0
|
closed
|
Frequent cache membership changes use memory, degrade performance
|
When a system member leaves a distributed system, the remaining system members place it in a departed member list. This list is never cleared and thus grows without bound. The list uses a certain amount of extra memory but, more importantly, as successive members depart the distributed system, the processing time associated with handling the departures increases.
|
If the membership of your distributed system is rather stable (a small number of departures), no workaround is required. If, however, your configuration requires a large number of cache members to join and depart, you need to restart any long-lived cache members on a periodic basis to prevent performance degradation or possibly even memory exhaustion.
|
| 02/16/07
|
#36376
|
ValueConstraint causes all objects to be deserialized
|
5.0
|
closed
|
Setting a value constraint for a region's values causes all objects to be deserialized
|
The ValueConstraint region attribute allows you to declare the class of all the values for a region. But if you specify a constraint, then every value in the region must be deserialized to check the constraint.
|
none/not applicable
|
| 02/16/07
|
#36371
|
Slow gateway shutdown can leave cache open
|
5.0
|
closed
|
Slow gateway shutdown can produce CacheExistsException
|
If you close and reopen a cache that has a gateway, on rare occasions this produces a CacheExistsException.
|
Stop the VM and restart it.
|
| 02/14/07
|
#36358
|
PartitionedRegionException: registerPartitionedRegion: /PartitionedRegion_9 caught exception dumpPRId:prIdToPR Map@18550851: caused by java.lang.InternalError: Got RegionExistsException
|
5.01
|
closed
|
Creation of a PartitionedRegion may fail with exception "registerPartitionedRegion: /PartitionedRegion_9 caught exception dumpPRId:prIdToPR"
|
During concurrent creation and destruction of a partitioned region with a specific name, it is possible for a PartitionedRegionException to be thrown during createRegion with the message "registerPartitionedRegion: /PartitionedRegion_9 caught exception dumpPRId:prIdToPR".
|
Catch this exception and re-create the region.
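The catch-and-recreate workaround can be written as a bounded retry loop. The following is a minimal sketch under stated assumptions: the RetryCreate helper is hypothetical, the region creation call is passed in as a Supplier, and the PartitionedRegionException is assumed to be an unchecked exception that can be recognized by its message.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Sketch only: retries a creation call a bounded number of times. */
final class RetryCreate {

    /**
     * Calls create until it succeeds, retrying only when retryable accepts
     * the thrown exception. Rethrows the last failure after maxAttempts.
     */
    static <T> T withRetry(Supplier<T> create,
                           Predicate<RuntimeException> retryable,
                           int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return create.get();
            } catch (RuntimeException e) {
                if (!retryable.test(e)) {
                    throw e; // unrelated failure: do not mask it
                }
                last = e;
            }
        }
        throw last;
    }
}
```

A caller might pass a predicate such as e -> e.getMessage() != null && e.getMessage().contains("registerPartitionedRegion") so that only the failure described here is retried.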
|
| 02/08/07
|
#36329
|
Bridge hangs on close waiting for a GrantorRequest response from a member that has departed the DS
|
5.01
|
closed
|
Cache server shutdown can cause a system-wide hang
|
On rare occasions, a cache server can experience a problem during shutdown that causes a system-wide hang. This situation happens when the server tries to shut down while it is waiting on a response from another member that has left the distributed system. The server logs a message of this type:
[severe ... ] While pushing message <message> to <recipients>
com.gemstone.gemfire.ThreadInterruptedException: sleep interrupted
Caused by: sleep interrupted, caused by
java.lang.InterruptedException: sleep interrupted
This problem does not cause data corruption, and the distributed system will restart successfully. Kill your processes and restart all your system members according to your usual procedures.
|
none/not applicable
|
| 02/07/07
|
#36320
|
Multiple ServerMonitors with same-named endpoints can cause recursive endpoint died/recovered cycle
|
5.0.1
|
closed
|
Multiple server definitions with the same name and port can cause a client to enter an endless loop
|
This problem only affects clients running on very fast systems. On fast systems, if any two instances of BridgeLoader or BridgeWriter define the same server name and port pair, a loss of server connection can send the client's server health monitor into an endless loop. The health monitor maintains the client's live and dead server lists.
When the client enters into this loop, it appears as if the servers are going up and down.
|
Define each server name and port pair exactly once for any client VM. This means that the BridgeLoader and BridgeWriter for a single region must use different names for the same server endpoint. It also means that you mustn't create multiple instances of a single BridgeWriter or BridgeLoader definition. Starting with version 4.3, the API automatically manages reuse of the same loader and writer instances when the definitions are the same, so no explicit action is required on your part.
This example shows how to avoid defining the BridgeLoader and BridgeWriter with the same name and port pairs:
Properties writerProps = new Properties();
writerProps.setProperty("endpoints", "serverWA=localhost:44441,serverWB=localhost:44442");
BridgeWriter bWriter = new BridgeWriter();
bWriter.init(writerProps);
Properties loaderProps = new Properties();
loaderProps.setProperty("endpoints", "serverLA=localhost:44441,serverLB=localhost:44442");
BridgeLoader bLoader = new BridgeLoader();
bLoader.init(loaderProps);
This problem is fixed in version 5.0.1.
|
| 01/31/07
|
#36279
|
A bridge client putting an empty byte array causes a server NullPointerException
|
5.0.1
|
closed
|
Empty byte[ ] causes exception in client/server topology
|
In a client/server topology, you can't put an empty byte[] into the cache as a value. You can have an empty byte[] key.
A client attempting to put an empty byte[] into the cache causes the following exception on the server:
[java] [warning 2007/01/31 13:27:22.285 PST "server" <ServerConnection 0.0.0.0/0.0.0.0:44444 Thread 12> nid=0x1e4a47e] Server connection from [identity(bishop(:loner):1:0d64f66b,connection=2); port=52631]: Unexpected Exception
[java] java.lang.NullPointerException
[java] at com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:632)
. . .
|
One workaround would be to store a one-element byte[] as the value.
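If the application must still tell an empty value apart from a genuine one-element value, one possible convention is to prepend a reserved byte to every stored value, so that an empty array is stored as a one-element array. This is a variation on the workaround above and not something the original note prescribes. The NonEmptyBytes class below is a hypothetical sketch of that convention.

```java
import java.util.Arrays;

/** Sketch only: stores every byte[] with one reserved prefix byte so it is never empty. */
final class NonEmptyBytes {

    /** Returns a copy of value with a single zero byte prepended. */
    static byte[] encode(byte[] value) {
        byte[] out = new byte[value.length + 1];
        // out[0] stays 0 and acts as the reserved prefix byte
        System.arraycopy(value, 0, out, 1, value.length);
        return out;
    }

    /** Strips the prefix byte added by encode. */
    static byte[] decode(byte[] stored) {
        if (stored.length == 0) {
            throw new IllegalArgumentException("stored value must carry the prefix byte");
        }
        return Arrays.copyOfRange(stored, 1, stored.length);
    }
}
```

Every reader and writer of the region has to apply the same convention, so this only suits applications that control all clients.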
|
| 01/31/07
|
#36275
|
hang during shutdown in TimeScheduler
|
5.0
|
closed
|
Hang during DistributedSystem disconnect in TimeScheduler
|
Under very rare circumstances, the DistributedSystem in GemFire may hang when it is being disconnected. Symptoms of this problem are:
* The thread that is disconnecting the DistributedSystem will be in com.gemstone.org.jgroups.util.TimeScheduler.stop.
* A thread named TimeScheduler.Thread will be in this state:
Object.wait() [0xffffffff5bbff000..0xffffffff5bbff828]
at java.lang.Object.wait(Native Method)
- waiting on <0xffffffff6558e7e0> (a com.gemstone.org.jgroups.util.TimeScheduler$TaskQueue)
The hang will not affect other processes. The hung VM should be terminated manually.
If you encounter this defect, please contact GemStone Technical Support.
|
No workaround
|
| 01/18/07
|
#36218
|
SystemConnectException: Received no connection acknowledgements from any one of the 1 senior cache members, but both members have each other in their view
|
5.0
|
closed
|
Timeout receiving startup responses
|
When a cache member joins an existing distributed system, it must receive an acknowledgment from at least one senior member of the system. If it fails to receive a response in a timely manner, the cache member's startup will fail with a message similar to this:
Received no connection acknowledgments from any of the 1 senior cache members:
|
This is usually an indicator of a grossly overloaded system that will not perform satisfactorily in a production environment. If it is not possible to reconfigure your system to allow cache members to respond more quickly, tune the system property
DistributionManager.STARTUP_TIMEOUT
which controls the amount of time a cache member waits for replies. The default value is 15000 ms (15 seconds), and raising this value may alleviate this symptom.
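As an illustration, the timeout might be raised programmatically as in the following minimal sketch. It assumes that the property is read as an ordinary JVM system property under exactly the name shown above, that the value is in milliseconds, and that it must be set before the member connects. None of these details are confirmed by this note beyond the property name and the 15000 ms default. The StartupTimeout helper is hypothetical.

```java
/** Sketch only: sets and reads back the startup timeout system property. */
final class StartupTimeout {
    // Property name as given in the workaround text.
    static final String PROPERTY = "DistributionManager.STARTUP_TIMEOUT";
    // Default stated in the workaround text: 15000 ms.
    static final long DEFAULT_MS = 15000L;

    /** Sets the timeout; assumed to be needed before connecting to the distributed system. */
    static void raiseTo(long millis) {
        System.setProperty(PROPERTY, Long.toString(millis));
    }

    /** Returns the configured value, or the documented default if the property is unset. */
    static long effective() {
        return Long.getLong(PROPERTY, DEFAULT_MS);
    }
}
```

Under the same assumptions, passing -DDistributionManager.STARTUP_TIMEOUT=30000 on the java command line would have the same effect as StartupTimeout.raiseTo(30000).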
|
| 01/15/07
|
#36204
|
Unable to start a cacheserver on Win2003 64-bit edition
|
5.0
|
closed
|
GemFire batch files do not execute correctly on 64-bit versions of Windows
|
The origin of this problem is that DOS evaluates variables as it reads the line, so
set PATH=someString (x86);%PATH%
is expanded to
set PATH=someString (x86); ACTUAL VALUE OF THE PATH VARIABLE
Because of the parentheses, the expression is further expanded into two separate commands, like this:
set PATH=someString
(x86); ACTUAL VALUE OF THE PATH VARIABLE
The first line executes correctly and the second causes an error.
For GemFire Enterprise, a path containing parentheses "()" breaks the setenv.bat script, leaving the PATH without the gemfire.dll. This forces the GemFire application into Pure Java mode.
This problem is not limited to 64-bit Windows; it is just more reproducible there because WOW64 replaces paths like c:\windows\system32 with "C:\windows\system32 (x86)", causing more errors than might be caused by regularly specified paths.
|
Avoid references to paths such as "C:\Program Files" and "C:\windows\system32" in your PATH environment variable.
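To find the entries that trigger the problem, a small check along the following lines may help. It is only a sketch: the helper name and the sample PATH value are hypothetical, and it simply flags entries of a semicolon-separated PATH value that contain parentheses.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: list the entries of a Windows-style PATH value that contain parentheses.
// The helper name and the sample PATH value below are hypothetical.
final class PathParenthesesCheck {
    static List<String> entriesWithParentheses(String windowsPath) {
        List<String> flagged = new ArrayList<>();
        for (String entry : windowsPath.split(";")) { // ';' separates PATH entries on Windows
            if (entry.contains("(") || entry.contains(")")) {
                flagged.add(entry);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        String sample = "C:\\tools\\bin;C:\\Program Files (x86)\\Vendor\\bin;C:\\gemfire\\bin";
        for (String entry : entriesWithParentheses(sample)) {
            System.out.println("contains parentheses: " + entry);
        }
    }
}
```

On a Windows host, the value of System.getenv("PATH") could be passed in place of the sample string.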
|
| 01/12/07
|
#36190
|
getElderState hangs waiting for reply from remote VM which appears hung in getGrantorForRemoteElderRecovery
|
5.0
|
closed
|
A VM departure with multiple global regions or lock services can cause a system-wide hang
|
If you have more than one global region or more than one instance of DistributedLockService in your distributed system, on rare occasions a VM departure can cause a system-wide hang. The hang affects all VMs that use either the DistributedLockService or any features that rely on the DistributedLockService, such as global regions, partitioned regions, and transactions.
|
none/not applicable
|
| 01/11/07
|
#36185
|
Clients can not use registerInterest on regions with DataPolicy EMPTY
|
5.0
|
closed
|
IllegalStateException when client calls registerInterest on a region with data-policy of empty
|
If a bridge client tries to register interest on a region whose data policy is empty, the call throws an IllegalStateException with the message 'No mirror type corresponds to data policy "EMPTY".' This error message refers to the deprecated mirror-type region attribute, which has been subsumed by the data-policy attribute.
The fundamental bug in this case is that the product does not allow you to register interest in a region with data policy set to empty.
|
There is no workaround in this version of the product. If you need to use an empty data policy and register interest in a client region, upgrade to GemFire Enterprise version 5.0.1.
|
| 12/18/06
|
#36113
|
Unable to install GFE 5.0 on Windows Vista
|
5.0
|
closed
|
Unable to install GFE 5.0 on unsupported platform
|
The GemFire Enterprise installer provides product installation only for the supported platforms.
|
Generally, to install and try the product on an unsupported platform, contact GemStone Technical Support for a .zip file.
If you want to install on Windows Vista, you can install on an XP machine and then copy the product tree to the Vista machine.
|
| 12/11/06
|
#36077
|
hang in parRegCreateDestroy waiting for replies
|
5.0RC1
|
closed
|
Heavily loaded systems may cause membership failure
|
If a TCP/IP connection between two cache members is disrupted by extremely heavy system loading, it is possible for one or more members of the distributed system to incorrectly assume that a peer has departed the system.
This leaves cache members disagreeing about which members of the system are currently active, which in turn can lead to cache corruption or system deadlocks.
The level of loading required to generate this type of failure is very high. For instance, one in-house test case had a CPU load of 40 (many cache members on a single underpowered host) running for 15 minutes before this failure reproduced.
|
Users should be careful to monitor processor utilization on the hosts running GemFire cache members and to avoid extreme overloading.
|
| 12/05/06
|
#36042
|
Inconsistent PR data, too many bucket owners
|
5.0RC1
|
closed
|
IO Exceptions can cause data loss when Partitioned Region redundantCopies=0
|
This problem occurs when there are no redundant copies in a PartitionedRegion. Under some failure conditions during communication it is possible for data loss to occur. These include the following failure types:
[warning ... ] Ran out of thread owned resources so switching to conserve-sockets=true. Because: com.gemstone.gemfire.internal.tcp.ConnectExceptions: Could not connect to: somehost(15188):2243/2165 Causes: {java.io.IOException: An existing connection was forcibly closed by the remote host}
[warning ... ] Failed sending {com.gemstone.gemfire.internal.cache.UpdateOperation$UpdateMessage(region path='/__PRRoot/__Bucket2NodeRegion_#partitionedRegion'; sender=somehost(16924):2225/2162; callbackArg=null; processorId=0; op=CREATE; appliedOperation=false; earlyAck=false; directAck=true; lastModified=0101010101010; key=105; newValue=null; valueIsSerialized=true)} to member {somehost(11188):2210/2160} with stub {tcp:///192.168.1.1:2160} who is now considered to have crashed because: com.gemstone.gemfire.internal.tcp.ConnectionException: Not connected to tcp:///192.168.1.1:2160
[warning ... ] Error sending message to somehost(16924):2225/2162: java.io.IOException: An established connection was aborted by the software in your host machine
|
Design your application to tolerate data loss, for example by making the system of record accessible through a CacheLoader. Restart all members reporting such warnings in addition to those members referred to in the warning messages.
|
| 11/30/06
|
#36014
|
Internal PartitionedRegionException is thrown from public API
|
5.0
|
closed
|
Internal PartitionedRegionException is thrown from public API
|
Some partitioned region operations throw product-internal exceptions, such as com.gemstone.gemfire.internal.cache.PartitionedRegionException. Typically these exceptions indicate internal problems with the product. If they do occur, please contact support with the exception, all associated logs, and statistics files.
|
|
| 11/20/06
|
#35985
|
inconsistent bucket stores in partitioned region with redundancy=1
|
5.0
|
closed
|
Inconsistent bucket stores in partitioned region with redundancy=1
|
There is a race condition in the propagation of entry operations in partitioned regions that can cause operations to be applied out of order, resulting in inconsistent data.
|
For any given entry, ensure that only one thread performs write operations at any given time. One way to accomplish this is to use the DLock system to order operations.
|
| 11/09/06
|
#35948
|
JMX tests fail with OOM with 3.0.2 libraries for MX4J (and 1.4.2 JRE)
|
5.0
|
closed
|
JMX Agent unstable in GemFire version 5.0
|
The JMX Agent is unstable in GemFire 5.0. GemFire 5.0 uses MX4J 3.0.1 (for both JDK 1.4 and 1.5) which has serious bugs causing OutOfMemory errors.
The main errors that you might see occur during method invocation on MBeans that are hosted in the GemFire JMX agent. The errors are java.lang.OutOfMemoryErrors wrapped inside javax.management.MBeanExceptions.
The 5.0 agent should not be used in production systems, but may be used for development or testing purposes.
|
There is no suitable workaround in 5.0. We recommend upgrading to version 5.0.1 to resolve this problem. The 5.0.1 version of GemFire uses JDK 1.5 JMX for the 1.5 JDK and MX4J 2.0.1 for the 1.4 JDK.
|
| 10/12/06
|
#35790
|
Uncaught InterruptedException in ServerConnection thread (ThreadPoolExecutor)
|
5.0
|
closed
|
Uncaught InterruptedException in ServerConnection thread (ThreadPoolExecutor)
|
During bridge server shutdown, the ServerConnection ThreadPoolExecutor may log a message of this type:
[severe 2006/10/11 23:38:59.214 PDT gfserver1 nid=0x9c1a1bb0] Uncaught exception in thread java.lang.InterruptedException ...
This is a small bug in shutdown handling that has no negative effect on VM health or behavior. You can safely ignore the message.
|
|
| 10/05/06
|
#35741
|
Restarted VM fails to createVMRegion due to PartitionedRegionException: Could not get Partitioned Region from Id 2
|
5.0
|
closed
|
Creation of a PartitionedRegion may fail with exception "Could not get Partitioned Region from Id"
|
During creation of a Partitioned Region, an identifier is created. There are conditions under which the identifier creation/discovery process fails for a given VM. This failure causes a PartitionedRegionException to be thrown during Region creation. Typically the cause of such a failure is related to distributed race conditions.
|
Catch the exception and retry the creation operation.
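A bounded retry could look roughly like the sketch below. It is not tied to the GemFire API: the operation is passed in as a Supplier, the helper name and the pause are hypothetical, and it assumes the exception to be retried is unchecked. In a real application the caught type would be narrowed to the exception actually thrown by region creation.

```java
import java.util.function.Supplier;

// Sketch: retry an operation a bounded number of times when it throws.
// Assumptions: the failure surfaces as an unchecked exception, and a short
// pause between attempts gives the distributed race time to clear.
final class RetrySketch {
    static <T> T retry(Supplier<T> operation, int maxAttempts, long pauseMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break; // stop retrying if this thread is interrupted
                }
            }
        }
        throw last; // every attempt failed (or the thread was interrupted)
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in for region creation: fails twice, then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated creation failure");
            }
            return "region created";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Keeping the attempt count bounded means a persistent failure is still reported to the caller instead of looping forever.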
|
| 09/07/06
|
#35555
|
Unexpected keys found in partitionedRegion (region size is greater than expected)
|
5.0
|
closed
|
Unexpected keys found in partitioned region (region entry count is greater than expected)
|
This kind of data inconsistency can happen when concurrent destroy and create/put operations are performed on an entry by multiple threads. The threads can be in any number of VMs.
|
Either use redundantCopies=0, or if that is not possible, prevent concurrent entry operations (put, invalidate, destroy) on a per-entry basis. If the writing operations can be limited to a single VM, use synchronization to coordinate threads in that VM. If the writing threads must be distributed among multiple VMs, use the DLock system to coordinate entry write operations.
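For the single-VM case, per-entry coordination can be sketched as below. This is an illustration only: a plain map stands in for the region, the class name is hypothetical, and the guard covers threads in one VM. Writers spread across several VMs would still need a distributed lock such as the DLock system mentioned above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: serialize write operations per entry key within a single VM.
// Assumption: every writer in this VM goes through the guard.
final class PerKeyWriteGuard<K> {
    // One lock object per key; entries are never removed in this sketch.
    private final ConcurrentHashMap<K, Object> locks = new ConcurrentHashMap<>();

    // Runs the write while holding the monitor dedicated to this key, so two
    // threads in this VM never write the same entry at the same time.
    void write(K key, Runnable writeOperation) {
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            writeOperation.run();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> region = new ConcurrentHashMap<>(); // stand-in for a region
        PerKeyWriteGuard<String> guard = new PerKeyWriteGuard<>();
        Runnable writer = () -> {
            for (int i = 0; i < 1000; i++) {
                // Read-modify-write on one entry, made safe by the guard.
                guard.write("key", () -> region.put("key", region.getOrDefault("key", 0) + 1));
            }
        };
        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(region.get("key"));
    }
}
```

Locking per key rather than per region keeps writers to different entries from blocking each other, at the cost of one lock object per key ever written.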
|
| 06/07/11
|
#43536
|
Server attempts to deserialize too early with function execution and pdx
|
|
deferred
|
Function APIs require that classes be on the JVM's classpath
|
The function APIs do early deserialization during messaging of function results, filters, arguments, and the functions themselves, so the classes for these objects must be on the JVM's classpath. It is not possible to define your own class loader just before you read a function result or get the arguments passed to your function code.
|
Add the classes for functions, function arguments, function filters, and function results to your JVM's classpath.
|
| 04/07/08
|
#38753
|
Gateway uses P2P reader thread to distribute and wait for ack causing deadlock
|
5.5
|
deferred
|
Member hosting GatewayHub may deadlock if performing cache operations or hosting more than one GatewayHub
|
The GatewayHub thread that distributes gateway events is the same thread that reads in messages. This provides guaranteed redundancy with secondary backups, but can result in deadlock if either member tries to perform cache operations or to host more than one GatewayHub.
|
1) Use -Dgemfire.gateway-queue-no-ack=true
2) Host only one GatewayHub in any given member and dedicate that member to hosting the GatewayHub. It should not perform cache operations or do anything other than feed the gateway.
|
| 07/13/11
|
#43685
|
Setting ack-severe-alert-threshold causes severe alerts for rebalancing dlock
|
|
new
|
Severe alerts with ack-severe-alert-threshold set
|
When running with the ack-severe-alert-threshold property set, there may be severe messages logged to the system log that look like this:
"20 seconds have elapsed waiting for the partitioned region lock held by ...".
These severe alerts can be safely ignored.
|
Ignore severe alerts related to "waiting for the partitioned region lock" if ack-severe-alert-threshold is set.
|
| 07/12/11
|
#43677
|
ClassNotFoundExceptions still occurring with 6.5.1.19 even after fix
|
6.5
|
closed
|
Locators no longer attempt to load custom data serializers or instantiators unnecessarily.
|
Prior to this release, locators logged ClassNotFoundException warnings while unnecessarily attempting to load custom data serializers or instantiators. This has been fixed so that locators no longer load these classes.
This is fixed in 6.5.1.20 but NOT in 6.6.
|
No workaround. If such warnings about data serializers or instantiators appear in locator logs, they can be ignored.
|
| 05/05/11
|
#43312
|
Assertion failed in Oplog recovery
|
|
closed
|
Entry remains after clear of persistent region with async writes
|
This problem happens in persistent regions that are configured for asynchronous disk writes. When you put an entry into such a region, the put is first done in the region, then the put event is added to the async queue to be flushed to disk. If the region is cleared while the put is taking place, before the entry event is written to the queue, the event will not be cleared along with the other enqueued events. The event is then enqueued and written to disk, and the entry can be recovered back into the region.
|
None. Avoid clearing a region while entry creations and updates are being done.
|
| 01/27/11
|
#42671
|
'GemFireConfigException: Unable to contact a Locator service. Operation either timed out or Locator does not exist' reported by re-started VM, but prior to this all VMs report BindException: Address already in use (while trying to contact the locator)
|
6.5
|
closed
|
Network connection fails on Windows with java.net.BindException: Address already in use
|
This is an ephemeral port exhaustion problem. The machine needs to be configured to allow more ephemeral ports. The clue is that the "address already in use" exception happens on the client side of the socket connection, not when the server tries to bind a server socket to an address/port.
|
See this MSDN article for information on fixing this:
http://msdn.microsoft.com/en-us/library/aa560610.aspx
The following registry settings need to be added or updated:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters] "TcpTimedWaitDelay"=dword:0000003c
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters] "MaxUserPort"=dword:00008fff
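The dword values above are hexadecimal. The short sketch below converts them to decimal for readability; the class name is hypothetical. TcpTimedWaitDelay is generally documented as a value in seconds and MaxUserPort as the highest ephemeral port number, but treat those interpretations as assumptions to check against the linked article.

```java
// Sketch: convert the hexadecimal dword values from the registry entries to decimal.
final class DwordDecode {
    // Parses a dword written as hexadecimal digits; long covers the full 32-bit range.
    static long decode(String hexDword) {
        return Long.parseLong(hexDword, 16);
    }

    public static void main(String[] args) {
        System.out.println("TcpTimedWaitDelay = " + decode("0000003c")); // 60
        System.out.println("MaxUserPort = " + decode("00008fff"));       // 36863
    }
}
```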
|
| 02/12/07
|
#36349
|
Bridge client region.put() completes without exception, but entry value is not updated at the server
|
5.0
|
closed
|
Server's entry value is not updated although client region.put completes without exception
|
This happens when operations are performed out of order on the server. The problem arises from this sequence of events:
1. A client attempts to put value X one or more times, but each attempt times out.
2. Each failed attempt "orphans" a thread on the server.
3. The client picks a new connection (and its associated server thread) and continues to perform its sequential updates (X+1, X+2, ... X+n).
4. The orphan threads are eventually scheduled and successfully perform the put with value X, overwriting the newer values (X+1 or X+2 or X+n).
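The sequence above can be replayed deterministically in a small simulation. This is only an illustration of the ordering problem, not of the product's internals: a plain map stands in for the server region, a queue of deferred tasks stands in for the orphaned server threads, and all names and values are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch: replay the timed-out put sequence with a map as the server region.
final class OrphanedPutSketch {
    static int replay() {
        Map<String, Integer> server = new HashMap<>(); // stand-in for the server region
        Deque<Runnable> orphans = new ArrayDeque<>();   // server work not yet scheduled

        int x = 10;
        // Steps 1-2: the put of X times out on the client; its server thread is orphaned.
        orphans.add(() -> server.put("key", x));

        // Step 3: the client moves to a new connection and applies X+1 .. X+3.
        for (int v = x + 1; v <= x + 3; v++) {
            server.put("key", v);
        }

        // Step 4: the orphaned work finally runs and writes the stale value X.
        while (!orphans.isEmpty()) {
            orphans.poll().run();
        }
        return server.get("key");
    }

    public static void main(String[] args) {
        // The client last wrote 13, but the server ends up holding 10.
        System.out.println("server value: " + replay());
    }
}
```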
|
Disable timeout behavior for the BridgeLoader, BridgeWriter, or BridgeClient by setting its "readTimeout" parameter to zero. This causes all Region operations supported by the client to block until the server has finished with the operation, preserving client ordering.
The "retryAttempts" configuration will still be used when there are communication failures with the server or when the server cache closes in the midst of the operation.
|