GemFire 6.6 - Bugs Fixed Since v5.0

Created ID Summary Ver Status Bugnote Title Bugnote Description Workaround
07/14/11 #43686 ConcurrentModificationException observed while iterating over System Properties 6.5 closed ConcurrentModificationException with modification of System Properties during initialization of GemFire Cache. Adding or modifying the System Properties during initialization of GemFire Cache sometimes used to result in ConcurrentModificationException. This issue is now fixed.
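The note does not say how the fix was implemented. As a general illustration only, one common way to avoid a ConcurrentModificationException when other threads may change System Properties is to iterate over a private snapshot. The class and method names below are hypothetical and are not part of GemFire.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Illustrative sketch only; this is not GemFire's actual fix.
final class SystemPropertySnapshot {

    // Copies the string-valued entries of props into a private map.
    // Properties.stringPropertyNames() returns a key set that is not backed
    // by the Properties object, so later changes to props cannot disturb
    // the loop below.
    static Map<String, String> snapshot(Properties props) {
        Map<String, String> copy = new LinkedHashMap<>();
        for (String name : props.stringPropertyNames()) {
            String value = props.getProperty(name);
            if (value != null) { // the key may have been removed in the meantime
                copy.put(name, value);
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> copy = snapshot(System.getProperties());
        // Changing System Properties while walking the copy is safe.
        for (String name : copy.keySet()) {
            System.setProperty("example.last.seen", name);
        }
        System.out.println("copied " + copy.size() + " properties");
    }
}
```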
07/06/11 #43659 concurrent ops on PR hang while looking for node managing the bucket 6.5 closed Hang after rebalancing in rare cases In extremely rare cases, after a rebalance members can get stuck in the PartitionedRegion.putInBucket method.
07/05/11 #43652 jgroups messages pile up in receivers when senders fail to satisfy retransmission requests closed OutOfMemoryException with large number of messages in the NakReceiverWindow of the JGroups NAKACK protocol A problem in JGroups message retransmission may cause members of the distributed cache to accumulate messages and eventually run out of memory. This affects both multicast and non-multicast configurations of the product. The problem is caused by faulty size calculations in JGroups during the bundling of messages for retransmission. decrease the udp-fragment-size setting in gemfire.properties to 40000 bytes.
06/29/11 #43630 BucketMovedException should always be thrown before lastResult is called 6.5 closed BucketMovedException is thrown after all results are collected/ after last result is sent. During function execution on partitioned region, we check for the local buckets availability. While doing function execution, if the local buckets are not available on datastore then BucketMovedException is thrown. It makes sense if we do this check before starting function execution. But doesn't make sense if function execution is complete and last result is already sent. It will be better if we add this check before sending last result.
06/28/11 #43626 all threads stuck with ops, online backup and rebalancing (no HA) closed Hang while performing online backup and rebalance simultaneously In rare cases, performing an online backup at the same time as a rebalance can result in a hang Don't invoke the backup command while rebalancing the system.
06/28/11 #43625 Improve warning message for "redundancy not satisfied" closed Warning message about 'partition regions redundancy is not satisfied' needs improvement While trying to create a redundant bucket, if there are not sufficient datastores available, a warning message is generated. This warning message needs to be changed so that it is more readable.
06/24/11 #43620 when DAE happened during online compaction, files (init file, lock file) used in diskstore are not closed closed exception in online compaction, some files are left open If an exception occurs during online compaction, control passes to DSI.handleDiskAccessException() and then to DSI.close(), which is expected to close everything in order: the oplog, then the file lock, then the disk init file. Because a CancelException might be thrown in DSI.stopAsyncFlusher, close() did not close everything. This causes problems on Windows.
06/24/11 #43619 offline compaction left krf open for empty child oplog closed offline compaction left krf open for empty child oplog Offline compaction will open a krf for child. If the child contained data, it will close the krf. However if the child is empty, it forgot to close krf.
06/23/11 #43613 Chance of missing events when members crash while rebalancing a persistent PR. closed Chance of missing events when members crash while rebalancing a persistent PR. In rare cases, if a number of members equal to the redundancy-level crash while rebalancing a persistent PR, it is possible some updates will be missing when the persistent PR is recovered.
06/22/11 #43601 Client doesn't get Cq close event when other client destroy concerned Region closed CQ Destroy Event to clients after Region Destroy This has been fixed in GemFire 6.6. The client now gets the appropriate CQ information to close/unregister the CQ on the client if the CQ was registered on a PartitionedRegion on the cache server.
06/20/11 #43597 NPE in index maintenance when the indexed expression have same values for a regionEntry closed NullPointerException encountered in specific cases during index maintenance NullPointerException encountered in specific cases, during index maintenance while doing an update or destroy operation on the entries of the region.
06/16/11 #43583 PR initialization hangs waiting for response of StateFlushOperation (which targets a departed member) 5.1 closed hang in StateFlushOperation.flush waiting for replies while targeting a process that is no longer there It is possible for a member of the peer to peer cache to hang attempting to create a Region. The hang will show the process stuck in StateFlushOperation.flush() waiting for a message from a member that recently left the distributed system. The 15 second warning will show that it is targeting that member and that it is waiting for replies from other members.
06/16/11 #43578 PDX test generates DiskAccessException while re-initializing in end-task closed IOException will cause creating empty krf If the disk runs out of space or any other IOException occurs while creating a krf, an empty krf file is created. The fix is to remove these krf files when an IOException occurs.
06/15/11 #43575 Internal GemFire Error in ProcessorKeeper21 when we increment id and it wraps around. 6.5 closed InternalGemFireError after large number of distribution messages An InternalGemFireError that may occur when more than 2,147,483,647 messages are sent to peer members is fixed in this release.
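The figure 2,147,483,647 is Integer.MAX_VALUE, so the error is consistent with an int identifier overflowing. The sketch below shows the overflow and one wrap-safe way to hand out ids. It is a generic illustration under that assumption: ProcessorKeeper21's real implementation is not shown in these notes, and WrappingIdGenerator is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; not the code used inside GemFire.
final class WrappingIdGenerator {
    private final AtomicInteger next;

    WrappingIdGenerator(int start) {
        this.next = new AtomicInteger(start);
    }

    // Returns the current id and advances the counter, restarting at 1 after
    // Integer.MAX_VALUE instead of overflowing to a negative value.
    int nextId() {
        return next.getAndUpdate(id -> id == Integer.MAX_VALUE ? 1 : id + 1);
    }

    public static void main(String[] args) {
        int naive = Integer.MAX_VALUE;
        naive++; // plain int arithmetic wraps around to Integer.MIN_VALUE
        System.out.println("naive increment gives " + naive);

        WrappingIdGenerator ids = new WrappingIdGenerator(Integer.MAX_VALUE);
        System.out.println(ids.nextId()); // Integer.MAX_VALUE
        System.out.println(ids.nextId()); // 1
    }
}
```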
06/14/11 #43570 Persistent parent PR can be rebalanced before child PR is recovered from disk closed Rebalancing parent persistent colocated PR before recreating child leads to unrecoverable region. If two persistent partitioned regions are colocated, it is possible to recreate only the parent partitioned region and rebalance it. At that point, the child regions persistent data is no longer colocated with the parent data, making the child region unrecoverable. Make sure to recreate all persistent colocated PRs before rebalancing the system.
06/10/11 #43553 Fire and forget function execution from client need to wait for reply(an exception) so that connection will be properly released 6.5 closed Client with only one server connection should experience AllConnectionsInUseException rather than connection timeout while doing any operation after fire and forget function execution. When a client is configured with only one server connection and an operation is executed on the same connection that a long-running fire-and-forget function execution is already using, the client should get an AllConnectionsInUseException instead of a connection timeout.
06/08/11 #43544 Unexpected PartitionOfflineException when dataStore recycled (with fixedPartitioning) 6.5 closed PartitionOfflineException after fewer than redundancy-level members crash In rare cases, a member crashing during initial bucket creation can result in a receiving a PartitionOfflineException when doing operations on a partitioned region, even though fewer members crashed than the redundancy-level of the partitioned region.
06/03/11 #43519 Error using Functional Index on attribute of type Short 6.5 closed index on field of SHORT type. A ClassCastException is thrown when a delete operation is performed on a region that has an index created on a field of type SHORT.
06/02/11 #43513 A function execution (with no result expected) for lifetime of the server blocks a connection on a Cacheserver while letting client to use same connection for some other task 6.5 closed Client with only one server connection experience connection timeout while doing operation after fire and forget function execution When a client is configured with only one server connection and an operation is executed on the same connection that a long-running fire-and-forget function execution is already using, the client gets a connection timeout exception, because the same connection on the server side is busy with the long-running function execution. Set the minimum number of connections to more than 1.
06/01/11 #43495 The gateway socket-read-timeout setting is silently ignored, and 0 is used instead closed The gateway socket-read-timeout setting is silently ignored, and 0 is used instead The gateway socket-read-timeout setting should not be used because the system cannot honor the setting. The time taken to process a batch of messages may vary and timing out can cause spurious failures. This setting is not honored by the system.
05/31/11 #43489 RegionDestroyedException on PR bucket during commit while rebalancing is moving buckets closed RegionDestroyedException on commit A possible RegionDestroyedException on transaction commit when rebalancing is in progress has been fixed in this release.
05/26/11 #43483 TX destroy (in server hosting PR) not distributed to 1 of 3 edge clients 6.5 closed transactional invalidates/destroys not delivered to all clients transactional invalidates/destroys on partitioned regions may not be delivered to all clients with registered interest when rebalancing is in progress. This has been fixed in this release.
05/23/11 #43452 hang during disconnect while shutting down vms closed hang during disconnect while shutting down vms with DistributionManager.waitForThreadsToStop() blocked The distributed system may hang during disconnect. Thread dumps will show the disconnecting thread blocked in DistributionManager.waitForThreadsToStop() and will show a MemberInvoker thread. This was caused by a faulty time-interval calculation and failure to interrupt the MemberInvoker thread.
05/16/11 #43414 Disk conversion tool does not allow conversion from 5.8 when no WAN queue is present closed Conversion from gemfire 5.8 to gemfire 6.5 persistence files fails Using the disk conversion utility to convert gemfire 5.8 disk files to gemfire 6.5 format fails with the error: "Gemfire 5.8 is not supported by this tool." Get the latest version of the conversion tool. Note that 5.8 persistent WAN queues are still not supported.
05/11/11 #43371 Transaction RemoteGetMessage can go to a region that has not finished GII closed Incorrect results for transactional operations during PR initialization When transactional operations are performed on a PR which is being initialized/rebalanced, incorrect results may be returned if data is still being fetched from another member. This has been fixed in this release.
05/11/11 #43363 Bad Message : Current Connection count of 2 is greater than the 800 Max 6.5 closed Confusing message Connection count 2 is greater than 800 This has been fixed in GemFire 6.5. maintbranch r32163. Confusing message has been replaced by meaningful message.
05/09/11 #43343 Cache Port Configuration Conflict leads to hang of gemfire shut-down-all 6.5 closed Customer wants to remove a problem node from the DS when shut-down-all hangs The hang is caused by a misconfigured node, which blocks shut-down-all. The fix is to always call disconnect when an exception occurs.
05/06/11 #43324 entry available through keySetOnServer() prior to completion of create via putIfAbsent() 6.5 closed replay of operations after failover returns incorrect result If there is a server failure during execution of an operation clients using that server will retry the operation on another server. If the operation had already been applied and distributed to other servers before the server failure this can result in an incorrect result from the operation. This affects all operations that return a result that is based on server state, such as create(), putIfAbsent(), remove(K,V) and replace(K,V,V). Another side effect of this behavior is that other clients may see the change before it completes on the client that initiated the operation.
05/03/11 #43279 order by support with PR queries. closed ORDER BY support in OQL query. ORDER BY support is added to queries executed on Partition Region. And also in case where LIMIT and ORDER BY used, the query engine behavior is changed to apply order by first and then LIMIT. And also if indexes are present the ORDER BY clause is made to use natural ordering of the index.
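The order in which ORDER BY and LIMIT are applied changes the result. The sketch below uses plain Java streams, not the GemFire query engine, to contrast sorting before truncating with the opposite order, which is shown only for comparison. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only; it does not use the OQL engine.
final class OrderThenLimit {

    // ORDER BY first, then LIMIT: the n smallest values overall.
    static List<Integer> orderThenLimit(List<Integer> rows, int n) {
        return rows.stream()
                .sorted(Comparator.naturalOrder())
                .limit(n)
                .collect(Collectors.toList());
    }

    // LIMIT first, then ORDER BY: the first n rows in arrival order, sorted.
    static List<Integer> limitThenOrder(List<Integer> rows, int n) {
        return rows.stream()
                .limit(n)
                .sorted(Comparator.naturalOrder())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(5, 1, 4, 2, 3);
        System.out.println(orderThenLimit(rows, 2)); // prints [1, 2]
        System.out.println(limitThenOrder(rows, 2)); // prints [1, 5]
    }
}
```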
05/02/11 #43264 dead lock found by ConcurrentRegionOperationJUnitTest closed deadlock in oplog force rolling When doing forceRoll in 2 threads, both could end up at switchOplog() at the same time. The fix is to move the closing oplog out of synchronization section.
04/29/11 #43255 PR recovery issues with overflow 6.5 closed NPE in PR recovery when there's eviction definition This is the same bug as #42614. Merge the fix into 6.5 maint.
04/28/11 #43247 durable clients miss destroy/invalidate events on reconnect 6.5 closed Change in the order of method calls for reconnecting Durable Clients The order of method calls on durable client reconnection needs to be changed. Before version 6.6, the recommendation was to call the interest registration methods after ClientCache.readyForEvents. Program your durable client's reconnection to: 1. Connect, initialize the client cache, regions, any cache listeners, and create and execute any durable continuous queries. 2. Run all interest registration calls. 3. Call ClientCache.readyForEvents so the server will replay stored events. If the ready message is sent earlier, the client may lose events. ClientCache clientCache = new ClientCacheFactory().create(); // Here, create regions, listeners, and CQs that are not defined in the cache.xml . . . // Here, run all register interest calls before doing anything else clientCache.readyForEvents(); Modify your durable client code accordingly. This helps prevent the occasional loss of destroy/invalidate events on reconnect.
04/25/11 #43223 Index usage causes values to be returned rather than keys 6.5 closed index usage causes values to be returned rather than keys When a query that fetches keys was executed on an overflow region and used an index, it returned the associated value instead of the key. Example: select key.ID from /region.keys key where key.ID = 1 This query returned the ID from the value instead of from the key.
04/22/11 #43212 Index Creation on Overflow Regions is too limited 6.5 closed Index creation with Method Invocation in Index Expression. This has been fixed in GemFire 6.6. Now Index expression can have method call on an identifier which corresponds to a data object in the region. Like following can be used, <region name="test"> <region-attributes disk-store-name="sample" data-policy="persistent-replicate" id="sample" enable-gateway="false" statistics-enabled="true"> <eviction-attributes> <lru-entry-count maximum="100" action="overflow-to-disk"/> </eviction-attributes> </region-attributes> <index name="sample"> <functional from-clause="/test.keys k" expression="k.getValue()"/> </index> </region>
04/21/11 #43200 InternalGemFireException: java.io.NotSerializableException: java.util.HashMap$KeySet thrown during gii exchange of FilterInfo 6.5 closed InternalGemFireException: java.io.NotSerializableException: java.util.HashMap$KeySet thrown during message dispatcher initialization When a server cache is initializing the message queue and message dispatcher for a client it may throw an InternalGemFireException. This is caused by an attempt to serialize a KeySet view of interest-registration information. These view objects are not serializable.
04/21/11 #43196 gemfire encrypt-password -help produces NPE 6.5 closed gemfire encrypt-password -help produces NPE Now this script does not produce a NullPointerException no matter what the arguments are. If the arguments are wrong then it displays the proper usage pattern.
04/20/11 #43189 Nullpointer in query at nomura closed NPE while evaluating dependencies in OQL The query engine could throw NPE while evaluating the dependency.
04/19/11 #43182 compaction (both online and offline) will cause krf gone closed compaction (both online and offline) will cause krf gone This is a missing feature. Add creating krf when close cache.
04/18/11 #43176 Events from transactions involving empty regions not sent to clients 6.5 closed Transactional events not delivered to clients for empty regions Transactions on server did not transmit all event updates to clients with register interest if last region in transaction was empty. This has been fixed in this release.
04/18/11 #43174 ConcurrentModificationException during putAll closed ConcurrentModificationException updating PR on cache server In rare cases, updating a partitioned region on a member that is also a cache server can result in a ConcurrentModificationException.
04/14/11 #43156 Transactional Function Execution fails with TransactionDataNotColocated when servers recycled 6.5 closed Wrong exception is thrown when servers are recycled while doing transactional function execution When servers are recycled during transactional function execution and there is a mismatch between the member from the transaction state and the member calculated using the key, TransactionDataNotColocatedException is thrown. A TransactionDataRebalancedException should be thrown instead.
04/14/11 #43153 Admin API can fail with certain containers 6.5 closed Admin API or JMX Agent may fail if any GemFire member cannot locate its gemfire.jar GemFire attempts to find its gemfire.jar for response to monitoring attempts by the Admin API and JMX Agent. The following three locations are searched: 1) getProtectionDomain().getCodeSource().getLocation() 2) Searches "java.class.path" for gemfire.jar 3) Searches "sun.boot.class.path" for gemfire.jar If a JVM or container environment does not return a URL in #1 that can be used to open a stream, then the Admin API or JMX Agent may hang or fail if the gemfire.jar cannot be found on either the "java.class.path" or "sun.boot.class.path". Place the gemfire.jar on "java.class.path" or "sun.boot.class.path".
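To verify the workaround, it can help to check whether gemfire.jar actually appears on the class path of the affected JVM. The following is a rough diagnostic sketch, not GemFire's own lookup code. JarOnClassPath and its find method are hypothetical names. Only the "java.class.path" property and File.pathSeparator are standard Java.

```java
import java.io.File;
import java.util.Optional;
import java.util.regex.Pattern;

// Illustrative diagnostic only; GemFire's internal search is not shown in these notes.
final class JarOnClassPath {

    // Returns the first class path entry whose file name equals jarName.
    static Optional<String> find(String classPath, String separator, String jarName) {
        if (classPath == null) {
            return Optional.empty();
        }
        for (String entry : classPath.split(Pattern.quote(separator))) {
            // Compare only the last path segment, accepting either slash style,
            // so a directory that merely contains the text does not match.
            int cut = Math.max(entry.lastIndexOf('/'), entry.lastIndexOf('\\'));
            if (entry.substring(cut + 1).equals(jarName)) {
                return Optional.of(entry);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> hit = find(System.getProperty("java.class.path"),
                File.pathSeparator, "gemfire.jar");
        System.out.println(hit.map(e -> "found: " + e)
                .orElse("gemfire.jar is not on java.class.path"));
    }
}
```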
04/14/11 #43151 NullPointerException in DataSerializer.readRegion while disconnecting from DS closed NullPointerException from DataSerializer.readRegion It is possible for DataSerializer.readRegion to throw a NullPointerException. This only happens when it is called while the Cache is being closed.
04/12/11 #43126 FunctionService.onRegion(region).execute("functionId") throws Exception if the Function is not with default attributes on server 6.5 closed FunctionService.onRegion(region).execute("functionId") throws FunctionException if the Function is not with default attributes on server If client wants to execute function on server by providing functionId then the function has to be registered with default attributes on the server. The other client API's which need function attributes as parameters require user to provide attributes parameter on client side each time they want to execute function on server even if the function is actually registered on server.
04/11/11 #43108 CacheClosedException (without Caused by: ForcedDisconnectException) caught during network partition 6.5 closed a process that is forced from membership does not always see the reason in the "cause" of CacheClosedExceptions A number of parts of the product were found to be throwing CacheClosedException without setting the cause of the closure.
04/11/11 #43106 Security log will not roll on log-file-size-limit closed SecurityLogWriter is not rolled by log-file-size-limit log-file-size-limit should trigger rolling for both ManagerLogWriter and SecurityManagerLogWriter. The logging code has been re-architected and fixed.
04/08/11 #43101 NPE thrown by PartitionManager.prCheck() while creating primary bucket 6.5 closed NPE thrown by PartitionManager.prCheck() while creating primary bucket We now check whether the partitioned region has been created, rather than throwing a NullPointerException.
04/07/11 #43094 NPE in Oplog.recoverCrf closed NPE in Oplog.recoverCrf Root cause is: thread-1 is recovering a disk store while thread-2 is using this disk store to create a disk region. We added dsi into GemFireCache first, then do recovery. We should finish recovery, then add it into gfc.
04/05/11 #43082 6.5 edge clients cannot communicate with 6.6 bridgeServers: "IOException: Unknown String header 0" thrown from InternalDataSerializer() 6.5 closed 6.5 edge clients cannot communicate with 6.6 bridgeServers due to change in ServerLocation class Some new attributes were added to ServerLocation which older clients could not read. Because of this clients cannot connect to the locator as well as servers.
04/05/11 #43081 Region#get() does not discover JTA transaction closed No CommitConflicts in read only JTA transactions JTA TransactionManager now detects conflicts when an entry is read in one transaction (but not modified) while being modified in another transaction concurrently.
04/01/11 #43064 CS_TX: unexpected load in bridgeServer when edge client iterates over Region.values() collection 6.5 closed values() iterator invoking cache loader Iterating over values in a region no longer causes the loader to be invoked for an invalidated key.
03/31/11 #43063 persistent txs do not persist entry destroys 6.5 closed Transactional entry destroys are not correctly persisted and may cause the disk to be corrupted Transactional entry destroys are not correctly persisted and may cause the disk to be corrupted. This will only happen if -Dgemfire.ALLOW_PERSISTENT_TRANSACTIONS=true is used. The transactional destroys will not be written to disk, and in some cases this leaves the disk store in a state that causes it to always throw an IllegalArgumentException during recovery.
03/31/11 #43060 Creating buckets in child colocated region throws exception if accessor does not have the child region 6.0 closed Accessors must have all colocated regions If a colocated region is created on all of the data stores that host the parent partitioned region, but an accessor has only the parent region, bucket creation will fail in the data stores. Create the child PR in the accessors as well.
03/29/11 #43038 Client requires dataSerializers even if not required to work with the cache data 6.5 closed Data serializer and instantiator classes are no longer eagerly loaded by GemFire client. Prior to this release, when a client connected to a server, the server would send all custom data serializers and instantiators that might be required for object deserialization and the client would load all of the classes. If any classes were missing from the client CLASSPATH, the connection attempt would throw a NoSubscriptionServersAvailableException. Now, classes are not eagerly loaded. GemFire loads each class as needed, the first time GemFire needs to deserialize the object. If GemFire does not find the class it needs to load, the deserialization attempt throws a ClassNotFoundException.
03/28/11 #43027 losingSide VM does not process afterRegionDestroyed (FORCED_DISCONNECT) in timely fashion after forceful disconnect 6.5 closed member is slow to process a forced-disconnect When network-partition-detection is enabled a member will shut itself down if it is unable to periodically contact a locator process. When this type of shutdown occurs it is possible for the shutdown sequence to stall for a period of time. This was caused by a stuck timer task.
03/28/11 #43024 ConcurrentModificationException thrown while iterating over ServerBucketProfiles in PartitionedRegion.virtualPut 6.5 closed In pr single hop mode, ConcurrentModificationException thrown while iterating over ServerBucketProfiles In pr single hop mode, ConcurrentModificationException thrown while iterating over ServerBucketProfiles
03/21/11 #42979 Shared PR meta data can prevent start-up of nodes 6.5 closed Changing PR attributes and restarting some members without restarting DS results in IllegalStateException Stopping all of the members that host PR one, but leaving some members running that host PR one, changing the attributes of PR one, and restarting can result in an error about incompatible region attributes. Stop all members when changing PR attributes.
03/18/11 #42966 Wan events not received when some events are for region that does not exist closed The WAN link appears broken or hung if a gateway-enabled region is not defined on both WAN sites If a gateway-enabled region is not defined on all WAN sites, it appears as if the WAN link is broken or hung. To work-around this issue in releases before 6.6, each gateway-enabled region must be defined on all WAN sites.
03/17/11 #42947 NullPointerException while sending Exception during function execution in P2P case 6.5 closed In P2P, If an exception is sent from function execution using ResultSender#sendException, then NullPointerException is observed In the case of peer-to-peer function execution, if an exception is sent from the function using ResultSender#sendException, a NullPointerException is observed. Ideally, the exception being sent should be added to the ResultCollector.
03/16/11 #42944 containsKey on a PR should not throw an exception when the bucket does not exist 6.5 closed containsKey/containsValueForKey on PR throws exception containsKey/containsValueForKey for a PR no longer throws an exception when the bucket for the key does not exist.
03/15/11 #42937 Functions sending multiple lastresults cause hang in Execution.execute() closed Functions sending multiple lastresults cause hang in Execution.execute() Although the contract is to send lastResult only once, if a Function sends results by calling ResultSender.lastResult() multiple times, it causes a hang in FunctionStreamingResultCollector.
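Function code can defend against a second lastResult call on its own side. The sketch below wraps a sender in a guard that rejects a repeated last result. The nested Sender interface is a hypothetical stand-in for the two result-sending calls, used so the example runs without GemFire on the class path. It is not the GemFire ResultSender type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch only; Sender is a hypothetical stand-in interface.
final class SingleLastResultGuard<T> {

    interface Sender<T> {
        void sendResult(T value);
        void lastResult(T value);
    }

    private final Sender<T> delegate;
    private final AtomicBoolean finished = new AtomicBoolean(false);

    SingleLastResultGuard(Sender<T> delegate) {
        this.delegate = delegate;
    }

    // Forwards an intermediate result unless the last result has already been sent.
    void sendResult(T value) {
        if (finished.get()) {
            throw new IllegalStateException("lastResult already sent");
        }
        delegate.sendResult(value);
    }

    // Forwards the last result exactly once; any later call fails fast.
    void lastResult(T value) {
        if (!finished.compareAndSet(false, true)) {
            throw new IllegalStateException("lastResult already sent");
        }
        delegate.lastResult(value);
    }

    public static void main(String[] args) {
        final List<String> log = new ArrayList<>();
        SingleLastResultGuard<String> guard = new SingleLastResultGuard<>(new Sender<String>() {
            public void sendResult(String v) { log.add("result " + v); }
            public void lastResult(String v) { log.add("last " + v); }
        });
        guard.sendResult("a");
        guard.lastResult("b");
        try {
            guard.lastResult("c");
        } catch (IllegalStateException e) {
            log.add("rejected");
        }
        System.out.println(log);
    }
}
```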
03/14/11 #42927 JTA: CommitConflictExceptions are wrapped with extraneous text 6.5 closed Extraneous text in JTA error messages A number of exceptions thrown by the GemFire JTA manager have extraneous text in the form "TransactionManagerImpl::operation::". This text has been removed.
03/10/11 #42921 pr-single-hop-enabled="true" and server-group do not work together 6.5 closed Client ignores server-groups when pr-single-hop is enabled. Client using pr-single-hop acquires the PartitionedRegion meta-data and creates the connections directly to the nodes with the buckets to provide single-hop access even if the node is not in the server-group the client is connected to.
03/10/11 #42920 isOriginRemote flag on cacheWriter event in datastore has incorrect value 6.5 closed Incorrect value of isOriginRemote flag in PR cache writer When a client is connected to a datastore for a partitioned region, the events on the cacheWriter now have the isOriginRemote true.
03/09/11 #42911 OSProcess.bgexec in pure java mode depends on Bash shell 6.0 closed OSProcess.bgexec in pure java mode depends on Bash shell We now have added a system property -Dgemfire.commandShell to pass the shell name. OSProcess.exec uses this shell name rather than assuming shell name as bash shell
03/09/11 #42903 gemfire encrypt-password produces non-readable output closed gemfire encrypt-password utility produces non-readable output gemfire encrypt-password utility now supports base-64 encoding. So the output is in readable form.
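Base-64 encoding turns arbitrary bytes, such as cipher output, into printable ASCII text. The sketch below shows only that encoding step using the standard java.util.Base64 class. It does not reproduce the cipher or the output format of the encrypt-password utility, and ReadableCipherText is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Illustrative sketch only; it does not perform any encryption.
final class ReadableCipherText {

    // Encodes raw bytes as printable base-64 text.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Recovers the original bytes from base-64 text.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Bytes like these are not readable when printed directly.
        byte[] raw = {0x00, (byte) 0xFF, 0x10, (byte) 0x80};
        String text = encode(raw);
        System.out.println(text);
        System.out.println(Arrays.equals(raw, decode(text))); // prints true
        System.out.println(encode("hi".getBytes(StandardCharsets.US_ASCII))); // prints aGk=
    }
}
```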
03/08/11 #42898 Enhancing LIKE predicate to support regEx closed Enhancements to OQL LIKE predicate. The like predicate is enhanced to support special chars (% and _) in any place of the matching string. Earlier only % is supported at the end of the string. This is supported using java Regex. The index usage with LIKE is disabled.
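The translation from a LIKE pattern to a Java regular expression can be sketched as follows. This is an assumption about the general technique, not the query engine's actual code: % becomes ".*", _ becomes ".", and everything else is quoted literally. Escape characters are not handled, and LikePattern is a hypothetical name.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; the OQL engine's own translation is not shown in these notes.
final class LikePattern {

    // Builds a regex in which % matches any run of characters and _ matches
    // exactly one character; all other text is matched literally.
    static Pattern toRegex(String like) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%' || c == '_') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        // DOTALL lets the wildcards match line terminators as well.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // True when the whole value matches the LIKE pattern.
    static boolean matches(String value, String like) {
        return toRegex(like).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("gemfire", "%fire"));  // prints true
        System.out.println(matches("gemfire", "g_mf%e")); // prints true
        System.out.println(matches("gemfire", "g_fire")); // prints false
    }
}
```

Quoting the literal segments keeps regex metacharacters in the data, such as a dot, from acting as wildcards.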
03/08/11 #42897 Client Index not updated by initial register interest in GFE6.5.1.4 6.5 closed Client Index update after register-interest This has been fixed in GemFire 6.6. Now in client cache if index is created declaratively using cache.xml and then registerInterest is called from client cache, the Index will be updated accordingly.
03/07/11 #42890 Java level deadlock reported by PRFunctionStreamingResultCollector related to com.gemstone.gemfire.distributed.internal.membership.InternalDistributedMember during memberDeparture 6.5 closed FunctionExecution on PR could deadlock in HA scenario When a node goes down, PRFunctionStreamingResultCollector.getResult() could deadlock handling memberDeparted from two different code paths. One as a membership listener and other from preWait() while waiting for result.
03/06/11 #42882 Wan queue events not received after disk file conversion of wan queue closed wan config issue Before running the disk conversion tool, make sure the region lists in the pre-6.5 and 6.5 cache.xml files match. Do not remove any regions in the new XML files (adding new regions is allowed). Some gateway events might be directed to the removed regions and cause an exception.
03/06/11 #42881 Unknown header byte while converting old version wan queue to 6.6 closed disk file conversion failed on gemfire5.8 5.8 is a special version. Its GatewayEventImpl does not contain _createTime, while 5.7, 6.0, and 6.5 have it. If the detected old GemFire version is 5.8 and it has a WAN configuration, the tool exits.
03/04/11 #42877 Loading of cache-xml-file as a resource fails if OS path.separator is not '/' 3.0 closed Windows fails to load cache-xml-file as resource if contained in non-default Java package Specifying cache-xml-file will fail on Windows (or any OS with a path.separator different than '/') if the value points to a classpath resource rather than an actual file. java.lang.ClassLoader#getResource(String) specifies that '/' must be used as the path separator regardless of the OS path.separator System property. Internally, GemFire stores the value of cache-xml-file in an instance of java.io.File which changes '/' characters to be the OS path.separator character. Then when the cache is created, an exception similar to the following is thrown: com.gemstone.gemfire.cache.CacheXmlException: Declarative Cache XML file/resource "com\example\cache.xml" does not exist. at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.getCacheXmlURL(GemFireCacheImpl.java:583) at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:622) at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.init(GemFireCacheImpl.java:533) at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:403) at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:178) at com.gemstone.gemfire.cache.CacheFactory.create(CacheFactory.java:223) Try to use cache-xml-file only to specify an actual OS file rather than a resource. Another alternative is for the application to load the desired cache.xml as a resource and then feed that into com.gemstone.gemfire.cache.Cache#loadCacheXml(InputStream). Example: Cache cache = new CacheFactory().create(); URL url = getClass().getClassLoader().getResource("com/example/cache.xml"); cache.loadCacheXml(url.openStream());
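The separator conversion described here can be observed, and undone, with plain Java. In the sketch below, java.io.File rewrites '/' to File.separatorChar on platforms where the two differ, and a small helper turns such a path back into the '/'-separated name that ClassLoader#getResource expects. ResourceNames and toResourceName are hypothetical names, and this is not how GemFire resolved the bug.

```java
import java.io.File;

// Illustrative sketch only; not GemFire's fix.
final class ResourceNames {

    // Converts a File-style path into a class loader resource name:
    // '/' as the only separator and no leading slash.
    static String toResourceName(String path, char separator) {
        String name = path.replace(separator, '/');
        while (name.startsWith("/")) {
            name = name.substring(1);
        }
        return name;
    }

    public static void main(String[] args) {
        File file = new File("com/example/cache.xml");
        // On Windows this path uses backslashes; on Unix-like systems it is unchanged.
        System.out.println(file.getPath());
        // prints com/example/cache.xml
        System.out.println(toResourceName(file.getPath(), File.separatorChar));
    }
}
```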
03/03/11 #42874 Uncaught exception in thread Thread[Idle OplogCompactor,5,Oplog Compactor Thread Group]: NPE in BucketRegion.updateCounter() 6.5 closed NPE in BucketRegion.updateCounter() Race condition when BucketRegion() is calling super() where LRU has been triggered to use BucketRegion object.
03/03/11 #42872 InternalGemFireError thrown from ProxyBucketRegion.setHosting() when asserting this.realBucket != null 6.5 closed InternalGemFireError after concurrently destroying and recreating a PR in multiple members Calling Region.destroyRegion in one member and at the same time calling Region.localDestroyRegion in another member, and then recreating the region, can in rare cases result in an InternalGemFireError.
03/02/11 #42865 jgroups messages pile up in the NAKACK sent-messages collection when security is enabled 6.0 closed memory leak in JGroups A flaw in one of the JGroups messaging protocols causes it to retain messages that should be garbage-collected. This shows up in the "distribution stats" jgNAKACKSentMessages statistic as a steady increase in the number of messages sent by that protocol that are being retained for retransmission.
03/01/11 #42857 LicenseException: Could not find a license occurs in container environments 1.0 closed License validation fails if gemfire.jar cannot be found License validation fails if the gemfire.jar cannot be located. GemFire licensing attempts to find the gemfire.jar for license validation in the following three locations: 1) getProtectionDomain().getCodeSource().getLocation() 2) Searches "java.class.path" for gemfire.jar 3) Searches "sun.boot.class.path" for gemfire.jar If a JVM or container environment does not return a URL in #1 that can be used to open a stream, then a LicenseException will be thrown if the gemfire.jar cannot be found on either the "java.class.path" or "sun.boot.class.path". Place the gemfire.jar on "java.class.path" or "sun.boot.class.path".
02/28/11 #42850 Reading stats from JMX not working as needed in customer environment 6.5 closed ObjectName for StatisticResource MBean includes Statistic Type for improved readability ObjectName for StatisticResource MBean now includes Statistic Type for improved readability through Tools like JConsole.
02/25/11 #42830 delta object disappears from region after transactional update closed Incorrect value for transactional get with Delta Transaction on an empty member (peer) where delta is used may not return correct value for transactional get operations. This has been fixed in this release.
02/23/11 #42815 ClassCastException executing a function on a PR in a loner member closed Client experiences ClassCastException while executing function on Loner Member When the function is executed on a loner server (by setting the mcast-port to 0 and the locators to the empty string), the client gets a ClassCastException. Most of the time servers are not configured with LonerDistributedSystem. Starting a locator will be sufficient.
02/22/11 #42807 Index Creation fails with overflow region error on CACHING_PROXY client closed Index creation in client using CACHING_PROXY When an index on an overflow region is created on the client (using CACHING_PROXY), an exception is thrown.
02/22/11 #42804 CQ query fails using: select * from /exampleRegion where get('key1') = '2' 6.5 closed Query Execution without Alias. This has been fixed in GemFire 6.6. An alias is no longer required to call a function on region values. For example, the following query now succeeds: select * from /exampleRegion where get('key1') = '2'
02/10/11 #42751 Disk Stores errors when using Japanese Locale 6.5 closed Non-English locales can cause disk store errors Non-English locales can cause disk store recovery to fail. Specify -Duser.language=en on the JVM command line.
02/09/11 #42747 UserTransaction#getStatus() returns wrong status 6.5 closed UserTransaction#getStatus() throws Exception UserTransaction#getStatus() now returns Status.STATUS_NO_TRANSACTION when there is no active transaction.
02/08/11 #42738 AssertionError: value in RegionEntry should not be INVALID closed Query using Async index maintenance throws assertion error with INVALID value. With async index maintenance, a query could throw an assertion error reporting an INVALID value if the entry being evaluated by the query engine was concurrently modified.
02/07/11 #42735 edge create not replicated to all servers during server HA; not recovered from remaining servers during subsequent gii at startup 6.5 closed events not properly distributed after failed concurrent-map operations from client cache When a replace(K,V,V) or putIfAbsent(K,V) operation failed it would sometimes leave a phantom entry in the cache that interfered with subsequent operations on the same key. This would sometimes cause operations to fail to propagate from one server to another.
02/04/11 #42725 Disk file converter fails with ClassCastException closed Disk file converter fails with ClassCastException The converter did not consider that the xml may be a new version which contains disk-store-name instead of overflow-directory. Changed the xsl to handle this case.
02/04/11 #42724 disk file converter fails with non-existing disk dir closed disk file converter fails with non-existing disk dir Caused by a gateway-hub that defines 2 gateway queues, a case that was not handled before.
02/02/11 #42713 The client metadata is incorrect on data recovered from disk 6.5 closed In pr single hop mode, client metadata can be incorrect on data recovered from disk A client may get different metadata depending on which server it connects to. This issue reproduces only in the case where the server is started after region creation.
02/02/11 #42708 remove(K,V) returns true even though key (K) does not exist 6.5 closed remove(K,V) returns true even though key (K) does not exist When a region was being concurrently destroyed a remove(K,V) operation might return true instead of throwing a RegionDestroyedException.
01/27/11 #42666 client getAll shouldn't add entries to local cache if entry doesn't exist on server 5.7 closed client getAll adds entries to local cache even if they don't exist on server A client getAll adds entries to the local cache even if they don't exist on the server. If we later create an entry for one of those keys then the client will throw EntryExistsException.
01/24/11 #42641 PR does not support containsValue(Object) 6.5 closed containsValue(Object) for PartitionedRegion throws UnsupportedException PartitionedRegion now supports containsValue(Object).
01/24/11 #42638 replace(K,V,V) returns true and results in afterCreate event when oldValue does not match 6.5 closed replace(K,V,V) returns true and results in afterCreate event when oldValue does not match The ConcurrentMap replace(K,V,V) method sometimes applies the operation in a client cache when it shouldn't. A race condition with concurrent operations in the same VM can cause the operation to change into a create() and be applied both in the client and the server.
01/21/11 #42633 online backup produces no backup files after converting disk files and creating regions with older version xml closed Persistent regions with disk-write-attributes not backed up. Persistent files created by regions that use the deprecated disk-write-attributes rather than the disk-store attribute are not backed up.
01/17/11 #42614 Missing buckets in persistence tests closed NPE in PR recovery when eviction is defined There are 2 places to save the eviction controller's stats object. One is from RegionAttributes. Another is from DiskStore. When we create the PR, it will set the former, not the latter. When we recover from DiskStore, it will set the latter, not the former. When creating the BR, we check the former, while creating the PR itself checks the latter. In the recovery case, we only set the latter, not the former, even if our region attributes defined eviction, because we thought we had this stats object from DSI; however, when doing BR creation, we only check the former. We should keep using the stats for EvictionController in the PR's Region Attributes.
01/13/11 #42608 Issuing multiple "agent start" commands not handled properly 7296 6.5 closed Unexpected behavior for multiple invocations of the 'agent start' command. When the 'agent start' command was executed multiple times from the same working directory, the JMX Admin Agent launcher couldn't read the status correctly and started a new process that failed later. This issue is now fixed, and 'agent status' now reports the correct status. JMX Admin Agent launcher creates a status file (default name: .agent.ser). The binary format for this file is changed. It's advised to delete/move the older status file from the JMX Admin Agent's working directory.
01/12/11 #42602 'GemFireConfigException: Unable to contact a Locator service' after getting a ForcedDisconnectException in locator logs. 6.5 closed Over-aggressive suspect processing forces member from the system It is possible for a member to be forcibly removed from the system if it does not accept tcp/ip connections in a reasonable amount of time and is not able to respond to wellness queries quickly enough. This was caused by an error in a timing calculation in tcp/ip connection formation. The faulty calculation takes place after the ack-wait-timeout period elapses (15 seconds) and it causes excessive suspect processing to be initiated on the unresponsive member.
01/07/11 #42585 NullPointerException found in JGroupMembershipManager.suspectMember 6.5 closed NullPointerException from JGroupMembershipManager.suspectMember() during initialization When network-partition-detection is enabled it is possible during initialization of the distributed system that a NullPointerException will be thrown, resulting in a SystemConnectException and failure to connect to the distributed system. This was caused by a race condition in the membership view installation system.
01/05/11 #42573 No buffer exception while executing a function closed No buffer exception while executing a function on server in selector mode When the BridgeServer's maxThreads is greater than 0 (i.e. selector mode is enabled) and pr single hop is disabled on the client side, function execution can throw an IOException saying there is no buffer available.
01/04/11 #42567 EXCEPTION_ACCESS_VIOLATION for java.net.Inet6AddressImpl.lookupAllHostAddr with hitachi JVM 1.5.0_11-b03-CDK0850 6.5 closed
01/04/11 #42566 Can't bind RMI Registry to start on a specific IP address on a multi NIC host. 6.5 closed Can't bind RMI Registry to start on a specific IP address on a multi NIC host. On a host with multiple network interfaces, RMI Registry started by the JMX Agent gets bound to all the network interfaces and currently the user can not bind RMI Registry to a specific network interface/IP. Use different rmi-ports for different agents.
01/03/11 #42563 security logging will cause problems with rolling logs closed security logging will cause problems with rolling logs Re-arch ManagerLogWriter and GemfireStatSampler. Use instance attributes instead of static attributes. Move archive related attributes and method from ManagerLogWriter into GemfireStatSampler.
01/03/11 #42562 locator rolling logs never cleaned up closed Locators configured for log rolling never clean up child logs If a locator is configured to roll its logs then it will never remove any of its child logs.
12/30/10 #42555 'InternalGemFireException: unexpected exception on member null' caused by 'java.io.IOException: Could not create directory' while creating online backup. 6.5 closed Could not create directory error during backup. Due to a jdk issue, in rare cases having multiple members backing up to a directory that does not exist yet can result in "java.io.IOException: Could not create directory". Create the backup directory before invoking backup, or reattempt the backup after receiving a failure.
12/23/10 #42548 Timeout on client shutdown while waiting for responses from departed members 6.5 closed cache hangs when it should have automatically closed If a member becomes unresponsive for a long period of time, perhaps several minutes, it may be forcibly removed from the distributed system but never realize that this has happened and end up hanging waiting for responses to messages from members that are either gone or are ignoring it. This was a flaw in the membership system that allowed a shunned member to temporarily rejoin the distributed system. When a 6 minute timeout period elapsed the member was again purged from the system but was never told that this was happening and ended up hanging.
12/15/10 #42529 prSingleHop client meta data not always cleaned up 6.5 closed prSingleHop client meta data is not cleaned up when a server goes down prSingleHop relies on a byte received in the reply to a cache operation to determine whether a hop has taken place, and then fetches metadata. Though this ensures that a server that goes down is eventually removed, we can also use the Ping operation to detect and remove a server when it goes down.
12/15/10 #42527 OOME when statistics are not enabled with ConcurrentLinkedQueue 6.0 closed Disabling statistic sampling can cause a memory leak If statistic-sampling-enabled is set to false or the sampling interval is set to a very large value then GemFire can leak memory when writing statistics. The leaked memory is released every time a sample is taken. The more threads you have writing statistics the worse the leak. Do one of the following to work around this memory leak: 1. Set statistic-sampling-enabled=true and the statistic-sample-rate to a reasonable value in gemfire.properties. The leaked memory is released every time a sample is taken. 2. Set -Dgemfire.STRIPED_STATS_DISABLED=true. This will cause more thread contention when writing statistics but avoids the memory leak in the striped statistics implementation.
12/14/10 #42523 Disk file converter hangs if old version xml given to tool is really a current version xml file closed Disk file converter hangs if old version xml given to tool is really a current version xml file Check the gemfire version for loadsnapshot step. If used 6.5, exit.
12/03/10 #42495 Disk directory created with online backup does not pass the disk validator closed Unable to restore from backup files in rare cases In rare cases, backing up a system using the gemfire backup tool and recovering from the backup can result in the following error: "java.lang.IllegalStateException: The following required files could not be found" Validate backup files using the gemfire validate-disk-store. If you receive this exception, perform another backup.
12/02/10 #42490 commit that is not permitted in function causes txState to be corrupted closed corrupted transaction state in transactional function A commit call in a transactional function (when the transaction was started on a remote node) no longer leaves the transaction in a corrupted state.
12/02/10 #42484 if JTA afterCompletion is called from a different thread the GemFire transaction remains visible to the original thread closed jta afterCompletion call if JTA afterCompletion is called from a different thread the GemFire transaction remains visible to the original thread
11/19/10 #42470 multiple instances of FunctionServiceStats on client with multiple threads and on servers closed Multiple instances of FunctionServiceStats and FunctionStats for each function have been observed in client stats. Ideally there should be only one instance of FunctionServiceStats per cache and only one instance of FunctionStats per function. But due to a race condition in the product, multiple instances of FunctionServiceStats and FunctionStats have been observed in client stats.
11/16/10 #42463 prSingleHop from client fails if hashCode is negative closed Client partition region single hop may need to do multiple hops Client partition region single hop may need to do multiple hops even though it should only need to do a single hop. This happens if the hashCode method on your keys returns a value less than zero. Implement a hashCode that always returns a positive value.
11/08/10 #42451 operations on persistent regions may fail with spurious out of disk exceptions 6.0 closed Unexpected DiskAccessException when disk space runs low You may see a DiskAccessException telling you that all disk dirs are full. This should only happen if rolling is not enabled. When rolling is enabled you should only get warnings about the disk dirs being full. But in some cases it will happen even though rolling is enabled. Set the dir-size to be a value much larger than 10G.
11/05/10 #42449 Gemfire considers link local addresses when checking to see in members are on the same host closed Presence of link local address impairs redundancy satisfaction When using the gemfire.EnforceUniqueHostAllocation flag, gemfire will not place redundant copies of data on the same host. Gemfire uses the ip addresses of each host to determine what host it is running on. Certain backup tools create the same link local address on every machine. The presence of this address caused gemfire to consider every member to be on the same host.
11/05/10 #42448 putIfAbsent followed by invalidate on PR results in invalid entry which can't be GIId closed Create with null value on persistent PR results in lost invalid entry Doing a create with a null value (to create an entry in the invalid state) on a persistent partitioned region while a member is offline can result in a situation where the offline member will not mark the entry as invalid when it recovers.
10/25/10 #42435 Inconsistent state in colocated PR when a non persistent PR is colocated with a persistent PR closed If a user colocates a non persistent PR with a persistent PR, there are certain cases where we can end up creating the colocated buckets in only some members If a user colocates a non persistent PR with a persistent PR, there are certain cases where we can end up creating the colocated buckets in only some members. This can lead to hangs if the missing colocated bucket should be primary. The issue is caused because when we recover the parent bucket, we also try to create the colocated buckets. However, the colocated bucket creation will fail (as designed) if the colocated region is not created on all the nodes that host the parent region. The issue is that the very last member to create the colocated region will actually succeed in creating colocated buckets, but other members that host the parent bucket didn't create those buckets.
10/25/10 #42434 Parent PR with a colocated child PR may fail to restore redundancy closed In case of colocation, bucket redundancy is not satisfied for colocated regions, and later operations on missing buckets can cause a potential hang. In case of colocation, if the parent region is created first and populated (so that its buckets are created), and the child region is then created without being populated (so that no buckets are created), then although the colocation is complete, the child region will not have the buckets corresponding to the parent region's buckets.
10/25/10 #42433 getAll does not update last accessed time 5.7 closed A getAll done from a client will not update the last access time of the entries it reads on the server A getAll done from a client will not update the last access time of the entries it reads on the server.
10/24/10 #42429 Hang in waitForPrimaryMember when EndBucket message overlaps with a node going down 6.5 closed
10/23/10 #42427 Unable to connect to DS on laptop not on any network closed unable to run GemFire when not connected to a network GemFire fails to connect on startup if the machine's NIC is not connected to a network. The cause of the exception will be similar to this: com.gemstone.gemfire.IncompatibleSystemException: Peer localhost:4195/4192 has no network interfaces This is caused by the operating system turning off external addresses when there is no cable connected to the NIC.
10/22/10 #42424 In 4 node serial wan topologies, events can bounce between sites closed Gateway receivers are now added to the GatewayEvent CallbackArgument as the event is replicated across the wan Gateway receivers are now added to the GatewayEvent CallbackArgument as the event is replicated across the wan. It keeps track of which receivers have applied this event and does not send the event back to the receivers
10/21/10 #42419 backed up disk stores map contains null key instead of member; cannot restore backup files closed Backup from admin API may miss the local members files Invoking AdminDistributedSystem.backupAllMembers from within a member that has a disk store can result in not backing up the member, and listing the member as null in the BackupStatus. Don't invoke backup from within a member that has a disk store.
10/21/10 #42418 Online backup run from command line tools sometimes reports disk dirs both offline and backed up closed Online backup run from command line tools sometimes reports disk dirs both offline and backed up It's possible that when calling getMissingPersistentMembers, some members are still creating/recovering regions, and at FinishBackupRequest.send, the regions at the members are ready. Logically, the members in successfulMembers should override the previous missingMembers.
10/17/10 #42410 Expiration of remote entries is not correctly working 6.5 closed Remote gets do not update last access time in some cases allowing expiration to still occur In some cases a remote get of a region entry will not update the entry's last access time. This can cause it to still expire even though it has been recently read.
10/14/10 #42408 Memory leak in PartitionedRegion entry expiry closed EntryExpiryTask gets added to PartitionedRegion entryExpiryTasks causing Memory leak In case of PartitionedRegion, expiration of entries is managed at BucketRegion level. The entryExpiryMap is per bucket and entryExpiryTasks are added to it. entryExpiryTasks should not be added to the entryExpiryMap of PartitionedRegion since the entry doesn't exist in PartitionedRegion but in BucketRegion. This causes EntryNotFoundException when the entryExpiryTask is getting scheduled for expiry.
10/06/10 #42394 PR distribution advisor issue: wrong stub used for communications after new member uses same direct-channel port as old (crashed) member 6.5 closed hang trying to communicate with a departed member If the DistributedMember identity of a new member happens to be the same as that of an old member it is possible that partitioned region operations will hang trying to communicate with the old member. This was caused by a flaw that allowed the old communication information to be retained.
10/01/10 #42382 Feature gap: overflow regions do not allow indexes closed Index support on overflow region. Applications can create indexes on an overflow region, given that the index expressions satisfy the compact-range index requirements.
09/27/10 #42369 hang in messaging layer if toData throws an exception 6.0 closed Partitioned region operations may hang if toData throws an exception If your serialization code that implements DataSerializable.toData throws an exception it can cause some partitioned region operations to hang. This is because the product keeps retrying the operation thinking it is caused by a transient network error.
09/24/10 #42358 Static reference to Refresh Timer MBean in MBeanUtil remains even after the agent is stopped 6.5 closed GemFire Admin Agent should not be restarted in the same JVM process If the agent is restarted in the same JVM process, auto-refresh for SystemMember, CacheVm & StatisticResource MBeans does not happen and hence GemFire statistics & other attributes of these MBeans are not refreshed. There is no known work-around yet. The Agent JVM process should be stopped and the Admin Agent should be started in a new JVM process.
09/17/10 #42346 CachePerfStats regions stat count is incremented for internal region. closed Regions statistic value may be larger than it should be The CachePerfStats "regions" statistic may have a larger value than is correct. This is because internal regions are used by some product features and the statistics was also being incremented for the internal regions.
09/17/10 #42343 the instance name given to PartitionedRegionStats is too verbose closed The PartitionedRegionStats instance name is too long The PartitionedRegionStats instance name is "Partitioned Region " + fullRegionName + " Statistics". It should just be fullRegionName.
09/15/10 #42334 NPE in PartitionRegionHelper methods. 6.0 closed PartitionRegionHelper's getLocalData and getLocalPrimaryData throw NullPointerException A null check is missing for the region instance passed to PartitionRegionHelper#getLocalData and PartitionRegionHelper#getLocalPrimaryData. If null is passed instead of an actual region object, IllegalArgumentException should be thrown.
09/03/10 #42312 ClientHealthMonitor.getStatusForAllClients() should ignore connections that are not for clients. 6.5 closed For a Gateway Hub, GemFire monitoring module includes Gateways from remote site as clients. While retrieving information about clients of a member that hosts a Gateway Hub, GemFire Admin module includes Gateways from remote site also as clients of the member. This issue is limited only to monitoring.
09/02/10 #42309 Server swallows exception during cq execution 6.0 closed Failure message with CQ execution is not reported at client. The client does not report the cause of a failure that happens during CQ execution, even though the server log shows the actual error message. The user has to look into the server log to see the error message.
08/30/10 #42296 NPE thrown from LocalRegion.serverPut() on replace(K,V,V), expected CacheClosedException 6.0 closed NullPointerException thrown by replace(K,V,V) when the cache is closed The replace(K,V,V) operation throws a null pointer exception when attempted on a closed cache instead of throwing a CacheClosedException.
08/19/10 #42281 Hang in waitForPrimary in inserts on child region when it is created after inserts on parent 6.5 closed No colocation if parent PR populated before creating child PR If the parent partitioned region is populated before creating child partitioned regions, then populating the child partitioned regions does not colocate buckets as per the parent partitioned region buckets.
08/18/10 #42280 Edge client stops getting events causing test to fail with data inconsistency closed client cache stops getting updates after server failure It is possible that if server redundancy is lost and new servers are starting at the same time a client's sole server is shutting down that the new servers will not recover subscription information for the client and will stop sending it updates.
08/11/10 #42265 PR get ops should be executed in P2P reader threads to avoid contention on VMThinDiskLRURegionEntry (see AbstractDiskLRURegionEntry.setBits()) 6.0 closed Hang with concurrent PR operations With a partitioned region with persistence or overflow configured, and using conserve-sockets false, several concurrent operations on different members can cause a hang in very rare cases.
08/11/10 #42264 socket-lease-time has no impact due to typo p2p.idleConnectionTime[out] 6.0 closed Connections continue to be closed even when socket-lease-time="0" Connections continue to be closed even when socket-lease-time="0"
08/11/10 #42261 DistributedSystem disconnect hang after NPE reported by VERIFY_SUSPECT.stop() 6.0 closed Hang during shutdown In rare circumstances we have seen tests hang during shutdown after throwing a NullPointerException in VERIFY_SUSPECT.stop(). Killing the process that threw the exception will resolve the hang.
08/10/10 #42253 InternalGemFireError from UpdateOperation$UpdateMessage.setNewValueInEvent closed InternalGemFireError reported by UpdateOperation$UpdateMessage.setNewValueInEvent *Fixed in 6.6* This issue arises when mixed object types, i.e. some implementing Delta and others not, are used for updating keys in a region. We have already mentioned in the docs that such updates are not supported and that a ClassCastException is thrown if these are seen. Maybe we can further add that such usage may cause other problems, including possible loss/corruption of data. As of now, the server which receives such a delta update from a client does not throw the exception, but when it distributes that delta update to its peers, those peers may throw this exception back to it, which in turn may give it back to the client. In the server-to-client path, the client doesn't throw ClassCastException back to the application but simply logs it as a warning.
08/05/10 #42244 primary HA region queues are not balanced closed Primary client subscription queues are not balanced While client subscription queues were balanced fairly across servers, primaries were not properly balanced, causing performance problems. Now, primary queues are much more balanced.
08/05/10 #42241 Unexpected ServerOperationException caused by CacheClosedException closed PutAll partial result behavior Partial result will not return ServerOperationException caused by CacheClosedException to the user application. Instead, the user application will get CancelException directly.
08/02/10 #42221 replace(K,V) from client did not put new value in client closed replace(K,V) from client does not put V in client When an entry exists on the server with key K and value null, a replace(K,V) from client replaces the value on the server, but does not put new value V in client cache. However, get(K) from client will fetch the value from server.
07/19/10 #42153 destroying region hung waiting for replies on vm waiting for destroy lock closed Hang creating a region with concurrent region destroy If a member is creating a region while other members are doing a distributed destroy of the same region, that member could hang while creating the region in rare cases.
07/15/10 #42139 Proctor logging should not be Fine Level when hitting low memory thresholds 6.5 closed Log level needs to be upgraded from fine to warning when hitting low memory thresholds. When the memory is chronically low, it should be logged at warning rather than fine level.
06/30/10 #42103 Gateway shutdown hang: GemFireCache.close -> GemFireCache.stopServers => GatewayImpl.stop => PoolImpl.acquireConnection 6.0 closed Hang closing a gateway during network partition If a gateway is closed before the gateway has established a connection to the remote side, closing the gateway may hang if a network partition occurs.
06/28/10 #42091 LinuxSystemStats.processes statistic is incorrect closed The LinuxSystemStats "processes" may be incorrect on RedHat The LinuxSystemStats "processes" may be incorrect on certain versions of RedHat.
06/28/10 #42087 NullPointerException for DistributionManager.getChannelId 6.0 closed NullPointerException thrown by DistributedSystem.connect() It is remotely possible that DistributedSystem.connect() will throw a NullPointerException in DistributionManager.getChannelId(). This can happen when enable-network-partition-detection has been enabled and the connection attempt succeeds but is immediately disconnected by a network partition event.
06/22/10 #42076 member hangs in DistributedSytem.connect() [ClientGmsImpl.findInitialMembers] during network partition 6.0 closed Hang during DistributedSystem.connect() It is remotely possible for DistributedSystem.connect() to hang in ClientGmsImpl.findInitialMembers(). Thread dumps will show another thread named "UDP ucast receiver" blocked in PingWaiter.getPossibleCoordinator(). This can happen if the system is attempting to connect to a locator that was running on a machine that crashed during the connection attempt.
06/14/10 #42058 DiskAccessException while creating diskStore caused by java.io.IOException: Input/output error closed Input/output error when creating a disk store on NFS mount We have observed that when persisting to an NFS mount on redhat 5 we occasionally see this error when creating the persistent store: java.io.IOException: Input/output error.
05/07/10 #41957 Interaction between registerInterest and eviction produces incorrect number of entries in region closed Incorrect number of entries in client region after registerInterest If a client region is configured with eviction, the eviction stats can be inaccurate after a call to registerInterest with InterestResultPolicy.KEYS_VALUES. This will result in evicting the wrong number of entries.
05/04/10 #41941 when a cached object changes from serialized to deserialized its size is not updated closed ObjectSizer not consulted when deserializing objects When using memory-size-based eviction, an object sizer can be provided to ensure that gemfire accurately calculates the memory usage of each object. This object sizer is not being consulted in certain cases when gemfire has the serialized form available for the object and then later deserializes it. Instead, gemfire remembers the serialized size. This can lead to inaccuracy in when gemfire performs memory based eviction.
04/29/10 #41921 Distributed deadlock when gateway startup is concurrent with ops and conserve-sockets=true closed Distributed deadlock when gateway startup is concurrent with ops and conserve-sockets=true In rare circumstances, startup of a gateway could hang when cache operations are concurrently occurring. This can only happen if the gemfire property conserve-sockets=true is set. This race condition has been fixed.
04/28/10 #41917 transportUdp: peer hangs in Flow control (replenishments) processing 6.0 closed hang in FC flow control protocol It is possible under heavy load with disable-tcp=true for the system to lose messages. This sometimes manifests as a hang in the com.gemstone.org.jgroups.protocols.FC protocol. The problem is caused by flaws in UDP message dispatching.
04/26/10 #41889 entry operations hang in waitForReplies from surviving side when network dropped (network partition tests) 6.0 closed operations hang waiting for replies from crashed machines with enable-network-partition-detection=true and IBM JVM Using the IBM 1.5 JVM we have found that invoking Thread.isAlive() or Thread.isDead() on a thread that is reading on a socket connected to a machine that has crashed can hang. This causes operations to block until the OS keepalive timeout expires. We have removed these checks when running in an IBM JVM when network-partition-detection is enabled.
04/20/10 #41865 hang in BucketAdvisor.releasePrimaryLock waiting for replies from member that was previously shutdown closed Hang closing a partitioned region If a member crashes while another member is closing a partitioned region or closing a cache containing a partitioned region, the member doing the close may hang while performing the close operation in rare cases.
04/20/10 #41857 JMX Agent can't use the RMI Registry that is already running. 6.0 closed JMX Admin Agent can now use external RMI Registry when rmi-registry-enabled is set to false. JMX Admin Agent boolean property rmi-registry-enabled indicates whether it should start the RMI Registry or use an external RMI Registry. Default value for this is true and the RMI Registry is started by the JMX Admin Agent. When this property is set to false, the JMX Admin Agent can now use the external RMI Registry.
04/20/10 #41855 While starting JMX Agent, there should be a way to configure the RMI Connector Server port. 6.0 closed Well-defined ports should be configurable for the Agent. Additional properties now to define well-known ports are: (1)rmi-server-port: The port on which the RMI Connector Server should start. (2)membership-port-range: The allowed range of UDP ports for use in forming an unique membership identifier. This range is given as two numbers separated by a minus sign. (3)tcp-port: TCP/IP port number to use in the agent's distributed system These properties are useful for starting the agent behind a firewall.
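The membership-port-range format described in the row above (two numbers separated by a minus sign) can be parsed and validated along the lines of this minimal Java sketch. The class name `PortRange` and the validation rules (ports between 0 and 65535, lower bound not above the upper bound) are assumptions made for illustration; they are not taken from the GemFire source.

```java
/** Hypothetical helper: parses a port range written as "low-high". */
final class PortRange {
    final int low;
    final int high;

    private PortRange(int low, int high) {
        this.low = low;
        this.high = high;
    }

    /** Parses text such as "20000-20100"; whitespace around each number is ignored. */
    static PortRange parse(String text) {
        int dash = text.indexOf('-');
        if (dash <= 0 || dash == text.length() - 1) {
            throw new IllegalArgumentException("Expected low-high but got: " + text);
        }
        // NumberFormatException (an IllegalArgumentException) is thrown for non-numeric parts.
        int low = Integer.parseInt(text.substring(0, dash).trim());
        int high = Integer.parseInt(text.substring(dash + 1).trim());
        if (low < 0 || high > 65535 || low > high) {
            throw new IllegalArgumentException("Invalid port range: " + text);
        }
        return new PortRange(low, high);
    }

    /** Returns true if the port lies inside the range, bounds included. */
    boolean contains(int port) {
        return port >= low && port <= high;
    }
}
```

A firewall rule set can then be checked against the same range the agent is configured with, for example `PortRange.parse("20000-20100").contains(20050)`.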
04/19/10 #41850 giiWhileMultiplePublishing fails when 3 of 10 (replicated) members do not have the entire keySet 6.0 closed message loss with disable-tcp=true It is possible for UDP messages to be lost under heavy load. This is caused by faults in the UDP unicast dispatching code.
04/16/10 #41829 Constraints on valid characters for use in region names should include OQL query string constraints 6.0 closed Querying on region with special characters Queries that refer to a region whose name contains special characters (characters that are allowed in a region name) are not supported. Support was added in 6.5.
03/31/10 #41739 shutDownAllMembers() appears to disconnect admin vm closed ShutDownAll assumptions ShutDownAll will only shutdown members with cache. Locator, admin members are not shut down.
03/29/10 #41726 NPE generated in DataSerializer.readClass if getContextClassLoader returns null 6.0 closed NPE in DataSerializer when using GemFire as an OSGi bundle When GemFire is used as an OSGi bundle, a NPE is thrown (visible only in fine logs)
03/22/10 #41705 Assertion thrown from RegionAdvisor.getBucket() during bucket recovery 6.0 closed Assertion error during bucket recovery After a new member joins, an assertion error could be thrown when we try to restore the redundant copy for a colocated partitioned region.
03/17/10 #41686 async disk region leaks memory 5.7 closed Memory leak with async persistence or overflow With a region configured with asynchronous persistence or overflow, the disk region may create and retain many byte buffers while getting an initial image from another peer. After this point the byte buffers are not released, resulting in excessive memory usage.
03/12/10 #41671 ConcurrentModificationException thrown while iterating over DistributedRegion.getHeapThresholdReachedMembers HashMap 6.0 closed ConcurrentModificationException in DistributedRegion.getHeapThresholdReachedMembers() A ConcurrentModificationException may be thrown when a remote member exceeds Critical memory threshold.
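The failure mode in the row above can be reproduced with plain JDK collections. The sketch below is illustrative only: the class and method names are hypothetical, the modification happens on the iterating thread to keep the outcome deterministic, and switching to `ConcurrentHashMap` is a common remedy rather than a description of the actual GemFire fix.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class IterationDemo {
    /** Adds an entry while iterating; returns true if iteration failed with a CME. */
    static boolean failsWhenModifiedDuringIteration(Map<String, Integer> members) {
        members.put("member-1", 1);
        members.put("member-2", 2);
        try {
            for (String key : members.keySet()) {
                // Stands in for another thread recording a new member mid-iteration.
                members.put("member-3", 3);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```

Passing a `HashMap` makes the iterator fail fast on its next step, while a `ConcurrentHashMap` has weakly consistent iterators that tolerate the concurrent put.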
03/10/10 #41663 tests with concurrent region (region create, region destroy) operations fail with OOME 6.0 closed EventTracker memory leak A small flaw in Region destruction causes the cache to retain references to EventTracker objects that should otherwise be discarded. EventTracker objects record information about which events have been applied to a cache Region. It could impact an application that has a high thread count across the distributed system and which performs a lot of Region destruction operations.
03/06/10 #41628 Need to remove the use of BlowFishJ from GemFire closed GemFire uses BlowFishJ GemFire no longer uses BlowFishJ; it has been replaced by the JDK-supported Blowfish algorithm.
02/22/10 #41568 test hangs while creating cache with ipv6 closed Hang creating a connection while admin console is running If there is an admin console running that is receiving alerts from the gemfire members, and a newly created VM can't connect to the admin console within p2p.handshakeTimeoutMS (60 seconds), the member in trouble could hang during the DistributedSystem.connect call.
02/11/10 #41553 Support to include keys as part of the CQ result set. closed CQ Results to include keys. When a CQ is executed with the "executeWithInitialResults" option, the result set returned does not contain the keys. This makes it harder to correlate the result set with the CQ events generated later, because each CQ event includes the key on which the update happened.
02/09/10 #41539 Shutdown timeout with Distributed system shutdown hook waiting for responses to UpdateAttributes requests (from departed members) 6.0 closed hang during shutdown waiting for responses from departed members It is possible for the product to hang during shutdown, issuing a warning message that it has not received responses to a message from members that have shut down. This is caused by early termination of notification of membership changes in some parts of the product.
02/09/10 #41538 unexpected afterRemoteRegionCrash event refers to vm that should be healthy closed gemfire deadlocks and is kicked out of distributed system It is possible for gemfire to hang while attempting to send an alert to a member that is no longer there. The code sending the alert holds a lock that prevents the member from being able to respond to failure-detection probes or membership changes.
01/25/10 #41509 ClassCastException running an OQL query closed ClassCastException running an OQL query This is fixed in gemfire57_hotfix and is ported to GemFire 6.5.
01/25/10 #41508 hang creating region when peer logs that "Peer has disappeared from view" 6.0 closed hang attempting to connect to departed member If a new member happens to reuse the peer-to-peer port number of a recently departed member it is possible that the product will hang trying to communicate with the departed member after logging "Peer has disappeared from view". This is due to a bookkeeping error in membership management.
01/15/10 #41482 Managed Resources related to regions are not removed even after the region is destroyed/removed/lost. 6.0 closed Clean up managed resources in Agent created for regions in the Cache Managed resources are created in Agent for regions in the cache in a member of a distributed system. These are now removed when a region gets destroyed. Also there are four new notifications available for JMX clients through JMX on the MBeans - SystemMember and CacheVm. The notifications are: (1)gemfire.distributedsystem.cache.created - Creation of a cache on a member (2)gemfire.distributedsystem.cache.closed - Closure of a cache on a member (3)gemfire.distributedsystem.cache.region.created - Creation of a region in a cache on a member (4)gemfire.distributedsystem.cache.region.lost - Removal of a region from a cache on a member
01/13/10 #41473 Members should send notifications for changes in the set of clients. 6.0 closed JMX Notifications for GemFire cache client connections are now sent by JMX Admin Agent to JMX Clients GemFire Client membership information for SystemMember & CacheVm MBeans is available through these notifications: (1) gemfire.distributedsystem.cache.client.joined - When a cache client connects with a cache server (2) gemfire.distributedsystem.cache.client.left - When a cache client gets disconnected gracefully from a cache server (3) gemfire.distributedsystem.cache.client.crashed - When a cache client gets crashed and/or abruptly loses connection with a cache server
01/12/10 #41468 Data consistency between CQ Result Set and the region data. 6.0 closed Data consistency between CQ Result Set and the region data. When a CQ is executed using the executeWithInitialResults option, the CQ can miss events that are applied while the result set is being sent to the client. This is fixed in 6.5 by queuing events that occur during CQ execution on the client and replaying them once the CQ is completely initialized. NOTE: A change may already be reflected in the result set and still be delivered to the CQ listener, resulting in a duplicate event. The client application needs to handle such duplicates (by either ignoring the event or applying it again to the result set).
12/10/09 #41402 data inconsistency PR datastore with functionExecution HA re-execution 6.0 closed Executing write operations on a cache inside a function can cause inconsistency when redundant copies is greater than 1 This issue occurs when the primary on which the function has partially executed is killed after the following sequence: 1> It has distributed the operation to one of the two secondaries. 2> The secondary that received the operation becomes the primary. 3> Re-execution of the function happens on the new primary. The product executes the function on a thread pool. When a cache operation such as destroy is done in the function body, it goes through the normal process of generating an event ID based on the member, thread, and sequence ID on that node. When the retry comes in, it happens on a different node, on a different thread, and has a different sequence ID, so there is no way to detect it as a re-execution of a previous function. This is no different from the case where we put data into a region on a peer that is the primary and kill it: the redundant nodes will be inconsistent and there is nothing that can be done. Prudent practices: 1> Use a redundancy level of 1 for the partitioned region. 2> If 1 is not feasible, use a transaction if you need all-or-nothing behavior for cache operations running inside a function.
11/24/09 #41357 Shutdown hang with ConcurrentModificationException thrown from LogWriterImpl.cleanUpThreadGroups during InternalDistributedSystem disconnect 6.0 closed DistributedSystem disconnect throws ConcurrentModificationException During shutdown it is possible for DistributedSystem.disconnect() to throw a ConcurrentModificationException. This can happen if an administrative member is disconnecting at the same time. The exception is thrown from LogWriterImpl.cleanUpThreadGroups().
11/24/09 #41355 SystemConnectException: Unable to become coordinator of existing group because no view responses were received 6.0 closed locator startup fails When authorization is used or enable-network-partition-detection is enabled it is possible for locator startup to fail with the message "Unable to become coordinator of existing group because no view responses were received".
11/18/09 #41323 peer PR member misses destroy (while performing bucket gii) during rebalancing 6.0 closed Missing CQ event when bucket re-balance in progress This is a missing-event issue. It was first seen in the eventFilterOpt branch and is fixed in the 6.5 release.
11/16/09 #41306 Unexpected DiskAccessException, Data for diskEntry could not be obtained from Disk. closed DiskAccessException when applying ConcurrentMap operations to a region When processing a ConcurrentMap operation GemFire may throw a DiskAccessException. The stack will show that the product is attempting to read information from the disk but that the data could not be located: com.gemstone.gemfire.cache.DiskAccessException: For Region: /testRegion: Data for DiskEntry having DiskId as Oplog ID = -1; Offset in Oplog = 832485; Value Length = 23; UserBits is = 1 could not be obtained from Disk. A clear operation may have deleted the oplogs
11/11/09 #41283 Getting a server's PR entry from a client doesn't update its lastAccessedTime closed lastAccessedTime on an entry does not reflect when the entry was accessed last from any client in the system This is a trade-off unlikely to be ever changed. In order to scale gets, we allow gets to be satisfied from primary or secondary data stores. The lastAccessedTime is maintained locally on the store. So it is likely that key X has been fetched on a secondary recently but has idle timed out on the primary due to load balancing. We do ensure that when an entry expires out on a primary, it is removed from the entire system
11/11/09 #41280 createCQfetchInitialResult fails, Caused by: NPE from CqService.executeCq() 6.0 closed NPE with CQ Execution Reported when CQ is executed. One cause of this bug was unsynchronized code that establishes the identity of a client based on its first connection's port. Fixed in GemFire 6.5 release.
11/05/09 #41269 Unexpected replies processed in bridge servers 6.0 closed warning messages in logs about unexpected replies When using Delta, if one of the members has a region with DataPolicy EMPTY, the following warning message is logged "Received reply from member <memberId> but was not expecting one."
10/28/09 #41248 locator fails to start with GemFireConfigException closed locator fails to start with GemFireConfigException If the system property gemfire.locators is used to configure the locators setting and the property doesn't include the locator being started, startup will fail with a GemFireConfigException {{{ com.gemstone.gemfire.GemFireConfigException: Unable to contact a Locator service. Operation either timed out or Locator does not exist. Configured list of locators is "[frodo:15964]". at com.gemstone.org.jgroups.protocols.TCPGOSSIP.sendGetMembersRequest(TCPGOSSIP.java:183) at com.gemstone.org.jgroups.protocols.PingSender.run(PingSender.java:82) at java.lang.Thread.run(Thread.java:619) }}} As a workaround, make sure that the gemfire.locators property includes the locator being started.
10/13/09 #41206 ClassNotFoundException when DataSerializer attempts to deserialize an object array that has an array component type closed Deserializing a multidimensional array fails If you use DataSerializer to serialize an array whose component type is itself an array, GemFire will throw a ClassNotFoundException when deserializing the array.
10/05/09 #41188 GII recipient could incorrectly ignore an event because it is marked as a possible duplicate closed Crash while creating a replicate region could result in a lost update In rare cases an update may be lost if one cache server is creating a replicated region and another cache server with the same region crashes while applying an update from a client. After the crash, the cache server that just created the region may miss the update.
09/30/09 #41163 A large number of the following AdminException are seen in the Agent logfiles ... 6.0 closed Few occurrences of AdminException about failure to refresh statistics There could be few occurrences of "AdminException: Failed to refresh statistics". These could be ignored if around the same time there is a log statement logged as: "Processing client membership event from <Server_Id> for client with id: <Client_Id> running on host: <Client_Host>". These exceptions could appear in logs until the clean up event received from the server is processed completely.
09/28/09 #41160 closing cache hangs waiting for replies from vm making no attempt to respond closed hang during shutdown with disable-tcp=true It is possible for the product to hang during shutdown when disable-tcp=true. This is caused by faults in the UDP unicast dispatching code.
09/25/09 #41155 Client id is not random enough (getting duplicates) closed Duplicate client cache ID It is possible for two client caches to use the same membership ID, causing servers to become confused and mis-deliver events. The caches must be running on the same machine for this to happen.
09/21/09 #41136 Eviction is not evicting the least recently used entries for normal regions 6.0 closed Entry other than the least recently used was evicted Eviction does not always evict the least recently used entry.
09/21/09 #41131 Hang with mix of gets and puts using same key with Partitioned region closed Hang with concurrent operations on the same key with statistics enabled In rare cases, concurrent operations on the same key in partitioned region can result in a hang if statistics are enabled.
09/17/09 #41117 InternalGemFireError: Assert thrown from partitioned.DestroyMessage during PR invalidate region 6.0 closed invalidateRegion() is not supported for PartitionedRegions PartitionedRegion now supports invalidateRegion() operation.
09/14/09 #41097 EnforceUniqueHostStorageAllocation flag prevents moving a bucket between two VMs on the same host 6.0 closed gemfire.EnforceUniqueHostStorageAllocation setting has an unintended impact on partitioned region rebalancing Setting gemfire.EnforceUniqueHostStorageAllocation prevents buckets from moving from one VM to another on the same host during a rebalance operation.
09/14/09 #41096 Enabling both eviction and expiration in a partitioned region leaves entries in the cache. 6.0 closed Partition Region eviction may prevent entries from expiring In prior versions of GemFire, entries would not get expired on partition region secondaries. This would occur if eviction of an entry in a partition region primary occurred before expiration, and the eviction action was "LOCAL_DESTROY".
09/14/09 #41093 Transactional entry-create in region destroyed within same transaction is unexpectedly processed by CacheListener and TransactionListener. 6.0 closed Transactional load does not cause conflict A load done to satisfy a get operation does not cause a CommitConflictException even though the same entry is modified by another thread.
09/14/09 #41091 Missing primary detected after member forcefully disconnected from DS (underlying InternalGemFireError: Trying to clear a bucket region that was not destroyed) 6.0 closed Redundancy not satisfied after network partition If network partition detection is enabled, in rare cases gemfire can fail to restore redundancy after the partition.
09/11/09 #41085 The LRU list can get into a state where it won't clean up properly 5.7 closed Memory leak with LRU eviction In a region with eviction configured, if eviction never actually occurs, but many destroy operations are performed on the region, some metadata for the destroyed entries may be retained, resulting in excess heap usage. This can also affect gemfire gateways.
09/11/09 #41084 GatewayEventImpl.isUpdate uses a transient variable in its determination 5.7 closed memory leak can occur in gateway queues that enable conflation In versions prior to 6.5, a memory leak could occur in VMs containing gateway queues that have conflation enabled and whose events overflow to disk. This leak was fixed in 6.5.
09/08/09 #41076 Memory leak of EntryExpiryTasks in BucketRegion.pendingSecondaryExpires closed Memory leak in partition region secondaries When an entry in a partition region secondary is destroyed, the expiration task associated with the entry is not released until the secondary switched to being the primary.
09/08/09 #41075 accessor vms hang in waitForPrimaryMember after a dataStore is forcefully disconnected from the DS 6.0 closed hang caused by alert listener notification It is possible for GemFire to deadlock trying to notify an admin member of an alert. Thread dumps will show a thread in ManagerLogWriter.notifyAlertListeners() with other threads waiting to lock the membership view.
09/07/09 #41060 JMX operation SystemMemberCache.getRegionSnapshot fails completely if creating snapshot for even one of the regions fails. 6.0 closed Occurrence of an exception in admin agent while retrieving region information would prevent the retrieval of region information for other regions on the member. The admin agent logs failures encountered while retrieving information about regions in a cache, and continues with the retrieval of information of the other regions on the member. In versions of GemFire Enterprise prior to 6.5, this failure would prevent the admin agent from retrieving information about all regions present in a cache. This behavior was most commonly seen when invoking the SystemMemberCache.getRegionSnapshot MBean operation.
09/04/09 #41058 Missing requirement that agent needs instantiators on its classpath 5.7 closed Admin Agent in GFE 5.7 or older needs instantiators on its classpath This is specific to GemFire versions 5.7 & older. Consider the case where: (1) The application has custom classes that it uses to store data into the cache. (2) These custom classes implement the DataSerializable interface & provide their own instantiator. When the JMXAgent starts, it requires the application-specific classes on its classpath. The exception to this is when the agent is started before the cacheserver(s), in which case a warning is logged about the absence of application-specific classes (in the agent classpath) but agent operations continue.
08/30/09 #41025 hasDelta/toDelta are invoked on the client side even if Delta Propagation property is turned off 6.0 closed Client sends delta even if delta-propagation=false in the distributed system Clients have no knowledge of whether delta-propagation is turned on or off on the server, and attempt to send deltas during updates. This does not cause any data errors. The server handles the incoming delta bytes and does not propagate the update as a delta.
08/27/09 #41022 Cacheserver ignores log-file property from gemfire.properties file 6.0 closed Cacheserver script ignores log-file property When starting a cache server using the cacheserver script, the log-file property in a gemfire.properties was being ignored. Now, the search order for the log-file property is: 1. command line arg 2. gemfire.properties 3. cacheserver.log default.
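The search order in the row above (command line argument, then gemfire.properties, then the cacheserver.log default) amounts to a simple fallback chain. This is a minimal Java sketch of that order; the class name `LogFileResolver` and the decision to treat blank values as unset are assumptions for illustration, not the cacheserver script's actual implementation.

```java
import java.util.Properties;

final class LogFileResolver {
    static final String DEFAULT_LOG_FILE = "cacheserver.log";

    /**
     * Resolves the log file in this order: command-line value,
     * then the log-file entry from gemfire.properties, then the default.
     */
    static String resolve(String commandLineValue, Properties gemfireProperties) {
        if (commandLineValue != null && !commandLineValue.trim().isEmpty()) {
            return commandLineValue.trim();
        }
        String fromFile = gemfireProperties.getProperty("log-file");
        if (fromFile != null && !fromFile.trim().isEmpty()) {
            return fromFile.trim();
        }
        return DEFAULT_LOG_FILE;
    }
}
```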
08/27/09 #41021 New EventTrackers are not tracked properly by the ExpiryTask 6.0 closed A memory leak involving event trackers. The cache uses event trackers to ensure that we can detect duplicates coming in from a single thread (events that may have been retransmitted due to primary servers going down). These trackers are supposed to expire after a specified idle timeout period. In 6.0, the expiration task was not removing these event trackers, leading to a memory leak. This is an issue for long-running systems where publishing threads keep changing over the lifetime of the system. This has been addressed in 6.5.
08/26/09 #41014 JMX Agent error reading mcast-port property 6.0 closed Leading and trailing whitespace in property values would prevent a cache server or agent process from starting. Preceding or trailing spaces in the values in the gemfire.properties or the agent's properties files could result in exception preventing the process from getting launched. Now all values are trimmed of leading & trailing white spaces.
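The effect described in the row above is easy to reproduce with the JDK alone: `java.util.Properties.load` keeps trailing whitespace in a value, so a numeric setting followed by spaces fails to parse. The sketch below trims every value after loading. The class name `TrimmedProperties` is hypothetical and the code only illustrates the technique; it is not GemFire's own loader.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class TrimmedProperties {
    /** Loads properties from text and trims leading and trailing whitespace from every value. */
    static Properties loadTrimmed(String text) {
        Properties raw = new Properties();
        try {
            raw.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked to keep the API simple.
            throw new UncheckedIOException(e);
        }
        Properties trimmed = new Properties();
        for (String name : raw.stringPropertyNames()) {
            trimmed.setProperty(name, raw.getProperty(name).trim());
        }
        return trimmed;
    }
}
```

With the input `mcast-port=10334` followed by two spaces, calling `Integer.parseInt` on the untrimmed value throws a `NumberFormatException`, while the trimmed value parses normally.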
08/21/09 #41001 DataSerializer.register throws the wrong exception 6.0 closed DataSerializer.register throws incorrect exception type If the id specified for a DataSerializer type clashes with that of a type already registered with the data serialization framework, GFE throws an IllegalArgumentException instead of an IllegalStateException as documented. The exception message, though, correctly described the reason for this exception and also names the class that is also registered.
08/19/09 #40996 PartitionedRegion#getEntry can access an entry before it is created closed Early escape of Region.Entry from CacheWriter It was possible for the cache writer to get a reference to a Region.Entry before it was initialized. A call to getEntry now returns null.
08/18/09 #40985 Possible infinite loop in GrantorRequestProcessor.startElderCall closed Hang while closing a global region In rare conditions, closing a global region could result in a hang. This may cause other members to hang trying to lock entries while updating them.
08/17/09 #40977 Entries are lost in PartitionedRegions by cycling dataStore VMs 6.0 closed During HA event, destroy operation failed with EntryNotFoundException When a destroy operation is done on a PartitionedRegion and the primary member for that key crashes, an EntryNotFoundException may be thrown.
08/11/09 #40955 Iterating on PR local data invokes PartitionResolver closed Improvements to partition resolver PartitionResolver is now invoked only once per operation. Iterating over local data does not invoke the resolver. Iterators from peer accessors do not invoke the resolver in the accessor.
08/11/09 #40953 Region javadoc for putAll states it is unsupported on PR 5.7 closed Region javadoc for putAll states it is unsupported on PartitionedRegions The javadoc for putAll states: {{{ throws UnsupportedOperationException If the region is a partitioned region }}} This is a mistake, putAll has been supported on all region types since the GemFire 5.7 release. Customers using GemFire 5.7 or later are encouraged to use putAll on partitioned regions.
08/06/09 #40943 Rebalancing colocated regions moves fewer buckets than expected 6.0 closed Rebalancing colocated regions moves less data than expected Due to a bug in the rebalancing algorithm, GemFire does not move data during a rebalance even though it appears there is space for the data. This bug only appears when using colocated regions. GemFire erroneously compares the total size of data to be moved for all of the colocated regions with the local-max-memory setting of each individual region. If the total amount of data is greater than the remaining capacity of the region, GemFire will not move the data. Increase the local-max-memory of all of the regions.
08/04/09 #40932 GemFire cannot serialize a String whose logical length is < 0xFFFF, but whose utf-8 encoded length is > 0xFFFF closed GemFire cannot serialize a String whose logical length is < 0xFFFF, but whose utf-8 encoded length is > 0xFFFF If you have a string with some multibyte characters that is less than 0xFFFF characters long, but will be more than 0xFFFF bytes when serialized using UTF, a UTFDataFormatException is thrown when serializing the string with GemFire.
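The gap between character count and encoded byte count in the row above can be seen with the JDK's own `DataOutputStream.writeUTF`, which stores the encoded length in two bytes and throws `UTFDataFormatException` beyond 65535 bytes. The sketch below demonstrates that JDK limit only; the class and method names are hypothetical, and it does not show how GemFire's DataSerializer is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

final class UtfLengthDemo {
    /** Returns true if DataOutputStream.writeUTF rejects the string as too long. */
    static boolean writeUtfRejects(String value) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(value);
            return false;
        } catch (UTFDataFormatException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a string of the given length made only of U+20AC, which encodes to 3 bytes. */
    static String euros(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append('\u20AC');
        }
        return sb.toString();
    }
}
```

A 30000-character string from `euros(30000)` is well under 0xFFFF characters but encodes to 90000 bytes, so `writeUtfRejects` returns true for it.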
07/29/09 #40916 IllegalArgumentException thrown if multiple regions configured using same EvictionAttributes 5.0 closed IllegalArgumentException thrown if multiple regions configured using same EvictionAttributes If a single instance of EvictionAttributes was shared among multiple region creations, an IllegalArgumentException was thrown. This is now fixed.
07/24/09 #40906 socket-buffer-size can not exceed 16,777,215 5.7 closed GemFire API allows socket-buffer-size to be configured to values greater than Java allows. Setting the "socket-buffer-size" to a value greater than 16,777,215 will trigger an exception: {{{ java.lang.IllegalStateException: tcp message exceeded max size of 16,777,215 }}} Do not set the "socket-buffer-size" to a value greater than 16,777,215.
07/17/09 #40886 MulticastSocket.setInterface call fails on Windows Server 2008 closed GemFire cannot create a multicast socket on Windows Server 2008, Windows Vista, or Windows 7 Due to complications related to JGroups bug JGRP-777, GemFire throws an exception with the root cause stating "An operation was attempted on something that is not a socket" when configured to use multicast for membership discovery on Windows Server 2008. {{{ Caused by: java.net.SocketException: An operation was attempted on something that is not a socket at java.net.PlainDatagramSocketImpl.socketSetOption(Native Method) at java.net.PlainDatagramSocketImpl.setOption(PlainDatagramSocketImpl.java:299) at java.net.MulticastSocket.setInterface(MulticastSocket.java:420) at com.gemstone.org.jgroups.protocols.UDP.createSockets(UDP.java:631) at com.gemstone.org.jgroups.protocols.UDP.start(UDP.java:502) at com.gemstone.org.jgroups.stack.Protocol.handleSpecialDownEvent(Protocol.java:874) ... 78 more }}} Use locators instead of multicast for discovery.
07/16/09 #40883 fromDelta called twice for same update with CQs and HA closed Delta callback method fromDelta() may get invoked twice for same update *Fixed in 6.6* In delta propagation feature, it is possible that a vm may invoke fromDelta() twice on its value for the same event. The second invocation may result in generating a new value which may differ from the actual value which triggered the event. This could be avoided by taking this into account while writing the implementation of Delta so that the second invocation of fromDelta() for the same event becomes a no-op.
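One way to make a second fromDelta() invocation a no-op, as the row above suggests, is to carry a version number in the delta and skip any delta that is not newer than the last one applied. The Java sketch below is an assumption-laden illustration: `Counter`, its fields, the version scheme (versions start at 1), and the helper methods are all hypothetical, and the class only mirrors the shape of a fromDelta method rather than implementing the GemFire Delta interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical value whose delta means "add N"; applying it twice would corrupt the total. */
final class Counter {
    private long total;
    private long lastAppliedVersion; // 0 means no delta applied yet

    long total() {
        return total;
    }

    /** Same shape as a fromDelta method: reads a version and an increment. */
    public void fromDelta(DataInput in) throws IOException {
        long version = in.readLong();
        long increment = in.readLong();
        if (version <= lastAppliedVersion) {
            return; // already applied, so a repeated invocation changes nothing
        }
        total += increment;
        lastAppliedVersion = version;
    }

    /** Helper: encodes a delta the way a matching toDelta method would. */
    static byte[] encodeDelta(long version, long increment) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(version);
            out.writeLong(increment);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Helper: feeds encoded delta bytes to fromDelta. */
    void apply(byte[] delta) {
        try {
            fromDelta(new DataInputStream(new ByteArrayInputStream(delta)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Applying the same encoded delta twice leaves the total unchanged after the first application, which is the property the workaround relies on.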
07/16/09 #40882 In RemoteGfManagerAgent, exceptions occurred while connecting to the DS and handling joined members should be handled properly. 5.8 closed Before failing to connect in distributed system due to missing license information on a member, the Agent should try every member of a distributed system If there is more than one member running and the agent fails to retrieve license information for the distributed system from the first member, the agent tries the next member. In addition, failure to retrieve the license information from one of the members is now logged at both the member and the agent.
07/13/09 #40872 Incorrect mbean descriptor in JMX AdminAgent 6.0 closed Incorrect descriptor JMX operation SystemMember.manageStat removed Removed non-existing operation descriptor manageStat that was described for SystemMember MBean.
06/26/09 #40835 Locators fail to start on Windows in Pure Java Mode 5.7 closed Locators fail to start on Windows in Pure Java Mode A locator cannot be started in pure Java Mode by using the following command-line: gemfire start-locator -port=8888 The locator.log has the following message: '" true true "' is not a valid IP address for this machine. Use the following command to workaround the issue by specifying values for the bind address, hostname for clients, and logfile. gemfire start-locator -address=%bindaddr% -hostname-for-clients=locator_%bindaddr% -Dgemfire.log-file=%logfile% where bindaddr is a suitable bind address for the machine and logfile is any filename other than "locator.log"
06/16/09 #40808 CQ doesn't send update events in case of eviction (overflow to disk). closed CQ Events with update on evicted value When an update happens on a region entry whose value has been written to disk, the CQ applies the query condition only to the new value; because the old value is not available in that case, the query condition is not applied to it. The issue is seen only if the event was not cached before. Fixed in 6.5.
06/12/09 #40797 Events due to eviction on PR are not firing closed CacheListener Events due to eviction on PartitionedRegions do not get invoked This bug impacts a region configured to be a PartitionedRegion with a listener and eviction. The expected behavior is that a listener would invoke the void afterDestroy(EntryEvent e) method whenever an entry was evicted from the cache. While eviction does take place, the listener event is not triggered. All other listener events do behave correctly though. Use Distributed Regions with a manual partitioning scheme.
06/11/09 #40790 region-time-to-live and region-idle-time have not been implemented for PR closed 'region-time-to-live' and 'region-idle-time' attributes have no effect on Partitioned Regions Distributed regions support 'region-time-to-live' and 'region-idle-time' expiration attributes for their entries. These expiration attributes are not supported in partitioned regions and are ignored.
06/03/09 #40757 HAClientQueues (not persistent) do not get deleted when client disconnects 5.7 closed Client queues on server may cause the server to lock up or run out of memory In cases where a client disconnected soon after connecting to a server, the client's queue was not cleaned up. If this happened frequently, these queues would cause the server to run out of memory, or the queues would fill up with events, causing the server to lock up while trying to insert events into the queue. This has been fixed in GemFire 6.1.
06/02/09 #40751 A RuntimeException from a user's toData method causes a hang 6.0 closed A RuntimeException from a user's toData method can cause a distributed member to hang If a runtime exception is thrown from the toData method of a user's DataSerializable object while doing a distributed put, GemFire will become hung. Code toData methods defensively to catch RuntimeException and handle it in an alternate way.
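The workaround for #40751 can be sketched roughly as follows. This is only an illustration under stated assumptions: the Quote class, its fields, and the MISSING_PRICE sentinel are hypothetical, and the class does not actually implement GemFire's DataSerializable so that the sketch compiles with the JDK alone. The idea shown is to do any work that might throw a RuntimeException before the first byte is written, and to substitute an alternate value instead of letting the exception escape from toData.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value class. In an application it would implement
// GemFire's DataSerializable; that is left out so the sketch compiles alone.
class Quote {
    static final int MISSING_PRICE = -1; // sentinel chosen for this sketch

    private final String symbol;
    private final Integer price; // may be null

    Quote(String symbol, Integer price) {
        this.symbol = symbol;
        this.price = price;
    }

    public void toData(DataOutput out) throws IOException {
        int priceToWrite;
        try {
            // Risky step: unboxing a null Integer throws NullPointerException.
            priceToWrite = price;
        } catch (RuntimeException e) {
            // Alternate handling: fall back to a sentinel instead of
            // letting the exception escape from toData.
            priceToWrite = MISSING_PRICE;
        }
        // Bytes are written only after the risky work has succeeded or
        // been replaced, so no partial record is produced.
        out.writeUTF(symbol);
        out.writeInt(priceToWrite);
    }

    // Helper for trying the sketch: serialize, then read the price back.
    static int roundTripPrice(Quote q) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            q.toData(new DataOutputStream(buf));
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
            in.readUTF(); // skip the symbol
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whether a sentinel, a default object, or some other fallback is appropriate depends on the application; the bug note only asks that the RuntimeException be caught and handled in an alternate way.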
06/02/09 #40749 Using multiple GII providers with a persistent region can resurrect destroyed entries 6.0 closed Using multiple GII providers with a persistent region can resurrect destroyed entries When more than one member with the 'provider' attribute set to true is present, a new member coming up does a union GII from all of the providers in addition to what is on disk. The result is that if there are entries on disk which have been destroyed in the providers, the new member will resurrect those destroyed entries.
05/27/09 #40735 Disk recovery fails if using -Duser.language=ja 6.0 closed Disk Regions do not function correctly if the locale's language is "ja", such as when -Duser.language=ja Due to an error in how the filename's prefix is handled by the localization code, GemFire will fail to find a disk persistence file even if it exists at the path specified by the user's configuration. The code works correctly for all user languages except Japanese ("ja"). Setting the Java system property user.language to English via the command line will avoid this problem. java -Duser.language=en ...
05/27/09 #40731 gateways are limited to 10G of persistence/overflow 6.0 closed gateways are limited to 10G of persistence/overflow In 6.0 gateways were changed to no longer roll oplogs. Gateways always have a single directory whose dir-size is the default of 10G. Note that dir-size only applies to oplogs but that is all a gateway has now since it never rolls. Once the oplogs on a gateway reach 10G the next write will fail with an out of disk space error.
05/21/09 #40722 Partitioned Region expiration does not distribute events 5.8 closed Destroy and invalidate events not sent to clients or cache listeners in a partition region When a Region with DataPolicy.PARTITION is configured with Eviction enabled, and with EvictionAction set to either DESTROY or INVALIDATE, an AFTER_DESTROY or AFTER_INVALIDATE event is not sent to cache client, or CacheListeners.
05/21/09 #40718 HeapLRU with ObjectSizer will expose CachedDeserializable instances to user code 6.0 closed Configuring HeapLRU with an ObjectSizer will expose CachedDeserializable instances to application code If you configure a HeapLRU and an ObjectSizer for it, GemFire will mistakenly pass instances of its internal CachedDeserializable class to the customer's implementation of ObjectSizer.sizeof(Object). Customers can work around this bug by adding the following code to any implementation of ObjectSizer.
{{{
import com.gemstone.gemfire.cache.util.ObjectSizer;
import com.gemstone.gemfire.internal.cache.lru.Sizeable;

public class MyObjectSizer implements ObjectSizer {
  public int sizeof(Object o) {
    // Internal GemFire wrappers already know their own size.
    if (o instanceof Sizeable) {
      return ((Sizeable)o).getSizeInBytes();
    }
    // customer's sizeof code goes here
  }
}
}}}
05/20/09 #40714 Registering a function on a Java client changes the behavior when executing an instance of the function 6.0 closed Incorrect function may be executed in Execution.execute(Function f) API In prior versions of GemFire, the Execution.execute(Function f) API resulted in the execution of a function other than the one supplied as a parameter if the ID of this instance matched that of a function already registered on the server. The registered function was executed instead.
05/20/09 #40713 LIFO Eviction APIs should not be visible to customers 5.7 closed LIFO Eviction APIs should not be part of the public API The following methods and constants were unintentionally exposed as part of the GemFire API. They are not intended for customer use and should be considered strongly deprecated. {{{ Package: com.gemstone.gemfire.cache EvictionAttributes#createLIFOEntryAttributes EvictionAttributes#createLIFOMemoryAttributes EvictionAlgorithm.LIFO_ENTRY EvictionAlgorithm.LIFO_MEMORY EvictionAlgorithm#isLIFOEntry EvictionAlgorithm#isLIFOMemory EvictionAlgorithm#isLIFO }}} Do not write code that makes use of these methods or constants.
05/06/09 #40674 FunctionService.onServers() does not execute on all servers but on the servers the pool is currently connected to 6.0 closed FunctionService.onServers() API may not execute on all servers in a pool In prior versions of GemFire, the FunctionService.onServers('poolName') API did not ensure that the function was executed on all servers configured in the pool. At the time the function execution was initiated, the pool might not have had an active connection to one or more of its servers. GemFire 6.1 fixes this and ensures that connections to all servers configured in the pool are active. If an attempt to create a connection fails, the function execution fails.
05/05/09 #40668 ResultsBag fromData() throws NPE. 6.0 closed NPE in ResultsBag fromData() A NullPointerException was thrown when ResultsBag.fromData() was called. This is fixed in 6.5 and has also been ported to the gemfire601_maint branch.
05/04/09 #40666 During start up, a process may try to connect to other processes even after it knew that those processes were gone closed DistributedSystem attempts to connect to members that have left It is possible that the DistributedSystem will become confused and attempt to connect to members that have left the system while it was starting up. When this happens you will see the departed members admitted into membership in "P2P message reader" threads. This happens when the departing members see the new member and connect to it, causing them to be "surprise members" to the new process.
04/28/09 #40648 oplog rolling fails reading with Bad file descriptor 6.0 closed Oplog roller fails with "Bad file descriptor" If oplog rolling is enabled and overflow to disk is configured then a small race condition exists in which the roller may fail causing the region to be closed. The following is an example failure: {{{ [info 2009/04/28 10:52:22.800 PDT <main> tid=0x1] Closing oplog early since it is empty. It is for region /myReg and has oplog#22 [error 2009/04/28 10:52:22.800 PDT <OplogRoller /myReg for oplog 22> tid=0xf] A DiskAccessException has occurred while writing to the disk for region /myReg. The region will be closed. com.gemstone.gemfire.cache.DiskAccessException: For Region: /myReg: Failed reading from "/export/jade1b1/users/darrel/gfbuild/BACKUP_myReg_22". oplogID = 22 Offset being read=10,300,824 Current Oplog Size=10,400,832 Actual File Size =10,400,832 IS ASYNCH MODE =false IS ASYNCH WRITER ALIVE=false, caused by java.io.IOException: Bad file descriptor at com.gemstone.gemfire.internal.cache.Oplog.basicGetForRoller(Oplog.java:3727) at com.gemstone.gemfire.internal.cache.Oplog.getBytesAndBitsForSwitchingEntry(Oplog.java:2356) at com.gemstone.gemfire.internal.cache.ComplexDiskRegion$OplogRoller.rollBackup(ComplexDiskRegion.java:919) at com.gemstone.gemfire.internal.cache.ComplexDiskRegion$OplogRoller.roll(ComplexDiskRegion.java:1157) at com.gemstone.gemfire.internal.cache.ComplexDiskRegion$OplogRoller.run(ComplexDiskRegion.java:1215) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.IOException: Bad file descriptor at java.io.RandomAccessFile.seek(Native Method) at com.gemstone.gemfire.internal.cache.Oplog.basicGetForRoller(Oplog.java:3694) }}} Setting the system property "gemfire.disk.KEEP_EMPTY_OPLOGS" to "true" will prevent this bug.
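As a rough sketch of the workaround for #40648, the system property can be set either on the java command line (-Dgemfire.disk.KEEP_EMPTY_OPLOGS=true) or programmatically. The class and method names below are hypothetical, and it is an assumption of this sketch that the property has to be set before the cache and its disk regions are created.

```java
class KeepEmptyOplogs {
    // Property name taken from the bug note above.
    static final String PROPERTY = "gemfire.disk.KEEP_EMPTY_OPLOGS";

    // Set the property programmatically; equivalent to passing
    // -Dgemfire.disk.KEEP_EMPTY_OPLOGS=true on the java command line.
    static void enable() {
        System.setProperty(PROPERTY, "true");
    }

    public static void main(String[] args) {
        enable();
        // Assumption: the cache and its disk regions are created after this point.
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```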
04/24/09 #40642 HeapLRUStatistics.heapUsage does not represent the amount of heap currently in use (in bytes) 6.0 closed HeapLRUStatistics.heapUsage stat removed HeapLRUStatistics.heapUsage stat has been removed, please refer to the ResourceManager stats instead.
04/23/09 #40635 socket/thread leak with conserve-sockets=false closed Thread and Socket leak when conserve-sockets=false When configured with conserve-sockets=false, GemFire may accumulate idle threads that have names similar to this: P2P message reader for ent(42524):2331/2296 SHARED=true ORDERED=false UID=1371 These threads and the sockets they are reading from are created to transmit message replies. They may accumulate if they were created for sole use by a particular thread and that thread no longer exists.
04/22/09 #40632 PR expiration with localDestroy fails with InternalGemFireError closed Using localDestroy as the expiration action for a PR throws InternalGemFireError Setting the expiration action of localDestroy on a partitioned region causes an InternalGemFireError to be logged. No expiration happens. Starting with version 6.0, setting the expiration action to localDestroy will throw an error on region creation. Use the destroy action instead. Don't use local destroy, use destroy instead. This expires all copies of the entry.
04/16/09 #40603 suspect strings: ClassCastException thrown from EvictionAttributesImpl.fromData() => ObjectInputStream.defaultReadFields() 6.0 closed ClassCastException thrown from EvictionAttributesImpl ClassCastException thrown from EvictionAttributesImpl. This is a JDK issue reported in 6.0. It has been addressed in 6.5 by moving to a later JDK version.
04/08/09 #40551 RegionMembershipListener.initialMembers is not invoked when added using AttributesMutator closed A RegionMembershipListener added after a Region is created does not have its initialMembers() method invoked If you add a RegionMembershipListener cache listener to a Region after the Region has been created, the listener will never have its initialMembers() method invoked. Only listeners added through cache.xml or through RegionAttributes at the time the Region is created will have their initialMembers() method invoked.
04/07/09 #40545 gii receives no response from source vm 6.0 closed hang creating region with disable-tcp set to true A bug in the startup code in the fragmentation protocol used for UDP messaging was found to cause a hang in region creation when the distributed system property disable-tcp is set to true. The hang is caused by a race condition that causes the member that is creating the region to ignore a message from a member that has been selected to send the contents of the region.
04/06/09 #40523 InternalGemFireException: While calling refresh() causedBy: javax.management.InstanceNotFoundException 6.0 closed InternalGemFireException received when invoking SystemMemberCache.getRegion(..) JMX API on the AdminAgent on IBM J9 JVM This is caused by a known issue in the IBM JVM. It may not occur consistently. The solution is to turn off JIT compilation for RegionStatisticsResponse.create(). Turn off JIT compilation for com.gemstone.gemfire.internal.admin.remote.RegionStatisticsResponse.create()
04/03/09 #40509 There is no error given when we try starting the agent specifying an incorrect path for its property-file. 6.0 closed Admin agent would silently apply default properties if it could not find its properties file. Admin agent used to silently apply default properties if it could not find its properties file. Now the agent adds a log entry when it applies default values for its configuration properties. The logged string is: "Using default configuration because property file was not found".
04/01/09 #40500 JMX Agent startup fails with ipv6 enabled 6.0 closed JMX agent fails to start when using IPv6 This problem occurs when using the default rmi-bind-address, "localhost", and IPv6 on a machine where a call to java.net.InetAddress.getLocalHost() returns an IPv6 link-local address. This is primarily a Windows issue because its IPv6 implementation requires a link-local address to also be created when a machine is configured to support IPv6, and the order in which these addresses are created varies from machine to machine. This error manifests as an AgentImpl$StartupException. {{{ A quick synopsis of the stack is provided below: com.gemstone.gemfire.admin.jmx.internal.AgentImpl$StartupException: Failed to start RMI service at com.gemstone.gemfire.admin.jmx.internal.AgentImpl.startRMIConnectorServer(AgentImpl.java:1141) at com.gemstone.gemfire.admin.jmx.internal.AgentImpl.start(AgentImpl.java:263) at hydra.AgentHelper.startAgent(AgentHelper.java:129) at admin.AdminTest.startAgentTask(AdminTest.java:120) ... Caused by: java.io.IOException: Cannot bind to URL [rmi://:26120/jmxconnector]: javax.naming.NoPermissionException [Root exception is java.rmi.ServerException: RemoteException occurred in server thread; nested exception is: java.rmi.AccessException: Registry.Registry.bind disallowed; origin /fe80:0:0:0:21a:a0ff:fe27:ddbe is non-local host] ... Caused by: javax.naming.NoPermissionException [Root exception is java.rmi.ServerException: RemoteException occurred in server thread; nested exception is: ... Caused by: java.rmi.ServerException: RemoteException occurred in server thread; nested exception is: java.rmi.AccessException: Registry.Registry.bind disallowed; origin /fe80:0:0:0:21a:a0ff:fe27:ddbe is non-local host ...
Caused by: java.rmi.AccessException: Registry.Registry.bind disallowed; origin /fe80:0:0:0:21a:a0ff:fe27:ddbe is non-local host }}} Specify an RMI bind address using the rmi-bind-address property: ./agent start rmi-bind-address=<ipv6 address> or in a gemfire.properties file rmi-bind-address=<ipv6 address> Second workaround: Edit the Windows hosts file, usually located in c:\WINDOWS\system32\drivers\etc\hosts, to map a literal address to the hostname. Note that entries are required for both IPv4 and IPv6 on machines that support both protocols (even for non-GemFire use). Create two entries: [ipv4 literal] [fully qualified host] [optional short hostname] [ipv6 literal] [fully qualified host] [optional short hostname] Example: 15.168.12.81 mymachine.gemstone.com mymachine fdf0:7c6f:eda8:9449::19 mymachine.gemstone.com mymachine
03/30/09 #40475 Distribution Locator Properties section in GFE SysAdminGuide might be confusing 6.0 closed Sys Admin Guide has incorrect Distribution Locator syntax System Administrator's Guide -> chapter 8 -> section 'Distribution Locator Properties': The table of properties and the example below it show the properties required to use locators incorrectly. The locators property should be configured as: locators=host1[port1],host2[port2]
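To make the corrected syntax for #40475 concrete, the sketch below checks a locators value against the host[port] form shown above and splits it into host and port pairs. It is an illustration only: the LocatorList class and its parse method are hypothetical and are not part of GemFire, the port numbers in the usage line are arbitrary, and the check covers only the form given in this note, not every variant the product may accept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a GemFire class.
class LocatorList {
    // One entry: a host name or address, then the port in square brackets.
    private static final Pattern ENTRY =
        Pattern.compile("([^\\[\\],]+)\\[(\\d+)\\]");

    // Splits "host1[port1],host2[port2]" into {host, port} pairs and
    // rejects entries that are not in host[port] form.
    static List<String[]> parse(String value) {
        List<String[]> result = new ArrayList<>();
        for (String entry : value.split(",")) {
            Matcher m = ENTRY.matcher(entry.trim());
            if (!m.matches()) {
                throw new IllegalArgumentException(
                    "Not in host[port] form: " + entry);
            }
            result.add(new String[] { m.group(1), m.group(2) });
        }
        return result;
    }
}
```

For example, LocatorList.parse("host1[10334],host2[10335]") yields two pairs, while a value such as "host1:10334" is rejected.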
03/27/09 #40472 java.net.SocketException: Address family not supported by protocol family: bind encountered while starting bridge server 6.0 closed java.net.SocketException: Address family not supported by protocol family: bind encountered while starting bridge server When starting a GemFire cache server under Microsoft Windows, GemFire throws an exception when it tries to bind a server socket to an IPv6 address. {{{ java.net.SocketException: Address family not supported by protocol family: bind at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at com.gemstone.gemfire.internal.cache.tier.sockets.AcceptorImpl.<init>(AcceptorImpl.java:336) at com.gemstone.gemfire.internal.cache.BridgeServerImpl.start(BridgeServerImpl.java:276) }}} This is caused by a JVM bug, #6230761, that causes Java "New I/O" sockets to not work with IPv6 on Microsoft Windows machines. GemFire 6.0 detects this condition and automatically sets max-threads to zero after issuing this warning: {{{ Ignoring max-threads setting and using zero instead due to Java bug 6230761: NIO does not work with IPv6 on Windows. See GemFire bug #40472 }}} To work around this problem, disable the thread pool in the GemFire server by setting max-threads to zero.
03/25/09 #40461 Suspect string DiskAccessException caused by ArrayIndexOutOfBoundsException 6.0 closed com.gemstone.gemfire.cache.DiskAccessException thrown when using persistent regions Previous versions of GemFire (6.0 and earlier) occasionally threw an ArrayIndexOutOfBoundsException wrapped as a DiskAccessException. This came from the JDBM code that was used in conjunction with transaction logging in the persistence layer. The use of JDBM has been completely removed in 6.5.
03/23/09 #40442 PartitionedRegion ops hang in waitForPrimary member after NPE thrown from BucketAdvisor.sendProfileUpdate() 6.0 closed NullPointerException from Thread.holdsLock with JRockit With the JRockit VM, we have on rare occasions seen NullPointerExceptions from the java.lang.Thread.holdsLock method.
03/12/09 #40390 Stats sampling should occur implicitly when a JMX client connects to the AdminAgent 5.7 closed On start up, JMX Admin Agent now immediately connects to the GemFire Distributed System and initializes Member & Statistics MBeans The default value of the JMX Admin Agent boolean property 'auto-connect' has been changed to 'true'. Hence, on start up, the JMX Admin Agent now immediately connects to the GemFire Distributed System and initializes MBeans for existing GemFire Members. In addition, while initializing a Member MBean, the associated Statistics MBeans are also initialized.
03/12/09 #40389 PR eviction to disk degrades with number of buckets closed 6.5 oplog new design 6.5's oplog design resolved this issue. All the buckets shared the same oplog file.
03/10/09 #40369 Hang while creating region during StateFlushOperation.flush 6.0 closed hang creating a region with scope distributed-no-ack and using disable-tcp=true It is possible for GemFire to hang when attempting to create a Region if the distributed system property "disable-tcp" is set to true and the distribution scope of the region is "distributed-no-ack".
03/08/09 #40360 FileNotFoundException is logged for /tmp/agent.ser while running the Agent test. 6.0 closed Failure to persist updated agent configuration causes FileNotFoundException Failure to persist agent configuration information causes the following warning to be logged without terminating the agent: "Encountered a java.io.FileNotFoundException while saving StatAlertDefinitions." All changes to the configuration are lost. An attribute 'canPersistStatAlertDefs' for AdminDistributedSystem MBean indicates whether the information could be persisted or not. Validate that the current working directory/ the -dir option has full write permissions for the user launching the agent. A boolean attribute 'canPersistStatAlertDefs' for AdminDistributedSystem MBean indicates whether the working directory has full write permissions for the user launching the agent.
03/04/09 #40350 SIGSEGV in CacheClientProxy with SUN JRE 1.6.0_10 closed SIGSEGV in CacheClientProxy with SUN JRE 1.6.0_10 SIGSEGV in CacheClientProxy with SUN JRE 1.6.0_10. This was observed with 6.0; 6.5 uses a later version of the JDK.
02/26/09 #40324 NullPointerException in CacheClientProxy.processMessage closed NullPointerException in cache server during spike in data operations In very rare instances, a cache server would encounter a NullPointerException due to a race.
02/16/09 #40250 If roller is active at the time of region.close it can end up writing a dummy byte & thus lose the original value closed Closing a persistent region results in a missing value In rare cases, closing a persistent region can lead to a single value in the persistent data being lost.
02/15/09 #40243 Test fails with Timeout during netsearch/netload/netwrite (IllegalMonitorStateException during pushing message ) 6.0 closed IllegalMonitorStateException exceptions with JDK 1.6 If you encounter IllegalMonitorStateExceptions while using GemFire with Sun's implementation of JDK 1.6, we advise using the VM option {{{ -XX:+UseHeavyMonitors }}}
02/09/09 #40198 BridgeServer with SELECTOR enabled shutdown timeout 6.0 closed GemFire hangs during attempt to close the cache Running on Microsoft Windows with the JRockit JVM, we have seen GemFire hang when an attempt is made to close the cache in a server VM. The hung thread will have a stack similar to this: {{{ -- Blocked trying to get lock: java/lang/Object@0x048F4128[thin lock] at jrockit/vm/Threads.sleep(I)V(Native Method) at jrockit/vm/Locks.waitForThinRelease(Locks.java:1209)[optimized] at jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1342)[optimized] at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1259)[optimized] at jrockit/vm/Locks.monitorEnter(Locks.java:2439)[optimized] at sun/nio/ch/WindowsSelectorImpl.wakeup(WindowsSelectorImpl.java:75) at com/gemstone/gemfire/internal/cache/tier/sockets/AcceptorImpl.close(AcceptorImpl.java:1548) ^-- Holding lock: java/lang/Object@0x048F3EB8[thin lock] at com/gemstone/gemfire/internal/cache/BridgeServerImpl.stop(BridgeServerImpl.java:351) ^-- Holding lock: com/gemstone/gemfire/internal/cache/BridgeServerImpl@0x04F329A8[thin lock] at com/gemstone/gemfire/internal/cache/GemFireCache.stopServers(GemFireCache.java:1118) ^-- Holding lock: java/lang/Object@0x04981AD0[thin lock] at com/gemstone/gemfire/internal/cache/GemFireCache.close(GemFireCache.java:913) ^-- Holding lock: java/lang/Class@0x0436A180[recursive] at com/gemstone/gemfire/internal/cache/GemFireCache.close(GemFireCache.java:793) }}} This is due to a flaw in JRockit's implementation of NIO socket selectors. GemFire v6.0 detects the use of JRockit on Windows and disables the use of NIO socket selectors after issuing this warning: Ignoring max-threads setting and using zero instead due to JRockit NIO bugs. See GemFire bug #40198
02/09/09 #40196 executeCqOnRedundantsAndPrimary throws CQException "Failed to execute the CQ ... Error from last server: Primary discovery failed" 6.0 closed Error while executing CQ. This was happening due to multiple threads accessing the same CQ. This is fixed in GemFire 6.0.
02/04/09 #40159 Hang in MapInterfaceTest.testBlockGlobalScopeInSingleVM 6.0 closed Distributed lock requests fail to timeout Lock requests may fail to timeout under certain conditions. A thread requesting a distributed lock may continue waiting beyond the configured lock-timeout or specified waitTimeMillis. This should be a temporary condition and the thread will eventually either acquire the lock after waiting longer than it should or it will timeout later than it should. The most likely condition leading to this is lock requests, or Global Region puts, initiated while locking is suspended or while the Global Region is initializing (get initial image) in any member of the distributed system.
02/02/09 #40147 Serialization types should be registerable via cache.xml declaration 5.7 closed DataSerializable types have to be programmatically registered with the GemFire server cluster In prior versions of GemFire, users were required to register types programmatically by defining a static initializer block on each VM that supplied the type of the class being registered. Starting with GemFire 6.0, types can be defined declaratively in the cache.xml file using the following syntax. {{{ <serialization-registration> <serializer> <class-name>com.gemstone.util.MySerializer</class-name> </serializer> <instantiator id="101"> <class-name>com.gemstone.util.DateTest</class-name> </instantiator> <instantiator id="102"> <class-name>com.gemstone.util.IndexMap</class-name> </instantiator> </serialization-registration> }}}
01/28/09 #40129 lastModifiedTime from an empty region is 0 5.7 closed Expiration is broken when actions originate on a region with DataPolicy.EMPTY Prior to 6.0, the lastModifiedTime (used for calculating expiration time for an entry in a region) was being set to 0 if the entry was modified from a VM that had the region with a data policy set to DataPolicy.EMPTY, causing incorrect expiration behavior for the entry. In 6.0, the lastModifiedTime is propagated from the accessing node and applied correctly across the system.
01/24/09 #40109 OOME in parReg/parRegCreateDestroy 6.0 closed Server could run out of memory during rebalancing In prior versions of GemFire, creation and destruction of Partitioned Regions could eventually lead to the server running out of memory. This was most likely to occur during intensive re-balancing operations on the partitioned regions. This has been fixed in GemFire 6.1
01/23/09 #40105 CacheClientProxy stats leak 5.7 closed Garbage CacheClientProxy stats building up on the server Killing clients isn't cleaning up the CacheClientProxy stats for that client on the server side. Over time, these stat objects take up memory and CPU.
01/20/09 #40082 JMX Agent startup should ignore any gemfire.properties present in the path 6.0 closed Conflicting properties in gemfire.properties and agent's properties file could prevent the admin agent from functioning properly The agent now uses only the properties listed in its own properties file (default name: agent.properties or specified through property-file=<my agent's property filename>) and ignores the gemfire.properties file that may exist in either of: (1) The current directory, or (2) user home directory, or (3) the class path.
01/19/09 #40078 Region.keySetOnServer() has unexpected behavior when server regions are DataPolicy.NORMAL/EMPTY mix 5.0 closed Region.keySetOnServer and containsKeyOnServer can provide inconsistent results if server regions are not replicated or partitioned Client calls to keySetOnServer and containsKeyOnServer can return incomplete or inconsistent results if your server regions are not configured as partitioned, replicated or empty. Normal and mixed (replicated, normal, empty) server region configurations give inconsistent results since they allow different data on different servers. There is no additional messaging on the servers, so no union of keys across servers or checking other servers for the key in question occurs.
01/16/09 #40057 Assertion error while creating bucket in region.(Test:parReg/event/concParRegEvent.conf) 6.0 closed InternalGemFireError thrown when putting a value into a partitioned region When calling Region.put(Object) on a Partitioned Region, it is possible that the region will throw an InternalGemFireError stating "Did not finish sending image, but region, cache, and DS are alive." This is caused by a faulty termination check in one of GemFire's data replication algorithms.
01/07/09 #40011 EnforceUniqueHostStorageAllocation allows bucket copies on the same host 6.0 closed Two copies of a bucket in the same host with EnforceUniqueHostStorageAllocation There is a small window where setting the EnforceUniqueHostStorageAllocation flag fails to prevent two copies of a bucket from ending up on the same host. This can occur when a rebalance operation is performed simultaneously with the first update to the bucket.
12/18/08 #39943 New vm unable to contact locator 6.0 closed GemFireConfigException states that no Locators could be contacted A GemFireConfigException with the text {{{ Unable to contact a Locator service. Operation either timed out or Locator does not exist. Configured list of locators is }}} (followed by a list of the configured locators) may be thrown when the locators were up and reported the VM correctly contacting them. The problem is caused by a race condition between two threads in JGroups startup code.
12/17/08 #39931 hang creating region when peer logs that "Peer has disappeared from view" closed Hang creating region when peer logs that "Peer has disappeared from view" A VM logs that it did not receive all of the expected startup responses within 15 seconds, and then hangs trying to create a Region. Another VM logged that it failed to send a Startup response to the hung VM because it had "disappeared from view". The hang is caused by a race condition in the other VM that caused it to incorrectly shun the new VM.
12/17/08 #39930 ConcurrentModificationException during shutdown 6.0 closed ConcurrentModificationException thrown by DistributedSystem.disconnect() Under rare circumstances, it is possible for DistributedSystem.disconnect() to throw a ConcurrentModificationException. The property disable-tcp must be set to true for this to happen, and another vm must be starting up concurrently.
12/15/08 #39925 primary balancing after VM recycled not yet implemented 6.0 closed Primary buckets not balanced after recovery If a member hosting a partitioned region crashes and is subsequently restarted, it will not receive any primary buckets. This can lead to an imbalance in load across the members.
11/26/08 #39859 Hang in JChannel.disconnect() closed Hang in DistributedSystem.disconnect() waiting for JGroups to disconnect In very rare circumstances, the DistributedSystem.disconnect() method may hang trying to shut down the JGroups membership stack. This is due to a defect in the JGroups Promise class, and has been fixed in GemFire v6.0
11/07/08 #39800 Query Authorization needs a (public) mechanism to modify SelectResults 5.7 closed Query Authorization needs a (public) mechanism to modify SelectResults There is no way for a user to modify the query results using public classes when "isModifiable" returns false. Currently there is no mechanism in GemFire to allow a post operation security callback to modify query result being sent to the client if the result is unmodifiable.
11/07/08 #39799 IllegalThreadStateException thrown by JGroups JChannel when network dropped during DistributedSystem.connect() 6.0 closed IllegalThreadStateException thrown during DistributedSystem.connect() When attempting to connect to GemFire with DistributedSystem.connect(), in rare circumstances the method may throw an IllegalThreadStateException. We have observed this happening when enable-network-partition-detection is enabled in the distributed system properties and a network partition occurs during the connection attempt.
11/03/08 #39772 Queues are filling up and not draining in WAN tests closed WAN Gateways May Not Initialize Correctly There is a race condition when starting a gateway that may cause a primary gateway to never process any incoming events. This can be confirmed by identifying messages in the logs indicating that the gateway queues are not draining. Stop and restart the gateway.
10/29/08 #39760 BucketAdvisor fails assertion in Loner because of DummyExecutor 5.7 closed Partitioned Regions are not supported for loner members A loner member (a GemFire connection defined by mcast-port of zero and no locators) should not use Partitioned Regions. Use a Local Region instead. Versions of GemFire prior to 6.0 may throw unexpected InternalGemFireErrors if attempting to use a Partitioned Region in a Loner, especially with redundancy > 0. GemFire 6.0 will allow this, but it's not a practical configuration except for testing purposes. {{{ Assertion error creating bucket in region com.gemstone.gemfire.InternalGemFireError: Attempting to sendProfileUpdate while synchronized may result in deadlock at com.gemstone.gemfire.internal.Assert.throwError(Assert.java:75) at com.gemstone.gemfire.internal.Assert.assertTrue(Assert.java:93) at com.gemstone.gemfire.internal.cache.BucketAdvisor.sendProfileUpdate(BucketAdvisor.java:808) at com.gemstone.gemfire.internal.cache.BucketAdvisor.acquiredPrimaryLock(BucketAdvisor.java:579) at com.gemstone.gemfire.internal.cache.BucketAdvisor$VolunteeringDelegate.doVolunteerForPrimary(BucketAdvisor.java:1443) at com.gemstone.gemfire.internal.cache.BucketAdvisor$5.run(BucketAdvisor.java:1398) at com.gemstone.gemfire.internal.cache.BucketAdvisor$6.run(BucketAdvisor.java:1645) at com.gemstone.gemfire.distributed.internal.LonerDistributionManager$DummyExecutor.execute(LonerDistributionManager.java:441) at com.gemstone.gemfire.internal.cache.BucketAdvisor$VolunteeringDelegate.execute(BucketAdvisor.java:1600) at com.gemstone.gemfire.internal.cache.BucketAdvisor$VolunteeringDelegate.volunteerForPrimary(BucketAdvisor.java:1396) at com.gemstone.gemfire.internal.cache.BucketAdvisor.volunteerForPrimary(BucketAdvisor.java:541) }}} Loners should use Local Regions. Partitioned Regions should only be used by a distributed system of two or more members.
10/27/08 #39753 JVM version issue for AIX 5.7 closed AIX JVM 1.6 version issue The BlockingHARegionJUnitTest fails on this JVM for two reasons: 1) the JVM is very slow, so 30 seconds is not enough to feed 20000 entries, whereas the 1.5 JVM and the newer 1.6 JVM finish in time; 2) the total region size exceeds 20000. The region capacity is set to 10000, and the region should contain at most 20000 entries. The 1.5 JVM and the newer 1.6 JVM do not have this problem. The root cause is a problem in the AIX JVM version that was used: java version "1.6.0-internal" Java(TM) SE Runtime Environment (build pap3260-20070819_01) IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 AIX ppc-32 jvmap3260-20070817_13537 (JIT enabled) J9VM - 20070817_013537_bHdSMR JIT - dev_20070817_1300 GC - 20070815_AA) The older, working 1.5 JVM is: java version "1.5.0" Java(TM) 2 Runtime Environment, Standard Edition (build pap32dev-20071008 (SR6)) IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 AIX ppc-32 j9vmap3223-20071007 (JIT enabled) J9VM - 20071004_14218_bHdSMR JIT - 20070820_1846ifx1_r8 GC - 200708_10) JCL - 20071008 The newer, working 1.6 JVM is: java version "1.6.0" Java(TM) SE Runtime Environment (build pap3260sr2-20080818_01(SR2)) IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 AIX ppc-32 jvmap3260-20080816_22093 (JIT enabled, AOT enabled) J9VM - 20080816_022093_bHdSMr JIT - r9_20080721_1330ifx2 GC - 20080724_AA) JCL - 20080808_02
10/22/08 #39738 load may be invoked more than once for a single get closed Load may be invoked more than once for a single get In versions prior to 6.0, if the loader returned null, it would get invoked a second time. Starting 6.0, a return value of null is considered a successful invocation of the loader. The public javadocs on load now state this: {{{ @return the value supplied for this key, or null if no value can be supplied. A local loader will always be invoked if one exists. Otherwise one remote loader is invoked. Returning <code>null</code> causes {@link Region#get(Object, Object)} to return <code>null</code>. }}}
10/17/08 #39724 unnecessary credential verification being performed every 10 seconds 5.5 closed Unnecessary credential verification being performed every 10 seconds GemFire periodically retransmits membership information to all members of the distributed system. There is a flaw in the product that currently causes re-verification of security credentials when this happens. The retransmission period is based on the member-timeout setting of the distributed system and is currently set at twice the member-timeout interval.
10/16/08 #39723 Test hangs in CqService.closeNonDurableClientCqs() during shutdown 6.0 closed resolved JVM issue Same bug as 40490, 39130, 40243. It has been identified as a JVM issue and is fixed in 1.6.0_14 and later. http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6699669 Confirmed with Dick: we have advised customers to use 1.6.0.17, so the problem will not be seen.
09/26/08 #39656 New member incorrectly shunned 5.7 closed New member is incorrectly shunned by other members of the distributed system When a new member starts up and attempts to connect to the distributed system, it may hang trying to create tcp/ip connections to existing members of the system. This can happen if the new member uses the same UDP membership port as a recently departed member on the same machine. GemFire uses this UDP port and host address to identify members of the distributed system. When a member leaves the distributed system, it is shunned for a short period of time to prevent inappropriate communications from taking place. If this UDP port is reused, as can happen on some operating systems (Windows) more easily than others (*nix), the new member that is reusing the port will be incorrectly shunned by other members. Restart the application
09/26/08 #39654 Client throws an exception if it encounters UNDEFINED in query results 5.7 closed Client exception with UNDEFINED value in query results This could happen when compiled select encounters null/undefined value. This is fixed in GemFire 6.0 release.
09/22/08 #39632 losingSide VM does not process afterRegionDestroyed (FORCED_DISCONNECT) event and hangs in destroy operation after networkPartition 5.7 closed Network partition with a gateway enabled can result in hang If a network partition occurs in a site with a gateway, the gateway member may hang trying to process events.
09/19/08 #39624 bloom-vm failure with ServerConnectivityException: Pool unexpected socket timed out on client 5.7 closed Unexpected socket timed out on client with 1.6.0_5, 1.6.0_7 Due to a bug in Java, using Sun JDK 1.6.0_5 or 1.6.0_7 and configuring the bridge server's max-threads setting to something other than 0 can result in the client seeing this error: "ServerConnectivityException: Pool unexpected socket timed out on client". Set this system property to true to work around the issue, or upgrade to a later JDK: -DCacheServer.NIO_SELECTOR_WORKAROUND=true
09/18/08 #39618 Updates can be lost with WAN Gateway failover in mlRioWithConflation 5.7 closed Updates can be lost during WAN Gateway failover when conflation is enabled With conflation enabled on a WAN gateway, if the primary gateway fails on the sending side, there is a small window where an event that occurs on the sending side can fail to be transmitted to the receiving side.
09/10/08 #39582 Need API for localPut on client closed Client side localPut API support. After further discussions, and given that we plan to simplify our region interfaces in the future to allow client only operations using the same API set that we have today, we decided to shelve this feature request.
09/09/08 #39578 getInitialImage misses a concurrent operation 5.7 closed New replicate region inconsistent with other replicates when transactions are being performed When transactions are being performed on a replicate region and another cache creates a new replicate of the region, the new replicate may miss operations performed in the transaction. There is no workaround. This bug is fixed in GemFire v6.0
07/29/08 #39338 getInitialImage test fails when multiple VMs miss a create event 5.7 closed Multicast may deliver no-ack events out of order When using multicast for message distribution with Regions having distributed-no-ack scope, operations may be applied out of order in other VMs. This is caused by a race condition between the multicast and unicast reader threads when multicast retransmissions are performed. Use distributed-ack scope, or do not use multicast for distribution.
07/28/08 #39329 Java-level deadlock in InternalDistributedSystem.disconnect 5.7 closed Java-level deadlock in InternalDistributedSystem.disconnect While rare it is possible to encounter a Java-level deadlock while calling DistributedSystem.disconnect()
07/26/08 #39323 JMX agent command line doesn't start agent closed JMX agent command line fails silently The JMX agent launcher does not correctly detect and report problems in starting the agent. For instance, if one of the TCP/IP ports is in use by another process, the agent will not start the service on that port but will launch without reporting any problems. Examine the agent.log file to see if there were any problems in launching the agent.
07/23/08 #39310 NPE thrown from IndexCreationMessage.operateOnPartitionedRegion 5.7 closed NullPointerException may occur when creating an index on a partitioned region A NullPointerException may sometimes occur when an index is being created on a partitioned region and a separate thread is removing the same index at approximately the same time. If this occurs, the NullPointerException can be safely ignored.
07/23/08 #39308 Reinitializing vms get tangled up trying to create indexes 5.7 closed Hang with index creation on partitioned region. An index creation could cause a deadlock between threads in two different VMs in the distributed system hosting the same partitioned region, because the synchronization code locked the same object while processing the request and the response between the VMs.
07/22/08 #39298 async writer thread will cause puts to lock forever if it exits closed Puts could hang when using asynchronous disk persistence In versions prior to 6.0, puts would hang if the disk persistence mechanism encountered an I/O error that caused the disk writer thread to exit prematurely. This has been addressed in 6.0: the disk writer thread no longer exits prematurely on encountering an error. It logs an exception, which causes the region to be closed and allows other threads and members to continue.
07/01/08 #39174 Two VMs using different mcast addresses still discover each other 5.7 closed Distributed systems with different multicast addresses find each other and join the same group Due to the way Linux interprets RFC 1112, multicast sockets using the same port will receive datagrams from each other even if using different multicast addresses. {{{ See these links for more information: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=231899 http://bugs.sun.com/bugdatabase/view_bug.do;:YfiG?bug_id=4701650 http://www.uwsg.iu.edu/hypermail/linux/net/0211.1/0003.html }}} Make sure to select different multicast ports for different distributed systems to keep them isolated from one another.
06/25/08 #39144 The OQL TO_DATE function does not support minutes properly 5.5 closed OQL_TO_DATE function incorrectly processed the MM formatting token In versions prior to 6.0, the OQL engine does not distinguish between the formatting strings for month and minutes (MM and mm respectively). In 6.0, this has been addressed.
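To illustrate the month/minute distinction in #39144 above, here is a minimal sketch in plain JDK code. It assumes the OQL format tokens follow the same letters as java.text.SimpleDateFormat (MM for month, mm for minutes); the class and method names are hypothetical and this is not the OQL engine itself.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

// Illustration only: plain JDK code, not GemFire's OQL engine.
class DatePatternDemo {

    // Parses text with the given pattern and returns {month (1-12), minute}.
    static int[] monthAndMinute(String text, String pattern) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(utc);
        Calendar calendar = Calendar.getInstance(utc, Locale.US);
        try {
            calendar.setTime(format.parse(text));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
        // Calendar.MONTH is zero-based, so add 1 for a human-readable month.
        return new int[] { calendar.get(Calendar.MONTH) + 1, calendar.get(Calendar.MINUTE) };
    }

    public static void main(String[] args) {
        String text = "2008-06-25 14:11";
        // MM is the month and mm is the minutes.
        int[] correct = monthAndMinute(text, "yyyy-MM-dd HH:mm");
        // Swapping the tokens silently swaps the two fields.
        int[] swapped = monthAndMinute(text, "yyyy-mm-dd HH:MM");
        System.out.println("correct: month=" + correct[0] + " minute=" + correct[1]);
        System.out.println("swapped: month=" + swapped[0] + " minute=" + swapped[1]);
    }
}
```

A parser that treats the two tokens as the same would return one of these results for both patterns, which is the kind of confusion the bug note describes.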
05/29/08 #39011 Redundant buckets should always be on different host when possible 5.5 closed Redundant copies of data should always be on different hosts when possible GemFire tries to locate redundant copies of data on different physical hosts to protect the system from process failure as well as machine failure. In situations where multiple hosts are not available, redundant copies may be colocated on the same machine, protecting the system against process failure but not machine failure.
05/23/08 #38991 Instantiators are not sent from server to client when client connects. 5.5 closed Clients did not receive instantiators already registered on the server Instantiators enable optimization of the deserialization of DataSerializable types. In prior versions of GemFire, a client connecting to a server may not always receive the instantiators already registered on the server. In GemFire 6.1, these registered instantiators are sent by the server to the client during the connection setup.
04/25/08 #38843 Conflicting transaction can proceed if both the transaction manager and grantor crash 5.5.1 closed Conflicting transaction can proceed if the transaction manager crashes while distributing the commit If the transaction manager (the member performing the transaction) crashes while transaction participants are in the process of applying the commit, then it's possible for a new transaction to begin and commit with key conflicts that are not detected. There is a workaround for application members. This bug can be prevented by adding a method call in members with regions that are involved in transactions. After creating the GemFire Cache, make this call: com.gemstone.gemfire.internal.cache.locks.TXLockService.createDTLS(); This only needs to be done once for any Cache instance.
04/15/08 #38782 Installer throws FileNotFoundException when it is run from a directory with spaces 5.5.0 closed GemFire Installation fails with java.util.zip.ZipException when run from a directory with spaces in it The GemFire installer does not correctly handle spaces in the name of the directory which contains the installer itself. Note this is literally the directory that the installer is in, not the directory the user selected as the destination. The failure comes with a stack trace like this: The system cannot find the path specified Exception in thread "main" java.util.zip.ZipException: The system cannot find the path specified at java.util.zip.ZipFile.open(Native Method) at java.util.zip.ZipFile.<init>(ZipFile.java:203) at java.util.zip.ZipFile.<init>(ZipFile.java:234) at ZipSelfExtractor.extract(ZipSelfExtractor.java:99) at ZipSelfExtractor.main(ZipSelfExtractor.java:34) Move the GemFire installer jar into a directory without spaces and rerun it.
04/11/08 #38776 thin clients get unexpected nulls from bridge server 5.5 closed gets begin to return null for keys that are known to be in the cache The symptom is that a client connected to a bridge server is doing puts and gets, and after about 90 seconds all of its gets start returning null even though data for those keys has already been put into the cache. 90 seconds is roughly how long it takes for HotSpot to begin optimizing the ConnectionProxyImpl class in GemFire, at which point the problem manifests. First verify that this isn't simply a case of your eviction policy causing your data to be evicted before you do a get. The cause is a JVM optimization in Sun's 1.6.0_4 JRE and later versions. To identify this bug, start your JVM with -Xint to force the VM to run in interpreted mode (no HotSpot compilation will occur). You should then be able to do a series of puts and gets for a sustained period (5-10 minutes ought to do it) without getting errant nulls back as values. Use the 1.6.0_3 JRE or earlier; the optimization is not present in these JRE versions. Alternatively, you can use a .hotspot file to prevent compilation of the problematic method. See Sun's documentation for more detailed information on using a file to control HotSpot compilation. Add this to your java command line: -XX:CompileCommandFile=someFile.txt Then inside someFile.txt add this single line: exclude com/gemstone/gemfire/internal/cache/tier/sockets/ConnectionImpl getObject
04/11/08 #38773 Missing CQ event (no HA) closed Missing CQ event during GII This could happen when events are being destroyed while secondary buckets are being created: the key may not be included in the GII, and if the same secondary then becomes primary and the event is re-routed, CQ processing does not find the value and fails. In 6.5, a change was made so that events are tracked/flushed during GII.
03/31/08 #38719 RegionMembershipListener doesn't work for PR 6.0 closed Partitioned Regions do not fire RegionMembershipListener events If a RegionMembershipListener is added to a Partitioned Region, the following methods do not fire for the listener: initialMembers, afterRemoteRegionCreate, afterRemoteRegionDeparture, afterRemoteRegionCrash.
03/27/08 #38709 Hitachi: HAClientQueue tries to participate in transaction, fails. 5.1 closed NullPointerException received during transaction commit on servers The configurations that could produce this exception are: 1) A client either a) registers interest in a region or b) creates a continuous query (aka CQ) with the region name in the query, both of which require the client property establishCallbackConnection=true 2) A server, to which the previously mentioned client is connected, performs an operation in a transaction that matches a) a region the client is interested in and b) matches the interest or CQ conditions the client has expressed. 3) The above transaction commits (versus rollback). The transaction can be initiated as a JTA transaction or a GemFire transaction. If the above configuration is met, the thread committing the transaction will receive a NullPointerException with a stack similar to the following: [severe 2008/03/25 16:39:49.471 PDS <Thread-4> nid=0x5f1ba8] CacheClientProxy[identity(client1(:loner):1:6364ecbb:ClientName1,connection=2); port=4623; primary=true]: Exception occurred while attempting to add message to queue java.lang.NullPointerException at com.gemstone.gemfire.internal.jta.TransactionImpl.registerSynchronization(TransactionImpl.java:197) at com.gemstone.gemfire.internal.cache.LocalRegion.getJTAEnlistedTX(LocalRegion.java:5173) at com.gemstone.gemfire.internal.cache.LocalRegion.put(LocalRegion.java:1098) at com.gemstone.gemfire.internal.cache.AbstractRegion.put(AbstractRegion.java:188) at com.gemstone.gemfire.internal.cache.ha.HARegionQueue.put(HARegionQueue.java:386) at com.gemstone.gemfire.internal.cache.tier.sockets.CacheClientProxy$MessageDispatcher.enqueueMessage(CacheClientProxy.java:1724) at com.gemstone.gemfire.internal.cache.tier.sockets.CacheClientProxy.processMessage(CacheClientProxy.java:674) at com.gemstone.gemfire.internal.cache.tier.sockets.CacheClientNotifier.deliver(CacheClientNotifier.java:693) at 
com.gemstone.gemfire.internal.cache.tier.sockets.CacheClientNotifier.notifyClients(CacheClientNotifier.java:376) at com.gemstone.gemfire.internal.cache.BridgeServerImpl.notifyClients(BridgeServerImpl.java:257) at com.gemstone.gemfire.internal.cache.LocalRegion.notifyBridgeClients(LocalRegion.java:3750) at com.gemstone.gemfire.internal.cache.LocalRegion.invokePutCallbacks(LocalRegion.java:3716) If this occurs, the transaction will have been partially applied to the local heap. It will not, however, have been distributed to other VMs that would have received the transaction updates. The cause of this failure is the internal usage of regions to deliver to the client interest and continuous query data, particularly in the face of server failures (aka highly available or HA). Avoid transactions on a Bridge Server.
02/29/08 #38555 Many EOF errors in cache.tier.sockets.HandShake 5.5 closed Servers and clients may report each other's failures incorrectly If a server crashes unexpectedly, a client that it was connected to may report the failure in a number of misleading ways, including indicating a corrupted message stream from that process. Likewise, if a client crashes unexpectedly, a server that it was connected to may report the failure in a number of misleading ways, including indicating a corrupted message stream from that process. Ignore these messages in the log.
02/28/08 #38548 memberCrashed is invoked when a new endpoint is added to a BridgeWriter closed memberCrashed is invoked when a new endpoint is added to a BridgeWriter When BridgeWriter.addEndPoint() is invoked to add a new endpoint to the bridge writer the memberCrashed method is invoked on the BridgeMembershipListener with the new endpoint even if the new endpoint is actually available. It should invoke the memberJoined method as soon as the endpoint is actually live.
01/31/08 #38344 jmx test failure: could not getDistributedSystem during initialize -- Caused by java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.NameNotFoundException: jmxconnector 5.1 closed GemFire agent command fails with RMI Naming errors on Windows Server with IPv6 enabled See Sun Microsystems' bug on this issue: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6301779 An excerpt from that URL: "The problem is caused by the fact that Java does not handle IPv6 link-local addresses correctly. The reason this problem is only seen on amd64 is to do with the IPv6 default setup on Windows 2003 Server - it maps link-local addresses to interfaces so that a call to InetAddress.getAllByName() on W2003S will return link-local addresses. (No link-local addresses are returned on XP)." In GemFire this problem manifests as either a SocketBindException or a java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.NameNotFoundException: jmxconnector. Workaround: use JRE 1.6.0 or higher, or add a line like this to C:\WINDOWS\system32\drivers\etc\hosts: fdf0:76cf::affd:9449:18 yourname.gemstone.com yourname where "fdf0:76cf::affd:9449:18" is a global IPv6 address for the machine named "yourname". You only need to add this to the hosts file on the machine "yourname". You do not have to add an entry for "yourname" to each machine on your network.
01/30/08 #38330 PartitionedRegion tests can hang with threads in gemfire/internal/util/IdentityHash.index() [ IBM VM ] 5.1.0.4 closed entry operations on PartitionedRegions can hang in IdentityHash.index() during DataSerializer.write() with IBM 1.5 VM With the IBM 1.5.0 VM, entry operations on PartitionedRegions can hang in IdentityHash.index() during DataSerializer.write(). This is extremely rare and is a suspected JIT issue with the IBM VM.
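To check whether a host name resolves to link-local addresses, as described for #38344 above, a small diagnostic along these lines may help. It is a sketch using only standard java.net calls; the class and helper names are hypothetical and it is not part of GemFire.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Diagnostic sketch: lists the link-local addresses a host name resolves to.
class LinkLocalCheck {

    // Keeps only link-local addresses (fe80::/10 for IPv6, 169.254/16 for IPv4).
    static List<InetAddress> linkLocalOnly(InetAddress[] addresses) {
        List<InetAddress> result = new ArrayList<InetAddress>();
        for (InetAddress address : addresses) {
            if (address.isLinkLocalAddress()) {
                result.add(address);
            }
        }
        return result;
    }

    // Parses an IP literal; no name lookup is performed for literals.
    static InetAddress literal(String ip) {
        try {
            return InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not an IP literal: " + ip, e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // Use the host name given on the command line, or this machine's name.
        String host = args.length > 0 ? args[0] : InetAddress.getLocalHost().getHostName();
        InetAddress[] all = InetAddress.getAllByName(host);
        System.out.println(host + " resolves to " + all.length + " address(es)");
        for (InetAddress address : linkLocalOnly(all)) {
            System.out.println("link-local: " + address.getHostAddress());
        }
    }
}
```

If the program prints any link-local entries for the agent's host name, the hosts-file workaround in the bug note is the kind of change that would make the name resolve to a non-link-local address instead.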
01/28/08 #38294 Vestigial instances of Timer prevent WAR undeploy 5.0 closed gemfire.jar does not correctly undeploy from an EJB server Once an EJB application server is connected to a distributed system, it may not be able to correctly undeploy gemfire.jar. If possible, try to configure your application server so that it does not attempt to undeploy the GemFire application.
01/20/08 #38235 Issue in CountDownLatch.await while creating disk region in diskRegionRecoveryAfterVmCrash.conf test closed Cache member hangs while creating a region Under certain high availability conditions, a cache member may hang while attempting to recover a region from another member that has crashed. In order for this to happen, a cache member needs to be creating a local copy of a region at the same time that another cache member crashes. Kill and restart the hung cache member.
01/14/08 #38193 user defined DataSerializer instances need client server support 5.0, 5.1 closed Newly registered DataSerializer not recognized on cache server and clients Registration of a DataSerializer on a node with GemFire's data serialization framework was only propagated to other peer servers. It did not get propagated to clients. If the registration was done on a client, it was not sent to the cache servers. Registrations are now propagated to all cache servers and clients.
01/11/08 #38188 Transactions encapsulating multiple regions fail LRU eviction on recipient members 5.1 closed Transactions that include multiple regions cause LRU problems in remote caches This problem occurs when a GemFire transaction includes many Regions, like this: txmgr = cache.getCacheTransactionManager(); txmgr.begin(); region1.put("a", "one"); region2.put("b", "two"); region3.put("c", "three"); txmgr.commit(); and two or more of the regions have LRU eviction configured in VMs that are remote to the VM where the transaction originates. In this scenario, the LRU mechanism in the remote VMs does not consistently evict the proper number of entries. The problem does not affect eviction in the VM where the transaction originates. Only include a single region in a transaction or only have one region be configured with LRU behavior.
01/10/08 #38180 DLockTokens objects are not removed when the lock is released 5.1 closed DistributedLockService does not remove resources for tracking locks The DistributedLockService does not free up resources related to tracking locks. This also affects Global Regions, Partitioned Regions, and Gateway Hubs. Calls to DistributedLockService.freeResources(Object) do nothing, thus introducing a memory leak for each distributed lock that is acquired. The only workaround is to destroy the DistributedLockService. Destroying the DistributedLockService frees up all memory used to track locks. DistributedLockServices that are explicitly created and used must be destroyed to free up resources for all locks. For Global Regions, the Global region itself must be locally destroyed to free up all locking resources created for each key. For Partitioned Regions, the Cache must be closed to free up locking resources. For Gateway Hubs, the DistributedSystem must be disconnected to free up locking resources.
01/07/08 #38152 AssertionError: InitialImageOperation$RequestImageMessage ... Did not finish sending message, but didn't throw RegionDestroyed or CacheClosedException closed Failed initial image creation may throw AssertionError If you close your cache while initializing the data in a distributed region, you may end up with a faulty AssertionError in the system logs. Ignore this assertion error. It is harmless.
11/25/07 #38013 PR regions do deserialization on remote bucket during get causing NoClassDefFoundError 5.1 closed Partitioned region puts throw NoClassDefFoundError on remote partitioned region members if the value class is not on the classpath A partitioned region put will fail with NoClassDefFoundError if the value Object's class is not on the classpath of every member that configures data storage for that partitioned region. The only members that should require the class are those that need the value in object form (for example the member that actually does a get to read the value or the member with a CacheListener that calls getNewValue). Add the value Object's class to all members that define the partitioned region.
11/21/07 #38011 memory leak when conserve-sockets false 5.0, 5.1 closed conserve-sockets=false may run out of sockets It is possible to see a member run out of sockets when using conserve-sockets=false. This can be caused by threads that own their own sockets having a short lifetime and new threads being created quickly that also own their own sockets. Call DistributedSystem.releaseThreadsSockets before a thread's life comes to an end. This can be done from a finally block on the thread's run method.
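The finally-block workaround for #38011 above can be sketched as a small wrapper. This is a minimal illustration in plain Java with a hypothetical class name; the release action is passed in so the sketch runs without GemFire on the classpath. In a GemFire application the release action would be the call named in the bug note (assumed here to be a static, no-argument method, i.e. DistributedSystem.releaseThreadsSockets()).

```java
// Sketch: guarantees a per-thread cleanup action runs when the thread's work ends.
class ReleasingRunnable implements Runnable {
    private final Runnable work;
    private final Runnable release;

    ReleasingRunnable(Runnable work, Runnable release) {
        this.work = work;
        this.release = release;
    }

    @Override
    public void run() {
        try {
            work.run();
        } finally {
            // Runs on normal completion and when work throws.
            release.run();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> System.out.println("doing cache operations");
        // Placeholder for the GemFire call named in the workaround
        // (assumption: DistributedSystem.releaseThreadsSockets()).
        Runnable release = () -> System.out.println("releasing this thread's sockets");
        Thread thread = new Thread(new ReleasingRunnable(work, release));
        thread.start();
        thread.join();
    }
}
```

Wrapping every short-lived thread's Runnable this way keeps the release call in one place rather than repeating the try/finally in each run method.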
11/01/07 #37942 OutOfMemoryError Causes Distributed System Failure 5.1 closed Improper handling of instances of Java VirtualMachineError When a Java virtual machine sends an instance of VirtualMachineError to a thread, it has indicated that it has broken the fundamental programming contract and can no longer be trusted. The most common instance of this is OutOfMemoryError, which will be sent to <em>one</em> Thread somewhere in the JVM. All other Threads are effectively suspended at their next attempt to allocate memory until either a) enough memory becomes available, or b) the original thread that was signaled disappears. In prior versions, GemFire did not properly handle VirtualMachineErrors. This improper handling manifested in numerous bugs in the system. GemFire now has a cooperative mechanism by which a cache member can reliably recuse itself from the distributed system when a VirtualMachineError occurs. Notice, however, that in order for this to be reliable, your applications must also correctly trap and signal VirtualMachineError when they are thrown. See the Javadocs for SystemFailure for details on this new API.
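The "trap and signal" pattern that #37942 above asks of applications can be sketched as follows. This is an illustration only: FailureSignal is a hypothetical stand-in written in plain Java, not GemFire's SystemFailure class, whose actual API is described in its Javadocs.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a process-wide failure signal; not GemFire's API.
class FailureSignal {
    private static final AtomicReference<VirtualMachineError> FAILURE =
            new AtomicReference<VirtualMachineError>();

    // Records the first VirtualMachineError seen; later ones are ignored.
    static void signal(VirtualMachineError error) {
        FAILURE.compareAndSet(null, error);
    }

    // Returns the recorded error, or null if none has been signaled.
    static VirtualMachineError current() {
        return FAILURE.get();
    }

    // Runs a task, signaling and rethrowing any VirtualMachineError it raises.
    static void runGuarded(Runnable task) {
        try {
            task.run();
        } catch (VirtualMachineError error) {
            // Do as little as possible here: the JVM may be unable to allocate.
            signal(error);
            throw error;
        }
    }

    public static void main(String[] args) {
        try {
            runGuarded(() -> { throw new OutOfMemoryError("simulated for illustration"); });
        } catch (OutOfMemoryError e) {
            System.out.println("recorded failure: " + current());
        }
    }
}
```

The important property is that the error is never swallowed: it is recorded for other threads to observe and then rethrown so the failing thread still terminates.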
10/25/07 #37905 data-policy="partition" is insufficient, <partition-attributes/> is required to create PR 5.1 closed Partition region creation requires a partition-attributes element or a PartitionAttributes setting in the API Setting the region data-policy to 'PARTITION' should cause a region to be created as a partitioned region, but it doesn't. The data-policy setting is accurately reported, but this setting does not cause the region to partition its data. For the region to be created as a partitioned region, the region attributes must have a partition-attributes element in the cache.xml or a PartitionAttributes setting through the API. You do not need to set any non-default partition attributes settings, just use the partition attributes. In the xml, add a partition-attributes element to the definition of the region, even if the element is empty. In the API set the partition attributes through the region AttributesFactory setPartitionAttributes method, even if you just pass it a default PartitionAttributes instance.
10/08/07 #37821 Hang in shutdown while deleting file 6.0 closed Hang while deleting the oplog during shutdown While shutting down, the system hangs while deleting the oplog. During this time CPU usage appears to be at 100%. From the stack trace, this seems to be a JVM-related issue. This part of the code has been removed in the latest 6.5 code base.
10/05/07 #37819 throughput decreases as number of buckets increases GFD closed Partitioned Region read and write throughput decreases as the number of buckets increases For a given partitioned region, the larger the value for the totalNumBuckets attribute (setTotalNumBuckets), the smaller the throughput for create and get operations. During testing with 100 VMs participating in the partitioned region, 50 which store data, 50 which do not (setLocalMaxMemory to 0), the most dramatic change occurred when the totalNumBuckets attribute exceeded 499 buckets. Use fewer than 499 buckets; however, only testing will truly indicate the proper values.
10/03/07 #37803 Installation Paths with spaces will cause the Native Client msi to error on some systems 5.1 GA closed Installation paths with spaces prevent the Native Client from installing correctly While installing the Native Client on Microsoft Windows, if a path is specified that contains spaces, for example "C:\Program Files\GemStone\GemFire", the msi installer that is invoked from setupWin32_gf51.exe will fail, causing a dialog box that details the msi command line syntax to appear. After dismissing this dialog the installation will continue and appear to have succeeded. The GemFire installation itself is OK, but the Native Client installation is not: only the native_client.msi and a few html files are installed for the Native Client. Uninstall the product to clean up the system from the failed install. Then reinstall the product into a path without spaces.
10/02/07 #37795 partitioned region buckets are not balanced 5.1 closed Partitioned Region data storage is skewed When quickly loading data into a partitioned region, the number of buckets from one data store to the next may vary as much as 100%. Due to the seemingly random allocation of buckets this requires that all VMs for the partitioned region have up to two times the required memory for actual storage. Increasing the maximum number of buckets exaggerates the problem. There are two ways to potentially work around this problem: 1) Artificially slow the rate of data loaded into a partitioned region. 2) Using the PartitionedRegionStats bucketCount to determine an imbalanced system (from VM to VM), for each VM with the worst imbalance, introduce a new VM to the partitioned region and then shutdown the offending VM.
10/01/07 #37779 member wrongly evicted by failure-detection does not recognize membership changes 5.1 closed Member that is kicked out of the distributed system may not realize it and continue to operate, eventually causing hangs If you are using the gemfire.useFD or gemfire.FD_TIMEOUT system properties to select the alternative GemFire UDP-heartbeat failure detection mechanism, a member can be forcibly disconnected from the distributed system if it does not respond quickly enough to "are you alive" messages. The member-timeout and gemfire.FD_TIMEOUT settings control this disconnect timeout. In version 5.1.x of GemFire, the disconnected member does not realize that it has been kicked out of the system and continues to try to operate. Eventually other members may hang. We have only observed this with the alternate failure detection mechanism and only under significant CPU load. However, setting a short member-timeout period may exacerbate the problem and cause it to happen more easily. Set a reasonably long member-timeout period when using gemfire.useFD, or set the timeout period with the deprecated gemfire.FD_TIMEOUT system property.
10/01/07 #37772 Region recovery from disk fails with "DiskAccessException: Failed loading keys from <diskReg dirs>, Caused by: java.io.EOFException 5.1 closed Region recovery from disk fails with "java.lang.Error: CRITICAL: page header magic for block *** not OK 0" When switching out files for repair, this exception may disrupt recovery from disk. Switching is done when a JDBM exception has been encountered at least once already.
09/25/07 #37743 Region destroy/close does not close LRUStatistics 5.0, 5.1 closed Eviction regions with short life span have unexpected memory and cpu consumption Region close, localDestroyRegion, and destroyRegion on a region with eviction configured will not close the LRUStatistics object. If a large number of region destroys are done, this can cause the statistic sampler to consume an entire CPU and the unclosed statistic object to consume around 100 bytes of memory. To prevent the memory leak avoid giving your LRU regions a short life span. This can be done by using region clear instead of doing a destroy/create. To prevent the CPU consumption, you can disable statistic sampling.
09/24/07 #37736 Unusually high number of eviction failures in trunk build 132 5.1 closed LRU Region eviction may happen early or late The LRU limit is not strictly complied with when doing evictions. Evictions might be done slightly early (causing less space to be used than was specified) or slightly late (causing more space to be used than was specified). You can set -Dgemfire.STRIPED_STATS_DISABLED=true to get the older version of statistics that causes strict compliance to the eviction limit.
09/23/07 #37727 Hang in waitForRegionCreateEvent of newly restarted VM during shutdown 5.1 closed Hang during Cache close In a client/server high-availability test that repeatedly destroyed and created Regions and Caches in multiple VMs, we experienced a hang in a server VM. The server was in the process of exiting, and the GemFire shutdown hook was attempting to close the Cache. A stack dump (kill -QUIT) showed the hung thread was waiting on initialization of a Region, but no other threads were involved with creating a Region. "vm_3_thr_3_bridge1_hs20c_11833" daemon prio=1 tid=0x085ab338 nid=0x2d24 in Object.wait() [0x5f0ce000..0x5f0ce5f0] at java.lang.Object.wait(Native Method) - waiting on <0x58cc26d8> (a com.gemstone.bp.edu.emory.mathcs.backport.java.util.concurrent.CountDownLatch) at java.lang.Object.wait(Object.java:432) at com.gemstone.bp.edu.emory.mathcs.backport.java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:364) at com.gemstone.bp.edu.emory.mathcs.backport.java.util.concurrent.CountDownLatch.await(CountDownLatch.java:234) - locked <0x58cc26d8> (a com.gemstone.bp.edu.emory.mathcs.backport.java.util.concurrent.CountDownLatch) at com.gemstone.gemfire.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:53) at com.gemstone.gemfire.internal.cache.LocalRegion.waitOnInitialization(LocalRegion.java:3029) at com.gemstone.gemfire.internal.cache.LocalRegion.waitForRegionCreateEvent(LocalRegion.java:1633) at com.gemstone.gemfire.internal.cache.LocalRegion.dispatchEvent(LocalRegion.java:5290) at com.gemstone.gemfire.internal.cache.LocalRegion.dispatchListenerEvent(LocalRegion.java:4240) at com.gemstone.gemfire.internal.cache.LocalRegion.sendPendingRegionDestroyEvents(LocalRegion.java:4476) at com.gemstone.gemfire.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:3868) at com.gemstone.gemfire.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1250) at com.gemstone.gemfire.internal.cache.LocalRegion.handleCacheClose(LocalRegion.java:4515) at com.gemstone.gemfire.internal.cache.DistributedRegion.handleCacheClose(DistributedRegion.java:1700) at com.gemstone.gemfire.internal.cache.GemFireCache.close(GemFireCache.java:581) - locked <0x470b5ef8> (a java.lang.Class) - locked <0x4b1cfe68> (a com.gemstone.gemfire.internal.cache.GemFireCache) at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.doDisconnects(InternalDistributedSystem.java:773) at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:904) at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:668) at com.gemstone.gemfire.distributed.DistributedSystem.disconnect(DistributedSystem.java:960) at hydra.RemoteTestModule$2.run(RemoteTestModule.java:372) No workaround
09/21/07 #37718 sudden heap growth in multicast smoke performance test 5.0 closed Multicast retransmissions cause a slow memory leak When using distribution scopes of DISTRIBUTED_ACK or GLOBAL with multicast-enabled=true, it is possible (though unlikely) that a VM will experience a memory leak. The leak is caused by multicast retransmission logic and can cause the VM to run out of heap space. Change your configuration to use TCP instead of multicast
09/18/07 #37692 local scope persistent regions do not allow register interest 5.0 closed CacheWriterException thrown from registerInterest on local persistent replicates When the registerInterest method is called on a region with local scope and persistence enabled it will always throw a CacheWriterException with the message "Interest registration not supported on replicated regions".
09/16/07 #37657 Assertion: Commit data for TXLockId not found; expected values not distributed to all peers 5.1 closed Severe log messages indicating transaction failures A VM configured with conserve-sockets=false which originates a transaction may cause severe log messages in a receiving VM similar to the following: Uncaught exception processing CommitProcessForLockIdMessage@17373340 lockId=TXLockId: newton(18461):40211/45363-2 java.lang.AssertionError: Commit data for TXLockId: TXLockId: newton(18461):4021 An indicator of problem on the sending VM is the occurrence of warning messages starting with the text: "Attempting TCP/IP reconnect to" Regardless of the conserve-sockets setting, this failure should not occur when the transaction contains only Scope.DISTRIBUTED_NO_ACK regions. Avoid mixing transactions and conserve-sockets false in the same VM.
09/05/07 #37563 PR put fails with AssertionError 5.1 closed Calling getRegion on RegionExistsException returns partially initialized region. If you create root regions, catch RegionExistsException, and then call the getRegion method on that exception, the region returned may not yet be initialized. The workaround is to wait for initialization before you use the region returned by getRegion(): import com.gemstone.gemfire.internal.cache.LocalRegion; catch (RegionExistsException ex) { LocalRegion lr = (LocalRegion) ex.getRegion(); lr.waitOnInitialization(); /* it is now ok to use the region returned by getRegion */ }
09/05/07 #37562 DistributedSystem.connect() fails to return existing system 5.0 closed DistributedSystem.connect() fails to return existing system Calling DistributedSystem.connect() can result in the exception java.lang.IllegalStateException: A connection to a distributed system already exists in this VM. It has the following configuration: followed by the configuration. This bug is caused by the mcast-flow-control setting not being properly handled when comparing the properties passed to the connect method (or provided in gemfire.properties) with the properties already held in existing system(s). No workaround except to remove the mcast-flow-control setting from the properties. This bug is fixed in GemFire v5.1.
08/31/07 #37549 split-brain in partitioned region: same partitioned region with multiple prId identifiers 6.0 closed Split brain in partitioned regions There is a rare race condition that can occur in assigning an internal identifier to a partitioned region. The condition causes the system to assign more than one identifier to a single partitioned region, with some processes using one identifier and some using another. Because of this, the processes with one identifier do not recognize operations performed on the Region by the processes using the other identifier and vice-versa. We have not been able to isolate the cause of this race condition. It occurs very rarely and appears to happen when many processes attempt to initialize at the same time. We have added a distributed consistency check that verifies that the correct internal identifier is being used. If the consistency check fails, you will see a warning message in one of two forms: node(processID)memberID is using PRID 1 for regionName but this process maps that PRID to 2 node(processId)memberID is using PRID 1 for regionName but this process is using PRID 2
08/15/07 #37388 ArrayIndexOutOfBoundsException when log-disk-space-limit is set 5.0.1 closed ArrayIndexOutOfBoundsException when log rolling enabled and a log-disk-space-limit configured When log rolling is enabled and a log-disk-space-limit is configured then the code that checks the disk space limit may throw an ArrayIndexOutOfBoundsException. An example stack follows: Caused by: java.lang.ArrayIndexOutOfBoundsException: 3 at com.gemstone.gemfire.internal.ManagerLogWriter.checkDiskSpace(ManagerLogWriter.java:440) at com.gemstone.gemfire.internal.ManagerLogWriter.checkDiskSpace(ManagerLogWriter.java:452) at com.gemstone.gemfire.internal.ManagerLogWriter.switchLogs(ManagerLogWriter.java:213) at com.gemstone.gemfire.internal.ManagerLogWriter.rollLog(ManagerLogWriter.java:457) at com.gemstone.gemfire.internal.ManagerLogWriter.put(ManagerLogWriter.java:496) at com. Since rolling logs also leaks file descriptors you should disable log rolling in 5.0.1 by setting log-file-size-limit to zero. If you are willing to live with the file descriptor leak then you can work around this ArrayIndexOutOfBoundsException by setting log-disk-space-limit to zero.
07/25/07 #37229 put from client into PR region fails with IMQException returned from cacheserver closed IMQException while doing put in PR. put from client into PR region fails with IMQException returned from cacheserver. Reason: For a query to work correctly it has to have a real Java object (POJO) to work with. This poses an interesting situation for any kind of remote query, that is, a query sent from one VM to another. The issue arises when the remote VM, whose object storage may be in serialized form (true for Partitioned Regions as well as Cache/Bridge Servers), needs to de-serialize the stored object into a POJO. If the class can't be de-serialized, then the query fails. So the user needs to know the steps to allow for successful de-serialization to avoid the problem described in this bug.
06/21/07 #37004 For bucket id 26, expected 2 members in primary list, but found 3 prFeb07 closed Partitioned Region meta data may contain incorrect information after VM failures For a given Partitioned Region, if participating VMs have failed either through network problems, hardware failures, or software crashes, Partitioned Region meta data for a given participant may contain incorrect information for one or more buckets. The result of such incorrect information is potentially slower access to the information in that bucket. The higher the redundantCopies setting the greater the potential to become incorrect. The redundantCopies setting 0 does not suffer from this issue.
06/20/07 #36990 non-zero log-file-size-limit causes file descriptors to leak 5.0 closed Non-zero log-file-size-limit causes file descriptors to leak Configuring GemFire to roll log files by specifying a log-file-size-limit other than 0 can result in a leak of a file descriptor every time GemFire rolls the log file.
06/14/07 #36975 CacheTransactionManager can refer to a closed DistributedSystem all closed Cache Transaction Manager may refer to closed distributed system If you close your distributed system and then create a new one, your cache transaction manager may attempt to use the old (closed) distributed system. Transactions may fail or erroneously appear to succeed. If you use transactions, do not close your distributed system after creating it. Exit the JVM if you need to create a new distributed system.
06/05/07 #36921 poor get performance for partition region closed Partitioned Region get performance degraded Performing a get() on a Partitioned Region is three times slower than in release 5.0.1.
04/25/07 #36688 GemFire transaction svc doesn't do proper write-write conflict detection 5.0.1 closed write-write conflicts not always detected If a key is read in one transaction, another transaction modifies the key, and finally the first transaction modifies the key, the conflict is not detected and the first transaction is committed.
04/10/07 #36597 Java-level deadlock in InternalDistributedLockService.checkLockGrantorInfo leads to stuck lock and hung message reader thread 5.0 closed Java deadlock in DistributedLockService can lead to stuck lock and hung message reader Destroying a DistributedLockService while there are pending lock requests still active can result in those pending locks becoming stuck and unavailable system-wide until the VM that requested such a lock disconnects. In addition, the VM may quit processing messages sent by the member from which it was acquiring the lock remotely. This affects all features that use DLS. For example, Global Regions must lock the key in order to put or destroy the cache entry. Any calls to do so for a key that has a stuck lock would then hang until the VM that caused the problem disconnects from the system. In general, when you close or destroy a feature that uses a DistributedLockService, then that DistributedLockService is destroyed. The workaround is to destroy the DLS when there are no threads actively trying to acquire locks.
03/24/07 #36512 GemFireCache.close is not thread safe 5.0.1 closed GemFireCache.close is not thread safe If one thread attempts to create a new cache while another thread is closing the old cache, one or more static resources may be nulled out, left in an unknown or incorrect state, or never cleaned up. Use the same thread to close the old cache and create the new cache.
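The workaround for #36512 can be sketched as funnelling every cache close and create through one dedicated thread. This is a minimal sketch using only java.util.concurrent, not GemFire API: the CacheLifecycle class and the two Runnable arguments are hypothetical placeholders for the application's own close and create calls.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical helper: runs all cache lifecycle work on a single thread. */
final class CacheLifecycle {
    // One worker thread, so a close and a create can never overlap.
    private final ExecutorService lifecycleThread = Executors.newSingleThreadExecutor();

    /** Runs closeOld, then createNew, on the lifecycle thread and waits for both. */
    void restart(Runnable closeOld, Runnable createNew) {
        try {
            lifecycleThread.submit(() -> {
                closeOld.run();   // placeholder for the application's cache close
                createNew.run();  // placeholder for the application's cache create
            }).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while restarting the cache", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("cache restart failed", e.getCause());
        }
    }

    /** Stops the lifecycle thread once no more restarts are needed. */
    void shutdown() {
        lifecycleThread.shutdown();
    }
}
```

Callers on any thread invoke restart(...); the executor serializes the requests, so the close of the old cache always finishes before the next create begins.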
03/20/07 #36483 PR-HA test hangs while waiting to connect to killed VM 5.0.1 closed System deadlock during conditions of extreme membership volatility Under certain conditions with volatile membership changes (cache members departing under busy conditions), there is a potential for system deadlock. The confused cache member will have a message similar to the following in its logs: [warning 2007/03/19 22:14:47.635 PDT gemfire3_huey_22603 <vm_7_thr_9_client3_huey_22603> nid=0x1a] Error sending message to huey(22596):56886/48525 (will reattempt): java.net.ConnectException: Connection refused The best solution is to avoid conditions of extreme membership volatility (cache members arriving and departing with great frequency). If this condition is detected in a running system, the deadlock can be safely broken by killing the hung cache member.
03/16/07 #36475 StateFlushOperation may hang with Global scoped regions 5.0.1 closed New replicate in region with global scope can cause system hang If a region has global scope, it is possible for a new replicate to cause a hang in the distributed system. Operations on regions with global scope are not performed in token mode but are put in the waiting thread pool until the region they're modifying is done with getInitialImage. StateFlushOperation will invade other VMs and wait for these messages to finish being processed before allowing the getInitialImage to complete. Not applicable.
03/14/07 #36461 BridgeClient receives BridgeWriterException: InterruptedException on region.get() with server in the process of shutting down (due to InterruptedException/shutdown in progress issues) 5.0.1 closed Cache member shutdown is not reliable Under certain circumstances, especially if there are outstanding operations in a cache member, there is a possibility that the cache member will hang (not completely exit) during shutdown processing. If a cache member does not completely exit, it is safe to directly kill its process using operating system tools (kill -9 in Solaris or Linux, or the task manager in Windows).
03/02/07 #36421 Query shortcut on Region doesn't use index 5.0.1 closed Region.query shortcut method does not use Indexes The query shortcut method in the Region interface does not make use of indexing. Also the QueryService Query instances do not use indexing if the region is passed in as a parameter to the query. Use Query instances obtained from the QueryService and reference regions by full path rather than by passing them in as parameters.
03/02/07 #36420 NPE reported from GrantorRequestProcessor.startElderCall() 5.0.1 closed NPE reported from DLockRequestProcessor The NullPointerException is caused by an assertion error. Lock grants that arrive after the lock service is destroyed must be released to prevent a stuck lock. This NPE causes the associated lock to remain stuck until the VM's Distributed System connection closes. This is the error output to the logs: [severe 2007/03/02 12:45:43.536 PST gemfire5_newton_24123 nid=0x75407bb0] Uncaught exception processing DLockRequestProcessor.DLockResponseMessage responding GRANT; serviceName=Partitioned Region Lock Service; objectName=#partitionedRegion; responseCode=0; keyIfFailed=null; leaseExpireTime=9223372036854775807; processorId=807; lockId=807 java.lang.NullPointerException at com.gemstone.gemfire.distributed.internal.locks.GrantorRequestProcessor.startElderCall(GrantorRequestProcessor.java:209)
02/27/07 #36406 List of departed members grows without bound inside of VMs 5.0 closed Frequent cache membership changes uses memory, degrades performance When a system member leaves a distributed system, they are placed in a departed member list by the remaining system members. This list is not cleared out and thus grows without bound. The list uses a certain amount of extra memory, but--more importantly--as successive members depart the distributed system, the amount of processing time associated with handling the departures increases. If the membership of your distributed system is rather stable (a small number of departures), no workaround is required. If, however, your configuration requires a large number of cache members to join and depart, you need to restart any long-lived cache members on a periodic basis to prevent performance degradation or possibly even memory exhaustion.
02/16/07 #36376 ValueConstraint causes all objects to be deserialized 5.0 closed Setting a value constraint for a region's values causes all objects to be deserialized The ValueConstraint region attribute allows you to declare the class of all the values for a region. But if you specify a constraint, then every value in the region must be deserialized to check the constraint. none/not applicable
02/16/07 #36371 Slow gateway shutdown can leave cache open 5.0 closed Slow gateway shutdown can produce CacheExistsException If you close and reopen a cache that has a gateway, on rare occasions this produces a CacheExistsException. Stop the VM and restart it.
02/14/07 #36358 PartitionedRegionException: registerPartitionedRegion: /PartitionedRegion_9 caught exception dumpPRId:prIdToPR Map@18550851: caused by java.lang.InternalError: Got RegionExistsException 5.01 closed Creation of a PartitionedRegion may fail with exception "registerPartitionedRegion: /PartitionedRegion_9 caught exception dumpPRId:prIdToPR" During concurrent creation and destruction of a partitioned region with a specific name, it is possible for a PartitionedRegionException to be thrown during createRegion with the message "registerPartitionedRegion: /PartitionedRegion_9 caught exception dumpPRId:prIdToPR". Catch this exception and re-create the region.
02/08/07 #36329 Bridge hangs on close waiting for a GrantorRequest response from a member that has departed the DS 5.01 closed Cache server shutdown can cause a system-wide hang On rare occasions, a cache server can experience a problem during shutdown that causes a system-wide hang. This situation happens when the server tries to shut down while it is waiting on a response from another member that has left the distributed system. The server logs a message of this type: [severe ... ] While pushing message <message> to <recipients> com.gemstone.gemfire.ThreadInterruptedException: sleep interrupted Caused by: sleep interrupted, caused by java.lang.InterruptedException: sleep interrupted This problem does not cause data corruption, and the distributed system will restart successfully. Kill your processes and restart all your system members according to your usual procedures. none/not applicable
02/07/07 #36320 Multiple ServerMonitors with same-named endpoints can cause recursive endpoint died/recovered cycle 5.0.1 closed Multiple server definitions with the same name and port can cause a client to enter an endless loop This problem only affects clients running on very fast systems. On fast systems, if any two instances of BridgeLoader or BridgeWriter define the same server name and port pair, a loss of server connection can send the client's server health monitor into an endless loop. The health monitor maintains the client's live and dead server lists. When the client enters into this loop, it appears as if the servers are going up and down. Define each server name and port pair exactly once for any client VM. This means that the BridgeLoader and BridgeWriter for a single region must use different names for the same server endpoint. It also means that you mustn't create multiple instances of a single BridgeWriter or BridgeLoader definition. Starting with version 4.3, the API automatically manages reuse of the same loader and writer instances when the definitions are the same, so no explicit action is required on your part. This example shows how to avoid defining the BridgeLoader and BridgeWriter with the same name and port pairs: Properties writerProps = new Properties(); writerProps.setProperty("endpoints", "serverWA=localhost:44441,serverWB=localhost:44442"); BridgeWriter bWriter = new BridgeWriter(); bWriter.init(writerProps); Properties loaderProps = new Properties(); loaderProps.setProperty("endpoints", "serverLA=localhost:44441,serverLB=localhost:44442"); BridgeLoader bLoader = new BridgeLoader(); bLoader.init(loaderProps); This problem is fixed in version 5.0.1.
01/31/07 #36279 A bridge client putting an empty byte array causes a server NullPointerException 5.0.1 closed Empty byte[ ] causes exception in client/server topology In a client/server topology, you can't put an empty byte[] into the cache as a value. You can have an empty byte[] key. A client attempting to put an empty byte[] into the cache causes the following exception on the server: [java] [warning 2007/01/31 13:27:22.285 PST "server" <ServerConnection 0.0.0.0/0.0.0.0:44444 Thread 12> nid=0x1e4a47e] Server connection from [identity(bishop(:loner):1:0d64f66b,connection=2); port=52631]: Unexpected Exception [java] java.lang.NullPointerException [java] at com.gemstone.gemfire.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:632) . . . One workaround would be to store a one-element byte[] as the value.
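One way to apply the #36279 workaround uniformly is to prefix every stored value with a marker byte, so that an empty payload becomes a one-element byte[]. This is a sketch under assumptions: the NonEmptyBytes class and its wrap/unwrap methods are hypothetical, and every reader and writer of the region would have to agree on the same encoding.

```java
import java.util.Arrays;

/** Hypothetical encoding: a one-byte marker keeps stored values non-empty. */
final class NonEmptyBytes {
    /** Returns a copy of value with a leading 0 marker byte. */
    static byte[] wrap(byte[] value) {
        byte[] stored = new byte[value.length + 1]; // stored[0] stays 0 as the marker
        System.arraycopy(value, 0, stored, 1, value.length);
        return stored;
    }

    /** Strips the marker byte added by wrap and returns the original payload. */
    static byte[] unwrap(byte[] stored) {
        if (stored.length == 0) {
            throw new IllegalArgumentException("stored value has no marker byte");
        }
        return Arrays.copyOfRange(stored, 1, stored.length);
    }
}
```

The client would put wrap(payload) into the region and call unwrap on whatever it reads back.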
01/31/07 #36275 hang during shutdown in TimeScheduler 5.0 closed Hang during DistributedSystem disconnect in TimeScheduler Under very rare circumstances, the DistributedSystem in GemFire may hang when it is being disconnected. Symptoms of this problem are: * The thread that is disconnecting the DistributedSystem will be in com.gemstone.org.jgroups.util.TimeScheduler.stop. * A thread named TimeScheduler.Thread will be in this state: Object.wait() [0xffffffff5bbff000..0xffffffff5bbff828] at java.lang.Object.wait(Native Method) - waiting on <0xffffffff6558e7e0> (a com.gemstone.org.jgroups.util.TimeScheduler$TaskQueue) The hang will not affect other processes. The hung VM should be terminated manually. If you encounter this defect, please contact GemStone Technical Support. No workaround
01/18/07 #36218 SystemConnectException: Received no connection acknowledgements from any one of the 1 senior cache members, but both members have each other in their view 5.0 closed Timeout receiving startup responses When a cache member joins an existing distributed system, it must receive an acknowledgment from at least one senior member of the system. If it fails to receive a response in a timely manner, the cache member's startup will fail with a message similar to this: Received no connection acknowledgments from any of the 1 senior cache members: This is usually an indicator of a grossly overloaded system that will not perform satisfactorily in a production environment. If it is not possible to reconfigure your system to allow cache members to respond more quickly, tune the system property DistributionManager.STARTUP_TIMEOUT which controls the amount of time a cache member waits for replies. The default value is 15000 ms (15 seconds), and raising this value may alleviate this symptom.
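The tuning step for #36218 can be sketched in Java as follows. It assumes the property is read when the member connects, so it must be set before the connect call; passing -DDistributionManager.STARTUP_TIMEOUT=30000 on the java command line would be the equivalent. The StartupTimeoutConfig class is a hypothetical helper, and the 30000 ms value is only an illustration.

```java
/** Hypothetical helper for raising the startup reply timeout before connecting. */
final class StartupTimeoutConfig {
    // Property name as given in the bug note for #36218.
    static final String STARTUP_TIMEOUT_PROPERTY = "DistributionManager.STARTUP_TIMEOUT";

    /** Sets the timeout in milliseconds; call this before connecting to the distributed system. */
    static void raiseStartupTimeout(long millis) {
        if (millis <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        System.setProperty(STARTUP_TIMEOUT_PROPERTY, Long.toString(millis));
    }

    public static void main(String[] args) {
        raiseStartupTimeout(30000L); // illustrative value: twice the documented 15000 ms default
        System.out.println(STARTUP_TIMEOUT_PROPERTY + "=" + System.getProperty(STARTUP_TIMEOUT_PROPERTY));
    }
}
```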
01/15/07 #36204 Unable to start a cacheserver on Win2003 64-bit edition 5.0 closed GemFire batch files do not execute correctly on 64-bit versions of Windows The origin of this problem is that DOS evaluates variables as it reads the line, so set PATH=someString (x86);%PATH% is expanded to set PATH=someString (x86); ACTUAL VALUE OF THE PATH VARIABLE Because of the parentheses, the expression is further expanded into two separate commands, like this: set PATH=someString (x86); ACTUAL VALUE OF THE PATH VARIABLE The first line executes correctly and the second causes an error. For GemFire Enterprise, a path containing parentheses "()" breaks the setenv.bat script, leaving the PATH without the gemfire.dll. This forces the GemFire application into Pure Java mode. This problem is not limited to 64-bit Windows; it is just more reproducible there because WOW64 replaces paths like c:\windows\system32 with "C:\windows\system32 (x86)", causing more errors than might be caused by regularly specified paths. Avoid references to paths such as "C:\Program Files" and "C:\windows\system32" in your PATH environment variable.
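To find the PATH entries that can trigger #36204, a small check like the one below may help. It is a sketch, not part of GemFire: the PathCheck class is hypothetical, and it only reports entries that contain parentheses, which is the condition the bug note identifies.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical diagnostic: lists PATH entries that contain parentheses. */
final class PathCheck {
    /** Splits path on separator and returns the entries containing '(' or ')'. */
    static List<String> entriesWithParentheses(String path, String separator) {
        List<String> suspicious = new ArrayList<>();
        if (path == null) {
            return suspicious; // no PATH set, nothing to report
        }
        for (String entry : path.split(Pattern.quote(separator))) {
            if (entry.indexOf('(') >= 0 || entry.indexOf(')') >= 0) {
                suspicious.add(entry);
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        for (String entry : entriesWithParentheses(System.getenv("PATH"), File.pathSeparator)) {
            System.out.println("PATH entry with parentheses: " + entry);
        }
    }
}
```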
01/12/07 #36190 getElderState hangs waiting for reply from remote VM which appears hung in getGrantorForRemoteElderRecovery 5.0 closed A VM departure with multiple global regions or lock services can cause a system-wide hang If you have more than one global region or more than one instance of DistributedLockService in your distributed system, on rare occasions a VM departure can cause a system-wide hang. The hang affects all VMs that use either the DistributedLockService or any features that rely on the DistributedLockService, such as global regions, partitioned regions, and transactions. none/not applicable
01/11/07 #36185 Clients can not use registerInterest on regions with DataPolicy EMPTY 5.0 closed IllegalStateException when client calls registerInterest on a region with data-policy of empty If a bridge client tries to register interest on a region whose data policy is empty, the call returns an IllegalStateException saying 'No mirror type corresponds to data policy "EMPTY".' This error message refers to the deprecated mirror-type region attribute, which has been subsumed by the data-policy attribute. The fundamental bug in this case is that the product does not allow you to register interest in a region with data policy set to empty. There is no workaround in this version of the product. If you need to use an empty data policy and register interest in a client region, upgrade to GemFire Enterprise version 5.0.1.
12/18/06 #36113 Unable to install GFE 5.0 on Windows Vista 5.0 closed Unable to install GFE 5.0 on unsupported platform The GemFire Enterprise installer provides product installation only for the supported platforms. Generally, to install and try the product on an unsupported platform, you should contact the GemStone technical support for a .zip file. If you want to install on Windows Vista, you can install on an XP machine and then copy the product tree to the Vista machine.
12/11/06 #36077 hang in parRegCreateDestroy waiting for replies 5.0RC1 closed Heavily loaded systems may cause membership failure If a TCP/IP connection between two cache members is disrupted by extremely heavy system loading, it is possible for one or more members of the distributed system to incorrectly assume that a peer has departed the system. This leads to an inconsistent accounting--between cache members--of the currently active members of the system. This in turn can lead to cache corruption or system deadlocks. The level of loading required to generate this type of failure is huge. For instance, one test case in-house had a CPU load of 40 (many cache members on a single underpowered host) running for 15 minutes before this failure reproduced. Users should be careful to monitor processor utilization on the hosts running GemFire cache members and to avoid extreme overloading.
12/05/06 #36042 Inconsistent PR data, too many bucket owners 5.0RC1 closed IO Exceptions can cause data loss when Partitioned Region redundantCopies=0 This problem occurs when there are no redundant copies in a PartitionedRegion. Under some failure conditions during communication it is possible for data loss to occur. These include the following failure types: [warning ... ] Ran out of thread owned resources so switching to conserve-sockets=true. Because: com.gemstone.gemfire.internal.tcp.ConnectExceptions: Could not connect to: somehost(15188):2243/2165 Causes: {java.io.IOException: An existing connection was forcibly closed by the remote host} [warning ... ] Failed sending {com.gemstone.gemfire.internal.cache.UpdateOperation$UpdateMessage(region path='/__PRRoot/__Bucket2NodeRegion_#partitionedRegion'; sender=somehost(16924):2225/2162; callbackArg=null; processorId=0; op=CREATE; appliedOperation=false; earlyAck=false; directAck=true; lastModified=0101010101010; key=105; newValue=null; valueIsSerialized=true)} to member {somehost(11188):2210/2160} with stub {tcp:///192.168.1.1:2160} who is now considered to have crashed because: com.gemstone.gemfire.internal.tcp.ConnectionException: Not connected to tcp:///192.168.1.1:2160 [warning ... ] Error sending message to somehost(16924):2225/2162: java.io.IOException: An established connection was aborted by the software in your host machine Configure your application to allow for data loss such that the storage of record can be accessed via a CacheLoader. Restart all members reporting such warnings in addition to those members referred to in the warning messages.
11/30/06 #36014 Internal PartitionedRegionException is thrown from public API 5.0 closed Internal PartitionedRegionException is thrown from public API Some partitioned region operations throw product internal exceptions, such as com.gemstone.gemfire.internal.cache.PartitionedRegionException. Typically these exceptions indicate internal problems with the product. If they do occur, please contact support with the exception, all associated logs and statistic files.
11/20/06 #35985 inconsistent bucket stores in partitioned region with redundancy=1 5.0 closed Inconsistent bucket stores in partitioned region with redundancy=1 There is a race condition in the propagation of entry operations in partitioned regions that can cause inconsistent data, resulting in the order of operations being mixed. For any given entry, ensure that there is only one writing thread at any given time. One way to accomplish this is to use the DLock system to order operations.
11/09/06 #35948 JMX tests fail with OOM with 3.0.2 libraries for MX4J (and 1.4.2 JRE) 5.0 closed JMX Agent unstable in GemFire version 5.0 The JMX Agent is unstable in GemFire 5.0. GemFire 5.0 uses MX4J 3.0.1 (for both JDK 1.4 and 1.5) which has serious bugs causing OutOfMemory errors. The main errors that you might see occur during method invocation on MBeans that are hosted in the GemFire JMX agent. The errors are java.lang.OutOfMemoryErrors wrapped inside javax.management.MBeanExceptions. The 5.0 agent should not be used in production systems, but may be used for development or testing purposes. There is no suitable workaround in 5.0. We recommend upgrading to version 5.0.1 to resolve this problem. The 5.0.1 version of GemFire uses JDK 1.5 JMX for the 1.5 JDK and MX4J 2.0.1 for the 1.4 JDK.
10/12/06 #35790 Uncaught InterruptedException in ServerConnection thread (ThreadPoolExecutor) 5.0 closed Uncaught InterruptedException in ServerConnection thread (ThreadPoolExecutor) During bridge server shutdown, the ServerConnection ThreadPoolExecutor may log a message of this type: [severe 2006/10/11 23:38:59.214 PDT gfserver1 nid=0x9c1a1bb0] Uncaught exception in thread java.lang.InterruptedException ... This is a small bug in shutdown handling that has no negative effect on VM health or behavior. You can safely ignore the message.
10/05/06 #35741 Restarted VM fails to createVMRegion due to PartitionedRegionException: Could not get Partitioned Region from Id 2 5.0 closed Creation of a PartitionedRegion may fail with exception "Could not get Partitioned Region from Id" During creation of a Partitioned Region, an identifier is created. There are conditions under which the identifier creation/discovery process fails for a given VM. This failure causes a PartitionedRegionException to be thrown during Region creation. Typically the cause of such a failure is related to distributed race conditions. Catch the exception and retry the creation operation.
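The catch-and-retry workaround for #35741 (and the similar advice for #36358) can be sketched as a bounded retry loop. This is a generic sketch using only the Java standard library: the CreateRetry class is hypothetical, the Supplier stands in for the application's own region-creation call, and it assumes the exception to retry on is unchecked.

```java
import java.util.function.Supplier;

/** Hypothetical helper: retries a creation call a bounded number of times. */
final class CreateRetry {
    /**
     * Calls create up to maxAttempts times. Only failures of type retryOn are
     * retried; any other runtime exception is rethrown immediately.
     */
    static <T> T withRetry(Supplier<T> create, Class<? extends RuntimeException> retryOn,
                           int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return create.get();
            } catch (RuntimeException e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // not the transient failure we expect
                }
                last = e;
                if (attempt < maxAttempts) {
                    pause(pauseMillis); // give the distributed race time to settle
                }
            }
        }
        throw last; // every attempt failed with the retryable exception
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

Bounding the attempts matters: a failure that persists across several retries is more likely a real problem than a creation race, and should surface to the caller.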
09/07/06 #35555 Unexpected keys found in partitionedRegion (region size is greater than expected) 5.0 closed Unexpected keys found in partitioned region (region entry count is greater than expected) This kind of data inconsistency can happen when concurrent destroy and create/put operations are performed on an entry by multiple threads. The threads can be in any number of VMs. Either use redundantCopies=0, or if that is not possible, prevent concurrent entry operations (put, invalidate, destroy) on a per-entry basis. If the writing operations can be limited to a single VM, use synchronization to coordinate threads in that VM. If the writing threads must be distributed among multiple VMs, use the DLock system to coordinate entry write operations.
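For the single-VM case of the #35555 workaround, per-entry synchronization could be sketched as below. `PerEntryWriter` is a hypothetical helper and a `ConcurrentHashMap` stands in for the region; the sketch keeps one monitor object per key and never removes it, so it assumes a bounded key set. Threads in different VMs would need the DLock system instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical helper: serializes write operations on a per-entry basis.
final class PerEntryWriter<K> {
    private final Map<K, Object> monitors = new ConcurrentHashMap<>();

    // Runs op while holding the monitor for key; other keys are not blocked.
    <T> T write(K key, Supplier<T> op) {
        Object monitor = monitors.computeIfAbsent(key, k -> new Object());
        synchronized (monitor) {
            return op.get();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> region = new ConcurrentHashMap<>(); // stand-in for a Region
        PerEntryWriter<String> writer = new PerEntryWriter<>();
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                writer.write("k", () -> {
                    // Read-modify-write that would race without the per-entry lock.
                    Integer old = region.get("k");
                    return region.put("k", old == null ? 1 : old + 1);
                });
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(region.get("k")); // prints 2000
    }
}
```

Every put, invalidate, and destroy on an entry would go through `write` with that entry's key, so two threads can never interleave operations on the same entry.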
06/07/11 #43536 Server attempts to deserialize too early with function execution and pdx deferred Function APIs require that classes be on the JVM's classpath The function APIs do early deserialization during messaging of function results, filters, arguments, and the functions themselves. So the class for these objects must be on the JVM's classpath. It is not possible to define your own class loader just before you read a function result or get the arguments passed to your function code. Add the classes for functions, function arguments, function filters, and function results to your JVM's classpath.
04/07/08 #38753 Gateway uses P2P reader thread to distribute and wait for ack causing deadlock 5.5 deferred Member hosting GatewayHub may deadlock if performing cache operations or hosting more than one GatewayHub The GatewayHub thread that distributes gateway events is the same thread that reads in messages. This provides guaranteed redundancy with secondary backups, but can result in deadlock if either member tries to perform cache operations or to host more than one GatewayHub. 1) Use -Dgemfire.gateway-queue-no-ack=true 2) Host only one GatewayHub in any given member and dedicate that member to hosting the GatewayHub. It should not perform cache operations or do anything other than feed the gateway.
07/13/11 #43685 Setting ack-severe-alert-threshold causes severe alerts for rebalancing dlock new Severe alerts with ack-severe-alert-threshold set When running with the ack-severe-alert-threshold property set, there may be severe messages logged to the system log that look like this: "20 seconds have elapsed waiting for the partitioned region lock held by ...". These severe alerts can be safely ignored. Ignore severe alerts related to "waiting for the partitioned region lock" if ack-severe-alert-threshold is set.
07/12/11 #43677 ClassNotFoundExceptions still occurring with 6.5.1.19 even after fix 6.5 closed Locators no longer attempt to load custom data serializers or instantiators unnecessarily. Prior to this release, locators logged ClassNotFoundException warnings while attempting to load custom data serializers or instantiators unnecessarily. This has been fixed so that locators no longer load these classes. This is fixed in 6.5.1.20 but NOT in 6.6 No workaround. If such warnings appear in locator logs for data serializers or instantiators, they can be ignored.
05/05/11 #43312 Assertion failed in Oplog recovery closed Entry remains after clear of persistent region with async writes This problem happens in persistent regions that are configured for asynchronous disk writes. When you put an entry into such a region, the put is first done in the region, then the put event is added to the async queue to be flushed to disk. If the region is cleared while the put is taking place, before the entry event is written to the queue, the event is not cleared along with the other enqueued events. The event is then enqueued and written to disk, and the entry can be recovered back into the region. None. Avoid clearing a region while entry creations and updates are being done.
01/27/11 #42671 'GemFireConfigException: Unable to contact a Locator service. Operation either timed out or Locator does not exist' reported by re-started VM, but prior to this all VMs report BindException: Address already in use (while trying to contact the locator) 6.5 closed Network connection fails on Windows with java.net.BindException: Address already in use This is an ephemeral socket exhaustion problem. The machine needs to be configured to allow more ephemeral sockets. The clue is that the "address already in use" exception happens on the client side of the socket connection, not when the server tries to bind a server socket to an address/port. See this MSDN link for information on the fix: http://msdn.microsoft.com/en-us/library/aa560610.aspx The following registry settings need to be added or updated: [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters] "TcpTimedWaitDelay"=dword:0000003c [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters] "MaxUserPort"=dword:00008fff
02/12/07 #36349 Bridge client region.put() completes without exception, but entry value is not updated at the server 5.0 closed Server's entry value is not updated although client region.put completes without exception This happens when operations are performed out of order on the server. The problem arises from this sequence of events: 1. A client attempts to put value X one or more times, but each attempt times out. 2. Each failed attempt "orphans" a thread on the server. 3. The client picks a new connection (and its associated server thread) and continues to perform its sequential updates (X+1, X+2, ... X+n). 4. The orphan threads are eventually scheduled and successfully perform the put with value X, overwriting the previous values (X+1 or X+2 or X+n). Disable timeout behavior for the BridgeLoader, BridgeWriter, or BridgeClient by setting its "readTimeout" parameter to zero. This causes all Region operations supported by the client to block until the server has finished with the operation, preserving client ordering. The "retryAttempts" configuration will still be used when there are communication failures with the server or when the server cache closes in the midst of the operation.