February 2017

Partial degradation of UDP device service
On Feb 19-20 we observed a partial degradation of our UDP device service, which governs cellular device connections to the Particle Cloud. As a result, a fraction of cellular devices temporarily lost the ability to publish data. The issue was resolved yesterday and is no longer occurring, and we have released a platform fix to prevent it from recurring.
Feb 21, 14:17 PST
Telephony partner emergency maintenance took reporting API offline
Our telephony partner has finished their maintenance, and data reporting and SIM management are back up and running.
Feb 20, 17:30 - Feb 21, 01:31 PST
[Scheduled] Metrics and alerting system upgrade
The scheduled maintenance has been completed.
Feb 16, 13:53 PST
[Scheduled] Telephony partner maintenance window
The scheduled maintenance has been completed.
Feb 14, 15:00-21:00 PST
Partial degradation of some Particle services (now resolved)
From 3:28 PST to 3:43 PST, we experienced degraded availability for some services. During this 15-minute window, the impact was:
- Setting and enforcing individual SIM billing thresholds, changing billing plans, and monitoring cellular data usage may have been unavailable
- Some webhooks may have been delayed
This issue is now resolved. Our cloud team is investigating the root cause and taking steps to ensure this will not occur again.
Feb 2, 17:34 PST

January 2017

Webhook and Integration Latency
This incident has been resolved.
Jan 19, 08:09-12:56 PST
Webhook and integration latency
Performance has been great since the fixes. This issue is now considered resolved.
Jan 11, 23:28 - Jan 13, 11:27 PST

December 2016

No incidents reported for this month.

November 2016

[Scheduled] Telephony partner maintenance window
The scheduled maintenance has been completed.
Nov 29, 15:00-23:00 PST
Slow device connections
An elevated number of devices were experiencing slow connection times to the Particle cloud. Devices may have been blinking cyan for longer than normal durations. The issue has been identified and resolved.
Nov 26, 21:04 PST
Investigating a cache issue
We resolved the issue with our cache system this morning, so things should be back to normal. However, the graph of request history (success / error / timed out) for integrations on the console will be blank; it will start collecting new metrics as of this morning.
Nov 14, 08:45-09:46 PST

October 2016

[Scheduled] Telephony partner maintenance window
The scheduled maintenance has been completed.
Oct 24, 15:01-23:00 PST
Temporary issues with DNS
This incident has been resolved.
Oct 21, 10:00-14:09 PST
Telephony partner API outage
Our telephony partner's API is functional again.
Oct 20, 20:59-23:40 PST
[Scheduled] Telephony provider - database upgrades
The scheduled maintenance has been completed.
Oct 5, 19:45 - Oct 6, 07:45 PST
Messaging systems down
System is back up and operating normally.
Oct 3, 19:40-20:30 PST

September 2016

[Scheduled] Telephony provider - database upgrades
The scheduled maintenance has been completed.
Sep 26, 13:01 - Sep 27, 13:08 PST
Telephony partner API outage
Our telephony partner has resolved the issue with their API.
Sep 23, 04:39-07:48 PST
Webhook system not firing events
We're closing this incident out after monitoring the system for a few hours. Webhooks have been firing properly for some time.
Sep 14, 19:16-22:41 PST
Temporary SIM setup issues
This incident has been resolved.
Sep 8, 10:15-17:25 PST
Temporary issues with DNS
This incident has been resolved.
Sep 6, 09:57-13:55 PST
[Scheduled] Device Service Improvements
Caching rollout is complete and went really well. Enjoy the speed! 🚀
Sep 1, 08:30-09:52 PST

August 2016

Delayed usage data from our telephony partner
This incident has been resolved.
Aug 26, 14:42 - Aug 31, 11:43 PST
Public API Interruption
While rolling out a change to one of our cloud services, we encountered an unexpected failure. The automated rollback strategy also failed and required a manual intervention, which took about 10 minutes to fully bring the service back up. The API is now operating as expected.
Aug 30, 14:11-14:44 PST
[Scheduled] Database maintenance
The scheduled maintenance has been completed.
Aug 12, 07:01-08:00 PST
[Scheduled] Database maintenance
The scheduled maintenance has been completed.
Aug 11, 07:01-10:00 PST
Updating API rate limits
With the new rate limits in place, API latency is the most consistent it's ever been. We'll be reaching out to users with overly aggressive scripts next week. If you see HTTP status 429, you know what to work on! (A retry example follows this entry.)
Aug 5, 19:08-19:56 PST
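For scripts that are hitting the new limits, the usual fix is to slow down and retry. Below is a minimal sketch of a backoff loop against the Particle API; the endpoint and token are placeholders, and the only behavior assumed is the HTTP 429 response described above.

    import time
    import requests

    API_URL = "https://api.particle.io/v1/devices"  # example endpoint; substitute your own call
    TOKEN = "YOUR_ACCESS_TOKEN"                      # placeholder access token

    def get_with_backoff(url, token, max_attempts=5):
        """GET an API endpoint, backing off whenever the cloud answers with HTTP 429."""
        delay = 1.0  # seconds; doubled after each rate-limited attempt
        resp = None
        for _ in range(max_attempts):
            resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
            if resp.status_code != 429:
                return resp
            time.sleep(delay)  # rate limited: wait, then retry more slowly
            delay *= 2
        return resp  # still rate limited after max_attempts; caller decides what to do

    if __name__ == "__main__":
        print(get_with_backoff(API_URL, TOKEN).status_code)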
Devices take longer than normal to connect
All Particle Cloud systems and services have exhibited 100% normal behavior since the last update.
Aug 2, 09:37-13:19 PST
Electrons have trouble connecting to the cloud
This incident has been resolved.
Aug 2, 08:50-09:01 PST
Device service degraded performance
Database latency is now stable and back where it was before our database provider began reporting issues. As a result, all Particle services are functioning as expected again.
Aug 2, 07:06-07:53 PST
[Scheduled] Planned database maintenance on the billing system
We have completed the updates to the system! All services should now be working normally.
Aug 2, 03:44-04:37 PST
[Scheduled] Planned database maintenance on the billing system
The scheduled maintenance has been completed.
Aug 1, 20:31-21:46 PST

July 2016

Build IDE binaries download not working
This incident has been resolved.
Jul 26, 04:58-07:16 PST
[Scheduled] Telephony provider emergency maintenance
The scheduled maintenance has been completed.
Jul 25, 15:00-22:00 PST
Telephony partner API errors
Our telephony partner's API continues to be operational.
Jul 18, 16:00-22:10 PST
Intermittent DNS Issue
Particle cloud services are operating normally with the mitigation in place. We will continue to monitor the third party provider for updates.
Jul 18, 12:20-16:02 PST

June 2016

Telephony service provider emergency maintenance
Maintenance was a success. All the telephony services have been restored.
Jun 27, 15:26-23:36 PST
Billing Outage
This incident has been resolved.
Jun 23, 10:58-11:30 PST
Webhooks were partially unavailable overnight
We detected and resolved an issue this morning that did not trigger an alert for our on-duty engineers. We're in the process of scaling out webhooks due to increased demand, and we're working on resolving the recently elevated error rate.
Jun 16, 07:31 PST
Webhooks Outage
This incident has been resolved.
Jun 13, 12:33-12:47 PST
Brief Webhooks issue
We discovered one of our webhook workers had an issue this morning that wasn't caught by our automatic alerts. We've restored the worker and set up alerts so we can react more quickly in the future. Thanks!
Jun 2, 08:14 PST

May 2016

3rd party infrastructure changes preventing Build (build.particle.io) from loading
We've identified the source of the problem and fixed the underlying cause. We also know why our automated alerts did not catch this and notify us, or automatically scale up and resolve the issue. The root cause was complex: a third-party infrastructure change coupled with a new load-bearing infrastructure dependency for a particular HTTP endpoint in our code that had been running fine in production for months. Our automated alerts did not catch it because that endpoint was not covered by the periodic health check. Sorry for the inconvenience, and thanks for your patience.
May 16, 08:23-08:54 PST
Increased error rate
This incident has been resolved.
May 6, 09:21-10:09 PST
Cellular outage for some customers
Our telephony partner's outage has been resolved. Electrons using Particle SIMs are connecting as they should.
May 2, 19:14 - May 3, 00:10 PST

April 2016

Our telephony partner's API is down
Our telephony partner has resolved their API issue. SIM card activation and deactivation should work again.
Apr 25, 16:43 - Apr 26, 06:02 PST
Responded to some traffic spikes this morning
Resolving this as things are responding well to the load adjustments we made, but as always, we'll continue to monitor things closely.
Apr 7, 08:55-09:43 PST
Monitoring increased error rates with a Mobile Integration
Looks like this was resolved almost immediately, but we kept the service marked as degraded just in case there were other issues. Thanks!
Apr 4, 21:27 - Apr 5, 06:32 PST

March 2016

Elevated API and handshake errors
Latency and error rates are back to normal. Everything is humming along nicely.
Mar 24, 10:48-11:12 PST
Brief deploy in response to increased load
Today we rolled out a patch that we have been testing in response to an issue related to logging. Devices may have been briefly slow to connect; we apologize for any inconvenience and will continue to monitor things closely. Thanks!
Mar 16, 12:31 PST

February 2016

Telephony provider's API is down
This incident has been resolved.
Feb 25, 10:03-13:10 PST
Build IDE loading
We've identified the root cause of this issue and have deployed a fix. If you've used Build in the last couple of hours, you may need to click the "refresh libraries" button to see them all again.
Feb 23, 18:38-22:17 PST
Trouble logging into the dashboard
We've fixed the issue and redeployed the impacted service, logins should be behaving normally again.
Feb 22, 11:07-11:13 PST
Billing and MVNO -- Service Interrupted
We've resolved the issue with our billing and MVNO services.
Feb 20, 11:29-12:20 PST
Electron data usage reporting delay
This incident has been resolved.
Feb 12, 22:36 - Feb 13, 08:48 PST
build.particle.io slow / unresponsive for some
We've scaled up the IDE again and things should be running smoothly, thanks!
Feb 8, 10:32-11:14 PST
[Scheduled] Brief window during scale-up this morning
The deploy went very smoothly as expected, and we'll continue to monitor the performance and reliability of the cloud. We will also be deploying smaller code changes later today. Thanks!
Feb 1, 08:00-09:38 PST

January 2016

Some users might be experiencing trouble reconnecting devices
Sorry about the restarts, things should be back to normal.
Jan 7, 16:45-16:59 PST

December 2015

Webhook disruption
All alerts have resolved. Everything is back to normal.
Dec 19, 06:41-06:48 PST
Cloud connectivity issues
We've restored the malfunctioning box; it looks like a config error caused it to become unresponsive. We'll continue to monitor closely today.
Dec 6, 06:13-06:33 PST

November 2015

Webhooks Issue
One of our webhook nodes became unresponsive. Though an automated alert and restart fixed the issue, we'll continue looking into this to understand the root cause and prevent it from happening in the future.
Nov 17, 13:23 PST
Investigating issue with webhooks
We've restarted the unhealthy node and webhooks should be back to normal. We'll be monitoring this closely, and we've fixed the malfunctioning alert.
Nov 15, 15:03-15:14 PST

October 2015

Issue with an upstream service provider resolved
We responded to an issue with an upstream service provider that required more API restarts than we normally allow in a given day. Although each instance of downtime lasted only a few seconds, we apologize for this disruption. This issue should be resolved, and we'll continue to monitor it closely. Thanks!
Oct 21, 16:56 PST
Increased API error rates
Some API errors caused high CPU usage. All systems have recovered.
Oct 21, 12:23-12:30 PST
Cloud Compiler down
A dependency issue was resolved, compiling should be back to normal now.
Oct 12, 21:30-23:32 PST
A few API endpoints not behaving properly, causing the Dashboard not to render properly
During a backup, the database server became unavailable. The API's database connection didn't recover properly after the brief DB outage. We're back up and running now and will look further into causes and permanent remedies.
Oct 5, 23:22-23:55 PST
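As a general illustration of the failure mode above (not Particle's actual code), a client that caches a single database connection can stay wedged after the server comes back; discarding the stale handle and reconnecting in a short retry loop is one common remedy. The sketch below uses SQLite from Python's standard library purely as a stand-in for a real driver, and the DB_PATH name is a placeholder.

    import sqlite3
    import time

    DB_PATH = "example.db"  # placeholder; any SQLite file works for this demo

    def connect_db():
        # Open a fresh connection. In production this would be the real driver's connect().
        return sqlite3.connect(DB_PATH)

    def run_with_reconnect(query, retries=3, delay=2.0):
        """Run a query, reopening the connection if it has gone stale."""
        conn = connect_db()
        last_error = None
        for _ in range(retries):
            try:
                return conn.execute(query).fetchall()
            except sqlite3.OperationalError as err:  # e.g. connection dropped during a backup
                last_error = err
                conn.close()
                time.sleep(delay)    # brief pause before opening a fresh connection
                conn = connect_db()
        raise RuntimeError("query still failing after retries") from last_error

    if __name__ == "__main__":
        print(run_with_reconnect("SELECT 1"))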
Compile service upgrade in progress
The compile service upgrade is finished.
Oct 2, 13:46-13:56 PST
Compile service
This incident has been resolved.
Oct 1, 13:00-15:19 PST

September 2015

Upstream issues from Heroku / Amazon
Heroku has now recovered from the outage. Services have returned to normal.
Sep 20, 08:43-10:37 PST
Updating build farm
We rolled out some new firmware options to the build farm today, so there were a few minutes of compiler interruptions, sorry! Should be fine now.
Sep 11, 15:46 PST

August 2015

Cloud compiler issues
For the last couple of hours the cloud compiler (used by the API, Web IDE (Build), CLI, and Local IDE (Dev)) has not been performing as expected. We've added multiple alerts that will trigger if this same issue arises again, so we can address it more quickly. Additionally, we'll be investigating the root cause more deeply this week to prevent the issue from arising in the first place.
Aug 22, 15:14 PST
Updating build farm
Updates have been rolled out to the build farm and look good, thanks!
Aug 20, 10:38-11:22 PST
[Scheduled] Cloud Compiler Upgrades
The scheduled maintenance has been completed.
Aug 6, 09:21 - Aug 20, 10:38 PST

July 2015

Cloud Compile Service
Cloud compile service logs and metrics have appeared normal for the last 25 minutes; the service is operating normally.
Jul 31, 09:41-10:12 PST
Rolled out some minor updates
Some small changes were rolled out to help address some flashing issues that have been reported in the IDE.
Jul 27, 10:17 PST
Build farm had a brief partial outage
We discovered one of our build farm workers ran out of memory, and was unable to perform builds for some time this afternoon. We'll be adding alerts to catch this earlier, and working to identify the source issue this week. Should be back up and happy now. Thanks!
Jul 26, 15:47 PST
Short API Downtime
We experienced a very short blip in API availability resulting from an attempted deployment. We have rolled back and service has been restored.
Jul 23, 14:46 PST
Cloud is trying to perform a factory-update on previously updated photons
We've fixed the configuration issue that caused this problem, and we're working on making sure this doesn't happen again.
Jul 20, 13:33-13:48 PST
Currently investigating an elevated error rate on the webhook service
We've identified and fixed this issue; we believe it was preventing some webhook responses from being delivered as expected.
Jul 16, 14:37-15:23 PST
Compile errors returning 400 error
This incident has been resolved.
Jul 1, 20:09 - Jul 2, 02:16 PST

June 2015

Brief, partial messaging outage
Our third-party persistence service had another hiccup at 2:20am CST. This time we managed to capture a mountain of instrumentation data associated with the incident, which will help us understand the issue better and prevent it in the future.
Jun 7, 01:00 PST
Brief, partial messaging outage
Our third-party persistence service, compose.io, had a hiccup at 6:35pm PDT. Luckily, some of the ops team were online and working, and they noticed odd behavior even before alerts were triggered. With a restart of a few services, all metrics were normal by 6:44pm PDT. Given the recurring problem, we are researching alternatives.
Jun 3, 19:18 PST

May 2015

Compile service ran out of space briefly
Looks like the workspace cleanup wasn't happening as expected and some disks filled up. We'll look into why we weren't alerted, but in the meantime online IDE builds should be running normally again. :)
May 28, 22:15 PST
[Scheduled] IFTTT Channel Updates
The scheduled maintenance has been completed.
May 27, 10:00 - May 28, 15:00 PST
Temporary Dashboard Outage
dashboard.particle.io experienced a short outage this afternoon during an attempted deployment. The issue has been identified, and the application has been rolled back to its original state until a fix is implemented.
May 26, 16:09 PST
Devices not connecting
Everything's back up. We'll be actively working over the coming weeks to mitigate similar events in the future.
May 24, 00:03-00:46 PST
Community forums load balancer is experiencing issues
The issue with the forums should now be resolved. We were taking a snapshot of that machine instance, but that snapshot restarted the box without warning. Some forum services needed some attention after the unexpected restart. We'll continue to monitor this service closely today.
May 21, 10:38-10:57 PST
Degraded device service performance
We're confident all's well. Standing down.
May 21, 02:24-03:10 PST
Cloud changes to prep for Photon release
Post deploy, everything is looking good here. Can't wait for folks to get their Photons! :)
May 14, 09:55-10:23 PST
[Scheduled] Expected downtime
The scheduled maintenance has been completed. We're very happy to report no incidents and zero downtime.
May 2, 23:30 - May 3, 01:30 PST
Investigating some strange performance issues
This incident has been resolved.
May 1, 12:25-12:37 PST

April 2015

Degraded performance in the web IDE
We rolled back the IDE to the last-known-good version and all seems to be back to normal.
Apr 27, 13:44-14:05 PST
Some connectivity issues
This incident has been resolved.
Apr 26, 00:22-00:57 PST
Some brief issues verifying software in the IDE
We deployed new code this morning and quickly rolled back after we saw an increased error rate. We're fixing those issues now and will roll out the improvements as they're fixed.
Apr 22, 12:04 PST
Devices currently unable to connect
Looks like we were pushing the memory ceiling on one of the device service boxes. All's well now. We'll set up new alerts to catch this in the future before it causes an outage.
Apr 12, 00:43-01:50 PST
Lingering errors from last night's brief outage
We restarted a service that was misbehaving after a brief issue last night; things should be back to normal.
Apr 5, 16:24 PST

March 2015

Build farm is back up
Sorry about that! Looks like an attack on GitHub impacted our build farm even though it should be protected against that; we'll look into what caused things to jam up. In the meantime the build farm should be back up and running, but we'll keep monitoring it closely.
Mar 27, 05:48 PST
[Scheduled] Scheduled Cloud Upgrade
Thanks everyone! The scheduled maintenance went well, and we're running on shiny new hardware! Woo! Things should be back to normal if not a bit faster, thanks!
Mar 23, 09:15-10:20 PST
Webhook creation timing out
This incident has been resolved.
Mar 16, 15:14-16:12 PST
Seeing some cores having difficulties connecting
Our database hosting provider failed over from a primary to a secondary server, which caused several device-service processes to stop responding to handshakes for a few minutes.
Mar 10, 21:37-21:52 PST

February 2015

The API is experiencing degraded performance, we're looking into it
We'll continue to monitor these services, but the issue should be resolved.
Feb 24, 15:41-16:33 PST
Community Site Down
Discourse automatic backups filled the disk space. We've cleared out old backups. This box is already scheduled for an upgrade this sprint. We look forward to avoiding these kinds of outages in the future.
Feb 8, 10:23-10:34 PST
Elevated API Errors
This incident has been resolved.
Feb 6, 18:34 - Feb 7, 08:40 PST
Devices unable to connect
There was a problem with the device service's database connections. Everything's back online now.
Feb 2, 06:27-06:39 PST

January 2015

Spark Cores are currently unable to connect
Everything's back online.
Jan 23, 17:04-17:07 PST
Beta functionality degraded
Looks like Amazon had stopped our beta services box unexpectedly. We're adding more alerts and monitoring to our beta box so this doesn't happen again.
Jan 19, 12:47-12:56 PST
Community site outage
The server hosting the community site ran out of disk space again; I've freed up some space. It looks like our PagerDuty alerts weren't hooked up to the forums, so we weren't woken up for this one, sorry! We'll schedule some maintenance later today to upgrade the disk on that box and set up alerts.
Jan 7, 06:31 PST
The forums are experiencing an outage, we're looking into it
Looks like the server hosting the forums ran out of disk space. It should be back up now, and we'll add alerts to avoid this in the future. Thanks!
Jan 3, 14:43-14:56 PST
Brief device-service outage this afternoon
We'll keep monitoring the services closely, Happy New Year!
Jan 1, 14:07-14:20 PST

December 2014

No incidents reported for this month.

November 2014

Investigating an alert
We're still keeping an eye on this, but we're resolving this incident for now.
Nov 25, 16:06-19:29 PST

October 2014

Device service downtime
Balance restored. We might reboot things one more time to carry some changes forward; sorry about the downtime!
Oct 16, 13:38-13:49 PST
Reports of cores appearing offline to the API after some time.
The self-healing / routing patch is deployed and working well. It might take the cloud a few seconds sometimes, but we'll keep working on that and get it back down to instantaneous. Thanks again to everybody who reported issues!
Oct 8, 19:36 - Oct 9, 10:45 PST
Monitoring load / Scaling out services
We've been watching this closely, and we'll continue to monitor services as we scale up, but we'll close this issue for now.
Oct 6, 13:49-20:41 PST
Investigating some unusual metrics
We've been watching things closely for the last 3 hours just to be safe, and things have continued to be very stable, so I'm marking this as resolved for now, but we'll keep monitoring for any unusual behavior.
Oct 4, 15:33-18:55 PST

September 2014

Upgrading servers; cloud compile service not working as expected
In an effort to avoid downtime caused by Amazon, we caused some downtime prematurely... Should be good to go now; sorry for the inconvenience.
Sep 29, 16:25-16:29 PST
Community downtime this morning
We're looking into the cause of some community site downtime this morning. The site is back up after a restart, and we'll continue monitoring it throughout the day.
Sep 29, 05:31 PST
Database issues are not fully resolved yet
We will continue to monitor the database carefully, but things have been stable for the last few hours. Postmortem to follow! :)
Sep 19, 20:34-23:28 PST
Build site is down
Our host is fixing the failed secondary instance, and we reconfigured and restarted the site to ignore the secondary. We'll also need to update a database driver for the site, since it didn't fail over cleanly. The rest of our services were not impacted by this downtime. Sorry!
Sep 19, 18:34-19:24 PST
Compile server errors
This incident has been resolved.
Sep 10, 15:57-16:28 PST

August 2014

Partial forum outage
and we're back
Aug 22, 15:44-16:53 PST
Device service issues
For about 15 minutes today, from 12:37 to 12:48 CST, there was a service disruption that appeared to affect a small percentage of all connected cores. During this time, cores were unable to connect to the cloud. See the community thread for more details or if you have any questions: https://community.spark.io/t/spark-cloud-updates-and-brief-down-time/6495
Aug 20, 11:09 PST
Build farm issues
I've contacted the user responsible, and the service is back to operating normally.
Aug 5, 10:06-10:42 PST

July 2014

[Scheduled] Community Forum Migration
The scheduled maintenance has been completed.
Jul 23, 13:00-14:00 PST
Heavy load on the forums
Hey all! You might have noticed a few minutes of downtime this evening due to heavy load on the forums as we adjusted the database. We'll be continuing to upgrade the forum servers over the next week, but we just wanted to give you a heads up. Thanks! David
Jul 11, 16:25 PST
Spark Community Forum down 7/2/2014 at 7:39 PM CST
Back online at 10:19 CST.
Jul 2, 17:42-20:21 PST

June 2014

www.spark.io impacted by underlying Heroku issues
This incident has been resolved.
Jun 23, 11:12-14:27 PST
Database Maintenance
Maintenance complete. Smooth as silk.
Jun 3, 22:00-22:09 PST

May 2014

Build IDE isn't loading properly
All better, thanks!
May 5, 15:48-16:01 PST

April 2014

Firefox doesn't trust Comodo's CA cert for some reason
A fully functional certificate has been deployed to api.spark.io and community.spark.io.
Apr 19, 09:05 - Apr 21, 22:13 PST
Community SSL certificate temporary issue
Yesterday, for a couple of hours, Windows 7 users were unable to use the community site because certain browsers could not use a more modern, more secure SSL encryption algorithm. Very quickly, @bko and @peekay123 (at community.spark.io) alerted us that they were having problems, and we reverted the change. Thanks for letting us know so quickly! This incident occurred while we were rotating our SSL certificates in response to the Heartbleed bug. Note that we patched all of our systems within 24 hours of the initial announcement; SSL certificate rotation is an additional mitigation measure we're taking. In the coming days we'll be finishing this process, and we'll be sure to update this status page if any issues arise. The Spark Team
Apr 11, 09:33 PST

March 2014

Main website and build site were degraded, but are back to normal now
It looks like one of the main website dynos stopped connecting to the database and was degraded; we're investigating this issue.
Mar 31, 03:46 PST
One of the servers hosting the spark.io website on the load balancer is degraded
This problem has been resolved. The webserver appears to have been using an incorrectly cached IP for the database, and was experiencing connection issues for the last ~3 hours.
Mar 20, 02:25-02:35 PST
Community Being Upgraded & DNS Change
community.spark.io has been operating normally since 4:45. Within minutes of posting our first status update, two awesome people in our community, kennethlimcp and hypnopompia, helped us realize that image URLs were being served via an old DNS record pointing to sparkdevices.com rather than spark.io. This resulted in uploaded images not being viewable (for up to an hour) while the DNS change propagated.
Mar 5, 13:16-16:50 PST

February 2014

No incidents reported for this month.

January 2014

Bad Firmware
Tinker version 1 was being sent to Cores even though the Device Service was telling Cores to update if they were on a version lower than 2. Version 2 is now being distributed correctly, as intended.
Jan 23, 23:19-23:47 PST
Trouble manually claiming cores using Chrome
The issues associated with manually claiming cores using the build site at spark.io/build are resolved. For those affected, thanks for giving us the details we needed to diagnose and fix the error, as well as for your patience while we worked out a solution!
Dec 28, 13:45 - Jan 6, 11:48 PST

December 2013

No incidents reported for this month.