Incidents | Better Uptime Incidents reported on status page for Better Uptime https://migration.betteruptime.com/ en Delay in model metrics https://migration.betteruptime.com/incident/336169 Thu, 01 Feb 2024 02:01:25 -0000 https://migration.betteruptime.com/incident/336169#928b695515e6257c24acad0bfba69842a3a92d73d67f2b4cb79995fe9d105f5a New metrics are being populated again. Delay in model metrics https://migration.betteruptime.com/incident/336135 Thu, 01 Feb 2024 02:01:25 -0000 https://migration.betteruptime.com/incident/336135#72be17e72a7764a7200b1706911fcfe96ef94911f63231091cd1243e5dd2f0b9 New metrics are being populated again. Delay in model metrics https://migration.betteruptime.com/incident/336169 Thu, 01 Feb 2024 01:02:56 -0000 https://migration.betteruptime.com/incident/336169#5c48ae5c8b46341e7b76c60d31ed26170e86229d3c999f345c72ad08c6d502b0 Some models are missing recent metrics starting 4 minutes ago. We're investigating. Inference and logs are not impacted. Delay in model metrics https://migration.betteruptime.com/incident/336135 Thu, 01 Feb 2024 01:02:56 -0000 https://migration.betteruptime.com/incident/336135#2bd55117234d87c9418593f0fbdd8acfd01af3a89e565db2d7ac40aaf5495c17 Some models are missing recent metrics starting 4 minutes ago. We're investigating. Inference and logs are not impacted. Metrics are degraded for some models https://migration.betteruptime.com/incident/336170 Thu, 25 Jan 2024 02:03:47 -0000 https://migration.betteruptime.com/incident/336170#ff842c6e9bc701449a05e362149a9c9c4d5f54b0e22cb166fe296f712ef15323 We have resolved the issue, and metrics are available again. Metrics are degraded for some models https://migration.betteruptime.com/incident/336136 Thu, 25 Jan 2024 02:03:47 -0000 https://migration.betteruptime.com/incident/336136#7a7b256d86c0e33457ab4a755daa4061ab2e9be38b40dc0db681bf4813cca2d6 We have resolved the issue, and metrics are available again. 
Metrics are degraded for some models https://migration.betteruptime.com/incident/336170 Thu, 25 Jan 2024 02:03:03 -0000 https://migration.betteruptime.com/incident/336170#3b8c15fa582d27546a7c197bdf47cd61b3acd754f1d3561c967bb4e4bc9e64d3 We are continuing to investigate this issue. Metrics are degraded for some models https://migration.betteruptime.com/incident/336136 Thu, 25 Jan 2024 02:03:03 -0000 https://migration.betteruptime.com/incident/336136#4ff4506c49f7b4809b945aefb2b4c11e6f6de3f6218684465de577c372c8546d We are continuing to investigate this issue. Metrics are degraded for some models https://migration.betteruptime.com/incident/336170 Thu, 25 Jan 2024 02:02:22 -0000 https://migration.betteruptime.com/incident/336170#43cf4c4a10599025f49c212782a79059a0111ddcf4af1e810d6cb44ea1d74308 We have resolved the issue, and metrics are available again. Metrics are degraded for some models https://migration.betteruptime.com/incident/336136 Thu, 25 Jan 2024 02:02:22 -0000 https://migration.betteruptime.com/incident/336136#19adc7befff0052dc515b381ef882ddd50acf377e54a27e88ae5ffa1fb051ad3 We have resolved the issue, and metrics are available again. Metrics are degraded for some models https://migration.betteruptime.com/incident/336170 Thu, 25 Jan 2024 01:40:56 -0000 https://migration.betteruptime.com/incident/336170#45192fe0bbee75b4a97d3a3c727685e50f486f413f3f1ea4b54b2bb5cedf4d0d We are currently investigating this issue Metrics are degraded for some models https://migration.betteruptime.com/incident/336136 Thu, 25 Jan 2024 01:40:56 -0000 https://migration.betteruptime.com/incident/336136#cf586fe5f258b543b583dc7acdeda2990699be04c20132288f7d07e0faa10047 We are currently investigating this issue Logs are not showing up in the UI for some models https://migration.betteruptime.com/incident/336171 Tue, 23 Jan 2024 21:02:50 -0000 https://migration.betteruptime.com/incident/336171#9a1e9325e9dd0337c83facf752f3466ade0b70c1cd852f9d380126f0b8b94c35 This incident has been resolved. 
Logs are not showing up in the UI for some models https://migration.betteruptime.com/incident/336137 Tue, 23 Jan 2024 21:02:50 -0000 https://migration.betteruptime.com/incident/336137#219d2fd1880517f2e9462d844aa6e2188bdaac4a24e5a2943f7754f1c9300f7e This incident has been resolved. Logs are not showing up in the UI for some models https://migration.betteruptime.com/incident/336171 Tue, 23 Jan 2024 20:53:00 -0000 https://migration.betteruptime.com/incident/336171#dc421412a04316713fa0943e875329329ccb2a6aa1d73db739f3680d3f6fa92e A fix has been implemented and we are monitoring the results. Logs are not showing up in the UI for some models https://migration.betteruptime.com/incident/336137 Tue, 23 Jan 2024 20:53:00 -0000 https://migration.betteruptime.com/incident/336137#f7233c219c7fbf1b4107658c236f63d2e40477f30a26f060535d8916fcf3e97f A fix has been implemented and we are monitoring the results. Logs are not showing up in the UI for some models https://migration.betteruptime.com/incident/336171 Tue, 23 Jan 2024 20:32:13 -0000 https://migration.betteruptime.com/incident/336171#ac1e2c912302da8467204a06baa35dd6f636c655607df41ea45ff791ae28b3e6 Some model replicas have not been showing logs in the UI since around 7:30 AM PST. The issue has been identified and a fix is being implemented. Logs are not showing up in the UI for some models https://migration.betteruptime.com/incident/336137 Tue, 23 Jan 2024 20:32:13 -0000 https://migration.betteruptime.com/incident/336137#5413419b45cb989fb3cf13aba3ae270ddf44e6e17b113e974c8fcd34bc7b657f Some model replicas have not been showing logs in the UI since around 7:30 AM PST. The issue has been identified and a fix is being implemented. Degraded performance for some models using A100s https://migration.betteruptime.com/incident/336172 Mon, 08 Jan 2024 06:27:26 -0000 https://migration.betteruptime.com/incident/336172#88fb07ba7b0e00eee2fa8a13b3e6cdb0c8ab6e1e9a3952ff2e7f5c72709fc0a0 A100s are fully operational again.
Degraded performance for some models using A100s https://migration.betteruptime.com/incident/336138 Mon, 08 Jan 2024 06:27:26 -0000 https://migration.betteruptime.com/incident/336138#c59e66402bb483dd53a0d0977ebabbef56afc150330f413f2ba171b37c530c12 A100s are fully operational again. Degraded performance for some models using A100s https://migration.betteruptime.com/incident/336172 Mon, 08 Jan 2024 06:01:53 -0000 https://migration.betteruptime.com/incident/336172#55c71197b3a5879b581b4608f4e954aeefd0083cb82d361f9dcfc4603f04aabf Certain models on A100s are seeing degraded performance in inference and scaling up. We're actively investigating. Degraded performance for some models using A100s https://migration.betteruptime.com/incident/336138 Mon, 08 Jan 2024 06:01:53 -0000 https://migration.betteruptime.com/incident/336138#afded04a72a8ee16544cb42bd8c5e808a5d02bd2f0780b85910a8e5fda9ff130 Certain models on A100s are seeing degraded performance in inference and scaling up. We're actively investigating. Increased delay in starting replicas using A100 GPUs https://migration.betteruptime.com/incident/336173 Fri, 08 Dec 2023 16:24:05 -0000 https://migration.betteruptime.com/incident/336173#fd5660fc4d148a787b0fba0dec80ad2b5ba8987de6b9eb926ccf8769e8f5f4d4 This incident has been resolved. Start time for replicas using A100 GPUs is back to normal Increased delay in starting replicas using A100 GPUs https://migration.betteruptime.com/incident/336139 Fri, 08 Dec 2023 16:24:05 -0000 https://migration.betteruptime.com/incident/336139#367c2bff074f388ffeae90d3031e70f19daefcf44544d76ce1e6160b92d6ee3e This incident has been resolved. 
Start time for replicas using A100 GPUs is back to normal Increased delay in starting replicas using A100 GPUs https://migration.betteruptime.com/incident/336173 Fri, 08 Dec 2023 16:01:43 -0000 https://migration.betteruptime.com/incident/336173#c67d161609e3a4d502b020cc7d548ad3cd5e747aecc554b661daf34586513e5c A fix has been implemented and we are monitoring the results. Increased delay in starting replicas using A100 GPUs https://migration.betteruptime.com/incident/336139 Fri, 08 Dec 2023 16:01:43 -0000 https://migration.betteruptime.com/incident/336139#285f36297267ac89d9672498032440af3d057d062489809a1283012f902c1527 A fix has been implemented and we are monitoring the results. Increased delay in starting replicas using A100 GPUs https://migration.betteruptime.com/incident/336173 Fri, 08 Dec 2023 15:08:28 -0000 https://migration.betteruptime.com/incident/336173#a8751582d442c4159099d40ac498f1a1a32e5df2f40193f11d2c75c2a9a2a767 We are rolling out a fix and are seeing improvement in A100 start times. Increased delay in starting replicas using A100 GPUs https://migration.betteruptime.com/incident/336139 Fri, 08 Dec 2023 15:08:28 -0000 https://migration.betteruptime.com/incident/336139#84acc6506af6369080db06621c136e91cdb3f45ed503899dbffb476e136417f1 We are rolling out a fix and are seeing improvement in A100 start times. Increased delay in starting replicas using A100 GPUs https://migration.betteruptime.com/incident/336173 Fri, 08 Dec 2023 14:46:19 -0000 https://migration.betteruptime.com/incident/336173#c7ea31355e39af008fb806792d18d779848ffacda28ed47ffba9922ff7d35920 The issue has been identified and a fix is being implemented. Increased delay in starting replicas using A100 GPUs https://migration.betteruptime.com/incident/336139 Fri, 08 Dec 2023 14:46:19 -0000 https://migration.betteruptime.com/incident/336139#e0b320900eb1354045009fd1ba354f06c6a017dc5df1a03a1efbb4064c3a0084 The issue has been identified and a fix is being implemented. 
Model metrics UI is missing recent data https://migration.betteruptime.com/incident/336174 Wed, 13 Sep 2023 21:24:09 -0000 https://migration.betteruptime.com/incident/336174#40f1a16110dafea83dcd443ff6c822ea02af1bfac1a806000ecc116dc15d7f36 The issue is resolved. Model metrics are being tracked and displayed on time. Model metrics UI is missing recent data https://migration.betteruptime.com/incident/336140 Wed, 13 Sep 2023 21:24:09 -0000 https://migration.betteruptime.com/incident/336140#c4a5df0cb80bb43344ed269f2c7a1493d6a45cff9f027a306af1eb81da226b00 The issue is resolved. Model metrics are being tracked and displayed on time. Model metrics UI is missing recent data https://migration.betteruptime.com/incident/336174 Wed, 13 Sep 2023 21:03:54 -0000 https://migration.betteruptime.com/incident/336174#bb5b0742518bcc78aaae04a254d3d5d62ee17e86ca37ee0940daa600397f59f9 Some models' metrics dashboards are missing the last 5 minutes' worth of data. Model inference API and performance are *not* impacted. Model metrics UI is missing recent data https://migration.betteruptime.com/incident/336140 Wed, 13 Sep 2023 21:03:54 -0000 https://migration.betteruptime.com/incident/336140#d3dfb8eaf82ed236fb861f6775459e485b8b195f0a9daebac04e859f120b4ea0 Some models' metrics dashboards are missing the last 5 minutes' worth of data. Model inference API and performance are *not* impacted. Model metrics UI is missing recent data https://migration.betteruptime.com/incident/336175 Wed, 13 Sep 2023 06:24:06 -0000 https://migration.betteruptime.com/incident/336175#935b677bfe11ee2e4c46769376d9054bb8a8aa84c93b4b55d6415e3037d5ecda The issue is resolved. Model metrics are being tracked and displayed on time. Model metrics UI is missing recent data https://migration.betteruptime.com/incident/336141 Wed, 13 Sep 2023 06:24:06 -0000 https://migration.betteruptime.com/incident/336141#495093e6d5ff776823958ebbe31d77b4fc10adf022b2f01ebd4457a4a417d9d8 The issue is resolved.
Model metrics are being tracked and displayed on time. Model metrics UI is missing recent data https://migration.betteruptime.com/incident/336175 Wed, 13 Sep 2023 06:07:26 -0000 https://migration.betteruptime.com/incident/336175#e02a05ae57865feb2be58dca32e1e298088573caf4dce08e013fb5671166d258 Some models' metrics dashboards are missing the last 20 minutes' worth of data. Model inference API and performance are *not* impacted. Model metrics UI is missing recent data https://migration.betteruptime.com/incident/336141 Wed, 13 Sep 2023 06:07:26 -0000 https://migration.betteruptime.com/incident/336141#d53f6aa81fcda4918da9fbbc636c5ce99935efcc7e7741f9426dcf69cf72af3a Some models' metrics dashboards are missing the last 20 minutes' worth of data. Model inference API and performance are *not* impacted. Slower than usual model deployment https://migration.betteruptime.com/incident/336176 Tue, 12 Sep 2023 23:28:27 -0000 https://migration.betteruptime.com/incident/336176#12ca122b4224b7c2af99a230c52f31292a00f8f60a872f21d916d2e1398da0af The issue has been fixed. Model deploys are fast again. Slower than usual model deployment https://migration.betteruptime.com/incident/336142 Tue, 12 Sep 2023 23:28:27 -0000 https://migration.betteruptime.com/incident/336142#141bd349ac5ac781ddb4942e37198747bff265efd2c682b4995cf2b6afd91552 The issue has been fixed. Model deploys are fast again. Slower than usual model deployment https://migration.betteruptime.com/incident/336176 Tue, 12 Sep 2023 23:15:53 -0000 https://migration.betteruptime.com/incident/336176#b8f1acab99491a06bd5b5178ba37086d3290f6966d0105763e75c99c6fcaa6cf Models deployed in the past 20 minutes are experiencing longer than usual deployment times; ~10 minutes longer than what's typical given the model size.
Slower than usual model deployment https://migration.betteruptime.com/incident/336142 Tue, 12 Sep 2023 23:15:53 -0000 https://migration.betteruptime.com/incident/336142#93e43ec0da64e36dea63256e3fb24cef1bafb9e8e6ff7da5cb4f5edc80a31643 Models deployed in the past 20 minutes are experiencing longer than usual deployment times; ~10 minutes longer than what's typical given the model size. Error when deploying new models that need *multiple* GPUs https://migration.betteruptime.com/incident/336177 Tue, 30 May 2023 22:29:52 -0000 https://migration.betteruptime.com/incident/336177#11ae364b78c8728a7cad0b2eea117ee39d43752a830b3be02d0d97e7dc774b60 The issue has been resolved. Error when deploying new models that need *multiple* GPUs https://migration.betteruptime.com/incident/336143 Tue, 30 May 2023 22:29:52 -0000 https://migration.betteruptime.com/incident/336143#96a75b9c07e403ce2bccfb034654cbc7f02d969b773a02741b64fae11883f840 The issue has been resolved. Error when deploying new models that need *multiple* GPUs https://migration.betteruptime.com/incident/336177 Tue, 30 May 2023 20:28:37 -0000 https://migration.betteruptime.com/incident/336177#812e5f4d5e56de6ef233348361dd307b26d5ec8fabbcdf2301d8ec574fd0bab0 Deploying new models using multiple GPUs (e.g. 2xA100, 2xA10G) is currently broken. Should have a fix within a few minutes. Error when deploying new models that need *multiple* GPUs https://migration.betteruptime.com/incident/336143 Tue, 30 May 2023 20:28:37 -0000 https://migration.betteruptime.com/incident/336143#feb33f6ccc075501c7ea8bc096fa7086b51041f2292761f224702d6d1117b4da Deploying new models using multiple GPUs (e.g. 2xA100, 2xA10G) is currently broken. Should have a fix within a few minutes. Delay in deploying and activating models https://migration.betteruptime.com/incident/336178 Tue, 16 May 2023 00:45:01 -0000 https://migration.betteruptime.com/incident/336178#025abc8a68065c990417c78a56bc87f81c0b3fd2f11a259975561b44c64ea7d2 This incident has been resolved. 
Delay in deploying and activating models https://migration.betteruptime.com/incident/336144 Tue, 16 May 2023 00:45:01 -0000 https://migration.betteruptime.com/incident/336144#45a616a811fff67f2c955fae91597f146002e1c827bfe7a48e0a15e13dd8734f This incident has been resolved. Delay in deploying and activating models https://migration.betteruptime.com/incident/336178 Tue, 16 May 2023 00:44:22 -0000 https://migration.betteruptime.com/incident/336178#82d6125e2297e48af1fd1317d85edb032364289007e19d1886ff65d325449539 Deploying new and reactivating old models are back to being timely. Delay in deploying and activating models https://migration.betteruptime.com/incident/336144 Tue, 16 May 2023 00:44:22 -0000 https://migration.betteruptime.com/incident/336144#5a7b0772ca8a227766a5db2efbd6e3e9c054f1d9ff4b13b0864204ee1c5c4031 Deploying new and reactivating old models are back to being timely. Delay in deploying and activating models https://migration.betteruptime.com/incident/336178 Tue, 16 May 2023 00:12:40 -0000 https://migration.betteruptime.com/incident/336178#dc78af81dc937ade40f74e2101b3760bd33bea95aa0407a65a96d4e68724ef50 We're experiencing delays in deploying new models and activating deactivated models. Delay in deploying and activating models https://migration.betteruptime.com/incident/336144 Tue, 16 May 2023 00:12:40 -0000 https://migration.betteruptime.com/incident/336144#443ac36ead87d90a8fdb153022005983e7425b8e2955f3f93f96ed42a4481bfe We're experiencing delays in deploying new models and activating deactivated models. Model metrics UI is not rendering https://migration.betteruptime.com/incident/336179 Mon, 01 May 2023 00:19:59 -0000 https://migration.betteruptime.com/incident/336179#a5cbdcc07f66b9af59fc3fd4797883644d9186906043431e20bd183d427f07a3 All model metrics are displayed again. There was no data loss during this incident. 
Model metrics UI is not rendering https://migration.betteruptime.com/incident/336145 Mon, 01 May 2023 00:19:59 -0000 https://migration.betteruptime.com/incident/336145#94095c37325b65785f156f62ff3726aa330a67e2b401180c797277e9433d8490 All model metrics are displayed again. There was no data loss during this incident. Model metrics UI is not rendering https://migration.betteruptime.com/incident/336179 Mon, 01 May 2023 00:03:52 -0000 https://migration.betteruptime.com/incident/336179#12ac07f5d6a9d73fa81a84e9a1a789bbd8a37c046534cae53f1076bd482c8366 Model metrics UI elements (e.g. response time graph, GPU/memory usage graph, etc.) are not being displayed. We are investigating. Model metrics UI is not rendering https://migration.betteruptime.com/incident/336145 Mon, 01 May 2023 00:03:52 -0000 https://migration.betteruptime.com/incident/336145#5db9ea568aa85a0932be147d7b6cc159b622673cb349669e95f4cda529a62a80 Model metrics UI elements (e.g. response time graph, GPU/memory usage graph, etc.) are not being displayed. We are investigating. Baseten is currently slow and error rate is up https://migration.betteruptime.com/incident/336180 Fri, 29 Jul 2022 20:15:57 -0000 https://migration.betteruptime.com/incident/336180#99c5e0f653ab1baf15f3a185b88ab8123852c37a58756814a0bce9919e37fb49 This incident has been resolved. Traffic has tripled in the past hours, and the number of errors went up at the same time. Baseten is currently slow and error rate is up https://migration.betteruptime.com/incident/336146 Fri, 29 Jul 2022 20:15:57 -0000 https://migration.betteruptime.com/incident/336146#b8ab5e23baf74134fe1dc94792ebdece0245178bd52f165f906487d2bd7121bb This incident has been resolved. Traffic has tripled in the past hours, and the number of errors went up at the same time.
Baseten is currently slow and error rate is up https://migration.betteruptime.com/incident/336180 Fri, 29 Jul 2022 20:11:50 -0000 https://migration.betteruptime.com/incident/336180#46131ecc3409e4d87877f192c777c0de52f70a38ee200b82f98b5e4e5452b803 A fix has been implemented and we are monitoring the results. Baseten is currently slow and error rate is up https://migration.betteruptime.com/incident/336146 Fri, 29 Jul 2022 20:11:50 -0000 https://migration.betteruptime.com/incident/336146#a6f056ac834f7281751c42b8e73367c09cc6fe49ec5f723c61c851af0a425107 A fix has been implemented and we are monitoring the results. Baseten is currently slow and error rate is up https://migration.betteruptime.com/incident/336180 Fri, 29 Jul 2022 20:01:44 -0000 https://migration.betteruptime.com/incident/336180#395737b18114dd40d1b160ef3322bf303610d537cc1a11262f3d41a1be5ea541 The issue has been identified and a fix is being implemented. Baseten is currently slow and error rate is up https://migration.betteruptime.com/incident/336146 Fri, 29 Jul 2022 20:01:44 -0000 https://migration.betteruptime.com/incident/336146#dd5379dea67994fd772187396efd58dbe8a929cbc80494c4cd5e89fc8ad7743c The issue has been identified and a fix is being implemented. Baseten is currently slow and error rate is up https://migration.betteruptime.com/incident/336180 Fri, 29 Jul 2022 19:48:09 -0000 https://migration.betteruptime.com/incident/336180#dcd6484d2c55f1c64654e39ff284fef7e1195de44517fc358fe3684edacc9964 We are currently investigating this issue. Baseten is currently slow and error rate is up https://migration.betteruptime.com/incident/336146 Fri, 29 Jul 2022 19:48:09 -0000 https://migration.betteruptime.com/incident/336146#5f841b6defdb17c9b315ada9e02e0f637dc26b130a5e88e5f5f754f7e19d5193 We are currently investigating this issue. 
Degraded performance https://migration.betteruptime.com/incident/336181 Fri, 04 Mar 2022 19:39:39 -0000 https://migration.betteruptime.com/incident/336181#45cc93349ae52e07bc4eeac572b4ee136d5ce56747611ed67cab4f66469c895a This incident has been resolved. Degraded performance https://migration.betteruptime.com/incident/336147 Fri, 04 Mar 2022 19:39:39 -0000 https://migration.betteruptime.com/incident/336147#aca1f8119af2f968bc5dce4ae4d12d12c2b05195da8394cc88978a6974742228 This incident has been resolved. Degraded performance https://migration.betteruptime.com/incident/336181 Fri, 04 Mar 2022 19:05:08 -0000 https://migration.betteruptime.com/incident/336181#495cc9ed0c74f87ae2f6b599ca4bfd82151e48acfa9c3bd9aa9fe37ea4c5ebac A fix has been implemented and we are monitoring the results. Degraded performance https://migration.betteruptime.com/incident/336147 Fri, 04 Mar 2022 19:05:08 -0000 https://migration.betteruptime.com/incident/336147#6c322479d27881f00a6d350cda40848680e88eeb48e2089b4d3b9274f06a365f A fix has been implemented and we are monitoring the results. Degraded performance https://migration.betteruptime.com/incident/336181 Fri, 04 Mar 2022 18:23:23 -0000 https://migration.betteruptime.com/incident/336181#da33ff9b412fb839d689c668f0f2812722f44b1d3138d2091c5749efa8d31515 We are getting errors in our internal stack, which degrade application performance and cause applications to return some errors. Degraded performance https://migration.betteruptime.com/incident/336147 Fri, 04 Mar 2022 18:23:23 -0000 https://migration.betteruptime.com/incident/336147#9c882b3e1bd47371e778593670cf0200f5c2fb8a31167057da9cff9db3a46030 We are getting errors in our internal stack, which degrade application performance and cause applications to return some errors.