path: root/ironic_python_agent
Commit message, Author, Age, Files, Lines
* Make WSGI server respect listen_* directivesbugfix/6.2-eolbugfix/6.2Jay Faulkner2020-09-022-7/+51
| | | | | | | | | | | | The listen_port and listen_host directives are intended to allow deployers of IPA to change the port and host IPA listens on. These configs have not been obeyed since the migration to the oslo.service wsgi server. Story: 2008016 Task: 40668 Change-Id: I76235a6e6ffdf80a0f5476f577b055223cdf1585 (cherry picked from commit 7d0ad36ebd350a7162bc3c33bbefd26b9e962a78)
* Fix bootloader install issue with MDRAIDDoug Szumski2020-08-132-6/+45
| | | | | | | | | | | | | | | When no root_device hint is set, an MDRAID partition can be incorrectly selected as the root device which causes installation of the bootloader to the physical disks behind the MDRAID volume to fail. See the notes in the referenced Story for more detail. This change adds a little more specificity to the listing of block devices. Change-Id: I66db457e71a0586723ee753bef961aec5bf58827 Story: 2007905 Task: 40303 (cherry picked from commit 5e95b1321d6e4fe5c562092d0baba73ad6d5303e)
* Merge "Ignore devices with size 0 when collecting inventory" into bugfix/6.2Zuul2020-08-132-5/+19
|\
| * Ignore devices with size 0 when collecting inventoryDmitry Tantsur2020-08-102-5/+19
| | | | | | | | | | | | | | | | | | | | delete_configuration still fetches all devices as it needs to clean ones with broken RAID. Story: #2007907 Task: #40307 Change-Id: I4b0be2b0755108490f9cd3c4f3b71a5e036761a1 (cherry picked from commit 1f3b70c4e968464a93ea68e6f64c4836b90446de)
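The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: `BlockDevice`, `list_devices` and the `include_empty` flag are hypothetical names. Inventory collection drops zero-sized devices, while a caller that needs every device (as delete_configuration does) opts out of the filter.

```python
from dataclasses import dataclass


@dataclass
class BlockDevice:
    # Hypothetical stand-in for the agent's block device record.
    name: str
    size: int  # size in bytes


def list_devices(devices, include_empty=False):
    """Return devices, dropping zero-sized ones unless include_empty is set."""
    if include_empty:
        # Callers that clean up broken RAID still need to see every device.
        return list(devices)
    return [dev for dev in devices if dev.size > 0]
```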
* | Fix TypeError on agent lookup failureJulia Kreger2020-08-072-4/+75
|/ | | | | | | | | | | | | | | | | | Agent lookups can fail because we presently use logging.exception, better known in our code as LOG.exception, which can also generate other issues on journald-based systems, where additional errors could be raised, leaving us unable to troubleshoot the actual issue. Because of the misuse of LOG.exception and the default behavior of the backoff retry handler, the retry logic was also not functional: any error, no matter how small, caused IPA to just exit. Change-Id: Ic4608b7c6ff9773d1403926efb3d59869c71343b Story: 2007968 Task: 40465 (cherry picked from commit 5eab9bced63b2b9a6753cbbf594dda7ef9d03a3a)
* Refactor part of image module6.2.0Riccardo Pittau2020-07-071-27/+32
| | | | | | | Shuffle some functions around and reduce size of _is_bootloader_loaded moving logic out to a new function. Change-Id: I9c10bf05186dcebb37f175d61bf4ac9ff86b6510
* Merge "Limit Inspection->Lookup->Heartbeat lag"Zuul2020-07-067-6/+66
|\
| * Limit Inspection->Lookup->Heartbeat lagJulia Kreger2020-07-037-6/+66
| | | | | | | | | | | | | | | | | | | | Caches hardware information collected during inspection so that the initial lookup can occur without any delay. Also adds logging to track how long inventory collection takes. Co-Authored-By: Dmitry Tantsur <dtantsur@protonmail.com> Change-Id: I3e0d237d37219e783d81913fa6cc490492b3f96a
* | Merge "Fix serializing ironic-lib exceptions"Zuul2020-07-062-0/+33
|\ \ | |/ |/|
| * Fix serializing ironic-lib exceptionsDmitry Tantsur2020-07-022-0/+33
| | | | | | | | | | | | Change-Id: If1408e4b81d263c56b4bbab618dd0737db5f762e Story: #2007889 Task: #40268
* | Increase the ESP partition size to 550 MiB when using software RAIDDmitry Tantsur2020-07-024-16/+21
| | | | | | | | | | | | | | This has been popular guidance, and diskimage-builder has recently started following it. Change-Id: I794c846fb191c15b0a30546bf64d624dfbde0fd4
* | Merge "Mount all vfat partitions before calling grub2"Zuul2020-07-022-6/+73
|\ \ | |/ |/|
| * Mount all vfat partitions before calling grub2Arne Wiebalck2020-06-302-6/+73
| | | | | | | | | | | | | | | | | | In order to ensure grub2 finds all files it needs, mount all vfat partitions specified in the deployed image. Story: #2007618 Task: #39629 Change-Id: Ie5b6e0abc3f266409562f9ecb26538126b667056
* | Fixes minor issues in the read() retries patchDmitry Tantsur2020-06-302-4/+6
| |   Follow-up to commit c5b97eb781cf9851f9abe87a1500b4da55b8bde8. Two things slipped through the cracks:
| |
| |   * ImageDownloadError was instantiated incorrectly, resulting in a wrong error message. This was uncovered by using assertRaisesRegex in tests.
| |   * We allowed calling write(None). This was uncovered by avoiding sleep(4) in tests and enabling more failed calls before timeout.
| |
| |   Change-Id: If5e798c5461ea3e474a153574b0db2da96f2dfa8
* | Merge "Fix confusing logging when running asynchronous commands"Zuul2020-06-292-12/+14
|\ \
| * | Fix confusing logging when running asynchronous commandsDmitry Tantsur2020-06-262-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | We log them as completed when they start executing. Also fix a problem in remove_large_keys that prevented items with defaultdict from being logged. Change-Id: I34a06cc85f55c693416f8c4c9877d55d6affafc9
* | | Merge "Extend retries to 9, 10 seconds apart."Zuul2020-06-292-4/+8
|\ \ \
| * | | Extend retries to 9, 10 seconds apart.Julia Kreger2020-06-232-4/+8
| | | |   The download retry interval was previously five seconds, which is not long enough to recover after a hard network connectivity break, where we may be reliant upon network port forwarding hold-down timers or even routing protocol route propagation to recover communication.
| | | |
| | | |   Previously the time value was 5 seconds, with 3 attempts, meaning 15 seconds total, ignoring the error detection timeouts. Now it is 10 seconds, with 10 attempts, meaning 100 seconds before the error detection timeouts.
| | | |
| | | |   Change-Id: I6d11edc9a3156f2bdc21c3d432ecc7625d652699
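The arithmetic in this message can be checked with a short sketch. `total_retry_wait` is a hypothetical helper written for illustration only; like the message, it multiplies the interval by the number of attempts and ignores the error detection timeouts.

```python
def total_retry_wait(interval_seconds, attempts):
    """Time budget implied by the retry settings, ignoring detection timeouts."""
    return interval_seconds * attempts


# Values taken from the commit message above.
old_budget = total_retry_wait(5, 3)
new_budget = total_retry_wait(10, 10)
print(old_budget, new_budget)
```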
* | | | Add debug message to node lookupRiccardo Pittau2020-06-251-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | This should help identify the start of the node lookup. Change-Id: I72f0949fee84be5a2b06eab976c5560e252fa63a
* | | | Merge "Minor clean-up follow-up to timeout on read() fix"Zuul2020-06-252-8/+3
|\ \ \ \ | |/ / / |/| | |
| * | | Minor clean-up follow-up to timeout on read() fixJulia Kreger2020-06-242-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | Just some minor cleanup driven from the review process. Change-Id: I0b3d73c251d6da6d85e11279990dcc36751e27e7
* | | | Add full download retriesJulia Kreger2020-06-232-29/+77
|/ / / | | | | | | | | | | | | | | | | | | | | | Instead of just trying to get the connection and handler for the download, let's retry the whole action of downloading. Change-Id: I9217792d32e6f33c70f146a9b7d3ef58c5644d8a
* | | Add timeout operations to try and prevent hang on read()Julia Kreger2020-06-232-1/+83
|/ /
| |   Socket read operations can block and may not time out as expected when timeouts are only considered at the beginning of a socket request. This can occur when streaming file contents down to the agent and there is a hard connectivity break.
| |
| |   In other words, we could be in a situation like:
| |
| |   - read(fd, len) - Gets data
| |     - Select returns context to the program, we do things with data.
| |   ** Hard connectivity break for next 90 seconds **
| |   - read(fd, len) - We drain the in-memory buffer side of the socket.
| |     - Select returns context, we do things with our remaining data
| |   ** Server retransmits **
| |   ** Server times out due to no ack **
| |   ** Server closes socket and issues a FIN,RST packet to the client **
| |   ** Connectivity restored, Client never got FIN,RST **
| |   ** Client socket still waiting for more data **
| |   - read(fd, len) - No data returned
| |     - Select returns, yet we have no data to act on as the buffer is empty OR the buffered data doesn't meet our required read len value. tl;dr noop
| |   - read(fd, len) <-- We continue to try and read until the socket is recognized as dead, which could be a long time.
| |
| |   NOTE: The above read()s are Python's read() on contents being streamed. Lower-level reads exist, but brains will hurt if we try to cover the dynamics at that level.
| |
| |   As such, we need to keep track of the last time we received a packet, and use that to decide whether we have timed out. Requests periodically yields back even when no data has been received, in order to allow the caller to wall clock the progress/status and take appropriate action. When we exceed the timeout value with our wall clock, we will fail the download.
| |
| |   Change-Id: I7214fc9dbd903789c9e39ee809f05454aeb5a240
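The wall-clock approach described in this message can be sketched as below. This is an illustration under stated assumptions, not the agent's implementation: `DownloadTimeout` and `read_with_deadline` are hypothetical names, and `chunks` stands for any iterator that periodically yields and returns an empty bytes object when nothing was received.

```python
import time


class DownloadTimeout(Exception):
    """Raised when no data arrives within the allowed window (hypothetical)."""


def read_with_deadline(chunks, timeout, clock=time.monotonic):
    """Yield non-empty chunks; fail once `timeout` seconds pass without data."""
    last_data = clock()
    for chunk in chunks:
        if chunk:
            # Data arrived, so restart the wall clock.
            last_data = clock()
            yield chunk
        elif clock() - last_data > timeout:
            # The stream keeps yielding back but carries no data: give up.
            raise DownloadTimeout(
                "no data received for more than %s seconds" % timeout)
```

The `clock` parameter exists only so the timing can be driven by a fake clock in tests instead of sleeping.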
* | Merge "Add a deploy step for writing an image"Zuul2020-06-204-7/+56
|\ \
| * | Add a deploy step for writing an imageDmitry Tantsur2020-06-024-7/+56
| | | | | | | | | | | | | | | | | | | | | The new step just invokes the appropriate method of the standby extension. Change-Id: Ic74f83ab2b7e58f8e4b46e0abfab79e221afeb3e Story: 2006963
* | | Make get_partition_uuids work with whole disk imagesDmitry Tantsur2020-06-172-52/+90
| | | | | | | | | | | | | | | | | | | | | We used to populate the root UUID inside the message formatting function; move it to the actual prepare_image/cache_image calls. Change-Id: Ifb22220dfd49633e8623dd76f7a6a128f5874b78
* | | Merge "Split and move logic for partition tables"Zuul2020-06-114-17/+49
|\ \ \
| * | | Split and move logic for partition tablesRiccardo Pittau2020-05-254-17/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Move and split the logic to create the partition tables when applying raid configuration. Change-Id: Ic76dd2067ace02dd02351caca0c7f9b05571e510
* | | | Merge "New extension call to return partition UUIDs"Zuul2020-06-092-0/+20
|\ \ \ \ | | |/ / | |/| |
| * | | New extension call to return partition UUIDsDmitry Tantsur2020-06-022-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we parse the success message from the write_image call. This is inconvenient and incompatible with the deploy steps split. Change-Id: I258dc1ff1ad1c9df5cbc26a7825d9e7ef2f3205b Story: #2006963
* | | | Make the install_bootloader command asynchronousDmitry Tantsur2020-06-082-12/+13
|/ / / | | | | | | | | | | | | | | | | | | | | | It does not return anything, so there is no point in it being synchronous. Ironic always calls it with wait=True, so there is no problem with backward compatibility either. Change-Id: I44fec2e0cb54486328ce71263613d8592e384870
* | | Fix an issue with high cpu usage caused by ironic-python-agentFedor Tarasenko2020-05-251-1/+1
|/ / | | | | | | | | | | | | | | | | | | Currently, running the ipa-centos8-stable-ussuri image causes 100% CPU usage while cleaning. The proposed change fixes this behavior and significantly speeds up cleaning. Change-Id: I2ba9a69f22b11830d8ff1bc346b17bf1a52f25b0 Story: #2007696 Task: #39809
* | Merge "Fix pep8 errors"Zuul2020-05-134-592/+610
|\ \
| * | Fix pep8 errorsRiccardo Pittau2020-05-124-592/+610
| |/ | | | | | | | | | | | | | | For some reason the pep8 test started to complain, causing mayhem. This patch fixes the issues and refactors the dmi_inspector tests, moving pure data to a separate file. Change-Id: Ia244a496acd80abad679f8ae9832d4f0471500e7
* | Fix TypeError with newer version of lshwRiccardo Pittau2020-04-271-1/+2
| | | | | | | | | | | | | | | | | | | | | | The issue with JSON output in lshw was fixed in version B.02.19. This patch makes the memory calculation compatible with that version and later versions that are included in recent distributions (e.g. Ubuntu 20.04, Fedora 31). Change-Id: Id5a30028b139c51cae6232cac73a50b917fea233 Story: 2007588 Task: 39527
* | Add function to calculate memoryRiccardo Pittau2020-04-271-16/+22
|/ | | | | | Move logic to calculate memory to its own function. Change-Id: I5ab98b6450ff45dff35ddae093a83140f37047a8
* Add timeout and retries when connection to an image server6.1.0Dmitry Tantsur2020-04-243-18/+107
| | | | | | | | If the server is stuck for any reason, the download will hang for a potentially long time. Provide a timeout (defaults to 60 seconds) and 2 retries on failure. Change-Id: Ie53519266edd914fdbfa82fe52b4a55151e5ec5f
* Merge "Add raid.apply_configuration deploy step"Zuul2020-04-213-12/+70
|\
| * Add raid.apply_configuration deploy stepDmitry Tantsur2020-04-203-12/+70
| | | | | | | | | | | | | | | | For compatibility with out-of-band RAID deploy steps, we need to have one apply_configuration step, not a create/delete pair. Change-Id: I55bbed96673c9fa247cafdac9a3ade3a6ff3f38d Story: #2006963
* | Merge "Simplify deduplicate_steps"Zuul2020-04-211-16/+5
|\ \
| * | Simplify deduplicate_stepsDmitry Tantsur2020-04-061-16/+5
| | | | | | | | | | | | | | | | | | The same result can be achieved using a multi-component sorting key. Change-Id: Ieacf9fcecb2a6de7b4ccd8889f789099af39aa37
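One way to deduplicate with a multi-component sorting key is sketched below. This is not the agent's actual function: the dictionary keys `step`, `support` and `manager` are hypothetical, and the rule "higher support wins, manager name breaks ties" is an assumption made for illustration.

```python
def deduplicate_steps(candidates):
    """Keep one candidate per step name, chosen by a multi-component key."""
    winners = {}
    # Sort ascending by (support, manager) so the preferred candidate for
    # each step comes last and overwrites the earlier ones.
    for cand in sorted(candidates, key=lambda c: (c["support"], c["manager"])):
        winners[cand["step"]] = cand
    return winners
```

Because tuples compare element by element, a single sort replaces separate passes for "best support level" and "tie-break by name".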
* | | Mock get_node_boot_mode in software RAID unit testsDmitry Tantsur2020-04-201-0/+3
| |/ |/| | | | | | | | | | | This function checks for /sys/firmware/efi. Some tests do not mock isdir, so they fail on UEFI machines. Change-Id: I088218ddb88717ac07669d0b97c6cd50208ede8c
* | Use unittest.mock instead of third party mockSean McGinnis2020-04-181-1/+1
| | | | | | | | | | | | | | | | Now that we no longer support py27, we can use the standard library unittest.mock module instead of the third party mock lib. Change-Id: I5fdb2a02ee83c692d46cbe28266fcae033bec6f6 Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
* | Merge "A boot partition on a GPT disk should be considered an EFI partition"Zuul2020-04-162-70/+53
|\ \
| * | A boot partition on a GPT disk should be considered an EFI partitionDmitry Tantsur2020-04-152-70/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DIB builds instance images with EFI partitions that only have the boot flag, but not esp. According to parted documentation, boot is an alias for esp on GPT, so accept it as well. To avoid complexities when parsing parted output, the implementation is switched to existing utils and ironic-lib functions. Change-Id: I5f57535e5a89528c38d0879177b59db6c0f5c06e Story: #2007455 Task: #39423
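The flag check described above might look roughly like this. `is_efi_partition` and its arguments are hypothetical and written only to illustrate the rule; the actual change relies on existing utils and ironic-lib functions that are not reproduced here.

```python
def is_efi_partition(flags, table_type):
    """Return True if the partition flags mark an EFI system partition.

    `flags` is an iterable of flag names as reported for one partition;
    `table_type` is the partition table label, e.g. 'gpt' or 'msdos'.
    """
    flags = set(flags)
    if "esp" in flags:
        return True
    # On GPT, parted treats 'boot' as an alias for 'esp'.
    return table_type == "gpt" and "boot" in flags
```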
* | | Fix the token logic to be compatible with older ironicDmitry Tantsur2020-04-151-3/+1
|/ / | | | | | | | | | | | | | | | | Currently we fail with HTTP 401 if both the known and the received tokens are None. This prevents IPA from being updated before ironic. Story: #2007557 Task: #39419 Change-Id: I80249bd3468b581dc035d72156cbfa2f5f225a1b
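A minimal sketch of the compatible behaviour follows, assuming a hypothetical `token_is_valid` helper rather than the agent's real method. Accepting any request while no token is known is an assumption of this sketch; the commit message itself only states that the case where both tokens are None must not be rejected.

```python
def token_is_valid(known_token, received_token):
    """Decide whether a request's token should be accepted."""
    if known_token is None:
        # No token has been established (e.g. an older ironic that never
        # sends one), so there is nothing to compare against.
        return True
    # A token is known: the request must present exactly that token.
    return received_token == known_token
```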
* | Merge "Move minimum ironic version to latest ocata"Zuul2020-04-151-1/+1
|\ \
| * | Move minimum ironic version to latest ocataRiccardo Pittau2020-04-081-1/+1
| | | | | | | | | | | | | | | | | | | | | All other API versions from releases before that are not supported anymore. Change-Id: I49fb3e4facdec42a4dab343c46a84f3cba6d2b7c
* | | Merge "Move logic to calculate raid sectors to raid_utils"Zuul2020-04-132-17/+32
|\ \ \
| * | | Move logic to calculate raid sectors to raid_utilsRiccardo Pittau2020-04-092-17/+32
| | | | | | | | | | | | | | | | | | | | | | | | Some more raid related logic moved to raid_utils. Change-Id: I08c73ad14e5b01ebac2490b83997c5452506d4a2