path: root/lib/libalpm/be_sync.c
Commit message (Author, Age)
* Fix segfaults on opening invalid archive files (Dan McGee, 2012-01-08)
|   "invalid" in this case simply means files that may or may not be archives. Discovered via a `pacman -Sc` operation with delta files in the package cache directory, but can be triggered if any file is passed to `pacman -Ql` that isn't an archive, for instance, or if the sync database file is not an archive.
|
|   Fix it up so we are more careful about calling archive_read_finish() only on archives that are valid and have not already been closed, and teach our archive open function to set the returned archive to NULL if we aren't going to be returning something valid anyway.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Update sync database size estimation values (Dan McGee, 2012-01-07)
|   As the comment states, this is more like a dartboard than science.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* pkghash improvements/modifications (Dan McGee, 2012-01-02)
|   This patch changes a variety of small things related to our pkghash implementation with an eye toward performance, especially on native 32-bit systems.
|
|   * Use `unsigned int` rather than `size_t` for hash sizes. We already return ERANGE for any attempted creation of a hash greater than 1 million elements, so unsigned int is more than large enough for our purposes. Switching to this type allows 32 bit systems to do native math without helper functions from libgcc.
|   * _alpm_pkghash_create() now internally adds extra padding for additional array elements, rather than that being the responsibility of the caller.
|   * #define values are moved into static const values in pkghash.c; a new `stride` value is also extracted (but remains set at 1).
|   * Division and modulus operators are removed from the normal find and add paths if possible. We store the upper limit of the number of elements in the hash so we no longer need to calculate this every element addition. When doing wraparound position calculations, we only apply the modulus operator if the value is greater than the number of buckets.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
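The last bullet of the pkghash commit can be illustrated with a small open-addressing table. This is only a sketch: `struct name_table`, `next_position()`, `table_add()` and `table_find()` are hypothetical names and do not come from pkghash.c. It assumes `limit` is kept below `buckets` so that a probe always reaches an empty slot.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical open-addressing table; not the real pkghash layout. */
struct name_table {
	const char **slots;    /* NULL marks an empty bucket */
	unsigned int buckets;  /* total number of buckets */
	unsigned int entries;  /* buckets currently in use */
	unsigned int limit;    /* stored upper bound on entries (assumed < buckets) */
};

static const unsigned int stride = 1;

/* Advance a probe position; the modulus runs only when we pass the end. */
static unsigned int next_position(const struct name_table *t, unsigned int pos)
{
	pos += stride;
	if(pos >= t->buckets) {
		pos %= t->buckets;
	}
	return pos;
}

/* Returns 0 on success, -1 once the stored limit has been reached. */
static int table_add(struct name_table *t, const char *name, unsigned int hash)
{
	unsigned int pos;
	if(t->entries >= t->limit) {
		return -1;  /* compare against the stored limit, no division needed */
	}
	pos = hash % t->buckets;  /* one modulus to pick the starting bucket */
	while(t->slots[pos] != NULL) {
		pos = next_position(t, pos);
	}
	t->slots[pos] = name;
	t->entries++;
	return 0;
}

/* Returns the stored name, or NULL if it is not present. */
static const char *table_find(const struct name_table *t, const char *name,
		unsigned int hash)
{
	unsigned int pos = hash % t->buckets;
	while(t->slots[pos] != NULL) {
		if(strcmp(t->slots[pos], name) == 0) {
			return t->slots[pos];
		}
		pos = next_position(t, pos);
	}
	return NULL;
}
```

With a stride of 1 the conditional branch is taken at most once per full pass over the table, so the common probe step is a plain addition and comparison.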
* Minor optimization on likely sync package lookup (Dan McGee, 2012-01-02)
|   We have a name_hash value here, so add a cheap compare of it before falling to the strcmp() call.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
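A minimal sketch of the hash-before-strcmp() idea follows. `struct pkg_stub`, `hash_name()` and `names_match()` are illustrative names, not the library's own types or functions, and the sdbm-style hash is only an assumption used to fill in `name_hash`.

```c
#include <string.h>

/* Illustrative stand-in for a package record; not the real alpm_pkg_t. */
struct pkg_stub {
	const char *name;
	unsigned long name_hash;
};

/* sdbm-style string hash, used here only to populate name_hash. */
static unsigned long hash_name(const char *s)
{
	unsigned long hash = 0;
	while(*s) {
		hash = (unsigned char)*s++ + hash * 65599UL;
	}
	return hash;
}

/* Integer compare first; strcmp() runs only when the hashes agree. */
static int names_match(const struct pkg_stub *a, const struct pkg_stub *b)
{
	if(a->name_hash != b->name_hash) {
		return 0;
	}
	return strcmp(a->name, b->name) == 0;
}
```

The strcmp() call is still required after a hash match, since two different names can share a hash value.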
* Only compile delta regex once (Dan McGee, 2012-01-02)
|   This reduces the number of regcomp() calls when parsing delta entries in the database from once per entry to once for the entire context handle by storing the compiled regex data on the handle itself. Just as we do with the cURL handle, we initialize it the first time it is needed and free it when releasing the handle.
|
|   A few other small tweaks to the parsing function also take place, including using the stack to store the transient and short file size string while parsing it.
|
|   When parsing a sync database with 1378 delta entries, this reduces the time of a `pacman -Sl deltas` operation by 50% from 0.22s to 0.12s.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
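The compile-once pattern described above might look roughly like this with the POSIX regex API. `struct ctx_handle`, `delta_line_is_valid()` and `ctx_release()` are hypothetical names, and the pattern is a simplified illustration rather than the expression the library actually uses.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical context handle that owns the compiled expression. */
struct ctx_handle {
	regex_t delta_regex;
	int delta_regex_compiled;
};

/* Illustrative pattern only: "<name> <32 hex digits> <size>". */
static const char *DELTA_PATTERN = "^[^ ]+ [0-9a-f]{32} [0-9]+$";

/* Returns 1 on match, 0 on no match, -1 if the regex failed to compile. */
static int delta_line_is_valid(struct ctx_handle *h, const char *line)
{
	if(!h->delta_regex_compiled) {
		/* first use: compile and remember that we did */
		if(regcomp(&h->delta_regex, DELTA_PATTERN,
					REG_EXTENDED | REG_NOSUB) != 0) {
			return -1;
		}
		h->delta_regex_compiled = 1;
	}
	return regexec(&h->delta_regex, line, 0, NULL, 0) == 0;
}

/* Free the compiled regex when the handle is released. */
static void ctx_release(struct ctx_handle *h)
{
	if(h->delta_regex_compiled) {
		regfree(&h->delta_regex);
		h->delta_regex_compiled = 0;
	}
}
```

Every call after the first skips regcomp() entirely, which is where the per-entry saving comes from.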
* include config.h via Makefiles (Dave Reisner, 2011-12-21)
|   Ensures that config.h is always ordered correctly (first) in the includes. Also means that new source files get this for free without having to remember to add it.
|
|   We opt for -imacros over -include as it's more portable, and the added constraint by -imacros doesn't bother us for config.h.
|
|   This also touches the HACKING file to remove the explicit mention of config.h as part of the includes.
|
|   Signed-off-by: Dave Reisner <dreisner@archlinux.org>
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Add helper method for creating and opening archive object (Dan McGee, 2011-11-16)
|   This moves the common setup code of about 5 different callers into one method. Error messages will now be common and shared in all places; several paths did not have any messages at all before.
|
|   In addition, we now pick an ideal block size for the archive read based off the larger value of our default buffer size or the st.st_blksize field. For a filesystem such as NFS, this is often much larger than the default 8192; values such as 32768 and 131072 are common.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
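The block-size choice can be sketched as below. `READ_BUFFER_SIZE`, `choose_block_size()` and `block_size_for_path()` are assumed names for illustration; the value 8192 is taken from the default mentioned in the commit message, and the archive library itself is left out so the example stays self-contained.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed default read buffer size for this sketch. */
#define READ_BUFFER_SIZE 8192

/* Pick the larger of the default buffer size and the preferred I/O size. */
static size_t choose_block_size(const struct stat *st)
{
	if(st->st_blksize > 0 && (size_t)st->st_blksize > READ_BUFFER_SIZE) {
		return (size_t)st->st_blksize;
	}
	return READ_BUFFER_SIZE;
}

/* Open first, then fstat() the descriptor; returns -1 on any failure. */
static long block_size_for_path(const char *path)
{
	struct stat st;
	long result;
	int fd = open(path, O_RDONLY);
	if(fd < 0) {
		return -1;
	}
	if(fstat(fd, &st) != 0) {
		close(fd);
		return -1;
	}
	result = (long)choose_block_size(&st);
	close(fd);
	return result;
}
```

Calling fstat() on the already-open descriptor means the size information describes the same file that will be read, even if the path is replaced in the meantime.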
* Convert package and database archive reads to use file descriptors (Dan McGee, 2011-11-01)
|   This gives us a bit more control over the archive reading process, and a bit less is done behind the scenes. It also allows us to use fstat() in preference to stat(), which should avoid some potential race conditions.
|
|   Some reorganization is necessary to move the stat calls after the open() calls. Error handling and cleanup in general is also improved, as we had several potential memory and file handle leaks before in some error paths.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Introduce ALPM_BUFFER_SIZE constant (Dan McGee, 2011-10-27)
|   This takes the place of three previously used constants: ARCHIVE_DEFAULT_BYTES_PER_BLOCK, BUFFER_SIZE, and CPBUFSIZE. In libarchive 3.0, the first constant will be no more, so we can ensure we are forward-compatible by removing our usage of it now.
|
|   The rest are unified for consistency. By default, we will use the value of BUFSIZ provided by <stdio.h>, which is 8192 on Linux. If that is undefined, a default value is provided.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Refactor download payload reset and free (Dan McGee, 2011-09-28)
|   This was done to squash a memory leak in the sync database download code. When we downloaded a database and then reused the payload struct, we could find ourselves calling get_fullpath() for the signatures and overwriting non-freed values we had left over from the database download.
|
|   Refactor the payload_free function into a payload_reset function that we can call that does NOT free the payload itself, so we can reuse payload structs. This also allows us to move the payload to the stack in some call paths, relieving us of the need to alloc space.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
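The reset-versus-free distinction can be shown with a small stand-in struct. `struct payload_stub`, its fields, `copy_string()` and `payload_reset()` are hypothetical here and only loosely modeled on the description above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical payload with heap-allocated members. */
struct payload_stub {
	char *remote_name;
	char *tempfile_name;
	long max_size;
};

/* Small helper so the sketch does not depend on strdup(). */
static char *copy_string(const char *s)
{
	char *copy = malloc(strlen(s) + 1);
	if(copy != NULL) {
		strcpy(copy, s);
	}
	return copy;
}

/* Release what the payload owns, but not the payload itself. */
static void payload_reset(struct payload_stub *p)
{
	if(p == NULL) {
		return;
	}
	free(p->remote_name);
	free(p->tempfile_name);
	memset(p, 0, sizeof(*p));  /* leave no stale pointers behind */
}
```

Because the struct itself is never freed, the same payload can live on the stack and be reused, for example once for a database download and again for its signature.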
* be_sync: fetch only entry mode, not full stat struct (Dan McGee, 2011-09-27)
|   This saves a lot of unnecessary work since we don't need any of the other fields in the stat struct.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Ensure database validity flags are set in invalid signature case (Dan McGee, 2011-09-22)
|   We returned the right error code but never set the flags accordingly. Also, now that we can bail early, ensure we set the error code.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* signing: add a process and retry loop for database signatures (Dan McGee, 2011-09-20)
|   In reality, there is no retrying that happens as of now because we don't have any import or changing of the keyring going on, but the code is set up so we can drop this in our new _alpm_process_siglist() function.
|
|   Wire up the basics to the sync database validation code, so we see something like the following:
|
|       $ pacman -Ss unknowntrust
|       error: core: signature from "Dan McGee <dpmcgee@gmail.com>" is unknown trust
|       error: core: signature from "Dan McGee <dpmcgee@gmail.com>" is unknown trust
|       error: database 'core' is not valid (invalid or corrupted database (PGP signature))
|
|       $ pacman -Ss missingsig
|       error: core: missing required signature
|       error: core: missing required signature
|       error: database 'core' is not valid (invalid or corrupted database (PGP signature))
|
|   Yes, there is some double output, but this should be fixable in the future.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Allow our PGP helper method to pass back the signature results (Dan McGee, 2011-09-20)
|   This will make its way up the call chain eventually to allow trusting and importing of keys as necessary.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Clean up handling of size fields (Dan McGee, 2011-09-02)
|   We currently have csize, isize, and size concepts, and sometimes the difference isn't clear. Ensure the following holds:
|
|   * size (aka csize): always the compressed size of the package; available for everything except local packages (where it will return 0)
|   * isize: always the installed size of the package; available for all three package types
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Parse > 2GiB file sizes correctly (Dan McGee, 2011-08-29)
|   We were using atol(), which on 32-bit systems cannot handle values greater than 2GiB. Switch to a strtoull() wrapper function tailored toward parsing off_t values. This allows parsing of very large positive integer values. off_t is a signed type, but in our usages, we never parse or have a need for negative values, so the function will return -1 on error.
|
|   Before:
|       $ pacman -Si flightgear-data | grep Size
|       Download Size  : 2097152.00 K
|       Installed Size : 2097152.00 K
|
|   After:
|       $ ./src/pacman/pacman -Si flightgear-data | grep Size
|       Download Size  : 2312592.52 KiB
|       Installed Size : 5402896.00 KiB
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
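A strtoull()-based wrapper of the kind described above could be sketched like this. `parse_off_t()` is a hypothetical name, and the exact validation rules (bare digits only, nothing trailing) are assumptions of this sketch rather than a copy of the library's function.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Parse a non-negative decimal number into off_t; returns -1 on error. */
static off_t parse_off_t(const char *text)
{
	char *end;
	unsigned long long value;

	/* accept bare digits only: no sign, no leading whitespace */
	if(text == NULL || !isdigit((unsigned char)text[0])) {
		return (off_t)-1;
	}
	errno = 0;
	value = strtoull(text, &end, 10);
	if(errno != 0 || end == text || *end != '\0') {
		return (off_t)-1;
	}
	/* reject values that do not survive the round trip through off_t */
	if((off_t)value < 0 || (unsigned long long)(off_t)value != value) {
		return (off_t)-1;
	}
	return (off_t)value;
}
```

Since valid sizes are never negative, -1 is free to act as the error value, which is the same trade-off the commit message describes.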
* Database read optimizations (Dan McGee, 2011-08-28)
|   Hard to believe there was still more room to improve on this, but I found an easily correctable oversight tonight. Our databases (both sync and local) contain many blank lines, and we were not moving onto the next line right away in these cases; instead we would proceed through our strcmp() conditional checks as normal.
|
|   Some local numbers follow to show the effects of this patch:
|
|   Sync `-Ss foobarbaz`:
|       71,709 blank lines skipped early
|       ~1,505,889 strcmp() calls avoided (21 per line)
|       ~15% speed improvement (.210 --> .179 sec)
|
|   Local `-Qs foobarbaz`:
|       6,823 blank lines skipped early
|       115,991 strcmp() calls avoided (17 per line)
|       ~6% speed improvement (.080 -> .071 sec)
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
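The early skip amounts to a single length check ahead of the comparison chain. In this sketch `count_headers()` is a hypothetical function and the three header names stand in for the longer list of fields a real parser would check.

```c
#include <string.h>

/* Count recognized header lines, skipping blank lines before any strcmp(). */
static size_t count_headers(const char *const *lines, size_t count,
		size_t *blank_skipped)
{
	size_t headers = 0;
	size_t i;

	*blank_skipped = 0;
	for(i = 0; i < count; i++) {
		const char *line = lines[i];
		if(line[0] == '\0') {
			/* blank line: move on without touching the strcmp() chain */
			(*blank_skipped)++;
			continue;
		}
		if(strcmp(line, "%NAME%") == 0
				|| strcmp(line, "%VERSION%") == 0
				|| strcmp(line, "%DESC%") == 0) {
			headers++;
		}
	}
	return headers;
}
```

Each skipped line avoids every comparison in the chain, which is why the savings scale with the number of fields checked per line.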
* be_sync.c: Fix memory leak in alpm_db_update() (Lukas Fleischer, 2011-08-28)
|   Free "syncpath" and restore umask if we fail to grab a lock.
|
|   Signed-off-by: Lukas Fleischer <archlinux@cryptocrack.de>
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Remove argument from check_pgp_helper (Dan McGee, 2011-08-25)
|   This one wasn't all that necessary as we only used it in one place in the function, which can be checked easily enough at the call site.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Remove alpm_option_get_* usage from backend (Dan McGee, 2011-08-19)
|   These are all available directly on the handle without indirection.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* dload: add 'unlink_on_fail' to payload struct (Dave Reisner, 2011-08-19)
|   Let callers of _alpm_download state whether we should delete on fail, rather than inferring it from context. We still override this decision and always unlink when a temp file is used.
|
|   Signed-off-by: Dave Reisner <dreisner@archlinux.org>
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Enhance and utilize database status flags (Dan McGee, 2011-08-18)
|   * Move is_local standalone field to status enum
|   * Create VALID/INVALID flag pair
|   * Create EXISTS/MISSING flag pair
|
|   With these additional fields, we can be more intelligent with database loading and messages to the user. We now only warn once if a sync database does not exist and do not continue to try to load it once we have marked it as missing.
|
|   The reason for the flags existing in pairs is so the unknown case can be represented. There should never be a time when both flags in the same group are true, but if they are both false, it represents the unknown case. Care is taken to always manipulate both flags at the same time.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
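The paired-flag scheme gives each property three states: true, false and unknown. A sketch under assumed names follows; the enum members, their bit positions, `set_validity()` and `get_validity()` are illustrative and not taken from the library headers.

```c
/* Illustrative status bits; names and values are assumptions. */
enum db_status_stub {
	STATUS_VALID   = (1 << 0),
	STATUS_INVALID = (1 << 1),
	STATUS_EXISTS  = (1 << 2),
	STATUS_MISSING = (1 << 3)
};

/* Always touch both flags of the pair so they can never both be set. */
static void set_validity(unsigned int *status, int valid)
{
	if(valid) {
		*status |= STATUS_VALID;
		*status &= ~(unsigned int)STATUS_INVALID;
	} else {
		*status |= STATUS_INVALID;
		*status &= ~(unsigned int)STATUS_VALID;
	}
}

/* Returns 1 for valid, 0 for invalid, -1 when neither flag is set (unknown). */
static int get_validity(unsigned int status)
{
	if(status & STATUS_VALID) {
		return 1;
	}
	if(status & STATUS_INVALID) {
		return 0;
	}
	return -1;
}
```

A single boolean could not distinguish "checked and found invalid" from "not checked yet", which is the case the pair exists to represent.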
* Only check necessary signatures and checksums (Dan McGee, 2011-08-15)
|   The precedence goes as follows: signature > sha256sum > md5sum
|
|   Add some logic and helper methods to check what we have available when loading a package, and then only check what is necessary to verify the package. This should speed up sync database verifies as we no longer will be doing both a checksum and a signature validation.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Parse conflicts/provides/replaces at database load time (Dan McGee, 2011-08-15)
|   We did this with depends way back in commit c244cfecf654d3 in 2007. We can do it with these fields as well. Of note is the inclusion of provides even though only '=' is supported; we'll parse other things, but no guarantees are given as to behavior, which is more or less similar to before since we only looked for the equals sign.
|
|   Also of note is the non-inclusion of optdepends; this will likely be resolved down the road.
|
|   The biggest beneficiaries of this change will be the resolving code that formerly had to parse and reparse several of these fields; it only happens once now at load time. This does lead to the disadvantage that we will now always be parsing this information up front even if we never need it in the split form, but as these are uncommon fields and our parser is quite efficient it shouldn't be a big concern.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Load and allow access to sha256sum (Dan McGee, 2011-08-15)
|   This adds a field in the package struct for this checksum type as well as allowing access via the API to it. The frontend is now able to display any read value. Note that this does not implement any use or verification of the value internally.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Don't trim whitespace when reading database entries (Dan McGee, 2011-08-02)
|   We don't write with extra or unknown whitespace, so there is little reason for us to trim it when reading either. This also fixes the hopefully never encountered "paths that start or end with spaces" issue, for which two pactests have been added. The tests also contain other evil characters that we have encountered before and handle just fine, but it doesn't hurt to ensure we don't break such support in the future.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Clean up my debug logger mess (Dan McGee, 2011-07-19)
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Fix test suite when GPGME is disabled (Dan McGee, 2011-07-18)
|   As noted by Allan, we failed pretty hard if gpgme was compiled out. With these changes, only sign001.py fails. This can/will be fixed later once we beef up the test suite with more signing tests anyway.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* absorb some _alpm_download params into payload struct (Dave Reisner, 2011-07-05)
|   Restore some sanity to the number of arguments passed to _alpm_download and curl_download_internal.
|
|   Signed-off-by: Dave Reisner <dreisner@archlinux.org>
* lib/dload: prevent large file attacks (Dave Reisner, 2011-07-05)
|   This means creating a new struct which can pass more descriptive data from the back end sync functions to the downloader. In particular, we're interested in the download size read from the sync DB. When the remote server reports a size larger than this (via a content-length header), abort the transfer.
|
|   In cases where the size is unknown, we set a hard upper limit of:
|
|   * 25MiB for a sync DB
|   * 16KiB for a signature
|
|   For reference, 25MiB is more than twice the size of all of the current binary repos (with files) combined, and 16KiB is a truly gargantuan signature.
|
|   Signed-off-by: Dave Reisner <dreisner@archlinux.org>
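The size check itself is a simple comparison once a limit has been chosen. In the sketch below the two limits come from the commit message, while the macro names and `transfer_allowed()` are hypothetical and the HTTP side (reading the content-length header) is left out.

```c
/* Fallback limits from the commit message; the macro names are illustrative. */
#define MAX_DB_SIZE  (25LL * 1024 * 1024)  /* 25MiB for a sync DB */
#define MAX_SIG_SIZE (16LL * 1024)         /* 16KiB for a signature */

/*
 * known_size:     size recorded in the sync DB, or 0 when unknown
 * fallback_limit: hard upper limit to use when the size is unknown
 * reported_size:  size the server announced for the transfer
 * Returns 1 if the transfer may proceed, 0 if it should be aborted.
 */
static int transfer_allowed(long long known_size, long long fallback_limit,
		long long reported_size)
{
	long long limit = known_size > 0 ? known_size : fallback_limit;
	return reported_size <= limit;
}
```

A caller would run this check as soon as the announced size is available, before any significant amount of data has been written to disk.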
* dload: handle irregular URLs (Dave Reisner, 2011-07-05)
|   URLs might end with a slash and follow redirects, or could be generated by a script such as /getpkg.php?id=12345. In both cases, we may have a better filename that we can write to, taken from either content-disposition header, or the effective URL.
|
|   Specific to the first case, we write to a temporary file of the format 'alpmtmp.XXXXXX', where XXXXXX is randomized by mkstemp(3). Since this is a randomly generated file, we cannot support resuming and the file is unlinked in the event of an interrupt.
|
|   We also run into the possibility of changing out the filename from under alpm on a -U operation, so callers of _alpm_download can optionally pass a pointer to a char * to be filled in by curl_download_internal with the actual filename we wrote to. Any sync operation will pass a NULL pointer here, as we rely on specific names for packages from a mirror.
|
|   Fixes FS#22645.
|
|   Signed-off-by: Dave Reisner <d@falconindy.com>
* Allow invalid sync DBs to be returned by the library (Dan McGee, 2011-07-05)
|   They are placeholders, but important for things like trying to re-sync a database missing a signature. By using the alpm_db_validity() method at the right time, a client can take the appropriate action with these invalid databases as necessary.
|
|   In pacman's case, we disallow just about anything that involves looking at a sync database outside of an '-Sy' operation (although we do check the validity immediately after). A few operations are still permitted: '-Q' ops that don't touch sync databases as well as '-R'.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* signing: move to new signing verification and return scheme (Dan McGee, 2011-07-05)
|   This gives us more granularity than the former Never/Optional/Always trifecta. The frontend still uses these values temporarily but that will be changed in a future patch.
|
|   * Use 'siglevel' consistently in method names, 'level' as variable name
|   * The level becomes an enum bitmask value for flexibility
|   * Signature check methods now return an array of status codes rather than a simple integer success/failure value. This allows callers to determine whether things such as an unknown signature are valid.
|   * Specific signature error codes mostly disappear in favor of the above returned status code; pm_errno is now set only to PKG_INVALID_SIG or DB_INVALID_SIG as appropriate.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Merge remote-tracking branch 'allan/ALPM' (Dan McGee, 2011-07-03)
|\
| |   Conflicts:
| |       lib/libalpm/be_local.c
| |       lib/libalpm/be_package.c
| |       lib/libalpm/conflict.c
| |       lib/libalpm/diskspace.c
| |       lib/libalpm/dload.c
| |       lib/libalpm/remove.c
| * Prefix _alpm_errno_t members with ALPM (Allan McRae, 2011-07-02)
| |   Signed-off-by: Allan McRae <allan@archlinux.org>
| * Prefix alpm_loglevel_t members with ALPM (Allan McRae, 2011-07-02)
| |   Signed-off-by: Allan McRae <allan@archlinux.org>
* | be_sync: make READ_NEXT() a no-arg macro (Dan McGee, 2011-07-03)
|/
|   We passed in 'line', but not 'buf.line'. In addition, the macros building off of READ_NEXT() assume variable names anyway. Since we only use these macros in one function, might as well simplify them.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Make alpm_db_get_sigverify_level() public (Dan McGee, 2011-06-30)
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Add an alpm_db_get_valid() public function (Dan McGee, 2011-06-30)
|   This allows one to check if a database is valid or invalid.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Rename pmpkg_t to alpm_pkg_t (Allan McRae, 2011-06-28)
|   Signed-off-by: Allan McRae <allan@archlinux.org>
* Rename pmdb_t to alpm_db_t (Allan McRae, 2011-06-28)
|   Signed-off-by: Allan McRae <allan@archlinux.org>
* Rename pmhandle_t to alpm_handle_t (Allan McRae, 2011-06-28)
|   Signed-off-by: Allan McRae <allan@archlinux.org>
* Make sync DB reading a bit more flexible (Dan McGee, 2011-06-24)
|   We can reorganize things a bit to not require reading a directory-only entry first (or at all). This was noticed while working on some pactest improvements, but should be a good step forward anyway.
|
|   Also make _alpm_splitname() a bit more generic in where it stores the data it parses.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Don't require a transaction for sync DB updates (Dan McGee, 2011-06-24)
|   Instead, just do the required locking directly in the backend in calls to alpm_db_update().
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Add a 'valid' flag to the database object (Dan McGee, 2011-06-24)
|   Start by converting all of our flags to a 'status' bitmask (pkgcache status, grpcache status). Add a new 'valid' flag as well. This will let us keep track if the database itself has been marked valid in whatever fashion.
|
|   For local databases at the moment we ensure there are no depends files; for sync databases we ensure the PGP signature is valid if required/requested. The loading of the pkgcache is prohibited if the database is invalid.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Move database 'version' check to registration time (Dan McGee, 2011-06-24)
|   This is another step toward doing both local database validation (ensuring we don't have depends files) and sync database validation (via signatures if present) when the database is registered.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Do database signature checking at load time (Dan McGee, 2011-06-24)
|   This is the ideal place to do it as all clients should be checking the return value and ensuring there are no errors. This is similar to pkg_load().
|
|   We also add an additional step of validation after we download a new database; a subsequent '-y' operation can potentially invalidate the original check at registration time.
|
|   Note that this implementation is still a bit naive; if a signature is invalid it is currently impossible to refresh and re-download the file without manually deleting it first. Similarly, if one downloads a database and the check fails, the database object is still there and can be used. These shortcomings will be addressed in a future commit.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Prevent segfault when parsing unexpected sync database file (Dan McGee, 2011-06-24)
|   This doesn't fix the real (bigger) problem of failing to parse sync databases without directory entries, but it does prevent the parser from segfaulting when the first desc file encountered did not have a directory entry, among other conditions.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Documented purpose of be_*.c (Kerrick Staley, 2011-06-20)
|   Added a line to the top of each of be_local.c, be_package.c, and be_sync.c indicating their purposes.
|
|   Signed-off-by: Kerrick Staley <mail@kerrickstaley.com>
|   Signed-off-by: Dan McGee <dan@archlinux.org>
* Ensure handle is valid and pm_errno is reset when calling into API (Dan McGee, 2011-06-14)
|   We didn't do due diligence before and ensure prior pm_errno values weren't influencing what happened in further ALPM calls. I observed one case of early setup code setting pm_errno to PM_ERR_WRONG_ARGS and that flag persisting the entire time we were calling library code.
|
|   Add a new CHECK_HANDLE() macro that does two things: 1) ensures the handle variable passed to it is non-NULL and 2) clears any existing pm_errno flag set on the handle. This macro can replace many places we used the ASSERT(handle != NULL, ...) pattern before.
|
|   Several other places only need a simple 'set to zero' of the pm_errno field.
|
|   Signed-off-by: Dan McGee <dan@archlinux.org>
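A macro with the two behaviors described above might be written as follows. This is a sketch: `struct handle_stub` and `api_call()` are hypothetical, and the macro body is an assumption about one reasonable shape, not a quotation from the library source.

```c
#include <stddef.h>

/* Hypothetical handle carrying only the error field used here. */
struct handle_stub {
	int pm_errno;
};

/* 1) bail out via 'action' if the handle is NULL, 2) clear any old error. */
#define CHECK_HANDLE(handle, action) do { \
	if(!(handle)) { action; } \
	(handle)->pm_errno = 0; \
} while(0)

/* Example entry point: returns -1 for a NULL handle, 0 otherwise. */
static int api_call(struct handle_stub *handle)
{
	CHECK_HANDLE(handle, return -1);
	/* ... real work would go here, setting pm_errno only on new errors ... */
	return 0;
}
```

Wrapping the body in `do { ... } while(0)` lets the macro be used as a single statement, including inside an unbraced `if`.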